TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization

📰 ArXiv cs.AI

TurboAngle compresses KV cache entries using uniform angle quantization in the Fast Walsh-Hadamard domain

Published 31 Mar 2026
Action Steps
  1. Quantize angles in the Fast Walsh-Hadamard domain to compress KV cache entries
  2. Apply random diagonal rotation to make consecutive element pairs approximately uniformly distributed on the unit circle
  3. Extend the angular quantizer with an early-boost scheme that sets the K and V codebook sizes for each layer
  4. Allocate higher precision to critical layers using model-specific subset selection
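The pipeline above can be sketched in a toy form. This is an illustrative reconstruction under assumed details, not the paper's implementation: it uses a random sign-flip as the diagonal rotation, an orthonormal Fast Walsh-Hadamard transform, and keeps pair magnitudes at full precision while uniformly quantizing only the pair angles.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, orthonormal scaling (length must be a power of two)."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal, so fwht is its own inverse

rng = np.random.default_rng(0)
d = 8                                    # toy head dimension
signs = rng.choice([-1.0, 1.0], size=d)  # random diagonal (sign-flip) rotation
v = rng.standard_normal(d)               # one KV cache vector

# Steps 1-2: rotate, transform; consecutive pairs become ~uniform on the circle
t = fwht(signs * v)

# Uniform angle quantization of consecutive element pairs
bits = 4
levels = 2 ** bits
pairs = t.reshape(-1, 2)
r = np.linalg.norm(pairs, axis=1)             # magnitudes, kept full-precision here
theta = np.arctan2(pairs[:, 1], pairs[:, 0])  # angles in (-pi, pi]
code = np.round((theta + np.pi) / (2 * np.pi) * levels) % levels

# Dequantize and invert (FWHT and sign-flip are both self-inverse)
theta_hat = code / levels * 2 * np.pi - np.pi
pairs_hat = r[:, None] * np.stack([np.cos(theta_hat), np.sin(theta_hat)], axis=1)
v_hat = signs * fwht(pairs_hat.reshape(-1))

err = np.linalg.norm(v - v_hat) / np.linalg.norm(v)
```

Because the transform preserves norms and only the angle is quantized, the relative reconstruction error is bounded by half the angular step (pi / levels, about 0.196 radians at 4 bits); steps 3-4 of the summary would then vary `bits` per layer rather than fixing it globally.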
Who Needs to Know This

This research benefits AI engineers and ML researchers working on model compression and inference optimization: it offers a new way to reduce KV cache memory usage while maintaining model performance.

Key Insight

💡 Uniform angle quantization in the Fast Walsh-Hadamard domain can achieve near-lossless compression of KV cache entries
