LoopGuard: Breaking Self-Reinforcing Attention Loops via Dynamic KV Cache Intervention
📰 ArXiv cs.AI
arXiv:2604.10044v1 Announce Type: new Abstract: Through systematic experiments on long-context generation, we observe a damaging failure mode in which decoding can collapse into persistent repetition loops. We find that this degeneration is driven by collapsed attention patterns, where a subset of heads locks onto a narrow suffix of the history, and is further stabilized by inference-time KV cache reuse. Crucially, since many existing KV cache policies rely on attention-based importance, this co…
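The abstract describes a concrete, detectable signal: a head whose attention mass concentrates on a narrow suffix of the history. As a hedged illustration (not the paper's implementation), the sketch below shows one way such "collapsed" heads could be flagged from a single decoding step's attention weights; the function name `find_collapsed_heads`, the suffix window, and the thresholds are all illustrative assumptions.

```python
# A minimal sketch, assuming collapse means (a) attention mass piling
# onto the last few tokens and (b) a near-deterministic (low-entropy)
# distribution. Thresholds are illustrative, not from the paper.
import torch

def find_collapsed_heads(attn: torch.Tensor,
                         suffix_len: int = 16,
                         mass_thresh: float = 0.9,
                         entropy_thresh: float = 1.0) -> list[int]:
    """attn: [num_heads, seq_len] attention weights for the current
    decoding step (each row sums to 1). Returns indices of heads whose
    mass concentrates on the last `suffix_len` tokens with low entropy."""
    suffix_mass = attn[:, -suffix_len:].sum(dim=-1)  # mass on the suffix
    p = attn.clamp_min(1e-12)                        # avoid log(0)
    entropy = -(p * p.log()).sum(dim=-1)             # per-head entropy
    collapsed = (suffix_mass > mass_thresh) & (entropy < entropy_thresh)
    return collapsed.nonzero(as_tuple=True)[0].tolist()

# Toy check: head 0 attends uniformly; head 1 locks onto the last token.
attn = torch.stack([
    torch.full((128,), 1.0 / 128),  # healthy, diffuse head
    torch.eye(128)[-1],             # collapsed, one-hot head
])
print(find_collapsed_heads(attn))  # -> [1]
```

A detector like this is only the first half of what the title implies; the truncated abstract does not specify how LoopGuard's KV cache intervention then breaks the loop.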