Why Attend to Everything? Focus is the Key

📰 arXiv cs.AI

arXiv:2604.03260v1 (cross-listed)

Abstract: We introduce Focus, a method that learns which token pairs matter rather than approximating all of them. Learnable centroids assign tokens to groups; distant attention is restricted to same-group pairs, while local attention operates at full resolution. Because all model weights stay frozen, Focus is purely additive: centroid-only training (as few as 148K parameters) improves domain perplexity with zero degradation on downstream benchmarks--from 1
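The abstract describes the mechanism concretely enough to sketch. Below is a minimal, hypothetical PyTorch illustration of a group-restricted attention mask in the spirit of Focus: each token is assigned to its nearest learnable centroid, and a query-key pair is kept if it is either within a local window or in the same group. The centroid count, window size, and hard nearest-centroid assignment are assumptions for illustration, not the paper's actual design; in particular, a hard argmin is not differentiable, so real centroid-only training would presumably use a softer routing.

```python
# Illustrative sketch only: group-restricted attention mask.
# Centroid count (16), window size (64), and hard assignment
# are assumptions, not the paper's configuration.
import torch
import torch.nn.functional as F

def focus_attention_mask(x, centroids, local_window=64):
    """Boolean mask: a query may attend to a key if the pair is within
    the local window (full-resolution local attention) or if both
    tokens are assigned to the same centroid (distant attention)."""
    seq_len = x.shape[0]
    # Assign each token to its nearest centroid. Hard argmin is an
    # assumption here and is non-differentiable; a trainable setup
    # would need a soft assignment.
    dists = torch.cdist(x, centroids)          # (seq_len, n_centroids)
    groups = dists.argmin(dim=-1)              # (seq_len,)
    same_group = groups.unsqueeze(0) == groups.unsqueeze(1)
    pos = torch.arange(seq_len)
    local = (pos.unsqueeze(0) - pos.unsqueeze(1)).abs() < local_window
    return same_group | local                  # (seq_len, seq_len)

# Usage with random features standing in for frozen hidden states.
hidden = torch.randn(512, 256)                 # (tokens, dim)
centroids = torch.nn.Parameter(torch.randn(16, 256))
mask = focus_attention_mask(hidden, centroids)
scores = hidden @ hidden.T / hidden.shape[-1] ** 0.5
attn = F.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
```

In this sketch only `centroids` would carry gradients, which mirrors the purely additive, frozen-weights setup the abstract describes, though the actual training procedure is not specified here.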

Published 7 Apr 2026
Read full paper →