Why Attend to Everything? Focus is the Key
📰 ArXiv cs.AI
arXiv:2604.03260v1 Announce Type: cross

Abstract: We introduce Focus, a method that learns which token pairs matter rather than approximating all of them. Learnable centroids assign tokens to groups; distant attention is restricted to same-group pairs, while local attention operates at full resolution. Because all model weights stay frozen, Focus is purely additive: centroid-only training (as few as 148K parameters) improves domain perplexity with zero degradation on downstream benchmarks--from 1
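The masking scheme the abstract describes can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the centroid-assignment metric (Euclidean distance), the symmetric local window, and all function and parameter names (`focus_mask`, `local_window`) are assumptions for the sake of the example.

```python
import numpy as np

def focus_mask(token_embs, centroids, local_window):
    """Build a boolean attention mask: nearby pairs are always attended
    (full-resolution local attention); distant pairs are attended only
    when both tokens fall in the same centroid group."""
    n = token_embs.shape[0]
    # Assign each token to its nearest centroid (assumed Euclidean).
    dists = np.linalg.norm(
        token_embs[:, None, :] - centroids[None, :, :], axis=-1
    )
    groups = dists.argmin(axis=1)                       # (n,)
    pos = np.arange(n)
    local = np.abs(pos[:, None] - pos[None, :]) <= local_window
    same_group = groups[:, None] == groups[None, :]
    return local | same_group                           # True = attended

# Toy usage: 6 tokens in 2-D, 2 centroids, local window of 1.
rng = np.random.default_rng(0)
embs = rng.normal(size=(6, 2))
cents = rng.normal(size=(2, 2))
mask = focus_mask(embs, cents, local_window=1)
```

In the paper's setting the centroids would be the only trainable parameters; the mask would then gate a frozen model's attention scores, which is what makes the method purely additive.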