LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering
📰 ArXiv cs.AI
arXiv:2604.03532v1 Announce Type: cross
Abstract: Large language models (LLMs) show strong multilingual capabilities, yet reliably controlling the language of their outputs remains difficult. Representation-level steering addresses this by adding language-specific vectors to model activations at inference time, but identifying language-specific directions in the residual stream often relies on multilingual or parallel data that can be expensive to obtain. Sparse autoencoders (SAEs) decompose res
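The steering operation the abstract mentions, adding a language-specific vector to a model activation at inference time, can be illustrated with a small toy sketch. This is not the LangFIR method itself: the function names (`steer`, `projection`, `normalize`), the strength parameter `alpha`, the unit-normalization of the direction, and the 4-dimensional vectors are all assumptions made for illustration, and a real setup would apply the same addition to residual-stream activations inside a model.

```python
# Toy sketch of representation-level steering (illustrative only).
# All names and values below are hypothetical, not taken from the paper.
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vec]


def steer(hidden, direction, alpha):
    """Return hidden + alpha * unit(direction) for one token's activation."""
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    unit = normalize(direction)
    return [h + alpha * u for h, u in zip(hidden, unit)]


def projection(vec, direction):
    """Component of vec along the (unit-normalized) direction."""
    unit = normalize(direction)
    return sum(v * u for v, u in zip(vec, unit))


# Stand-in for one residual-stream activation and one language direction.
hidden = [0.5, -1.0, 2.0, 0.0]
direction = [0.0, 3.0, 4.0, 0.0]

steered = steer(hidden, direction, alpha=2.0)

# Steering shifts the activation along the direction by exactly alpha.
shift = projection(steered, direction) - projection(hidden, direction)
print(f"projection shift along direction: {shift:.3f}")
```

Normalizing the direction first is a design choice in this sketch: it makes `alpha` directly interpretable as how far the activation moves along the language direction, independent of the raw vector's scale.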