One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
arXiv cs.AI
arXiv:2604.13006v1 Announce Type: cross

Abstract: Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness under trivial constraints? We show that simple lexical constraints (banning a single punctuation character or a common word) cause instruction-tuned LLMs' responses to collapse, losing 14–48% of comprehensiveness in pairwise evaluation across three open-weight model families and one closed-weight model (GPT-4o-mini). The baseline …
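The abstract does not say how the ban is enforced; one plausible realization is a decode-time logit bias that forbids the banned string's token ids. Below is a minimal sketch against OpenAI's chat completions API. The model name, prompt, and logit-bias mechanism are assumptions for illustration, not the paper's stated method, and comparing response lengths is only a crude stand-in for the paper's pairwise comprehensiveness judgments.

```python
# Minimal sketch of a single-token lexical ban via decode-time logit bias.
# Assumption: the paper's constraint could equally be prompt-level; this is
# just one way to forbid a character such as "," during generation.
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4o-mini"

def ban_bias(text: str) -> dict[str, int]:
    """Map each token id of `text` to -100, which effectively bans it.
    Caveat: only exact token ids are banned; multi-character tokens that
    merely contain the character (e.g. ", ") would need banning too."""
    enc = tiktoken.encoding_for_model(MODEL)
    return {str(tok): -100 for tok in enc.encode(text)}

def respond(prompt: str, banned: str | None = None) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        logit_bias=ban_bias(banned) if banned else None,
    )
    return resp.choices[0].message.content

prompt = "Explain how TCP congestion control works."   # hypothetical task
baseline = respond(prompt)                 # unconstrained response
constrained = respond(prompt, banned=",")  # ban a single punctuation character
print(len(baseline), len(constrained))     # crude length proxy, not the paper's metric
```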