Phonetic Perturbations Reveal Tokenizer-Rooted Safety Gaps in LLMs
📰 ArXiv cs.AI
arXiv:2505.14226v5. Abstract: Safety-aligned LLMs remain vulnerable to digital phenomena like textese that introduce non-canonical perturbations to words but preserve the phonetics. We introduce CMP-RT (code-mixed phonetic perturbations for red-teaming), a novel diagnostic probe that pinpoints tokenization as the root cause of this vulnerability. A mechanistic analysis reveals that phonetic perturbations fragment safety-critical tokens into benign sub-words, suppressi
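The fragmentation effect described in the abstract can be illustrated with a toy sketch. This is not CMP-RT and not any real model's tokenizer: the vocabulary, the `SENSITIVE` set, and the helper names `tokenize` and `has_sensitive_token` are all hypothetical, and the greedy longest-match segmentation is only a stand-in for real sub-word tokenization.

```python
# Toy vocabulary (hypothetical; not taken from any real tokenizer).
VOCAB = {"hate", "ha", "yt", "h", "a", "y", "t", "e"}

# Tokens that a hypothetical token-level safety signal keys on.
SENSITIVE = {"hate"}


def tokenize(word, vocab=VOCAB):
    """Greedy longest-match-first segmentation of a single word."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches: emit the raw character.
            tokens.append(word[i])
            i += 1
    return tokens


def has_sensitive_token(word):
    """True if any produced token is in the SENSITIVE set."""
    return any(tok in SENSITIVE for tok in tokenize(word))


if __name__ == "__main__":
    for w in ("hate", "hayt"):  # canonical spelling vs. phonetic respelling
        print(w, tokenize(w), has_sensitive_token(w))
```

In this toy setup the canonical spelling maps to a single token that the check recognizes, while the phonetic respelling splits into sub-words that individually look benign, so the check no longer fires.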