Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers
📰 ArXiv cs.AI
arXiv:2604.21700v1 Announce Type: cross Abstract: The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat mo
DeepCamp AI