Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

📰 ArXiv cs.AI

Researchers propose a stage-decomposed analysis to track prompt injection attacks across LLM agents and model safety tiers

advanced Published 31 Mar 2026

Action Steps

Instrumenting LLM agents with cryptographic canary tokens to track prompt injection attacks
Decomposing the attack pipeline into four kill-chain stages: Exposed, Persisted, Relayed, Executed
Analyzing the activation of model defenses at each pipeline stage
Evaluating the effectiveness of different defense conditions against prompt injection attacks

Who Needs to Know This

AI researchers and engineers on a team benefit from this research as it provides a detailed analysis of prompt injection attacks, while security experts and data scientists can apply these findings to improve model safety

Key Insight

💡 Stage-decomposed analysis can help localize the pipeline stage at which model defenses activate against prompt injection attacks