SafetyDrift: Predicting When AI Agents Cross the Line Before They Actually Do
📰 ArXiv cs.AI
SafetyDrift predicts when AI agents will cross safety boundaries by modeling their actions as Markov chains
Action Steps
- Model agent safety trajectories as absorbing Markov chains
- Compute the probability of a trajectory reaching a violation within a given number of steps
- Use this probability to predict when an AI agent is likely to cross a safety boundary
- Intervene with preventative measures before a predicted violation occurs
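The steps above can be sketched numerically. In an absorbing Markov chain, the probability of reaching the violation state within k steps is the (start, violation) entry of the k-th power of the transition matrix. The states and probabilities below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical 3-state chain: 0 = safe, 1 = risky, 2 = violation (absorbing).
# Transition probabilities are made up for illustration only.
P = np.array([
    [0.90, 0.09, 0.01],   # safe  -> mostly stays safe
    [0.30, 0.60, 0.10],   # risky -> may recover, or violate
    [0.00, 0.00, 1.00],   # violation is absorbing: no escape
])

def violation_prob(P, start, violation, steps):
    """P(trajectory from `start` reaches the absorbing `violation` state
    within `steps` steps). Because the violation state is absorbing, it
    accumulates every trajectory that ever hits it, so this is simply
    (P^steps)[start, violation]."""
    return np.linalg.matrix_power(P, steps)[start, violation]

for k in (1, 5, 20):
    p = violation_prob(P, 0, 2, k)
    print(f"P(violation within {k} steps from safe) = {p:.3f}")
```

Because the absorbing probability only grows with the horizon, an agent runtime could flag a trajectory once this value crosses a chosen risk threshold, which is one way to operationalize the "predict, then prevent" steps above.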
Who Needs to Know This
AI engineers and researchers can use this work to predict and prevent safety violations in agentic systems, while product managers and entrepreneurs can apply it to keep their AI products safe and reliable
Key Insight
💡 Individually safe actions can compound into safety violations, making prediction crucial
Share This
💡 Predicting AI safety drift with Markov chains!
DeepCamp AI