The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams

📰 ArXiv cs.AI

LatentBiopsy detects harmful prompts in LLMs, without any training, by measuring angular deviation in the residual stream.

advanced Published 31 Mar 2026

Action Steps

Compute the leading principal component of activations for 200 safe normative prompts at a target layer
Characterise new prompts by their radial deviation angle from the reference direction
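Calculate the anomaly score as the negative log-likelihood of the deviation angle under the distribution of angles observed for the safe prompts
Flag prompts whose anomaly score is unusually high relative to the safe prompts as potentially harmful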
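
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation, and it makes several assumptions this summary does not confirm: the angle is measured between the raw activation vector and the principal direction, the likelihood is a Gaussian fitted to the safe prompts' angles, and the 99th-percentile threshold is an arbitrary choice. The function names are hypothetical, and the synthetic arrays stand in for real residual-stream activations, which would have to be extracted from the model separately.

```python
import numpy as np


def deviation_angle(acts, direction):
    """Angle (radians) between each activation vector and the reference direction."""
    acts = np.atleast_2d(acts)
    cos = acts @ direction / (np.linalg.norm(acts, axis=1) * np.linalg.norm(direction))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def fit_reference(safe_acts):
    """Fit the reference direction and angle statistics from safe-prompt activations.

    safe_acts has shape (n_prompts, d_model): one activation vector per prompt,
    taken at the target layer.
    """
    # Leading principal component = top right-singular vector of the centred matrix.
    centred = safe_acts - safe_acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    direction = vt[0]
    # Assumption: model the safe prompts' angles with a Gaussian.
    angles = deviation_angle(safe_acts, direction)
    return direction, angles.mean(), angles.std() + 1e-8


def anomaly_score(acts, direction, mu, sigma):
    """Gaussian negative log-likelihood of each prompt's deviation angle."""
    theta = deviation_angle(acts, direction)
    return 0.5 * ((theta - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)


# Synthetic stand-in for 200 safe-prompt activations (not real model data).
rng = np.random.default_rng(0)
d_model = 16
safe = 0.1 * rng.standard_normal((200, d_model))
safe[:, 0] += 5.0                              # shared offset along axis 0
safe[:, 1] += 3.0 * rng.standard_normal(200)   # dominant variance along axis 1

direction, mu, sigma = fit_reference(safe)
safe_scores = anomaly_score(safe, direction, mu, sigma)

# Hypothetical decision rule: flag anything above the 99th percentile of safe scores.
threshold = np.percentile(safe_scores, 99)
new_prompt = np.zeros(d_model)
new_prompt[1] = 10.0                           # points along the reference direction
is_flagged = anomaly_score(new_prompt, direction, mu, sigma)[0] > threshold
```

Because the score depends only on how far the angle lies from the safe prompts' mean angle, the arbitrary sign of the principal component does not affect it: flipping the direction mirrors the safe angles and the new prompt's angle together.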

Who Needs to Know This

AI researchers and engineers working on LLMs can use this method to detect harmful prompts, and product managers can apply it to improve the safety of AI-powered products.

Key Insight

💡 A prompt's angular deviation from a reference direction in the residual stream can flag harmful prompts in LLMs, with no detector training required.