Diagnosing and Repairing Unsafe Channels in Vision-Language Models via Causal Discovery and Dual-Modal Safety Subspace Projection
📰 ArXiv cs.AI
Researchers propose CARE, a framework for diagnosing and repairing unsafe channels in Vision-Language Models (VLMs) via causal discovery and dual-modal safety subspace projection
Action Steps
- Perform causal mediation analysis to identify neurons and layers responsible for unsafe behaviors
- Apply dual-modal safety subspace projection to repair unsafe channels
- Evaluate the safety and performance of the repaired model
- Refine the framework based on experimental results
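The repair step in the list above relies on a standard linear-algebra idea: estimate a low-rank "unsafe" subspace from model activations and project hidden states onto its orthogonal complement. The sketch below is a minimal, hypothetical illustration of that idea using NumPy; the function names, the rank, and the difference-of-activations heuristic are illustrative assumptions, not the paper's actual method or dual-modal formulation.

```python
import numpy as np

def unsafe_subspace(harmful_acts, benign_acts, rank=2):
    """Estimate top-`rank` directions of the harmful-minus-benign
    activation shift via SVD (illustrative heuristic, not the paper's)."""
    diffs = harmful_acts - benign_acts            # (n_samples, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]                              # (rank, d), orthonormal rows

def project_out(h, basis):
    """Remove the unsafe-subspace component from activation vector(s) h."""
    return h - h @ basis.T @ basis

# Toy data: pretend harmful prompts shift activations along a common direction
rng = np.random.default_rng(0)
d = 16
shift = 3.0 * rng.normal(size=(1, d))
harmful = rng.normal(size=(32, d)) + shift
benign = rng.normal(size=(32, d))

B = unsafe_subspace(harmful, benign, rank=2)
repaired = project_out(harmful, B)
# After projection, components along the unsafe directions vanish:
print(np.allclose(repaired @ B.T, 0.0))  # True
```

Because the rows of `B` are orthonormal, `h @ B.T @ B` is exactly the component of `h` inside the unsafe subspace, so subtracting it zeroes that component while leaving the orthogonal (presumably task-relevant) part of the activation untouched.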
Who Needs to Know This
AI engineers and researchers working on Vision-Language Models can use this framework to improve model safety and reliability; data scientists can apply the causal mediation analysis step to pinpoint which neurons and layers drive unsafe behaviors
Key Insight
💡 Causal discovery and dual-modal safety subspace projection can be used to identify and repair unsafe channels in Vision-Language Models
Share This
🚨 Diagnose and repair unsafe channels in Vision-Language Models with CARE framework 💡
DeepCamp AI