CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection

📰 ArXiv cs.AI

arXiv:2604.03329v1 Announce Type: cross

Abstract: Violence detection benefits from audio, but real-world soundscapes can be noisy or only weakly related to the visible scene. We present CoLoRSMamba, a directional video-to-audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA. At each layer, the VideoMamba CLS token produces a channel-wise modulation vector and a stabilization gate that adapt the AudioMamba projections responsible for the selective st
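The abstract is cut off before it gives the exact formulation, so the following is only a rough sketch of what a CLS-conditioned LoRA update could look like, not the authors' implementation. It assumes a frozen audio projection `W`, a low-rank pair `A` and `B`, a modulation vector `m` applied over the rank channels, and a scalar sigmoid gate `g`. Every name here (`ConditionalLoRAProjection`, `M`, `g_w`, the shapes, and the choice of a scalar gate) is hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ConditionalLoRAProjection:
    """Hypothetical sketch: y = W x + g * B (m * (A x)).

    m and g are computed from the video CLS token, so the video stream
    steers the low-rank update applied to the audio projection.
    """

    def __init__(self, d_in, d_out, d_cls, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)          # frozen audio projection
        self.A = rand_matrix(rank, d_in, rng)           # LoRA down-projection
        self.B = [[0.0] * rank for _ in range(d_out)]   # LoRA up-projection, zero-initialized
        self.M = rand_matrix(rank, d_cls, rng)          # assumed map: CLS -> modulation vector
        self.g_w = [rng.uniform(-0.1, 0.1) for _ in range(d_cls)]  # assumed map: CLS -> gate logit

    def forward(self, x_audio, cls_video):
        m = matvec(self.M, cls_video)                   # channel-wise modulation (over rank channels)
        g = sigmoid(sum(w * c for w, c in zip(self.g_w, cls_video)))  # gate in (0, 1)
        low = [mi * ai for mi, ai in zip(m, matvec(self.A, x_audio))]
        delta = matvec(self.B, low)                     # low-rank update
        base = matvec(self.W, x_audio)                  # unmodified audio projection
        return [b + g * d for b, d in zip(base, delta)]
```

With `B` zero-initialized, as is common for LoRA, the layer initially reproduces the frozen projection, and the video CLS token only starts to influence the audio branch once `B` moves away from zero.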

Published 7 Apr 2026