Are Video Reasoning Models Ready to Go Outside?

📰 ArXiv cs.AI

arXiv:2603.10652v2

Abstract: In real-world deployment, vision-language models often encounter disturbances such as weather, occlusion, and camera motion. Under such conditions, their understanding and reasoning degrade substantially, revealing a gap between clean, controlled (i.e., unperturbed) evaluation settings and real-world robustness. To address this limitation, we propose ROVA, a novel training framework that improves robustness by modeling a robustness-aware

Published 15 Apr 2026
