Benchmarking Deflection and Hallucination in Large Vision-Language Models

📰 ArXiv cs.AI

arXiv:2604.12033v1 Announce Type: cross

Abstract: Large Vision-Language Models (LVLMs) increasingly rely on retrieval to answer knowledge-intensive multimodal questions. Existing benchmarks overlook conflicts between visual and textual evidence and the importance of generating deflections (e.g., "Sorry, I cannot answer...") when retrieved knowledge is incomplete. These benchmarks also suffer from rapid obsolescence, as growing LVLM training sets allow models to answer many questions without retrieval.
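To make the deflection behavior the abstract describes concrete, here is a minimal sketch of how an evaluation might classify a model's output as a correct answer, a hallucination, or a deflection. This is a hypothetical illustration, not the paper's actual protocol: the names `classify` and `DEFLECTION_MARKERS`, and the convention of `gold=None` marking questions whose retrieved evidence is incomplete, are all assumptions.

```python
# Hypothetical sketch of deflection-aware scoring (not the paper's protocol).
# gold=None marks a question whose retrieved knowledge is incomplete, so the
# model is expected to deflect rather than guess.

DEFLECTION_MARKERS = (
    "sorry, i cannot answer",
    "i don't have enough information",
)

def classify(answer: str, gold: str | None) -> str:
    """Label one model answer as correct / hallucination / deflection / missed_answer."""
    is_deflection = any(m in answer.lower() for m in DEFLECTION_MARKERS)
    if gold is None:
        # Evidence was incomplete: deflecting is the desired behavior;
        # producing a confident answer anyway counts as a hallucination.
        return "deflection" if is_deflection else "hallucination"
    if is_deflection:
        return "missed_answer"  # deflected although the evidence supported an answer
    return "correct" if answer.strip().lower() == gold.strip().lower() else "hallucination"

# Example: a question whose retrieved evidence is incomplete (gold=None).
print(classify("Sorry, I cannot answer this from the given evidence.", None))  # deflection
print(classify("The answer is Paris.", None))                                  # hallucination
```

Under this framing, a benchmark can reward deflection exactly when the evidence is insufficient, which is the failure mode the abstract says existing benchmarks overlook.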

Published 15 Apr 2026