Benchmarking Deflection and Hallucination in Large Vision-Language Models
📰 ArXiv cs.AI
arXiv:2604.12033v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) increasingly rely on retrieval to answer knowledge-intensive multimodal questions. Existing benchmarks overlook conflicts between visual and textual evidence, as well as the importance of generating deflections (e.g., "Sorry, I cannot answer...") when the retrieved knowledge is incomplete. These benchmarks also suffer from rapid obsolescence, as growing LVLM training sets allow models to answer many questions without retrieval.