CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
📰 ArXiv cs.AI
CDH-Bench is a benchmark for evaluating visual fidelity in vision-language models by testing whether they hallucinate commonsense-consistent answers instead of grounding their responses in the visual evidence
Action Steps
- Identify the problem of commonsense-driven hallucination (CDH) in vision-language models
- Develop a benchmark to evaluate the visual fidelity of these models
- Use the CDH-Bench benchmark to measure how often models override visual evidence with commonsense-consistent alternatives (a minimal evaluation sketch follows this list)
- Analyze the results to improve models' reliability and visual fidelity
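Below is a minimal sketch of what such an evaluation loop could look like. It is not the paper's actual protocol: the item fields (`image_path`, `visual_answer`, `commonsense_answer`), the `CDHItem` and `evaluate_visual_fidelity` names, the substring-matching scoring, and the demo example are all illustrative assumptions. The idea is simply to count how often a model answers from the image versus from its commonsense prior on counter-commonsense scenes.

```python
# Hypothetical CDH-style evaluation loop; the real CDH-Bench data format
# and metrics may differ. Each item pairs an image whose content contradicts
# commonsense with the visually correct answer and the commonsense distractor.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class CDHItem:
    image_path: str          # image depicting a counter-commonsense scene
    question: str            # question whose answer is visible in the image
    visual_answer: str       # answer supported by the pixels
    commonsense_answer: str  # answer a model would give from priors alone

def evaluate_visual_fidelity(
    items: Iterable[CDHItem],
    ask_model: Callable[[str, str], str],  # (image_path, question) -> answer
) -> dict:
    """Count how often the model follows the image vs. its commonsense prior."""
    faithful = hallucinated = other = 0
    for item in items:
        answer = ask_model(item.image_path, item.question).strip().lower()
        if item.visual_answer.lower() in answer:
            faithful += 1          # model grounded its answer in the image
        elif item.commonsense_answer.lower() in answer:
            hallucinated += 1      # model overrode the image with commonsense
        else:
            other += 1             # off-target or unparseable response
    total = faithful + hallucinated + other
    return {
        "visual_fidelity": faithful / total if total else 0.0,
        "cdh_rate": hallucinated / total if total else 0.0,
        "other_rate": other / total if total else 0.0,
    }

# Example with a stand-in model that always answers from commonsense,
# illustrating the failure mode the benchmark is designed to expose.
if __name__ == "__main__":
    demo = [CDHItem("green_banana.jpg", "What color is the banana?",
                    visual_answer="green", commonsense_answer="yellow")]
    print(evaluate_visual_fidelity(demo, lambda img, q: "It looks yellow."))
```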
Who Needs to Know This
AI researchers and engineers working on vision-language models can use this benchmark to evaluate and improve their models' visual fidelity, and data scientists and ML engineers can use it to surface grounding failures before deployment
Key Insight
💡 Vision-language models can override visual evidence with commonsense alternatives, leading to unreliable outputs
Share This
💡 Introducing CDH-Bench: a benchmark to evaluate visual fidelity in vision-language models and prevent commonsense-driven hallucination
DeepCamp AI