CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
📰 ArXiv cs.AI
CDH-Bench is a benchmark for evaluating visual fidelity in vision-language models by testing whether they hallucinate commonsense-consistent answers instead of grounding their responses in the visual evidence
Action Steps
- Identify the problem of commonsense-driven hallucination (CDH) in vision-language models
- Develop a benchmark to evaluate the visual fidelity of these models
- Use the CDH-Bench benchmark to measure how often models override visual evidence with commonsense-consistent alternatives (a minimal evaluation sketch follows this list)
- Analyze the results to improve models' reliability and visual fidelity
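Below is a minimal sketch of what such an evaluation loop could look like. It is not the paper's actual protocol: the item fields (`image_path`, `visual_answer`, `commonsense_answer`), the `CDHItem` and `evaluate_visual_fidelity` names, the substring-matching scoring, and the demo example are all illustrative assumptions. The idea is simply to count how often a model answers from the image versus from its commonsense prior on counter-commonsense scenes.

```python
# Hypothetical CDH-style evaluation loop; the real CDH-Bench data format
# and metrics may differ. Each item pairs an image whose content contradicts
# commonsense with the visually correct answer and the commonsense distractor.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class CDHItem:
    image_path: str          # image depicting a counter-commonsense scene
    question: str            # question whose answer is visible in the image
    visual_answer: str       # answer supported by the pixels
    commonsense_answer: str  # answer a model would give from priors alone

def evaluate_visual_fidelity(
    items: Iterable[CDHItem],
    ask_model: Callable[[str, str], str],  # (image_path, question) -> answer
) -> dict:
    """Count how often the model follows the image vs. its commonsense prior."""
    faithful = hallucinated = other = 0
    for item in items:
        answer = ask_model(item.image_path, item.question).strip().lower()
        if item.visual_answer.lower() in answer:
            faithful += 1          # model grounded its answer in the image
        elif item.commonsense_answer.lower() in answer:
            hallucinated += 1      # model overrode the image with commonsense
        else:
            other += 1             # off-target or unparseable response
    total = faithful + hallucinated + other
    return {
        "visual_fidelity": faithful / total if total else 0.0,
        "cdh_rate": hallucinated / total if total else 0.0,
        "other_rate": other / total if total else 0.0,
    }

# Example with a stand-in model that always answers from commonsense,
# illustrating the failure mode the benchmark is designed to expose.
if __name__ == "__main__":
    demo = [CDHItem("green_banana.jpg", "What color is the banana?",
                    visual_answer="green", commonsense_answer="yellow")]
    print(evaluate_visual_fidelity(demo, lambda img, q: "It looks yellow."))
```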
Who Needs to Know This
AI researchers and engineers working on vision-language models can use this benchmark to evaluate and improve their models' visual fidelity, and data scientists and ML engineers can use it to surface grounding failures before deployment
Key Insight
💡 Vision-language models can override visual evidence with commonsense alternatives, leading to unreliable outputs
Share This
💡 Introducing CDH-Bench: a benchmark to evaluate visual fidelity in vision-language models and prevent commonsense-driven hallucination
DeepCamp AI