Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
📰 ArXiv cs.AI
arXiv:2604.12424v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) frequently hallucinate during inference, in part because language priors dominate visual evidence. Existing training-free mitigation methods either perturb the visual representation, deviating from the natural image distribution, or impose intrusive manipulations that compromise the model's inherent generative fluency. We introduce a novel perspective that multimodal hallucination manifests as …
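The abstract breaks off before the method itself is stated, but the title points at a contrastive, perturbation-based decoding scheme applied on the text side. As a minimal sketch of that general pattern only, the snippet below contrasts next-token logits under the original text prompt with logits under a perturbed prompt, using the (1+α)/α logit combination common in contrastive decoding; the `model` callable, the `alpha` weighting, and the choice of perturbation are all assumptions for illustration, not the paper's actual algorithm.

```python
import torch

def perturbation_contrastive_step(model, image, text_ids, perturbed_text_ids, alpha=1.0):
    """One greedy decoding step contrasting original vs. perturbed text prompts.

    Hypothetical interface: `model(image, token_ids)` returns next-token
    logits of shape (vocab_size,). `alpha` (assumed) scales the penalty.
    """
    with torch.no_grad():
        logits = model(image, text_ids)                 # conditioned on the original prompt
        logits_pert = model(image, perturbed_text_ids)  # conditioned on the perturbed prompt
    # Tokens the perturbed branch still favors are driven mostly by language
    # priors; down-weighting them amplifies image-grounded evidence.
    contrastive = (1 + alpha) * logits - alpha * logits_pert
    return torch.argmax(contrastive, dim=-1)
```

This mirrors how visual contrastive decoding reshapes logits, with the perturbation moved from the image to the text input. Because it is purely a decoding-time logit adjustment, it leaves the model weights and the natural image distribution untouched, which is the property the abstract emphasizes over visual-perturbation baselines.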