R3G: A Reasoning-Retrieval-Reranking Framework for Vision-Centric Answer Generation
📰 ArXiv cs.AI
arXiv:2602.00104v2 Announce Type: replace-cross Abstract: Vision-centric retrieval for VQA requires retrieving images to supply missing visual cues and integrating them into the reasoning process. However, selecting the right images and integrating them effectively into the model's reasoning remains challenging. To address this challenge, we propose R3G, a modular Reasoning-Retrieval-Reranking framework. It first produces a brief reasoning plan that specifies the required visual cues, then adopts