Skill-Conditioned Visual Geolocation for Vision-Language
📰 ArXiv cs.AI
arXiv:2604.09025v1 Announce Type: cross Abstract: Vision-language models (VLMs) have shown promising ability in image geolocation, but they still lack structured geographic reasoning and the capacity for autonomous self-evolution. Existing methods rely predominantly on implicit parametric memory, which often draws on outdated knowledge and produces hallucinated reasoning. Furthermore, current inference is a "one-off" process, lacking the feedback loops necessary for self-evolution based on re