Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics

📰 ArXiv cs.AI

arXiv:2604.03980v1 Announce Type: cross

Abstract: Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While such alignment is effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled feature…
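The abstract cuts off before describing the method, but the title suggests the "second-order statistics" are Gram matrices computed from the visual feature map. As a rough, hypothetical illustration of the first-order vs. second-order distinction the authors draw (not the paper's actual code; the names, shapes, and normalization below are assumptions), the sketch computes both a mean-pooled first-order descriptor and a Gram descriptor from the same patch features:

```python
# Hypothetical sketch: first-order vs. second-order (Gram) descriptors
# of a spatial feature map. Not taken from the paper.
import torch


def gram_matrix(patch_feats: torch.Tensor) -> torch.Tensor:
    """Second-order statistic of a spatial feature map.

    patch_feats: (N, D) tensor -- N spatial tokens/patches, D channels
    (e.g. ViT patch embeddings with the CLS token removed).
    Returns a (D, D) Gram matrix of channel co-occurrences.
    """
    n = patch_feats.shape[0]
    # Average outer product over spatial positions; spatial layout is discarded,
    # only pairwise channel correlations remain.
    return patch_feats.T @ patch_feats / n


# Example: a fake 14x14 patch grid with 512-dimensional features.
feats = torch.randn(14 * 14, 512)

first_order = feats.mean(dim=0)    # (512,)  mean-pooled spatial features
second_order = gram_matrix(feats)  # (512, 512) Gram descriptor

print(first_order.shape, second_order.shape)
```

Because the Gram matrix averages over spatial positions, it is invariant to permutations of the patch grid and keeps only channel correlations, which is presumably what makes it less "spatially entangled" than the first-order feature maps the abstract refers to.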

Published 7 Apr 2026