Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics
📰 ArXiv cs.AI
arXiv:2604.03980v1 (Announce Type: cross)

Abstract: Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled feature …
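For readers unfamiliar with the second-order statistics the title refers to: the Gram matrix of a feature map aggregates channel co-activations across spatial positions, discarding spatial layout. A minimal NumPy sketch of this computation follows; the function name `gram_matrix` and the normalization by spatial size are illustrative choices, not details taken from the paper.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Second-order statistic of a channel-first feature map.

    features: array of shape (C, H, W).
    Returns the (C, C) Gram matrix, averaged over the H*W spatial
    positions, so only channel co-activation statistics remain.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # flatten spatial dimensions
    return f @ f.T / (h * w)        # (C, C), symmetric by construction

# Example usage on a random feature map
rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 8, 8))
g = gram_matrix(fmap)
print(g.shape)  # (4, 4)
```

Because the spatial axis is summed out, the result is invariant to any permutation of spatial positions, which is what distinguishes it from the first-order (spatially arranged) features the abstract contrasts it with.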