Domain-Invariant Prompt Learning for Vision-Language Models
📰 ArXiv cs.AI
arXiv:2603.28555v1 Announce Type: cross Abstract: Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompt methods such as Context Optimization (CoOp) effectively adapt these models to downstream recognition tasks by learning a set of context vectors. However, CoOp lacks explicit mechanisms for handling domain shifts across unseen distributions. To address …
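The core idea the abstract describes can be sketched with a toy example: CoOp prepends shared learnable context vectors to each (frozen) class-name embedding, encodes the resulting soft prompt, and classifies an image feature by cosine similarity against the prompt embeddings. The sketch below is illustrative only, with a hypothetical stand-in encoder and made-up dimensions, not the actual CLIP/CoOp implementation.

```python
import numpy as np

# Illustrative CoOp-style soft prompting (all names and sizes are
# hypothetical assumptions, not the real CLIP/CoOp code).
rng = np.random.default_rng(0)
dim, n_ctx, n_classes = 8, 4, 3

ctx = rng.normal(size=(n_ctx, dim))            # shared learnable context vectors
class_emb = rng.normal(size=(n_classes, dim))  # frozen class-name embeddings

def encode_prompt(ctx, cls_vec):
    """Stand-in text encoder: mean-pool the context tokens plus class token."""
    tokens = np.vstack([ctx, cls_vec[None, :]])
    return tokens.mean(axis=0)

def classify(image_feat):
    """Score an image feature against every soft prompt by cosine similarity."""
    text_feats = np.stack([encode_prompt(ctx, c) for c in class_emb])
    sims = text_feats @ image_feat / (
        np.linalg.norm(text_feats, axis=1) * np.linalg.norm(image_feat)
    )
    return int(np.argmax(sims))

image_feat = rng.normal(size=dim)  # would come from the image encoder
pred = classify(image_feat)
print(pred)
```

In actual CoOp, `ctx` is a `torch.nn.Parameter` optimized by backpropagating a classification loss through the frozen text encoder; only the context vectors are updated.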