Yasser Benigmin - Domain Adaptation in the Era of Foundation Models

Cohere · Advanced · Computer Vision · 1mo ago
In this presentation, we address domain adaptation in semantic segmentation, where deep learning models rely heavily on large labeled datasets and struggle with domain shift, limiting real-world generalization. We show how Foundation Models (FMs) can be adapted to overcome these challenges under resource constraints through three key contributions. First, we present DATUM, a one-shot unsupervised domain adaptation approach that personalizes text-to-image diffusion models to generate diverse, style-consistent training data from a single target image. Next, we introduce CLOUDS, a collaborative framework in which multiple foundation models, such as CLIP, large language models, diffusion models, and the Segment Anything Model, work together to generate synthetic data and automate the creation of high-quality pseudo-labels for self-training, enabling improved domain generalization. Finally, we discuss FLOSS, a training-free strategy for open-vocabulary segmentation that enhances CLIP’s performance by automatically discovering class-specific “expert” text templates.

Yasser Benigmin is a recent PhD graduate in Computer Vision within the Multimedia team at Telecom Paris and the VISTA team at LIX (Laboratoire d'Informatique de l'X) at École Polytechnique, supervised by Stéphane Lathuilière, Vicky Kalogeiton, and Slim Essid. His research focuses on domain adaptation for semantic segmentation using foundation models, with a particular emphasis on resource-constrained scenarios. Previously, he interned at INRIA Paris in the Astra-Vision team, working on open-vocabulary semantic segmentation under Raoul de Charette. Yasser holds an engineering degree from École des Mines de Saint-Étienne and completed an exchange year at EURECOM.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you
