The Future of Vision in ML | Merve Noyan | HF Podcast #1

Hugging Face · Beginner ·👁️ Computer Vision ·1mo ago
In this episode, we sit down with Merve to talk about where vision AI is heading: from early computer vision systems to modern multimodal models, world models, robotics, and open source AI. We discuss LLaVA, IDEFICS, Vision Transformers, CNNs, JEPA, V-JEPA, Genie 3, OpenClaw, IMCP, PaliGemma, ColPali, ColQwen, and why Hugging Face has become such a central part of the open ecosystem.

## Connect with Merve Noyan, the open-sourceress 👇

- X (twitter): https://x.com/mervenoyann
- LinkedIn: https://www.linkedin.com/in/merve-noyan-28b1a113a/
- Personal Site: https://merveenoyan.github.io/me/
- GitHub: https://github.com/merveenoyan

## Chapters

- 00:00 Intro: vision, Hugging Face, and the future of AI
- 00:31 Why vision feels different now
- 03:58 LLaVA, IDEFICS, and multimodal training
- 08:56 CNNs, ViTs, and older vision architectures
- 15:46 How vision models could reach everyday users
- 16:50 World models, JEPA, V-JEPA, Genie 3, and robotics
- 25:44 OpenClaw, IMCP, and agent safety
- 28:01 Small vision models, fine-tuning, and getting started
- 34:39 Why Hugging Face matters in open source AI
- 42:49 PaliGemma, ColPali, ColQwen, and vision retrieval
- 47:26 Before Hugging Face: how models were shared
- 49:48 Mentors, culture, and closing thoughts

If you enjoyed the episode, subscribe for more conversations about open models, multimodal systems, and the future of AI.