How AI Vision Evolved | Merve Noyan

Hugging Face · Intermediate · 👁️ Computer Vision · 1mo ago
In this clip, Merve gives a dense explanation of how AI vision evolved, why it matters in practice, and why progress feels incremental now.

🤗 Listen to the full podcast episode 👉 https://youtu.be/SjjCpeTjXIY

Connect with Merve:
- Merve on X: https://x.com/mervenoyann
- Vision Language Models (O'Reilly): https://www.oreilly.com/library/view/vision-language-models/9798341624030/

Chapters:
- 00:00 How AI Vision Evolved
- 00:12 Vision Transformers
- 01:06 LLaVA
- 01:38 IDEFICS
- 02:06 CLIP + Projection Layer
- 02:54 Interleaving
- 05:42 Segment Anything

Topics covered:
- Vision Transformers
- LLaVA
- IDEFICS
- CLIP + Projection Layer
- Interleaving

Sources mentioned:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929
- Visual Instruction Tuning project page: https://llava-vl.github.io/
- IDEFICS: an open reproduction of Flamingo: https://huggingface.co/blog/idefics
- CLIP: Connecting text and images: https://arxiv.org/abs/2103.00020
- IDEFICS2 model documentation: https://huggingface.co/docs/transformers/model_doc/idefics2
- Segment Anything: https://arxiv.org/abs/2304.02643
