How AI Vision Evolved | Merve Noyan
In this clip, Merve gives a dense walkthrough of how AI vision models evolved, why that matters in practice, and why progress now feels incremental.
🤗 Listen to the full podcast episode
👉 Here: https://youtu.be/SjjCpeTjXIY
Connect with Merve:
- Merve on X — https://x.com/mervenoyann
- Vision Language Models (O'Reilly) — https://www.oreilly.com/library/view/vision-language-models/9798341624030/
Chapters:
- 00:00 How AI Vision Evolved
- 00:12 Vision Transformers
- 01:06 LLaVA
- 01:38 IDEFICS
- 02:06 CLIP + Projection Layer
- 02:54 Interleaving
- 05:42 Segment Anything
Topics covered:
- Vision Transformers
- LLaVA
- IDEFICS
- CLIP + Projection Layer
- Interleaving
Sources mentioned:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — https://arxiv.org/abs/2010.11929
- LLaVA: Visual Instruction Tuning project page — https://llava-vl.github.io/
- IDEFICS: an open reproduction of Flamingo — https://huggingface.co/blog/idefics
- CLIP: Learning Transferable Visual Models From Natural Language Supervision — https://arxiv.org/abs/2103.00020
- IDEFICS2 model documentation — https://huggingface.co/docs/transformers/model_doc/idefics2
- Segment Anything — https://arxiv.org/abs/2304.02643