How Transformers Finally Ate Vision – Isaac Robinson, Roboflow

AI Engineer · Beginner · 👁️ Computer Vision · 1w ago
Vision used to belong to CNNs. This talk explains why that changed, and why transformers only recently started winning for vision despite looking like the less natural fit for images. The answer runs through pretraining, scaling, borrowed infrastructure from the LLM world, and the long arc back to the simple architecture that scales best. Using the evolution from ViT and Swin through ConvNeXt, Hiera, SAM, and RF-DETR, Isaac Robinson walks through what actually made transformer vision systems practical, where the tradeoffs still are, and why deployment flexibility now matters as much as raw benchmark wins. What comes next for VLMs, world models, and physical AI?

Speaker info: https://www.linkedin.com/in/robinsonish/
