Brian Chao - Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
00:00 Intro and Setup
01:02 Why Efficiency Matters
02:48 Two Speedup Paradigms
04:38 Human Vision and Foveation
06:34 Foveated Diffusion Overview
07:45 Mask and Tokenization
09:54 Generation Pipeline
11:09 Naive Artifacts and RoPE Fix
14:39 Training with LoRA Finetune
15:55 Image and Video Results
17:43 Designing Better Masks
21:10 User Study Findings
22:24 Future Directions and Apps
25:27 Website Demo Walkthrough
29:48 Q&A on Speed and Distillation
31:34 Q&A on Token Length and Training
37:45 Closing Remarks
Diffusion and flow matching models have unlocked unprecedented capabilities for creative content creation, such as interactive image and streaming video generation. The growing demand for higher resolutions, frame rates, and context lengths, however, makes efficient generation increasingly challenging, as computational complexity grows quadratically with the number of generated tokens. This work seeks to optimize the efficiency of the generation process in settings where the user's gaze location is known or can be estimated, for example, by using eye tracking. In these settings, the authors leverage the eccentricity-dependent acuity of human vision: while a user perceives very high-resolution visual information in a small region around their gaze location (the foveal region), the ability to resolve detail quickly degrades in the periphery of the visual field. The approach starts with a mask modeling the foveated resolution to allocate tokens non-uniformly, assigning higher token density to foveal regions and lower density to peripheral regions. An image or video is generated in a mixed-resolution token setting, yielding results perceptually indistinguishable from full-resolution generation, while drastically reducing the token count and generation time. To this end, the authors develop a principled mechanism for constructing mixed-resolution tokens directly from high-resolution data, allowing a foveated diffusion model to be post-trained from an existing base model while mai…
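To make the token-count argument concrete, here is a minimal sketch of how a foveation mask might reduce the number of tokens, and what that implies for a cost that grows quadratically with token count. Everything specific in it is an assumption for illustration and not taken from the talk: the circular foveal region, the block-merging rule, the default downsampling factor of 2, and the function names `foveated_token_count` and `attention_cost_ratio`.

```python
def foveated_token_count(grid_h, grid_w, gaze, fovea_radius, factor=2):
    """Count tokens when only blocks near the gaze keep full resolution.

    Illustrative assumption: the token grid is split into factor x factor
    blocks. A block whose centre lies within fovea_radius (in token units)
    of the gaze keeps all of its tokens; every other block is merged into
    a single coarse token.
    """
    if grid_h % factor or grid_w % factor:
        raise ValueError("grid size must be divisible by factor")
    gaze_y, gaze_x = gaze
    total = 0
    for block_y in range(0, grid_h, factor):
        for block_x in range(0, grid_w, factor):
            centre_y = block_y + factor / 2
            centre_x = block_x + factor / 2
            dist_sq = (centre_y - gaze_y) ** 2 + (centre_x - gaze_x) ** 2
            in_fovea = dist_sq <= fovea_radius ** 2
            total += factor * factor if in_fovea else 1
    return total


def attention_cost_ratio(n_tokens, n_full):
    """Relative cost of a term that scales quadratically with token count."""
    return (n_tokens / n_full) ** 2


if __name__ == "__main__":
    full = 64 * 64  # hypothetical full-resolution token grid
    fov = foveated_token_count(64, 64, gaze=(32, 32), fovea_radius=12)
    print("full tokens:", full)
    print("foveated tokens:", fov)
    print("quadratic cost ratio:", attention_cost_ratio(fov, full))
```

The ratio only covers the quadratic term; real generation time also depends on per-token costs and on how the mixed-resolution tokens are built, so it should be read as a rough upper bound on the savings rather than a measured speedup.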