Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures

Stanford Online · Beginner · 🎨 Image & Video AI · 1w ago
Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models

To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:05:26 Objective
00:09:58 Convolutions, filters
00:14:44 Receptive field
00:17:14 Pooling
00:19:06 U-Net
00:27:52 Timestep representation
00:30:31 Class label representation
00:33:21 Timeline of U-Net models
00:35:43 Diffusion Transformer (DiT)
00:48:08 Adaptive layer normalization (adaLN)
01:02:30 DiT end-to-end example
01:12:57 Multimodal DiT (MM-DiT)
01:23:33 Qwen-Image, Z-Image, FLUX.1
01:24:27 Timeline of DiT models
01:25:25 Absolute position embeddings
01:38:48 Rotary position embeddings (RoPE)
01:39:59 2D RoPE variants

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

Afshine Amidi is an Adjunct Lecturer at Stanford University. Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu
