Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures

Stanford Online · Beginner · 🎨 Image & Video AI · 1w ago

Skills: CV Basics (70%)

Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models

To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:05:26 Objective
00:09:58 Convolutions, filters
00:14:44 Receptive field
00:17:14 Pooling
00:19:06 U-Net
00:27:52 Timestep representation
00:30:31 Class label representation
00:33:21 Timeline of U-Net models
00:35:43 Diffusion Transformer (DiT)
00:48:08 Adaptive layer normalization (adaLN)
01:02:30 DiT end-to-end example
01:12:57 Multimodal DiT (MM-DiT)
01:23:33 Qwen-Image, Z-Image, FLUX.1
01:24:27 Timeline of DiT models
01:25:25 Absolute position embeddings
01:38:48 Rotary position embeddings (RoPE)
01:39:59 2D RoPE variants

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu
