What I Learned From Implementing LLM Architectures From Scratch (And How to Get Started)

Sebastian Raschka · Beginner · 🧠 Large Language Models · 19h ago
LLM Architecture Gallery: https://llm-gallery.com

In this talk, I discuss what we can learn from implementing LLM architectures from scratch in Python and PyTorch. The main idea is that to really understand how modern LLMs work, it helps to inspect the actual implementation details: attention variants, normalization layers, configuration files, KV cache optimizations, and the small architectural choices that often make a model work correctly. I also walk through how I approach new open-weight models, how I compare them against reference implementations, and what broader architecture trends emerge from looking at many recent LLMs.
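A recurring theme in the talk is that small normalization details differ between models, and the Gemma 3 RMSNorm chapter below is one example. Here is a minimal RMSNorm sketch in PyTorch; the gemma_style flag mirrors the (1 + weight) scaling with a zero-initialized weight used in Gemma's reference implementation, and is included as an illustrative assumption, not as the exact code discussed in the talk.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm: rescale inputs by the root mean square of the features."""

    def __init__(self, dim, eps=1e-6, gemma_style=False):
        super().__init__()
        self.eps = eps
        # Gemma-style RMSNorm scales by (1 + weight) with weight initialized
        # to zeros; the more common variant scales by weight initialized to ones.
        self.gemma_style = gemma_style
        self.weight = nn.Parameter(torch.zeros(dim) if gemma_style else torch.ones(dim))

    def forward(self, x):
        # Normalize in float32 for numerical stability, then cast back.
        dtype = x.dtype
        x = x.float()
        normed = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        scale = (1.0 + self.weight) if self.gemma_style else self.weight
        return (normed * scale.float()).to(dtype)
```

Whether the scale is weight or (1 + weight), and whether the math runs in float32, are exactly the kinds of details that comparing a from-scratch implementation against Hugging Face Transformers will surface.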


Chapters (24)

0:00 Introduction
1:15 Running LLMs locally in Python
2:30 What "Python" means in practice: PyTorch and hardware backends
4:45 The LLM ecosystem: training and inference tools
7:25 Why implementation details matter
9:35 From model releases to architecture diagrams
12:00 Reading model cards and config files
15:50 Debugging architecture implementations
18:00 Comparing against Hugging Face Transformers
20:30 A Gemma 3 RMSNorm example
24:00 A 12-step workflow for understanding new architectures
25:15 The LLM Architecture Gallery
26:10 Architecture trends across recent LLMs
27:30 KV cache motivation
29:40 Grouped-query attention (see the code sketch after this list)
32:30 Multi-head latent attention
36:40 Sliding window attention
40:00 Sparse and selective attention trends
42:00 KV cache quantization
43:30 LLMs inside agentic software harnesses
46:10 Getting started with LLMs from scratch
48:00 When to use libraries instead of from-scratch code
50:00 Transparent open-source training codebases
52:00 Build a Reasoning Model From Scratch
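
The KV cache motivation and grouped-query attention chapters fit naturally together, so here is a toy PyTorch sketch combining both. All names and hyperparameters (d_model, n_heads, n_kv_heads) are made up for illustration; real models set these in their config files (for example, num_key_value_heads in Hugging Face configs). This is a sketch of the general technique, not the implementation shown in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Toy grouped-query attention (GQA) with a simple KV cache."""

    def __init__(self, d_model=64, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)

        # KV cache: append the new keys/values so past tokens are not recomputed.
        if kv_cache is not None:
            k = torch.cat([kv_cache[0], k], dim=2)
            v = torch.cat([kv_cache[1], v], dim=2)
        new_cache = (k, v)

        # GQA: each group of query heads shares one KV head, so the cache
        # stores n_kv_heads instead of n_heads key/value tensors.
        group_size = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)

        # Causal mask during prefill; during decoding with a cache, the single
        # new token may attend to every cached position.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), new_cache
```

With n_heads=8 and n_kv_heads=2, the cache holds 4x fewer key/value tensors than standard multi-head attention, which is the main reason GQA appears in so many of the recent architectures surveyed in the talk.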