What I Learned From Implementing LLM Architectures From Scratch (And How to Get Started)

Sebastian Raschka · Beginner · 🧠 Large Language Models · 19h ago
LLM Architecture Gallery: https://llm-gallery.com

In this talk, I discuss what we can learn from implementing LLM architectures from scratch in Python and PyTorch. The main idea is that to really understand how modern LLMs work, it helps to inspect the actual implementation details: attention variants, normalization layers, configuration files, KV cache optimizations, and the small architectural choices that often make a model work correctly. I also walk through how I approach new open-weight models, how I compare them against reference implementations, and what broader architecture trends emerge from looking at many recent LLMs.
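A recurring theme in the talk is that small normalization details differ between models, and the Gemma 3 RMSNorm chapter below is one example. Here is a minimal RMSNorm sketch in PyTorch; the gemma_style flag mirrors the (1 + weight) scaling with a zero-initialized weight used in Gemma's reference implementation, and is included as an illustrative assumption, not as the exact code discussed in the talk.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm: rescale inputs by the root mean square of the features."""

    def __init__(self, dim, eps=1e-6, gemma_style=False):
        super().__init__()
        self.eps = eps
        # Gemma-style RMSNorm scales by (1 + weight) with weight initialized
        # to zeros; the more common variant scales by weight initialized to ones.
        self.gemma_style = gemma_style
        self.weight = nn.Parameter(torch.zeros(dim) if gemma_style else torch.ones(dim))

    def forward(self, x):
        # Normalize in float32 for numerical stability, then cast back.
        dtype = x.dtype
        x = x.float()
        normed = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        scale = (1.0 + self.weight) if self.gemma_style else self.weight
        return (normed * scale.float()).to(dtype)
```

Whether the scale is weight or (1 + weight), and whether the math runs in float32, are exactly the kinds of details that comparing a from-scratch implementation against Hugging Face Transformers will surface.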


Chapters (24)

0:00 Introduction
1:15 Running LLMs locally in Python
2:30 What "Python" means in practice: PyTorch and hardware backends
4:45 The LLM ecosystem: training and inference tools
7:25 Why implementation details matter
9:35 From model releases to architecture diagrams
12:00 Reading model cards and config files
15:50 Debugging architecture implementations
18:00 Comparing against Hugging Face Transformers
20:30 A Gemma 3 RMSNorm example
24:00 A 12-step workflow for understanding new architectures
25:15 The LLM Architecture Gallery
26:10 Architecture trends across recent LLMs
27:30 KV cache motivation
29:40 Grouped-query attention (see the code sketch after this list)
32:30 Multi-head latent attention
36:40 Sliding window attention
40:00 Sparse and selective attention trends
42:00 KV cache quantization
43:30 LLMs inside agentic software harnesses
46:10 Getting started with LLMs from scratch
48:00 When to use libraries instead of from-scratch code
50:00 Transparent open-source training codebases
52:00 Build a Reasoning Model From Scratch
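
The KV cache motivation and grouped-query attention chapters fit naturally together, so here is a toy PyTorch sketch combining both. All names and hyperparameters (d_model, n_heads, n_kv_heads) are made up for illustration; real models set these in their config files (for example, num_key_value_heads in Hugging Face configs). This is a sketch of the general technique, not the implementation shown in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Toy grouped-query attention (GQA) with a simple KV cache."""

    def __init__(self, d_model=64, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)

        # KV cache: append the new keys/values so past tokens are not recomputed.
        if kv_cache is not None:
            k = torch.cat([kv_cache[0], k], dim=2)
            v = torch.cat([kv_cache[1], v], dim=2)
        new_cache = (k, v)

        # GQA: each group of query heads shares one KV head, so the cache
        # stores n_kv_heads instead of n_heads key/value tensors.
        group_size = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)

        # Causal mask during prefill; during decoding with a cache, the single
        # new token may attend to every cached position.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), new_cache
```

With n_heads=8 and n_kv_heads=2, the cache holds 4x fewer key/value tensors than standard multi-head attention, which is the main reason GQA appears in so many of the recent architectures surveyed in the talk.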