BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token
A full explanation of the BERT model, including a comparison with other language models such as LLaMA and GPT. I cover training, inference, fine-tuning, the Masked Language Model (MLM) task, Next Sentence Prediction (NSP), the [CLS] token, sentence embeddings, text classification, question answering, and the self-attention mechanism. Everything is explained visually, step by step.
I also review the background knowledge needed to understand BERT, starting with an introduction to large language models (LLMs) and the attention mechanism.
Slides PDF: https://github.com/hkproj/bert-from-scratch
BERT paper: https://arxiv.org/abs/1810.04805
Chapters
00:00 - Introduction
02:00 - Language Models
03:10 - Training (Language Models)
07:23 - Inference (Language Models)
09:15 - Transformer architecture (Encoder)
10:28 - Input Embeddings
14:17 - Positional Encoding
17:14 - Self-Attention and causal mask
29:14 - BERT (overview)
32:08 - BERT vs GPT/LLaMA
34:25 - Left context and right context
36:36 - BERT pre-training
37:05 - Masked Language Model
45:01 - [CLS] token
48:26 - BERT fine-tuning
49:00 - Text classification
50:50 - Question answering