CM3: A Causal Masked Multimodal Model of the Internet (Paper Explained w/ Author Interview)
#cm3 #languagemodel #transformer
This video contains a paper explanation and an incredibly informative interview with first author Armen Aghajanyan.
Autoregressive Transformers have come to dominate many fields in Machine Learning, from text generation to image creation and many more. However, there are two problems. First, the collected data is usually scraped from the web and uni- or bi-modal and throws away a lot of structure of the original websites, and second, language modelling losses are uni-directional. CM3 addresses both problems: It directly operates on HTML and includes text, hyperlinks, and even images (via VQGAN tokenization) and can therefore be used in plenty of ways: Text generation, captioning, image creation, entity linking, and much more. It also introduces a new training strategy called Causally Masked Language Modelling, which brings a level of bi-directionality into autoregressive language modelling. In the interview after the paper explanation, Armen and I go deep into the how and why of these giant models, we go over the stunning results and we make sense of what they mean for the future of universal models.
OUTLINE:
0:00 - Intro & Overview
6:30 - Directly learning the structure of HTML
12:30 - Causally Masked Language Modelling
18:50 - A short look at how to use this model
23:20 - Start of interview
25:30 - Feeding language models with HTML
29:45 - How to get bi-directionality into decoder-only Transformers?
37:00 - Images are just tokens
41:15 - How does one train such giant models?
45:40 - CM3 results are amazing
58:20 - Large-scale dataset collection and content filtering
1:04:40 - More experimental results
1:12:15 - Why don't we use raw HTML?
1:18:20 - Does this paper contain too many things?
Paper: https://arxiv.org/abs/2201.07520
Abstract:
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally mask
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Yannic Kilcher · Yannic Kilcher · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Imagination-Augmented Agents for Deep Reinforcement Learning
Yannic Kilcher
Learning model-based planning from scratch
Yannic Kilcher
Reinforcement Learning with Unsupervised Auxiliary Tasks
Yannic Kilcher
Attention Is All You Need
Yannic Kilcher
git for research basics: fundamentals, commits, branches, merging
Yannic Kilcher
Curiosity-driven Exploration by Self-supervised Prediction
Yannic Kilcher
World Models
Yannic Kilcher
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
Yannic Kilcher
Stochastic RNNs without Teacher-Forcing
Yannic Kilcher
What’s in a name? The need to nip NIPS
Yannic Kilcher
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Yannic Kilcher
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Yannic Kilcher
GPT-2: Language Models are Unsupervised Multitask Learners
Yannic Kilcher
Neural Ordinary Differential Equations
Yannic Kilcher
The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
Yannic Kilcher
Discriminating Systems - Gender, Race, and Power in AI
Yannic Kilcher
Blockwise Parallel Decoding for Deep Autoregressive Models
Yannic Kilcher
S.H.E. - Search. Human. Equalizer.
Yannic Kilcher
Reinforcement Learning, Fast and Slow
Yannic Kilcher
Adversarial Examples Are Not Bugs, They Are Features
Yannic Kilcher
I'm at ICML19 :)
Yannic Kilcher
Population-Based Search and Open-Ended Algorithms
Yannic Kilcher
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Yannic Kilcher
Conversation about Population-Based Methods (Re-upload)
Yannic Kilcher
Reconciling modern machine learning and the bias-variance trade-off
Yannic Kilcher
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
Yannic Kilcher
Manifold Mixup: Better Representations by Interpolating Hidden States
Yannic Kilcher
Processing Megapixel Images with Deep Attention-Sampling Models
Yannic Kilcher
Gauge Equivariant Convolutional Networks and the Icosahedral CNN
Yannic Kilcher
Auditing Radicalization Pathways on YouTube
Yannic Kilcher
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yannic Kilcher
Dynamic Routing Between Capsules
Yannic Kilcher
DEEP LEARNING MEME REVIEW - Episode 1
Yannic Kilcher
Accelerating Deep Learning by Focusing on the Biggest Losers
Yannic Kilcher
[News] The Siraj Raval Controversy
Yannic Kilcher
LeDeepChef 👨🍳 Deep Reinforcement Learning Agent for Families of Text-Based Games
Yannic Kilcher
The Visual Task Adaptation Benchmark
Yannic Kilcher
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Yannic Kilcher
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning
Yannic Kilcher
SinGAN: Learning a Generative Model from a Single Natural Image
Yannic Kilcher
A neurally plausible model learns successor representations in partially observable environments
Yannic Kilcher
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Yannic Kilcher
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
Yannic Kilcher
NeurIPS 19 Poster Session
Yannic Kilcher
Go-Explore: a New Approach for Hard-Exploration Problems
Yannic Kilcher
Reformer: The Efficient Transformer
Yannic Kilcher
[Interview] Mark Ledwich - Algorithmic Extremism: Examining YouTube's Rabbit Hole of Radicalization
Yannic Kilcher
Turing-NLG, DeepSpeed and the ZeRO optimizer
Yannic Kilcher
Growing Neural Cellular Automata
Yannic Kilcher
NeurIPS 2020 Changes to Paper Submission Process
Yannic Kilcher
Deep Learning for Symbolic Mathematics
Yannic Kilcher
Online Education - How I Make My Videos
Yannic Kilcher
[Rant] coronavirus
Yannic Kilcher
Axial Attention & MetNet: A Neural Weather Model for Precipitation Forecasting
Yannic Kilcher
Agent57: Outperforming the Atari Human Benchmark
Yannic Kilcher
State-of-Art-Reviewing: A Radical Proposal to Improve Scientific Publication
Yannic Kilcher
Dream to Control: Learning Behaviors by Latent Imagination
Yannic Kilcher
POET: Endlessly Generating Increasingly Complex and Diverse Learning Environments and Solutions
Yannic Kilcher
Evaluating NLP Models via Contrast Sets
Yannic Kilcher
[Drama] Who invented Contrast Sets?
Yannic Kilcher
More on: Multimodal LLMs
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
I Tried 10 ChatGPT Resume Prompts. Here's What Actually Got Me Interviews.
Dev.to AI
How does indirect prompt injection work? #tech
Dev.to AI
A Unified View of AI Evolution: From Machine Learning to LLMs, RAG, and Fine-Tuning
Dev.to · Naimul Karim
OpenAI Just Unleashed GPT-5.5 — And It Signals the Next Phase of AI
Medium · AI
Chapters (14)
Intro & Overview
6:30
Directly learning the structure of HTML
12:30
Causally Masked Language Modelling
18:50
A short look at how to use this model
23:20
Start of interview
25:30
Feeding language models with HTML
29:45
How to get bi-directionality into decoder-only Transformers?
37:00
Images are just tokens
41:15
How does one train such giant models?
45:40
CM3 results are amazing
58:20
Large-scale dataset collection and content filtering
1:04:40
More experimental results
1:12:15
Why don't we use raw HTML?
1:18:20
Does this paper contain too many things?
🎓
Tutor Explanation
DeepCamp AI