Self-Training improves Pre-Training for Natural Language Understanding
Skills: LLM Foundations
This video explains a new paper showing that Self-Training applied after Language-Model pre-training improves the performance of RoBERTa-Large. The paper goes on to show Self-Training gains in Knowledge Distillation and Few-Shot Learning as well. The authors also introduce SentAugment, an unlabeled-data retrieval and filtering method that improves performance and reduces the computational cost of the self-training loop. Thanks for watching! Please Subscribe!
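A rough sketch of the retrieve-then-self-train loop discussed in the video (not the authors' code; the paper retrieves from a large bank of web sentences and fine-tunes RoBERTa-Large via fairseq). The embed function and the teacher/student models below are hypothetical stand-ins with an sklearn-style fit/predict_proba interface, and the top_k value and 0.9 confidence threshold are illustrative assumptions:

import numpy as np

# All models here are hypothetical placeholders; the actual paper
# fine-tunes RoBERTa-Large and retrieves from billions of Common Crawl
# sentences using learned sentence embeddings.

def embed(sentences):
    # Placeholder sentence encoder (stands in for the paper's sentence
    # embeddings); returns one fixed-size vector per sentence.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 128))

def cosine(a, b):
    # Pairwise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def sentaugment_retrieve(labeled_sentences, unlabeled_bank, top_k):
    # SentAugment-style filtering: build a task embedding (here, the mean
    # of the labeled-sentence embeddings), then keep only the top_k most
    # similar sentences from the large unlabeled bank.
    task_embedding = embed(labeled_sentences).mean(axis=0, keepdims=True)
    scores = cosine(task_embedding, embed(unlabeled_bank))[0]
    best = np.argsort(-scores)[:top_k]
    return [unlabeled_bank[i] for i in best]

def self_train(labeled, labels, unlabeled_bank, teacher, student, top_k=1000):
    # 1. Fine-tune the teacher on the labeled task data.
    teacher.fit(labeled, labels)
    # 2. Retrieve task-relevant sentences instead of pseudo-labeling the
    #    whole bank; this is what cuts the computational cost of the loop.
    retrieved = sentaugment_retrieve(labeled, unlabeled_bank, top_k)
    # 3. Pseudo-label the retrieved sentences with the teacher and keep
    #    only confident predictions (0.9 is an illustrative threshold).
    probs = teacher.predict_proba(retrieved)
    keep = probs.max(axis=1) > 0.9
    synthetic = [s for s, k in zip(retrieved, keep) if k]
    pseudo_labels = probs.argmax(axis=1)[keep]
    # 4. Train the student on labeled plus pseudo-labeled data; in the
    #    knowledge-distillation setting the student is a smaller model.
    student.fit(list(labeled) + synthetic,
                np.concatenate([np.asarray(labels), pseudo_labels]))
    return student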
Paper Links:
Self-Training Improves Pre-training for Natural Language Understanding: https://arxiv.org/pdf/2010.02194.pdf
Distributed Representations of Words and Phrases: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Rethinking Pre-training and Self-training: https://arxiv.org/pdf/2006.06882.pdf
Don't Stop Pretraining: https://arxiv.org/pdf/2004.10964.pdf
Universal Sentence Encoder: https://arxiv.org/abs/1803.11175
Common Crawl Corpus: https://commoncrawl.org/the-data/
Fairseq: https://github.com/pytorch/fairseq
BERT: https://arxiv.org/pdf/1810.04805.pdf
Noisy Student: https://arxiv.org/abs/1911.04252
POET: https://arxiv.org/pdf/1901.01753.pdf
PET - Small Language Models are Also Few-Shot Learners: https://arxiv.org/pdf/2009.07118.pdf
Chapters:
0:00 Introduction
1:50 Background on Transfer Learning
2:40 Self-Training
5:25 Not all unlabeled data is equally useful
6:54 SentAugment Retrieval and Filtering
12:55 Experimental Data
14:55 Results
18:15 Some Interesting Details
19:02 Ablations
20:20 Nearest Neighbor Visualization
21:05 Computational Cost of Self-Training
22:30 Few-Shot Learning comparison with GPT-3, PET
23:52 Phases of Representation Learning