Self-Training improves Pre-Training for Natural Language Understanding
Skills: LLM Foundations
This video explains a new paper showing that Self-Training applied after Language-Model pre-training improves the performance of RoBERTa-Large. The paper goes on to show Self-Training gains in Knowledge Distillation and Few-Shot Learning as well. The authors also introduce SentAugment, an unlabeled-data retrieval and filtering method that improves performance and reduces the computational cost of the self-training loop. Thanks for watching! Please Subscribe!
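A rough sketch of the retrieve-then-self-train loop discussed in the video (not the authors' code; the paper retrieves from a large bank of web sentences and fine-tunes RoBERTa-Large via fairseq). The embed function and the teacher/student models below are hypothetical stand-ins with an sklearn-style fit/predict_proba interface, and the top_k value and 0.9 confidence threshold are illustrative assumptions:

import numpy as np

# All models here are hypothetical placeholders; the actual paper
# fine-tunes RoBERTa-Large and retrieves from billions of Common Crawl
# sentences using learned sentence embeddings.

def embed(sentences):
    # Placeholder sentence encoder (stands in for the paper's sentence
    # embeddings); returns one fixed-size vector per sentence.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 128))

def cosine(a, b):
    # Pairwise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def sentaugment_retrieve(labeled_sentences, unlabeled_bank, top_k):
    # SentAugment-style filtering: build a task embedding (here, the mean
    # of the labeled-sentence embeddings), then keep only the top_k most
    # similar sentences from the large unlabeled bank.
    task_embedding = embed(labeled_sentences).mean(axis=0, keepdims=True)
    scores = cosine(task_embedding, embed(unlabeled_bank))[0]
    best = np.argsort(-scores)[:top_k]
    return [unlabeled_bank[i] for i in best]

def self_train(labeled, labels, unlabeled_bank, teacher, student, top_k=1000):
    # 1. Fine-tune the teacher on the labeled task data.
    teacher.fit(labeled, labels)
    # 2. Retrieve task-relevant sentences instead of pseudo-labeling the
    #    whole bank; this is what cuts the computational cost of the loop.
    retrieved = sentaugment_retrieve(labeled, unlabeled_bank, top_k)
    # 3. Pseudo-label the retrieved sentences with the teacher and keep
    #    only confident predictions (0.9 is an illustrative threshold).
    probs = teacher.predict_proba(retrieved)
    keep = probs.max(axis=1) > 0.9
    synthetic = [s for s, k in zip(retrieved, keep) if k]
    pseudo_labels = probs.argmax(axis=1)[keep]
    # 4. Train the student on labeled plus pseudo-labeled data; in the
    #    knowledge-distillation setting the student is a smaller model.
    student.fit(list(labeled) + synthetic,
                np.concatenate([np.asarray(labels), pseudo_labels]))
    return student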
Paper Links:
Self-Training Improves Pre-training for Natural Language Understanding: https://arxiv.org/pdf/2010.02194.pdf
Distributed Representations of Words and Phrases: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Rethinking Pre-training and Self-training: https://arxiv.org/pdf/2006.06882.pdf
Don't Stop Pretraining: https://arxiv.org/pdf/2004.10964.pdf
Universal Sentence Encoder: https://arxiv.org/abs/1803.11175
Common Crawl Corpus: https://commoncrawl.org/the-data/
Fairseq: https://github.com/pytorch/fairseq
BERT: https://arxiv.org/pdf/1810.04805.pdf
Noisy Student: https://arxiv.org/abs/1911.04252
POET: https://arxiv.org/pdf/1901.01753.pdf
PET - Small Language Models are Also Few-Shot Learners: https://arxiv.org/pdf/2009.07118.pdf
Chapters:
0:00 Introduction
1:50 Background on Transfer Learning
2:40 Self-Training
5:25 Not all unlabeled data is equally useful
6:54 SentAugment Retrieval and Filtering
12:55 Experimental Data
14:55 Results
18:15 Some Interesting Details
19:02 Ablations
20:20 Nearest Neighbor Visualization
21:05 Computational Cost of Self-Training
22:30 Few-Shot Learning comparison with GPT-3, PET
23:52 Phases of Representation Learning