Reinforcement Learning from Human Feedback: From Zero to ChatGPT [Recording of the live session]

HuggingFace · Beginner · 🧠 Large Language Models · 3y ago
In this talk, we will cover the basics of Reinforcement Learning from Human Feedback (RLHF) and how this technology is being used to enable state-of-the-art ML tools like ChatGPT. Most of the talk will be an overview of the interconnected ML models, covering the basics of Natural Language Processing and RL that one needs in order to understand how RLHF is applied to large language models. It will conclude with open questions in RLHF.

Slides: https://docs.google.com/presentation/d/1eI9PqRJTCFOIVihkig1voRM4MHDpLpCicX9lX1J2fqk/edit?usp=sharing
RLHF Blogpost: https://huggingface.co/blog/rlhf
The Deep RL Course: https://hf.co/deep-rl-course
Nathan on Twitter: https://twitter.com/natolambert
Thomas on Twitter: https://twitter.com/thomassimonini
Hugging Face Discord server: https://hf.co/join/discord

Nathan Lambert is a Research Scientist at Hugging Face. He received his PhD from the University of California, Berkeley, working at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and by Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to better community norms.
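The interconnected models the talk walks through (a pretrained language model, a reward model trained on human preferences, and an RL fine-tuning loop) can be sketched in miniature. The snippet below is a toy illustration under stated assumptions, not the talk's actual code: the "policy" is just a distribution over three canned completions for one prompt, `reward_model` is a hypothetical keyword check standing in for a learned preference model, and the update is a plain REINFORCE step with a KL penalty toward a frozen reference policy, playing the role that the KL term plays in PPO-based RLHF.

```python
import math
import random

random.seed(0)

# Toy "policy": a distribution over three canned completions for a single prompt.
# In real RLHF the policy is a large language model emitting token-level distributions.
completions = ["helpful answer", "rude answer", "off-topic answer"]
policy_logits = {c: 0.0 for c in completions}   # trainable parameters
reference_logits = dict(policy_logits)          # frozen copy of the initial model

def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def reward_model(completion):
    # Stand-in for a reward model trained on human preference comparisons;
    # here it is just a hypothetical keyword check.
    return 1.0 if "helpful" in completion else -1.0

def rlhf_step(lr=0.5, kl_coef=0.1):
    probs = softmax(policy_logits)
    ref_probs = softmax(reference_logits)
    # Sample a completion, score it, and shape the reward with a KL penalty
    # that discourages drifting far from the frozen reference model.
    sample = random.choices(completions, weights=[probs[c] for c in completions])[0]
    kl_penalty = math.log(probs[sample] / ref_probs[sample])
    shaped_reward = reward_model(sample) - kl_coef * kl_penalty
    # REINFORCE: gradient of log p(sample) w.r.t. each logit is (indicator - prob).
    for c in completions:
        grad = (1.0 if c == sample else 0.0) - probs[c]
        policy_logits[c] += lr * shaped_reward * grad

for _ in range(200):
    rlhf_step()

final = softmax(policy_logits)
# After training, the policy should put most of its mass on the preferred completion.
print(max(final, key=final.get))
```

The KL term is the detail worth noticing: without it, the policy can collapse onto whatever the reward model happens to score highest, which is exactly the reward-hacking failure mode discussed in the RLHF literature.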

Playlist

Uploads from HuggingFace · 60 videos

1 The Future of Natural Language Processing
2 Trends in Model Size & Computational Efficiency in NLP
3 Increasing Data Usage in Natural Language Processing
4 In Domain & Out of Domain Generalization in the Future of NLP
5 The Limits of NLU & the Rise of NLG in the Future of NLP
6 The Lack of Robustness in the Future of NLP
7 Inductive Bias, Common Sense, Continual Learning in The Future of NLP
8 Train a Hugging Face Transformers Model with Amazon SageMaker
9 What is Transfer Learning?
10 The pipeline function
11 Navigating the Model Hub
12 Transformer models: Decoders
13 The Transformer architecture
14 Transformer models: Encoder-Decoders
15 Transformer models: Encoders
16 Keras introduction
17 The push to hub API
18 Fine-tuning with TensorFlow
19 Learning rate scheduling with TensorFlow
20 TensorFlow Predictions and metrics
21 Welcome to the Hugging Face course
22 The tokenization pipeline
23 Supercharge your PyTorch training loop with Accelerate
24 The Trainer API
25 Batching inputs together (PyTorch)
26 Batching inputs together (TensorFlow)
27 Hugging Face Datasets overview (PyTorch)
28 Hugging Face Datasets overview (TensorFlow)
29 What is dynamic padding?
30 What happens inside the pipeline function? (PyTorch)
31 What happens inside the pipeline function? (TensorFlow)
32 Instantiate a Transformers model (PyTorch)
33 Instantiate a Transformers model (TensorFlow)
34 Preprocessing sentence pairs (PyTorch)
35 Preprocessing sentence pairs (TensorFlow)
36 Write your training loop in PyTorch
37 Managing a repo on the Model Hub
38 Chapter 1 Live Session with Sylvain
39 Chapter 2 Live Session with Lewis
40 The push to hub API
41 Chapter 2 Live Session with Sylvain
42 Chapter 3 live sessions with Lewis (PyTorch)
43 Day 1 Talks: JAX, Flax & Transformers 🤗
44 Day 2 Talks: JAX, Flax & Transformers 🤗
45 Day 3 Talks: JAX, Flax & Transformers 🤗
46 Chapter 4 live sessions with Omar
47 Deploy a Hugging Face Transformers Model from S3 to Amazon SageMaker
48 Deploy a Hugging Face Transformers Model from the Model Hub to Amazon SageMaker
49 Run a Batch Transform Job using Hugging Face Transformers and Amazon SageMaker
50 [Webinar] How to add machine learning capabilities with just a few lines of code
51 Hugging Face + Zapier Demo Video
52 Hugging Face + Google Sheets Demo
53 Hugging Face Infinity Launch - 09/28
54 Build and Deploy a Machine Learning App in 2 Minutes
55 Hugging Face Infinity - GPU Walkthrough
56 Otto - 🤗 Infinity Case Study
57 Workshop: Getting started with Amazon SageMaker - Train a Hugging Face Transformers model and deploy it
58 Workshop: Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models
59 🤗 Tasks: Causal Language Modeling
60 🤗 Tasks: Masked Language Modeling
