Foundations

Reinforcement Learning

RL algorithms, reward modelling, RLHF, policy gradients, Q-learning and multi-agent RL

39 lessons
Skills in this topic
RL Foundations (beginner): Formalise a problem as an MDP
Policy Gradient Methods (intermediate): Implement REINFORCE from scratch
RLHF & Alignment (advanced): Describe the RLHF pipeline end-to-end