Tracing GRPO's Biased Objective Back to DeepSeek Math

Name: Tracing GRPO's Biased Objective Back to DeepSeek Math
Uploaded: 2026-03-13T14:30:36+00:00
Channel: Deep Learning with Yacine

Deep Learning with Yacine · Intermediate · AI Safety & Ethics

Zichen Liu, author of Dr. GRPO, traces where the length-normalization term in the standard GRPO formulation comes from. It has two sources: the objective as written in the DeepSeek Math paper, and the common implementation choice of averaging the loss over the token axis instead of summing it. This biased formulation then propagated through follow-up papers and major open-source libraries such as TRL, OpenRLHF, and verl.
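
A minimal sketch of why the length term biases updates, assuming per-token losses are already computed. It compares two ways of reducing them to a scalar. The first averages over each response's own tokens and then over the G responses, which is the 1/|o_i| form in the DeepSeek Math objective. The second sums over tokens and divides by a fixed constant, which is roughly how Dr. GRPO removes the length dependence. The function names and the `norm_const` parameter are hypothetical illustrations, not the API of TRL, OpenRLHF, or verl. The sketch also ignores clipping and the KL term.

```python
from typing import List, Optional, Sequence

# Hypothetical helpers for illustration only; not any library's actual API.


def seq_mean_token_mean(token_losses: Sequence[Sequence[float]]) -> float:
    """DeepSeek Math-style aggregation:
    (1/G) * sum_i (1/|o_i|) * sum_t loss[i][t]."""
    per_response = [sum(toks) / len(toks) for toks in token_losses]
    return sum(per_response) / len(per_response)


def sum_over_constant(token_losses: Sequence[Sequence[float]], norm_const: float) -> float:
    """Dr. GRPO-style aggregation (sketch):
    (1/G) * sum_i (1/C) * sum_t loss[i][t], with C = norm_const fixed.
    C is an assumed hyperparameter, e.g. a maximum generation length."""
    per_response = [sum(toks) / norm_const for toks in token_losses]
    return sum(per_response) / len(per_response)


def token_weights(lengths: Sequence[int], advantages: Sequence[float],
                  norm_const: Optional[float] = None) -> List[float]:
    """Coefficient on each token of response i in the policy-gradient term.
    The per-token surrogate is proportional to A_i, so each token's gradient
    scales with A_i times that token's aggregation coefficient."""
    g = len(lengths)
    if norm_const is None:
        # 1/|o_i| normalization: the weight depends on the response length
        return [a / (g * n) for n, a in zip(lengths, advantages)]
    # fixed constant: every token with the same advantage gets the same weight
    return [a / (g * norm_const) for a in advantages]


if __name__ == "__main__":
    lengths = [10, 100, 10, 100]
    advantages = [1.0, 1.0, -1.0, -1.0]  # two correct, two incorrect responses
    print(token_weights(lengths, advantages))         # [0.025, 0.0025, -0.025, -0.0025]
    print(token_weights(lengths, advantages, 100.0))  # [0.0025, 0.0025, -0.0025, -0.0025]
```

Under the 1/|o_i| form, tokens of the short correct response receive ten times the push of the long correct one. Tokens of the long incorrect response receive one tenth the penalty of the short incorrect one. Dr. GRPO identifies exactly this length bias: among wrong answers, longer ones are penalized less per token. With a fixed constant, the per-token weight depends only on the advantage.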
