Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

📰 arXiv cs.AI

arXiv:2604.10701v1 Announce Type: cross

Abstract: Credit assignment is a central challenge in reinforcement learning (RL). Classical actor-critic methods address this challenge through fine-grained advantage estimation based on a learned value function. However, learned value models are often avoided in modern large language model (LLM) RL because conventional discriminative critics are difficult to train reliably. We revisit value modeling and argue that this difficulty is partly due to limited

Published 14 Apr 2026
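The abstract's reference to fine-grained advantage estimation from a learned value function can be illustrated with standard Generalized Advantage Estimation (GAE), the estimator commonly paired with a critic in PPO-style training. This is a minimal sketch under stated assumptions, not the paper's generative critic: the function name `gae_advantages`, the per-token layout (one reward on the final token of a finished response, terminal value 0.0), and the example numbers are all hypothetical.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 1.0,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: GAE over one finished episode (e.g. one LLM response).

    rewards[t] is the reward received after token/action t.
    values[t] is the critic's estimate V(s_t); values has one extra trailing
    entry for the state after the last token (0.0 for a finished response).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step can reuse the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # TD error: how much better this step turned out than the critic predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted (by gamma * lam) sum of current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # Three-token response with a single terminal reward of 1.0 (assumed values).
    print(gae_advantages([0.0, 0.0, 1.0], [0.5, 0.6, 0.8, 0.0], gamma=1.0, lam=1.0))
```

With `gamma=1.0` and `lam=1.0`, each token's advantage reduces to the final reward minus the critic's value at that token, so the three tokens receive roughly 0.5, 0.4 and 0.2. With only a constant baseline and no critic, every token in the response would share a single advantage, which is coarser credit assignment.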
