Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
📰 ArXiv cs.AI
arXiv:2604.21611v1 (Announce Type: cross)
Abstract: Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training-free framework that uses structured natural-language critique from a stronger supervisor to guide an iterative generate-critique-refine loop up to a round budget R. Across GPQA Diamond, AIME 2025, an
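The generate-critique-refine loop with a round budget R, as described in the abstract, might be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the names `vps_loop`, `Critique`, `generate`, `critique`, and `refine` are hypothetical, the early exit when the supervisor accepts an attempt is an assumption, and the toy callables stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Critique:
    """Structured verbal feedback from the supervisor (hypothetical shape)."""
    accept: bool   # supervisor judges the current attempt acceptable
    feedback: str  # natural-language critique that steers the next refinement


def vps_loop(
    problem: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    refine: Callable[[str, str, str], str],
    max_rounds: int,
) -> Tuple[str, int]:
    """Run generate-critique-refine for at most max_rounds critique rounds.

    Returns the final attempt and the number of critique rounds used.
    Stopping early on acceptance is an assumption, not taken from the source.
    """
    attempt = generate(problem)
    for round_idx in range(1, max_rounds + 1):
        verdict = critique(problem, attempt)
        if verdict.accept:
            return attempt, round_idx
        # Feed the verbal critique back to the weaker model for another try.
        attempt = refine(problem, attempt, verdict.feedback)
    return attempt, max_rounds


# Toy stand-ins for the actor and supervisor models (purely illustrative).
def toy_generate(problem: str) -> str:
    return "2 + 2 = 5"


def toy_critique(problem: str, attempt: str) -> Critique:
    if attempt.endswith("= 4"):
        return Critique(accept=True, feedback="")
    return Critique(accept=False, feedback="The sum is wrong; recompute 2 + 2.")


def toy_refine(problem: str, attempt: str, feedback: str) -> str:
    return "2 + 2 = 4"


answer, rounds = vps_loop(
    "What is 2 + 2?", toy_generate, toy_critique, toy_refine, max_rounds=3
)
print(answer, rounds)
```

Passing the three roles as callables keeps the sketch training-free in the same sense as the abstract: only the prompts and the round budget change, not any model weights.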