Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

📰 ArXiv cs.AI

arXiv:2604.07941v2 Announce Type: replace-cross Abstract: Post-training has become central to turning pretrained large language models (LLMs) into aligned, capable, and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet these methods are often discussed in fragmented ways, organized by labels or objectives rather than by the behavio…

Published 17 Apr 2026