SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

📰 arXiv cs.AI

arXiv:2604.12617v1 (cross-list) Abstract: The post-training pipeline for diffusion models currently has two stages: supervised fine-tuning (SFT) on curated data, followed by reinforcement learning (RL) against reward models. A fundamental gap separates them. SFT optimizes the denoiser only on ground-truth states sampled from the forward noising process; once inference deviates from these ideal states, subsequent denoising relies on out-of-distribution generalization rather than learned correction …
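A toy numerical sketch of the gap the abstract describes (an assumed illustrative setup, not the paper's code): during SFT the denoiser only ever sees states x_t drawn from the forward process q(x_t | x_0), but at inference each x_{t-1} is produced from the model's own prediction, so even a small systematic error pushes the sampler off the training distribution and compounds across steps.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # DDPM-style linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# Take data x0 ~ N(0, 1): then every forward marginal q(x_t) is exactly
# N(0, 1), and the Bayes-optimal eps-predictor is
#   E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t.
def sample(eps_bias, n=2000):
    """Run the reverse (ancestral) sampler with a denoiser whose eps
    prediction is off by a constant `eps_bias` (hypothetical imperfection)."""
    x = rng.normal(size=n)  # x_T ~ N(0, 1)
    for t in reversed(range(T)):
        eps_hat = np.sqrt(1 - alpha_bar[t]) * x + eps_bias
        mean = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        # Each step re-noises the *model's* state, not a ground-truth one.
        x = mean + np.sqrt(betas[t]) * rng.normal(size=n) if t > 0 else mean
    return x

print(f"sample mean, exact denoiser:  {sample(0.0).mean():+.3f}")  # stays near 0
print(f"sample mean, biased denoiser: {sample(0.2).mean():+.3f}")  # drifts away
```

With a perfect predictor the sampler's states match the forward marginals at every step; with the biased one, the per-step error accumulates over all T reverse steps, which is the "learned correction" failure mode the abstract points at.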

Published 15 Apr 2026