Distributionally Robust Token Optimization in RLHF

📰 ArXiv cs.AI

arXiv:2604.08577v1 Announce Type: cross Abstract: Large Language Models (LLMs) tend to respond correctly to prompts that align with the data they were trained and fine-tuned on. Yet small shifts in wording, format, or language can trigger surprisingly large failures, especially on multi-step reasoning problems. To address this problem, we propose a Distributionally Robust Token Optimization (DRTO) approach, which combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distributionally Robust Optimization…
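The abstract only names the combination of token-level RLHF with distributionally robust optimization; the paper's actual objective is not shown here. As a minimal sketch of the general DRO idea it invokes, a common surrogate for the worst-case loss over prompt groups (e.g. rewordings or formats of the same task) is a temperature-controlled log-sum-exp, which interpolates between the average loss and the maximum. The function name and grouping scheme below are illustrative assumptions, not the paper's method.

```python
import math

def soft_worst_case_loss(group_losses, tau=1.0):
    """Soft worst-case aggregation over per-group losses (a standard DRO
    surrogate). As tau -> 0 this approaches max(group_losses); as tau grows
    it approaches the plain average. Uses the max-shift trick for stability."""
    m = max(group_losses)
    lse = math.log(sum(math.exp((l - m) / tau) for l in group_losses)
                   / len(group_losses))
    return tau * lse + m

# Equal losses across groups: robust loss equals the common value.
print(soft_worst_case_loss([1.0, 1.0], tau=0.5))   # 1.0
# Small tau: dominated by the worst group (close to 2.0 here).
print(soft_worst_case_loss([0.0, 2.0], tau=0.01))
```

In a token-level RLHF setting, `group_losses` would be the per-group (e.g. per-rephrasing) negative expected rewards, and the optimizer would minimize this robust aggregate instead of the plain mean, penalizing the prompt variants the policy handles worst.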

Published 13 Apr 2026