Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

📰 ArXiv cs.AI

arXiv:2604.06155v1 Announce Type: cross

Abstract: Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical perspective analyzing the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes the convergence t
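To make the NTP/MTP contrast in the abstract concrete, here is a minimal sketch of the two training objectives. The exact loss used in the paper is not given in this truncated abstract; the sketch below assumes the common formulation in which an MTP model has k prediction heads and head j at position t is supervised on the token j+1 steps ahead, with the per-head cross-entropies averaged. All function names and shapes are illustrative, not the paper's.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last (vocabulary) axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    # logits: (T, V) unnormalized scores; targets: (T,) integer token ids.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

def ntp_loss(logits, tokens):
    # Next-token prediction: position t is supervised on tokens[t + 1].
    # logits: (T, V); tokens: (T + 1,)
    return float(cross_entropy(logits, tokens[1:]))

def mtp_loss(head_logits, tokens):
    # Multi-token prediction: head j at position t is supervised
    # on tokens[t + j + 1], so each head sees a different look-ahead.
    # head_logits: (k, T, V); tokens: (T + k,)
    k, T, _ = head_logits.shape
    per_head = [cross_entropy(head_logits[j], tokens[j + 1 : j + 1 + T])
                for j in range(k)]
    return float(np.mean(per_head))

rng = np.random.default_rng(0)
T, V, k = 6, 10, 3
logits = rng.standard_normal((T, V))
tokens = rng.integers(0, V, size=T + 1)
# With a single head (k = 1), MTP reduces to ordinary NTP.
print(np.isclose(ntp_loss(logits, tokens), mtp_loss(logits[None], tokens)))
```

With k = 1 the two objectives coincide; the inductive-bias argument in the paper concerns what changes as k grows and gradients from several future positions flow into the same shared representation.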

Published 8 Apr 2026