From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

📰 ArXiv cs.AI

arXiv:2604.21391v1 (announce type: cross)

Abstract: Bridging high-level semantic understanding with low-level physical control remains a persistent challenge in embodied intelligence, stemming from the fundamental spatiotemporal scale mismatch between cognition and action. Existing generative VLA policies typically adopt a "Generation-from-Noise" paradigm, which disregards this disparity, leading to representation inefficiency and weak condition alignment during optimization. In this work, we prop

Published 25 Apr 2026