Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

📰 ArXiv cs.AI

arXiv:2604.03523v1 · Announce Type: cross

Abstract: Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world given data scarcity as well as high collection cost. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed (i.i.d.), which ultimately results in poorer performance as gradual errors emerge and compound within test-time trajectories. We address these issues by intr

Published 7 Apr 2026