VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models

📰 arXiv cs.AI

arXiv:2604.03956v1 (cross-listed)

Abstract: Vision-language-action (VLA) models are emerging as embodied foundation models for robotic manipulation, but their deployment introduces a new unlearning challenge: removing unsafe, spurious, or privacy-sensitive behaviors without degrading perception, language grounding, and action control. In OpenVLA-style policies, behavior is produced through a fused visual encoder, a cross-modal projector, and a language backbone that predicts tokenized robo…
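The abstract's description of the OpenVLA-style pipeline (visual encoder → cross-modal projector → language backbone emitting discrete action tokens) can be sketched in miniature. This is a toy illustration with random weights and made-up dimensions, not the paper's or OpenVLA's actual implementation; all function names here are hypothetical stand-ins.

```python
import numpy as np

# Toy sketch of an OpenVLA-style forward pass. Dimensions, weights, and
# function names (vision_encoder, projector, language_backbone) are
# illustrative assumptions, not the real model's API.
rng = np.random.default_rng(0)

D_VIS, D_LM = 32, 16          # assumed visual / language embedding sizes
N_ACTION_TOKENS, VOCAB = 7, 64  # assumed action-token count and vocabulary

def vision_encoder(image: np.ndarray) -> np.ndarray:
    # Stand-in for a fused visual encoder: flatten, then a random linear map.
    W = rng.standard_normal((image.size, D_VIS)) * 0.01
    return image.reshape(-1) @ W

def projector(vis_feat: np.ndarray) -> np.ndarray:
    # Cross-modal projector: map visual features into the LM embedding space.
    W = rng.standard_normal((D_VIS, D_LM)) * 0.1
    return vis_feat @ W

def language_backbone(embedding: np.ndarray) -> np.ndarray:
    # Stand-in LM head: logits over VOCAB for each of N_ACTION_TOKENS slots,
    # greedily decoded to discrete action-token ids.
    W = rng.standard_normal((D_LM, N_ACTION_TOKENS * VOCAB)) * 0.1
    logits = (embedding @ W).reshape(N_ACTION_TOKENS, VOCAB)
    return logits.argmax(axis=-1)

image = rng.random((8, 8, 3))
action_tokens = language_backbone(projector(vision_encoder(image)))
print(action_tokens.shape)  # one discrete token per action slot
```

The point of the sketch is the data flow, not the numbers: unlearning a behavior must modify this pipeline so that certain image/instruction inputs no longer map to the offending action tokens, while the encoder, projector, and backbone stay useful everywhere else.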

Published 7 Apr 2026