VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
📰 ArXiv cs.AI
arXiv:2604.03956v1 Announce Type: cross
Abstract: Vision-language-action (VLA) models are emerging as embodied foundation models for robotic manipulation, but their deployment introduces a new unlearning challenge: removing unsafe, spurious, or privacy-sensitive behaviors without degrading perception, language grounding, or action control. In OpenVLA-style policies, behavior is produced through a fused visual encoder, a cross-modal projector, and a language backbone that predicts tokenized robot actions.
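To make the pipeline the abstract describes concrete, below is a minimal, hypothetical sketch of an OpenVLA-style policy: a visual encoder, a cross-modal projector, and a language backbone that predicts tokenized (discretized) robot actions. All module names, sizes, and the toy Transformer backbone are illustrative assumptions for exposition, not the paper's actual architecture or its unlearning method.

```python
import torch
import torch.nn as nn


class ToyVLAPolicy(nn.Module):
    """Illustrative stand-in for an OpenVLA-style vision-language-action policy."""

    def __init__(self, img_dim=512, text_vocab=32000, hidden=256,
                 action_bins=256, action_dims=7):
        super().__init__()
        # Visual encoder: stands in for a pretrained ViT producing patch features.
        self.visual_encoder = nn.Sequential(nn.Linear(img_dim, hidden), nn.GELU())
        # Cross-modal projector: maps visual features into the language embedding space.
        self.projector = nn.Linear(hidden, hidden)
        # Language backbone: stands in for an LLM; here a tiny Transformer encoder.
        self.text_embed = nn.Embedding(text_vocab, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: logits over discrete bins, one set per action dimension.
        self.action_head = nn.Linear(hidden, action_bins * action_dims)
        self.action_bins, self.action_dims = action_bins, action_dims

    def forward(self, image_feats, instruction_ids):
        # image_feats: (B, P, img_dim) patch features; instruction_ids: (B, T) token ids.
        vis = self.projector(self.visual_encoder(image_feats))
        txt = self.text_embed(instruction_ids)
        fused = self.backbone(torch.cat([vis, txt], dim=1))
        # Pool the fused sequence and predict tokenized actions.
        logits = self.action_head(fused.mean(dim=1))
        return logits.view(-1, self.action_dims, self.action_bins)


if __name__ == "__main__":
    policy = ToyVLAPolicy()
    logits = policy(torch.randn(2, 16, 512), torch.randint(0, 32000, (2, 8)))
    print(logits.shape)  # torch.Size([2, 7, 256])
```

In such a policy, unlearning must target behaviors expressed through all three components at once, which is why removing a behavior without degrading perception, grounding, or control is nontrivial.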