Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

📰 ArXiv cs.AI

arXiv:2605.19940v1 (announce type: new)

Abstract: Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches -- ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation -- primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and largely treat safety as a property of indi

Published 20 May 2026
