Implementing surrogate goals for safer bargaining in LLM-based agents

📰 ArXiv cs.AI

arXiv:2604.04341v1. Abstract: Surrogate goals have been proposed as a strategy for reducing risks from bargaining failures. A surrogate goal is a goal that a principal can give an AI agent and that deflects any threats against the agent away from what the principal cares about. For example, one might make one's agent care about preventing money from being burned. Then in bargaining interactions, other agents can threaten to burn their money instead of threatening to spend mone

Published 7 Apr 2026
