Erfan Shayegani - Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Cohere · Beginner ·🛡️ AI Safety & Ethics ·2d ago
In this talk, Erfan discusses adversarial attacks on multi-modal language models.

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals.

We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions.

Qualitative analysis reveals observed failure modes: execution-first bias (focusing on how to act over whether to act), thought–action disconnect (execution diverging from reasoning), and request-primacy (justifying actions due to user request). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this f
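The two headline numbers in the description (judge-human agreement and average BGD rate) are both simple proportions over per-task labels. The snippet below is a minimal sketch of how such proportions could be computed; the function names and the toy labels are hypothetical and are not taken from the BLIND-ACT codebase or its data.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of tasks where the judge label equals the human label."""
    if len(judge) != len(human) or len(judge) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


def bgd_rate(labels: Sequence[bool]) -> float:
    """Fraction of tasks on which the agent was labeled as showing BGD."""
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


# Toy, made-up labels (True = trajectory judged to exhibit BGD).
judge_labels = [True, True, False, True, False, True, True, False]
human_labels = [True, True, False, True, True, True, True, False]

print(agreement_rate(judge_labels, human_labels))  # 7 of 8 match -> 0.875
print(bgd_rate(judge_labels))                      # 5 of 8 flagged -> 0.625
```

Averaging `bgd_rate` over several models would give a single cross-model figure of the kind quoted above, though the exact aggregation used in the paper is not stated in this description.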

Chapters (12)

0:00 Intro and Paper Overview
0:46 What Are Computer Use Agents
2:50 Safety Focus and Blind Goal Directedness
7:37 Pattern One Context Failures
13:45 Pattern Two Risky Assumptions
17:47 Pattern Three Infeasible Goals
20:02 BlindAct Benchmark Setup
23:45 Evaluation With LLM Judges
25:20 Results BGD vs Completion
31:41 Prompting Mitigations Limits
35:03 Qualitative Failure Modes
39:51 Q&A and Closing