Erfan Shayegani - Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Cohere · Beginner · 🛡️ AI Safety & Ethics · 2d ago

Skills: AI Alignment Basics 80% · AI Safety Engineering 70%

In this talk, Erfan will talk about adversarial attacks on multi-modal language models.

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals.

We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions.

Qualitative analysis reveals observed failure modes: execution-first bias (focusing on how to act over whether to act), thought–action disconnect (execution diverging from reasoning), and request-primacy (justifying actions due to user request). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this f…
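The two headline numbers in the description, judge-human agreement and the average BGD rate, are both simple proportions. The sketch below shows one plausible way to compute them. It is not the BLIND-ACT evaluation code: the function names, the record layout, and the toy labels are all assumptions made for illustration, and the description does not say how the paper aggregates across models (an unweighted mean over models is assumed here).

```python
from collections import defaultdict


def agreement_rate(judge_labels, human_labels):
    """Fraction of items on which the LLM judge and the human annotator agree."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        raise ValueError("label lists must be non-empty")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


def bgd_rate_by_model(records):
    """Per-model share of task runs flagged as blindly goal-directed.

    `records` is an iterable of (model_name, exhibited_bgd) pairs.
    This layout is hypothetical, not the benchmark's actual format.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for model, exhibited_bgd in records:
        totals[model] += 1
        flagged[model] += int(exhibited_bgd)
    return {model: flagged[model] / totals[model] for model in totals}


def mean_rate(rates):
    """Unweighted mean of per-model rates (assumed aggregation)."""
    return sum(rates.values()) / len(rates)


# Toy labels, invented purely to exercise the functions.
toy_judge = [True, True, False, True]
toy_human = [True, False, False, True]
toy_records = [
    ("model_a", True), ("model_a", True), ("model_a", False), ("model_a", True),
    ("model_b", True), ("model_b", False),
]

if __name__ == "__main__":
    print("judge-human agreement:", agreement_rate(toy_judge, toy_human))
    per_model = bgd_rate_by_model(toy_records)
    print("per-model BGD rate:", per_model)
    print("average BGD rate:", mean_rate(per_model))
```

Averaging per-model rates weights every model equally regardless of how many runs it has; pooling all runs before dividing would be an equally defensible choice and gives a different number when run counts differ.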

Chapters (12)

0:00 Intro and Paper Overview

0:46 What Are Computer Use Agents

2:50 Safety Focus and Blind Goal Directedness

7:37 Pattern One Context Failures

13:45 Pattern Two Risky Assumptions

17:47 Pattern Three Infeasible Goals

20:02 BlindAct Benchmark Setup

23:45 Evaluation With LLM Judges

25:20 Results BGD vs Completion

31:41 Prompting Mitigations Limits

35:03 Qualitative Failure Modes

39:51 Q&A and Closing
