Qwen 3.5 | Building a Visual AI Agent to Control Your Computer

Roboflow · Beginner · 🤖 AI Agents & Automation · 6h ago
In this video, Matvei Popov, Machine Learning Engineer at Roboflow, explores the capabilities of the Qwen 3.5 vision-language model (VLM). Unlike previous architectures that separated language and vision encoders, Qwen 3.5 natively combines vision, language, and coding capabilities within a single model, unlocking advanced agentic capabilities such as tool calling and direct computer use.

Matvei starts by showing how to run Qwen 3.5 locally using Roboflow Inference and how to set it up within Roboflow Workflows for basic image description tasks. He demonstrates how to use the 0.8B and 2B models, adjust parameters like system prompts, and configure token generation limits.

The true power of Qwen 3.5 is revealed in the second half of the video, when Matvei builds a visual AI agent capable of controlling his computer. Using a custom Python script and Roboflow Inference, Matvei tasks Qwen 3.5 with navigating the Roboflow UI to kick off a new model training job. The model analyzes screenshots of the UI, outputs normalized screen coordinates for specific buttons, and executes the clicks autonomously, proving how this technology can be used to automate complex UI manipulation or guide physical systems.

= Additional Resources =
Roboflow Inference: https://inference.roboflow.com/
Roboflow Workflows: https://roboflow.com/workflows

= Chapters =
00:00 Introduction to Qwen 3.5: Why Native Vision-Language Models Matter
02:03 Setting Up Qwen 3.5 in Roboflow Workflows
03:51 Running Qwen 3.5 Locally with Roboflow Inference
05:47 Building a Visual Agent for "Computer Use" Automation
07:29 Live Demo: Qwen 3.5 Clicks Buttons and Starts a Training Job
08:38 Exploring Tool Calls and Future Use Cases
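To make the workflow setup concrete, here is a minimal sketch of querying a Qwen 3.5 block through Roboflow Workflows against a locally running Inference server. The workspace name, workflow ID, and image path are placeholders, and the exact model identifiers for the 0.8B and 2B variants should be taken from Roboflow's model registry; this illustrates the client call pattern rather than the exact code from the video.

```python
# A minimal sketch, assuming a local Roboflow Inference server started with
# `inference server start` (serving on port 9001 by default) and a workflow
# built in the Roboflow UI that contains a Qwen 3.5 block.
# "your-workspace" and "qwen-image-description" are placeholder names.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # local Inference server
    api_key="YOUR_ROBOFLOW_API_KEY",
)

# Run the workflow on one image and print whatever the Qwen block returned
# (e.g. an image description, given a suitable system prompt in the workflow).
result = client.run_workflow(
    workspace_name="your-workspace",
    workflow_id="qwen-image-description",
    images={"image": "example.jpg"},  # local path or URL
)
print(result)
```

System prompts and token generation limits like those adjusted in the video are configured on the Qwen block inside the workflow editor, so the client call itself stays the same as parameters change.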
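The agent loop itself can be outlined in a few lines. Below is an illustrative sketch of the screenshot-to-click cycle described in the demo: the ask_qwen() helper, the prompt wording, the button label, and the JSON response shape are assumptions standing in for whatever the actual script does; only the pyautogui calls and the normalized-to-pixel conversion follow directly from the description.

```python
# An illustrative sketch of the screenshot -> coordinates -> click loop.
# ask_qwen() is a hypothetical helper (e.g. wrapping the workflow call above);
# the prompt and response shape are assumptions, not the script from the video.
import pyautogui

def ask_qwen(image_path: str, instruction: str) -> dict:
    """Hypothetical: send the screenshot and instruction to Qwen 3.5 and
    return coordinates normalized to [0, 1], e.g. {"x": 0.42, "y": 0.17}."""
    raise NotImplementedError

screen_w, screen_h = pyautogui.size()

# One step of the agent: capture the screen, ask where the target button is,
# scale the normalized coordinates to pixels, and click.
pyautogui.screenshot("screen.png")
target = ask_qwen(
    "screen.png",
    'Return the normalized x,y of the "Start Training" button as JSON.',
)
pyautogui.click(int(target["x"] * screen_w), int(target["y"] * screen_h))
```

Normalized coordinates make the model's output resolution-independent, which is why the script rescales them against the actual screen size before clicking.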

Related AI Lessons

Actually, vibe coding didn't kill testing — agentic engineering did
Learn how agentic engineering is changing the landscape of testing and development, and why it's more impactful than vibe coding
Dev.to · Muggle AI
Gemini 3.1 Flash Lite vs DeepSeek V4 Flash: Budget API Showdown for High-Volume Agent Loops (2026)
Compare Gemini 3.1 Flash Lite and DeepSeek V4 Flash for budget-friendly API options in high-volume agent loops, considering tradeoffs between pricing and reliability
Dev.to AI
WebMCP Reality Check: Where the Spec Actually Stands
Learn the current state of WebMCP and its limitations, and why major agents aren't using it yet
Dev.to AI
The 2026 Enterprise AI Mandate: From Generative Potential to Agentic Execution
Enterprises must shift from AI experimentation to Agentic Execution, leveraging AI as a proactive coworker in operational workflows
Dev.to AI
