Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind

AI Engineer · Intermediate · 🔧 Backend Engineering · 1w ago
Thor Schaeff and Philipp Schmid show how to build conversational agents with Google DeepMind's Gemini APIs, from tool-using coding agents to realtime voice interfaces. The session covers the new Interactions API, agent skills, server-side state, and the Live API workflow for streaming audio, video, and tool calls into multimodal assistants.

Speaker info:
- https://x.com/_philschmid
- https://x.com/thorwebdev

Timestamps
0:14 - Introduction and speaker introductions
6:15 - Audience interaction and project discussions
8:38 - Introduction to building conversational agents
28:17 - Discussion on Gemini Flash for coding and agentic use
36:28 - Coding agent implementation and tool calling demonstration
42:55 - Overview of the Interactions API and state management
49:05 - Introduction to the Gemini Live API
50:02 - Live Jukebox demo with music generation
54:49 - Deep dive into Gemini Flash Live features (multimodality, latency, tools)
1:06:54 - Technical setup and implementation of the Live API using WebSockets
1:25:14 - Session management and context window compression
1:26:57 - Real-world business use cases for conversational agents
1:35:02 - Multimodal grounding and handling audio inputs
1:40:00 - Discussion on personalization and speaker identification
