Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind
Thor Schaeff and Philipp Schmid show how to build conversational agents with Google DeepMind's Gemini APIs, from tool-using coding agents to realtime voice interfaces. The session covers the new Interactions API, agent skills, server-side state, and the Live API workflow for streaming audio, video, and tool calls into multimodal assistants.
Speaker info:
- Philipp Schmid: https://x.com/_philschmid
- Thor Schaeff: https://x.com/thorwebdev
Timestamps
0:14 - Introduction and speaker introductions
6:15 - Audience interaction and project discussions
8:38 - Introduction to building conversational agents
28:17 - Discussion on Gemini Flash for coding and agentic use
36:28 - Coding agent implementation and tool calling demonstration
42:55 - Overview of the Interactions API and state management
49:05 - Introduction to the Gemini Live API
50:02 - Live Jukebox demo with music generation
54:49 - Deep dive into Gemini Flash Live features (multimodality, latency, tools)
1:06:54 - Technical setup and implementation of the Live API using WebSockets
1:25:14 - Session management and context window compression
1:26:57 - Real-world business use cases for conversational agents
1:35:02 - Multimodal grounding and handling audio inputs
1:40:00 - Discussion on personalization and speaker identification