Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind

AI Engineer · Intermediate · 🧠 Large Language Models · 16h ago
Guillaume Vernade from Google DeepMind takes a public domain book and runs it through the full gen media stack live. Gemini reads the whole text and writes image prompts for each character and chapter. Imagen generates the portraits. Veo animates them into video clips using those images as first frames. Lyria composes a different piece of music per chapter, with or without lyrics. The TTS model reads dialogue from the book using a trick that makes two voices sound like four distinct characters.

The interesting layer underneath all of it is that Gemini acts as the prompt engineer for every other model, and it works well partly because the gen media models were trained on prompts written by Gemini.

The workshop also covers the Lyria Realtime model, which generates music continuously and responds to new prompts mid-stream like a DJ, and a new interactions API that makes chained multi-turn calls cheaper by caching context server-side instead of resending the full book on every turn.

Speaker info:
- https://x.com/Giom_V
- https://www.linkedin.com/in/guillaumevernade
- https://github.com/Giom-V
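The chain described in the summary above (a text model writes the prompts, an image model renders them, a video model uses the image as its first frame, a music model scores each chapter) can be sketched as plain function composition. This is only an illustration of the data flow, not the code shown in the workshop: `build_chapter`, `ChapterAssets`, and the `fake_*` helpers are hypothetical names, and the stand-in functions return strings instead of calling Gemini, Imagen, Veo, or Lyria.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChapterAssets:
    """Everything generated for one chapter (placeholder string values)."""
    image_prompt: str
    image: str
    video: str
    music: str


def build_chapter(
    chapter_text: str,
    write_prompt: Callable[[str, str], str],
    make_image: Callable[[str], str],
    make_video: Callable[[str, str], str],
    make_music: Callable[[str], str],
) -> ChapterAssets:
    # The text model acts as prompt engineer for every downstream model.
    image_prompt = write_prompt("image", chapter_text)
    video_prompt = write_prompt("video", chapter_text)
    music_prompt = write_prompt("music", chapter_text)

    image = make_image(image_prompt)
    # The generated image is handed to the video step as its first frame.
    video = make_video(video_prompt, image)
    music = make_music(music_prompt)
    return ChapterAssets(image_prompt, image, video, music)


# Hypothetical stand-ins so the sketch runs without any API key.
def fake_write_prompt(target: str, text: str) -> str:
    return f"[{target} prompt] {text[:30]}"


def fake_image(prompt: str) -> str:
    return f"image<{prompt}>"


def fake_video(prompt: str, first_frame: str) -> str:
    return f"video<{prompt} | first_frame={first_frame}>"


def fake_music(prompt: str) -> str:
    return f"music<{prompt}>"


# Placeholder chapter text, not taken from any real book.
chapters = ["Chapter one placeholder text", "Chapter two placeholder text"]
assets = [
    build_chapter(c, fake_write_prompt, fake_image, fake_video, fake_music)
    for c in chapters
]

for a in assets:
    print(a.video)
```

Passing the model calls in as functions keeps the orchestration separate from any particular SDK, so each stand-in could later be swapped for a real client call without changing `build_chapter`.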
