How to Find the Agent Failures Your Evals Miss [Scott Clark] - 767

The TWIML AI Podcast with Sam Charrington · Intermediate · 🤖 AI Agents & Automation · 1w ago
In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools.

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/767.

🔗 LINKS & RESOURCES
===============================
Distributional - https://www.distributional.com/
Distributional App - http://app.dbnl.com/
Distributional Docs - http://docs.dbnl.com/
Supporting Rapid Model Development at Two Sigma with Scott Clark & Matthew Adereth - 273 - https://twimlai.com/podcast/twimlai/supporting-rapid-model-development-at-two-sigma
Bayesian Opt
