Mind the Gap (In your Agent Observability) — Amy Boyd & Nitya Narasimhan, Microsoft

AI Engineer · Intermediate · 🤖 AI Agents & Automation
Agents drift. Models change, prompts get tweaked, edge cases accumulate, and the gap between what your agent does and what you need it to do widens without you noticing. Amy and Nitya walk through Microsoft Foundry's observability stack: tracing built on OpenTelemetry; built-in evaluators for quality, safety, and agentic metrics such as intent resolution and task adherence; and red teaming, in which a second AI attacks your agent with adversarial prompts to find vulnerabilities before your users do.

The part worth watching is the observe skill demo. You point it at an agent with no eval dataset, no baselines, nothing. It generates the dataset, runs batch evaluations, optimizes the prompt, compares versions, and rolls back to the best one, all from a single prompt to a coding agent. The skill shows its reasoning at each step, and that is where the real value lies: it surfaces failures you didn't know to look for.

Speaker info:
- https://x.com/NityaNarasimhan
- https://www.linkedin.com/in/nityan/
- https://x.com/AmyKateNicho
- https://www.linkedin.com/in/amykatenicho/
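The last two steps of the observe skill demo described above (compare prompt versions, roll back to the best one) boil down to a simple selection rule. The sketch below is a minimal illustration of that idea under stated assumptions, not Foundry's API or the skill's actual logic: `PromptVersion`, `pick_best`, and `rollback_if_worse` are hypothetical names, and each version is assumed to carry per-test-case evaluator scores already normalized to the range 0 to 1 (for example, task-adherence scores from a batch run).

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class PromptVersion:
    # Hypothetical record: one prompt version plus its per-case evaluator scores
    # (assumed normalized to 0..1; not a real Foundry data type).
    name: str
    prompt: str
    scores: list[float]


def mean_score(version: PromptVersion) -> float:
    # Aggregate a version's batch-evaluation results into a single number.
    return mean(version.scores)


def pick_best(versions: list[PromptVersion]) -> PromptVersion:
    # Highest mean score wins; on a tie, max() keeps the earliest version in the list.
    return max(versions, key=mean_score)


def rollback_if_worse(
    current: PromptVersion, versions: list[PromptVersion], tolerance: float = 0.0
) -> PromptVersion:
    # Hypothetical rollback rule: switch to the best-scoring version only if the
    # current one trails it by more than `tolerance`; otherwise keep the current one.
    best = pick_best(versions)
    if mean_score(current) + tolerance < mean_score(best):
        return best
    return current


# Usage: v3 is an "optimized" prompt that regressed, so we roll back to v2.
v1 = PromptVersion("v1", "You are a helpful support agent.", [0.8, 0.6, 0.9])
v2 = PromptVersion("v2", "You are a support agent. Confirm the user's goal first.", [0.9, 0.8, 0.9])
v3 = PromptVersion("v3", "Answer as briefly as possible.", [0.7, 0.5, 0.6])

active = rollback_if_worse(v3, [v1, v2, v3])
print(active.name)  # v2
```

A plain mean is the simplest aggregate; a real comparison would likely weigh safety regressions separately from quality gains rather than folding everything into one score.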