23. LLM Ops: Building a Quality Gate for Retrieval & Generation (Regression Detection)
The hardest part of AI production isn't a crash—it's a quiet decline in quality.
In this video, we explore why evaluation is not just a one-time development step, but a continuous monitoring discipline in LLM Ops. Whether you’ve updated a prompt, changed your model, or added new documents to your index, you need a repeatable way to ensure your system hasn't silently gotten worse.
What we cover in this deep dive:
1. Relevance vs. Faithfulness: Why sounding "fluent" isn't enough. We break down Answer Relevancy, Context Relevancy, and the critical metric of Faithfulness (Grounding).
2. Isolating …
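To give a flavor of what a faithfulness (grounding) check looks like, here is a minimal, illustrative sketch. This is not the video's implementation: production quality gates typically score grounding with an LLM-as-judge or an NLI model, but a crude content-word-overlap proxy shows the idea. The function name `faithfulness_score` and the 0.6 support threshold are assumptions for illustration.

```python
# Illustrative sketch only: real pipelines use LLM-as-judge or NLI scoring,
# not token overlap. Names and thresholds here are hypothetical.
def faithfulness_score(answer: str, contexts: list[str]) -> float:
    """Fraction of answer sentences whose content words mostly appear
    in the retrieved context -- a crude proxy for 'grounding'."""
    context_words = set(" ".join(contexts).lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        # Ignore short function words; keep content-bearing tokens.
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if words and sum(w in context_words for w in words) / len(words) >= 0.6:
            supported += 1
    return supported / len(sentences)

grounded = faithfulness_score(
    "Paris is the capital of France.",
    ["Paris is the capital of France and its largest city."],
)
ungrounded = faithfulness_score(
    "The sky is purple today.",
    ["Paris is the capital of France."],
)
```

A regression gate would then compare this score across a fixed evaluation set before and after each prompt, model, or index change, and fail the build if the aggregate drops.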
DeepCamp AI