36 articles

📰 Dev.to · Jamie Cole

Articles from Dev.to · Jamie Cole · Updated every 3 hours

How to Detect LLM Drift Before It Breaks Your Users
Dev.to · Jamie Cole 2w ago
The most common LLM production incident is silent quality degradation. Here is how to detect it before it breaks your users.
I Analyzed 300 LLM Drift Checks: Here's What I Found
Dev.to · Jamie Cole 2w ago
6 months of production data on LLM drift. Which models drift most, which tasks are affected, and how to detect it.
The 7 LLM Integration Patterns That Break in Production
Dev.to · Jamie Cole 2w ago
After 18 months of LLM integrations, these are the patterns that fail most often in production. Not theoretical failures — real incidents.
I Built a $400/mo LLM Cost Monitoring System (Here's What I Learned)
Dev.to · Jamie Cole 2w ago
After a $3000 surprise bill, I built cost monitoring for every LLM call. Here's the exact architecture and what it cost me.
The LLM Monitoring Stack I Run in Production (It's 3 Tools, $50/mo)
Dev.to · Jamie Cole 2w ago
After 18 months of running LLMs in production, this is the exact monitoring setup I use and what it costs.
Why Prompt Testing Alone Won't Catch LLM Drift (And What Will)
Dev.to · Jamie Cole 2w ago
Everyone tests their prompts before launch. Nobody catches what happens when the model silently updates a week later.
How to Add LLM Drift Monitoring to Your CI/CD Pipeline in 10 Minutes
Dev.to · Jamie Cole 2w ago
A practical guide to adding automated LLM drift detection to your existing CI/CD workflow. Step-by-step with GitHub Actions.
I Ran 300 LLM Drift Checks: Here's the Distribution of Failure Patterns I Found
Dev.to · Jamie Cole 2w ago
After 300 automated drift checks across GPT-4o, Claude, and Gemini, here's exactly where models fail most often.
The LLM Tooling Stack I Actually Use in 2026 (After 18 Months of Testing)
Dev.to · Jamie Cole 2w ago
The exact tools I use daily for LLM development — not the popular ones, the ones that actually work.
I Built an LLM Drift Detector — It Caught GPT-4o Changing Behaviour in Production
Dev.to · Jamie Cole 2w ago
The story of building an automated regression testing system for LLMs — and what it found when it watched GPT-4o for 30 days.
The Structured Output Pattern: How to Get LLMs to Return Clean JSON Every Time
Dev.to · Jamie Cole 2w ago
JSON mode, system prompts, and parsing tricks that make LLM output actually usable in production.
GPT-5.1 Was Retired on March 11 — Here's What Broke in Your LLM App
Dev.to · Jamie Cole 4w ago
OpenAI retired GPT-5.1 on March 11 with automatic fallback to GPT-5.3/5.4. If your app calls gpt-5.1, it's now silently running a different model. Here's exac...
How to Add LLM Drift Monitoring to Your CI/CD Pipeline (Free, 5 Minutes)
Dev.to · Jamie Cole 4w ago
Unit tests don't catch LLM behavioral drift. Here's a practical CI/CD setup that detects format regressions, instruction compliance drift, and output changes be...
I Found a 0.575 Drift Score Between Two Consecutive LLM Runs. Here's Exactly What Changed.
Dev.to · Jamie Cole 4w ago
Real data: same prompt, same model, two consecutive runs. Drift score 0.575. The cause: a trailing period. Here's the exact output diff and why it breaks produc...
Real LLM Drift Detection Results: Exact Outputs, Real Scores, No Fabrication
Dev.to · Jamie Cole 4w ago
Real drift scores from DriftWatch on production-style prompts. Exact outputs shown. 0.575: trailing period dropped on sentiment classifier. 0.316: JSON whitespa...
PromptFoo Passes. Production Still Breaks. Here's the Gap.
Dev.to · Jamie Cole 4w ago
I had PromptFoo set up in CI. Evals passed on every deployment. The model still silently changed in...
How to Get Notified the Moment OpenAI or Anthropic Changes Your Model
Dev.to · Jamie Cole 4w ago
OpenAI doesn't email you when GPT-4o changes. Anthropic doesn't either. You find out from users. Or...
Your LLM CI/CD Tests Aren't Enough — Here's the Gap
Dev.to · Jamie Cole 4w ago
Your CI/CD pipeline runs before every deploy. Your LLM prompt tests pass. You ship. Three days...
GPT-5.2 Changed on Feb 10 — Here's How to Know If Your Prompts Broke
Dev.to · Jamie Cole 4w ago
On February 10, 2026, OpenAI pushed a silent update to GPT-5.2 Instant. The release notes said it...
My LLM Started Lying to My App and I Didn't Notice for Three Days
Dev.to · Jamie Cole 4w ago
It started with a Slack message from a user: "Your summaries look weird." Not an error. Not a crash....