📰 Dev.to · Jamie Cole

Articles from Dev.to · Jamie Cole · 36 articles · Updated every 3 hours

We Built a Service That Catches LLM Drift Before Your Users Do

Dev.to · Jamie Cole 2w ago

You shipped your LLM-powered feature. It worked perfectly in testing. Users loved the beta. Three...

The Single Best Way to Reduce LLM Costs (It Is Not What You Think)

Dev.to · Jamie Cole 2w ago

Everyone says: use caching, use cheaper models, reduce token counts. Here is the one thing that actually cuts LLM costs by 40%.

The 5 LLM Architecture Patterns That Scale (And 2 That Do Not)

Dev.to · Jamie Cole 2w ago

After building LLM features for 18 months, here are the architecture patterns I have seen work at scale. And the two that consistently fail.

I Built a £500/mo Side Project Using Only Free AI Tools (Here's What Actually Worked)

Dev.to · Jamie Cole 2w ago

Six months ago I built a SaaS tool using only free AI tools. No paid APIs, no expensive infrastructure. Here's what I learned about what actually...

How to Detect LLM Drift Before It Breaks Your Users

Dev.to · Jamie Cole 2w ago

The most common LLM production incident is silent quality degradation. Here is how to detect it before it breaks your users.

I Analyzed 300 LLM Drift Checks: Here's What I Found

Dev.to · Jamie Cole 2w ago

6 months of production data on LLM drift. Which models drift most, which tasks are affected, and how to detect it.

The 7 LLM Integration Patterns That Break in Production

Dev.to · Jamie Cole 2w ago

After 18 months of LLM integrations, these are the patterns that fail most often in production. Not theoretical failures — real incidents.

I Built a $400/mo LLM Cost Monitoring System (Here's What I Learned)

Dev.to · Jamie Cole 2w ago

After a $3000 surprise bill, I built cost monitoring for every LLM call. Here's the exact architecture and what it cost me.

The LLM Monitoring Stack I Run in Production (It's 3 Tools, $50/mo)

Dev.to · Jamie Cole 2w ago

After 18 months of running LLMs in production, this is the exact monitoring setup I use and what it costs.

Why Prompt Testing Alone Won't Catch LLM Drift (And What Will)

Dev.to · Jamie Cole 2w ago

Everyone tests their prompts before launch. Nobody catches what happens when the model silently updates a week later.

How to Add LLM Drift Monitoring to Your CI/CD Pipeline in 10 Minutes

Dev.to · Jamie Cole 2w ago

A practical guide to adding automated LLM drift detection to your existing CI/CD workflow. Step-by-step with GitHub Actions.

I Ran 300 LLM Drift Checks: Here's the Distribution of Failure Patterns I Found

Dev.to · Jamie Cole 2w ago

After 300 automated drift checks across GPT-4o, Claude, and Gemini, here's exactly where models fail most often.

The LLM Tooling Stack I Actually Use in 2026 (After 18 Months of Testing)

Dev.to · Jamie Cole 2w ago

The exact tools I use daily for LLM development — not the popular ones, the ones that actually work.

I Built an LLM Drift Detector — It Caught GPT-4o Changing Behaviour in Production

Dev.to · Jamie Cole 2w ago

The story of building an automated regression testing system for LLMs — and what it found when it watched GPT-4o for 30 days.

The Structured Output Pattern: How to Get LLMs to Return Clean JSON Every Time

Dev.to · Jamie Cole 2w ago

JSON mode, system prompts, and parsing tricks that make LLM output actually usable in production.

GPT-5.1 Was Retired on March 11 — Here's What Broke in Your LLM App

Dev.to · Jamie Cole 4w ago

OpenAI retired GPT-5.1 on March 11 with automatic fallback to GPT-5.3/5.4. If your app calls gpt-5.1, it's now running a different model, silently. Here's...

How to Add LLM Drift Monitoring to Your CI/CD Pipeline (Free, 5 Minutes)

Dev.to · Jamie Cole 4w ago

Unit tests don't catch LLM behavioral drift. Here's a practical CI/CD setup that detects format regressions, instruction compliance drift, and output changes...

I Found a 0.575 Drift Score Between Two Consecutive LLM Runs. Here's Exactly What Changed.

Dev.to · Jamie Cole 4w ago

Real data: same prompt, same model, two consecutive runs. Drift score 0.575. The cause: a trailing period. Here's the exact output diff and why it breaks...

Real LLM Drift Detection Results: Exact Outputs, Real Scores, No Fabrication

Dev.to · Jamie Cole 4w ago

Real drift scores from DriftWatch on production-style prompts. Exact outputs shown. 0.575: trailing period dropped on sentiment classifier. 0.316: JSON...

PromptFoo Passes. Production Still Breaks. Here's the Gap.

Dev.to · Jamie Cole 4w ago

I had PromptFoo set up in CI. Evals passed on every deployment. The model still silently changed in...

How to Get Notified the Moment OpenAI or Anthropic Changes Your Model

Dev.to · Jamie Cole 4w ago

OpenAI doesn't email you when GPT-4o changes. Anthropic doesn't either. You find out from users. Or...

Your LLM CI/CD Tests Aren't Enough — Here's the Gap

Dev.to · Jamie Cole 4w ago

Your CI/CD pipeline runs before every deploy. Your LLM prompt tests pass. You ship. Three days...

GPT-5.2 Changed on Feb 10 — Here's How to Know If Your Prompts Broke

Dev.to · Jamie Cole 4w ago

On February 10, 2026, OpenAI pushed a silent update to GPT-5.2 Instant. The release notes said it...

My LLM Started Lying to My App and I Didn't Notice for Three Days

Dev.to · Jamie Cole 4w ago

It started with a Slack message from a user: "Your summaries look weird." Not an error. Not a crash....