I needed to know if the cheaper model was good enough. So I built an LLM-as-a-Judge pipeline

📰 Dev.to · archminor

Benchmarks are useful, but they don't really tell me whether a prompt change or a cheaper model is good...

Published 6 Apr 2026