Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems
📰 ArXiv cs.AI
arXiv:2604.14585v1
Abstract: Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot by up to $+6.8$ points. What distinguishes success from failure? We investigate with 18,000 grid evaluations and 144 optimizat
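As a rough check of the coin-flip framing, the sketch below runs an exact two-sided binomial test of the Claude Haiku failure count against a 50% failure rate. It assumes that 49% of 72 runs corresponds to 35 runs (the abstract reports only the percentage), and the test is an illustrative choice, not necessarily the analysis the authors used. The helper name `binom_two_sided_p` is hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties with pmf[k] are counted despite rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Run count from the abstract: 6 methods x 4 tasks x 3 repeats.
n_runs = 6 * 4 * 3

# ASSUMPTION: "49% score below zero-shot" is taken to mean 35 of 72 runs.
n_below = round(0.49 * n_runs)

p_value = binom_two_sided_p(n_below, n_runs)
print(f"{n_below}/{n_runs} runs below zero-shot, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value is roughly 0.9, far above any conventional significance threshold, which is consistent with the abstract's claim that the share of runs scoring below zero-shot cannot be distinguished from 50%.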