ARC-AGI-3 Just Broke Every Frontier Model. Humans Score 100%. GPT-5.4 Scores 0.26%.
📰 Dev.to · Skila AI
Every frontier AI model — GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro — just scored below 1% on a test...