Does anyone get amazed by LLM performance on benchmarks but incredibly disappointed by its performance on mundane tasks, specifically those involving data lookup?

📰 Reddit r/singularity

So AIs blow a lot of benchmarks out of the water. And as a doctor, I feel like they answer well-structured medical questions, even extremely hard ones, insanely well. However, I find that whenever I ask one to do mundane tasks, specifically ones that involve pulling data from the Internet or working with data it’s given, it’s stupid. Examples: If I ask it to look up which lawyers near me do traffic ticket cases, it will just give me 5 random

Published 12 Apr 2026
