Does anyone get amazed by LLM performance on benchmarks but incredibly disappointed by their performance on mundane tasks, specifically those involving data lookup?
📰 Reddit r/singularity
So AIs blow a lot of benchmarks out of the water. And as a doctor, I feel like they answer well-structured medical questions, even extremely hard ones, insanely well. However, I find that whenever I ask one to do mundane tasks, specifically ones that involve pulling data from the Internet or working with data it’s given, it’s stupid. Examples: If I ask it to look up which lawyers near me do traffic ticket cases, it will just give me 5 random