Baby vs LLM: Agent evaluation under operational disguise (with source code)
📰 Dev.to · Alexandru Spînu
Results are subject to change as I continue to complete it for the rest of the models. A few days...