ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences

📰 ArXiv cs.AI

arXiv:2602.11354v2 Abstract: The literature has witnessed emerging interest in AI agents for the automated assessment of scientific papers. Existing benchmarks focus primarily on the computational aspect of this task, testing agents' ability to reproduce or replicate research outcomes when given access to the code and data. This setting, while foundational, (1) fails to capture the inconsistent availability of new data for replication as opposed to reproduction, and (2) la

Published 13 Apr 2026