League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
📰 ArXiv cs.AI
arXiv:2507.22359v4 | Announce Type: replace

Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical challenge due to data contamination, opaque operation, and subjective preferences. To address these issues, we propose League of LLMs (LOL), a novel benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. LOL integrates four core crit…
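The abstract is truncated before it names the four core criteria, so the sketch below is only an illustration of the general idea it describes: a benchmark-free league in which every model judges every other model's answers over several rounds, and peer scores are aggregated into a ranking. All names (`mock_judge`, `league_round`, `run_league`) and the scoring logic are hypothetical, not the paper's actual protocol.

```python
import random
from itertools import permutations

# Hypothetical stand-in for an LLM judging call; a real system would
# prompt each judge model with the answer to be scored.
def mock_judge(judge: str, answer: str, rng: random.Random) -> float:
    """Return a score in [0, 1] that `judge` assigns to `answer`."""
    return rng.random()

def league_round(answers: dict[str, str], rng: random.Random) -> dict[str, float]:
    """One round of mutual evaluation: every model scores every other
    model's answer; no model ever scores itself."""
    totals = {name: 0.0 for name in answers}
    for judge, player in permutations(answers, 2):
        totals[player] += mock_judge(judge, answers[player], rng)
    n_judges = len(answers) - 1
    return {name: s / n_judges for name, s in totals.items()}

def run_league(answers: dict[str, str], rounds: int = 3, seed: int = 0):
    """Accumulate peer scores over several rounds into a league table,
    sorted from highest to lowest aggregate score."""
    rng = random.Random(seed)
    cumulative = {name: 0.0 for name in answers}
    for _ in range(rounds):
        for name, score in league_round(answers, rng).items():
            cumulative[name] += score
    return sorted(cumulative.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    answers = {"model_a": "answer A", "model_b": "answer B", "model_c": "answer C"}
    for name, score in run_league(answers):
        print(f"{name}: {score:.3f}")
```

Because no external benchmark or ground truth appears anywhere in the loop, a scheme like this avoids data contamination by construction; the open question the paper addresses is how to keep such peer scoring reliable, which is presumably where its four criteria come in.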