League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
📰 ArXiv cs.AI
arXiv:2507.22359v4 | Announce Type: replace

Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reliable evaluation remains a critical challenge due to data contamination, opaque operation, and subjective preferences. To address these issues, we propose League of LLMs (LOL), a novel benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. LOL integrates four core crit…
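The abstract is truncated before it names the four core criteria, so the sketch below is only an illustration of the general idea it describes: a benchmark-free league in which every model judges every other model's answers over several rounds, and peer scores are aggregated into a ranking. All names (`mock_judge`, `league_round`, `run_league`) and the scoring logic are hypothetical, not the paper's actual protocol.

```python
import random
from itertools import permutations

# Hypothetical stand-in for an LLM judging call; a real system would
# prompt each judge model with the answer to be scored.
def mock_judge(judge: str, answer: str, rng: random.Random) -> float:
    """Return a score in [0, 1] that `judge` assigns to `answer`."""
    return rng.random()

def league_round(answers: dict[str, str], rng: random.Random) -> dict[str, float]:
    """One round of mutual evaluation: every model scores every other
    model's answer; no model ever scores itself."""
    totals = {name: 0.0 for name in answers}
    for judge, player in permutations(answers, 2):
        totals[player] += mock_judge(judge, answers[player], rng)
    n_judges = len(answers) - 1
    return {name: s / n_judges for name, s in totals.items()}

def run_league(answers: dict[str, str], rounds: int = 3, seed: int = 0):
    """Accumulate peer scores over several rounds into a league table,
    sorted from highest to lowest aggregate score."""
    rng = random.Random(seed)
    cumulative = {name: 0.0 for name in answers}
    for _ in range(rounds):
        for name, score in league_round(answers, rng).items():
            cumulative[name] += score
    return sorted(cumulative.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    answers = {"model_a": "answer A", "model_b": "answer B", "model_c": "answer C"}
    for name, score in run_league(answers):
        print(f"{name}: {score:.3f}")
```

Because no external benchmark or ground truth appears anywhere in the loop, a scheme like this avoids data contamination by construction; the open question the paper addresses is how to keep such peer scoring reliable, which is presumably where its four criteria come in.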