Soft Tournament Equilibrium
📰 ArXiv cs.AI
arXiv:2604.04328v1 Announce Type: new
Abstract: The evaluation of general-purpose artificial agents, particularly those based on large language models, presents a significant challenge due to the non-transitive nature of their interactions. When agent A defeats B, B defeats C, and C defeats A, traditional ranking methods that force a linear ordering can be misleading and unstable. We argue that for such cyclic domains, the fundamental object of evaluation should not be a ranking but a set-valued