Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning
📰 ArXiv cs.AI
arXiv:2604.14525v1

Abstract: Large language models frequently produce mutually inconsistent answers when reasoning over multiple related queries. We study case-file logical consistency: the problem of maintaining a globally satisfiable belief state across interdependent queries. We introduce a benchmark of 390 multi-query reasoning instances with entailment/contradiction/unknown labels, and we propose set-level metrics including Case Satisfiability Rate, Contradiction Density, and Revision Cost.
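The abstract names the metrics but does not define them, so the following is only a minimal sketch under assumed definitions: Contradiction Density as the fraction of within-case query pairs labeled contradictory, and Case Satisfiability Rate as the fraction of cases containing no contradictory pair. The data layout (pairwise labels per case) is likewise a hypothetical illustration, not the paper's format.

```python
# Hypothetical layout (not specified in the abstract): a "case" maps each
# pair of query IDs to one of the labels "entailment", "contradiction",
# or "unknown" for the model's answers within that case.

def contradiction_density(case):
    """Assumed definition: fraction of query pairs labeled as contradictions."""
    labels = list(case.values())
    return sum(1 for v in labels if v == "contradiction") / len(labels)

def case_satisfiability_rate(cases):
    """Assumed definition: fraction of cases with no contradictory pair."""
    sat = sum(1 for case in cases if "contradiction" not in case.values())
    return sat / len(cases)

# Toy example: two cases over queries A, B, C with pairwise labels.
case1 = {("A", "B"): "entailment", ("A", "C"): "contradiction", ("B", "C"): "unknown"}
case2 = {("A", "B"): "entailment", ("A", "C"): "unknown", ("B", "C"): "unknown"}
```

Under these assumed definitions, `case1` has a contradiction density of 1/3 and only `case2` counts as satisfiable, giving a Case Satisfiability Rate of 0.5 over the pair.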