FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?

📰 ArXiv cs.AI

FormalProofBench evaluates AI models' ability to produce formally verified graduate-level math proofs

advanced Published 31 Mar 2026

Action Steps

Design a private benchmark with natural-language problems and Lean 4 formal statements
Pair each problem with a Lean proof accepted by the Lean 4 checker
Evaluate AI models' ability to output valid Lean proofs
Assess model performance on advanced undergraduate and graduate mathematics problems
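
As a rough illustration of the task format these steps describe, the sketch below pairs a natural-language problem with a Lean 4 formal statement and a proof that the Lean 4 checker accepts. The problem and the theorem names `add_comm_task` and `add_comm_solution` are hypothetical and far simpler than graduate-level material; they are not taken from the benchmark, which is private.

```lean
-- Natural-language problem (hypothetical, for illustration only):
--   "Show that addition of natural numbers is commutative."

-- Formal statement as a model might receive it, with the proof left open.
-- `sorry` is Lean's placeholder for a missing proof; the checker flags it
-- with a warning, so a submission containing it would not count as verified.
theorem add_comm_task (a b : Nat) : a + b = b + a := by
  sorry

-- A candidate solution: the same statement with a complete proof.
-- `Nat.add_comm` is a lemma from Lean 4 core, so no extra imports are needed.
theorem add_comm_solution (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

A proof counts as valid only if the checker accepts it without `sorry` or errors, so success is decided mechanically rather than by human grading.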

Who Needs to Know This

Researchers and developers in AI and mathematics can use FormalProofBench to assess and improve model performance on formal proof writing. Educators can use it to support graduate-level math education.

Key Insight

💡 FormalProofBench measures whether AI models can produce graduate-level mathematical proofs that the Lean 4 checker accepts