LLM Benchmarks - a Ironpiper Collection

Ironpiper 's Collections

LLM Benchmarks

updated Jul 6

Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

Paper • 2507.02694 • Published Jul 3 • 19
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3 • 9