FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Paper • 2405.06680 • Published May 5, 2024
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning Paper • 2505.13886 • Published May 20