StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2 • 53
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28 • 173
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29 • 140
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 192
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23 • 22
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14 • 70
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 131