Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published 5 days ago • 55
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 9 days ago • 135
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL Paper • 2508.07976 • Published Aug 11 • 51
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published 19 days ago • 91
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs Paper • 2509.24107 • Published 27 days ago • 74
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents Paper • 2504.12516 • Published Apr 16 • 1
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese Paper • 2504.19314 • Published Apr 27 • 7
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated Paper • 2509.05739 • Published Sep 6 • 2
Can Understanding and Generation Truly Benefit Together -- or Just Coexist? Paper • 2509.09666 • Published Sep 11 • 33
SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge Paper • 2509.07968 • Published Sep 9 • 14
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3 • 24
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems Paper • 2505.18943 • Published May 25 • 24
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs Paper • 2505.13529 • Published May 18 • 11
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study Paper • 2505.15404 • Published May 21 • 13
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! Paper • 2505.15656 • Published May 21 • 15
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58