kaizuberbuehler's Collections
Reasoning, Thinking, RL and Test-Time Scaling
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39

Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46

Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47

Paper • 2412.16720 • Published • 35

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Paper • 2412.17498 • Published • 22

Outcome-Refining Process Supervision for Code Generation
Paper • 2412.15118 • Published • 19

Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 63

MALT: Improving Reasoning with Multi-Agent LLM Training
Paper • 2412.01928 • Published • 45

Mars-PO: Multi-Agent Reasoning System Preference Optimization
Paper • 2411.19039 • Published • 1

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Paper • 2410.22304 • Published • 18

o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 44
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Paper • 2410.09671 • Published • 1

SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation
Paper • 2411.11053 • Published • 4

Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
Paper • 2411.18478 • Published • 37

Reverse Thinking Makes LLMs Stronger Reasoners
Paper • 2411.19865 • Published • 23

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Paper • 2411.16579 • Published • 3

Vision-Language Models Can Self-Improve Reasoning via Reflection
Paper • 2411.00855 • Published • 5

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper • 2411.18203 • Published • 41

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 47
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Paper • 2411.14794 • Published • 13

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 54

LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 129

Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66

Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Paper • 2501.04686 • Published • 53

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682 • Published • 99

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
Paper • 2501.03226 • Published • 44

Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper • 2501.02497 • Published • 46

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper • 2501.01904 • Published • 33
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Paper • 2412.21187 • Published • 41

Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 102

The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 99

O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
Paper • 2501.06458 • Published • 31

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 65

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 48

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 420

Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 123

s1: Simple test-time scaling
Paper • 2501.19393 • Published • 124

Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 58

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 153
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 122

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Paper • 2502.06773 • Published • 1

Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 68

Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 33

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Paper • 2501.12368 • Published • 45

Reasoning Language Models: A Blueprint
Paper • 2501.11223 • Published • 33

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper • 2501.12570 • Published • 27

Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper • 2501.13007 • Published • 20

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Paper • 2501.13926 • Published • 42

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published • 15

Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 58
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28

Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30

Atla Selene Mini: A General Purpose Evaluation Model
Paper • 2501.17195 • Published • 35

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61

Large Language Models Think Too Fast To Explore Effectively
Paper • 2501.18009 • Published • 24

Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published • 39

Process Reinforcement through Implicit Rewards
Paper • 2502.01456 • Published • 61

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Paper • 2502.13124 • Published • 6

ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper • 2502.01718 • Published • 29

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Paper • 2502.02508 • Published • 23

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper • 2502.02584 • Published • 17

LIMO: Less is More for Reasoning
Paper • 2502.03387 • Published • 62
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22

On Teacher Hacking in Language Model Distillation
Paper • 2502.02671 • Published • 18

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper • 2502.03275 • Published • 18

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Paper • 2502.03544 • Published • 43

BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
Paper • 2502.03860 • Published • 25

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 150

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Paper • 2502.05163 • Published • 23

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
Paper • 2502.04404 • Published • 25

Generating Symbolic World Models via Test-time Scaling of Large Language Models
Paper • 2502.04728 • Published • 19

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper • 2502.06781 • Published • 59

Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper • 2502.06772 • Published • 22

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper • 2502.07316 • Published • 50
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 40

Teaching Language Models to Critique via Reinforcement Learning
Paper • 2502.03492 • Published • 24

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper • 2502.08127 • Published • 58

Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Paper • 2502.06533 • Published • 17

An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
Paper • 2502.09056 • Published • 31

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Paper • 2502.09604 • Published • 35

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Paper • 2502.09621 • Published • 28

Logical Reasoning in Large Language Models: A Survey
Paper • 2502.09100 • Published • 24

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
Paper • 2502.09390 • Published • 16

Typhoon T1: An Open Thai Reasoning Model
Paper • 2502.09042 • Published • 16

CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Paper • 2502.09601 • Published • 14

Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges
Paper • 2502.08680 • Published • 11
Small Models Struggle to Learn from Strong Reasoners
Paper • 2502.12143 • Published • 39

S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 63

Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 18

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper • 2502.14768 • Published • 47

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 37

S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper • 2502.12853 • Published • 29

Thinking Preference Optimization
Paper • 2502.13173 • Published • 17

Self-rewarding correction for mathematical reasoning
Paper • 2502.19613 • Published • 83

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Paper • 2502.19361 • Published • 28
LightThinker: Thinking Step-by-Step Compression
Paper • 2502.15589 • Published • 31

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52

R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Paper • 2502.19735 • Published • 9

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Paper • 2502.16111 • Published • 9

TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper • 2502.15425 • Published • 9

The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Paper • 2502.15631 • Published • 9

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Paper • 2502.16033 • Published • 18

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Paper • 2502.17407 • Published • 26

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Paper • 2502.18906 • Published • 12

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Paper • 2502.19328 • Published • 23
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113

Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 84

Chain of Draft: Thinking Faster by Writing Less
Paper • 2502.18600 • Published • 49

Process-based Self-Rewarding Language Models
Paper • 2503.03746 • Published • 39

DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
Paper • 2502.20730 • Published • 38

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Paper • 2503.01307 • Published • 38

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Paper • 2502.20396 • Published • 15

Language Models can Self-Improve at State-Value Estimation for Better Search
Paper • 2503.02878 • Published • 10

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 135

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
Paper • 2502.20545 • Published • 22

Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 123

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper • 2503.07365 • Published • 61
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper • 2503.05132 • Published • 57

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Paper • 2503.10480 • Published • 55

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper • 2503.07572 • Published • 46

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper • 2503.10291 • Published • 36

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 38

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Paper • 2503.06749 • Published • 31

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Paper • 2503.10460 • Published • 29

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Paper • 2503.07608 • Published • 23

Implicit Reasoning in Transformers is Reasoning through Shortcuts
Paper • 2503.07604 • Published • 23

TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published • 120
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper • 2503.04808 • Published • 18

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Paper • 2503.08525 • Published • 17

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Paper • 2503.21620 • Published • 62

DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Paper • 2503.16419 • Published • 75

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Paper • 2503.12533 • Published • 68

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?
Paper • 2503.12349 • Published • 44

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper • 2503.12605 • Published • 35

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper • 2503.12797 • Published • 32

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 30
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
Paper • 2503.16212 • Published • 25

MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
Paper • 2503.14891 • Published • 22

STEVE: A Step Verification Pipeline for Computer-use Agent Training
Paper • 2503.12532 • Published • 17

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Paper • 2503.15478 • Published • 13

Measuring AI Ability to Complete Long Tasks
Paper • 2503.14499 • Published • 15

CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Paper • 2503.16188 • Published • 11

Temporal Consistency for LLM Reasoning Process Error Identification
Paper • 2503.14495 • Published • 11

Free-form language-based robotic reasoning and grasping
Paper • 2503.13082 • Published • 11

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
Paper • 2503.12505 • Published • 11

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119

Video-R1: Reinforcing Video Reasoning in MLLMs
Paper • 2503.21776 • Published • 79
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper • 2503.20201 • Published • 48

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Paper • 2503.21380 • Published • 38

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Paper • 2503.19622 • Published • 31

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Paper • 2503.18892 • Published • 31

ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper • 2503.21729 • Published • 29

Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Paper • 2503.19855 • Published • 29

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Paper • 2503.21696 • Published • 23

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Paper • 2503.18013 • Published • 20

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Paper • 2503.19470 • Published • 19

FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Paper • 2503.17287 • Published • 11

Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper • 2503.20641 • Published • 10

Implicit Bias-Like Patterns in Reasoning Models
Paper • 2503.11572 • Published • 8
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Paper • 2505.15034 • Published • 5

Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper • 2504.00883 • Published • 66

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62

JudgeLRM: Large Reasoning Models as a Judge
Paper • 2504.00050 • Published • 62

Inference-Time Scaling for Generalist Reward Modeling
Paper • 2504.02495 • Published • 56

What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 54

Understanding R1-Zero-Like Training: A Critical Perspective
Paper • 2503.20783 • Published • 56

Efficient Inference for Large Reasoning Models: A Survey
Paper • 2503.23077 • Published • 46

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 42

Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Paper • 2503.24376 • Published • 38

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Paper • 2503.23145 • Published • 35
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Paper • 2504.02587 • Published • 32

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Paper • 2503.22165 • Published • 28

Z1: Efficient Test-time Scaling with Code
Paper • 2504.00810 • Published • 26

Expanding RL with Verifiable Rewards Across Diverse Domains
Paper • 2503.23829 • Published • 23

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Paper • 2504.00509 • Published • 22

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
Paper • 2503.21332 • Published • 23

Effectively Controlling Reasoning Models through Thinking Intervention
Paper • 2503.24370 • Published • 19

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
Paper • 2503.24377 • Published • 18

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Paper • 2504.01005 • Published • 15

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Paper • 2504.00891 • Published • 14

Interpreting Emergent Planning in Model-Free Reinforcement Learning
Paper • 2504.01871 • Published • 12

m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models
Paper • 2504.00869 • Published • 10
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Paper • 2504.00294 • Published • 10

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 10

VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Paper • 2504.00406 • Published • 8

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 86

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper • 2504.05599 • Published • 85

Rethinking Reflection in Pre-Training
Paper • 2504.04022 • Published • 79

T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Paper • 2504.04718 • Published • 42

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Paper • 2504.06514 • Published • 39

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Paper • 2504.04823 • Published • 31

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Paper • 2504.07086 • Published • 21

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Paper • 2504.07934 • Published • 20
Self-Steering Language Models
Paper • 2504.07081 • Published • 18

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Paper • 2504.03561 • Published • 18

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Paper • 2504.03151 • Published • 15

Generative Evaluation of Complex Reasoning in Large Language Models
Paper • 2504.02810 • Published • 14

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Paper • 2504.06958 • Published • 12

Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
Paper • 2503.20533 • Published • 12

Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper • 2504.05520 • Published • 11

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 62

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper • 2504.08672 • Published • 55

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Paper • 2504.10766 • Published • 40
Heimdall: test-time scaling on the generative verification
Paper • 2504.10337 • Published • 33

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Paper • 2504.07615 • Published • 33

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 31

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 29

S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Paper • 2504.10368 • Published • 21

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper • 2504.13055 • Published • 19

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 19

Efficient Reasoning Models: A Survey
Paper • 2504.10903 • Published • 20

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Paper • 2504.09710 • Published • 19

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Paper • 2504.09641 • Published • 16

Sleep-time Compute: Beyond Inference Scaling at Test-time
Paper • 2504.13171 • Published • 15

ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published • 15

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Paper • 2504.11455 • Published • 14

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
Paper • 2504.08066 • Published • 14

Reasoning Models Can Be Effective Without Thinking
Paper • 2504.09858 • Published • 12

VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Paper • 2504.09130 • Published • 12

Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper • 2504.09566 • Published • 11

Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
Paper • 2504.05262 • Published • 11

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Paper • 2504.07891 • Published • 5

Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Paper • 2504.15279 • Published • 76

Tina: Tiny Reasoning Models via LoRA
Paper • 2504.15777 • Published • 56

FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper • 2504.15257 • Published • 47

ToolRL: Reward is All Tool Learning Needs
Paper • 2504.13958 • Published • 48
Learning Adaptive Parallel Reasoning with Language Models
Paper • 2504.15466 • Published • 43

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Paper • 2504.16074 • Published • 36

OTC: Optimal Tool Calls via Reinforcement Learning
Paper • 2504.14870 • Published • 35

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Paper • 2504.17207 • Published • 30

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Paper • 2504.13367 • Published • 26

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper • 2504.16078 • Published • 21

Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Paper • 2504.13828 • Published • 18

Process Reward Models That Think
Paper • 2504.16828 • Published • 18