CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85

Collections
Collections including paper arxiv:2504.15257

RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4

ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 30
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 10
Preference Learning Unlocks LLMs' Psycho-Counseling Skills
Paper • 2502.19731 • Published • 7
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88

ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 9
Language Models can be Logical Solvers
Paper • 2311.06158 • Published • 23
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Paper • 2311.05997 • Published • 37
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Paper • 2311.05657 • Published • 32

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 51
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 87
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1

Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 55
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25