Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.16078

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 102
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22 • 20
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

Large Language Models Think Too Fast To Explore Effectively

Paper • 2501.18009 • Published Jan 29 • 24
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Paper • 2502.11831 • Published Feb 17 • 20
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Paper • 2502.13063 • Published Feb 18 • 72

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Emergent Agentic Transformer from Chain of Hindsight Experience

Paper • 2305.16554 • Published May 26, 2023
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Paper • 2504.02882 • Published Apr 2 • 7

Interesting Papers

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17 • 52

Reinforcement Learning

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published Apr 28 • 37

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31 • 1

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 102
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Emergent Agentic Transformer from Chain of Hindsight Experience

Paper • 2305.16554 • Published May 26, 2023
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Paper • 2504.02882 • Published Apr 2 • 7

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21

Interesting Papers

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17 • 52

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 29
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22 • 20
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35

Reinforcement Learning

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published Apr 28 • 37

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31 • 1

Large Language Models Think Too Fast To Explore Effectively

Paper • 2501.18009 • Published Jan 29 • 24
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Paper • 2502.11831 • Published Feb 17 • 20
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Paper • 2502.13063 • Published Feb 18 • 72

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs