AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning • Paper • 2509.08755 • Published Sep 10 • 56
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents • Paper • 2509.09265 • Published Sep 11 • 46
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction • Paper • 2407.03651 • Published Jul 4, 2024 • 18