Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3 • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7 • 106
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26 • 134
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 189
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10 • 56
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents Paper • 2510.09577 • Published Oct 10 • 7
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published 7 days ago • 17
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published 9 days ago • 25
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published 9 days ago • 16