Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper • 2509.07980 • Published Sep 9, 2025 • 105
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • 2512.01374 • Published Dec 1, 2025 • 105
How Far Are We from Genuinely Useful Deep Research Agents? • Paper • 2512.01948 • Published Dec 1, 2025 • 56
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral • Paper • 2512.04220 • Published Dec 3, 2025 • 16
Rethinking the Trust Region in LLM Reinforcement Learning • Paper • 2602.04879 • Published 26 days ago • 36
Self-Hinting Language Models Enhance Reinforcement Learning • Paper • 2602.03143 • Published 28 days ago • 29
Does Your Reasoning Model Implicitly Know When to Stop Thinking? • Paper • 2602.08354 • Published 22 days ago • 253