Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 29 days ago • 194
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published Oct 23 • 18
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published Oct 23 • 18 • 2
VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning Paper • 2510.01444 • Published Oct 1 • 19
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2 • 26
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models Paper • 2509.09675 • Published Sep 11 • 28
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 12
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published Mar 31 • 23
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published Mar 31 • 23
RLVR Collection Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31 • 13
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published Dec 30, 2024 • 40
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas Paper • 2501.15427 • Published Jan 26 • 6
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent Paper • 2502.16852 • Published Feb 24