- JudgeLRM: Large Reasoning Models as a Judge
  Paper • 2504.00050 • Published • 62
- RM-R1: Reward Modeling as Reasoning
  Paper • 2505.02387 • Published • 79
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
  Paper • 2505.01441 • Published • 39
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 92
Roman256
Roman12322
AI & ML interests
None yet
Recent Activity
upvoted a collection about 13 hours ago
NeMo Gym
liked a model about 14 hours ago
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
upvoted an article about 2 months ago
Why Did MiniMax M2 End Up as a Full Attention Model?
Organizations
None yet