SPEAR Collection Checkpoints "Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning" arxiv [2509.22601] • 14 items • Updated Dec 4, 2025 • 2
SmartSnap Collection Data and Checkpoints of "SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents" [arxiv.org/abs/2512.22322] • 7 items • Updated 27 days ago • 3
Youtu-Agent RL Collection The checkpoints of the models trained with Youtu-Agent RL for Code/Math and Search tasks. • 3 items • Updated 14 days ago • 3
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 26 days ago • 117
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Paper • 2512.23273 • Published 27 days ago • 14
Nested Browser-Use Learning for Agentic Information Seeking Paper • 2512.23647 • Published 27 days ago • 18
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents Paper • 2512.22322 • Published about 1 month ago • 39
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23, 2024 • 52
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 27
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published Nov 20, 2025 • 109
Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents Paper • 2507.23698 • Published Jul 31, 2025 • 11
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5, 2025 • 126
LTD-Bench: Evaluating Large Language Models by Letting Them Draw Paper • 2511.02347 • Published Nov 4, 2025 • 9
view article Article Aligning to What? Rethinking Agent Generalization in MiniMax M2 Oct 30, 2025 • 42
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26, 2025 • 30
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20, 2025 • 85