Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering • Paper • 2605.29648 • Published 14 days ago • 10
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention • Paper • 2605.29548 • Published 14 days ago • 11
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation • Paper • 2605.29861 • Published 14 days ago • 16
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation • Paper • 2605.31264 • Published 13 days ago • 111
SWE-Explore: Benchmarking How Coding Agents Explore Repositories • Paper • 2606.07297 • Published 6 days ago • 105
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders • Paper • 2606.07473 • Published 6 days ago • 12
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning • Paper • 2606.07299 • Published 6 days ago • 6
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory • Paper • 2606.09365 • Published 2 days ago • 2