LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published 1 day ago • 8
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 46
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8, 2025 • 80
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21, 2025 • 34