Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 10 days ago • 53
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 10 days ago • 225
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search Paper • 2605.20244 • Published 24 days ago • 4
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 20 days ago • 79
bilabila/b-b7_olr_ts10_gru_hib_costdyn_util_w3_sym7_202601_lossq_ms400k_h12 68k • Updated 19 days ago • 383 • 1
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 21 days ago • 169
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published 27 days ago • 6
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published 28 days ago • 77
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published May 11 • 3
Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts Paper • 2602.03473 • Published May 8 • 11