DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents Paper • 2601.20975 • Published 2 days ago • 6
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models Paper • 2601.21639 • Published 1 day ago • 41
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 1 day ago • 46
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision Paper • 2601.19798 • Published 3 days ago • 38
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Paper • 2601.18631 • Published 4 days ago • 47
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models Paper • 2601.15690 • Published 9 days ago • 4
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion Paper • 2601.16148 • Published 8 days ago • 12
VideoMaMa: Mask-Guided Video Matting via Generative Prior Paper • 2601.14255 • Published 10 days ago • 13
Numba-Accelerated 2D Diffusion-Limited Aggregation: Implementation and Fractal Characterization Paper • 2601.15440 • Published 9 days ago • 1
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published 8 days ago • 8
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 9 days ago • 89
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 10 days ago • 73
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 8 days ago • 13