LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling • Paper • 2511.20785 • Published 15 days ago • 150
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper • 2511.16334 • Published 20 days ago • 91
RynnVLA-002: A Unified Vision-Language-Action and World Model • Paper • 2511.17502 • Published 19 days ago • 24
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling • Paper • 2511.11793 • Published 26 days ago • 159
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity • Paper • 2510.23603 • Published Oct 27 • 22
Scaling Language-Centric Omnimodal Representation Learning • Paper • 2510.11693 • Published Oct 13 • 100
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting • Paper • 2510.10637 • Published Oct 12 • 12
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios • Paper • 2505.12891 • Published May 19 • 10
Residual Off-Policy RL for Finetuning Behavior Cloning Policies • Paper • 2509.19301 • Published Sep 23 • 18