Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes Paper • 2601.02356 • Published 1 day ago • 11
SS4D: Native 4D Generative Model via Structured Spacetime Latents Paper • 2512.14284 • Published 22 days ago • 13
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published 23 days ago • 72
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 47
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published Dec 2, 2025 • 21
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28, 2025 • 18
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30, 2025 • 34
SPARK: Synergistic Policy And Reward Co-Evolving Framework Paper • 2509.22624 • Published Sep 26, 2025 • 17
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Paper • 2509.22647 • Published Sep 26, 2025 • 32
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 18
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Paper • 2503.21745 • Published Mar 27, 2025 • 1
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity Paper • 2508.05609 • Published Aug 7, 2025 • 29
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience Paper • 2508.04700 • Published Aug 6, 2025 • 52