An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published 14 days ago • 20
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition Paper • 2503.06984 • Published Mar 10 • 5
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents Paper • 2511.18685 • Published Nov 24 • 3 • 2
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published 30 days ago • 45
FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content Paper • 2308.14256 • Published Aug 28, 2023 • 2