Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published 16 days ago • 27
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Paper • 2512.06628 • Published 20 days ago • 12
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published 28 days ago • 41
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
pyannote/speaker-diarization-3.1 Automatic Speech Recognition • Updated May 10, 2024 • 15.6M • 1.38k