Depth Anything 3 🏢 Running on Zero • Featured • 307 • Generate depth maps from images using GPU acceleration
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12 • 46
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 38
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23 • 33
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7