google/siglip2-giant-opt-patch16-256 Zero-Shot Image Classification • 2B • Updated Feb 21, 2025 • 2.18k • 3
facebook/dinov3-vitl16-pretrain-lvd1689m Image Feature Extraction • 0.3B • Updated Aug 19, 2025 • 474k • 165
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published 29 days ago • 25
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 51
In Pursuit of Pixel Supervision for Visual Pre-training Paper • 2512.15715 • Published Dec 17, 2025 • 11
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Paper • 2512.07525 • Published Dec 8, 2025 • 59
Back to Basics: Let Denoising Generative Models Denoise Paper • 2511.13720 • Published Nov 17, 2025 • 69
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation Paper • 2511.09057 • Published Nov 12, 2025 • 80
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 23
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published Oct 14, 2025 • 113