HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Paper • 2602.03560 • Published 10 days ago • 42
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 29 days ago • 33
Self-Evaluation Unlocks Any-Step Text-to-Image Generation Paper • 2512.22374 • Published Dec 26, 2025 • 17
M2.1: Multilingual and Multi-Task Coding with Strong Generalization Article • Jan 5 • 39
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation Paper • 2512.24724 • Published Dec 31, 2025 • 7
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published Dec 19, 2025 • 51
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published Dec 17, 2025 • 66
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 231
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models Paper • 2511.10629 • Published Nov 13, 2025 • 127
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation Paper • 2511.09057 • Published Nov 12, 2025 • 80
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 145
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Dec 31, 2025 • 557
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23, 2025 • 34