JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation • Paper • 2602.19163 • Published 4 days ago • 9
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning • Paper • 2602.13515 • Published 13 days ago • 43
Autoregressive Image Generation with Masked Bit Modeling • Paper • 2602.09024 • Published 17 days ago • 6
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss • Paper • 2602.02493 • Published 24 days ago • 42
One-step Latent-free Image Generation with Pixel Mean Flows • Paper • 2601.22158 • Published 28 days ago • 18
Revisiting Diffusion Model Predictions Through Dimensionality • Paper • 2601.21419 • Published 29 days ago • 4
Towards Pixel-Level VLM Perception via Simple Points Prediction • Paper • 2601.19228 • Published about 1 month ago • 18
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation • Paper • 2601.15369 • Published Jan 21 • 21
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published Jan 22 • 52
Bidirectional Normalizing Flow: From Data to Noise and Back • Paper • 2512.10953 • Published Dec 11, 2025 • 7
Towards Scalable Pre-training of Visual Tokenizers for Generation • Paper • 2512.13687 • Published Dec 15, 2025 • 106
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder • Paper • 2512.11749 • Published Dec 12, 2025 • 39
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models • Paper • 2512.08153 • Published Dec 9, 2025 • 8
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance • Paper • 2512.08765 • Published Dec 9, 2025 • 132