StableVLA: Towards Robust Vision-Language-Action Models without Extra Data Paper • 2605.18287 • Published 24 days ago • 15
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published May 7 • 52
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27 • 60
iFSQ: Improving FSQ for Image Generation with 1 Line of Code Paper • 2601.17124 • Published Jan 23 • 34
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published Jan 5 • 64
Article Skill is All You Need: Lessons from Building Marketing Agents at Noumena Noumena-AI • Dec 25, 2025 • 14
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 91
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 117
Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation Paper • 2510.15564 • Published Oct 17, 2025 • 11