Abstract
Helios is a 14 billion parameter autoregressive diffusion model for video generation that achieves real-time performance and high-quality long-video synthesis without conventional optimization techniques.
We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generation without standard acceleration techniques such as KV-cache, sparse/linear attention, or quantization; and (3) training without parallelism or sharding frameworks, enabling image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks. To mitigate drifting in long-video generation, we characterize typical failure modes and propose simple yet effective training strategies that explicitly simulate drifting during training, while eliminating repetitive motion at its source. For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models. Moreover, we introduce infrastructure-level optimizations that accelerate both inference and training while reducing memory consumption. Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation. We plan to release the code, base model, and distilled model to support further development by the community.
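The abstract's idea of "explicitly simulating drifting during training" can be illustrated with a minimal sketch: corrupt the conditioning history during training so the model learns to denoise future frames given imperfect context, mimicking the errors that accumulate at inference time. The function name, the interpolation scheme, and the strength range below are all assumptions for illustration; the paper's actual training recipe is not specified in this abstract.

```python
import numpy as np

def drift_simulating_history(clean_history, rng, strength_range=(0.0, 0.5)):
    """Hypothetical sketch of drift simulation for autoregressive video training.

    clean_history: array of shape (B, T, C, H, W) holding ground-truth
    context frames. Instead of conditioning on pristine history (as naive
    teacher forcing would), we blend in noise with a per-example strength,
    so the model sees "drifted" context like it will at inference time.
    """
    b = clean_history.shape[0]
    # One corruption strength per batch element, broadcast over T, C, H, W.
    strength = rng.uniform(*strength_range, size=(b, 1, 1, 1, 1))
    noise = rng.standard_normal(clean_history.shape)
    # Linear interpolation toward noise: strength = 0 leaves history intact,
    # larger values emulate heavier accumulated generation error.
    return (1.0 - strength) * clean_history + strength * noise
```

With `strength_range=(0.0, 0.0)` the function reduces to plain teacher forcing, which makes the corruption an easily tunable knob on top of a standard training loop.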
Community
Code: https://github.com/PKU-YuanGroup/Helios
Page: https://pku-yuangroup.github.io/Helios-Page/
The following papers were recommended by the Semantic Scholar API:
- EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation (2026)
- S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation (2026)
- Pathwise Test-Time Correction for Autoregressive Long Video Generation (2026)
- LoL: Longer than Longer, Scaling Video Generation to Hour (2026)
- VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction (2026)
- Context Forcing: Consistent Autoregressive Video Generation with Long Context (2026)
- PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache (2026)
19.5 FPS on a single H100 for a 14B video generation model is pretty wild. Most video models are painfully slow, so real-time performance at this scale is a big deal, and the fact that they did it without the typical acceleration tricks (KV cache, quantization, etc.) makes it even more impressive. Handling long-video drift without anti-drifting heuristics seems tricky but important. Good analysis here: https://arxivexplained.com/helios-real-real-time-long-video-generation-model