
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation


[Intro video]


Frame In-N-Out is a controllable Image-to-Video Diffusion Transformer in which objects can enter or exit the scene along user-specified motion trajectories, guided by an identity (ID) reference image. Our method introduces a new dataset curation pipeline that recognizes these Frame In/Frame Out patterns, an evaluation protocol, and a motion-controllable, identity-preserving, unbounded-canvas Video Diffusion Transformer to achieve Frame In and Frame Out in the cinematic domain.
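
For context, the base CogVideoX-I2V-5B model that the Stage 1/Stage 2 weights build on can be run through the standard diffusers image-to-video pipeline. The sketch below illustrates only that base pipeline: the Frame In-N-Out motion-trajectory and ID-reference conditioning requires this project's own code and is not part of stock diffusers, and the image path and prompt are placeholders.

```python
# Minimal sketch: running the *base* CogVideoX-I2V-5B pipeline via diffusers.
# Frame In-N-Out's motion-trajectory / ID-reference controls are NOT included here.
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("first_frame.png")  # conditioning image (illustrative path)
video = pipe(
    image=image,
    prompt="a person walks into the frame from the left",  # illustrative prompt
    num_frames=49,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```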

Model Zoo 🤗

| Model | Description | Huggingface |
| --- | --- | --- |
| CogVideoX-I2V-5B | Stage 1 - Motion Control (paper weight v1.0) | Download |
| CogVideoX-I2V-5B | Stage 2 - Motion + In-N-Out Control (paper weight v1.0) | Download |
| Wan2.2-TI2V-5B | Stage 1 - Motion Control (new weight v1.5, 704p) | Download |
| Wan2.2-TI2V-5B | Stage 2 - Motion + In-N-Out Control (new weight v1.5, 704p) | Download |
| Wan2.2-TI2V-5B | Stage 2 - Motion + In-N-Out Control (new weight v1.6, arbitrary resolution) | Download |
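
All checkpoints can be fetched programmatically from the Hugging Face Hub. Below is a minimal sketch using huggingface_hub, with the Stage 1 CogVideoX repo id taken from this card; substitute the repo id of whichever checkpoint you need.

```python
# Download a Frame In-N-Out checkpoint from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="uva-cv-lab/FrameINO_CogVideoX_Stage1_Motion_v1.0"
)
print(f"Checkpoint files downloaded to: {local_dir}")
```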

📚 Citation

```bibtex
@article{wang2025frame,
  title={Frame In-N-Out: Unbounded Controllable Image-to-Video Generation},
  author={Wang, Boyang and Chen, Xuweiyi and Gadelha, Matheus and Cheng, Zezhou},
  journal={arXiv preprint arXiv:2505.21491},
  year={2025}
}
```
