diff-storyboard / README_zh.md

jiaxi2002

DiffSynth-Studio


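Introduction

Welcome to the magical world of Diffusion models! DiffSynth-Studio is an open-source Diffusion model engine developed and maintained by the ModelScope community team. Through building this framework, we aim to foster technical innovation, bring together the strength of the open-source community, and explore the frontiers of generative model technology!

DiffSynth currently consists of two open-source projects:

DiffSynth-Studio: Focused on aggressive technical exploration and aimed at academia, it provides support for more cutting-edge model capabilities.
DiffSynth-Engine: Focused on stable model deployment and aimed at industry, it provides higher computational performance and more stable features.

As the core technology behind the ModelScope AIGC zone, DiffSynth-Studio and DiffSynth-Engine provide powerful AI-generated content capabilities. You are welcome to try the product features we have carefully built and begin your AI creation journey!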

Installation

Install from source (recommended):

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Other installation methods

Install from PyPI (releases lag behind the source code; to use the latest features, install from source):

pip install diffsynth

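If you run into problems during installation, they may be caused by upstream dependencies. Please refer to the documentation of these packages: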

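Basic Framework

DiffSynth-Studio has redesigned the inference and training pipelines for mainstream Diffusion models (including FLUX, Wan, and others), enabling efficient VRAM management and flexible model training.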
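"Efficient VRAM management" is only named here. As a hedged illustration of one such strategy, the low-VRAM layer-by-layer offload mentioned in the update history (weights stay on the CPU and each layer is moved to the GPU only while it runs), here is a toy simulation in plain Python. The layer names, sizes, and forward functions are hypothetical, only weight memory is tracked (activations and transfer time are ignored), and this is not DiffSynth-Studio's implementation.

```python
from typing import Callable, List, Tuple

# (name, weight size in GB, forward function); all values are hypothetical.
Layer = Tuple[str, float, Callable[[float], float]]


def run_all_resident(layers: List[Layer], x: float) -> Tuple[float, float]:
    """Baseline: every layer's weights stay on the GPU for the whole pass."""
    peak_gb = sum(size for _, size, _ in layers)
    for _, _, forward in layers:
        x = forward(x)
    return x, peak_gb


def run_with_offload(layers: List[Layer], x: float) -> Tuple[float, float]:
    """Layer-by-layer offload: onload a layer, run it, then offload it again."""
    resident_gb = 0.0
    peak_gb = 0.0
    for _, size, forward in layers:
        resident_gb += size                  # copy this layer's weights CPU -> GPU
        peak_gb = max(peak_gb, resident_gb)  # track the largest simultaneous footprint
        x = forward(x)                       # compute while the layer is on the GPU
        resident_gb -= size                  # move the weights back GPU -> CPU
    return x, peak_gb


layers: List[Layer] = [
    ("block_0", 2.0, lambda v: v + 1.0),
    ("block_1", 3.0, lambda v: v * 2.0),
    ("block_2", 1.5, lambda v: v - 0.5),
]
print(run_all_resident(layers, 1.0))  # peak equals the sum of all layers
print(run_with_offload(layers, 1.0))  # same output, peak equals the largest layer
```

The output is identical in both modes; offloading lowers peak weight memory to roughly the largest single layer, at the cost of one CPU-GPU transfer per layer per step.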

Qwen-Image Series (🔥 New Model)

Details: ./examples/qwen_image/

Quick Start

from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch

pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
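# Prompt (Chinese), roughly: "Exquisite portrait, an underwater girl, flowing blue dress, hair drifting lightly,
# clear light and shadow, surrounded by bubbles, serene face, fine details, dreamy and beautiful."
prompt = "精致肖像，水下少女，蓝裙飘逸，发丝轻扬，光影透澈，气泡环绕，面容恬静，细节精致，梦幻唯美。"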
image = pipe(
    prompt, seed=0, num_inference_steps=40,
    # edit_image=Image.open("xxx.jpg").resize((1328, 1328)) # For Qwen-Image-Edit
)
image.save("image.jpg")
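Each `ModelConfig` above selects files from a model repository through `origin_file_pattern`, and patterns containing `*` (the transformer and text-encoder entries) appear intended to pick up sharded weight files. The sketch below assumes shell-style glob semantics like Python's `fnmatch`; that is an assumption, since this README does not document DiffSynth-Studio's matching rules, and the file listing is hypothetical. Directory patterns such as `"tokenizer/"` are not modeled.

```python
from fnmatch import fnmatch
from typing import List

# Hypothetical listing of a model repository; real file names may differ.
REPO_FILES = [
    "transformer/diffusion_pytorch_model-00001-of-00009.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00009.safetensors",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    "text_encoder/model-00001-of-00004.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]


def select_files(files: List[str], pattern: str) -> List[str]:
    """Return the paths that match a shell-style glob pattern."""
    return [path for path in files if fnmatch(path, pattern)]


# "*" matches every shard; the JSON index does not end in ".safetensors",
# so it is not selected.
print(select_files(REPO_FILES, "transformer/diffusion_pytorch_model*.safetensors"))
print(select_files(REPO_FILES, "vae/diffusion_pytorch_model.safetensors"))
```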

Model Overview

Model ID	Inference	Low-VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training
Qwen/Qwen-Image	code	code	code	code	code	code
Qwen/Qwen-Image-Edit	code	code	code	code	code	code
Qwen/Qwen-Image-Edit-2509	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-EliGen-V2	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen-Poster	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Distill-Full	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Distill-LoRA	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-EliGen	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint	code	code	code	code	code	code
DiffSynth-Studio/Qwen-Image-In-Context-Control-Union	code	code	-	-	code	code
DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix	code	code	-	-	-	-
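The "LoRA Training" columns refer to low-rank adaptation: instead of updating a full weight matrix W, training learns two small matrices A and B whose product is added to the frozen weight. Below is a generic plain-Python sketch of the standard formulation W' = W + (alpha / r) · B · A. The alpha/rank scaling and the extra `scale` multiplier follow common convention and are assumptions here, not necessarily how DiffSynth-Studio loads or scales its LoRA files.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, scale: float = 1.0) -> Matrix:
    """Return W + scale * (alpha / rank) * B @ A.

    W: d_out x d_in frozen weight; A: rank x d_in; B: d_out x rank.
    """
    rank = len(A)
    delta = matmul(B, A)
    factor = scale * alpha / rank
    return [[W[i][j] + factor * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of the low-rank update (full training would need d_out * d_in)."""
    return rank * (d_in + d_out)


print(merge_lora([[1.0, 0.0], [0.0, 1.0]], A=[[3.0, 4.0]], B=[[1.0], [2.0]], alpha=1.0))
print(lora_param_count(3072, 3072, rank=16), 3072 * 3072)  # hypothetical layer size
```

For the hypothetical 3072 x 3072 layer, a rank-16 update trains about 1% as many parameters as full training, which is why LoRA training fits in far less memory.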

FLUX Series

Details: ./examples/flux/

Quick Start

import torch
from diffsynth.pipelines.flux_image_new import FluxImagePipeline, ModelConfig

pipe = FluxImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="flux1-dev.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder/model.safetensors"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="text_encoder_2/"),
        ModelConfig(model_id="black-forest-labs/FLUX.1-dev", origin_file_pattern="ae.safetensors"),
    ],
)

image = pipe(prompt="a cat", seed=0)
image.save("image.jpg")
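Every quick start passes `seed=0`. Diffusion sampling starts from random Gaussian noise, so fixing the seed fixes the starting point and, with the same model, prompt, and settings, the result. The sketch shows the principle with Python's standard `random` module; the pipelines presumably use a PyTorch generator instead, so this is an illustration only.

```python
import random
from typing import List


def initial_noise(seed: int, n: int) -> List[float]:
    """Draw n standard-normal values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


print(initial_noise(0, 4) == initial_noise(0, 4))  # same seed, same starting noise
print(initial_noise(0, 4) == initial_noise(1, 4))  # different seed, different noise
```

On GPUs, bit-exact reproducibility can additionally depend on hardware, library versions, and nondeterministic kernels.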

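Model Overview

Model ID	Extra Parameters	Inference	Low-VRAM Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training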
FLUX.1-dev		code	code	code	code	code	code
FLUX.1-Krea-dev		code	code	code	code	code	code
FLUX.1-Kontext-dev	`kontext_images`	code	code	code	code	code	code
FLUX.1-dev-Controlnet-Inpainting-Beta	`controlnet_inputs`	code	code	code	code	code	code
FLUX.1-dev-Controlnet-Union-alpha	`controlnet_inputs`	code	code	code	code	code	code
FLUX.1-dev-Controlnet-Upscaler	`controlnet_inputs`	code	code	code	code	code	code
FLUX.1-dev-IP-Adapter	`ipadapter_images`, `ipadapter_scale`	code	code	code	code	code	code
FLUX.1-dev-InfiniteYou	`infinityou_id_image`, `infinityou_guidance`, `controlnet_inputs`	code	code	code	code	code	code
FLUX.1-dev-EliGen	`eligen_entity_prompts`, `eligen_entity_masks`, `eligen_enable_on_negative`, `eligen_enable_inpaint`	code	code	-	-	code	code
FLUX.1-dev-LoRA-Encoder	`lora_encoder_inputs`, `lora_encoder_scale`	code	code	code	code	-	-
FLUX.1-dev-LoRA-Fusion-Preview		code	-	-	-	-	-
Step1X-Edit	`step1x_reference_image`	code	code	code	code	code	code
FLEX.2-preview	`flex_inpaint_image`, `flex_inpaint_mask`, `flex_control_image`, `flex_control_strength`, `flex_control_stop`	code	code	code	code	code	code
Nexus-Gen	`nexus_gen_reference_image`	code	code	code	code	code	code

Wan Series

Details: ./examples/wanvideo/

https://github.com/user-attachments/assets/1d66ae74-3b02-40a9-acc3-ea95fc039314

Quick Start

import torch
from diffsynth import save_video
from diffsynth.pipelines.wan_video_new import WanVideoPipeline, ModelConfig

pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="diffusion_pytorch_model*.safetensors", offload_device="cpu"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth", offload_device="cpu"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="Wan2.1_VAE.pth", offload_device="cpu"),
    ],
)
pipe.enable_vram_management()

video = pipe(
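    # Prompt (Chinese), roughly: "Documentary-style shot: a lively puppy runs quickly across a lush green lawn. Brownish-yellow fur,
    # ears perked up, focused and joyful. Sunlight makes the fur look soft and shiny. An open lawn dotted with wildflowers, blue sky
    # and a few white clouds in the distance. Strong perspective, capturing the puppy's motion. Medium shot, side-tracking camera."
    prompt="纪实摄影风格画面，一只活泼的小狗在绿茵茵的草地上迅速奔跑。小狗毛色棕黄，两只耳朵立起，神情专注而欢快。阳光洒在它身上，使得毛发看上去格外柔软而闪亮。背景是一片开阔的草地，偶尔点缀着几朵野花，远处隐约可见蓝天和几片白云。透视感鲜明，捕捉小狗奔跑时的动感和四周草地的生机。中景侧面移动视角。",
    # Negative prompt (Chinese), listing unwanted traits: garish colors, overexposure, static or still frames, blurry details,
    # subtitles, grayish tone, worst/low quality, JPEG artifacts, deformed hands, faces or limbs, extra or fused fingers,
    # cluttered or crowded background, three legs, walking backwards.
    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",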
    seed=0, tiled=True,
)
save_video(video, "video1.mp4", fps=15, quality=5)
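The quick start passes `tiled=True`. Assuming this enables tiled VAE encoding/decoding (the README does not say), the idea is to process overlapping spatial tiles one at a time so peak memory depends on the tile size rather than the full frame, then blend the overlaps to hide seams. The sketch only computes the tile layout; the tile size 34 and stride 16 are hypothetical, not DiffSynth-Studio's defaults, and blending is omitted.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of overlapping 1-D tiles that together cover [0, length)."""
    if not 0 < stride <= tile:
        raise ValueError("stride must be in (0, tile] so tiles overlap or touch")
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far edge
    return starts


def tiles_2d(height: int, width: int, tile: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """(y0, y1, x0, x1) boxes; each box would be processed on its own."""
    boxes = []
    for y0 in tile_starts(height, tile, stride):
        for x0 in tile_starts(width, tile, stride):
            boxes.append((y0, min(y0 + tile, height), x0, min(x0 + tile, width)))
    return boxes


# Hypothetical 60 x 104 latent grid (e.g. 480 x 832 pixels at 8x downsampling).
boxes = tiles_2d(60, 104, tile=34, stride=16)
print(len(boxes), boxes[0], boxes[-1])
```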

Model Overview

Model ID	Extra Parameters	Inference	Full Training	Validation After Full Training	LoRA Training	Validation After LoRA Training
Wan-AI/Wan2.2-Animate-14B	`input_image`, `animate_pose_video`, `animate_face_video`, `animate_inpaint_video`, `animate_mask_video`	code	code	code	code	code
Wan-AI/Wan2.2-S2V-14B	`input_image`, `input_audio`, `audio_sample_rate`, `s2v_pose_video`	code	code	code	code	code
Wan-AI/Wan2.2-I2V-A14B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-T2V-A14B		code	code	code	code	code
Wan-AI/Wan2.2-TI2V-5B	`input_image`	code	code	code	code	code
Wan-AI/Wan2.2-VACE-Fun-A14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.2-Fun-A14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
Wan-AI/Wan2.1-T2V-1.3B		code	code	code	code	code
Wan-AI/Wan2.1-T2V-14B		code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-480P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-I2V-14B-720P	`input_image`	code	code	code	code	code
Wan-AI/Wan2.1-FLF2V-14B-720P	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-1.3B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-14B-Control	`control_video`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control	`control_video`, `reference_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-InP	`input_image`, `end_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-1.3B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
PAI/Wan2.1-Fun-V1.1-14B-Control-Camera	`control_camera_video`, `input_image`	code	code	code	code	code
iic/VACE-Wan2.1-1.3B-Preview	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-1.3B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
Wan-AI/Wan2.1-VACE-14B	`vace_control_video`, `vace_reference_image`	code	code	code	code	code
DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1	`motion_bucket_id`	code	code	code	code	code
krea/krea-realtime-video		code	code	code	code	code
meituan-longcat/LongCat-Video	`longcat_video`	code	code	code	code	code
ByteDance/Video-As-Prompt-Wan2.1-14B	`vap_video`, `vap_prompt`	code	code	code	code	code
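For the Wan2.1 models above, a back-of-the-envelope estimate of the latent video size helps explain memory use and clip length. This sketch assumes the Wan2.1 VAE (`Wan2.1_VAE.pth` in the quick start) produces 16 latent channels with 4x temporal and 8x spatial compression, encodes the first frame on its own, and that 81 frames at 480 x 832 is a typical setting. These figures are assumptions not stated in this README; other models may use a different VAE with different factors.

```python
from typing import Tuple


def wan21_latent_shape(num_frames: int, height: int, width: int,
                       channels: int = 16, t_down: int = 4, s_down: int = 8) -> Tuple[int, int, int, int]:
    """(channels, latent_frames, latent_height, latent_width) under the assumed VAE factors."""
    if (num_frames - 1) % t_down != 0:
        raise ValueError(f"num_frames must be {t_down}k + 1, got {num_frames}")
    if height % s_down or width % s_down:
        raise ValueError(f"height and width must be multiples of {s_down}")
    # Assumed causal layout: the first frame alone, then one latent frame per t_down frames.
    latent_frames = (num_frames - 1) // t_down + 1
    return channels, latent_frames, height // s_down, width // s_down


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of the saved clip, e.g. save_video(..., fps=15)."""
    return num_frames / fps


print(wan21_latent_shape(81, 480, 832))
print(clip_seconds(81, 15))
```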

Innovations

DiffSynth-Studio is more than an engineering-oriented model framework; it is also an incubator for research innovations.

Nexus-Gen: Unified Image Understanding, Generation, and Editing

Details: https://github.com/modelscope/Nexus-Gen
Paper: Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
Model: ModelScope, HuggingFace
Dataset: ModelScope Dataset
Online demo: ModelScope Nexus-Gen Studio

ArtAug: Aesthetic Enhancement for Image Generation Models

Details: ./examples/ArtAug/
Paper: ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
Model: ModelScope, HuggingFace
Online demo: ModelScope AIGC Tab

Image comparison: FLUX.1-dev vs. FLUX.1-dev + ArtAug LoRA

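EliGen: Precise Regional Control for Image Generation

Details: ./examples/EntityControl/
Paper: EliGen: Entity-Level Controlled Image Generation with Regional Attention
Model: ModelScope, HuggingFace
Online demo: ModelScope EliGen Studio
Dataset: EliGen Train Set

Image comparison: entity control regions vs. generated images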
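EliGen's paper title refers to regional attention. Below is a simplified sketch of the general idea, not the paper's exact formulation: each entity prompt's tokens are visible only to image tokens inside that entity's mask, while the global prompt stays visible everywhere. The token ordering and boolean layout are assumptions; a real implementation would turn such a mask into an additive attention bias (for example `-inf` where attention is blocked) and may treat other attention directions differently.

```python
from typing import List

Mask = List[List[int]]  # h x w grid of 0/1 values at latent-token resolution


def regional_attention_mask(entity_masks: List[Mask], entity_token_counts: List[int],
                            global_token_count: int) -> List[List[bool]]:
    """mask[i][j] is True if image token i may attend to prompt token j.

    Prompt tokens are assumed to be ordered: global prompt, then each entity prompt.
    Image tokens are the grid cells in row-major order.
    """
    height, width = len(entity_masks[0]), len(entity_masks[0][0])
    rows = []
    for y in range(height):
        for x in range(width):
            row = [True] * global_token_count  # global prompt is visible everywhere
            for mask, n_tokens in zip(entity_masks, entity_token_counts):
                row.extend([bool(mask[y][x])] * n_tokens)  # entity prompt only inside its region
            rows.append(row)
    return rows


# 1 x 2 image: entity A (2 prompt tokens) on the left, entity B (1 token) on the right.
attn = regional_attention_mask([[[1, 0]], [[0, 1]]], entity_token_counts=[2, 1], global_token_count=1)
for row in attn:
    print(row)
```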

ExVideo: Extended Training for Video Generation Models

Project page: Project Page
Paper: ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning
Code examples: ./examples/ExVideo/
Model: ModelScope, HuggingFace

https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc

Diffutoon: High-Resolution Anime-Style Video Rendering

Project page: Project Page
Paper: Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
Code examples: ./examples/Diffutoon/

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd

DiffSynth: The First Version of This Project

Project page: Project Page
Paper: DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis
Code examples: ./examples/diffsynth/

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
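The DiffSynth paper addresses flicker, i.e. frame-to-frame inconsistency in generated video. As a baseline illustration only, and explicitly not the paper's latent in-iteration method or FastBlend, the simplest deflickering filter is a per-pixel moving average over neighbouring frames. It ignores motion and therefore blurs moving content, which is one reason practical deflickering methods must account for motion between frames.

```python
from typing import List

Frame = List[float]  # flattened pixel (or latent) values of one frame


def temporal_smooth(frames: List[Frame], radius: int = 1) -> List[Frame]:
    """Replace each frame with the per-pixel mean over a window of neighbouring frames."""
    n = len(frames)
    smoothed = []
    for t in range(n):
        window = frames[max(0, t - radius): min(n, t + radius + 1)]
        smoothed.append([sum(values) / len(window) for values in zip(*window)])
    return smoothed


# A single pixel flickering between 0 and 3 from frame to frame.
flicker = [[0.0], [3.0], [0.0], [3.0]]
print(temporal_smooth(flicker))
```

The largest jump between consecutive frames drops from 3.0 to 1.0, at the price of averaging away genuine changes too.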

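Update History

November 4, 2025: Added support for the ByteDance/Video-As-Prompt-Wan2.1-14B model. Trained on top of Wan 2.1, it generates motion that follows a reference video.
October 30, 2025: Added support for the meituan-longcat/LongCat-Video model, which supports text-to-video, image-to-video, and video continuation. In this project, the model reuses the Wan framework for inference and training.
October 27, 2025: Added support for the krea/krea-realtime-video model, another addition to the Wan model ecosystem.
September 23, 2025: DiffSynth-Studio/Qwen-Image-EliGen-Poster is released! We developed and open-sourced this model jointly with the Taotian Experience Design team. Built on Qwen-Image, it is designed for e-commerce posters and supports precise regional layout control. See our example code.
September 9, 2025: Our training framework now supports multiple training modes and has been adapted to Qwen-Image. In addition to standard SFT training, it now supports Direct Distill; see our example code. This feature is experimental, and we will keep improving it to support more comprehensive model training.
August 28, 2025: We added support for Wan2.2-S2V, an audio-driven cinematic video generation model. See ./examples/wanvideo/.
August 21, 2025: DiffSynth-Studio/Qwen-Image-EliGen-V2 is released! Compared with V1, the training dataset was switched to Qwen-Image-Self-Generated-Dataset, so the generated images better match Qwen-Image's own image distribution and style. See our example code.
August 21, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-In-Context-Control-Union, a structural-control LoRA model. It takes an in-context approach and supports several kinds of structural control conditions, including canny, depth, lineart, softedge, normal, and openpose. See our example code.
August 20, 2025: We open-sourced the DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix model, which improves Qwen-Image-Edit's editing results on low-resolution input images. See our example code.
August 19, 2025: 🔥 Qwen-Image-Edit is open-sourced, welcoming a new member to the image editing model family!
August 18, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint, an inpainting ControlNet for Qwen-Image with a lightweight architecture. See our example code.
August 15, 2025: We open-sourced Qwen-Image-Self-Generated-Dataset, a dataset of 160,000 images at 1024 x 1024 generated with the Qwen-Image model. It includes general, English text-rendering, and Chinese text-rendering subsets. Each image is annotated with a caption, entities, and structural control images. Developers can use this dataset to train ControlNet, EliGen, and similar models for Qwen-Image; we aim to advance the technology through open source!
August 13, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth, a ControlNet for Qwen-Image with a lightweight architecture. See our example code.
August 12, 2025: We trained and open-sourced DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny, a ControlNet for Qwen-Image with a lightweight architecture. See our example code.
August 11, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-Distill-LoRA, a distilled acceleration model for Qwen-Image. It follows the same training process as DiffSynth-Studio/Qwen-Image-Distill-Full, but uses a LoRA architecture, making it more compatible with other models in the open-source ecosystem.
August 7, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-EliGen, an entity-control LoRA model for Qwen-Image that enables entity-level controllable text-to-image generation. See the paper for technical details. Training dataset: EliGenTrainSet.
August 5, 2025: We open-sourced DiffSynth-Studio/Qwen-Image-Distill-Full, a distilled acceleration model for Qwen-Image that achieves roughly a 5x speedup.
August 4, 2025: 🔥 Qwen-Image is open-sourced, welcoming a new member to the image generation model family!
August 1, 2025: FLUX.1-Krea-dev, a text-to-image model focused on aesthetic photography, is open-sourced. We provided full support on day one, including low-VRAM layer-by-layer offload, LoRA training, and full training. See ./examples/flux/ for details.
July 28, 2025: Wan 2.2 is open-sourced. We provided full support on day one, including low-VRAM layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training. See ./examples/wanvideo/ for details.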
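July 11, 2025: We propose Nexus-Gen, a unified framework that combines the language reasoning capabilities of large language models (LLMs) with the image generation capabilities of diffusion models. The framework supports seamless image understanding, generation, and editing.
- Paper: Nexus-Gen: Unified Image Understanding, Generation, and Editing via Prefilled Autoregression in Shared Embedding Space
- GitHub repository: https://github.com/modelscope/Nexus-Gen
- Model: ModelScope, HuggingFace
- Training dataset: ModelScope Dataset
- Online demo: ModelScope Nexus-Gen Studio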

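June 15, 2025: EvalScope, ModelScope's official evaluation framework, now supports text-to-image evaluation. See the best-practice guide to try it.
March 31, 2025: We added support for InfiniteYou, a face identity-preserving method for FLUX. See ./examples/InfiniteYou/ for more details.
March 25, 2025: Our new project DiffSynth-Engine is now open source! It focuses on stable model deployment for industry, offering better engineering support, higher computational performance, and more stable features.
March 13, 2025: We added support for HunyuanVideo-I2V, the image-to-video version of Tencent's open-source HunyuanVideo. See ./examples/HunyuanVideo/ for more details.
February 25, 2025: We added support for Wan-Video, a series of state-of-the-art video synthesis models open-sourced by Alibaba. See ./examples/wanvideo/.
February 17, 2025: We added support for StepVideo, an advanced video synthesis model! See ./examples/stepvideo.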
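December 31, 2024: We propose EliGen, a new framework for text-to-image generation with precise entity-level control, complemented by an inpainting fusion pipeline that extends it to image inpainting. EliGen integrates seamlessly with existing community models such as IP-Adapter and In-Context LoRA, broadening its versatility. See ./examples/EntityControl for details.
- Paper: EliGen: Entity-Level Controlled Image Generation with Regional Attention
- Model: ModelScope, HuggingFace
- Online demo: ModelScope EliGen Studio
- Training dataset: EliGen Train Set
December 19, 2024: We implemented advanced VRAM management for HunyuanVideo, making it possible to generate 129x720x1280 videos with 24 GB of VRAM, or 129x512x384 videos with only 6 GB. See ./examples/HunyuanVideo/ for more details.
December 18, 2024: We propose ArtAug, a method that improves text-to-image models through synthesis-understanding interaction. We trained an ArtAug enhancement module for FLUX.1-dev in LoRA format. The model brings the aesthetic understanding of Qwen2-VL-72B into FLUX.1-dev, improving the quality of generated images.
- Paper: https://arxiv.org/abs/2412.12888
- Examples: https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/ArtAug
- Model: ModelScope, HuggingFace
- Demo: ModelScope, HuggingFace (coming soon)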
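October 25, 2024: We provide extensive FLUX ControlNet support. The project supports many different ControlNet models, which can be combined freely even when their architectures differ. ControlNet models are also compatible with high-resolution refinement and regional control, enabling highly controllable image generation. See ./examples/ControlNet/.
October 8, 2024: We released an extension LoRA based on CogVideoX-5B and ExVideo. You can download this model from ModelScope or HuggingFace.
August 22, 2024: This project now supports CogVideoX-5B. See here. We provide several interesting features for this text-to-video model, including:
- Text-to-video
- Video editing
- Self-upscaling
- Video interpolation
August 22, 2024: We implemented an interesting brush feature that works with all text-to-image models. You can now create stunning images with a brush, assisted by AI!
- Use it in our WebUI.
August 21, 2024: DiffSynth-Studio now supports FLUX.
- Enable CFG and highres-fix to improve visual quality. See here.
- LoRA, ControlNet, and other add-on models will be available soon.
June 21, 2024: We propose ExVideo, a post-tuning technique designed to enhance the capabilities of video generation models. We extended Stable Video Diffusion to generate long videos of up to 128 frames.
- Project page
- The source code is released in this repository. See examples/ExVideo.
- Models are released on HuggingFace and ModelScope.
- The technical report is available on arXiv.
- You can try ExVideo in this demo!
June 13, 2024: DiffSynth Studio has moved to ModelScope. The developers have also changed from "I" to "we". Of course, I will still take part in future development and maintenance.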
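January 29, 2024: We propose Diffutoon, an excellent solution for toon shading.
- Project page
- The source code is released in this project.
- The technical report (IJCAI 2024) is available on arXiv.
December 8, 2023: We decided to start a new project aimed at unleashing the potential of diffusion models, especially for video synthesis. Development of the project officially began.
November 15, 2023: We propose FastBlend, a powerful video deflickering algorithm.
- The sd-webui extension is released on GitHub.
- Demo videos are shown on Bilibili, covering three tasks:
- The technical report is available on arXiv.
- An unofficial ComfyUI extension developed by other users is released on GitHub.
October 1, 2023: We released an early version of this project, named FastSDXL, an initial attempt at building a diffusion engine.
- The source code is released on GitHub.
- FastSDXL includes a trainable OLSS scheduler for improved efficiency.
  - The original OLSS repository is here.
  - The technical report (CIKM 2023) is available on arXiv.
  - A demo video is available on Bilibili.
  - Since OLSS requires additional training, we did not implement it in this project.
August 29, 2023: We propose DiffSynth, a video synthesis framework.
- Project page.
- The source code is released in EasyNLP.
- The technical report (ECML PKDD 2024) is available on arXiv.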
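The October 25, 2024 entry states that multiple ControlNets can be combined freely. A common way to combine ControlNets, sketched here as an assumption rather than DiffSynth-Studio's actual code, is to scale each ControlNet's residual features and sum them block by block before adding them to the backbone's hidden states. This toy version assumes every ControlNet emits residuals with the same block layout; combining ControlNets with different architectures, as the entry describes, would need an extra mapping step not shown here. The residual values are hypothetical.

```python
from typing import List

Residuals = List[List[float]]  # one residual vector per backbone block


def combine_controlnets(residual_sets: List[Residuals], scales: List[float]) -> Residuals:
    """Scale each ControlNet's per-block residuals and sum them block by block."""
    if len(residual_sets) != len(scales):
        raise ValueError("need one scale per ControlNet")
    combined: Residuals = []
    for block_index in range(len(residual_sets[0])):
        acc = [0.0] * len(residual_sets[0][block_index])
        for residuals, scale in zip(residual_sets, scales):
            for i, value in enumerate(residuals[block_index]):
                acc[i] += scale * value
        combined.append(acc)
    return combined


def add_to_backbone(hidden_states: Residuals, combined: Residuals) -> Residuals:
    """Add the combined residual to each backbone block's hidden state."""
    return [[h + r for h, r in zip(hs, rs)] for hs, rs in zip(hidden_states, combined)]


canny = [[1.0, 2.0]]   # hypothetical residuals from one ControlNet (1 block)
depth = [[10.0, 0.0]]  # hypothetical residuals from another ControlNet
combined = combine_controlnets([canny, depth], scales=[1.0, 0.5])
print(combined, add_to_backbone([[0.5, 0.5]], combined))
```

Per-ControlNet scales let one condition (here the second one) be weakened without retraining either model.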
