Z-Image-Turbo on AXERA AX650N
This project provides a complete implementation for deploying the Z-Image-Turbo diffusion model on the AXERA AX650N NPU. Z-Image-Turbo is a text-to-image diffusion model that produces high-quality images in only a few denoising steps, making it well suited to fast on-device inference.
Table of Contents
- Overview
- Requirements
- Project Structure
- Model Components
- Complete Inference Pipeline
- Advanced Usage
- Technical Support
Overview
The Z-Image-Turbo model consists of three main components:
- Text Encoder: Converts text prompts into embeddings
- Transformer: Core diffusion model that processes latent representations
- VAE (Variational Autoencoder): Encodes/decodes between pixel space and latent space
Deployment Strategy
The deployment architecture is optimized for AXERA AX650N with the following design decisions:
- Text Encoder: Currently runs on PyTorch for simplicity and faster development iteration. This component uses the Qwen3 model and can be converted to axmodel format for pure NPU inference in future releases to achieve end-to-end NPU acceleration.
- Transformer: Fully converted to axmodel format and runs on NPU through model partitioning and subgraph optimization, achieving optimal performance on the target hardware.
- VAE: Both encoder and decoder are converted to axmodel format for complete NPU acceleration, enabling fast image encoding and decoding operations.
Requirements
This project requires the following Python environment and dependencies:
- Python 3.9.20
- torch 2.7.0
- torchvision 0.22.0
- transformers 4.53.1
- diffusers 0.32.1
Additional Dependencies:
- ONNX Runtime (for ONNX model inference and validation)
- onnxslim (for ONNX model optimization)
- numpy (for numerical operations and calibration data handling)
- Pulsar2 toolchain (for AXERA AX650N model compilation)
Hardware Requirements:
- AXERA AX650N development board for deployment
- x86/ARM Linux system for model conversion and compilation
Project Structure
Z-Image-Turbo/
├── original_onnx/                 # Exported ONNX models (original format)
│   ├── vae_decoder_simp_slim.onnx
│   ├── vae_encoder_simp_slim.onnx
│   └── z_image_transformer_body_only_simp_slim.onnx
├── text_encoder_axmodel/          # Text encoder models in axmodel format
│   ├── model.embed_tokens.weight.npy
│   ├── qwen3_p128_l0_together.axmodel
│   ├── qwen3_p128_l1_together.axmodel
│   └── ... (36 layer models for Qwen3)
├── transformer_axmodel/           # Transformer subgraph models in axmodel format
│   ├── auto_00_model_layers_29_Add_4_output_0_to_sample_auto.axmodel
│   ├── cfg_00_timestep_to_model_t_embedder_mlp_mlp_2_Gemm_output_0_config.axmodel
│   └── ... (compiled subgraph models)
├── transformer_onnx/              # Transformer models in ONNX format
├── vae_model/                     # VAE models (both ONNX and axmodel formats)
├── VideoX-Fun/                    # Main conversion and inference code
└── README.md                      # This documentation
Model Components
1. Transformer Module
The transformer module is the core component responsible for the diffusion process. It iteratively processes latent representations to generate high-quality images from noise. Due to the model's complexity and size, we employ a subgraph partitioning strategy to optimize deployment on the AX650N NPU.
Step 1: Export to ONNX Format
First, export the transformer model to ONNX format (without ControlNet support):
python scripts/z_image/export_transformer_body_onnx.py \
--output onnx-models-512x512/z_image_transformer_body_only_512x512.onnx \
--height 512 --width 512 --sequence-length 128 \
--latent-downsample-factor 8 \
--dtype fp32 \
--skip-slim
Parameters:
- --output: Output path for the ONNX model
- --height, --width: Target image dimensions (512x512)
- --sequence-length: Maximum sequence length for text embeddings (128 tokens)
- --latent-downsample-factor: VAE downsample factor (8x)
- --dtype: Data type (fp32 for highest accuracy)
- --skip-slim: Skip ONNX simplification (optional)
Note: If you don't use --skip-slim, the model will be automatically simplified and the output will be named z_image_transformer_body_only_512x512_simp_slim.onnx.
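Before moving on, it is worth sanity-checking the exported graph with ONNX Runtime. The snippet below is a minimal sketch: it reads the declared input shapes from the graph and feeds random tensors, assuming any dynamic dimensions resolve to 1 and all inputs are fp32 (adjust for integer inputs such as timesteps):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "onnx-models-512x512/z_image_transformer_body_only_512x512_simp_slim.onnx",
    providers=["CPUExecutionProvider"],
)
# Print the declared I/O so later steps can be checked against it.
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Feed random fp32 tensors; dynamic dims are assumed to be 1 here.
feed = {
    i.name: np.random.randn(*[d if isinstance(d, int) else 1 for d in i.shape]).astype(np.float32)
    for i in sess.get_inputs()
}
print([o.shape for o in sess.run(None, feed)])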
Step 2: Collect Calibration Data
Collect calibration dataset from the original model for quantization. This step generates representative input data that will be used during the quantization process:
python ./examples/z_image_fun/collect_onnx_inputs.py \
--model_name models/Diffusion_Transformer/Z-Image-Turbo/ \
--output_dir transformer_body_only_512x512_simp_slim/calibration \
--height 512 --width 512 \
--max_sequence_length 128
This command generates calibration data by running the model with various prompts and diffusion steps, capturing the actual input distributions that the model will encounter during inference.
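The captured samples can be inspected before quantization. A minimal sketch, assuming each .npy file stores a pickled dict mapping input names to arrays (the file naming matches the split step below):

import numpy as np

# Inspect one calibration sample; assumes a dict of input-name -> array.
sample = np.load(
    "transformer_body_only_512x512_simp_slim/calibration/transformer_inputs_prompt000_step00.npy",
    allow_pickle=True,
).item()
for name, arr in sample.items():
    print(f"{name}: shape={arr.shape} dtype={arr.dtype}")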
Step 3: Split ONNX Model into Subgraphs
Split the monolithic ONNX model into multiple subgraphs for better memory management and compilation optimization:
python ./scripts/split_onnx_by_subconfig.py \
--model ./onnx-models-512x512/z_image_transformer_body_only_512x512_simp_slim.onnx \
--config ./pulsar2_configs/transformers_subgraph_512x512.json \
--output-dir ./transformers_body_only_512_512_split_onnx \
--verify \
--input-data ./transformer_body_only_512x512_simp_slim/calibration/transformer_inputs_prompt000_step00.npy \
--providers CPUExecutionProvider
The subgraph configuration file (transformers_subgraph_512x512.json) defines the splitting strategy, determining how the model is partitioned into smaller, manageable pieces that fit within the NPU's constraints.
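The schema of this file is defined by split_onnx_by_subconfig.py; a quick way to review the planned cut points (the boundary tensor names at which the model is split) is simply to load and pretty-print it:

import json

# Dump the subgraph boundaries defined by the splitting configuration.
with open("pulsar2_configs/transformers_subgraph_512x512.json") as f:
    cfg = json.load(f)
print(json.dumps(cfg, indent=2))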
Step 4: Collect Subgraph Calibration Data
After splitting, collect calibration data for each individual subgraph:
python examples/z_image_fun/collect_subgraph_inputs.py \
--onnx ./onnx-models-512x512/z_image_transformer_body_only_512x512_simp_slim.onnx \
--subgraph-config ./pulsar2_configs/transformers_subgraph_512x512.json \
--output-dir ./transformer_body_only_512x512_simp_slim/subgraph-calib \
--tar-list-file ./transformer_body_only_512x512_simp_slim/subgraph-calib/paths.txt \
--skip-existing
To collect additional calibration data at a different resolution (e.g., 1728x992):
python examples/z_image_fun/collect_subgraph_inputs.py \
--onnx ./onnx-models-1728x992/z_image_transformer_body_only_1728x992_simp_slim.onnx \
--subgraph-config ./pulsar2_configs/transformers_subgraph_1728x992.json \
--output-dir ./transformer_body_only_1728x992_simp_slim/subgraph-calib \
--tar-list-file ./transformer_body_only_1728x992_simp_slim/subgraph-calib/paths.txt \
--sample-size 1728 992 \
--max-seq-len 256
Step 5: Generate Compilation Configuration Files
Automatically generate individual compilation configuration files for each subgraph:
python ./scripts/generate_subgraph_configs.py \
--tar-list-file ./transformer_body_only_512x512_simp_slim/subgraph-calib/paths.txt \
--output-config-dir pulsar2_configs/subgraphs_512x512
This step creates tailored configuration files for each subgraph, specifying quantization settings, calibration data paths, and compilation options.
Important: After generating the sub-ONNX files, apply ONNX simplification (onnxslim) to each subgraph for optimal performance, for example as sketched below.
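This is a sketch using onnxslim's Python API (a slim() entry point returning a ModelProto); running the onnxslim CLI per file works just as well:

from pathlib import Path

import onnx
from onnxslim import slim

# Slim every split subgraph in place before compilation.
split_dir = Path("transformers_body_only_512_512_split_onnx")
for onnx_path in sorted(split_dir.glob("*.onnx")):
    slimmed = slim(onnx.load(str(onnx_path)))
    onnx.save(slimmed, str(onnx_path))
    print(f"slimmed: {onnx_path.name}")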
Step 6: Compile All Subgraphs
Compile all subgraphs using the Pulsar2 toolchain:
./compile_all_subgraphs.sh \
--onnx-dir ./transformers_body_only_512_512_split_onnx \
--config-dir pulsar2_configs/subgraphs_512x512 \
--output-base-dir ./compiled_transformers_body_only_512x512/out_all \
--final-output-dir ./compiled_transformers_body_only_512x512/out_final
Output Directories:
- out_all: Contains compilation logs and intermediate files for all subgraphs
- out_final: Contains only the successfully compiled axmodel files, ready for deployment
The compilation process converts each ONNX subgraph into an optimized axmodel format that can run efficiently on the AX650N NPU.
2. VAE Decoder Module
The Variational Autoencoder (VAE) is responsible for converting between the latent space representation and pixel space. The decoder takes the denoised latent representation from the transformer and generates the final RGB image.
Step 1: Export VAE to ONNX Format
Export both the VAE encoder and decoder to ONNX format:
python scripts/z_image_fun/export_vae_onnx.py \
--model-root models/Diffusion_Transformer/Z-Image-Turbo/ \
--height 512 --width 512 \
--encoder-output onnx-models-512x512/vae_encoder.onnx \
--decoder-output onnx-models-512x512/vae_decoder.onnx \
--dtype fp32 \
--save-calib-inputs \
--calib-dir onnx-calibration-512x512 \
--skip-ort-check
Parameters:
- --model-root: Path to the Z-Image-Turbo model
- --encoder-output, --decoder-output: Output paths for the encoder and decoder ONNX models
- --save-calib-inputs: Save calibration inputs for quantization
- --calib-dir: Directory to store calibration data
- --skip-ort-check: Skip ONNX Runtime validation (useful when ORT has compatibility issues)
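As with the transformer, the exported decoder can be smoke-tested in ONNX Runtime before compiling. A minimal sketch that reads the latent layout from the graph instead of assuming it:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("onnx-models-512x512/vae_decoder.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
# Resolve any dynamic dims to 1 and decode a random latent.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
latent = np.random.randn(*shape).astype(np.float32)
(image,) = sess.run(None, {inp.name: latent})
print("decoded tensor:", image.shape)  # expect NCHW pixels, e.g. (1, 3, 512, 512)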
Step 2: Create Compilation Configuration
Create a configuration file for the VAE decoder compilation. Example configuration file: pulsar2_configs/vae_decoder.json
This configuration should specify:
- Input/output tensor names and shapes
- Quantization strategy (e.g., int8, mixed precision)
- Calibration data paths
- Hardware target (AX650)
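As an illustration only, such a configuration could be generated programmatically. The field names below follow typical Pulsar2 configs, but the authoritative schema is the Pulsar2 documentation, and the calibration path is a placeholder:

import json

# Illustrative Pulsar2 config -- verify field names against the Pulsar2 docs.
vae_decoder_cfg = {
    "model_type": "ONNX",
    "npu_mode": "NPU3",
    "target_hardware": "AX650",
    "quant": {
        "input_configs": [{
            "tensor_name": "DEFAULT",
            "calibration_dataset": "onnx-calibration-512x512/vae_decoder_calib.tar",  # placeholder path
            "calibration_size": 32,
        }],
        "calibration_method": "MinMax",
    },
}
with open("pulsar2_configs/vae_decoder.json", "w") as f:
    json.dump(vae_decoder_cfg, f, indent=2)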
Step 3: Compile VAE Decoder
Compile the ONNX model to axmodel format using Pulsar2:
pulsar2 build \
--output_dir ./compiled_output_vae_decoder \
--config pulsar2_configs/vae_decoder.json \
--npu_mode NPU3 \
--input onnx-models/vae_decoder_simp_slim.onnx \
--target_hardware AX650
Parameters:
- --output_dir: Output directory for compiled models
- --config: Path to the compilation configuration file
- --npu_mode: NPU mode (NPU3 for maximum performance on AX650N)
- --target_hardware: Target hardware platform (AX650)
The compiled VAE decoder will be saved in the output directory and can be deployed to the AX650N board.
Complete Inference Pipeline
After compiling all components, you can run the complete text-to-image inference pipeline on the AXERA AX650N development board.
Running on the Development Board
- Transfer all compiled axmodel files to the development board
- Ensure all dependencies are installed
- Run the inference script:
python3 examples/z_image_fun/launcher_axmodel.py \
--transformer-config pulsar2_configs/transformers_subgraph.json \
--transformer-subgraph-dir ../transformer_axmodel \
--vae-axmodel ../vae_model/vae_decoder.axmodel
Parameters:
- --transformer-config: Configuration file that defines the subgraph structure
- --transformer-subgraph-dir: Directory containing all compiled transformer subgraph axmodels
- --vae-axmodel: Path to the compiled VAE decoder axmodel
The launcher script will:
- Load the text encoder (PyTorch)
- Process input prompts into embeddings
- Run the transformer subgraphs sequentially on NPU
- Decode the latent representation using VAE decoder on NPU
- Output the final generated image
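Conceptually, the hot loop chains the compiled subgraphs so that each subgraph's outputs feed the next one's inputs. The sketch below assumes the pyaxengine runtime's onnxruntime-style API; the tensor names (hidden_states, sample, latent) are illustrative, and the real logic lives in launcher_axmodel.py:

import axengine as axe  # AXERA's onnxruntime-style NPU runtime (pyaxengine)

def run_subgraphs(sessions, feeds):
    # Chain the compiled transformer subgraphs: every output tensor is
    # added to the feed dict so downstream subgraphs can consume it.
    for sess in sessions:
        outs = sess.run(None, {i.name: feeds[i.name] for i in sess.get_inputs()})
        for meta, out in zip(sess.get_outputs(), outs):
            feeds[meta.name] = out
    return feeds

# Denoising loop (scheduler and text-encoder details omitted):
# subgraph_sessions = [axe.InferenceSession(p) for p in sorted_subgraph_paths]
# for t in timesteps:
#     feeds = {"hidden_states": latents, "timestep": t, "prompt_embeds": embeds}
#     latents = run_subgraphs(subgraph_sessions, feeds)["sample"]
# vae = axe.InferenceSession("../vae_model/vae_decoder.axmodel")
# image = vae.run(None, {"latent": latents})[0]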
Example Output
Here's an example of the inference process running on the AX650N development board:
root@ax650 Z-Image-Turbo/VideoX-Fun $ python3 examples/z_image_fun/launcher_axmodel.py \
--transformer-config pulsar2_configs/transformers_subgraph.json \
--transformer-subgraph-dir ../transformer_axmodel \
--vae-axmodel ../vae_model/vae_decoder.axmodel
[INFO] Available providers: ['AxEngineExecutionProvider']
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/dist/wan_xfuser.py:22: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@amp.autocast(enabled=False)
...
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/models/wan_audio_injector.py:114: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@amp.autocast(enabled=False)
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/models/wan_transformer3d_s2v.py:55: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@amp.autocast(enabled=False)
2026-01-15 15:55:55.577 | INFO | __main__:main:425 - Prompt used: sunrise over alpine mountains, low clouds in valleys, god rays, ultra-detailed landscape
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.26it/s]
The module name (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
AX Denoising: 0%| | 0/9 [00:00<?, ?it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
2026-01-15 15:58:44.111 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_00 from cfg_00_timestep_to_model_t_embedder_mlp_mlp_2_Gemm_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
2026-01-15 15:58:48.882 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_01 from cfg_01_prompt_embeds_to_model_Slice_1_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
...
2026-01-15 16:00:08.612 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_30 from cfg_30_model_layers_26_Add_4_output_0_to_model_layers_27_Add_4_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:00:11.179 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_31 from cfg_31_model_layers_27_Add_4_output_0_to_model_layers_28_Add_4_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:00:13.868 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_32 from cfg_32_model_layers_28_Add_4_output_0_to_model_layers_29_Add_4_output_0_config.axmodel
AX Denoising: 22%|███████████████ | 2/9 [01:36<04:45, 40.84s/it]
AX Denoising: 100%|█████████████████████████████████████████████████████████████████████| 9/9 [02:20<00:00, 15.60s/it]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:01:06.972 | INFO | __main__:main:537 - AXModel inference complete; result saved to /root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/samples/z-image-t2i-axmodel/z_image_axmodel_2.png
The inference process demonstrates the complete pipeline working on the hardware, including:
- Model loading and initialization (~3 minutes for all 33 subgraphs)
- Denoising iterations (9 steps, ~2 minutes 20 seconds total)
- Final image generation and saving
Known Limitations
Quantization Accuracy: Due to quantization precision limits, axmodel inference results differ somewhat from the original ONNX model outputs. This is the usual trade-off between inference speed and numerical precision on NPU hardware (a simple way to measure the drift is sketched after the list). Future work may include:
- Fine-tuning quantization parameters to improve accuracy
- Exploring mixed-precision quantization strategies
- Implementing calibration with more diverse datasets
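To quantify the drift, dump matching outputs from both backends on identical inputs and compare them. A minimal sketch with hypothetical dump paths:

import numpy as np

# Compare an ONNX reference output vs the axmodel output on the same inputs.
ref = np.load("onnx_sample_output.npy").ravel()    # hypothetical dump
ax = np.load("axmodel_sample_output.npy").ravel()  # hypothetical dump
cos = float(ref @ ax / (np.linalg.norm(ref) * np.linalg.norm(ax) + 1e-12))
print(f"cosine similarity: {cos:.4f}  max |diff|: {np.abs(ref - ax).max():.4f}")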
Advanced Usage
Frontend-Only Export for Graph Analysis
For debugging and graph analysis, you can export only the frontend graph without compilation:
ENABLE_COMPILER=0 DUMP_FRONTEND_GRAPH=1 \
pulsar2 build \
--output_dir ./compiled_output_trans_body_only_frontend \
--config pulsar2_configs/config_controlnet.json \
--npu_mode NPU3 \
--input ../original_onnx/z_image_transformer_body_only_simp_slim.onnx \
--target_hardware AX650
This is useful for:
- Analyzing the graph structure before compilation
- Debugging subgraph partitioning strategies
- Verifying model transformations
Compile from Quantized ONNX
If you already have a quantized ONNX model, you can compile it directly:
pulsar2 build \
--input compiled_output_trans_body_only_use_calibration/quant/quant_axmodel.onnx \
--model_type QuantAxModel \
--output_dir compiled_subgraph_from_quant_onnx \
--output_name transformers.axmodel \
--config pulsar2_configs/transformers_subgraph.json \
--target_hardware AX650 \
--npu_mode NPU3
Technical Support
If you encounter any issues or have questions about the implementation:
- GitHub Issues: Create an issue for bug reports and feature requests
- QQ Group: 139953715 (Chinese community support)
License
This project is licensed under the BSD-3-Clause License. See the LICENSE file for details.
Note: This implementation is optimized for AXERA AX650N hardware. Performance and compatibility may vary on other platforms.