---
license: apache-2.0
language:
- en
pipeline_tag: text-to-image
tags:
- text-to-image
- diffusers
- ZImagePipeline
library_name: diffusers
base_model:
- Tongyi-MAI/Z-Image-Turbo
---


## ✨ Z-Image-Turbo FP32 / FP16 / BF16 EMA-ONLY & FULL

Multiple versions of the Z-Image-Turbo model in various precisions and configurations, prepared directly from the original [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) repository.

## 📦 Available Variants

| Type | Precision | Size | Description |
|------|-----------|------|-------------|
| **Full** | FP32/FP16/BF16 | Largest | Complete model with training and EMA parameters |
| **EMA-only** | FP32/FP16/BF16 | Smaller | Only EMA parameters - **recommended for inference** |

### EMA vs Full - Which to Choose?

- **EMA-only**: Contains only the Exponential Moving Average (EMA) parameters, i.e. weights averaged over the course of training (see the sketch below). They typically give more stable, higher-quality results during image generation and a smaller file size. **Use this for inference.**

- **Full**: Contains all parameters (training + EMA). Only needed if you want to continue training the model.
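
For intuition, here is a minimal sketch of how an EMA copy is maintained during training; the decay value and the toy update loop are illustrative, not the actual Z-Image-Turbo training setup:

```python
import torch

decay = 0.999              # illustrative EMA decay, not the real training value
raw_w = torch.randn(4, 4)  # "raw" training weights, updated by the optimizer
ema_w = raw_w.clone()      # EMA copy, refreshed after every optimizer step

for _ in range(100):
    raw_w = raw_w + 0.01 * torch.randn_like(raw_w)  # stand-in for an optimizer step
    ema_w = decay * ema_w + (1 - decay) * raw_w     # exponential moving average
```

The EMA-only checkpoints keep just the `ema_w`-style tensors, which is why they are smaller and usually generate more stable images.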

## 🔧 Preparation Process

Models were processed using:

1. **[merge-safetensors](https://github.com/dkotel/merge-safetensors)** - merges the split transformer parts into a single `*.safetensors` file (placed in the `transformer` directory)

2. **[PyTorch-Precision-Converter](https://github.com/angelolamonaca/PyTorch-Precision-Converter)** - converts the precision from FP32 to FP16/BF16 and creates the EMA-only variants (a rough sketch of both steps follows)
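
For reference, a minimal sketch of what the two steps amount to; the shard file names below are hypothetical, and the real split files may be named differently:

```python
import torch
from safetensors.torch import load_file, save_file

# Step 1: merge split shards into one state dict (shards hold disjoint key sets).
shards = [
    "diffusion_pytorch_model-00001-of-00002.safetensors",  # hypothetical names
    "diffusion_pytorch_model-00002-of-00002.safetensors",
]
merged = {}
for shard in shards:
    merged.update(load_file(shard))

# Step 2: cast every tensor to the target precision (BF16 shown here).
merged = {k: v.to(torch.bfloat16) for k, v in merged.items()}
save_file(merged, "transformer/diffusion_pytorch_model.safetensors")
```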

## 💡 For Diffusers Users

> โš ๏ธ **This is NOT compatible with ComfyUI** - models are prepared for `diffusers` library.

### Required File Names

To use with `ZImagePipeline` without specifying full paths, rename model files in appropriate folders:
```
text_encoder/
  └── model.safetensors                     # Text encoder

transformer/
  └── diffusion_pytorch_model.safetensors   # Transformer

vae/
  └── diffusion_pytorch_model.safetensors   # VAE
```
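
A small helper that applies these renames, assuming each subfolder holds exactly one `*.safetensors` file and that `path/to/model_files_main_dir` is a placeholder for your local copy:

```python
from pathlib import Path

root = Path("path/to/model_files_main_dir")  # placeholder local directory
targets = {
    "text_encoder": "model.safetensors",
    "transformer": "diffusion_pytorch_model.safetensors",
    "vae": "diffusion_pytorch_model.safetensors",
}

for subdir, target in targets.items():
    folder = root / subdir
    files = list(folder.glob("*.safetensors"))
    if len(files) == 1 and files[0].name != target:
        files[0].rename(folder / target)
```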

### Example Usage (based on the example from the original repo)

`pip install git+https://github.com/huggingface/diffusers`


```python
import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
# Pick the dtype that matches the variant you downloaded;
# bfloat16 gives the best performance on GPUs that support it.
pipe = ZImagePipeline.from_pretrained(
    "path/to/model_files_main_dir",
    torch_dtype=torch.float32,  # or torch.bfloat16 / torch.float16
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")    # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (โšก๏ธ), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (่ฅฟๅฎ‰ๅคง้›ๅก”), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")

```

## 🎯 Recommendations

- **RTX 3060 and similar**: use **BF16** or **FP16** for the best performance
- **Less than 12 GB VRAM**: **FP16 EMA-only** (the sketch below helps estimate the footprint)
- **12 GB+ VRAM**: **BF16 EMA-only** (better numerical stability)
- **Training**: **FP32 Full**
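
If you are unsure which variant fits your card, a quick sketch for estimating the weight footprint of a downloaded transformer checkpoint (activations, the text encoder, and the VAE add on top of this):

```python
from safetensors.torch import load_file

state = load_file("transformer/diffusion_pytorch_model.safetensors")
total_bytes = sum(t.numel() * t.element_size() for t in state.values())
print(f"Transformer weights: {total_bytes / 1e9:.2f} GB")
```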

## ๐Ÿ“ License

Same as original [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model.



README was generated with the help of AI.

## 📚 Citation

```bibtex
@article{team2025zimage,
  title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
  author={Z-Image Team},
  journal={arXiv preprint arXiv:2511.22699},
  year={2025}
}

@article{liu2025decoupled,
  title={Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield},
  author={Dongyang Liu and Peng Gao and David Liu and Ruoyi Du and Zhen Li and Qilong Wu and Xin Jin and Sihan Cao and Shifeng Zhang and Hongsheng Li and Steven Hoi},
  journal={arXiv preprint arXiv:2511.22677},
  year={2025}
}

@article{jiang2025distribution,
  title={Distribution Matching Distillation Meets Reinforcement Learning},
  author={Jiang, Dengyang and Liu, Dongyang and Wang, Zanyi and Wu, Qilong and Jin, Xin and Liu, David and Li, Zhen and Wang, Mengmeng and Gao, Peng and Yang, Harry},
  journal={arXiv preprint arXiv:2511.13649},
  year={2025}
}
```