tsqn committed 58d136e (verified) · Parent(s): 7666392

Update README.md

Files changed (1): README.md (+97 −4). Updated content:
## ✨ Z-Image-Turbo FP32 / FP16 / BF16 EMA-ONLY & FULL

Multiple versions of the Z-Image-Turbo model in various precisions and configurations, prepared directly from the original [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) repository.
## 📦 Available Variants

| Type | Precision | Size | Description |
|------|-----------|------|-------------|
| **Full** | FP32/FP16/BF16 | Largest | Complete model with training and EMA parameters |
| **EMA-only** | FP32/FP16/BF16 | Smaller | Only EMA parameters - **recommended for inference** |
### EMA vs Full - Which to Choose?

- **EMA-only**: Contains only the Exponential Moving Average parameters - weights averaged over the course of training (see the sketch after this list). Gives more stable, higher-quality results during image generation and a smaller file size. **Use this for inference.**
- **Full**: Contains all parameters (training + EMA). Only needed if you want to continue training the model.
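For intuition, EMA weights are maintained during training roughly as in the sketch below. This is a generic illustration, not the Z-Image training code; the decay value and the parameter-dictionary layout are assumptions.

```python
import torch

def ema_update(ema: dict[str, torch.Tensor],
               params: dict[str, torch.Tensor],
               decay: float = 0.9999) -> None:
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    with torch.no_grad():
        for name, p in params.items():
            ema[name].mul_(decay).add_(p.detach(), alpha=1 - decay)
```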
## 🔧 Preparation Process

Models were processed using two scripts (a rough sketch of the equivalent operations follows the list):

1. **[merge-safetensors](https://github.com/dkotel/merge-safetensors)** - merges the split transformer parts into a single `*.safetensors` file (placed in the `transformer` directory)
2. **[PyTorch-Precision-Converter](https://github.com/angelolamonaca/PyTorch-Precision-Converter)** - converts the full FP32 model to FP32/FP16/BF16 full and EMA-only variants
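A minimal sketch of what these two steps amount to, using the `safetensors` library directly rather than the scripts above. The shard naming pattern, output file names, and the `ema.` key prefix are illustrative assumptions, not the exact behavior of those scripts:

```python
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

# 1. Merge the split transformer shards into one state dict
#    (the shard naming pattern is an assumption).
state = {}
for shard in sorted(Path("transformer").glob("*-of-*.safetensors")):
    state.update(load_file(shard))

# 2. Cast every tensor to the target precision (BF16 shown here)...
state = {k: v.to(torch.bfloat16) for k, v in state.items()}
save_file(state, "diffusion_pytorch_model.bf16.safetensors")

# ...and, for an EMA-only variant, keep just the EMA weights
# (the "ema." key prefix is an assumption about the checkpoint layout).
ema = {k.removeprefix("ema."): v for k, v in state.items() if k.startswith("ema.")}
save_file(ema, "diffusion_pytorch_model.bf16-ema.safetensors")
```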
## 💡 For Diffusers Users

> ⚠️ **This is NOT compatible with ComfyUI** - the models are prepared for the `diffusers` library.

### Required File Names

To use the models with `ZImagePipeline` without specifying exact file paths, rename and place them as follows:
```
text_encoder/
└── model.safetensors                     # Text encoder

transformer/
└── diffusion_pytorch_model.safetensors   # Transformer

vae/
└── diffusion_pytorch_model.safetensors   # VAE
```
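To confirm the layout before loading, a quick check like the one below can help. This is a purely illustrative helper; the root path stands in for your local download directory:

```python
from pathlib import Path

# Same directory that is passed to from_pretrained below (illustrative).
root = Path("path/to/model_files_main_dir")
expected = [
    root / "text_encoder" / "model.safetensors",
    root / "transformer" / "diffusion_pytorch_model.safetensors",
    root / "vae" / "diffusion_pytorch_model.safetensors",
]
for f in expected:
    print(f, "OK" if f.is_file() else "MISSING")
```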
### Example Usage (based on the one from the original repo)

`pip install git+https://github.com/huggingface/diffusers`
```python
import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
# Use bfloat16 or float16 for optimal performance on supported GPUs
pipe = ZImagePipeline.from_pretrained(
    "path/to/model_files_main_dir",
    torch_dtype=torch.float32,  # or torch.bfloat16 / torch.float16
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")     # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3")  # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```
## 🎯 Recommendations

- **RTX 3060 and similar**: Use **BF16** or **FP16** for optimal performance
- **Less than 12GB VRAM**: **FP16 EMA-only**
- **12GB+ VRAM**: **BF16 EMA-only** (better numerical stability)
- **Training**: **FP32 Full**

For example, loading the BF16 EMA-only variant looks like the sketch below.
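This is a minimal sketch, assuming a local directory containing the BF16 EMA-only files; the path is illustrative:

```python
import torch
from diffusers import ZImagePipeline

# Load the BF16 EMA-only variant (directory name is illustrative).
pipe = ZImagePipeline.from_pretrained(
    "path/to/z-image-turbo-bf16-ema",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # optional: helps below ~12GB of VRAM
```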
## 📝 License

Same as the original [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model.

This README was written with the help of AI.
 