Update README.md

README.md (changed)
````diff
@@ -6,7 +6,6 @@ pipeline_tag: text-to-image
 library_name: diffusers
 ---
 
-
 <h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
 
 <div align="center">
````
````diff
@@ -18,50 +17,71 @@ library_name: diffusers
 [](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) 
 <a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
 
-
-<!-- [](assets/Z-Image-Gallery.pdf) 
-[](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)  -->
-
-
-Welcome to the official repository for the Z-Image（造相）project!
 
 </div>
 
 
 
-##
 
-
 
-
 
-
 
-###
 
-#### 🎨 Aesthetic & Artistic Diversity
-Z-Image maintains high photorealism while supporting a wider range of artistic styles. Unlike the Turbo version, which is heavily optimized for realism via RL, Z-Image preserves more stylistic variety, making it better suited for anime, digital art, and other creative genres.
 
-###
-Z-Image is a non-distilled base model, making it a more flexible starting point for fine-tuning (LoRA, ControlNet, etc.).
-* **CFG Support:** Unlike distilled models that often bypass Classifier-Free Guidance, Z-Image retains full CFG support for precise prompt control.
-* **Training Stability:** The model's internal diversity and weight distribution make it more receptive to learning new concepts during downstream training compared to low-step variants.
 
-
-
-
-
 
-##
-Z-Image is highly responsive to Negative Prompts. This allows for better steerability and more control over the final output, effectively filtering out unwanted elements or artifacts.
 
 
-
-Install the latest version of diffusers, use the following command:
 ```bash
 pip install git+https://github.com/huggingface/diffusers
 ```
 
 ```python
 import torch
 from diffusers import ZImagePipeline
````
````diff
@@ -84,7 +104,7 @@ image = pipe(
     negative_prompt=negative_prompt,
     height=1280,
     width=720,
-    cfg_normalization=
     num_inference_steps=50, # May use 28-50 for Z-Image Model
     guidance_scale=4.0, # Suggested guidance scale is 3.0 to 5.0 for Z-Image Model
     generator=torch.Generator("cuda").manual_seed(42),
````
````diff
@@ -93,12 +113,6 @@ image = pipe(
 image.save("example.png")
 ```
 
-## ⬇ Download
-```bash
-pip install -U huggingface_hub
-HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
-```
-
 ## Citation
 
 If you find our work useful in your research, please consider citing:
````
````diff
@@ -6,7 +6,6 @@ pipeline_tag: text-to-image
 library_name: diffusers
 ---
 
 <h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
 
 <div align="center">
@@ -18,50 +17,71 @@ library_name: diffusers
 [](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) 
 <a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
 
+Welcome to the official repository for the ⚡️- Image family!
 
 </div>
````
````diff
 
+## 🎨 Z-Image
 
+**Z-Image** is the foundation model behind Z-Image-Turbo, designed for high-quality image generation with strong controllability, broad stylistic coverage, and support for downstream development.
+It serves as the primary community model in the ⚡️- Image family, while Z-Image-Turbo focuses on high-speed inference.
 
+### Key Features
 
+#### 🎨 Aesthetic & Artistic Diversity
+Z-Image supports a wide range of aesthetics and artistic styles, including realistic photography, anime, illustration, digital art, and stylized visuals.
+It is suitable for creative scenarios that require rich stylistic expression rather than a single preferred aesthetic.
 
+#### 🧬 Generative Diversity
+Z-Image emphasizes diversity across multiple generative dimensions:
+- Variations in facial identity, body pose, composition, and layout across different seeds
+- Distinct appearances for individuals in multi-person scenes
+- Higher overall variability compared to heavily speed-optimized models
 
+#### Foundation Model for Fine-tuning & Control
+Z-Image is a non-distilled base model for downstream development:
+- Compatible with parameter-efficient fine-tuning methods
+- Extendable with structural conditioning approaches
+- Supports full Classifier-Free Guidance (CFG) for precise prompt control
 
+#### 🚫 Effective Negative Prompting
+Z-Image responds strongly to negative prompts, enabling reliable suppression of unwanted artifacts, styles, and compositional errors.
````
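Both the CFG support and the negative-prompt responsiveness described above come from classifier-free guidance. The sketch below shows the standard CFG update as it is commonly implemented in diffusers pipelines. In that scheme, the negative prompt (or an empty prompt when none is given) conditions the second, "unconditional" prediction. This is an assumption about the general technique, not a transcription of Z-Image's pipeline code. It also leaves out the pipeline's `cfg_normalization` option, whose exact behavior is not described here. `cfg_combine` is a hypothetical helper that operates on plain Python lists in place of tensors.

```python
from typing import List, Sequence


def cfg_combine(
    pred_negative: Sequence[float],
    pred_positive: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Hypothetical helper: one classifier-free guidance step on flat lists.

    pred_negative  -- prediction conditioned on the negative prompt
                      (an empty prompt if no negative prompt is supplied)
    pred_positive  -- prediction conditioned on the user prompt
    guidance_scale -- how far to extrapolate from negative toward positive
    """
    if len(pred_negative) != len(pred_positive):
        raise ValueError("predictions must have the same length")
    # guided = negative + scale * (positive - negative)
    return [
        neg + guidance_scale * (pos - neg)
        for neg, pos in zip(pred_negative, pred_positive)
    ]


# Toy two-element "predictions" (illustrative numbers, not model outputs).
negative = [0.0, 1.0]
positive = [1.0, 1.0]
print(cfg_combine(negative, positive, 1.0))  # [1.0, 1.0]
print(cfg_combine(negative, positive, 4.0))  # [4.0, 1.0]
```

With `guidance_scale=1.0`, the result equals the prompt-conditioned prediction. Higher values push further away from whatever the negative branch describes. This is also why CFG support and negative-prompt control go together: a pipeline that skips the negatively conditioned pass has nothing for a negative prompt to act on.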
````diff
 
 
+### Z-Image vs Z-Image-Turbo
 
+| Aspect | Z-Image | Z-Image-Turbo |
+|------|------|------|
+| CFG support | Yes | No |
+| Fine-tuning | Yes | Limited |
+| Aesthetic diversity | High | Reduced |
+| Negative prompt control | Strong | None |
+| Inference speed | Slower | Faster |
 
+## Quick Start
 
+### Installation & Download
 
+Install the latest version of diffusers:
 ```bash
 pip install git+https://github.com/huggingface/diffusers
 ```
 
+Download the model:
+```bash
+pip install -U huggingface_hub
+HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
+```
+
+### Recommended Parameters
+
+- **Resolution:** 512×512 to 2048×2048 (total pixel area, any aspect ratio)
+- **Guidance scale:** 3.0 – 5.0
+- **Inference steps:** 28 – 50
+- **Negative prompts:** Strongly recommended for better control
+
````
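The recommended ranges can be checked before the pipeline is called. This is a minimal sketch, assuming that "total pixel area" means width × height should lie between 512 × 512 and 2048 × 2048 pixels. `check_params` is a hypothetical helper, not part of diffusers or the Z-Image repository, and the negative prompt text in the usage lines is made up for illustration.

```python
from typing import List, Optional

MIN_AREA = 512 * 512    # smallest recommended width * height
MAX_AREA = 2048 * 2048  # largest recommended width * height


def check_params(
    width: int,
    height: int,
    guidance_scale: float,
    num_inference_steps: int,
    negative_prompt: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return one warning per setting outside the
    README's recommended ranges (an empty list means all are in range)."""
    warnings = []
    area = width * height
    if not MIN_AREA <= area <= MAX_AREA:
        warnings.append(
            f"{width}x{height} ({area} px) is outside the 512x512 to 2048x2048 area range"
        )
    if not 3.0 <= guidance_scale <= 5.0:
        warnings.append(f"guidance_scale={guidance_scale} is outside 3.0 to 5.0")
    if not 28 <= num_inference_steps <= 50:
        warnings.append(f"num_inference_steps={num_inference_steps} is outside 28 to 50")
    if not negative_prompt:
        warnings.append("no negative prompt given (strongly recommended)")
    return warnings


# Settings from the usage example: 720x1280, guidance 4.0, 50 steps.
print(check_params(720, 1280, 4.0, 50, negative_prompt="blurry, low quality"))  # []
# Too small, too strong, too few steps, and no negative prompt.
print(len(check_params(256, 256, 7.5, 20)))  # 4
```

The area check uses width × height rather than each side separately. As a result, a tall 720×1280 frame (921,600 px) passes just like a square one, which matches the "any aspect ratio" note.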
````diff
+### Usage Example
+
 ```python
 import torch
 from diffusers import ZImagePipeline
@@ -84,7 +104,7 @@ image = pipe(
     negative_prompt=negative_prompt,
     height=1280,
     width=720,
+    cfg_normalization=False, # Could switch if needed: True for more realism, False for general stylism
     num_inference_steps=50, # May use 28-50 for Z-Image Model
     guidance_scale=4.0, # Suggested guidance scale is 3.0 to 5.0 for Z-Image Model
     generator=torch.Generator("cuda").manual_seed(42),
@@ -93,12 +113,6 @@ image = pipe(
 image.save("example.png")
 ```
 
 ## Citation
 
 If you find our work useful in your research, please consider citing:
````