Tongyi-MAI/Z-Image

@@ -6,7 +6,6 @@ pipeline_tag: text-to-image
 library_name: diffusers
 ---
 <h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
 <div align="center">
@@ -18,50 +17,71 @@ library_name: diffusers
 [![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster)&#160;
 <a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
-<!-- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image)&#160; -->
-<!-- [![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
-[![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160; -->
-Welcome to the official repository for the Z-Image（造相）project!
 </div>
-## ✨ Z-Image
-We are excited to introduce **Z-Image**, a powerful and efficient image generation model with **6B** parameters. While **Z-Image-Turbo** is designed for speed, the standard **Z-Image** stands out as our primary community foundation model, delivering higher flexibility in generation and style, excellent generative quality and aesthetics, and exceptional support for robust secondary development.
-<!-- 📸 **Photorealistic Quality**: **Z-Image-Turbo** delivers strong photorealistic image generation while maintaining excellent aesthetic quality.
-![Showcase of Z-Image on Photo-realistic image Generation](assets/showcase_realistic.png) -->
-### 🌟 Key Features
-#### 🎨 Aesthetic & Artistic Diversity
-Z-Image maintains high photorealism while supporting a wider range of artistic styles. Unlike the Turbo version, which is heavily optimized for realism via RL, Z-Image preserves more stylistic variety—making it better suited for anime, digital art, and other creative genres.
-#### 🛠 Fine-tuning & Community Development
-Z-Image is a non-distilled base model, making it a more flexible starting point for fine-tuning (LoRA, ControlNet, etc.).
-* **CFG Support:** Unlike distilled models that often bypass Classifier-Free Guidance, Z-Image retains full CFG support for precise prompt control.
-* **Training Stability:** The model's internal diversity and weight distribution make it more receptive to learning new concepts during downstream training compared to low-step variants.
-#### 🧬 Improved Generative Diversity
-We have focused on solving the homogenization issues common in many modern generators:
-* **Distinct Identities:** Different seeds produce noticeably different faces and compositions, avoiding the "same face" problem across generations.
-* **Multi-subject Scenes:** In prompts with multiple people, Z-Image generates individuals with unique features instead of the "cloning effect" often seen in high-speed models.
-#### 🚫 Effective Negative Prompting
-Z-Image is highly responsive to Negative Prompts. This allows for better steerability and more control over the final output, effectively filtering out unwanted elements or artifacts.
-### 🚀 Quick Start
-Install the latest version of diffusers, use the following command:
 ```bash
 pip install git+https://github.com/huggingface/diffusers
 ```
 ```python
 import torch
 from diffusers import ZImagePipeline
@@ -84,7 +104,7 @@ image = pipe(
     negative_prompt=negative_prompt,
     height=1280,
     width=720,
-    cfg_normalization=True,  # could switch if needed
     num_inference_steps=50,  # May use 28-50 for Z-Image Model
     guidance_scale=4.0,      # Suggested guidance scale is 3.0 to 5.0 for Z-Image Model
     generator=torch.Generator("cuda").manual_seed(42),
@@ -93,12 +113,6 @@ image = pipe(
 image.save("example.png")
 ```
-## ⏬ Download
-```bash
-pip install -U huggingface_hub
-HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
-```
 ## 📜 Citation
 If you find our work useful in your research, please consider citing:

 library_name: diffusers
 ---
 <h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
 <div align="center">
 [![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster)&#160;
 <a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
+Welcome to the official repository for the ⚡️- Image family!
 </div>
+## 🎨 Z-Image
+**Z-Image** is the foundation model behind Z-Image-Turbo, designed for high-quality image generation with strong controllability, broad stylistic coverage, and support for downstream development.
+It serves as the primary community model in the ⚡️- Image family, while Z-Image-Turbo focuses on high-speed inference.
+### 🌟 Key Features
+#### 🎨 Aesthetic & Artistic Diversity
+Z-Image supports a wide range of aesthetics and artistic styles, including realistic photography, anime, illustration, digital art, and stylized visuals.
+It is suitable for creative scenarios that require rich stylistic expression rather than a single preferred aesthetic.
+#### 🧬 Generative Diversity
+Z-Image emphasizes diversity across multiple generative dimensions:
+- Variations in facial identity, body pose, composition, and layout across different seeds
+- Distinct appearances for individuals in multi-person scenes
+- Higher overall variability compared to heavily speed-optimized models
+#### 🛠 Foundation Model for Fine-tuning & Control
+Z-Image is a non-distilled base model for downstream development:
+- Compatible with parameter-efficient fine-tuning methods such as LoRA
+- Extendable with structural conditioning approaches such as ControlNet
+- Supports full Classifier-Free Guidance (CFG) for precise prompt control
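
For readers new to CFG, here is a minimal sketch of the guidance combination that `guidance_scale` controls. It is a generic illustration on plain Python lists, not Z-Image's actual implementation. The helper name `apply_cfg` and the toy values are hypothetical, and pipelines differ in exactly how they define the scale.

```python
def apply_cfg(cond, uncond, guidance_scale):
    """Combine conditional and unconditional predictions (common CFG form).

    Assumption: guided = uncond + scale * (cond - uncond). Real pipelines may
    define the scale differently, so treat this as illustrative only.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # toy prediction given the prompt
uncond = [0.0, 1.0]  # toy prediction given the negative (or empty) prompt
guided = apply_cfg(cond, uncond, guidance_scale=4.0)
```

Under this formulation, a scale of 1.0 returns the conditional prediction unchanged, and larger values push the result further away from the negative-prompt prediction.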
+#### 🚫 Effective Negative Prompting
+Z-Image responds strongly to negative prompts, enabling reliable suppression of unwanted artifacts, styles, and compositional errors.
+### 🆚 Z-Image vs Z-Image-Turbo
+| Aspect | Z-Image | Z-Image-Turbo |
+|------|------|------|
+| CFG support | Yes | No |
+| Fine-tuning | Yes | Limited |
+| Aesthetic diversity | High | Reduced |
+| Negative prompt control | Strong | None |
+| Inference speed | Slower | Faster |
+## 🚀 Quick Start
+### Installation & Download
+Install the latest version of diffusers:
 ```bash
 pip install git+https://github.com/huggingface/diffusers
 ```
+Download the model:
+```bash
+pip install -U huggingface_hub
+HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
+```
+### Recommended Parameters
+- **Resolution:** total pixel area between 512×512 and 2048×2048, at any aspect ratio
+- **Guidance scale:** 3.0 – 5.0
+- **Inference steps:** 28 – 50
+- **Negative prompts:** Strongly recommended for better control
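
As a quick sanity check, the ranges above can be encoded in a small helper. This is only a sketch: `check_settings` is a hypothetical function and not part of diffusers, and it assumes the resolution range refers to total pixel area (height × width), as stated in the list.

```python
MIN_AREA = 512 * 512      # lower bound on height * width
MAX_AREA = 2048 * 2048    # upper bound on height * width


def check_settings(height, width, guidance_scale, num_inference_steps):
    """Return warnings for values outside the recommended ranges."""
    warnings = []
    area = height * width
    if not MIN_AREA <= area <= MAX_AREA:
        warnings.append(f"pixel area {area} is outside {MIN_AREA}..{MAX_AREA}")
    if not 3.0 <= guidance_scale <= 5.0:
        warnings.append(f"guidance_scale {guidance_scale} is outside 3.0..5.0")
    if not 28 <= num_inference_steps <= 50:
        warnings.append(f"num_inference_steps {num_inference_steps} is outside 28..50")
    return warnings


# The values used in the usage example pass all three checks.
example_warnings = check_settings(1280, 720, 4.0, 50)
```

The ranges are recommendations rather than hard limits, so the helper returns warnings instead of raising an error.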
+### Usage Example
 ```python
 import torch
 from diffusers import ZImagePipeline
     negative_prompt=negative_prompt,
     height=1280,
     width=720,
+    cfg_normalization=False, # Toggle if needed: True favors realism, False favors general stylization
     num_inference_steps=50,  # May use 28-50 for Z-Image Model
     guidance_scale=4.0,      # Suggested guidance scale is 3.0 to 5.0 for Z-Image Model
     generator=torch.Generator("cuda").manual_seed(42),
 image.save("example.png")
 ```
 ## 📜 Citation
 If you find our work useful in your research, please consider citing: