+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-to-image
+library_name: diffusers
+---
+<h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
+<div align="center">
+[![Official Site](https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage)](https://tongyi-mai.github.io/Z-Image-blog/)&#160;
+[![GitHub](https://img.shields.io/badge/GitHub-Z--Image-181717?logo=github&logoColor=white)](https://github.com/Tongyi-MAI/Z-Image)&#160;
+[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Z--Image-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image)&#160;
+[![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-Z--Image-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image)&#160;
+[![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster)&#160;
+<a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
+<!-- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image)&#160; -->
+<!-- [![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
+[![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160; -->
+Welcome to the official repository for the Z-Image (造相) project!
+</div>
+## ✨ Z-Image
+We are excited to introduce **Z-Image**, an efficient image generation model with **6B** parameters. **Z-Image-Turbo** is designed for speed, whereas the standard **Z-Image** is our primary foundation model for the community: it offers more flexibility in generation and style, strong generative quality and aesthetics, and a solid base for secondary development.
+<!-- 📸 **Photorealistic Quality**: **Z-Image-Turbo** delivers strong photorealistic image generation while maintaining excellent aesthetic quality.
+![Showcase of Z-Image on Photo-realistic image Generation](assets/showcase_realistic.png) -->
+### 🌟 Key Features
+#### 🎨 Aesthetic & Artistic Diversity
+Z-Image maintains high photorealism while supporting a wider range of artistic styles. Unlike the Turbo version, which is heavily optimized for realism via reinforcement learning (RL), Z-Image preserves more stylistic variety, making it better suited for anime, digital art, and other creative genres.
+#### 🛠 Fine-tuning & Community Development
+Z-Image is a non-distilled base model, making it a more flexible starting point for fine-tuning (LoRA, ControlNet, etc.).
+* **CFG Support:** Unlike distilled models that often bypass Classifier-Free Guidance, Z-Image retains full CFG support for precise prompt control.
+* **Training Stability:** The model's internal diversity and weight distribution make it more receptive than low-step variants to learning new concepts during downstream training.
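To make the CFG point concrete, here is a minimal sketch of the textbook classifier-free guidance combination, written in plain Python on toy lists rather than real model outputs. The function name `apply_cfg` and the sample values are hypothetical, and this is the standard formulation from the CFG literature; it is not confirmed to be the exact parameterization used inside `ZImagePipeline`, and it does not model what `cfg_normalization` may do on top.

```python
from typing import Sequence


def apply_cfg(
    cond: Sequence[float],
    uncond: Sequence[float],
    guidance_scale: float,
) -> list[float]:
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond).

    Illustrative only; a real pipeline applies this to tensors at every
    denoising step and may use a different parameterization.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for one denoising step's model outputs (hypothetical values).
cond_pred = [1.0, 2.0, 3.0]    # prediction conditioned on the prompt
uncond_pred = [0.5, 2.0, 4.0]  # prediction for an empty or negative prompt

guided = apply_cfg(cond_pred, uncond_pred, guidance_scale=4.0)
print(guided)  # [2.5, 2.0, 0.0]
```

In this formulation a scale of 1.0 returns the conditional prediction unchanged, and larger values push the result further away from the unconditional (or negative-prompt) prediction.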
+#### 🧬 Improved Generative Diversity
+We have focused on solving the homogenization issues common in many modern generators:
+* **Distinct Identities:** Different seeds produce noticeably different faces and compositions, avoiding the "same face" problem across generations.
+* **Multi-subject Scenes:** In prompts with multiple people, Z-Image generates individuals with unique features instead of the "cloning effect" often seen in high-speed models.
+#### 🚫 Effective Negative Prompting
+Z-Image is highly responsive to negative prompts. This improves steerability and control over the final output, effectively filtering out unwanted elements or artifacts.
+### 🚀 Quick Start
+Install the latest version of diffusers from source with the following command:
+```bash
+pip install git+https://github.com/huggingface/diffusers
+```
+```python
+import torch
+from diffusers import ZImagePipeline
+# 1. Load the pipeline
+# Use bfloat16 for optimal performance on supported GPUs
+pipe = ZImagePipeline.from_pretrained(
+    "Tongyi-MAI/Z-Image",
+    torch_dtype=torch.bfloat16,
+    low_cpu_mem_usage=False,
+)
+pipe.to("cuda")
+# 2. Generate Image
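+# English gloss of the Chinese prompt below: two young Asian women stand close together against a
+# plain gray textured wall, possibly indoors on a carpeted floor. The woman on the left has long curly
+# hair, a navy sweater with cream ruffle trim on the left sleeve over a white stand-collar shirt, white
+# trousers, small gold stud earrings, and arms crossed behind her back. The woman on the right has
+# straight shoulder-length hair, a cream sweatshirt printed with "Tunthetables" and "New ideas" below
+# it, white trousers, small silver hoop earrings, and arms crossed over her chest. Both smile at the
+# camera. Photo, natural light, soft shadows, navy and cream neutral palette, casual fashion
+# photography, medium depth of field, sharp focus on faces and upper bodies, solid-color background.
+prompt = "两名年轻亚裔女性紧密站在一起，背景为朴素的灰色纹理墙面，可能是室内地毯地面。左侧女性留着长卷发，身穿藏青色毛衣，左袖有奶油色褶皱装饰，内搭白色立领衬衫，下身白色裤子；佩戴小巧金色耳钉，双臂交叉于背后。右侧女性留直肩长发，身穿奶油色卫衣，胸前印有“Tunthetables”字样，下方为“New ideas”，搭配白色裤子；佩戴银色小环耳环，双臂交叉于胸前。两人均面带微笑直视镜头。照片，自然光照明，柔和阴影，以藏青、奶油白为主的中性色调，休闲时尚摄影，中等景深，面部和上半身对焦清晰，姿态放松，表情友好，室内环境，地毯地面，纯色背景。"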
+negative_prompt = ""  # Optional; useful when you want to suppress unwanted content
+image = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    height=1280,
+    width=720,
+    cfg_normalization=True,  # Toggle if needed
+    num_inference_steps=50,  # 28-50 steps are suggested for Z-Image
+    guidance_scale=4.0,      # Suggested range for Z-Image: 3.0 to 5.0
+    generator=torch.Generator("cuda").manual_seed(42),
+).images[0]
+image.save("example.png")
+```
+## ⏬ Download
+```bash
+pip install -U huggingface_hub
+HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
+```
+## 📜 Citation
+If you find our work useful in your research, please consider citing:
+```bibtex
+@article{team2025zimage,
+  title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
+  author={Z-Image Team},
+  journal={arXiv preprint arXiv:2511.22699},
+  year={2025}
+}
+```