@@ -23,36 +23,24 @@ Welcome to the official repository for the Z-Image（造相）project!
 ## 🎨 Z-Image
-**Z-Image** is the foundation model behind Z-Image-Turbo, designed for high-quality image generation with strong controllability, broad stylistic coverage, and support for downstream development. It serves as the primary community model in the Z-Image family, while Z-Image-Turbo focuses on high-speed inference.
 ### 🌟 Key Features
-#### 🎨 Aesthetics
-Z-Image supports a wide range of aesthetics and artistic styles, including realistic photography, anime, illustration, digital art, and stylized visuals.
-It is suitable for creative scenarios that require rich stylistic expression rather than a single preferred aesthetic.
-#### 🌈 Diversity
-Z-Image emphasizes diversity across multiple generative dimensions:
-- Variations in facial identity, body pose, composition, and layout across different seeds
-- Distinct appearances for individuals in multi-person scenes
-- Higher overall variability compared to heavily speed-optimized models
-#### 🛠 Foundation Model for Fine-tuning & Control
-Z-Image is a non-distilled base model for downstream development:
-- Compatible with parameter-efficient fine-tuning methods
-- Extendable with structural conditioning approaches
-- Supports full Classifier-Free Guidance (CFG) for precise prompt control
-#### 🚫 Effective Negative Prompting
-Z-Image responds strongly to negative prompts, enabling reliable suppression of unwanted artifacts, styles, and compositional errors.
 ### 🆚 Z-Image vs Z-Image-Turbo
 | Aspect | Z-Image | Z-Image-Turbo |
 |------|------|------|
 | CFG | ✅ | ❌ |
-| Steps | 50 | 8 |
 | Fine-tunability | ✅ | ❌ |
 | Negative Prompting | ✅ | ❌ |
 | Diversity | High | Low |
@@ -77,8 +65,6 @@ HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image
 - **Resolution:** 512×512 to 2048×2048 (total pixel area, any aspect ratio)
 - **Guidance scale:** 3.0 – 5.0
 - **Inference steps:** 28 – 50
-- **Negative prompts:** Strongly recommended for better control
-- **CFG normalization:** `False` for general stylism, `True` for realism
 ### Usage Example

 ## 🎨 Z-Image
+**Z-Image** is the foundation model of the Z-Image family, built for high-quality generation, robust generative diversity, and broad stylistic coverage.
+While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer intended as the backbone for creators, researchers, and developers who need maximum creative freedom.
 ### 🌟 Key Features
+- **Undistilled Foundation**: As a non-distilled base model, Z-Image preserves the complete training signal. It supports full Classifier-Free Guidance (CFG), providing the precision required for complex prompt engineering and professional workflows.
+- **Aesthetic Versatility**: Z-Image covers a wide range of visual styles, from hyper-realistic photography and cinematic digital art to intricate anime and stylized illustration. It suits scenarios that call for rich stylistic expression rather than a single preferred aesthetic.
+- **Enhanced Output Diversity**: Built for exploration, Z-Image delivers significantly higher variability in composition, facial identity, and lighting across seeds than heavily speed-optimized models, and keeps individuals in multi-person scenes visually distinct.
+- **Built for Development**: A starting point for community development. As a non-distilled model, it is a good base for LoRA training, structural conditioning (e.g. ControlNet), and semantic conditioning.
+- **Robust Negative Control**: Responds with high fidelity to negative prompting, allowing users to reliably suppress artifacts and adjust compositions.
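
As a rough illustration of what full CFG and negative prompting mean at each denoising step, the sketch below applies the textbook classifier-free guidance formula to two toy prediction vectors. The function name `cfg_combine` and the plain-list representation are illustrative assumptions, not Z-Image's actual code, and real pipelines may define the scale differently (for example, so that 0 rather than 1 disables guidance).

```python
from typing import List, Sequence


def cfg_combine(cond: Sequence[float], neg: Sequence[float], guidance_scale: float) -> List[float]:
    """Textbook CFG: start at the negative/unconditional prediction and
    move toward the prompt-conditioned one, scaled by guidance_scale."""
    if len(cond) != len(neg):
        raise ValueError("cond and neg must have the same length")
    return [n + guidance_scale * (c - n) for c, n in zip(cond, neg)]


# Toy per-step predictions (hypothetical values, not real model outputs).
cond_pred = [1.0, 2.0]   # conditioned on the prompt
neg_pred = [0.0, 1.0]    # conditioned on the negative (or empty) prompt

print(cfg_combine(cond_pred, neg_pred, 1.0))  # scale 1.0 reproduces cond_pred
print(cfg_combine(cond_pred, neg_pred, 4.0))  # larger scale pushes further from neg_pred
```

In this formulation the negative prompt only enters through `neg`, so negative prompting has an effect only when CFG is active.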
 ### 🆚 Z-Image vs Z-Image-Turbo
 | Aspect | Z-Image | Z-Image-Turbo |
 |------|------|------|
 | CFG | ✅ | ❌ |
+| Steps | 28–50 | 8 |
 | Fine-tunability | ✅ | ❌ |
 | Negative Prompting | ✅ | ❌ |
 | Diversity | High | Low |
 - **Resolution:** 512×512 to 2048×2048 (total pixel area, any aspect ratio)
 - **Guidance scale:** 3.0 – 5.0
 - **Inference steps:** 28 – 50
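
The recommended ranges above can be checked before launching a generation. The helper below is a minimal sketch: `check_settings` is a hypothetical name, and it assumes that "total pixel area, any aspect ratio" means the product `width * height` should fall between 512×512 and 2048×2048. It does not call the model.

```python
from typing import List

MIN_PIXELS = 512 * 512      # assumed lower bound on width * height
MAX_PIXELS = 2048 * 2048    # assumed upper bound on width * height


def check_settings(width: int, height: int, guidance_scale: float, steps: int) -> List[str]:
    """Return a message for each value outside the recommended range."""
    problems = []
    area = width * height
    if not MIN_PIXELS <= area <= MAX_PIXELS:
        problems.append(f"pixel area {area} outside [{MIN_PIXELS}, {MAX_PIXELS}]")
    if not 3.0 <= guidance_scale <= 5.0:
        problems.append(f"guidance scale {guidance_scale} outside 3.0-5.0")
    if not 28 <= steps <= 50:
        problems.append(f"inference steps {steps} outside 28-50")
    return problems


# A wide 2048x512 image has the same area as 1024x1024, so it passes the area check.
print(check_settings(2048, 512, 4.0, 28))
# Turbo-style settings fall outside all three recommended ranges for Z-Image.
print(check_settings(256, 256, 7.5, 8))
```

An empty list means all four values sit inside the recommended ranges; values outside them may still run, but are not what this section recommends.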
 ### Usage Example