IMAGE MODEL · by Alibaba Tongyi Lab · Apache 2.0 · #1 open-source

Z Image Turbo

Alibaba's ultra-fast, open-source image model: 8-step distilled generation, bilingual English and Chinese text rendering, and photorealistic quality at approximately 4× the speed of FLUX.2 Dev. Ranked #1 among open-source models on the Artificial Analysis Text-to-Image Leaderboard.

* **Parameters:** 6.15 billion
* **Inference steps:** 8
* **Resolution:** Up to 2048×2048
* **Released:** November 2025

## Built for speed without sacrificing quality

Z Image Turbo is built on **S3-DiT** (Scalable Single-Stream Diffusion Transformer), a unified architecture in which text, visual semantic, and image tokens are processed in a single stream rather than in the separate streams of dual-stream models like FLUX. **Decoupled-DMD distillation** then compresses generation to just 8 steps with no classifier-free guidance (CFG) required, delivering results approximately 4× faster than FLUX.2 Dev at comparable or better quality.

## Capabilities

* Photography-grade quality with accurate lighting, shadows, and fine material detail. Performs at or above the level of models with 5× more parameters.
* Accurately renders named landmarks, cultural references, and recognizable figures, drawing on Alibaba's Qwen3-4B text encoder for depth.
* Built-in structured reasoning chains expand and refine prompts automatically, producing richer, more coherent outputs from short instructions.
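
One way to see why 8 guidance-free steps matter is to count denoiser evaluations per image. The sketch below assumes the common setup in which classifier-free guidance evaluates the network twice per step (conditional and unconditional), while a guidance-free sampler evaluates it once. The 50-step baseline is a hypothetical illustration, not a figure from this page, and the pass count is only one factor in wall-clock time, since model size and hardware also matter.

```python
def forward_passes(steps: int, uses_cfg: bool) -> int:
    """Count denoiser evaluations needed to generate one image.

    Assumes classifier-free guidance costs two evaluations per step
    (conditional + unconditional) and guidance-free sampling costs one.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return steps * (2 if uses_cfg else 1)


# Z Image Turbo as described above: 8 steps, no CFG.
turbo = forward_passes(steps=8, uses_cfg=False)

# Hypothetical undistilled baseline (assumed values, not from this page).
baseline = forward_passes(steps=50, uses_cfg=True)

print(turbo, baseline, baseline / turbo)  # 8 100 12.5
```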

## Specifications

| Feature              | Details                                               |
| -------------------- | ----------------------------------------------------- |
| **Architecture**     | S3-DiT (Scalable Single-Stream Diffusion Transformer) |
| **Text encoder**     | Qwen3-4B                                              |
| **Parameters**       | 6.15 billion                                          |
| **Inference steps**  | 8 (distilled via Decoupled-DMD)                       |
| **CFG guidance**     | Not required (scale: 0.0)                             |
| **Resolution**       | 512×512 to 2048×2048                                  |
| **VRAM requirement** | 16 GB (fits RTX 3080 Ti, 4080, Mac M-series)          |
| **License**          | Apache 2.0                                            |
| **Released**         | November 26, 2025                                     |

## Benchmarks

Z Image Turbo was evaluated against leading proprietary and open-source models:

| Benchmark                                     | Z Image Turbo          | Ranking                         |
| --------------------------------------------- | ---------------------- | ------------------------------- |
| Artificial Analysis Text-to-Image Leaderboard | Elo 1025, 45% win rate | **#1 open-source**, 4th overall |
| CVTG-2K text rendering (word accuracy)        | 0.8585                 | Top tier                        |
| LongText-Bench English                        | 0.917                  | Top tier                        |
| LongText-Bench Chinese                        | 0.926                  | Top tier                        |
| Speed vs. FLUX.2 Dev (100 imgs @ 1024×1024)   | 279s vs. 1,152s        | **\~4× faster**                 |
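
The "\~4× faster" figure follows directly from the timings in the speed row. As a quick arithmetic check, using only the numbers reported in the benchmark table (the variable names are illustrative):

```python
# Timings copied from the speed row of the benchmark table.
TURBO_SECONDS = 279.0    # Z Image Turbo, 100 images at 1024×1024
FLUX2_SECONDS = 1152.0   # FLUX.2 Dev, same workload
IMAGES = 100

speedup = FLUX2_SECONDS / TURBO_SECONDS
turbo_per_image = TURBO_SECONDS / IMAGES
flux2_per_image = FLUX2_SECONDS / IMAGES

print(f"speedup: {speedup:.2f}x")                        # speedup: 4.13x
print(f"Z Image Turbo: {turbo_per_image:.2f} s/image")   # Z Image Turbo: 2.79 s/image
print(f"FLUX.2 Dev: {flux2_per_image:.2f} s/image")      # FLUX.2 Dev: 11.52 s/image
```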

## How to use

1. Go to the **ImagineArt AI Image Generator**.
2. From the model dropdown, choose **Z Image Turbo**.
3. Write a clear, focused prompt. Z Image Turbo responds best to precise, concise descriptions; overly long prompts can add noise rather than detail.
4. Choose a resolution from 512×512 up to 2048×2048. The model performs consistently across the full resolution range.
5. Click **Generate**. At 8 steps, results arrive significantly faster than with most other models.

## Prompting tips

* **Keep prompts concise and specific:** Z Image Turbo is optimized for structured, precise prompts. Dense, paragraph-length prompts can reduce coherence rather than improve it.
* **For bilingual text in images:** Include both the English and Chinese text you want rendered, with explicit placement: *"A product banner with bold red text reading 'Summer Sale' and '夏季特卖' below it."*
* **Avoid high CFG values:** The model was trained at guidance scale 0.0. Raising CFG in manual configurations introduces artifacts, so leave guidance at its default.
* **Use prompt enhancement:** Enable the built-in prompt enhancer for short or abstract prompts. It applies Alibaba's structured reasoning to expand your intent into richer descriptions.

### Example prompts

> A Japanese ramen shop at night, warm amber light spilling from the windows onto rain-wet cobblestones, steam rising from bowls inside, photorealistic, cinematic composition.

> A product flatlay of a wireless speaker on brushed concrete, minimalist studio lighting, crisp shadow, commercial photography style.

> A bold event poster with "OPEN MIC NIGHT" in large neon-style lettering and "每周五 / Every Friday" beneath it, dark urban background.

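
For anyone scripting their own workflow, the settings and tips above can be captured in a small pre-flight check. This is a minimal sketch: `TurboSettings` and `bilingual_prompt` are hypothetical helpers, not part of any ImagineArt or Alibaba API, and the check assumes each side of the image must fall between 512 and 2048 pixels.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_SIDE, MAX_SIDE = 512, 2048  # resolution range stated on this page


@dataclass(frozen=True)
class TurboSettings:
    """Hypothetical settings container; not an official API."""

    prompt: str
    width: int = 1024
    height: int = 1024
    steps: int = 8
    guidance_scale: float = 0.0

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means all checks pass."""
        issues = []
        for name, side in (("width", self.width), ("height", self.height)):
            if not MIN_SIDE <= side <= MAX_SIDE:
                issues.append(f"{name} {side} is outside {MIN_SIDE}-{MAX_SIDE}")
        if self.guidance_scale != 0.0:
            issues.append("guidance_scale should stay at 0.0")
        if not self.prompt.strip():
            issues.append("prompt is empty")
        return issues


def bilingual_prompt(scene: str, english: str, chinese: str) -> str:
    """Spell out both strings and their placement, as the bilingual tip suggests."""
    return f"{scene} with text reading '{english}' and '{chinese}' below it."


settings = TurboSettings(
    prompt=bilingual_prompt("A product banner", "Summer Sale", "夏季特卖"),
    width=4096,
)
print(settings.prompt)
print(settings.problems())
```

Returning a list of problems rather than raising on the first one lets a caller show every issue at once before a request is submitted.
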
## Compare models

| Model                                             | Speed                       | Text rendering               | Parameters | License        | Best for                                  |
| ------------------------------------------------- | --------------------------- | ---------------------------- | ---------- | -------------- | ----------------------------------------- |
| **Z Image Turbo**                                 | \~4× faster than FLUX.2 Dev | Bilingual (EN + ZH), low WER | 6.15B      | Apache 2.0     | Rapid generation, bilingual, photorealism |
| [Flux Dev](/ai-models/image/flux-dev)             | Moderate (\~7–18s)          | Decent                       | 12B        | Non-commercial | Fine-tuning base, creative research       |
| [Qwen Image](/ai-models/image/qwen-image)         | Fast                        | Excellent (EN + ZH)          | 7B (2.0)   | Apache 2.0     | Illustrations, bilingual, complex layouts |
| [Seedream v3](/ai-models/image/seedream-1)        | Seconds                     | EN + ZH                      | 12B        | Commercial     | Fast branded imagery                      |
| [ImagineArt 1.0](/ai-models/image/imagineart-1-0) | Industry-leading            | Good                         | Not listed | Commercial     | Photorealistic portraits                  |

Z Image Turbo is developed by Alibaba's Tongyi Lab and released under the Apache 2.0 license. It outperforms models with 5× more parameters, including FLUX.2 Dev (32B), on several benchmarks, making it one of the most efficient high-quality image models available.