Built for speed without sacrificing quality
Z Image Turbo is built on S3-DiT (Scalable Single-Stream Diffusion Transformer), a unified architecture that processes text, visual semantic, and image tokens in a single stream, unlike dual-stream models such as FLUX that handle text and image tokens in separate streams. Combined with Decoupled-DMD distillation, this design compresses generation to just 8 steps with no classifier-free guidance required, delivering results approximately 4× faster than FLUX.2 Dev at comparable or better quality.
Capabilities
Photorealistic output
Photography-grade quality with accurate lighting, shadows, and fine material detail. Performs at or above models with 5× more parameters.
World knowledge grounding
Accurately renders named landmarks, cultural references, and recognizable figures, drawing on the world knowledge of Alibaba’s Qwen3-4B text encoder.
Prompt enhancement
Built-in structured reasoning chains expand and refine prompts automatically for richer, more coherent outputs from short instructions.
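
The single-stream idea from the overview can be illustrated with a toy example. The sketch below is not the S3-DiT implementation: the 2-dimensional token values and the helpers `single_stream_project` and `per_stream_project` are hypothetical, and a single linear projection stands in for a full transformer block. It shows only the structural difference: a single-stream block concatenates text, visual semantic, and image tokens into one sequence and applies one shared set of weights, while a stream-per-modality design keeps separate weights for each token type (dual-stream designs typically still let the streams interact through joint attention).

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by one token embedding."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def single_stream_project(
    streams: Dict[str, List[Vector]], shared_w: Matrix
) -> List[Tuple[str, Vector]]:
    # Single stream: concatenate every token type into ONE sequence,
    # then apply the SAME weight matrix to every token in it.
    sequence = [(kind, tok) for kind, toks in streams.items() for tok in toks]
    return [(kind, matvec(shared_w, tok)) for kind, tok in sequence]


def per_stream_project(
    streams: Dict[str, List[Vector]], weights: Dict[str, Matrix]
) -> List[Tuple[str, Vector]]:
    # Stream per modality: each token type is projected with its own weights.
    return [
        (kind, matvec(weights[kind], tok))
        for kind, toks in streams.items()
        for tok in toks
    ]


def weight_count(matrices: List[Matrix]) -> int:
    """Total number of weights across the given matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)


# Hypothetical toy tokens with 2-dimensional embeddings.
streams = {
    "text": [[1.0, 0.0], [0.0, 1.0]],
    "visual_semantic": [[0.5, 0.5]],
    "image": [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]],
}
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]

single = single_stream_project(streams, shared_w=double)
multi = per_stream_project(
    streams, {"text": identity, "visual_semantic": identity, "image": double}
)

print(len(single))                                 # 6: all tokens in one sequence
print(weight_count([double]))                      # 4: one matrix shared by every token
print(weight_count([identity, identity, double]))  # 12: one matrix per token type
```

Sharing one set of weights across all token types is one way a single-stream design can keep the parameter count down; how much this contributes to Z Image Turbo's efficiency is not quantified on this page.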

Specifications
| Feature | Details |
|---|---|
| Architecture | S3-DiT (Scalable Single-Stream Diffusion Transformer) |
| Text encoder | Qwen3-4B |
| Parameters | 6.15 billion |
| Inference steps | 8 (distilled via Decoupled-DMD) |
| Classifier-free guidance (CFG) | Not required (scale: 0.0) |
| Resolution | 512×512 to 2048×2048 |
| VRAM requirement | 16 GB (fits RTX 3080 Ti, 4080, Mac M-series) |
| License | Apache 2.0 |
| Released | November 26, 2025 |
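
The Specifications table implies a few constraints worth checking before a run. The following is a minimal, hypothetical pre-flight check: the `TurboSettings` class, its field names, and the warning strings are illustrative and not part of any ImagineArt or Z Image API. It encodes only the values listed in the table (8 steps, guidance scale 0.0, 512 to 2048 pixels), assuming each side may vary independently within that range.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the Specifications table; all names here are hypothetical.
MIN_SIDE, MAX_SIDE = 512, 2048
DISTILLED_STEPS = 8
RECOMMENDED_GUIDANCE = 0.0


@dataclass
class TurboSettings:
    width: int = 1024
    height: int = 1024
    steps: int = DISTILLED_STEPS
    guidance_scale: float = RECOMMENDED_GUIDANCE

    def problems(self) -> List[str]:
        """Return warnings; an empty list means the settings match the spec."""
        issues = []
        for name, side in (("width", self.width), ("height", self.height)):
            if not MIN_SIDE <= side <= MAX_SIDE:
                issues.append(f"{name}={side} is outside {MIN_SIDE}-{MAX_SIDE}")
        if self.steps != DISTILLED_STEPS:
            issues.append(f"steps={self.steps}; the model is distilled for {DISTILLED_STEPS}")
        if self.guidance_scale != RECOMMENDED_GUIDANCE:
            issues.append(
                f"guidance_scale={self.guidance_scale}; CFG is not required (use 0.0)"
            )
        return issues


print(TurboSettings().problems())  # []
print(TurboSettings(width=4096, guidance_scale=7.5).problems())
```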
Benchmarks
Z Image Turbo was evaluated against leading proprietary and open-source models:

| Benchmark | Z Image Turbo | Ranking |
|---|---|---|
| Artificial Analysis Text-to-Image Leaderboard | Elo 1025, 45% win rate | #1 open-source, 4th overall |
| CVTG-2K text rendering (word accuracy) | 0.8585 | Top tier |
| LongText-Bench English | 0.917 | Top tier |
| LongText-Bench Chinese | 0.926 | Top tier |
| Speed vs. FLUX.2 Dev (100 imgs @ 1024×1024) | 279s vs. 1,152s | ~4× faster |
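
The ~4× figure follows directly from the timing row. A quick check, using only the numbers in the table (the benchmark hardware is not stated here, so the per-image times are meaningful only relative to each other):

```python
# Figures from the benchmark table: seconds to generate 100 images at 1024×1024.
IMAGES = 100
z_image_turbo_s = 279
flux2_dev_s = 1152

speedup = flux2_dev_s / z_image_turbo_s
per_image_turbo = z_image_turbo_s / IMAGES
per_image_flux = flux2_dev_s / IMAGES

print(f"speedup: {speedup:.2f}x")  # speedup: 4.13x
print(f"per image: {per_image_turbo:.2f} s vs {per_image_flux:.2f} s")  # per image: 2.79 s vs 11.52 s
```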


How to use
Write your prompt
Write a clear, focused prompt. Z Image Turbo responds best to precise, concise descriptions — overly long prompts can add noise rather than detail.
Set your resolution
Choose from 512×512 up to 2048×2048. The model performs consistently across the full resolution range.
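When a target size falls outside the supported range, one option is to rescale it into range while keeping the aspect ratio. The helper below is a hypothetical sketch: the `fit_resolution` name and the snapping of each side to a multiple of 64 are assumptions (many latent diffusion pipelines expect dimensions divisible by a fixed factor), not documented Z Image Turbo requirements. It also assumes each side may vary independently between 512 and 2048 pixels.

```python
from typing import Tuple

MIN_SIDE, MAX_SIDE = 512, 2048  # documented resolution range
MULTIPLE = 64  # assumption: snap each side to a multiple of 64


def fit_resolution(width: int, height: int) -> Tuple[int, int]:
    """Scale (width, height) into the supported range, keeping the aspect ratio where possible."""
    scale = 1.0
    if max(width, height) > MAX_SIDE:
        scale = MAX_SIDE / max(width, height)  # shrink the long side to the upper limit
    if min(width, height) * scale < MIN_SIDE:
        scale = MIN_SIDE / min(width, height)  # grow the short side to the lower limit

    def snap(side: float) -> int:
        snapped = int(round(side / MULTIPLE)) * MULTIPLE
        return max(MIN_SIDE, min(MAX_SIDE, snapped))  # extreme ratios get clamped

    return snap(width * scale), snap(height * scale)


print(fit_resolution(3840, 2160))  # (2048, 1152)
print(fit_resolution(400, 300))    # (704, 512)
```

For very extreme aspect ratios the clamp takes precedence, so the output ratio is only approximately preserved.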
Prompting tips
- Keep prompts concise and specific: Z Image Turbo is optimized for structured, precise prompts. Dense, paragraph-length prompts can reduce coherence rather than improve it.
- For bilingual text in images: include both the English and the Chinese text you want rendered, and state their placement explicitly, for example: “A product banner with bold red text reading ‘Summer Sale’ and ‘夏季特卖’ below it.”
- Avoid high CFG values: the model was trained to run at guidance scale 0.0, and raising CFG in manual configurations introduces artifacts. Leave guidance at its default.
- Use prompt enhancement: enable the built-in prompt enhancer for short or abstract prompts. It applies Alibaba’s structured reasoning to expand your intent into richer descriptions.
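
These tips can be turned into a rough local checklist. The sketch below is a hypothetical linting heuristic, not the built-in prompt enhancer: the `prompt_hints` function and both word-count thresholds are arbitrary assumptions, since the documentation gives no exact limits. It flags long prompts, suggests prompt enhancement for very short ones, and lists quoted text that mixes Latin and Chinese characters so you can confirm the placement is spelled out.

```python
import re
from typing import List

# Hypothetical thresholds: the documentation gives no exact word counts.
LONG_PROMPT_WORDS = 60
SHORT_PROMPT_WORDS = 8

# Text inside “…”, "…", or ‘…’ is treated as text to render in the image.
QUOTED = re.compile(r'[“"]([^”"]+)[”"]|‘([^’]+)’')
CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block
LATIN = re.compile(r"[A-Za-z]")


def prompt_hints(prompt: str) -> List[str]:
    hints = []
    words = len(prompt.split())
    if words > LONG_PROMPT_WORDS:
        hints.append("long prompt: consider trimming to the essentials")
    if words < SHORT_PROMPT_WORDS:
        hints.append("short prompt: consider enabling prompt enhancement")
    # findall returns one tuple per match; exactly one group is non-empty.
    quoted = [a or b for a, b in QUOTED.findall(prompt)]
    if quoted:
        has_cjk = any(CJK.search(q) for q in quoted)
        has_latin = any(LATIN.search(q) for q in quoted)
        if has_cjk and has_latin:
            hints.append("bilingual text to render: " + " | ".join(quoted))
    return hints


print(prompt_hints("A product banner with bold red text reading ‘Summer Sale’ and ‘夏季特卖’ below it."))
# ['bilingual text to render: Summer Sale | 夏季特卖']
```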
Example prompts
A Japanese ramen shop at night, warm amber light spilling from the windows onto rain-wet cobblestones, steam rising from bowls inside, photorealistic, cinematic composition.
A product flatlay of a wireless speaker on brushed concrete, minimalist studio lighting, crisp shadow, commercial photography style.
A bold event poster with “OPEN MIC NIGHT” in large neon-style lettering and “每周五 / Every Friday” beneath it, dark urban background.
Compare models
| Model | Speed | Text rendering | Parameters | License | Best for |
|---|---|---|---|---|---|
| Z Image Turbo | ~4× faster than FLUX.2 Dev | Bilingual (EN + ZH), low WER | 6.15B | Apache 2.0 | Rapid generation, bilingual, photorealism |
| FLUX.1 Dev | Moderate (~7–18s) | Decent | 12B | Non-commercial | Fine-tuning base, creative research |
| Qwen Image | Fast | Excellent (EN + ZH) | 7B (2.0) | Apache 2.0 | Illustrations, bilingual, complex layouts |
| Seedream v3 | Seconds | EN + ZH | 12B | Commercial | Fast branded imagery |
| ImagineArt 1.0 | Industry-leading | Good | — | Commercial | Photorealistic portraits |
Z Image Turbo is developed by Alibaba’s Tongyi Lab under the Apache 2.0 license. It outperforms models with 5× more parameters — including FLUX.2 Dev (32B) — on several benchmarks, making it one of the most efficient high-quality image models available.
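
The "5× more parameters" comparison can be checked against the parameter counts quoted on this page:

```python
# Parameter counts quoted on this page, in billions.
params_b = {"Z Image Turbo": 6.15, "FLUX.2 Dev": 32.0}

ratio = params_b["FLUX.2 Dev"] / params_b["Z Image Turbo"]
print(f"FLUX.2 Dev has {ratio:.1f}x the parameters of Z Image Turbo")  # 5.2x
```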

