IMAGE MODEL | by Alibaba Tongyi Lab | Apache 2.0 | #1 open-source

Z Image Turbo

Alibaba’s ultra-fast, open-source image model: 8-step distilled generation, bilingual English and Chinese text rendering, and photorealistic quality at approximately 4× the speed of FLUX.2 Dev. Ranked #1 among open-source models on the Artificial Analysis Text-to-Image Leaderboard.

Parameters	6.15 billion
Inference steps	8
Resolution	Up to 2048×2048
Released	November 2025

Built for speed without sacrificing quality

Z Image Turbo is built on S3-DiT (Scalable Single-Stream Diffusion Transformer), a unified architecture in which text, visual semantic, and image tokens are processed in a single stream, rather than in separate streams as in dual-stream models like FLUX. Combined with Decoupled-DMD distillation, this compresses generation to just 8 steps with no classifier-free guidance required, delivering results approximately 4× faster than FLUX.2 Dev at comparable or better quality.
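The single-stream idea can be illustrated with a toy sketch. This is not the S3-DiT implementation: the function names, the tiny two-dimensional token vectors, and the plain softmax attention are all assumptions chosen for illustration. It only shows the structural point that the three token types are concatenated into one sequence and mixed by one shared block.

```python
import math


def self_attention(tokens):
    """Toy single-head self-attention: every token attends to every token."""
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        # Scaled dot-product score between this token and every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        # Weighted average of all token vectors.
        mixed.append([
            sum(w / total * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return mixed


def single_stream_block(text, semantic, image):
    """Concatenate all modalities into one sequence and apply one shared block."""
    sequence = text + semantic + image
    return self_attention(sequence)


# Hypothetical toy tokens: 2 text, 1 visual semantic, 3 image.
text = [[1.0, 0.0], [0.0, 1.0]]
semantic = [[0.5, 0.5]]
image = [[0.2, 0.1], [0.3, 0.9], [0.7, 0.4]]

mixed = single_stream_block(text, semantic, image)
print(len(mixed))  # one output vector per input token
```

In a dual-stream design, text and image tokens would instead pass through separate sets of weights before being combined. The sketch has no learned weights at all, so it shows only the data flow.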

Capabilities

Photorealistic output

Photography-grade quality with accurate lighting, shadows, and fine material detail. Performs at or above models with 5× more parameters.

World knowledge grounding

Accurately renders named landmarks, cultural references, and recognizable figures — drawing on Alibaba’s Qwen3-4B text encoder for depth.

Prompt enhancement

Built-in structured reasoning chains expand and refine prompts automatically for richer, more coherent outputs from short instructions.

Specifications

Feature	Details
Architecture	S3-DiT (Scalable Single-Stream Diffusion Transformer)
Text encoder	Qwen3-4B
Parameters	6.15 billion
Inference steps	8 (distilled via Decoupled-DMD)
CFG guidance	Not required (scale: 0.0)
Resolution	512×512 to 2048×2048
VRAM requirement	16 GB (fits RTX 3080 Ti, 4080, Mac M-series)
License	Apache 2.0
Released	November 26, 2025
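As a rough sanity check on the 16 GB figure, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes 2 bytes per parameter (a 16-bit format such as bf16 or fp16). That precision is an assumption, not something stated in the table. The estimate also excludes the text encoder, the image decoder, and activations, so it is a lower bound and not a measured requirement.

```python
PARAMETERS = 6.15e9          # from the specifications table
BYTES_PER_PARAMETER = 2      # assumption: 16-bit weights (bf16 or fp16)
VRAM_BUDGET_GB = 16          # from the specifications table


def weight_memory_gb(parameters, bytes_per_parameter):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return parameters * bytes_per_parameter / 1e9


weights_gb = weight_memory_gb(PARAMETERS, BYTES_PER_PARAMETER)
headroom_gb = VRAM_BUDGET_GB - weights_gb

print(f"weights: {weights_gb:.1f} GB")    # weights: 12.3 GB
print(f"headroom: {headroom_gb:.1f} GB")  # headroom: 3.7 GB
```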

Benchmarks

Z Image Turbo was evaluated against leading proprietary and open-source models:

Benchmark	Z Image Turbo	Ranking
Artificial Analysis Text-to-Image Leaderboard	Elo 1025, 45% win rate	#1 open-source, 4th overall
CVTG-2K text rendering (word accuracy)	0.8585	Top tier
LongText-Bench English	0.917	Top tier
LongText-Bench Chinese	0.926	Top tier
Speed vs. FLUX.2 Dev (100 imgs @ 1024×1024)	279s vs. 1,152s	~4× faster
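The "~4× faster" figure follows directly from the two timings in the last row. A short worked calculation, using only the numbers from the table (the variable names are illustrative):

```python
Z_IMAGE_TURBO_SECONDS = 279   # 100 images at 1024×1024
FLUX2_DEV_SECONDS = 1152      # same workload
IMAGES = 100

# Speedup is the baseline time divided by the faster model's time.
ratio = FLUX2_DEV_SECONDS / Z_IMAGE_TURBO_SECONDS
turbo_per_image = Z_IMAGE_TURBO_SECONDS / IMAGES
flux_per_image = FLUX2_DEV_SECONDS / IMAGES

print(f"speedup: {ratio:.2f}x")                         # speedup: 4.13x
print(f"Z Image Turbo: {turbo_per_image:.2f} s/image")  # Z Image Turbo: 2.79 s/image
print(f"FLUX.2 Dev: {flux_per_image:.2f} s/image")      # FLUX.2 Dev: 11.52 s/image
```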

How to use

Open the AI Image Generator

Go to the ImagineArt AI Image Generator.

Select the model

From the model dropdown, choose Z Image Turbo.

Write your prompt

Write a clear, focused prompt. Z Image Turbo responds best to precise, concise descriptions — overly long prompts can add noise rather than detail.

Set your resolution

Choose from 512×512 up to 2048×2048. The model performs consistently across the full resolution range.

Generate

Click Generate. At 8 steps, results arrive significantly faster than most other models.
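If you script generation settings before entering them, the constraints from the steps above can be checked up front. The sketch below is hypothetical: `GenerationSettings` and `validate` are illustrative names, not part of any ImagineArt or Alibaba API, and the 1024×1024 default is an assumption. Only the 512 to 2048 range, the 8 steps, and the 0.0 guidance scale come from this page.

```python
from dataclasses import dataclass

MIN_SIDE = 512    # smallest supported side, per the specifications
MAX_SIDE = 2048   # largest supported side, per the specifications


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the settings described on this page."""
    prompt: str
    width: int = 1024            # assumed default, not from the source
    height: int = 1024           # assumed default, not from the source
    steps: int = 8
    guidance_scale: float = 0.0


def validate(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not settings.prompt.strip():
        problems.append("prompt is empty")
    for name, side in (("width", settings.width), ("height", settings.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side} is outside {MIN_SIDE}-{MAX_SIDE}")
    if settings.steps != 8:
        problems.append("the distilled model is designed for 8 steps")
    if settings.guidance_scale != 0.0:
        problems.append("guidance scale should stay at 0.0")
    return problems


ok = GenerationSettings(prompt="A ramen shop at night, photorealistic")
too_large = GenerationSettings(prompt="A poster", width=4096)

print(validate(ok))         # []
print(validate(too_large))  # ['width 4096 is outside 512-2048']
```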

Prompting tips

Keep prompts concise and specific — Z Image Turbo is optimized for structured, precise prompts. Dense, paragraph-length prompts can reduce coherence rather than improve it.
For bilingual text in images — Include both the English and Chinese text you want rendered, with explicit placement: “A product banner with bold red text reading ‘Summer Sale’ and ‘夏季特卖’ below it.”
Avoid high CFG values — The model was trained at guidance scale 0.0. Using high CFG in manual configurations introduces artifacts. Leave guidance at default.
Use prompt enhancement — Enable the built-in prompt enhancer for short or abstract prompts. It applies Alibaba’s structured reasoning to expand your intent into richer descriptions.
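To see why a guidance scale of 0.0 means "no guidance", consider one common way of writing classifier-free guidance, in which the scale multiplies the difference between the conditional and unconditional predictions. This convention is an assumption here, since the page does not say which formula the model's pipeline uses. Some libraries use a different convention in which 1.0 is the neutral value.

```python
def guided_prediction(cond, uncond, scale):
    """Combine predictions as cond + scale * (cond - uncond).

    Assumed convention: scale 0.0 returns the conditional prediction
    unchanged, and larger scales push further away from the
    unconditional prediction.
    """
    return [c + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]     # hypothetical conditional prediction
uncond = [0.5, 1.0]   # hypothetical unconditional prediction

print(guided_prediction(cond, uncond, 0.0))  # [1.0, 2.0]
print(guided_prediction(cond, uncond, 2.0))  # [2.0, 4.0]
```

At scale 0.0 the unconditional prediction drops out entirely. At higher scales the result is extrapolated beyond the conditional prediction, which is consistent with the artifacts described in the tip above.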

Example prompts

A Japanese ramen shop at night, warm amber light spilling from the windows onto rain-wet cobblestones, steam rising from bowls inside, photorealistic, cinematic composition.

A product flatlay of a wireless speaker on brushed concrete, minimalist studio lighting, crisp shadow, commercial photography style.

A bold event poster with “OPEN MIC NIGHT” in large neon-style lettering and “每周五 / Every Friday” beneath it, dark urban background.
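Bilingual prompts like the poster example follow a repeatable pattern: a subject, the exact English and Chinese strings in quotes, and an explicit placement. A small helper can keep that pattern consistent across many prompts. The function name and argument layout below are illustrative assumptions, not part of any official tooling.

```python
def bilingual_text_prompt(subject, english, chinese, placement="below it"):
    """Build a prompt that quotes both strings and states their placement."""
    for label, value in (("english", english), ("chinese", chinese)):
        if not value.strip():
            raise ValueError(f"{label} text must not be empty")
    return f'{subject} reading "{english}" and "{chinese}" {placement}.'


prompt = bilingual_text_prompt(
    subject="A product banner with bold red text",
    english="Summer Sale",
    chinese="夏季特卖",
)
print(prompt)
```

Quoting the exact strings and naming their placement mirrors the prompting tip on bilingual text. The helper adds no extra descriptive detail, which keeps prompts concise.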

Compare models

Model	Speed	Text rendering	Parameters	License	Best for
Z Image Turbo	~4× faster than FLUX.2 Dev	Bilingual (EN + ZH), low word error rate	6.15B	Apache 2.0	Rapid generation, bilingual, photorealism
Flux Dev	Moderate (~7–18s)	Decent	12B	Non-commercial	Fine-tuning base, creative research
Qwen Image	Fast	Excellent (EN + ZH)	7B (2.0)	Apache 2.0	Illustrations, bilingual, complex layouts
Seedream v3	Seconds	EN + ZH	12B	Commercial	Fast branded imagery
ImagineArt 1.0	Industry-leading	Good	—	Commercial	Photorealistic portraits

Z Image Turbo is developed by Alibaba’s Tongyi Lab under the Apache 2.0 license. It outperforms models with 5× more parameters — including FLUX.2 Dev (32B) — on several benchmarks, making it one of the most efficient high-quality image models available.