Image model by Alibaba Tongyi Lab | Apache 2.0 | #1 open-source

Z Image Turbo

Alibaba’s ultra-fast, open-source image model: 8-step distilled generation, bilingual English and Chinese text rendering, and photorealistic quality at approximately 4× the speed of FLUX.2 Dev. Ranked #1 among open-source models on the Artificial Analysis Text-to-Image Leaderboard.

Parameters: 6.15 billion
Inference steps: 8
Resolution: up to 2048×2048
Released: November 2025

Built for speed without sacrificing quality

Z Image Turbo is built on S3-DiT (Scalable Single-Stream Diffusion Transformer), a unified architecture in which text, visual semantic, and image tokens are processed in a single stream rather than in the separate streams used by dual-stream models like FLUX. Combined with Decoupled-DMD distillation, generation is compressed to just 8 steps with no classifier-free guidance required, delivering results approximately 4× faster than FLUX.2 Dev at comparable or better quality.

Capabilities

Photorealistic output

Photography-grade quality with accurate lighting, shadows, and fine material detail. Performs at or above models with 5× more parameters.

World knowledge grounding

Accurately renders named landmarks, cultural references, and recognizable figures — drawing on Alibaba’s Qwen3-4B text encoder for depth.

Prompt enhancement

Built-in structured reasoning chains expand and refine prompts automatically for richer, more coherent outputs from short instructions.

Specifications

| Feature | Details |
| --- | --- |
| Architecture | S3-DiT (Scalable Single-Stream Diffusion Transformer) |
| Text encoder | Qwen3-4B |
| Parameters | 6.15 billion |
| Inference steps | 8 (distilled via Decoupled-DMD) |
| CFG guidance | Not required (scale: 0.0) |
| Resolution | 512×512 to 2048×2048 |
| VRAM requirement | 16 GB (fits RTX 3080 Ti, 4080, Mac M-series) |
| License | Apache 2.0 |
| Released | November 26, 2025 |
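
As a rough sanity check on the 16 GB figure, the sketch below estimates how much memory the weights alone would occupy at a few storage precisions. The precisions are assumptions, not something this page states, and the estimate ignores activations, the image decoder, and the text encoder (the page does not say whether the 6.15 billion count includes it), so real usage will be higher.

```python
# Parameter count taken from the specifications table.
PARAMS = 6.15e9

# Assumed storage precisions (bytes per parameter); not stated on this page.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gigabytes(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gigabytes(PARAMS, size):.1f} GB of weights")
```

Under the bf16 assumption the weights come to roughly 12.3 GB, which is consistent with the stated 16 GB requirement leaving some headroom for everything else.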

Benchmarks

Z Image Turbo was evaluated against leading proprietary and open-source models:

| Benchmark | Z Image Turbo | Ranking |
| --- | --- | --- |
| Artificial Analysis Text-to-Image Leaderboard | Elo 1025, 45% win rate | #1 open-source, 4th overall |
| CVTG-2K text rendering (word accuracy) | 0.8585 | Top tier |
| LongText-Bench English | 0.917 | Top tier |
| LongText-Bench Chinese | 0.926 | Top tier |
| Speed vs. FLUX.2 Dev (100 imgs @ 1024×1024) | 279s vs. 1,152s | ~4× faster |
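
The "~4× faster" figure follows directly from the two timings in the speed row. The short calculation below reproduces it and derives the average time per image; the variable and function names are illustrative only.

```python
# Totals from the benchmark table: seconds to generate 100 images at 1024x1024.
Z_IMAGE_TURBO_SECONDS = 279
FLUX2_DEV_SECONDS = 1152
NUM_IMAGES = 100


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


ratio = speedup(FLUX2_DEV_SECONDS, Z_IMAGE_TURBO_SECONDS)
per_image_turbo = Z_IMAGE_TURBO_SECONDS / NUM_IMAGES
per_image_flux = FLUX2_DEV_SECONDS / NUM_IMAGES

print(f"speedup: {ratio:.2f}x")  # speedup: 4.13x
print(f"per image: {per_image_turbo:.2f}s vs {per_image_flux:.2f}s")  # 2.79s vs 11.52s
```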

How to use

1. Open the AI Image Generator
   Go to the ImagineArt AI Image Generator.

2. Select the model
   From the model dropdown, choose Z Image Turbo.

3. Write your prompt
   Write a clear, focused prompt. Z Image Turbo responds best to precise, concise descriptions; overly long prompts can add noise rather than detail.

4. Set your resolution
   Choose from 512×512 up to 2048×2048. The model performs consistently across the full resolution range.

5. Generate
   Click Generate. At 8 steps, results arrive significantly faster than most other models.
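
For readers who script their own configurations, the limits described on this page (512 to 2048 pixels per side, 8 steps, guidance scale 0.0) can be captured in a small pre-flight check. This is a minimal sketch: `GenerationSettings` and `validate` are hypothetical names invented for illustration, not part of any ImagineArt or Alibaba API, and the 1024×1024 default is an assumption.

```python
from dataclasses import dataclass

# Resolution range quoted on this page.
MIN_SIDE, MAX_SIDE = 512, 2048


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; not an official API type."""
    prompt: str
    width: int = 1024           # assumed default, not stated on this page
    height: int = 1024          # assumed default, not stated on this page
    steps: int = 8              # distilled step count from the page
    guidance_scale: float = 0.0 # CFG is not required for this model


def validate(settings: GenerationSettings) -> list[str]:
    """Return a list of human-readable problems; empty means the settings look fine."""
    problems = []
    if not settings.prompt.strip():
        problems.append("prompt is empty")
    for name, side in (("width", settings.width), ("height", settings.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside {MIN_SIDE}-{MAX_SIDE}")
    if settings.steps != 8:
        problems.append(f"steps={settings.steps}; the model is distilled for 8")
    if settings.guidance_scale != 0.0:
        problems.append("guidance_scale should stay at 0.0")
    return problems


print(validate(GenerationSettings(prompt="A ramen shop at night")))
print(validate(GenerationSettings(prompt="A poster", width=4096, guidance_scale=7.5)))
```

The first call returns an empty list; the second reports the out-of-range width and the non-zero guidance scale.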

Prompting tips

  • Keep prompts concise and specific — Z Image Turbo is optimized for structured, precise prompts. Dense, paragraph-length prompts can reduce coherence rather than improve it.
  • For bilingual text in images — Include both the English and Chinese text you want rendered, with explicit placement: “A product banner with bold red text reading ‘Summer Sale’ and ‘夏季特卖’ below it.”
  • Avoid high CFG values — The model was trained at guidance scale 0.0. Using high CFG in manual configurations introduces artifacts. Leave guidance at default.
  • Use prompt enhancement — Enable the built-in prompt enhancer for short or abstract prompts. It applies Alibaba’s structured reasoning to expand your intent into richer descriptions.
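
The bilingual tip above can be turned into a tiny template so that both strings and their placement are always stated explicitly. This is only an illustrative sketch; `bilingual_text_prompt` is a hypothetical helper, and the sentence pattern simply mirrors the example in the tip.

```python
def bilingual_text_prompt(scene: str, english: str, chinese: str,
                          style: str = "bold") -> str:
    """Build a prompt that names both text strings and where the second one goes."""
    return (
        f"{scene} with {style} text reading '{english}' "
        f"and '{chinese}' below it."
    )


prompt = bilingual_text_prompt(
    "A product banner", "Summer Sale", "夏季特卖", style="bold red"
)
print(prompt)
# A product banner with bold red text reading 'Summer Sale' and '夏季特卖' below it.
```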

Example prompts

A Japanese ramen shop at night, warm amber light spilling from the windows onto rain-wet cobblestones, steam rising from bowls inside, photorealistic, cinematic composition.
A product flatlay of a wireless speaker on brushed concrete, minimalist studio lighting, crisp shadow, commercial photography style.
A bold event poster with “OPEN MIC NIGHT” in large neon-style lettering and “每周五 / Every Friday” beneath it, dark urban background.

Compare models

| Model | Speed | Text rendering | Parameters | License | Best for |
| --- | --- | --- | --- | --- | --- |
| Z Image Turbo | ~4× faster than FLUX | Bilingual (EN + ZH), low WER | 6.15B | Apache 2.0 | Rapid generation, bilingual, photorealism |
| Flux Dev | Moderate (~7–18s) | Decent | 12B | Non-commercial | Fine-tuning base, creative research |
| Qwen Image | Fast | Excellent (EN + ZH) | 7B (2.0) | Apache 2.0 | Illustrations, bilingual, complex layouts |
| Seedream v3 | Seconds | EN + ZH | 12B | Commercial | Fast branded imagery |
| ImagineArt 1.0 | Industry-leading | Good | (not listed) | Commercial | Photorealistic portraits |

Z Image Turbo is developed by Alibaba’s Tongyi Lab under the Apache 2.0 license. It outperforms models with 5× more parameters — including FLUX.2 Dev (32B) — on several benchmarks, making it one of the most efficient high-quality image models available.