Seedance 2 Fast uses the same underlying model as Seedance 2 but is optimized for lower latency. Choose Seedance 2 Fast for rapid iteration and production pipelines where speed matters; choose Seedance 2 when maximum quality is the priority.
Fast-tier Seedance 2
Seedance 2 Fast is ByteDance’s production-optimized endpoint for the Seedance 2.0 architecture, released February 10, 2026 alongside the standard model. The underlying Dual-Branch Diffusion Transformer is identical; the Fast variant trades a small margin of peak quality for meaningfully lower inference times, making it the practical choice for iterative workflows, A/B testing, and high-frequency generation pipelines. Native audio-video joint generation is preserved in the Fast variant: dialogue, sound effects, and music are generated simultaneously with the video, synchronized at the frame level.
Capabilities
Native audio-video generation
Generates dialogue, sound effects, and music synchronized with the video in a single pass — no post-production audio required.
Up to 15 seconds
Supports generation lengths from 4 to 15 seconds, covering short social clips through extended narrative sequences.
Multimodal references
Accepts up to 9 reference images, 3 reference video clips, and 3 audio clips simultaneously for maximum creative direction.
Multi-shot narratives
Generates coherent multi-shot sequences from a single prompt — scene transitions, subject consistency, and style maintained across cuts.
Wide aspect ratio support
Supports 21:9, 16:9, 4:3, 1:1, 3:4, and 9:16 aspect ratios for any platform or format.
Fast inference
Optimized for lower latency — ideal for rapid iteration, pipeline integrations, and high-volume production.
Specifications
| Feature | Details |
|---|---|
| Developer | ByteDance |
| Released | February 10, 2026 |
| Resolution | 720p |
| Duration | 4–15 seconds |
| Aspect ratios | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 |
| Audio | Dialogue, SFX, music (native) |
| Max reference images | 9 |
| Max reference videos | 3 |
| Max reference audio | 3 |
| Architecture | Dual-Branch Diffusion Transformer (DB-DiT) |
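The limits in the table above can be checked on the client before a job is submitted. The following is a minimal sketch, not an official schema: the `GenerationRequest` class, its field names, and the `validate` helper are hypothetical and exist only for illustration. Only the numeric limits and the aspect ratio list come from the table, and treating the duration as a whole number of seconds is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Limits copied from the specifications table.
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
MIN_DURATION_S, MAX_DURATION_S = 4, 15
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO = 9, 3, 3


@dataclass
class GenerationRequest:
    """Hypothetical request shape; field names are illustrative only."""
    prompt: str
    duration_s: int
    aspect_ratio: str = "16:9"
    reference_images: list[str] = field(default_factory=list)
    reference_videos: list[str] = field(default_factory=list)
    reference_audio: list[str] = field(default_factory=list)


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds")
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if len(req.reference_images) > MAX_IMAGES:
        problems.append(f"at most {MAX_IMAGES} reference images")
    if len(req.reference_videos) > MAX_VIDEOS:
        problems.append(f"at most {MAX_VIDEOS} reference videos")
    if len(req.reference_audio) > MAX_AUDIO:
        problems.append(f"at most {MAX_AUDIO} reference audio clips")
    return problems
```

Returning a list of problems rather than raising on the first one lets a pipeline report every violation at once, which is useful when requests are generated in bulk.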
How to use
Choose your input mode
Select text-to-video, image-to-video, or references mode depending on your creative needs.
Add references (optional)
Upload up to 9 reference images, 3 video clips, and 3 audio clips to guide the output style, subject, and sound.
Write your prompt
Describe the scene, subjects, motion, camera behavior, and audio atmosphere in your prompt.
Prompting tips
- Describe audio explicitly — Include what you want to hear: “with the sound of rain pattering on a window and a soft piano melody in the background.”
- Specify camera movement — “Slow dolly forward,” “static wide shot,” or “handheld tracking shot” all meaningfully influence the output.
- Use reference audio for tone — Uploading a reference audio clip helps anchor the musical style and ambient mood of the generated video.
- Keep multi-shot prompts structured — For sequences, describe each shot with a clear transition cue: “SHOT 1: … CUT TO SHOT 2: …”
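The structured multi-shot format from the last tip can be assembled programmatically when prompts are generated in a pipeline. This is a small sketch under assumptions: the `build_multishot_prompt` helper is hypothetical, and the trailing `Audio:` label is an illustrative convention, not something the model is documented to require. Only the `SHOT n: … CUT TO SHOT n+1: …` pattern comes from the tips above.

```python
from __future__ import annotations


def build_multishot_prompt(shots: list[str], audio: str = "") -> str:
    """Join shot descriptions as 'SHOT 1: ... CUT TO SHOT 2: ...'.

    An optional audio description is appended at the end so the sound
    direction is stated explicitly, as the first tip suggests.
    """
    if not shots:
        raise ValueError("at least one shot description is required")
    parts = [f"SHOT {i}: {shot.strip()}" for i, shot in enumerate(shots, start=1)]
    prompt = " CUT TO ".join(parts)
    if audio.strip():
        # The 'Audio:' label is an assumption made for readability.
        prompt += f" Audio: {audio.strip()}"
    return prompt


if __name__ == "__main__":
    print(build_multishot_prompt(
        ["Static wide shot of a rainy street.",
         "Slow dolly forward to a cafe window."],
        audio="rain pattering on a window and a soft piano melody",
    ))
```

Keeping each shot as a separate list item makes it easy to reorder or swap individual shots between iterations without rewriting the whole prompt.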
Example prompts
A chef in a professional kitchen carefully plates a dish under warm overhead lighting. Close-up on hands arranging microgreens. Ambient kitchen sounds — sizzling pans, light chatter in the background. Cinematic, handheld camera.
A timelapse of a city square from empty early morning through bustling midday. Wide establishing shot. Birds chirping at dawn, building to the hum of traffic and crowd noise by noon.
Compare models
| Model | Speed | Max duration | Audio | References | Best for |
|---|---|---|---|---|---|
| Seedance 2 Fast | Fast | 15s | Native | 9 img + 3 vid + 3 audio | Production pipelines, iteration |
| Seedance 2 | Standard | 15s | Native | 9 img + 3 vid + 3 audio | Maximum quality output |
| Seedance 1.5 Pro | Standard | 12s | Native lip-sync | Image input | Dialogue, multilingual |
| Seedance Pro Fast | Fast | 10s | No | Image input | Quick clips without audio |
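The comparison table can also be read as a simple filter: drop the models that cannot meet a requirement, then order what remains by speed preference. The sketch below encodes the table rows by hand. The `ModelInfo` class and `candidates` function are hypothetical, and counting "Native lip-sync" as audio support and "Image input" as no multimodal references are interpretations of the table, not documented flags.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    fast: bool
    max_duration_s: int
    audio: bool            # "Native" and "Native lip-sync" both counted as True (assumption)
    multimodal_refs: bool  # True only for the 9 img + 3 vid + 3 audio rows


# Rows transcribed from the comparison table, in table order.
MODELS = [
    ModelInfo("Seedance 2 Fast", True, 15, True, True),
    ModelInfo("Seedance 2", False, 15, True, True),
    ModelInfo("Seedance 1.5 Pro", False, 12, True, False),
    ModelInfo("Seedance Pro Fast", True, 10, False, False),
]


def candidates(duration_s: int, need_audio: bool,
               need_multimodal_refs: bool, prefer_fast: bool) -> list[str]:
    """Return names of models that satisfy the requirements, preferred speed tier first."""
    matches = [
        m for m in MODELS
        if m.max_duration_s >= duration_s
        and (m.audio or not need_audio)
        and (m.multimodal_refs or not need_multimodal_refs)
    ]
    # sorted() is stable, so table order is kept within each speed tier.
    matches = sorted(matches, key=lambda m: m.fast != prefer_fast)
    return [m.name for m in matches]
```

For example, a 12-second clip that needs audio but no multimodal references, with a preference for speed, yields Seedance 2 Fast first, followed by Seedance 2 and Seedance 1.5 Pro.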

