Seedance 2 is available in a Fast variant with the same architecture but lower latency — use Fast for rapid iteration and Seedance 2 for maximum quality final renders.
ByteDance’s most capable video model
Seedance 2, released February 10, 2026, is built on the Dual-Branch Diffusion Transformer (DB-DiT) architecture, a significant advancement over the Seedance 1 generation. The model generates audio and video jointly in a single pass, with audio (dialogue, sound effects, music) synchronized at the frame level with the visual output. The references system is the most expansive in the Seedance lineup: up to 9 reference images, 3 reference video clips, and 3 reference audio clips can be provided simultaneously, giving exhaustive creative control over visual style, character appearance, motion patterns, and audio atmosphere.
Generation modes
Text to Video
Generate video directly from a text prompt. Describe scene, motion, camera behavior, and audio environment — Seedance 2 generates the complete audio-visual output.
Image to Video
Animate a reference image with described motion. Camera behavior, lighting changes, and audio elements are all added during generation.
First and Last Frame
Define both the opening and closing frames — Seedance 2 generates the motion, lighting, and audio between them for precise transition control.
References Mode
Use up to 9 images, 3 video clips, and 3 audio clips as simultaneous references for maximum creative direction over every aspect of the output.
Capabilities
Native audio-video joint generation
Audio and video generated in a single pass — dialogue, sound effects, and music synchronized at the frame level without post-processing.
Multi-shot narrative coherence
Maintains subject identity, visual style, and scene logic across shots and transitions within a single generation.
Exhaustive reference system
9 reference images + 3 reference videos + 3 reference audio clips — the most comprehensive reference input system in the lineup.
Advanced camera control
Complex camera movements including dolly, zoom, pan, tracking, and crane shots with cinematographic accuracy.
Up to 15 seconds
Extended generation window at 720p–1080p — suitable for narrative sequences, commercial spots, and music video segments.
DB-DiT architecture
Dual-Branch Diffusion Transformer processes visual and audio branches simultaneously for coherent joint generation.
Specifications
| Feature | Details |
|---|---|
| Developer | ByteDance |
| Released | February 10, 2026 |
| Architecture | Dual-Branch Diffusion Transformer (DB-DiT) |
| Resolution | 720p–1080p |
| Duration | 4–15 seconds |
| Aspect ratios | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 |
| Audio | Dialogue, SFX, music (native) |
| Max reference images | 9 |
| Max reference videos | 3 |
| Max reference audio | 3 |
| Generation modes | 4 |
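The limits in the table can be checked locally before a job is submitted. The following is a minimal sketch, not an official client: `GenerationSettings` and `validate_settings` are hypothetical names, and it assumes that only the two endpoints of the 720p–1080p range are selectable.

```python
from dataclasses import dataclass

# Limits copied from the specifications table.
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
# Assumption: only the two endpoints of the 720p-1080p range are selectable.
RESOLUTIONS = {"720p", "1080p"}
MIN_DURATION_S = 4
MAX_DURATION_S = 15


@dataclass
class GenerationSettings:
    """Hypothetical settings container, not an official request schema."""
    duration_s: int
    aspect_ratio: str
    resolution: str


def validate_settings(settings: GenerationSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if not MIN_DURATION_S <= settings.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds, "
            f"got {settings.duration_s}"
        )
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if settings.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {settings.resolution}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range setting at once.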
Availability and requirements
| Requirement | Details |
|---|---|
| Plan | Creator plan or above |
| Email verification | Business domain verification required |
How to use
Verify your business email
Before accessing Seedance 2, complete business domain email verification in your account settings.
Choose your generation mode
Select Text to Video, Image to Video, First and Last Frame, or References depending on your workflow.
Add references (optional)
In References mode, upload up to 9 images, 3 video clips, and 3 audio clips to guide the output.
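When references are assembled in a script, the per-type caps (9 images, 3 video clips, 3 audio clips) can be enforced before upload. This is an illustrative sketch only: `check_reference_counts` and the `(kind, path)` pair format are assumptions, not part of any documented API.

```python
# Per-type caps stated for References mode.
REFERENCE_LIMITS = {"image": 9, "video": 3, "audio": 3}


def check_reference_counts(references: list[tuple[str, str]]) -> dict[str, int]:
    """Count references per kind and raise ValueError if any cap is exceeded.

    `references` is a list of (kind, path) pairs, where kind is one of
    "image", "video", or "audio". This pair format is hypothetical.
    """
    counts = {kind: 0 for kind in REFERENCE_LIMITS}
    for kind, _path in references:
        if kind not in REFERENCE_LIMITS:
            raise ValueError(f"unknown reference kind: {kind!r}")
        counts[kind] += 1

    over = {k: c for k, c in counts.items() if c > REFERENCE_LIMITS[k]}
    if over:
        details = ", ".join(
            f"{k}: {c} > {REFERENCE_LIMITS[k]}" for k, c in over.items()
        )
        raise ValueError(f"too many references ({details})")
    return counts
```

For example, `check_reference_counts([("image", "hero.png"), ("audio", "theme.mp3")])` returns the per-kind counts, while a fourth video clip raises `ValueError`.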
Prompting tips
- Describe audio explicitly — “With the sound of a violin playing softly in the background” or “city traffic noise in the distance” directly influences the audio generation.
- Use audio references for music style — Upload a short audio clip in References mode to anchor the musical style and tempo of the generated audio.
- First-and-Last-Frame for precise transitions — Define your opening and closing images; write the prompt around motion style and atmosphere rather than restating what’s in the frames.
- Multi-shot: use transition cues — “THEN CUT TO:” or “The camera pulls back to reveal…” helps Seedance 2 understand shot structure.
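The multi-shot and explicit-audio tips can be combined in a small prompt-assembly helper. The sketch below is only one way to structure such a prompt: `build_multishot_prompt` is a hypothetical helper, and whether a single cue style works best for every scene is an assumption.

```python
def build_multishot_prompt(
    shots: list[str],
    audio: str = "",
    cue: str = "THEN CUT TO:",
) -> str:
    """Join shot descriptions with a transition cue and append an audio line."""
    # Drop empty entries so stray blanks do not produce doubled cues.
    cleaned = [shot.strip() for shot in shots if shot.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty shot description is required")

    prompt = f" {cue} ".join(cleaned)
    if audio.strip():
        # State the audio explicitly, as the first tip recommends.
        prompt = f"{prompt} {audio.strip()}"
    return prompt


prompt = build_multishot_prompt(
    ["A chef plates a dessert.", "Guests applaud at the table."],
    audio="Soft jazz in the background.",
)
print(prompt)
# A chef plates a dessert. THEN CUT TO: Guests applaud at the table. Soft jazz in the background.
```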
Example prompts
A musician plays acoustic guitar on a rooftop at sunset. The camera slowly orbits around them. Warm orange light, city skyline in background. Guitar melody generated naturally with the visuals. 10 seconds.
FIRST FRAME: woman standing at a window looking out at rain. LAST FRAME: woman smiling, holding a warm mug. Generate the transition — mood shift from pensive to content. Soft piano music.
Seedance family comparison
| Model | Audio | References | Duration | Speed | Best for |
|---|---|---|---|---|---|
| Seedance 2 | Yes | 9 img + 3 vid + 3 audio | 4–15s | Standard | Max quality, full multimodal |
| Seedance 2 Fast | Yes | 9 img + 3 vid + 3 audio | 4–15s | Fast | Rapid iteration, pipelines |
| Seedance 1.5 Pro | Yes (lip-sync) | Image input | 4–12s | Standard | Multilingual dialogue |
| Seedance 1.0 Pro | No | Image input | 5–10s | Standard | Cinematic storytelling |
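The comparison table can also be encoded as data for scripted model selection. This is a rough sketch: `ModelInfo` and `candidates` are hypothetical names, and the `multi_reference` flag is an interpretation of the References column (image, video, and audio references together versus image input only).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    audio: bool
    min_s: int
    max_s: int
    multi_reference: bool  # image + video + audio references supported


# Rows transcribed from the comparison table.
MODELS = [
    ModelInfo("Seedance 2", True, 4, 15, True),
    ModelInfo("Seedance 2 Fast", True, 4, 15, True),
    ModelInfo("Seedance 1.5 Pro", True, 4, 12, False),
    ModelInfo("Seedance 1.0 Pro", False, 5, 10, False),
]


def candidates(
    duration_s: int,
    need_audio: bool = False,
    need_multi_reference: bool = False,
) -> list[str]:
    """Return the names of models whose table row satisfies the requirements."""
    return [
        m.name
        for m in MODELS
        if m.min_s <= duration_s <= m.max_s
        and (m.audio or not need_audio)
        and (m.multi_reference or not need_multi_reference)
    ]
```

For instance, `candidates(14, need_audio=True)` narrows the choice to Seedance 2 and Seedance 2 Fast, since only those rows cover durations beyond 12 seconds.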

