VIDEO MODEL by ByteDance Released February 2026

# Seedance 2

ByteDance's most advanced video model — native audio-video joint generation, four generation modes including first-and-last-frame and full references mode, up to 9 reference images, 3 reference videos, and 3 audio clips, with up to 15 seconds of output at 720p–1080p.

References

9 img + 3 vid + 3 audio

Duration

4–15 seconds

Audio

Dialogue + SFX + Music

Modes

4 generation modes

Seedance 2 is also available as a [Fast variant](/ai-models/video/seedance-2-fast) with the same architecture but lower latency. Use Fast for rapid iteration and Seedance 2 for maximum-quality final renders.

## ByteDance's most capable video model

Seedance 2, released February 10, 2026, is built on the Dual-Branch Diffusion Transformer (DB-DiT) architecture, a significant advancement over the Seedance 1 generation. The model generates audio and video jointly in a single pass, with audio (dialogue, sound effects, music) synchronized at the frame level with the visual output.

The references system is the most expansive in the Seedance lineup: up to 9 reference images, 3 reference video clips, and 3 reference audio clips can be provided simultaneously, giving fine-grained control over visual style, character appearance, motion patterns, and audio atmosphere.

## Generation modes

* **Text to Video:** Generate video directly from a text prompt. Describe the scene, motion, camera behavior, and audio environment, and Seedance 2 generates the complete audio-visual output.
* **Image to Video:** Animate a reference image with the motion you describe. Camera behavior, lighting changes, and audio elements are added during generation.
* **First and Last Frame:** Define both the opening and closing frames. Seedance 2 generates the motion, lighting, and audio between them for precise transition control.
* **References:** Use up to 9 images, 3 video clips, and 3 audio clips as simultaneous references for maximum creative direction over every aspect of the output.

## Capabilities

* Audio and video are generated in a single pass: dialogue, sound effects, and music are synchronized at the frame level without post-processing.
* Subject identity, visual style, and scene logic are maintained across shots and transitions within a single generation.
* 9 reference images + 3 reference videos + 3 reference audio clips: the most comprehensive reference input system in the lineup.
* Complex camera movements, including dolly, zoom, pan, tracking, and crane shots, with cinematographic accuracy.
* Up to 15 seconds of output at 720p–1080p, suitable for narrative sequences, commercial spots, and music video segments.
* The Dual-Branch Diffusion Transformer processes the visual and audio branches simultaneously for coherent joint generation.

## Specifications

| Feature                  | Details                                    |
| ------------------------ | ------------------------------------------ |
| **Developer**            | ByteDance                                  |
| **Released**             | February 10, 2026                          |
| **Architecture**         | Dual-Branch Diffusion Transformer (DB-DiT) |
| **Resolution**           | 720p–1080p                                 |
| **Duration**             | 4–15 seconds                               |
| **Aspect ratios**        | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16            |
| **Audio**                | Dialogue, SFX, music (native)              |
| **Max reference images** | 9                                          |
| **Max reference videos** | 3                                          |
| **Max reference audio**  | 3                                          |
| **Generation modes**     | 4                                          |

## Availability and requirements

| Requirement            | Details                               |
| ---------------------- | ------------------------------------- |
| **Plan**               | Creator plan or above                 |
| **Email verification** | Business domain verification required |

## How to use

Before accessing Seedance 2, complete business domain email verification in your account settings.

1. Log into ImagineArt and go to the **AI Video Generator**.
2. Choose **Seedance 2** from the model dropdown. Confirm your plan is Creator or above.
3. Select **Text to Video**, **Image to Video**, **First and Last Frame**, or **References** depending on your workflow.
4. In References mode, upload up to 9 images, 3 video clips, and 3 audio clips to guide the output.
5. Describe the scene, subject, motion, camera behavior, and audio atmosphere.
6. Click **Generate** to create your video with synchronized audio.

## Prompting tips

* **Describe audio explicitly:** Phrases such as "With the sound of a violin playing softly in the background" or "city traffic noise in the distance" directly influence the audio generation.
* **Use audio references for music style:** Upload a short audio clip in References mode to anchor the musical style and tempo of the generated audio.
* **First-and-Last-Frame for precise transitions:** Define your opening and closing images; write the prompt around motion style and atmosphere rather than restating what's in the frames.
* **Multi-shot: use transition cues:** "THEN CUT TO:" or "The camera pulls back to reveal..." helps Seedance 2 understand shot structure.

### Example prompts

> A musician plays acoustic guitar on a rooftop at sunset. The camera slowly orbits around them. Warm orange light, city skyline in background. Guitar melody generated naturally with the visuals. 10 seconds.

> FIRST FRAME: woman standing at a window looking out at rain. LAST FRAME: woman smiling, holding a warm mug. Generate the transition: mood shift from pensive to content. Soft piano music.

## Seedance family comparison

| Model                                                 | Audio          | References              | Duration | Speed    | Best for                     |
| ----------------------------------------------------- | -------------- | ----------------------- | -------- | -------- | ---------------------------- |
| **Seedance 2**                                        | Yes            | 9 img + 3 vid + 3 audio | 4–15s    | Standard | Max quality, full multimodal |
| [Seedance 2 Fast](/ai-models/video/seedance-2-fast)   | Yes            | 9 img + 3 vid + 3 audio | 4–15s    | Fast     | Rapid iteration, pipelines   |
| [Seedance 1.5 Pro](/ai-models/video/seedance-1-5-pro) | Yes (lip-sync) | Image input             | 4–12s    | Standard | Multilingual dialogue        |
| [Seedance 1.0 Pro](/ai-models/video/seedance-1-0-pro) | No             | Image input             | 5–10s    | Standard | Cinematic storytelling       |
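
### Checking a job against the documented limits

If you prepare Seedance 2 jobs in a script before entering them in the generator, the limits in the Specifications table can be checked up front. The following is a minimal sketch, not an official client or schema. The `GenerationRequest` class, the `validate` function, the mode identifiers, and the default values are hypothetical names introduced for illustration. The exact image counts for Image to Video (1) and First and Last Frame (2) are assumptions inferred from the mode descriptions. Only the numeric limits and aspect ratios come from this page.

```python
from dataclasses import dataclass, field

# Limits taken from the Specifications table on this page.
MAX_REFERENCES = {"images": 9, "videos": 3, "audio": 3}
MIN_SECONDS, MAX_SECONDS = 4, 15
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}

# Hypothetical mode identifiers; the page only gives the UI labels.
# Value = exact image count ASSUMED for that mode, or None for no fixed count.
MODE_IMAGE_COUNT = {
    "text_to_video": None,
    "image_to_video": 1,
    "first_and_last_frame": 2,
    "references": None,
}


@dataclass
class GenerationRequest:
    """Hypothetical container for one generation job (not an official schema)."""
    prompt: str
    mode: str = "text_to_video"
    duration_seconds: int = 5      # illustrative default, not documented
    aspect_ratio: str = "16:9"     # illustrative default, not documented
    images: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)
    audio: list[str] = field(default_factory=list)


def validate(request: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the job fits the limits."""
    problems = []
    if not request.prompt.strip():
        problems.append("prompt is empty")
    if request.mode not in MODE_IMAGE_COUNT:
        problems.append(f"unknown mode: {request.mode}")
    if not MIN_SECONDS <= request.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if request.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {request.aspect_ratio}")
    # Reference counts: 9 images, 3 videos, 3 audio clips at most.
    for kind, limit in MAX_REFERENCES.items():
        count = len(getattr(request, kind))
        if count > limit:
            problems.append(f"{kind}: {count} references exceeds limit of {limit}")
    # Assumed per-mode image count (see the note above the code).
    expected = MODE_IMAGE_COUNT.get(request.mode)
    if expected is not None and len(request.images) != expected:
        problems.append(f"{request.mode} expects exactly {expected} image(s)")
    return problems


# Example: a References job with one image more than the documented maximum.
request = GenerationRequest(
    prompt="A musician plays acoustic guitar on a rooftop at sunset.",
    mode="references",
    duration_seconds=10,
    images=["style.png"] * 10,
)
print(validate(request))
```

The example job is flagged for its image count only, since its duration and aspect ratio are within the documented ranges. Returning a list of problems instead of raising on the first one lets you report every issue with a job at once.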