VIDEO MODEL | by ByteDance | Released February 2026

Seedance 2

ByteDance’s most advanced video model — native audio-video joint generation, four generation modes including first-and-last-frame and full references mode, up to 9 reference images, 3 reference videos, and 3 audio clips, with up to 15 seconds of output at 720p–1080p.

References: 9 img + 3 vid + 3 audio
Duration: 4–15 seconds
Audio: Dialogue + SFX + Music
Modes: 4 generation modes
Seedance 2 is available in a Fast variant with the same architecture but lower latency — use Fast for rapid iteration and Seedance 2 for maximum quality final renders.

ByteDance’s most capable video model

Seedance 2, released February 10, 2026, is built on the Dual-Branch Diffusion Transformer (DB-DiT) architecture — a significant advancement over the Seedance 1 generation. The model generates audio and video jointly in a single pass, with audio (dialogue, sound effects, music) synchronized at the frame level with the visual output. The references system is the most expansive in the Seedance lineup: up to 9 reference images, 3 reference video clips, and 3 reference audio clips can be provided simultaneously, giving exhaustive creative control over visual style, character appearance, motion patterns, and audio atmosphere.

Generation modes

Text to Video

Generate video directly from a text prompt. Describe scene, motion, camera behavior, and audio environment — Seedance 2 generates the complete audio-visual output.

Image to Video

Animate a reference image with described motion. Camera behavior, lighting changes, and audio elements are added during generation.

First and Last Frame

Define both the opening and closing frames — Seedance 2 generates the motion, lighting, and audio between them for precise transition control.

References Mode

Use up to 9 images, 3 video clips, and 3 audio clips as simultaneous references for maximum creative direction over every aspect of the output.
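As a rough illustration of how the four modes relate to the inputs you have on hand, the sketch below suggests a mode from the supplied assets. The `Mode` enum, the `suggest_mode` function, and the selection rules are hypothetical and written only for this example; they are not part of any ImagineArt or ByteDance API, and in the product you pick the mode yourself.

```python
from enum import Enum
from typing import Optional, Sequence


class Mode(Enum):
    """The four generation modes named on this page."""
    TEXT_TO_VIDEO = "Text to Video"
    IMAGE_TO_VIDEO = "Image to Video"
    FIRST_AND_LAST_FRAME = "First and Last Frame"
    REFERENCES = "References"


def suggest_mode(
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    reference_images: Sequence[str] = (),
    reference_videos: Sequence[str] = (),
    reference_audio: Sequence[str] = (),
) -> Mode:
    """Suggest a mode from the available inputs (illustrative heuristic only)."""
    # Any reference asset points to References mode.
    if reference_images or reference_videos or reference_audio:
        return Mode.REFERENCES
    # Both an opening and a closing frame point to First and Last Frame.
    if first_frame and last_frame:
        return Mode.FIRST_AND_LAST_FRAME
    # A single starting image points to Image to Video.
    if first_frame:
        return Mode.IMAGE_TO_VIDEO
    if last_frame:
        raise ValueError("a last frame needs a matching first frame")
    # With no visual inputs, only the text prompt drives the generation.
    return Mode.TEXT_TO_VIDEO


print(suggest_mode(first_frame="window.png", last_frame="mug.png").value)
```

The order of the checks is a design choice for this sketch: reference assets take priority because References mode is the only one described as accepting video and audio inputs.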

Capabilities

Native audio-video joint generation

Audio and video generated in a single pass — dialogue, sound effects, and music synchronized at the frame level without post-processing.

Multi-shot narrative coherence

Maintains subject identity, visual style, and scene logic across shots and transitions within a single generation.

Exhaustive reference system

9 reference images + 3 reference videos + 3 reference audio clips — the most comprehensive reference input system in the lineup.

Advanced camera control

Complex camera movements including dolly, zoom, pan, tracking, and crane shots with cinematographic accuracy.

Up to 15 seconds

Extended generation window at 720p–1080p — suitable for narrative sequences, commercial spots, and music video segments.

DB-DiT architecture

Dual-Branch Diffusion Transformer processes visual and audio branches simultaneously for coherent joint generation.

Specifications

| Feature | Details |
| --- | --- |
| Developer | ByteDance |
| Released | February 10, 2026 |
| Architecture | Dual-Branch Diffusion Transformer (DB-DiT) |
| Resolution | 720p–1080p |
| Duration | 4–15 seconds |
| Aspect ratios | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 |
| Audio | Dialogue, SFX, music (native) |
| Max reference images | 9 |
| Max reference videos | 3 |
| Max reference audio | 3 |
| Generation modes | 4 |
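If you prepare generation settings in a script before entering them in the generator, a small pre-flight check against the limits listed above can catch mistakes early. The sketch below is a minimal example under that assumption: `GenerationSettings` and `validate` are hypothetical names invented here, and only the numeric limits and aspect ratios come from the specifications.

```python
from dataclasses import dataclass, field
from typing import List

# Limits copied from the specifications on this page.
MIN_DURATION_S = 4
MAX_DURATION_S = 15
MAX_REFERENCE_IMAGES = 9
MAX_REFERENCE_VIDEOS = 3
MAX_REFERENCE_AUDIO = 3
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}


@dataclass
class GenerationSettings:
    """Hypothetical container for one planned generation."""
    duration_s: int
    aspect_ratio: str
    reference_images: List[str] = field(default_factory=list)
    reference_videos: List[str] = field(default_factory=list)
    reference_audio: List[str] = field(default_factory=list)


def validate(settings: GenerationSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not MIN_DURATION_S <= settings.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds"
        )
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if len(settings.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if len(settings.reference_videos) > MAX_REFERENCE_VIDEOS:
        problems.append(f"at most {MAX_REFERENCE_VIDEOS} reference videos")
    if len(settings.reference_audio) > MAX_REFERENCE_AUDIO:
        problems.append(f"at most {MAX_REFERENCE_AUDIO} reference audio clips")
    return problems


print(validate(GenerationSettings(duration_s=10, aspect_ratio="16:9")))
```

Resolution is left out of the check on purpose: the page gives a 720p–1080p range without listing the exact selectable values.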

Availability and requirements

| Requirement | Details |
| --- | --- |
| Plan | Creator plan or above |
| Email verification | Business domain verification required |

How to use

1. Verify your business email

Before accessing Seedance 2, complete business domain email verification in your account settings.

2. Open the AI Video Generator

Log into ImagineArt and go to the AI Video Generator.

3. Select Seedance 2

Choose Seedance 2 from the model dropdown. Confirm your plan is Creator or above.

4. Choose your generation mode

Select Text to Video, Image to Video, First and Last Frame, or References depending on your workflow.

5. Add references (optional)

In References mode, upload up to 9 images, 3 video clips, and 3 audio clips to guide the output.

6. Write your prompt

Describe the scene, subject, motion, camera behavior, and audio atmosphere.

7. Generate

Click Generate to create your video with synchronized audio.

Prompting tips

  • Describe audio explicitly — “With the sound of a violin playing softly in the background” or “city traffic noise in the distance” directly influences the audio generation.
  • Use audio references for music style — Upload a short audio clip in References mode to anchor the musical style and tempo of the generated audio.
  • First-and-Last-Frame for precise transitions — Define your opening and closing images; write the prompt around motion style and atmosphere rather than restating what’s in the frames.
  • Multi-shot: use transition cues — “THEN CUT TO:” or “The camera pulls back to reveal…” helps Seedance 2 understand shot structure.
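The tips above can be combined into a small prompt-assembly helper. This is a sketch only: `build_prompt` is a hypothetical function that concatenates text, it assumes the “THEN CUT TO:” cue is written inline between shots, and it appends the duration as plain text the way the first example prompt on this page does. It does not call any ImagineArt service.

```python
from typing import Optional, Sequence


def _sentence(text: str) -> str:
    """Trim whitespace and make sure the text ends with sentence punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


def build_prompt(
    shots: Sequence[str],
    audio: Optional[str] = None,
    duration_s: Optional[int] = None,
) -> str:
    """Join shot descriptions with a transition cue and add explicit audio."""
    cleaned = [_sentence(shot) for shot in shots if shot.strip()]
    if not cleaned:
        raise ValueError("at least one shot description is required")
    # Multi-shot tip: separate shots with an explicit transition cue.
    parts = [" THEN CUT TO: ".join(cleaned)]
    # Audio tip: describe the sound explicitly rather than leaving it implied.
    if audio and audio.strip():
        parts.append(_sentence(audio))
    if duration_s is not None:
        parts.append(f"{duration_s} seconds.")
    return " ".join(parts)


prompt = build_prompt(
    shots=[
        "A musician plays acoustic guitar on a rooftop at sunset",
        "The camera pulls back to reveal the city skyline",
    ],
    audio="Soft guitar melody with city traffic noise in the distance",
    duration_s=10,
)
print(prompt)
```

Keeping shots, audio, and duration as separate arguments makes it easy to vary one element between iterations while holding the rest of the prompt fixed.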

Example prompts

A musician plays acoustic guitar on a rooftop at sunset. The camera slowly orbits around them. Warm orange light, city skyline in background. Guitar melody generated naturally with the visuals. 10 seconds.
FIRST FRAME: woman standing at a window looking out at rain. LAST FRAME: woman smiling, holding a warm mug. Generate the transition — mood shift from pensive to content. Soft piano music.

Seedance family comparison

| Model | Audio | References | Duration | Speed | Best for |
| --- | --- | --- | --- | --- | --- |
| Seedance 2 | Yes | 9 img + 3 vid + 3 audio | 4–15s | Standard | Max quality, full multimodal |
| Seedance 2 Fast | Yes | 9 img + 3 vid + 3 audio | 4–15s | Fast | Rapid iteration, pipelines |
| Seedance 1.5 Pro | Yes (lip-sync) | Image input | 4–12s | Standard | Multilingual dialogue |
| Seedance 1.0 Pro | No | Image input | 5–10s | Standard | Cinematic storytelling |
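To make the comparison easier to apply, the table can be encoded as data and filtered by requirement. The sketch below is illustrative: the `SeedanceModel` class and `candidates` function are hypothetical, and the values are transcribed from the table above rather than read from any API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SeedanceModel:
    name: str
    audio: bool
    full_references: bool  # True: 9 img + 3 vid + 3 audio; False: image input
    min_s: int
    max_s: int
    fast: bool


# Rows transcribed from the family comparison table.
MODELS = [
    SeedanceModel("Seedance 2", True, True, 4, 15, False),
    SeedanceModel("Seedance 2 Fast", True, True, 4, 15, True),
    SeedanceModel("Seedance 1.5 Pro", True, False, 4, 12, False),
    SeedanceModel("Seedance 1.0 Pro", False, False, 5, 10, False),
]


def candidates(
    duration_s: int,
    need_audio: bool = False,
    need_full_references: bool = False,
) -> List[str]:
    """Return the names of models whose table row satisfies the requirements."""
    return [
        m.name
        for m in MODELS
        if m.min_s <= duration_s <= m.max_s
        and (m.audio or not need_audio)
        and (m.full_references or not need_full_references)
    ]


print(candidates(14, need_audio=True))
```

Choosing between the standard and Fast variants is left to the caller, since the page recommends Fast for iteration and Seedance 2 for final renders.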