# Wan 2.6

<div style={{background: "linear-gradient(135deg, #00080f 0%, #001a3a 55%, #000812 100%)", borderRadius: "20px", padding: "3.5rem 3rem 3rem", marginBottom: "2.5rem", overflow: "hidden", position: "relative"}}>
  <div style={{position: "absolute", inset: "0", background: "radial-gradient(ellipse at 65% 20%, rgba(0,100,255,0.18) 0%, transparent 55%), radial-gradient(ellipse at 10% 75%, rgba(124,0,251,0.12) 0%, transparent 50%)", pointerEvents: "none"}} />

  <div style={{position: "relative"}}>
    <div style={{display: "flex", gap: "0.5rem", marginBottom: "1.5rem", flexWrap: "wrap"}}>
      <span style={{background: "rgba(0,80,200,0.3)", border: "1px solid rgba(0,100,255,0.4)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "#7eb8ff", fontWeight: "500", letterSpacing: "0.06em"}}>VIDEO MODEL</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>by Alibaba</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>Wan family</span>
    </div>

    <h1 style={{fontSize: "clamp(2.5rem, 5vw, 3.75rem)", fontWeight: "700", color: "#ffffff", lineHeight: "1.1", letterSpacing: "-0.025em", margin: "0 0 1.1rem 0"}}>Wan 2.6</h1>
    <p style={{fontSize: "1.1rem", color: "rgba(255,255,255,0.52)", maxWidth: "580px", lineHeight: "1.7", marginBottom: "2.25rem"}}>Alibaba's reference-to-video model: insert a character's appearance and voice from reference inputs, generate multi-shot narratives with synchronized audio, and produce clips of up to 15 seconds at up to 1080p with precise lip-sync. Built for character-centric, multilingual, and audio-synchronized production.</p>

    <div style={{display: "flex", gap: "0.75rem", flexWrap: "wrap"}}>
      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Duration</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Up to 15 seconds</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Resolution</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>720p–1080p</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Audio</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>SFX + Music + Lip-sync</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Frame rate</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>24 FPS</div>
      </div>
    </div>
  </div>
</div>

## Reference-to-video: put real characters in any scene

Wan 2.6's headline capability is its R2V (reference-to-video) mode: upload a reference image of a character, and Wan 2.6 renders that character's appearance consistently in a generated scene. Add a voice reference and both the character's face and voice can be preserved in the output, which makes Wan 2.6 well suited to creator-centric workflows where a personal or brand character must appear in generated content.

Wan 2.6 also upgrades text-to-video, image-to-video, and audio-to-video generation, each with one-pass audio/video synchronization and precise lip-sync.

## Capabilities

<CardGroup cols={3}>
  <Card title="Reference-to-video (R2V)" icon="user">
    Insert a character's appearance from a reference image — and optionally their voice — into any generated scene with consistent identity preservation.
  </Card>

  <Card title="One-pass A/V synchronization" icon="music">
    Audio and video are generated in a single pass: sound effects, music, and voice come out synchronized with the picture, with no post-production step.
  </Card>

  <Card title="Precise lip-sync" icon="waveform">
    Character lip movements synchronized accurately with generated or reference audio — suitable for dialogue-driven content.
  </Card>

  <Card title="Multi-shot storytelling" icon="clapperboard">
    Generates coherent multi-shot sequences from simple prompts — scene transitions, character continuity, and narrative flow maintained automatically.
  </Card>

  <Card title="Up to 15 seconds" icon="clock">
    One of the longer generation windows in the lineup. Choose a 5-, 10-, or 15-second duration, leaving room for more developed narrative sequences.
  </Card>

  <Card title="Multiple generation modes" icon="layer-group">
    Text-to-video, image-to-video, audio-to-video, and reference-to-video all supported in a single model.
  </Card>
</CardGroup>

## Generation modes

| Mode                         | Description                                                     |
| ---------------------------- | --------------------------------------------------------------- |
| **Text-to-video**            | Generate video from a text prompt with A/V sync                 |
| **Image-to-video**           | Animate an input image with motion and audio                    |
| **Reference-to-video (R2V)** | Insert a character's appearance and voice from reference inputs |
| **Audio-to-video**           | Generate matching visuals from an audio reference               |
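
Each mode in the table expects different inputs. As a minimal sketch, assuming a hypothetical client-side request type (the field names `imageUrl`, `characterImageUrl`, `voiceAudioUrl`, and `audioUrl` are illustrative, not an ImagineArt API), the four modes can be modeled as a TypeScript discriminated union, so that an R2V request cannot be built without a character image while its voice clip stays optional:

```typescript
// Hypothetical request shape for illustration; NOT the ImagineArt API.
// Each variant mirrors one row of the Generation modes table.
type GenerationRequest =
  | { mode: "text-to-video"; prompt: string }
  | { mode: "image-to-video"; prompt?: string; imageUrl: string }
  | {
      mode: "reference-to-video";
      prompt: string; // describes the scene, not the character
      characterImageUrl: string; // required: the character's appearance
      voiceAudioUrl?: string; // optional: the character's voice
    }
  | { mode: "audio-to-video"; prompt?: string; audioUrl: string };

// Collects the media files a request carries, e.g. to upload them first.
function attachedMedia(req: GenerationRequest): string[] {
  if (req.mode === "image-to-video") return [req.imageUrl];
  if (req.mode === "audio-to-video") return [req.audioUrl];
  if (req.mode === "reference-to-video") {
    return req.voiceAudioUrl
      ? [req.characterImageUrl, req.voiceAudioUrl]
      : [req.characterImageUrl];
  }
  return []; // text-to-video needs no media, only the prompt
}

const r2v: GenerationRequest = {
  mode: "reference-to-video",
  prompt: "The character plates a dish in a busy restaurant kitchen.",
  characterImageUrl: "chef.png",
};
console.log(attachedMedia(r2v)); // [ 'chef.png' ]
```

The union makes a missing required input a compile-time error rather than a failed generation.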

## Specifications

| Feature           | Details                                |
| ----------------- | -------------------------------------- |
| **Developer**     | Alibaba (Wan Video)                    |
| **Resolution**    | 720p, 1080p                            |
| **Duration**      | 5, 10, or 15 seconds                   |
| **Frame rate**    | 24 FPS                                 |
| **Aspect ratios** | 16:9, 9:16, 1:1, 4:3, 3:4              |
| **Audio**         | Synchronized SFX, music, and lip-sync  |
| **R2V**           | Character appearance + voice insertion |
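
The specification values can double as client-side validation. The sketch below is a hypothetical helper (the names `checkSettings` and `frameCount` are illustrative, not part of ImagineArt) that checks requested output settings against the table and derives the frame count from the documented 24 FPS, so a 15-second clip works out to 15 × 24 = 360 frames:

```typescript
// Values transcribed from the Specifications table. The helper itself is a
// hypothetical client-side check, not an ImagineArt API.
const DURATIONS_S: readonly number[] = [5, 10, 15];
const RESOLUTIONS: readonly string[] = ["720p", "1080p"];
const ASPECT_RATIOS: readonly string[] = ["16:9", "9:16", "1:1", "4:3", "3:4"];
const FPS = 24;

interface OutputSettings {
  durationS: number;
  resolution: string;
  aspectRatio: string;
}

// Returns one message per unsupported value; an empty array means every value is listed.
function checkSettings(s: OutputSettings): string[] {
  const problems: string[] = [];
  if (DURATIONS_S.indexOf(s.durationS) === -1) {
    problems.push(`duration ${s.durationS}s is not 5, 10, or 15`);
  }
  if (RESOLUTIONS.indexOf(s.resolution) === -1) {
    problems.push(`resolution ${s.resolution} is not 720p or 1080p`);
  }
  if (ASPECT_RATIOS.indexOf(s.aspectRatio) === -1) {
    problems.push(`aspect ratio ${s.aspectRatio} is not in the table`);
  }
  return problems;
}

// Frame count at the documented frame rate.
function frameCount(durationS: number): number {
  return durationS * FPS;
}

console.log(frameCount(15)); // 360
console.log(checkSettings({ durationS: 12, resolution: "1080p", aspectRatio: "21:9" }));
```

Returning a list of problems rather than throwing on the first one lets a form report every unsupported setting at once.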

## How to use

<Tabs>
  <Tab title="Reference-to-video">
    <Steps>
      <Step title="Open the AI Video Generator">
        Log into ImagineArt and go to the **AI Video Generator**.
      </Step>

      <Step title="Select Wan 2.6">
        Choose **Wan 2.6** from the model dropdown.
      </Step>

      <Step title="Select R2V mode">
        Choose the **Reference-to-Video** generation mode.
      </Step>

      <Step title="Upload character reference">
        Upload a reference image of the character to use. Optionally, upload a voice reference audio clip.
      </Step>

      <Step title="Describe the scene">
        Write a prompt describing the scene, environment, action, and audio atmosphere around your character.
      </Step>

      <Step title="Generate">
        Click **Generate**. Wan 2.6 places your referenced character into the generated scene with synchronized audio.
      </Step>
    </Steps>
  </Tab>

  <Tab title="Text to video">
    <Steps>
      <Step title="Open the AI Video Generator">
        Go to the **AI Video Generator** and select **Wan 2.6**.
      </Step>

      <Step title="Write your prompt">
        Describe the scene, subjects, motion, and audio cues. For multi-shot output, add transition cues between shots.
      </Step>

      <Step title="Set duration and resolution">
        Choose 5, 10, or 15 seconds at your target resolution.
      </Step>

      <Step title="Generate">
        Click **Generate** for audio-synced video.
      </Step>
    </Steps>
  </Tab>
</Tabs>

## Prompting tips

* **R2V: describe the scene, not the character.** The reference image provides the character, so your prompt should focus on the setting, action, camera, and audio environment.
* **Include audio cues for one-pass sync.** Cues such as "A jazz trio plays softly in the background" or "footsteps echo on the marble floor" feed directly into the audio generation.
* **Multi-shot: use transition language.** Phrases such as "THEN CUT TO:" or "The camera pulls back to reveal..." cue structured multi-shot generation.
* **Use 15-second clips for narratives.** The full 15-second window gives a storyline room for a beginning, middle, and resolution within one generation.
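
The multi-shot and audio-cue tips can be combined mechanically. The following is a minimal sketch of a hypothetical prompt-building helper (`buildPrompt` is not an ImagineArt feature) that joins shot descriptions with the "THEN CUT TO:" cue, appends audio cues, and ends with the target duration, as in the example prompts. It only assembles text; how the cues are interpreted is up to Wan 2.6.

```typescript
// Hypothetical helper: assembles a multi-shot prompt from parts.
// "THEN CUT TO:" is the transition cue suggested in the prompting tips.
function buildPrompt(shots: string[], audioCues: string[], durationS?: number): string {
  if (shots.length === 0) {
    throw new Error("at least one shot description is required");
  }
  // Shots first, separated by the transition cue.
  const parts: string[] = [shots.map((s) => s.trim()).join(" THEN CUT TO: ")];
  // Audio cues go in the same prompt so one-pass generation can pick them up.
  for (const cue of audioCues) parts.push(cue.trim());
  if (durationS !== undefined) parts.push(`${durationS} seconds.`);
  return parts.join(" ");
}

console.log(
  buildPrompt(
    ["A chef plates a dish.", "Close-up of the finished plate."],
    ["Sizzling pans in the background."],
    10,
  ),
);
// A chef plates a dish. THEN CUT TO: Close-up of the finished plate. Sizzling pans in the background. 10 seconds.
```

For R2V mode, keep the shot descriptions about setting and action, since the reference image already supplies the character.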

### Example prompts

> \[R2V mode] Reference character appears as a chef in a busy restaurant kitchen. The chef plates a dish confidently, a soft smile as they look at the camera. Warm kitchen sounds, sizzling in background. 10 seconds.

> A multilingual brand video: a young woman introduces a product in front of a clean white background. She speaks naturally, hands gesturing. Confident, friendly. 10 seconds, 1080p.

## Compare models

| Model                                                 | R2V | Audio | Lip-sync           | Max duration | Best for                   |
| ----------------------------------------------------- | --- | ----- | ------------------ | ------------ | -------------------------- |
| **Wan 2.6**                                           | Yes | Yes   | Yes                | 15s          | Character reference, A/V   |
| [Wan 2.5](/ai-models/video/wan-2-5)                   | No  | Yes   | Yes                | 10s          | General A/V production     |
| [Wan 2.2](/ai-models/video/wan-2-2)                   | No  | No    | No                 | 5s           | Camera control, style LoRA |
| [Seedance 1.5 Pro](/ai-models/video/seedance-1-5-pro) | No  | Yes   | Yes (multilingual) | 12s          | Multilingual precision     |
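
The comparison table can also be read as a filter. As an illustrative sketch (the `candidates` function and its requirement fields are hypothetical, each model's duration is read as its maximum clip length, and Seedance's "Multilingual" lip-sync entry is treated as supported), the rows are transcribed below and filtered by the capabilities a project needs:

```typescript
// Rows transcribed from the comparison table; the filter is an illustrative
// assumption, not an official recommendation engine.
interface ModelRow {
  name: string;
  r2v: boolean;
  audio: boolean;
  lipSync: boolean;
  maxDurationS: number;
}

const MODELS: ModelRow[] = [
  { name: "Wan 2.6", r2v: true, audio: true, lipSync: true, maxDurationS: 15 },
  { name: "Wan 2.5", r2v: false, audio: true, lipSync: true, maxDurationS: 10 },
  { name: "Wan 2.2", r2v: false, audio: false, lipSync: false, maxDurationS: 5 },
  // "Multilingual" lip-sync in the table is treated as lip-sync supported.
  { name: "Seedance 1.5 Pro", r2v: false, audio: true, lipSync: true, maxDurationS: 12 },
];

interface Needs {
  r2v?: boolean;
  audio?: boolean;
  lipSync?: boolean;
  durationS?: number;
}

// Returns the names of every model in the table that meets all stated needs.
function candidates(needs: Needs): string[] {
  return MODELS.filter(
    (m) =>
      (!needs.r2v || m.r2v) &&
      (!needs.audio || m.audio) &&
      (!needs.lipSync || m.lipSync) &&
      (needs.durationS === undefined || m.maxDurationS >= needs.durationS),
  ).map((m) => m.name);
}

console.log(candidates({ r2v: true })); // [ 'Wan 2.6' ]
console.log(candidates({ lipSync: true, durationS: 12 })); // [ 'Wan 2.6', 'Seedance 1.5 Pro' ]
```

Unspecified needs are left unconstrained, so an empty requirements object returns every model in the table.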

<Tip>
  Choose Wan 2.6 when a specific character needs to appear consistently in generated video. It is the only model in this comparison with R2V, which preserves the character's identity from a reference image and, optionally, their voice from a reference clip.
</Tip>
