# Wan 2.5

<div style={{background: "linear-gradient(135deg, #00080f 0%, #001a3a 55%, #000812 100%)", borderRadius: "20px", padding: "3.5rem 3rem 3rem", marginBottom: "2.5rem", overflow: "hidden", position: "relative"}}>
  <div style={{position: "absolute", inset: "0", background: "radial-gradient(ellipse at 80% 60%, rgba(124,0,251,0.18) 0%, transparent 55%), radial-gradient(ellipse at 15% 20%, rgba(0,100,255,0.1) 0%, transparent 50%)", pointerEvents: "none"}} />

  <div style={{position: "relative"}}>
    <div style={{display: "flex", gap: "0.5rem", marginBottom: "1.5rem", flexWrap: "wrap"}}>
      <span style={{background: "rgba(0,80,200,0.3)", border: "1px solid rgba(0,100,255,0.4)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "#7eb8ff", fontWeight: "500", letterSpacing: "0.06em"}}>VIDEO MODEL</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>by Alibaba</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>Wan family</span>
    </div>

    <h1 style={{fontSize: "clamp(2.5rem, 5vw, 3.75rem)", fontWeight: "700", color: "#ffffff", lineHeight: "1.1", letterSpacing: "-0.025em", margin: "0 0 1.1rem 0"}}>Wan 2.5</h1>
    <p style={{fontSize: "1.1rem", color: "rgba(255,255,255,0.52)", maxWidth: "580px", lineHeight: "1.7", marginBottom: "2.25rem"}}>Alibaba's audio-visual sync model — generates ambient sounds, sound effects, and voice with precise lip-sync alongside the video in a single pass. Supports 480p to 1080p at 5 or 10 seconds with flexible aspect ratios and text or image input.</p>

    <div style={{display: "flex", gap: "0.75rem", flexWrap: "wrap"}}>
      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Resolution</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>480p – 1080p</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Duration</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>5 or 10 seconds</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Audio</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Ambient + SFX + Voice</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Lip-sync</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Yes</div>
      </div>
    </div>
  </div>
</div>

## Audio-visual synchronization in a single pass

Wan 2.5 is Alibaba's dedicated audio-visual synchronization model in the Wan family. Its primary strength is the one-pass A/V generation system — ambient sounds, sound effects, and voice are generated simultaneously with the video, synchronized at the frame level without post-production. Lip-sync support makes it particularly well-suited for content where characters speak, sing, or react expressively to audio.

Wan 2.5 is the general-purpose, audio-capable member of the Wan family for standard A/V production. For reference-to-video (R2V) generation with character insertion and voice reference support, see [Wan 2.6](/ai-models/video/wan-2-6), the successor model with expanded capabilities.

## Capabilities

<CardGroup cols={3}>
  <Card title="One-pass A/V synchronization" icon="music">
    Ambient sounds, sound effects, and voice generated simultaneously with the video — no separate audio editing or syncing required.
  </Card>

  <Card title="Precise lip-sync" icon="waveform">
    Character lip movements synchronized accurately with generated audio — suitable for dialogue, narration, and character-driven clips.
  </Card>

  <Card title="Smooth motion flow" icon="film">
    Consistent subject movement, natural transitions, and fluid camera behavior across the full clip duration.
  </Card>

  <Card title="Flexible resolution" icon="expand">
    480p, 720p, or 1080p — select based on quality requirements and credit budget.
  </Card>

  <Card title="Text and image input" icon="images">
    Supports text prompts, uploaded reference images, or a combination of both for broader creative control.
  </Card>

  <Card title="Multiple aspect ratios" icon="rectangle-wide">
    16:9, 9:16, 1:1, 4:3, and 3:4 — flexible framing for social, cinematic, and standard formats.
  </Card>
</CardGroup>

## Specifications

| Feature           | Details                                |
| ----------------- | -------------------------------------- |
| **Developer**     | Alibaba (Wan Video)                    |
| **Resolution**    | 480p, 720p, 1080p                      |
| **Duration**      | 5 or 10 seconds                        |
| **Aspect ratios** | 16:9, 9:16, 1:1, 4:3, 3:4              |
| **Audio**         | Ambient, SFX, voice (native, one-pass) |
| **Lip-sync**      | Yes                                    |
| **Input modes**   | Text-to-video, image-to-video          |
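
The constraints in this table can be expressed as a small pre-flight check before you queue a generation. The following is a minimal sketch only: `WanSettings` and `validate_settings` are hypothetical names invented for illustration, the default values are arbitrary, and nothing here calls or represents an ImagineArt API. The allowed values are copied from the table.

```python
from dataclasses import dataclass

# Allowed values transcribed from the specifications table.
RESOLUTIONS = ("480p", "720p", "1080p")
DURATIONS_S = (5, 10)
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
INPUT_MODES = ("text-to-video", "image-to-video")


@dataclass(frozen=True)
class WanSettings:
    """Hypothetical settings holder; the defaults are arbitrary, not documented."""
    resolution: str = "720p"
    duration_s: int = 5
    aspect_ratio: str = "16:9"
    input_mode: str = "text-to-video"


def validate_settings(settings: WanSettings) -> list[str]:
    """Return one message per field outside the table; an empty list means all fit."""
    checks = [
        ("resolution", settings.resolution, RESOLUTIONS),
        ("duration_s", settings.duration_s, DURATIONS_S),
        ("aspect_ratio", settings.aspect_ratio, ASPECT_RATIOS),
        ("input_mode", settings.input_mode, INPUT_MODES),
    ]
    return [
        f"{name}={value!r} is not one of {allowed}"
        for name, value, allowed in checks
        if value not in allowed
    ]
```

For example, `validate_settings(WanSettings(duration_s=7))` returns a single message naming `duration_s`, because the duration must be exactly 5 or 10 seconds rather than any value in between.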

## How to use

<Steps>
  <Step title="Open the AI Video Generator">
    Log in to ImagineArt and go to the **AI Video Generator**.
  </Step>

  <Step title="Select Wan 2.5">
    Choose **Wan 2.5** from the model dropdown.
  </Step>

  <Step title="Enter your prompt">
    Write a text prompt, upload a reference image, or combine both. Include explicit motion, mood, and audio cues for best results.
  </Step>

  <Step title="Set duration and resolution">
    Choose **5 or 10 seconds** and your preferred resolution (480p, 720p, or 1080p).
  </Step>

  <Step title="Generate">
    Click **Generate** to produce the video with synchronized audio.
  </Step>

  <Step title="Review and iterate">
    Preview the clip. If it needs changes, adjust your prompt or settings and generate again; otherwise, download it.
  </Step>
</Steps>

## Prompting tips

* **Include audio cues explicitly:** phrases such as "rain in the background," "distant city traffic," or "soft piano music" shape the audio that is generated alongside the video.
* **Describe motion and mood:** be specific about how subjects move and the atmosphere you want. "Slow pan," "bustling city energy," or "tense stillness" all guide the model.
* **Use camera terminology:** "overhead shot," "wide establishing shot," and "slow zoom in" give clear directional cues.
* **Specify lighting:** "golden hour," "low-key studio lighting," or "overcast afternoon" guide the look of the scene.
* **For lip-sync:** describe your character's speech or emotional reaction explicitly to anchor the lip movement generation.
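
If you write many prompts, the tips above can be turned into a small template so that no cue is forgotten. The sketch below is illustrative only: `build_prompt` and its parameter names are hypothetical, and the sentence order it produces is an assumption rather than a format Wan 2.5 is documented to require. It simply joins the non-empty cues into one prompt string that you can paste into the prompt field.

```python
def _sentence(text: str) -> str:
    """Trim, capitalize the first letter, and make sure the text ends with punctuation."""
    text = text.strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def build_prompt(subject, camera="", motion_mood="", lighting="", audio_cues=(), speech=""):
    """Assemble one prompt string from the cue types listed in the prompting tips."""
    # Camera terminology leads the subject, e.g. "Close-up shot: a woman ...".
    opening = f"{camera.strip()}: {subject.strip()}" if camera.strip() else subject
    sentences = [
        _sentence(opening),
        _sentence(motion_mood),
        _sentence(lighting),
        _sentence(", ".join(audio_cues)),  # explicit audio cues
        _sentence(speech),                 # speech or reaction to anchor lip-sync
    ]
    # Skip any cue the caller left empty.
    return " ".join(s for s in sentences if s)
```

A call such as `build_prompt("a man unpacks headphones", camera="smooth dolly shot", audio_cues=["city traffic through open windows"])` yields a two-sentence prompt; cues that are left empty are omitted.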

### Example prompts

> Close-up shot: a woman in a vintage suit sits pensively at a café table. The camera slowly zooms in on her thoughtful expression as she speaks softly. Warm, ambient café sounds — quiet chatter, distant music. 10 seconds, 16:9.

> A young man carefully unpacks a pair of headphones in a modern apartment. Smooth dolly shot, slow zoom in on his focused expression. City ambient sounds through open windows in the background. 10 seconds, 1080p.

## Compare models

| Model                                                 | Audio | Lip-sync     | Duration | R2V | Best for                             |
| ----------------------------------------------------- | ----- | ------------ | -------- | --- | ------------------------------------ |
| **Wan 2.5**                                           | Yes   | Yes          | 10s      | No  | General A/V, lip-sync                |
| [Wan 2.6](/ai-models/video/wan-2-6)                   | Yes   | Yes          | 15s      | Yes | Character reference, voice insertion |
| [Wan 2.2](/ai-models/video/wan-2-2)                   | No    | No           | 5s       | No  | Camera control, LoRA style           |
| [Seedance 1.5 Pro](/ai-models/video/seedance-1-5-pro) | Yes   | Multilingual | 12s      | No  | Multilingual precision lip-sync      |

<Tip>
  Use Wan 2.5 when your project needs both visual impact and audio coherence in a single generation. For character identity preservation with voice reference input, switch to [Wan 2.6](/ai-models/video/wan-2-6), the reference-to-video (R2V) successor model.
</Tip>
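
The comparison table can also be read as a lookup. The sketch below is a hypothetical helper, not part of any ImagineArt tooling: `Model` and `candidates` are invented names, the rows are transcribed from the table, and the Duration column is assumed to be the longest clip each model supports. It is a coarse filter only; for instance, Wan 2.5 accepts exactly 5 or 10 seconds, not every length up to 10.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    audio: bool
    lip_sync: bool
    max_duration_s: int  # assumption: the table's Duration column is a maximum
    r2v: bool            # reference-to-video support


# Rows transcribed from the comparison table ("Multilingual" lip-sync counts as True).
MODELS = [
    Model("Wan 2.5", True, True, 10, False),
    Model("Wan 2.6", True, True, 15, True),
    Model("Wan 2.2", False, False, 5, False),
    Model("Seedance 1.5 Pro", True, True, 12, False),
]


def candidates(duration_s, audio=False, lip_sync=False, r2v=False):
    """Return the names of models that cover the requested length and features."""
    return [
        m.name
        for m in MODELS
        if m.max_duration_s >= duration_s
        and (m.audio or not audio)
        and (m.lip_sync or not lip_sync)
        and (m.r2v or not r2v)
    ]
```

With these rows, asking for a 10-second clip with audio and lip-sync leaves Wan 2.5, Wan 2.6, and Seedance 1.5 Pro as options, while requiring R2V narrows the list to Wan 2.6.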
