Available models
| Model | Aspect ratios | Duration | Description | Best for |
|---|---|---|---|---|
| Google Veo 3 | 9:16, 16:9, 1:1 | 4–8s | Multimodal AI model with native audio, lip-sync, and cinematic prompt control. | Short films, marketing content, audio-visual storytelling |
| Google Veo 3.1 | 9:16, 16:9, 1:1 | 4–8s | Enhanced version of Veo 3 with multi-reference input (up to 3 images), improved interpolation, and 360° camera support. | Product showcases, character-based storytelling, campaign videos |
| Kling 2.1 | 9:16, 16:9, 1:1 | 5–10s | High-fidelity video with smooth motion, realistic character behavior, and strong spatial awareness. Also available as Kling 2.1 Master for higher prompt precision. | Cinematic sequences, multi-character scenes, commercials |
| Kling 2.6 | 1:1, 9:16, 16:9 | 5–10s | Significant upgrade with native audio (dialogue, SFX, music), 1080p output, and support for English and Chinese voice output. | Film scenes, trailers, podcasts, ASMR content |
| Minimax Hailuo 02 | 16:9 | 6s | Cinematic text-to-video model with camera-aware motion (pans, tilts, zooms), up to 1080p resolution, and wide stylistic range. | Filmmaking, marketing campaigns, cinematic short-form content |
| Minimax Hailuo 2.3 | Multiple | 6s | Advanced motion tracking, facial micro-expression detail, expanded stylization options, and improved frame interpolation. | Character animation, product showcases, stylized content |
| PixVerse v5 | 9:16, 16:9, 1:1, 3:4, 4:3 | 5–8s | Faster rendering, sharper visuals, smoother motion, and visual prompting support. | Social media clips, branded motion visuals, concept art |
| Seedance 1.0 | 1:1, 4:3, 16:9, 3:4, 9:16 | 5–10s | ByteDance’s video model with strong multi-shot consistency, cinematic camera styles, and both text-to-video and image-to-video workflows. | Video storytelling, narrative sequences, storyboarding |
| Sora 2 Pro | 9:16, 16:9 | 4–12s | OpenAI’s most advanced video model with integrated audio, physics-aware motion, and strong prompt control. | Narrative content, branded video with audio, complex scenes |
| Wan 2.2 | 9:16, 16:9, 1:1 | 5–10s | MoE diffusion architecture with stable, complex multi-object motion, cinematic aesthetic controls, and a 5B hybrid model. | Cinematic visuals, complex scenes, local prototyping |
| Wan 2.5 | 9:16, 16:9, 1:1 | 5–10s | Latest Wan model with native audio-video synchronization, lip-sync, improved motion flow, and flexible resolution (480p–1080p). | Short clips with audio, storytelling, product and brand videos |
Audio capabilities at a glance
These models generate synchronized audio — including dialogue, ambient sound, and effects — as part of the video generation process:
| Model | Audio type | Lip-sync |
|---|---|---|
| Google Veo 3 | Dialogue, ambiance, SFX | Yes |
| Google Veo 3.1 | Ambient sound, effects | Yes |
| Kling 2.6 | Dialogue, SFX, music | No (in progress) |
| Sora 2 Pro | Dialogue, ambiance, SFX | No |
| Wan 2.5 | Ambient sound, voice | Yes |
Extended videos generated with the Extend Video tool do not include audio, regardless of which model you use.
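The audio table and the Extend Video rule above can be captured as a small lookup. This is a minimal sketch: the dictionary layout and the function names `output_has_audio` and `lip_sync_models` are hypothetical and not part of any product API; the values are transcribed from the table above.

```python
# Audio capabilities transcribed from the table above.
# NATIVE_AUDIO and both helper functions are illustrative names only.
NATIVE_AUDIO = {
    "Google Veo 3": {"audio": ["dialogue", "ambiance", "SFX"], "lip_sync": True},
    "Google Veo 3.1": {"audio": ["ambient sound", "effects"], "lip_sync": True},
    "Kling 2.6": {"audio": ["dialogue", "SFX", "music"], "lip_sync": False},
    "Sora 2 Pro": {"audio": ["dialogue", "ambiance", "SFX"], "lip_sync": False},
    "Wan 2.5": {"audio": ["ambient sound", "voice"], "lip_sync": True},
}


def output_has_audio(model: str, extended: bool = False) -> bool:
    """Return True if a clip from `model` is expected to carry audio.

    Clips produced with the Extend Video tool never include audio,
    regardless of the model.
    """
    if extended:
        return False
    return model in NATIVE_AUDIO


def lip_sync_models() -> list[str]:
    """Return the native-audio models that the table marks as lip-sync capable."""
    return sorted(name for name, caps in NATIVE_AUDIO.items() if caps["lip_sync"])
```

For example, `output_has_audio("Wan 2.5", extended=True)` returns `False`, because the Extend Video rule overrides the model's native audio support.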
Choosing the right model
I need synchronized audio in my video
Use Google Veo 3 or Google Veo 3.1 for full audio including dialogue and lip-sync. Kling 2.6 offers native audio with dialogue and music. Sora 2 Pro provides integrated audio with strong prompt control. Wan 2.5 adds audio with lip-sync capability.
I need the highest motion quality and realism
Use Kling 2.6 (1080p, cinematic action consistency) or Kling 2.1 (smooth motion, realistic character behavior, strong spatial awareness). Sora 2 Pro also delivers physics-aware motion with high fidelity.
I need long videos (up to 10 seconds)
Use Kling 2.1, Kling 2.6, Seedance 1.0, Wan 2.2, Wan 2.5, or Sora 2 Pro. All of them support clips of up to 10 seconds, and Sora 2 Pro extends to 12 seconds.
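Filtering by clip length can be sketched as a range check over the durations in the "Available models" table. The names `DURATION_RANGE` and `models_supporting` are hypothetical, and the sketch assumes every whole-second length inside a model's listed range is accepted, which the table does not confirm.

```python
# (min_seconds, max_seconds) per model, transcribed from the "Available models" table.
# Assumption: any whole-second duration inside the range is allowed.
DURATION_RANGE = {
    "Google Veo 3": (4, 8),
    "Google Veo 3.1": (4, 8),
    "Kling 2.1": (5, 10),
    "Kling 2.6": (5, 10),
    "Minimax Hailuo 02": (6, 6),
    "Minimax Hailuo 2.3": (6, 6),
    "PixVerse v5": (5, 8),
    "Seedance 1.0": (5, 10),
    "Sora 2 Pro": (4, 12),
    "Wan 2.2": (5, 10),
    "Wan 2.5": (5, 10),
}


def models_supporting(seconds: int) -> list[str]:
    """Return the models whose listed duration range includes `seconds`."""
    return sorted(
        name
        for name, (shortest, longest) in DURATION_RANGE.items()
        if shortest <= seconds <= longest
    )


print(models_supporting(10))
print(models_supporting(12))
```

Under that assumption, a 10-second request matches the six models named above, and a 12-second request matches only Sora 2 Pro.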
I need consistent characters or subjects across shots
Use Seedance 1.0 (strong multi-shot consistency), Google Veo 3.1 (multi-reference input, up to 3 images), or PixVerse v5 (improved multi-shot consistency over previous versions).
I need fast rendering for rapid iteration
Use PixVerse v5 (significantly faster than PixVerse 4.5) or Minimax Hailuo 2.3 Fast (optimized for speed, up to 768p).
I need a versatile model for social media and marketing
Featured models
Google Veo 3
Full audio generation, lip-sync, and cinematic prompt control. Google’s flagship video model.
Kling 2.6
Native audio integration, 1080p cinematic output, and realistic action consistency.
Sora 2 Pro
OpenAI’s most advanced video model with integrated audio, physics-aware motion, and clips of up to 12 seconds.
Wan 2.5
Audio-video synchronization with lip-sync, flexible resolution, and efficient performance.

