Simultaneous audio and video
Kling 2.6 Pro, released in December 2025, brought simultaneous audio-visual generation to the Kling Pro lineup: audio is not added after the video is generated but produced in a single pass alongside the visuals. This keeps lip movements, dialogue, sound effects, and ambient audio tightly synchronized. The lip-sync system supports English and Chinese dialogue, narration, and singing, with tone-accurate production for sung content rather than just spoken words. At 48 FPS, fast motion such as martial arts, dance, and physical action is rendered with the smoothness typically associated with high-frame-rate broadcast and sports footage.
Capabilities
Simultaneous A/V generation
Audio and video generated in a single pass — tight synchronization between dialogue, lip movements, sound effects, and ambient audio.
English + Chinese lip-sync
Accurate lip-sync for English and Chinese dialogue and narration, plus tone-accurate singing in both languages.
48 FPS output
High frame rate output at 48 FPS — smooth motion for dance, martial arts, sports, and fast action sequences.
Enhanced full-body motion
Improved fidelity for fast, intricate full-body movements such as martial arts, dance, and gymnastics, without ghosting or body-part distortion.
Motion reference support
Accepts motion reference clips (3–30 seconds) to anchor specific movement patterns and action sequences.
Built-in sound effects
Native sound effects and ambient noise generation — footsteps, environment sounds, impact effects — synchronized to the visual action.
Specifications
| Feature | Details |
|---|---|
| Developer | Kling AI (Kuaishou) |
| Released | December 2025 |
| Resolution | 1080p |
| Frame rate | 48 FPS |
| Duration | Up to 10 seconds |
| Audio | Dialogue, SFX, ambient sounds, singing |
| Languages | English, Chinese |
| Lip-sync | Yes — including singing |
| Motion reference | 3–30 seconds |
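To make the limits in the specification table concrete, here is a minimal Python sketch that checks a request against them and computes the resulting frame count (a 10-second clip at 48 FPS is 480 frames). The `GenerationRequest` class, `validate`, and `frame_count` are hypothetical illustrations, not part of any Kling API; only the numeric limits come from the table, and a constant 48 FPS output rate is assumed.

```python
from dataclasses import dataclass
from typing import Optional

# Limits copied from the specification table; everything else here is hypothetical.
MAX_DURATION_S = 10        # "Up to 10 seconds"
FRAME_RATE_FPS = 48        # fixed 48 FPS output (assumed constant)
MOTION_REF_MIN_S = 3       # motion reference clips: 3-30 seconds
MOTION_REF_MAX_S = 30


@dataclass
class GenerationRequest:
    """Hypothetical request shape, not an actual Kling API object."""
    duration_s: float                       # requested clip length in seconds
    motion_ref_s: Optional[float] = None    # length of an optional motion reference clip


def validate(req: GenerationRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request fits the spec."""
    problems = []
    if not 0 < req.duration_s <= MAX_DURATION_S:
        problems.append(f"duration must be greater than 0 and at most {MAX_DURATION_S} seconds")
    if req.motion_ref_s is not None and not (
        MOTION_REF_MIN_S <= req.motion_ref_s <= MOTION_REF_MAX_S
    ):
        problems.append(f"motion reference must be {MOTION_REF_MIN_S}-{MOTION_REF_MAX_S} seconds")
    return problems


def frame_count(duration_s: float) -> int:
    """Number of frames at the fixed output rate (duration x 48)."""
    return round(duration_s * FRAME_RATE_FPS)


if __name__ == "__main__":
    print(frame_count(10))  # 480
    print(validate(GenerationRequest(duration_s=10, motion_ref_s=2)))
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range field at once.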
How to use
Write your prompt with audio direction
Include explicit audio cues in your prompt: dialogue lines, sound effect descriptions, music style, and ambient environment.
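One way to keep those cues from being forgotten is to assemble them programmatically. The minimal Python sketch below appends dialogue, sound effects, music, and ambient sound to a scene description as separate sentences. `AudioDirection` and `build_prompt` are hypothetical helpers, not part of any Kling SDK; the output phrasing simply follows the patterns used in this guide's tips and example prompts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AudioDirection:
    """Hypothetical container for the audio cues a prompt should spell out."""
    dialogue: Optional[str] = None      # spoken line, emitted in quotes as a lip-sync target
    speaker: str = "The character"      # who delivers the line
    language: Optional[str] = None      # e.g. "English"; only attached to the dialogue line here
    sound_effects: list[str] = field(default_factory=list)
    music: Optional[str] = None         # music style, e.g. "upbeat pop"
    ambient: Optional[str] = None       # background environment sound


def build_prompt(scene: str, audio: AudioDirection) -> str:
    """Append explicit audio cues, one sentence each, to a visual scene description."""
    parts = [scene.strip().rstrip(".") + "."]
    if audio.dialogue:
        line = f'{audio.speaker} says "{audio.dialogue}"'
        if audio.language:
            line += f", speaking in {audio.language}"
        parts.append(line + ".")
    if audio.sound_effects:
        parts.append("Sound effects: " + ", ".join(audio.sound_effects) + ".")
    if audio.music:
        parts.append(f"Music: {audio.music}.")
    if audio.ambient:
        parts.append(f"Ambient sound: {audio.ambient}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "A traveler opens the front door of a small cottage",
        AudioDirection(
            dialogue="Welcome home",
            language="English",
            sound_effects=["door creak", "footsteps"],
            ambient="soft rain outside",
        ),
    ))
```

Run as shown, this prints `A traveler opens the front door of a small cottage. The character says "Welcome home", speaking in English. Sound effects: door creak, footsteps. Ambient sound: soft rain outside.`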
Prompting tips
- Include dialogue in quotes: “A character says ‘Welcome home’ warmly.” Quoted text is treated as the lip-sync target for the generated audio.
- Name the language explicitly: “The character speaks in Mandarin Chinese” or “narration in English” helps ensure accurate phoneme production.
- Singing is supported: “A singer performs a pop chorus, upbeat tempo, clear pronunciation” produces tone-accurate singing with synchronized lip movements.
- 48 FPS rewards fast motion: prompts involving dance, martial arts, and sports benefit most from the 48 FPS output, so describe the full action rather than a single pose.
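The prompting tips above can be turned into a rough pre-flight check. This minimal Python sketch flags dialogue without a quoted line, voiced content without a named language, and fast-motion prompts that look too short to describe the full action. `review_prompt` is a hypothetical helper; the keyword lists and the 12-word cutoff are illustrative assumptions, not documented model behavior.

```python
import re

# Heuristics that mirror the prompting tips; the word lists and the 12-word
# threshold are illustrative assumptions, not documented model behavior.
QUOTED_TEXT = re.compile(r'["“][^"”]+["”]|‘[^’]+’')
DIALOGUE_WORDS = ("says", "speaks", "asks", "replies")
VOICE_WORDS = DIALOGUE_WORDS + ("sings", "narrat")
LANGUAGE_WORDS = ("english", "chinese", "mandarin")
FAST_MOTION_WORDS = ("dance", "martial", "kick", "strike", "sport", "gymnast")
MIN_ACTION_WORDS = 12  # arbitrary cutoff for "describe the full action"


def review_prompt(prompt: str) -> list[str]:
    """Return suggestions for a prompt; an empty list means no tip was triggered."""
    text = prompt.lower()
    notes = []
    # Dialogue should carry a quoted line to act as the lip-sync target.
    if any(w in text for w in DIALOGUE_WORDS) and not QUOTED_TEXT.search(prompt):
        notes.append("put the spoken line in quotes so it can serve as a lip-sync target")
    # Any voiced content (speech, narration, singing) should name its language.
    if any(w in text for w in VOICE_WORDS) and not any(w in text for w in LANGUAGE_WORDS):
        notes.append("name the language explicitly (English or Chinese)")
    # Short fast-motion prompts probably under-describe the action.
    if any(w in text for w in FAST_MOTION_WORDS) and len(prompt.split()) < MIN_ACTION_WORDS:
        notes.append("describe the full action to benefit from 48 FPS")
    return notes


if __name__ == "__main__":
    print(review_prompt("A man says hello."))
    print(review_prompt("A dancer spins."))
```

Substring matching is deliberately crude (it also matches "dancer" for "dance"), so treat the output as a reminder list rather than a verdict. The pop-singer example prompt in this guide triggers none of the checks.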
Example prompts
A pop singer performs on stage under colorful spotlights. The camera slowly circles. The singer sings in English with clear enunciation. Upbeat music, crowd cheering in the background. 10 seconds, 1080p.
A martial artist performs a high-speed combination — three kicks and a spinning strike. 48 FPS, smooth motion, dojo setting, impact sound effects synchronized to each strike.
Compare models
| Model | Audio | Lip-sync | FPS | Motion fidelity | Best for |
|---|---|---|---|---|---|
| Kling 2.6 Pro | Yes | EN + Chinese | 48 | Enhanced | Audio-synced, fast motion |
| Kling 3.0 Pro | Yes | Multilingual | 60 | Strong | 4K cinematic multi-shot |
| Kling O3 | Yes | 10+ languages | 60 | Advanced | Physics + audio, 4K |
| Seedance 1.5 Pro | Yes | 8+ languages | — | Good | Multilingual dialogue focus |

