Six modes, one model
Kling O3 is Kling AI’s most capable unified model, designed to handle every stage of a production workflow in a single system. Six distinct generation modes (text-to-video, image-to-video, video-to-video, frames-to-video, motion control, and reference-to-video) eliminate the need to switch between models for different tasks. The advanced physics engine is the headline capability that separates Kling O3 from earlier models: gravity, balance, deformation, collision, and inertia are all simulated accurately. Characters fall with real weight, water flows with physical plausibility, and rigid objects interact with correct momentum.
Capabilities
Advanced physics engine
Simulates gravity, balance, deformation, collision, and inertia — objects fall, splash, and interact with physical accuracy rarely seen in generative video.
4K at 60 FPS
Native 4K (3840×2160) resolution at 60 frames per second — broadcast-grade output without upscaling.
6 generation modes
Text-to-video, image-to-video, video-to-video, frames-to-video, motion control, and reference-to-video — a complete production toolkit in one model.
Native audio with 10+ languages
Generates dialogue, ambient sounds, and sound effects, with support for 10+ languages, including English, Japanese, Korean, and Spanish.
Up to 6 shots per generation
Generate up to 6 distinct cinematographic shots within a single output of up to 15 seconds: structured multi-shot storytelling from one prompt.
10+ reference images
Accepts 10 or more reference images for character appearance, style anchoring, and multi-subject scene construction.
Generation modes
| Mode | Description |
|---|---|
| Text-to-video | Generate video directly from a text prompt |
| Image-to-video | Animate a reference image with described motion |
| Video-to-video | Restyle or transform an existing video |
| Frames-to-video | Specify start, middle, and/or end frames |
| Motion control | Apply specific camera trajectories and subject movements |
| Reference-to-video | Anchor generation to reference subjects and styles |
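As a rough illustration of how the table above might translate into a selection rule, the following Python sketch maps the inputs a user has on hand to one of the six modes. The `Mode` enum, the `suggest_mode` function, its parameter names, and the order of precedence are all hypothetical; they are not part of any Kling API and only encode one plausible reading of the mode descriptions.

```python
from enum import Enum


class Mode(Enum):
    """The six generation modes listed in the table (names are illustrative)."""
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"
    VIDEO_TO_VIDEO = "video-to-video"
    FRAMES_TO_VIDEO = "frames-to-video"
    MOTION_CONTROL = "motion control"
    REFERENCE_TO_VIDEO = "reference-to-video"


def suggest_mode(
    *,
    source_video: bool = False,      # existing footage to restyle or transform
    keyframes: int = 0,              # number of start/middle/end frames supplied
    motion_reference: bool = False,  # camera trajectory or subject movement to apply
    reference_images: int = 0,       # subject or style references
    source_image: bool = False,      # single image to animate
) -> Mode:
    """Pick a mode from the available inputs.

    The precedence below is an assumption made for this sketch,
    not documented Kling O3 behavior.
    """
    if source_video:
        return Mode.VIDEO_TO_VIDEO
    if keyframes > 0:
        return Mode.FRAMES_TO_VIDEO
    if motion_reference:
        return Mode.MOTION_CONTROL
    if reference_images > 0:
        return Mode.REFERENCE_TO_VIDEO
    if source_image:
        return Mode.IMAGE_TO_VIDEO
    return Mode.TEXT_TO_VIDEO
```

With no inputs at all, the function falls back to text-to-video, which needs only a prompt.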
Specifications
| Feature | Details |
|---|---|
| Developer | Kling AI (Kuaishou) |
| Resolution | 4K (3840×2160) |
| Frame rate | 60 FPS |
| Duration | Up to 15 seconds |
| Shots per generation | Up to 6 |
| Generation modes | 6 |
| Audio | Dialogue, SFX, ambient sounds |
| Languages | 10+ |
| Reference images | 10+ |
How to use
Choose your generation mode
Select the mode that fits your workflow — text-to-video, image-to-video, video-to-video, frames-to-video, motion control, or reference-to-video.
Structure multi-shot prompts
For multi-shot output, describe each shot with explicit shot size, camera movement, duration, and scene content.
Add references (optional)
Upload reference images, video clips, or motion references to anchor characters, styles, or movement patterns.
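The multi-shot step can be sketched as a small prompt builder. The Python example below assembles a prompt in the "SHOT n (size, Ns): …" pattern and checks it against the limits stated in the specifications (up to 6 shots, up to 15 seconds). The `Shot` class and `build_multishot_prompt` function are hypothetical helpers written for this illustration; they only produce a text prompt and do not call any Kling API.

```python
from dataclasses import dataclass

MAX_SHOTS = 6            # from the specifications: up to 6 shots per generation
MAX_TOTAL_SECONDS = 15   # from the specifications: up to 15 seconds


@dataclass(frozen=True)
class Shot:
    size: str      # shot size, e.g. "wide", "close-up", "medium"
    seconds: int   # shot duration in seconds
    content: str   # camera movement and scene content


def build_multishot_prompt(scene: str, shots: list[Shot], suffix: str = "") -> str:
    """Join a scene description and shots into one prompt, enforcing the limits."""
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > MAX_SHOTS:
        raise ValueError(f"{len(shots)} shots exceeds the limit of {MAX_SHOTS}")
    if any(shot.seconds <= 0 for shot in shots):
        raise ValueError("every shot needs a positive duration")
    total = sum(shot.seconds for shot in shots)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"{total}s exceeds the limit of {MAX_TOTAL_SECONDS}s")

    parts = [scene.strip()]
    for index, shot in enumerate(shots, start=1):
        parts.append(f"SHOT {index} ({shot.size}, {shot.seconds}s): {shot.content.strip()}")
    if suffix:
        parts.append(suffix.strip())
    return " ".join(parts)


# Usage: a three-shot sequence totalling 10 seconds.
example_prompt = build_multishot_prompt(
    "A professional boxer trains in a gym.",
    [
        Shot("wide", 4, "Boxer shadowboxes in the ring, gym ambient sound."),
        Shot("close-up", 3, "Sweat flies off the boxer's face. Impact sound effect."),
        Shot("medium", 3, "The boxer catches their breath, hands on the rope."),
    ],
    suffix="4K, 60 FPS.",
)
```

Validating shot count and total duration before submitting avoids spending a generation on a prompt that cannot fit the output limits.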
Prompting tips
- Leverage physics descriptions: “A glass falls off a table and shatters, water splashing realistically” is rendered with physically accurate simulation. Be explicit about the physical behavior you want.
- Use modes strategically: For restyling existing footage, use video-to-video. For maximum control over a sequence, use frames-to-video with defined start and end frames.
- Multi-shot structure: “SHOT 1 (wide, 3s): … SHOT 2 (close-up, 2s): …” is interpreted by Kling O3 as discrete cinematographic cuts.
- Specify audio language: If characters speak, state the language to get accurate lip-sync and phoneme generation.
Example prompts
A professional boxer trains in a gym. SHOT 1 (4s, wide): Boxer shadowboxes in the ring, motion blur on fast punches, gym ambient sound. SHOT 2 (3s, close-up): Sweat flies off the boxer’s face as they throw a right hook. Impact sound effect. SHOT 3 (3s, medium): The boxer catches their breath, hands on the rope. 4K, 60 FPS.
A raindrop falls onto a still pond surface. Extreme close-up macro shot. Water ripples expand outward with physically accurate fluid dynamics. Ambient rain sound. Slow motion, 60 FPS, 5 seconds.
Compare models
| Model | Resolution | FPS | Physics | Modes | Best for |
|---|---|---|---|---|---|
| Kling O3 | 4K | 60 | Advanced | 6 | Max capability, physics-heavy |
| Kling 3.0 Pro | 4K | 60 | Standard | Text + Image | Cinematic 4K, multi-shot |
| Kling O1 | 1080p | — | Standard | Unified | Editing + generation workflows |
| Kling 2.6 Pro | 1080p | 48 | Standard | Text + Image | Audio-synced, fast motion |

