Kling 3.0 Pro
Kling 3.0 Pro is Kling AI’s most significant architectural update: 1080p output at 60 frames per second, Omni Native Audio, and multi-shot storyboarding in a single generation. Its Multi-modal Visual Language (MVL) architecture handles text, image, video, and audio inputs in one model. A single prompt can describe up to 6 distinct shots, each with its own duration, shot size, perspective, narrative, and camera movement.
Capabilities
1080p at 60 FPS
Generates 1080p video at 60 frames per second — smooth, high frame-rate output for cinematic and action-heavy content.
Omni Native Audio
Generates multilingual dialogue (including English, Japanese, Korean, and Spanish) and environmental soundscapes natively alongside the video.
Multi-shot storyboarding
Specify up to 6 shots in a single generation of up to 15 seconds, each with its own duration, shot size, perspective, camera movement, and narrative.
MVL architecture
Multi-modal Visual Language architecture natively processes text, images, video, and audio as unified inputs for coherent multimodal output.
Up to 10 reference images
Accepts up to 10 reference images for subject appearance, style, and composition anchoring across a multi-shot sequence.
Complex action accuracy
Handles fast, intricate physical actions — martial arts, dance, sports — with consistent body mechanics and no ghosting artifacts.
Specifications
| Feature | Details |
|---|---|
| Developer | Kling AI (Kuaishou) |
| Base credits | 300 |
| Resolution | 1080p |
| Frame rate | 60 FPS |
| Duration | Up to 15 seconds |
| Shots per generation | Up to 6 |
| Audio | Omni Native Audio — dialogue, SFX, music |
| Languages | English, Japanese, Korean, Spanish + more |
| Max reference images | 10 |
| Architecture | Multi-modal Visual Language (MVL) |
How to use
Structure your prompt for multi-shot
For multi-shot output, describe each shot with explicit transitions: “SHOT 1 (3s, wide, establishing): … SHOT 2 (2s, close-up): …” Kling 3.0 Pro interprets these cues to generate distinct cinematographic cuts.
Add reference images (optional)
Upload up to 10 reference images for character appearance, environment style, or composition guidance.
Include audio direction
Describe the audio landscape — dialogue lines, ambient environment, music style — within the prompt for Omni Native Audio.
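The steps above can be sketched as a small prompt builder. This is a minimal illustration in Python, not part of any Kling or ImagineArt API: the `Shot` class and the `build_multishot_prompt` function are hypothetical names. The only facts taken from this page are the limits (up to 6 shots, up to 15 seconds in total, up to 10 reference images) and the `SHOT n (duration, size): ...` prompt layout.

```python
from dataclasses import dataclass

# Limits taken from the Specifications table on this page.
MAX_SHOTS = 6
MAX_TOTAL_SECONDS = 15
MAX_REFERENCE_IMAGES = 10


@dataclass
class Shot:
    """One shot in a multi-shot prompt (hypothetical helper type)."""
    seconds: int
    size: str         # e.g. "wide", "close-up", "medium"
    description: str  # narrative, camera movement, and audio direction


def build_multishot_prompt(shots, reference_image_count=0):
    """Return a 'SHOT n (Ns, size): ...' prompt string, or raise ValueError."""
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > MAX_SHOTS:
        raise ValueError(f"too many shots: {len(shots)} > {MAX_SHOTS}")
    total = sum(shot.seconds for shot in shots)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total duration {total}s exceeds {MAX_TOTAL_SECONDS}s")
    if reference_image_count > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"too many reference images: {reference_image_count} > {MAX_REFERENCE_IMAGES}"
        )
    # Number the shots from 1 and join them into a single prompt string.
    parts = [
        f"SHOT {i} ({shot.seconds}s, {shot.size}): {shot.description.strip()}"
        for i, shot in enumerate(shots, start=1)
    ]
    return " ".join(parts)


prompt = build_multishot_prompt(
    [
        Shot(3, "wide", "Establishing exterior, slow pan right."),
        Shot(2, "medium close-up", "The protagonist, static camera."),
    ],
    reference_image_count=1,
)
print(prompt)
```

Checking the limits before submitting a prompt is a design choice of this sketch. How the model itself responds to a prompt that exceeds them is not described on this page.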
Prompting tips
- Structure shots explicitly — “SHOT 1: wide establishing exterior, 3 seconds, slow pan right. SHOT 2: medium close-up on protagonist, 2 seconds, static camera.” Kling 3.0 Pro follows cinematographic structure in prompts.
- Specify language for dialogue — If your scene requires characters speaking a specific language, state it clearly: “The character speaks in Japanese with a formal tone.”
- Reference images anchor identity — For character consistency across shots, upload a reference image and describe the character consistently in each shot description.
- Use technical camera terms — “Shallow depth of field,” “Dutch angle,” “rack focus,” and “tracking shot” all meaningfully influence the cinematic output.
Example prompts
SHOT 1 (4s, wide, cinematic): A samurai stands at the edge of a misty forest at dawn. Slow pan left, revealing a village in the distance. Traditional Japanese ambient sounds. SHOT 2 (3s, close-up): The samurai’s hand grips a sword hilt. Rain begins to fall. SHOT 3 (3s, medium): The samurai turns and walks into the mist.
A professional basketball player dribbles through defenders and dunks. Wide angle, 60 FPS, 5 seconds. Arena crowd roaring in the background, sneakers squeaking on hardwood.
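A hand-written multi-shot prompt such as the samurai example can be checked against the documented limits before it is submitted. The sketch below is a hypothetical helper, not an official tool: `summarize_shots` and `fits_limits` are illustrative names, and the regular expression assumes shot headers follow the `SHOT n (Ns, ...)` pattern used in the examples on this page.

```python
import re

# Limits taken from the Specifications table on this page.
MAX_SHOTS = 6
MAX_TOTAL_SECONDS = 15

# Matches headers such as "SHOT 1 (4s, wide, cinematic)" and captures the duration.
SHOT_HEADER = re.compile(r"SHOT\s+(\d+)\s*\(\s*(\d+)\s*s\b")


def summarize_shots(prompt):
    """Return (shot_count, total_seconds) parsed from the shot headers."""
    durations = [int(seconds) for _, seconds in SHOT_HEADER.findall(prompt)]
    return len(durations), sum(durations)


def fits_limits(prompt):
    """True if the parsed shots stay within the shot and duration limits."""
    count, total = summarize_shots(prompt)
    return count <= MAX_SHOTS and total <= MAX_TOTAL_SECONDS


samurai = (
    "SHOT 1 (4s, wide, cinematic): A samurai stands at the edge of a misty forest at dawn. "
    "SHOT 2 (3s, close-up): The samurai's hand grips a sword hilt. "
    "SHOT 3 (3s, medium): The samurai turns and walks into the mist."
)
print(summarize_shots(samurai))  # (3, 10)
print(fits_limits(samurai))      # True
```

A single-shot prompt with no `SHOT n (...)` headers, such as the basketball example, yields `(0, 0)`, because this sketch does not parse durations written in prose like "5 seconds".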
Compare models
| Model | Resolution | FPS | Audio | Shots | Best for |
|---|---|---|---|---|---|
| Kling 3.0 Pro | 1080p | 60 | Omni Native | Up to 6 | Multi-shot storytelling, 60 FPS |
| Kling O3 | 4K | 60 | Yes | Up to 6 | Advanced physics, 6 generation modes |
| Kling 2.6 Pro | 1080p | 48 | Lip-sync | — | Audio-synced content, fast motion |
| Kling 2.5 Pro | 1080p | — | No | — | Cost-efficient HD production |

