VIDEO MODEL · by Kling AI · Kling O series

Kling O1

Kling AI’s unified video creation and editing model, which Kling AI describes as the world’s first multimodal video model to combine generation and editing in a single system. It accepts text, images, keyframes, reference videos, and motion inputs, and supports up to 7 reference images and 6 camera cuts per generation.

Resolution
Up to 1080p
Reference images
Up to 7
Camera cuts
Up to 6
Architecture
MVL unified

Unified creation and editing

Kling O1 handles generation and editing in one system: you can create a new video from scratch and then edit specific sections, restyle footage, extend shots, or swap elements with the same model, without exporting to a separate editing tool. Its Multi-modal Visual Language (MVL) architecture accepts six input types at once: text, images, keyframes, reference videos, motion references, and video editing instructions. This makes Kling O1 well suited to production pipelines that want a single model to cover several stages, from first draft to revision.

Capabilities

Unified generation and editing

The first model to handle both video creation and video editing in one system — generate footage and edit it within the same generation pipeline.

6 input types

Accepts text, images, keyframes, reference videos, motion references, and editing instructions as simultaneous inputs.

Up to 7 reference images

Anchor character appearance, visual style, and scene composition with up to 7 reference images in a single generation.

Up to 6 camera cuts

Generates up to 6 distinct shots per generation — structured multi-shot output from a single model invocation.

Video restyling

Transform the visual style of existing footage — apply new aesthetics, change time of day, or retheme content while preserving the underlying motion.

Shot extension

Extend existing shots seamlessly — continue the motion and scene from the end of an existing clip.

Input types supported

| Input | Use |
|---|---|
| Text | Scene description, style direction, audio cues |
| Images (up to 7) | Subject appearance, visual style, composition anchoring |
| Keyframes | Define start, middle, or end frames for transition control |
| Reference videos | Motion and style reference from existing footage |
| Motion references | Camera trajectory and subject movement patterns |
| Editing instructions | Targeted edits to specific elements in existing video |

Specifications

| Feature | Details |
|---|---|
| Developer | Kling AI (Kuaishou) |
| Architecture | Multi-modal Visual Language (MVL) |
| Resolution | Up to 1080p |
| Duration | 5–10 seconds |
| Reference images | Up to 7 |
| Camera cuts | Up to 6 per generation |
| Audio | No native audio |
| Input modes | 6 (text, image, keyframe, reference video, motion reference, editing) |
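If you script batches of generations, it can help to check a request against the published limits before submitting it. The sketch below is a minimal, hypothetical pre-flight check: the `GenerationRequest` shape, the mode names, and the `validate` function are illustrative assumptions, not an actual Kling AI or ImagineArt API. The only facts it encodes are the limits in the table above, and it assumes that "camera cuts" means the number of shots in one generation.

```python
from dataclasses import dataclass, field

# Limits copied from the specification table; everything else is hypothetical.
MAX_REFERENCE_IMAGES = 7
MAX_SHOTS = 6            # assumption: "camera cuts" counted as shots per generation
MIN_DURATION_S = 5
MAX_DURATION_S = 10

# Hypothetical names for the six input modes listed in the table.
INPUT_MODES = {"text", "image", "keyframe", "ref_video", "motion_ref", "editing"}


@dataclass
class GenerationRequest:
    """Hypothetical request shape for illustration only; not a real API object."""
    prompt: str
    duration_s: int
    reference_images: list[str] = field(default_factory=list)
    shots: int = 1
    modes: set[str] = field(default_factory=lambda: {"text"})


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the published limits."""
    problems = []
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {req.duration_s}s outside {MIN_DURATION_S}-{MAX_DURATION_S}s")
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"{len(req.reference_images)} reference images exceeds {MAX_REFERENCE_IMAGES}")
    if not 1 <= req.shots <= MAX_SHOTS:
        problems.append(f"{req.shots} shots outside 1-{MAX_SHOTS}")
    unknown = req.modes - INPUT_MODES
    if unknown:
        problems.append(f"unknown input modes: {sorted(unknown)}")
    return problems


if __name__ == "__main__":
    ok = GenerationRequest(prompt="A detective in a rainy alley", duration_s=8, shots=3)
    print(validate(ok))  # []
```

Collecting every problem in a list, rather than stopping at the first one, lets a batch script report all limit violations for a request in one pass.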

How to use

1. Open the AI Video Generator
   Log in to ImagineArt and go to the AI Video Generator.

2. Select Kling O1
   Choose Kling O1 from the model dropdown.

3. Choose your input combination
   Select the combination of input types that fits your use case: text only, text + images, keyframes + motion reference, or video editing mode.

4. Upload references
   Upload up to 7 reference images, a reference video, or a motion reference as needed.

5. Describe your multi-shot structure
   For multi-shot output, structure your prompt with explicit shot descriptions, up to 6 shots per generation.

6. Generate
   Click Generate. Complex multi-input requests typically complete in 1–2 minutes.

Prompting tips

  • Describe edit targets precisely: in editing mode, “Change the background from day to night while keeping the subject unchanged” is more accurate than “make it darker.”
  • Use keyframes for transitions: define your start and end keyframes and let Kling O1 fill in consistent motion between them.
  • Combine input types: “Based on this reference image [image], in this visual style [image 2], with this camera movement [motion ref]…” The MVL architecture processes all of these inputs together in a single generation.
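The first tip follows a simple pattern: name the element, state its current and target condition, and list what must stay the same. A minimal sketch of that pattern as a string template is below; `edit_instruction` is a hypothetical helper for composing prompts, not part of any Kling AI or ImagineArt tool.

```python
def edit_instruction(target: str, before: str, after: str, keep: list[str]) -> str:
    """Compose a targeted edit prompt: what changes, from what, to what, and what is preserved."""
    if not keep:
        return f"Change the {target} from {before} to {after}."
    preserved = " and ".join(f"the {k}" for k in keep)
    return f"Change the {target} from {before} to {after} while keeping {preserved} unchanged."


if __name__ == "__main__":
    # Reproduces the precise example from the tip above.
    print(edit_instruction("background", "day", "night", ["subject"]))
```

Listing the preserved elements explicitly is the part that makes the instruction precise; a vague prompt like “make it darker” leaves the model to guess both the target and the scope.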

Example prompts

SHOT 1 (wide, 3s): A detective walks into a rain-soaked alley at night. SHOT 2 (close-up, 2s): Detective looks at a clue on the ground, rain drops visible. SHOT 3 (medium, 3s): Detective turns and exits the alley. Reference image for detective character appearance attached.
Restyle the provided footage to a vintage 1970s Super 8 film look. Keep all motion and subjects identical; change only the visual aesthetic.
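The multi-shot example above uses a repeatable “SHOT n (framing, Ns): action” layout. The sketch below assembles prompts in that layout and enforces the page’s published limits. The `Shot` type and `build_multishot_prompt` function are hypothetical helpers, and requiring the shot lengths to sum to within the 5–10 second clip duration is an assumption, not a documented Kling O1 rule.

```python
from dataclasses import dataclass

MAX_SHOTS = 6               # "Up to 6 camera cuts" per generation
DURATION_RANGE = (5, 10)    # seconds; assumption: shot lengths should sum to a valid clip length


@dataclass
class Shot:
    """One shot in a hypothetical multi-shot prompt."""
    framing: str   # e.g. "wide", "close-up", "medium"
    seconds: int
    action: str


def build_multishot_prompt(shots: list[Shot], note: str = "") -> str:
    """Format shots as 'SHOT n (framing, Ns): action', optionally followed by a trailing note."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1-{MAX_SHOTS} shots, got {len(shots)}")
    total = sum(s.seconds for s in shots)
    lo, hi = DURATION_RANGE
    if not lo <= total <= hi:
        raise ValueError(f"total shot length {total}s outside {lo}-{hi}s")
    parts = [f"SHOT {i} ({s.framing}, {s.seconds}s): {s.action}" for i, s in enumerate(shots, start=1)]
    if note:
        parts.append(note)
    return " ".join(parts)


if __name__ == "__main__":
    detective = [
        Shot("wide", 3, "A detective walks into a rain-soaked alley at night."),
        Shot("close-up", 2, "Detective looks at a clue on the ground, rain drops visible."),
        Shot("medium", 3, "Detective turns and exits the alley."),
    ]
    print(build_multishot_prompt(detective, note="Reference image for detective character appearance attached."))
```

Run on the three detective shots, this reproduces the first example prompt verbatim (3 + 2 + 3 = 8 seconds, within the 5–10 second range).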

Compare models

| Model | Edit support | Input types | Camera cuts | Audio | Best for |
|---|---|---|---|---|---|
| Kling O1 | Yes (unified) | 6 | Up to 6 | No | Create + edit workflows |
| Kling O3 | Partial | 6 | Up to 6 | Yes | Max capability + audio |
| Kling 3.0 Pro | No | 2 | Up to 6 | Yes | 4K cinematic, multi-shot |
| Pika 2.2 | Partial (swaps, scenes) | 2 | No | No | Creative effects + keyframes |
Kling O1 is the strongest model when your workflow requires both creating new footage and editing or transforming existing video within the same pipeline. For maximum capability with audio, consider Kling O3.