VIDEO MODEL by Kling AI (Kling O series)

Kling O1

Kling AI’s unified video creation and editing model — the world’s first multimodal video model to unify generation and editing in a single system. Accepts text, images, keyframes, reference videos, and motion inputs, with up to 7 reference images and 6 camera cuts per generation.

Resolution

Up to 1080p

Reference images

Up to 7

Camera cuts

Up to 6

Architecture

MVL unified

Unified creation and editing

Kling O1 is the first video model to unify generation and editing in a single system: you can create a new video from scratch and then edit specific sections, restyle footage, extend shots, or swap elements within the same model, without exporting to a separate editing tool. The Multi-modal Visual Language (MVL) architecture accepts six input types simultaneously: text, images, keyframes, reference videos, motion references, and video editing instructions. This makes Kling O1 well suited to production pipelines that need a single model to handle multiple stages, such as generation, restyling, and shot extension.

Capabilities

Unified generation and editing

The first model to handle both video creation and video editing in one system — generate footage and edit it within the same generation pipeline.

6 input types

Accepts text, images, keyframes, reference videos, motion references, and editing instructions as simultaneous inputs.

Up to 7 reference images

Anchor character appearance, visual style, and scene composition with up to 7 reference images in a single generation.

Up to 6 camera cuts

Generates up to 6 distinct shots per generation — structured multi-shot output from a single model invocation.

Video restyling

Transform the visual style of existing footage — apply new aesthetics, change time of day, or retheme content while preserving the underlying motion.

Shot extension

Extend existing shots seamlessly — continue the motion and scene from the end of an existing clip.

Input types supported

Input	Use
Text	Scene description, style direction
Images (up to 7)	Subject appearance, visual style, composition anchoring
Keyframes	Define start, middle, or end frames for transition control
Reference videos	Motion and style reference from existing footage
Motion references	Camera trajectory and subject movement patterns
Editing instructions	Targeted edits to specific elements in existing video

Specifications

Feature	Details
Developer	Kling AI (Kuaishou)
Architecture	Multi-modal Visual Language (MVL)
Resolution	Up to 1080p
Duration	5–10 seconds
Reference images	Up to 7
Camera cuts	Up to 6 per generation
Audio	No native audio
Input modes	6 (text, image, keyframe, ref video, motion ref, editing)
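
The limits in the table above can be checked locally before a request is submitted. The following is a minimal sketch in Python; `GenerationRequest` and `validate_request` are hypothetical names introduced for illustration and are not part of any Kling AI or ImagineArt API. It only encodes the figures documented on this page, and it assumes that one "camera cut" corresponds to one shot.

```python
from dataclasses import dataclass, field
from typing import List

# Limits taken from the Specifications table on this page.
MAX_REFERENCE_IMAGES = 7
MAX_CAMERA_CUTS = 6
MIN_DURATION_S = 5
MAX_DURATION_S = 10


@dataclass
class GenerationRequest:
    """Hypothetical container for one generation request (illustrative only)."""
    prompt: str
    duration_s: float
    reference_images: List[str] = field(default_factory=list)
    shot_count: int = 1  # assumption: one camera cut equals one shot


def validate_request(req: GenerationRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems: List[str] = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(req.reference_images)} reference images given, limit is {MAX_REFERENCE_IMAGES}"
        )
    if not 1 <= req.shot_count <= MAX_CAMERA_CUTS:
        problems.append(f"shot_count must be between 1 and {MAX_CAMERA_CUTS}")
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
    return problems
```

A request with 3 shots, 1 reference image, and an 8-second duration returns an empty list; one with 8 reference images returns a single message naming the limit of 7.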

How to use

Open the AI Video Generator

Log into ImagineArt and go to the AI Video Generator.

Select Kling O1

Choose Kling O1 from the model dropdown.

Choose your input combination

Select the combination of input types that fits your use case — text only, text + images, keyframes + motion reference, or video editing mode.

Upload references

Upload up to 7 reference images, a reference video, or a motion reference as needed.

Describe your multi-shot structure

For multi-shot output, structure your prompt with explicit shot descriptions — up to 6 shots per generation.

Generate

Click Generate. Generation typically completes in 1–2 minutes for complex multi-input requests.

Prompting tips

Describe edit targets precisely: in editing mode, “Change the background from day to night while keeping the subject unchanged” gives a more accurate result than “make it darker.”
Use keyframes for transitions: define your start and end keyframes and let Kling O1 fill in the motion between them consistently.
Combine input types: for example, “Based on this reference image [image], in this visual style [image 2], with this camera movement [motion ref]…”. The MVL architecture processes all inputs together.

Example prompts

SHOT 1 (wide, 3s): A detective walks into a rain-soaked alley at night. SHOT 2 (close-up, 2s): Detective looks at a clue on the ground, rain drops visible. SHOT 3 (medium, 3s): Detective turns and exits the alley. Reference image for detective character appearance attached.

Restyle the provided footage to a vintage 1970s Super 8 film look. Keep all motion and subjects identical; change only the visual aesthetic.
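
The multi-shot example above follows a regular pattern (`SHOT n (framing, duration): description`), so it can be assembled from structured data. The sketch below is one way to do that in Python; `Shot` and `build_multishot_prompt` are hypothetical helpers, not part of any Kling AI or ImagineArt tooling. It assumes that the per-shot durations should add up to no more than the documented 10-second maximum, which this page does not state explicitly.

```python
from dataclasses import dataclass
from typing import List

MAX_SHOTS = 6            # "Up to 6 camera cuts" per generation
MAX_TOTAL_SECONDS = 10   # upper end of the documented 5 to 10 second duration


@dataclass
class Shot:
    framing: str       # e.g. "wide", "close-up", "medium"
    seconds: int       # intended length of this shot
    description: str   # what happens in the shot


def build_multishot_prompt(shots: List[Shot], note: str = "") -> str:
    """Join shots into a single prompt string in the SHOT n (...) format."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    total = sum(s.seconds for s in shots)
    if total > MAX_TOTAL_SECONDS:
        # Assumption: shot durations share the overall clip duration.
        raise ValueError(f"shots total {total}s, limit is {MAX_TOTAL_SECONDS}s")
    parts = [
        f"SHOT {i} ({s.framing}, {s.seconds}s): {s.description.strip()}"
        for i, s in enumerate(shots, start=1)
    ]
    if note.strip():
        parts.append(note.strip())
    return " ".join(parts)


if __name__ == "__main__":
    prompt = build_multishot_prompt(
        [
            Shot("wide", 3, "A detective walks into a rain-soaked alley at night."),
            Shot("close-up", 2, "Detective looks at a clue on the ground, rain drops visible."),
            Shot("medium", 3, "Detective turns and exits the alley."),
        ],
        note="Reference image for detective character appearance attached.",
    )
    print(prompt)
```

Run as a script, this prints the same text as the first example prompt. Keeping shots as data makes it easy to reorder or retime them without retyping the prompt.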

Compare models

Model	Edit support	Input types	Camera cuts	Audio	Best for
Kling O1	Yes (unified)	6	Up to 6	No	Create + edit workflows
Kling O3	Partial	6	Up to 6	Yes	Max capability + audio
Kling 3.0 Pro	No	2	Up to 6	Yes	4K cinematic, multi-shot
Pika 2.2	Partial (swaps, scenes)	2	No	No	Creative effects + keyframes

Kling O1 is the strongest model when your workflow requires both creating new footage and editing or transforming existing video within the same pipeline. For maximum capability with audio, consider Kling O3.
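
The comparison table can also be treated as data when choosing a model programmatically. The sketch below is illustrative only: `MODELS` copies the rows from the table on this page, and `shortlist` is a hypothetical helper, not an ImagineArt feature. The `"yes"`, `"partial"`, and `"no"` labels are a simplification of the "Edit support" column.

```python
from typing import Dict, List

# Rows copied from the comparison table on this page.
MODELS: List[Dict[str, object]] = [
    {"name": "Kling O1", "edit": "yes", "input_types": 6, "audio": False},
    {"name": "Kling O3", "edit": "partial", "input_types": 6, "audio": True},
    {"name": "Kling 3.0 Pro", "edit": "no", "input_types": 2, "audio": True},
    {"name": "Pika 2.2", "edit": "partial", "input_types": 2, "audio": False},
]


def shortlist(need_full_editing: bool = False, need_audio: bool = False) -> List[str]:
    """Return the names of models that satisfy the given requirements."""
    names: List[str] = []
    for model in MODELS:
        if need_full_editing and model["edit"] != "yes":
            continue  # only unified (full) edit support qualifies
        if need_audio and not model["audio"]:
            continue
        names.append(str(model["name"]))
    return names
```

With these rows, `shortlist(need_full_editing=True)` returns `["Kling O1"]`, and `shortlist(need_audio=True)` returns `["Kling O3", "Kling 3.0 Pro"]`. Asking for both returns an empty list, which matches the table: no listed model combines unified editing with audio.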
