# Kling O1

<div style={{background: "linear-gradient(135deg, #00080f 0%, #001a3a 55%, #000812 100%)", borderRadius: "20px", padding: "3.5rem 3rem 3rem", marginBottom: "2.5rem", overflow: "hidden", position: "relative"}}>
  <div style={{position: "absolute", inset: "0", background: "radial-gradient(ellipse at 35% 70%, rgba(124,0,251,0.18) 0%, transparent 55%), radial-gradient(ellipse at 80% 15%, rgba(0,100,255,0.12) 0%, transparent 50%)", pointerEvents: "none"}} />

  <div style={{position: "relative"}}>
    <div style={{display: "flex", gap: "0.5rem", marginBottom: "1.5rem", flexWrap: "wrap"}}>
      <span style={{background: "rgba(0,80,200,0.3)", border: "1px solid rgba(0,100,255,0.4)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "#7eb8ff", fontWeight: "500", letterSpacing: "0.06em"}}>VIDEO MODEL</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>by Kling AI</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>Kling O series</span>
    </div>

    <h1 style={{fontSize: "clamp(2.5rem, 5vw, 3.75rem)", fontWeight: "700", color: "#ffffff", lineHeight: "1.1", letterSpacing: "-0.025em", margin: "0 0 1.1rem 0"}}>Kling O1</h1>
    <p style={{fontSize: "1.1rem", color: "rgba(255,255,255,0.52)", maxWidth: "580px", lineHeight: "1.7", marginBottom: "2.25rem"}}>A unified video creation and editing model from Kling AI, which the developer bills as the first multimodal video model to combine generation and editing in a single system. It accepts text, images, keyframes, reference videos, motion references, and editing instructions, with up to 7 reference images and 6 camera cuts per generation.</p>

    <div style={{display: "flex", gap: "0.75rem", flexWrap: "wrap"}}>
      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Resolution</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Up to 1080p</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Reference images</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Up to 7</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Camera cuts</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Up to 6</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Architecture</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>MVL unified</div>
      </div>
    </div>
  </div>
</div>

## Unified creation and editing

Kling AI positions O1 as the first video model to unify generation and editing in a single system. You can create a new video from scratch and then edit specific sections, restyle footage, extend shots, or swap elements with the same model, without exporting to a separate editing tool.

The Multi-modal Visual Language (MVL) architecture accepts six input types that can be combined in one request: text, images, keyframes, reference videos, motion references, and video editing instructions. Because one model covers both generation and editing, a production pipeline can use it for several stages instead of handing footage between separate tools.

## Capabilities

<CardGroup cols={3}>
  <Card title="Unified generation and editing" icon="pen-to-square">
    The first model to handle both video creation and video editing in one system — generate footage and edit it within the same generation pipeline.
  </Card>

  <Card title="6 input types" icon="layer-group">
    Accepts text, images, keyframes, reference videos, motion references, and editing instructions as simultaneous inputs.
  </Card>

  <Card title="Up to 7 reference images" icon="images">
    Anchor character appearance, visual style, and scene composition with up to 7 reference images in a single generation.
  </Card>

  <Card title="Up to 6 camera cuts" icon="clapperboard">
    Generates up to 6 distinct shots per generation — structured multi-shot output from a single model invocation.
  </Card>

  <Card title="Video restyling" icon="palette">
    Transform the visual style of existing footage — apply new aesthetics, change time of day, or retheme content while preserving the underlying motion.
  </Card>

  <Card title="Shot extension" icon="arrows-left-right">
    Extend existing shots seamlessly — continue the motion and scene from the end of an existing clip.
  </Card>
</CardGroup>

## Input types supported

| Input                    | Use                                                        |
| ------------------------ | ---------------------------------------------------------- |
| **Text**                 | Scene description, style direction, shot structure         |
| **Images (up to 7)**     | Subject appearance, visual style, composition anchoring    |
| **Keyframes**            | Define start, middle, or end frames for transition control |
| **Reference videos**     | Motion and style reference from existing footage           |
| **Motion references**    | Camera trajectory and subject movement patterns            |
| **Editing instructions** | Targeted edits to specific elements in existing video      |

## Specifications

| Feature              | Details                                                   |
| -------------------- | --------------------------------------------------------- |
| **Developer**        | Kling AI (Kuaishou)                                       |
| **Architecture**     | Multi-modal Visual Language (MVL)                         |
| **Resolution**       | Up to 1080p                                               |
| **Duration**         | 5–10 seconds                                              |
| **Reference images** | Up to 7                                                   |
| **Camera cuts**      | Up to 6 per generation                                    |
| **Audio**            | No native audio                                           |
| **Input modes**      | 6 (text, image, keyframe, ref video, motion ref, editing) |
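The limits in the table above can be checked before you start a long multi-shot job. Below is a minimal pre-flight sketch in Python. `GenerationPlan` and `check_plan` are hypothetical helpers for your own planning scripts, not part of any ImagineArt or Kling API. The code also assumes, without confirmation from this page, that the 5–10 second duration applies to the combined length of all shots.

```python
from dataclasses import dataclass, field

# Limits taken from the Specifications table on this page.
MAX_REFERENCE_IMAGES = 7
MAX_SHOTS = 6  # "Up to 6 camera cuts per generation"
MIN_DURATION_S = 5
MAX_DURATION_S = 10  # Assumption: applies to the combined length of all shots.


@dataclass
class GenerationPlan:
    """Hypothetical description of one planned Kling O1 generation."""
    reference_images: list[str] = field(default_factory=list)
    shot_durations_s: list[float] = field(default_factory=list)


def check_plan(plan: GenerationPlan) -> list[str]:
    """Return readable problems; an empty list means the plan fits the documented limits."""
    problems = []
    if len(plan.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(plan.reference_images)} reference images (max {MAX_REFERENCE_IMAGES})"
        )
    if not plan.shot_durations_s:
        problems.append("no shots planned")
    elif len(plan.shot_durations_s) > MAX_SHOTS:
        problems.append(f"{len(plan.shot_durations_s)} shots (max {MAX_SHOTS})")
    total = sum(plan.shot_durations_s)
    if plan.shot_durations_s and not MIN_DURATION_S <= total <= MAX_DURATION_S:
        problems.append(
            f"total duration {total:g}s outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    return problems


# Three shots of 3s + 2s + 3s = 8s with one reference image.
detective = GenerationPlan(reference_images=["detective.png"], shot_durations_s=[3, 2, 3])
print(check_plan(detective))  # prints []
```

Returning a list of problems, rather than raising on the first one, lets a batch script report every limit a plan exceeds in a single pass.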

## How to use

<Steps>
  <Step title="Open the AI Video Generator">
    Log into ImagineArt and go to the **AI Video Generator**.
  </Step>

  <Step title="Select Kling O1">
    Choose **Kling O1** from the model dropdown.
  </Step>

  <Step title="Choose your input combination">
    Select the combination of input types that fits your use case — text only, text + images, keyframes + motion reference, or video editing mode.
  </Step>

  <Step title="Upload references">
    Upload up to 7 reference images, a reference video, or a motion reference, as needed.
  </Step>

  <Step title="Describe your multi-shot structure">
    For multi-shot output, structure your prompt with explicit shot descriptions — up to 6 shots per generation.
  </Step>

  <Step title="Generate">
    Click **Generate**. Generation typically completes in 1–2 minutes for complex multi-input requests.
  </Step>
</Steps>

## Prompting tips

* **Describe edit targets precisely** — In editing mode: "Change the background from day to night while keeping the subject unchanged" is more accurate than "make it darker."
* **Use keyframes for transitions** — Define your start and end keyframes; let Kling O1 fill in the motion between them consistently.
* **Combine input types** — "Based on this reference image \[image], in this visual style \[image 2], with this camera movement \[motion ref]..." — the MVL architecture processes all inputs cohesively.

### Example prompts

> SHOT 1 (wide, 3s): A detective walks into a rain-soaked alley at night. SHOT 2 (close-up, 2s): Detective looks at a clue on the ground, raindrops visible. SHOT 3 (medium, 3s): Detective turns and exits the alley. Reference image for detective character appearance attached.

> Restyle the provided footage to a vintage 1970s Super 8 film look. Keep all motion and subjects identical; change only the visual aesthetic.
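If you assemble multi-shot prompts programmatically, a small helper can keep them in the same `SHOT n (framing, Ns): description` layout as the first example above. This is a hypothetical sketch. `Shot` and `build_multishot_prompt` are illustrative names, not part of any ImagineArt or Kling API, and the layout simply mirrors the example prompt rather than a documented syntax. Reference images are still uploaded through the interface; the helper only writes the text prompt.

```python
from dataclasses import dataclass

MAX_SHOTS = 6  # Kling O1 supports up to 6 camera cuts per generation


@dataclass
class Shot:
    framing: str  # e.g. "wide", "close-up", "medium"
    seconds: int  # intended length of this shot
    description: str  # what happens in the shot


def build_multishot_prompt(shots: list[Shot], note: str = "") -> str:
    """Format shots as 'SHOT n (framing, Ns): description', mirroring the example prompt."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1-{MAX_SHOTS} shots, got {len(shots)}")
    parts = [
        f"SHOT {i} ({s.framing}, {s.seconds}s): {s.description}"
        for i, s in enumerate(shots, start=1)
    ]
    if note:
        parts.append(note)  # e.g. a reminder that a reference image is attached
    return " ".join(parts)


prompt = build_multishot_prompt(
    [
        Shot("wide", 3, "A detective walks into a rain-soaked alley at night."),
        Shot("close-up", 2, "Detective looks at a clue on the ground, raindrops visible."),
        Shot("medium", 3, "Detective turns and exits the alley."),
    ],
    note="Reference image for detective character appearance attached.",
)
print(prompt)
```

Numbering shots automatically and rejecting more than six keeps the prompt aligned with the camera-cut limit, so you do not discover an oversized shot list only after submitting it.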

## Compare models

| Model                                           | Edit support            | Input types | Camera cuts | Audio | Best for                     |
| ----------------------------------------------- | ----------------------- | ----------- | ----------- | ----- | ---------------------------- |
| **Kling O1**                                    | Yes (unified)           | 6           | Up to 6     | No    | Create + edit workflows      |
| [Kling O3](/ai-models/video/kling-o3)           | Partial                 | 6           | Up to 6     | Yes   | Max capability + audio       |
| [Kling 3.0 Pro](/ai-models/video/kling-3-0-pro) | No                      | 2           | Up to 6     | Yes   | 4K cinematic, multi-shot     |
| [Pika 2.2](/ai-models/video/pika-2-2)           | Partial (swaps, scenes) | 2           | No          | No    | Creative effects + keyframes |

<Tip>
  Kling O1 is the best fit when your workflow requires both creating new footage and editing or transforming existing video within the same pipeline. For maximum capability with audio, consider [Kling O3](/ai-models/video/kling-o3).
</Tip>
