# Grok video

<div style={{background: "linear-gradient(135deg, #00080f 0%, #001a3a 55%, #000812 100%)", borderRadius: "20px", padding: "3.5rem 3rem 3rem", marginBottom: "2.5rem", overflow: "hidden", position: "relative"}}>
  <div style={{position: "absolute", inset: "0", background: "radial-gradient(ellipse at 55% 20%, rgba(124,0,251,0.18) 0%, transparent 55%), radial-gradient(ellipse at 10% 75%, rgba(0,100,255,0.14) 0%, transparent 50%)", pointerEvents: "none"}} />

  <div style={{position: "relative"}}>
    <div style={{display: "flex", gap: "0.5rem", marginBottom: "1.5rem", flexWrap: "wrap"}}>
      <span style={{background: "rgba(0,80,200,0.3)", border: "1px solid rgba(0,100,255,0.4)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "#7eb8ff", fontWeight: "500", letterSpacing: "0.06em"}}>VIDEO MODEL</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>by xAI</span>
      <span style={{background: "rgba(255,255,255,0.06)", border: "1px solid rgba(255,255,255,0.12)", borderRadius: "100px", padding: "0.3rem 1rem", fontSize: "0.72rem", color: "rgba(255,255,255,0.45)", fontWeight: "400"}}>Aurora architecture</span>
    </div>

    <h1 style={{fontSize: "clamp(2.5rem, 5vw, 3.75rem)", fontWeight: "700", color: "#ffffff", lineHeight: "1.1", letterSpacing: "-0.025em", margin: "0 0 1.1rem 0"}}>xAI Grok Video</h1>
    <p style={{fontSize: "1.1rem", color: "rgba(255,255,255,0.52)", maxWidth: "580px", lineHeight: "1.7", marginBottom: "2.25rem"}}>xAI's Aurora autoregressive video model — generates video in approximately 17 seconds with native audio including background music, sound effects, and ambient audio. Accepts up to 7 reference images for identity and style preservation, with text-to-video, image-to-video, and reference-to-video modes. Supports clips from 6 to 15 seconds.</p>

    <div style={{display: "flex", gap: "0.75rem", flexWrap: "wrap"}}>
      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Generation time</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>\~17 seconds</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Resolution</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>720p</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>Audio</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Music + SFX + Ambient</div>
      </div>

      <div style={{background: "rgba(255,255,255,0.06)", borderRadius: "14px", padding: "0.875rem 1.5rem", border: "1px solid rgba(255,255,255,0.1)"}}>
        <div style={{fontSize: "0.62rem", color: "rgba(255,255,255,0.32)", textTransform: "uppercase", letterSpacing: "0.1em", marginBottom: "0.3rem"}}>References</div>
        <div style={{fontSize: "1rem", color: "#ffffff", fontWeight: "600"}}>Up to 7 images</div>
      </div>
    </div>
  </div>
</div>

## The fastest AI video generation on ImagineArt

xAI's Grok Video is built on Aurora, an autoregressive architecture that predicts video frames sequentially instead of using the iterative diffusion process most other video models rely on. That difference is what enables the \~17-second generation time, which makes Grok Video the fastest AI video model available on ImagineArt.

Despite its speed, Grok Video delivers native audio (background music, sound effects, and ambient audio synchronized with the video), identity preservation with up to 7 reference images, and smooth, natural motion. The reference-to-video mode is particularly strong: character identity, style, and visual consistency are preserved across the generation with minimal drift.

## Capabilities

<CardGroup cols={3}>
  <Card title="Ultra-fast generation" icon="bolt">
    Approximately 17 seconds per clip, the fastest generation time in the lineup. Enables rapid iteration at a pace the diffusion models in the lineup do not match.
  </Card>

  <Card title="Native audio" icon="music">
    Background music, sound effects, and ambient audio generated natively with the video — synchronized without post-production.
  </Card>

  <Card title="Up to 7 reference images" icon="images">
    Identity and style preservation with up to 7 reference images — characters and visual styles are maintained consistently throughout the generated video.
  </Card>

  <Card title="Aurora autoregressive architecture" icon="microchip">
    Sequential frame prediction rather than diffusion — produces smooth, coherent motion with natural temporal consistency between frames.
  </Card>

  <Card title="Reference-to-video mode" icon="user">
    Strong identity preservation in reference-based generation: character appearance and style carry over from the reference inputs, and motion stays smooth and natural.
  </Card>

  <Card title="3 generation modes" icon="layer-group">
    Text-to-video, image-to-video, and reference-to-video — flexible workflow support from any starting point.
  </Card>
</CardGroup>

## Aurora vs. diffusion architecture

| Feature              | **Grok Video (Aurora)**     | Diffusion models             |
| -------------------- | --------------------------- | ---------------------------- |
| Architecture         | Autoregressive (sequential) | Diffusion (iterative)        |
| Generation speed     | \~17 seconds                | 30 seconds – several minutes |
| Temporal consistency | Strong (sequential)         | Variable                     |
| Output resolution    | 720p                        | Up to 4K                     |
| Audio                | Native                      | Varies                       |
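
The architectural contrast in the table can be pictured as two different loop shapes. The sketch below is a toy illustration of control flow only, not Aurora's or any diffusion model's actual implementation: `Frame`, `drift`, and `halve` are hypothetical stand-ins, and a "frame" here is just a short list of numbers.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for a video frame


def autoregressive_clip(first: Frame, n_frames: int,
                        predict_next: Callable[[List[Frame]], Frame]) -> List[Frame]:
    """Build a clip one frame at a time; each new frame sees all earlier frames."""
    frames = [first]
    while len(frames) < n_frames:
        frames.append(predict_next(frames))
    return frames


def iterative_clip(noisy: List[Frame], n_steps: int,
                   refine: Callable[[List[Frame]], List[Frame]]) -> List[Frame]:
    """Refine every frame of the clip together, once per step."""
    frames = noisy
    for _ in range(n_steps):
        frames = refine(frames)
    return frames


# Hypothetical stand-in "models", chosen only to make the loops observable.
def drift(history: List[Frame]) -> Frame:
    return [value + 1.0 for value in history[-1]]


def halve(frames: List[Frame]) -> List[Frame]:
    return [[value / 2 for value in frame] for frame in frames]


sequential = autoregressive_clip([0.0, 0.0], n_frames=4, predict_next=drift)
refined = iterative_clip([[8.0, 8.0]] * 3, n_steps=3, refine=halve)
print(sequential)
print(refined)
```

In the first loop the clip grows frame by frame, which is the sequential behavior the table attributes to Aurora. In the second, the whole clip exists from the start and is revised repeatedly.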

## Specifications

| Feature              | Details                                              |
| -------------------- | ---------------------------------------------------- |
| **Developer**        | xAI                                                  |
| **Architecture**     | Aurora (autoregressive, sequential frame prediction) |
| **Resolution**       | 720p                                                 |
| **Duration**         | 6–15 seconds                                         |
| **Frame rate**       | —                                                    |
| **Audio**            | Background music, SFX, ambient (native)              |
| **Reference images** | Up to 7                                              |
| **Aspect ratios**    | 16:9, 9:16, 1:1                                      |
| **Input modes**      | Text-to-video, image-to-video, reference-to-video    |
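
If you plan generations in a script or spreadsheet before opening the generator, the limits in the table can be checked up front. This is a minimal sketch under stated assumptions: `GenerationSettings` and `validate` are hypothetical helper names, not part of any ImagineArt or xAI API, and only the limits themselves come from the table above.

```python
from dataclasses import dataclass, field
from typing import List

# Limits taken from the specifications table.
MODES = {"text-to-video", "image-to-video", "reference-to-video"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MIN_SECONDS, MAX_SECONDS = 6, 15
MAX_REFERENCES = 7


@dataclass
class GenerationSettings:
    """Hypothetical container for one planned generation."""
    mode: str
    duration_seconds: int
    aspect_ratio: str = "16:9"
    reference_images: List[str] = field(default_factory=list)


def validate(settings: GenerationSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if settings.mode not in MODES:
        problems.append(f"unknown mode: {settings.mode}")
    if not MIN_SECONDS <= settings.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if len(settings.reference_images) > MAX_REFERENCES:
        problems.append(f"at most {MAX_REFERENCES} reference images")
    return problems


ok = GenerationSettings("reference-to-video", 6, "9:16", ["front.png", "side.png"])
bad = GenerationSettings("text-to-video", 20, "4:3")
print(validate(ok))
print(validate(bad))
```

Returning a list of problems instead of raising on the first one makes it easier to fix several settings in a single pass.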

## How to use

<Steps>
  <Step title="Open the AI Video Generator">
    Log into ImagineArt and go to the **AI Video Generator**.
  </Step>

  <Step title="Select xAI Grok Video">
    Choose **xAI Grok Video** from the model dropdown.
  </Step>

  <Step title="Choose your generation mode">
    Select text-to-video, image-to-video, or reference-to-video.
  </Step>

  <Step title="Upload references (optional)">
    For reference-to-video, upload up to 7 reference images to anchor identity and visual style.
  </Step>

  <Step title="Write your prompt">
    Describe the scene, motion, and audio environment. State sound cues explicitly so the audio generation has something concrete to work from.
  </Step>

  <Step title="Generate">
    Click **Generate** — expect results in approximately 17 seconds.
  </Step>
</Steps>

## Prompting tips

* **Use it for rapid iteration:** At \~17 seconds per clip, ten variations take just under three minutes of generation time, while a model that needs about a minute per clip produces only two or three in the same window. Explore directions aggressively before committing.
* **State audio cues in plain language:** Phrases such as "with upbeat jazz playing in the background" or "the sound of waves crashing" feed directly into Grok Video's audio generation.
* **Reference-to-video for consistent characters:** Upload multiple reference angles of a character (front, side, 3/4 view) to improve identity consistency across different generated scenes.
* **Keep prompts focused:** Aurora's sequential architecture produces the most coherent motion when the prompt describes a single, clear visual sequence rather than a complex multi-event narrative.
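
The iteration arithmetic behind the first tip can be worked out for any time budget. The sketch below assumes clips are generated back to back and ignores queueing, uploads, and prompt-writing time. The 17-second and 30-second figures come from this page, while the two-minute figure is an assumed stand-in for a slower model.

```python
def clips_in_budget(budget_seconds: float, seconds_per_clip: float) -> int:
    """Whole clips that finish within the budget when generated back to back."""
    if seconds_per_clip <= 0:
        raise ValueError("seconds_per_clip must be positive")
    return int(budget_seconds // seconds_per_clip)


BUDGET = 5 * 60  # a five-minute exploration session, chosen for illustration

for label, seconds in [
    ("~17 s per clip", 17),
    ("~30 s per clip", 30),
    ("2 min per clip (assumed)", 120),
]:
    print(f"{label}: {clips_in_budget(BUDGET, seconds)} clips")
```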

### Example prompts

> A golden retriever puppy plays in a field of daisies, tail wagging. Upbeat acoustic guitar music. Bright afternoon sunlight, slow motion on the playful moments. 6 seconds, 16:9.

> A barista writes a customer's name on a coffee cup with a marker. Soft café ambient sounds, quiet chatter in the background. Close-up, handheld feel. 6 seconds.
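
Both examples follow the same layout: scene, then audio cue, then look and camera, then duration and aspect ratio. When generating many variations, that layout can be assembled from parts. The helper below is a hypothetical sketch that only mirrors the structure of the example prompts. It is not an ImagineArt feature, and the page does not state how the model interprets duration or aspect ratio written in the prompt text.

```python
from typing import Optional


def build_prompt(scene: str, audio: str, look: str = "",
                 duration_seconds: Optional[int] = None,
                 aspect_ratio: Optional[str] = None) -> str:
    """Join prompt parts into sentences: scene, audio cue, look, then specs."""
    sentences = [part.strip().rstrip(".") + "."
                 for part in (scene, audio, look) if part.strip()]
    specs = []
    if duration_seconds is not None:
        specs.append(f"{duration_seconds} seconds")
    if aspect_ratio:
        specs.append(aspect_ratio)
    if specs:
        sentences.append(", ".join(specs) + ".")
    return " ".join(sentences)


prompt = build_prompt(
    scene="A golden retriever puppy plays in a field of daisies, tail wagging",
    audio="Upbeat acoustic guitar music",
    look="Bright afternoon sunlight, slow motion on the playful moments",
    duration_seconds=6,
    aspect_ratio="16:9",
)
print(prompt)
```

Keeping the audio cue as its own argument makes it hard to forget, which matters because the model generates sound from what the prompt describes.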

## Compare models

| Model                                                     | Speed     | Audio | References  | Best for                 |
| --------------------------------------------------------- | --------- | ----- | ----------- | ------------------------ |
| **xAI Grok Video**                                        | \~17s     | Yes   | Up to 7     | Maximum speed + audio    |
| [Runway Gen 4 Turbo](/ai-models/video/runway-gen-4-turbo) | \~30s     | No    | —           | Fast cinematic, no audio |
| [Seedance Pro Fast](/ai-models/video/seedance-pro-fast)   | Under 60s | No    | Image input | Fast Seedance quality    |
| [PixVerse v5](/ai-models/video/pixverse-v5)               | \~30s     | No    | —           | Fast character animation |

<Tip>
  xAI Grok Video is the right choice when generation speed is a priority — especially for clients needing fast previews, high-volume production pipelines, or exploratory rapid iteration with audio. For maximum resolution or longer clips, other models in the lineup offer higher output specifications.
</Tip>
