Google’s most capable video model
Google Veo 3.1, released October 2025, is the production flagship of the Veo 3.1 family, producing 4, 6, or 8 seconds of video per generation. With broadcast-quality 48kHz stereo audio (including dialogue, voiceover, and sound effects) and up to 4K resolution, Veo 3.1 is positioned for broadcast, commercial, and high-end production. Selectable frame rates (24 FPS for cinematic, 30 FPS for standard, 60 FPS for sports and fast motion) give it a flexibility across production contexts that no other model in the lineup matches.
Capabilities
4–8 seconds
Choose from 4-, 6-, or 8-second clips, enough for complete commercial spots, narrative sequences, and high-quality short-form content.
48kHz stereo audio
Broadcast-quality audio at 48kHz stereo — includes dialogue, voiceover, sound effects, and ambient soundscapes generated natively.
4K resolution
4K output with the Transformer backbone’s spatial fidelity — production-ready for broadcast, cinema, and large-format display.
Selectable frame rates
Choose 24 FPS (cinematic standard), 30 FPS (broadcast/digital), or 60 FPS (sports, action, smooth motion) per generation.
Highest visual fidelity in Veo family
Maximum quality configuration of the Veo 3.1 Transformer backbone — richer texture, more stable scene composition, and more precise prompt adherence than Fast and Lite tiers.
Cinematic prompt control
Responds accurately to cinematographic language — camera movements, lighting conditions, scene pacing, and narrative cues all translate to the output.
Specifications
| Feature | Details |
|---|---|
| Developer | Google DeepMind |
| Released | October 2025 |
| Resolution | 720p, 1080p, 4K |
| Duration | 4–8 seconds (4, 6, or 8s selectable) |
| Frame rates | 24 FPS, 30 FPS, 60 FPS (selectable) |
| Audio | 48kHz stereo — dialogue, voiceover, SFX, ambient |
| Aspect ratios | 16:9, 9:16 |
| Architecture | Transformer backbone, spatio-temporal patches |
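The selectable values in the table above can be checked before a generation is submitted. The following is a minimal sketch: `VeoSettings` and `validate_settings` are hypothetical names invented for illustration (not part of any official SDK), and the default values are arbitrary picks rather than documented defaults.

```python
from dataclasses import dataclass

# Allowed values copied from the specifications table.
ALLOWED_RESOLUTIONS = ("720p", "1080p", "4K")
ALLOWED_DURATIONS_S = (4, 6, 8)
ALLOWED_FPS = (24, 30, 60)
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")


@dataclass(frozen=True)
class VeoSettings:
    """Hypothetical settings container; not an official API object."""
    resolution: str = "1080p"   # arbitrary default for this sketch
    duration_s: int = 8         # arbitrary default for this sketch
    fps: int = 24               # arbitrary default for this sketch
    aspect_ratio: str = "16:9"  # arbitrary default for this sketch


def validate_settings(settings: VeoSettings) -> list[str]:
    """Return a list of problems; an empty list means every value is in the table."""
    problems = []
    if settings.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if settings.duration_s not in ALLOWED_DURATIONS_S:
        problems.append(f"duration_s must be one of {ALLOWED_DURATIONS_S}")
    if settings.fps not in ALLOWED_FPS:
        problems.append(f"fps must be one of {ALLOWED_FPS}")
    if settings.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.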
How to use
Write a detailed prompt
For longer clips, structure your prompt as a scene description with clear narrative progression: describe the arc of what happens over the full duration.
Select frame rate
Choose 24 FPS for cinematic, 30 FPS for standard delivery, or 60 FPS for smooth high-speed action.
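The frame-rate guidance above amounts to a small lookup. As a rough sketch, the helper below maps a content category to one of the three frame rates; the category labels and the `pick_fps` name are assumptions made for this example, not identifiers from the product.

```python
# Category labels are illustrative; the FPS values come from the guidance above.
FPS_BY_CONTENT = {
    "cinematic": 24,  # cinematic look
    "standard": 30,   # standard delivery
    "action": 60,     # smooth high-speed action
}


def pick_fps(content_type: str) -> int:
    """Return the suggested frame rate for a content category (case-insensitive)."""
    key = content_type.strip().lower()
    if key not in FPS_BY_CONTENT:
        raise ValueError(
            f"unknown content type {content_type!r}; expected one of {sorted(FPS_BY_CONTENT)}"
        )
    return FPS_BY_CONTENT[key]
```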
Prompting tips
- Structure prompts as narratives: “The scene opens on… then transitions to… building to a conclusion where…”. Veo 3.1 understands temporal narrative structure.
- Specify audio timing: “The music builds from quiet to full orchestral by the halfway point”. Temporal audio cues are interpreted accurately.
- Match FPS to content: 24 FPS for drama and cinema; 30 FPS for documentary and commercial; 60 FPS for sports or a fluid slow-motion effect.
- Use 4K for large-format or print-quality moments: product launches, hero brand shots, and cinematic sequences benefit most from 4K.
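The narrative and audio-timing tips can be combined into a reusable prompt template. This is only a sketch of one way to assemble such a prompt as plain text: `build_prompt` and its parameters are hypothetical, and the trailing "8 seconds, 4K." hint simply mirrors how this page's example prompts are written.

```python
def build_prompt(opening, middle, conclusion,
                 audio_cue=None, duration_s=None, resolution=None):
    """Assemble a three-beat narrative prompt with optional audio and output hints."""
    parts = [
        f"The scene opens on {opening},",
        f"then transitions to {middle},",
        f"building to a conclusion where {conclusion}.",
    ]
    if audio_cue:
        # Normalise to exactly one trailing period.
        parts.append(audio_cue.rstrip(".") + ".")
    hints = []
    if duration_s is not None:
        hints.append(f"{duration_s} seconds")
    if resolution is not None:
        hints.append(resolution)
    if hints:
        parts.append(", ".join(hints) + ".")
    return " ".join(parts)


prompt = build_prompt(
    opening="a quiet harbor at dawn",
    middle="fishing boats leaving the dock",
    conclusion="the sun clears the horizon",
    audio_cue="The music builds from quiet to full orchestral by the halfway point",
    duration_s=8,
    resolution="4K",
)
print(prompt)
```

Keeping the beats as separate arguments makes it easy to vary one part of the scene while holding the audio cue and output hints fixed.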
Example prompts
A nature documentary segment: A wolf pack hunts across a frozen tundra at dusk. The alpha leads, the pack follows in formation. Wide shot establishing the landscape, intercutting with close-ups of paw prints, breath visible in the cold air. Narration: “In the far north, survival depends on the pack.” Ambient wind and distant howling. 8 seconds, 4K.
A coffee brand spot: Product sits on a warm kitchen table at sunrise. A pair of hands wrap around the mug. Steam rises in close-up. Brand voice: “Start every morning right.” Warm, cinematic. 6 seconds.
Compare models
| Model | Duration | Audio | Resolution | FPS options | Best for |
|---|---|---|---|---|---|
| Veo 3.1 | 4/6/8s | 48kHz stereo | Up to 4K | 24/30/60 | Broadcast, commercial, high-end |
| Veo 3.1 Fast | 4/6/8s | SFX + ambient | Up to 4K | 24 | Short-form production, 4K |
| Veo 3.1 Lite | 8s | No | 1080p | — | Cost-efficient, no audio |
| Sora 2 Pro | 25s | Dialogue + SFX | 1080p | — | Long-form A/V, physics |
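The comparison table can also be treated as data when choosing a model programmatically. The sketch below transcribes the table rows into a list and filters by audio, resolution, and frame rate. The field names and `find_models` are assumptions made for illustration; a "—" in the FPS column is stored as an empty tuple, so those models are excluded whenever a specific frame rate is requested.

```python
RESOLUTION_RANK = {"720p": 0, "1080p": 1, "4K": 2}

# Rows transcribed from the comparison table; field names are illustrative.
MODELS = [
    {"name": "Veo 3.1", "audio": "48kHz stereo", "max_resolution": "4K", "fps": (24, 30, 60)},
    {"name": "Veo 3.1 Fast", "audio": "SFX + ambient", "max_resolution": "4K", "fps": (24,)},
    {"name": "Veo 3.1 Lite", "audio": None, "max_resolution": "1080p", "fps": ()},
    {"name": "Sora 2 Pro", "audio": "Dialogue + SFX", "max_resolution": "1080p", "fps": ()},
]


def find_models(needs_audio=False, resolution="720p", fps=None):
    """Return names of models that satisfy the requirements, in table order."""
    wanted = RESOLUTION_RANK[resolution]
    matches = []
    for model in MODELS:
        if needs_audio and model["audio"] is None:
            continue
        if RESOLUTION_RANK[model["max_resolution"]] < wanted:
            continue
        # fps=None means "no preference"; otherwise the rate must be listed.
        if fps is not None and fps not in model["fps"]:
            continue
        matches.append(model["name"])
    return matches
```

For example, `find_models(needs_audio=True, resolution="4K", fps=60)` narrows the table to the single row that lists 60 FPS.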

