Reference to Video generates new video content inspired by reference images you provide. Unlike Image to Video, which uses an image as a literal starting or ending frame, Reference to Video extracts visual features from your references (appearance, clothing, props, environment details) and recreates them in a new scenario that you describe in a prompt.

Use this mode when you want a character or object from your reference images to appear in a scene, action, or environment that does not exist in any of the source images.

How Reference to Video differs from Image to Video

Input role
  • Image to Video: Defines the literal first (and optionally last) frame of the video
  • Reference to Video: Provides visual features for the AI to extract and recreate

Output relationship to input
  • Image to Video: Video begins from and stays visually close to the uploaded image
  • Reference to Video: Video depicts a new scenario; references guide appearance, not composition

Prompt role
  • Image to Video: Optional guidance for motion and style
  • Reference to Video: Required to describe the scenario, action, and environment

Best for
  • Image to Video: Animating an existing scene or visual
  • Reference to Video: Placing characters or objects in entirely new contexts

How to use Reference to Video

1. Open Video mode

Navigate to Video mode from the left sidebar.
2. Select Reference to Video mode

Click Add image below the prompt field to open the input modes tray, then select Reference-to-Video.
3. Upload your reference images

Upload one to four images of the subject(s) you want to appear in the video: characters, props, costume details, or scene elements. The model extracts visual features from all provided images and uses them to maintain consistency in the output.

For best results:
  • Use images that show your subject clearly from multiple angles when possible
  • Avoid heavily cropped or obscured images
  • Provide images with consistent clothing, accessories, or design details if character or object consistency is important
  • Supported formats: JPG, JPEG, PNG, WEBP (minimum 300px on the shortest side, maximum 10 MB per image)
4. Write a scenario prompt

Describe the new scene or action you want the video to depict. Be specific about the environment, the action, the mood, and the camera angle.

Example prompts:
  • The character walking through a futuristic city at night, neon lights reflecting on wet streets, cinematic tracking shot
  • A woman in a red dress dancing in a grand ballroom, warm candlelight, slow zoom out
  • The robot standing on a rocky cliff overlooking a stormy ocean, dramatic wide angle, overcast sky
Describe the setting in detail. Because the model generates an entirely new video rather than animating your reference, a strong environmental description helps anchor the output and maintain visual coherence with your references.
5. Configure settings

Choose the video duration, aspect ratio, and resolution for your output.
6. Generate the video

Click Generate. The model creates a new video that maintains the visual identity of your reference subjects while placing them in the scenario you described.
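
Step 4 asks for a prompt that covers the environment, the action, the mood, and the camera angle. As a rough illustration only, the Python sketch below assembles those parts into one comma-separated prompt in the style of the example prompts. The function name `build_scenario_prompt` and its parameters are hypothetical; they are not part of the product, which takes the prompt as free text in the prompt field.

```python
def build_scenario_prompt(subject_action: str, environment: str,
                          mood: str = "", camera: str = "") -> str:
    """Join the non-empty parts of a scenario description with commas.

    Hypothetical helper for drafting prompts; not a product API.
    """
    if not subject_action.strip():
        raise ValueError("describe the subject and what it is doing")
    if not environment.strip():
        raise ValueError("describe the environment")
    parts = [subject_action, environment, mood, camera]
    # Skip optional parts that were left empty.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_scenario_prompt(
    "The robot standing on a rocky cliff",
    "overlooking a stormy ocean under an overcast sky",
    camera="dramatic wide angle",
)
print(prompt)
```

Keeping the environment as a required argument mirrors the advice to describe the setting in detail; mood and camera stay optional so a short prompt is still possible.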

Input media specifications

  • Up to 4 images per generation
  • Minimum resolution: 300px (shortest side)
  • Maximum file size: 10 MB per image
  • Supported formats: JPG, JPEG, PNG, WEBP
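
If you prepare reference images in bulk, the limits above can be checked before uploading. The following is a minimal Python sketch, not an official tool: the `ReferenceImage` type and `validate_references` function are hypothetical names, image dimensions are passed in rather than read from the file, and "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, which the page does not specify.

```python
from dataclasses import dataclass
from pathlib import Path

MAX_IMAGES = 4
MIN_SHORT_SIDE_PX = 300
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


@dataclass
class ReferenceImage:
    """Hypothetical description of one reference image."""
    filename: str
    width: int
    height: int
    size_bytes: int


def validate_references(images: list[ReferenceImage]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not images:
        problems.append("at least one reference image is required")
    if len(images) > MAX_IMAGES:
        problems.append(f"too many images: {len(images)} (max {MAX_IMAGES})")
    for img in images:
        ext = Path(img.filename).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{img.filename}: unsupported format {ext or '(none)'}")
        if min(img.width, img.height) < MIN_SHORT_SIDE_PX:
            problems.append(f"{img.filename}: shortest side is below {MIN_SHORT_SIDE_PX}px")
        if img.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{img.filename}: file is larger than 10 MB")
    return problems


print(validate_references([ReferenceImage("hero.png", 1024, 768, 2_000_000)]))
```

Returning a list of messages instead of raising on the first failure lets you fix every rejected image in one pass.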

Key capabilities

  • Multi-reference subject creation: Combine up to four images of the same subject to give the model more information about its appearance, which helps it maintain consistency in clothing, accessories, and distinguishing features.
  • Subject consistency across the clip: Characters, props, and scenes remain visually stable throughout the generated video, even as the action and environment change.
  • Creative flexibility: The AI can place your subjects in any scenario you can describe — new environments, action sequences, different camera angles, or lighting conditions entirely distinct from the source images.

When to use Reference to Video vs other modes

Use Reference to Video when:
  • You have an existing character design, illustration, or photo and want to see it in a new scene
  • You want to create multiple videos featuring the same character in different situations
  • You need subject consistency across generated clips without being constrained by a specific starting frame

Use Image to Video when:
  • You want the video to begin exactly from a specific image
  • You want to animate a scene that already exists rather than create a new one
  • The precise composition of your source image should be preserved in the output

Generate from a text prompt alone when:
  • You don’t have a reference image and want to generate everything from a text description
  • You’re exploring ideas and don’t need visual consistency with existing assets
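
The guidance above boils down to two questions: do you have a reference image, and must the video start from that exact image? The Python sketch below encodes that decision. The function `suggest_mode` is a hypothetical helper, and the "text prompt only" label is a placeholder, since this page does not name the text-only mode.

```python
def suggest_mode(has_reference_image: bool, must_start_from_image: bool) -> str:
    """Suggest a generation mode from the two questions in this section.

    Hypothetical helper; the returned labels are for illustration.
    """
    if not has_reference_image:
        # Nothing to reference: everything comes from the text description.
        return "text prompt only"
    if must_start_from_image:
        # The image is the literal first frame and its composition is preserved.
        return "Image to Video"
    # The image guides appearance only; the scene itself is new.
    return "Reference to Video"


print(suggest_mode(has_reference_image=True, must_start_from_image=False))
```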

What to do next

Image to Video

Animate a specific image as the literal start frame of your video.

Edit Video

Modify an existing video using natural language commands.

Motion Control

Transfer body motion from a reference video onto a character image.

Video Credits

Understand credit costs for Reference to Video generations.