Reference to Video - ImagineArt Help Center

Reference to Video generates new video content inspired by reference images you provide. Unlike Image to Video — which uses an image as a literal starting or ending frame — Reference to Video extracts visual features from your references (appearance, clothing, props, environment details) and recreates those elements in a brand-new scenario you describe via a prompt. Use this mode when you want a character or object from your reference images to appear in a completely new scene, action, or environment that doesn’t exist in any of the source images.

How Reference to Video differs from Image to Video

Aspect	Image to Video	Reference to Video
Input role	Defines the literal first (and optionally last) frame of the video	Provides visual features for the AI to extract and recreate
Output relationship to input	Video begins from and stays visually close to the uploaded image	Video depicts a new scenario; references guide appearance, not composition
Prompt role	Optional guidance for motion and style	Required to describe the scenario, action, and environment
Best for	Animating an existing scene or visual	Placing characters or objects in entirely new contexts

How to use Reference to Video

Open Video mode

Navigate to Video mode from the left sidebar.

Select Reference to Video mode

Click Add image below the prompt field to open the input modes tray, then select Reference-to-Video.

Upload your reference images

Upload one to four images of the subject(s) you want to appear in the video, such as characters, props, costume details, or scene elements. The model extracts visual features from all provided images and uses them to maintain consistency in the output.

For best results:

Use images that show your subject clearly from multiple angles when possible
Avoid heavily cropped or obscured images
Provide images with consistent clothing, accessories, or design details if character or object consistency is important
Use a supported format: JPG, JPEG, PNG, or WEBP (at least 300px on the shortest side, at most 10 MB per image)

Write a scenario prompt

Describe the new scene or action you want the video to depict. Be specific about the environment, the action, the mood, and the camera angle.

Example prompts:

The character walking through a futuristic city at night, neon lights reflecting on wet streets, cinematic tracking shot
A woman in a red dress dancing in a grand ballroom, warm candlelight, slow zoom out
The robot standing on a rocky cliff overlooking a stormy ocean, dramatic wide angle, overcast sky

Describe the setting in detail. Because the model generates an entirely new video rather than animating your reference, a strong environmental description helps anchor the output and maintain visual coherence with your references.
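
If you write many prompts for the same subject, it can help to assemble them from the parts listed above. The following is a minimal sketch, not an ImagineArt feature: the function name `build_scenario_prompt` and its parameters are hypothetical, and the comma-separated layout simply mirrors the example prompts.

```python
def build_scenario_prompt(subject: str, action: str, environment: str,
                          mood: str = "", camera: str = "") -> str:
    """Assemble a scenario prompt from its parts (hypothetical helper)."""
    # Subject, action, and environment are treated as required because the
    # prompt has to describe the whole new scenario.
    required = {"subject": subject, "action": action, "environment": environment}
    missing = [name for name, value in required.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing prompt parts: {', '.join(missing)}")

    # The scene reads as one phrase; mood and camera are optional extras.
    scene = " ".join(part.strip() for part in (subject, action, environment))
    extras = [part.strip() for part in (mood, camera) if part.strip()]
    return ", ".join([scene, *extras])


prompt = build_scenario_prompt(
    subject="The robot",
    action="standing on a rocky cliff",
    environment="overlooking a stormy ocean",
    mood="overcast sky",
    camera="dramatic wide angle",
)
print(prompt)
```

The result is a single line you can paste into the prompt field. Leaving out the environment raises an error, which is a reminder that this mode depends on a described setting.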

Configure settings

Choose the video duration, aspect ratio, and resolution for your output.

Generate the video

Click Generate. The model creates a new video that maintains the visual identity of your reference subjects while placing them in the scenario you described.

Input media specifications

Images

Up to 4 images per generation
Minimum resolution: 300px (shortest side)
Maximum file size: 10 MB per image
Supported formats: JPG, JPEG, PNG, WEBP
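
To check a batch of references before uploading, the limits above can be expressed as a small local pre-flight check. This is a sketch under stated assumptions: `ReferenceImage` and `check_references` are hypothetical names, the image dimensions and file size are supplied by the caller rather than read from disk, and "10 MB" is interpreted as 10 × 1024 × 1024 bytes, which the limits above do not specify.

```python
from dataclasses import dataclass
from pathlib import Path

MAX_IMAGES = 4
MIN_SHORT_SIDE_PX = 300
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


@dataclass
class ReferenceImage:
    """Metadata for one reference image (hypothetical container)."""
    filename: str
    width: int
    height: int
    size_bytes: int


def check_references(images: list[ReferenceImage]) -> list[str]:
    """Return a list of human-readable problems; empty means all limits pass."""
    problems = []
    if not 1 <= len(images) <= MAX_IMAGES:
        problems.append(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    for img in images:
        if Path(img.filename).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{img.filename}: unsupported format")
        if min(img.width, img.height) < MIN_SHORT_SIDE_PX:
            problems.append(f"{img.filename}: shortest side below {MIN_SHORT_SIDE_PX}px")
        if img.size_bytes > MAX_BYTES:
            problems.append(f"{img.filename}: larger than 10 MB")
    return problems


print(check_references([ReferenceImage("hero.png", 1024, 768, 2_000_000)]))
```

The check only inspects file extension, dimensions, and size. It says nothing about whether an image is clear or well framed, which still matters for output quality.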

Key capabilities

Multi-reference subject creation: Combine up to four images of the same subject to give the model more information about its appearance, helping it maintain consistency in clothing, accessories, and distinguishing features.
Subject consistency across the clip: Characters, props, and scenes remain visually stable throughout the generated video, even as the action and environment change.
Creative flexibility: The AI can place your subjects in any scenario you can describe — new environments, action sequences, different camera angles, or lighting conditions entirely distinct from the source images.

When to use Reference to Video vs other modes

Use Reference to Video when...

You have an existing character design, illustration, or photo and want to see it in a new scene
You want to create multiple videos featuring the same character in different situations
You need subject consistency across generated clips without being constrained by a specific starting frame

Use Image to Video instead when...

You want the video to begin exactly from a specific image
You want to animate a scene that already exists rather than create a new one
The precise composition of your source image should be preserved in the output

Use Text to Video instead when...

You don’t have a reference image and want to generate everything from a text description
You’re exploring ideas and don’t need visual consistency with existing assets
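
The guidance above reduces to two questions: do you have a reference image, and must the output preserve that image's exact frame or composition? As an illustration only, here is that decision written as a function. `suggest_mode` and its parameters are hypothetical names, and the rule is a simplification of the lists above.

```python
def suggest_mode(has_reference_image: bool, keep_exact_frame: bool) -> str:
    """Suggest a video mode from two yes/no questions (simplified rule)."""
    if not has_reference_image:
        # Nothing to anchor on: everything comes from the text description.
        return "Text to Video"
    if keep_exact_frame:
        # The source image should be the literal start (or end) frame.
        return "Image to Video"
    # The subject should carry over, but the scene is new.
    return "Reference to Video"


print(suggest_mode(has_reference_image=True, keep_exact_frame=False))
```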

What to do next

Image to Video

Animate a specific image as the literal start frame of your video.

Edit Video

Modify an existing video using natural language commands.

Motion Control

Transfer body motion from a reference video onto a character image.

Video Credits

Understand credit costs for Reference to Video generations.
