Available Modes
- Text to Video: Generate videos from text descriptions
- Image to Video: Animate a static image into a video
- First and Last Frame: Generate a video that starts and ends on the frames you specify
- References: Guide generation with up to 12 references (images, videos, and audio)

