# Veo 3.1 Reference-to-Video
This advanced video generation model enables users to create high-quality, realistic videos that are visually guided by specific reference images. Unlike standard text-to-video approaches, this model accepts up to three input images (Reference Images) alongside a text prompt, allowing for precise control over the visual style, character appearance, color palette, and composition of the final output.
### Key Capabilities
* **Image-Based Direction:** Supply 1 to 3 images to act as distinct visual anchors. The model synthesizes these references so that the generated video stays consistent with your art direction or subject matter.
* **Cinematic Realism:** Captures complex visual details, lighting effects, and camera movements, rendering output at either 720p or 1080p resolution.
* **Native Audio Generation:** The model uses the semantic content of the scene to generate synchronized audio tracks, including sound effects and ambient noise, that match the action unfolding in the video.
* **Precise Control:** Use the text prompt to direct the *action* and *movement* within the scene, while relying on the reference images to define the *aesthetics* and *objects*.
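
The limits listed above (1 to 3 reference images, 720p or 1080p output) can be checked on the client before a request is sent. The following is a minimal sketch, not part of any ModelRunner API: `validateInput` is a hypothetical helper, the field names `prompt`, `image_urls`, and `resolution` are taken from the request example in this document, and treating `resolution` as optional is an assumption.

```javascript
// Hypothetical client-side check; not provided by the ModelRunner client.
const ALLOWED_RESOLUTIONS = ["720p", "1080p"];
const MAX_REFERENCE_IMAGES = 3;

// Returns a list of human-readable problems; an empty list means the input
// satisfies the limits described in this document.
function validateInput(input) {
  const errors = [];

  if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
    errors.push("prompt must be a non-empty string");
  }

  const urls = input.image_urls;
  if (!Array.isArray(urls) || urls.length < 1 || urls.length > MAX_REFERENCE_IMAGES) {
    errors.push(`image_urls must contain 1 to ${MAX_REFERENCE_IMAGES} entries`);
  }

  // Assumption: resolution may be omitted; if present it must be a listed value.
  if (input.resolution !== undefined && !ALLOWED_RESOLUTIONS.includes(input.resolution)) {
    errors.push(`resolution must be one of: ${ALLOWED_RESOLUTIONS.join(", ")}`);
  }

  return errors;
}
```

Calling `validateInput` on the request's `input` object and refusing to submit when the returned list is non-empty avoids spending a generation request on input the model would not accept.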
### Use Cases
This tool is ideal for creative professionals, marketers, and storytellers who need to animate static assets. Use cases include bringing concept art to life, creating consistent social media content from product photos, or visualizing storyboard frames with specific stylistic constraints. By bridging the gap between static imagery and dynamic video, it allows for a seamless workflow where brand identity or artistic vision remains intact across the generation process.
### Technical Details
The model generates videos at 24 frames per second. While the text prompt drives the narrative progression, the reference images limit the visual drift often seen in generative video, keeping the output consistent with the supplied visual context. The output is a single video file with an integrated audio track.
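
Because the frame rate is fixed at 24 frames per second, the number of frames in a clip follows directly from its length. The sketch below illustrates that arithmetic only; `frameCount` is a hypothetical helper, and the clip duration is an input you supply, since this document does not state which durations the model supports.

```javascript
// Frame rate stated in this document.
const FRAMES_PER_SECOND = 24;

// Hypothetical helper: number of frames in a clip of the given length.
function frameCount(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError("durationSeconds must be a non-negative finite number");
  }
  return Math.round(durationSeconds * FRAMES_PER_SECOND);
}
```

For example, `frameCount(8)` returns 192, which can be useful when estimating post-processing cost per clip.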
To run the model with the ModelRunner JavaScript client, use the following code:

```javascript
import { modelrunner } from "@modelrunner/client";

// Top-level await requires an ES module context.
const result = await modelrunner.subscribe("google/veo-3.1-reference-to-video", {
  input: {
    // The prompt directs the action and movement in the scene.
    prompt: "A cinematic drone shot panning over the futuristic city, neon lights reflecting on wet pavement.",
    // 1 to 3 reference images that define the aesthetics and objects.
    image_urls: [
      "https://example.com/images/cyberpunk_city_visual_ref.jpg",
      "https://example.com/images/neon_color_palette.jpg",
    ],
    negative_prompt: "blurry, low resolution, distorted, cartoon",
    resolution: "1080p", // "720p" or "1080p"
    seed: 12345,
  },
});
```
