Image to Video

Transform static images into dynamic videos using AI

v1.1 — AI Video

What’s New — April 2026 — Added Seedance 2.0 and Seedance 2.0 Reference models. Seedance 2.0 supports end frame, audio generation, flexible aspect ratios (including 21:9, 4:3, 3:4), and auto duration. Seedance 2.0 Reference generates video from multi-modal references — up to 9 images, 3 videos, and 3 audio files referenced in the prompt as @Image1, @Video1, @Audio1. No start frame needed for Reference mode.

Previous Updates

April 2026 (v1.1) — Multi-provider support

Multi-provider support (Veo 3.1, Sora 2/Pro, Kling v3/o3/o3 Ref), AI audio generation, prompt enhancement, end frame and reference image support, elements system for subject/style consistency, multi-prompt mode for multi-shot videos, resolution options up to 4K, and negative prompt support.

What does this node do?

The Image to Video node transforms static images into dynamic videos using AI. It supports multiple model providers, each offering different capabilities such as audio generation, end frame control, reference images, and multi-shot narratives.

Common uses:

  • Animate product photos into polished showcase videos
  • Create engaging social media videos from static visuals
  • Generate cinematic clips with camera motion and effects
  • Build multi-shot narratives from a series of images using multi-prompt mode

Quick setup

Add the Image to Video node

Find it in AI Nodes > AI_VIDEO > Image to Video

Connect a start frame

Connect an image output to the input_start_frame input. This is the image that will be animated.

Select a model and describe the motion

Choose a model (e.g. Veo 3.1, Sora 2, Kling v3, Seedance 2.0) and write a prompt describing the desired motion and animation.

Run the workflow

Execute the workflow. The node outputs a video file.

Configuration

Model

modelName LLM selection required

The AI model to use for video generation. Each model family offers different capabilities — see the comparison table below.

Prompt

prompt string required

Description of the desired motion and animation. Supports {{variables}} for dynamic content. You can reference connected inputs using @Element1, @Element2 (Kling), or @Image1, @Video1, @Audio1 (Seedance 2.0 Reference).

Examples:

  • “Slow cinematic zoom in, soft lighting transitions”
  • “Product rotates 360 degrees on a white background”
  • “Camera pans left to right across the landscape, clouds moving”
  • “@Image1 is walking through a forest in the style of @Image2. The ambient soundtrack from @Audio1 plays throughout.” (Seedance 2.0 Ref)
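As a rough illustration of how the @ tags in a prompt relate to connected inputs, the sketch below pulls the tag names out of a prompt string so they can be compared against what is wired to the node. The helper name `find_reference_tags` and the regular expression are hypothetical and not part of the product; only the tag families (@Element, @Image, @Video, @Audio) come from this page.

```python
import re

# Tag families named on this page: @Element (Kling) and
# @Image / @Video / @Audio (Seedance 2.0 Reference).
TAG_PATTERN = re.compile(r"@(Element|Image|Video|Audio)(\d+)")


def find_reference_tags(prompt: str) -> dict[str, set[int]]:
    """Return the tag indices used in the prompt, grouped by tag kind."""
    found: dict[str, set[int]] = {}
    for kind, index in TAG_PATTERN.findall(prompt):
        found.setdefault(kind, set()).add(int(index))
    return found


prompt = (
    "@Image1 is walking through a forest in the style of @Image2. "
    "The ambient soundtrack from @Audio1 plays throughout."
)
tags = find_reference_tags(prompt)
print(tags)
```

A check like this can catch a prompt that cites @Image3 when only two image references are connected.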

Audio

generate_audio boolean default: true

Enable AI-generated audio for the video. Supported by Veo 3.1 and Seedance 2.0 models (enabled by default on Seedance).

Enhance Prompt

enhance_prompt boolean default: true

Let the AI enhance your prompt for better results. The model rewrites your prompt with more detail and cinematic direction.

Aspect Ratio

aspect_ratio string default: 16:9

Output video aspect ratio. Available options vary by model:

  • Veo 3.1: 16:9, 9:16
  • Sora 2 / Sora 2 Pro: Auto, 9:16, 16:9
  • Kling v3 / o3 / o3 Ref: 16:9, 9:16, 1:1
  • Seedance 2.0 / 2.0 Ref: Auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16

Duration

duration_seconds number default: 8

Video duration in seconds. Range varies by model:

  • Veo 3.1: 4–8s
  • Sora 2 / Sora 2 Pro: 4, 8, or 12s
  • Kling v3 / o3 / o3 Ref: 3–15s
  • Seedance 2.0 / 2.0 Ref: Auto, or 4–15s
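The per-model ranges above can be encoded as a small lookup if you want to validate a duration before running a workflow. This is a minimal sketch: the dictionary keys reuse the display names from this page, and spelling the Auto option as the string "auto" is an assumption, since the stored value is not documented here.

```python
# Duration rules copied from the list above.
# "auto" as the spelling of the Auto option is an assumption of this sketch.
DURATION_RULES = {
    "Veo 3.1": {"range": (4, 8)},
    "Sora 2": {"choices": {4, 8, 12}},
    "Sora 2 Pro": {"choices": {4, 8, 12}},
    "Kling v3": {"range": (3, 15)},
    "Kling o3": {"range": (3, 15)},
    "Kling o3 Ref": {"range": (3, 15)},
    "Seedance 2.0": {"range": (4, 15), "auto": True},
    "Seedance 2.0 Ref": {"range": (4, 15), "auto": True},
}


def is_valid_duration(model: str, duration) -> bool:
    """Check a duration (seconds, or "auto") against the model's rule."""
    rule = DURATION_RULES[model]
    if duration == "auto":
        return rule.get("auto", False)
    if not isinstance(duration, (int, float)):
        return False
    if "choices" in rule:
        return duration in rule["choices"]
    low, high = rule["range"]
    return low <= duration <= high


print(is_valid_duration("Veo 3.1", 8))
print(is_valid_duration("Sora 2", 6))
```

Note that the default of 8 seconds falls inside every model's allowed range.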

Number of Videos

num_videos number default: 1

Number of videos to generate (1–2).

Resolution

resolution string default: 1080p

Output video resolution. Available options vary by model (up to 4K on supported models).

Negative Prompt

negative_prompt string

Describe what you want to avoid in the generated video. Only supported by Kling models.

Example: “blurry, low quality, distorted faces, watermark”

End Frame

use_end_frame boolean default: false

Enable end frame support. When turned on, a dynamic input_end_frame input appears. Connect an image to define how the video should end. Supported by Veo 3.1, Kling models, and Seedance 2.0 (but not Seedance 2.0 Reference).

Reference Images

use_reference_images boolean default: false

Enable reference images for visual consistency. When turned on, a dynamic input_reference_images input appears. Supported by Veo 3.1 (up to 3 images) and Kling o3 Ref (up to 4 images).

Elements

elements_config json

Array of element configurations, each with an id and type (image or video). Connected element inputs can be referenced in the prompt via @Element1, @Element2, etc. to maintain subject or style consistency across the video. Supported by Kling v3 (up to 4) and Kling o3 Ref (up to 4).
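For orientation, here is one way an elements_config array could be assembled. Only the id and type fields and the limit of 4 come from the description above; the `element_N` id format and the helper name `build_elements_config` are hypothetical, so treat this as a sketch rather than the exact shape the node stores.

```python
import json

MAX_ELEMENTS = 4  # limit stated for Kling v3 and Kling o3 Ref
ALLOWED_TYPES = {"image", "video"}


def build_elements_config(types: list[str]) -> list[dict]:
    """Build one {id, type} entry per element, in connection order."""
    if len(types) > MAX_ELEMENTS:
        raise ValueError(f"at most {MAX_ELEMENTS} elements are supported")
    config = []
    for position, element_type in enumerate(types, start=1):
        if element_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported element type: {element_type}")
        # The id format below is hypothetical, chosen only for illustration.
        config.append({"id": f"element_{position}", "type": element_type})
    return config


print(json.dumps(build_elements_config(["image", "video"]), indent=2))
```

In this sketch the first entry corresponds to @Element1 in the prompt, the second to @Element2, and so on.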

Seedance References

seedance_refs_config json

Configure multi-modal reference inputs for Seedance 2.0 Reference. This model does not use a start frame — instead, all media is provided as named references and cited in the prompt.

Use the counter controls in the settings panel to add references:

  • Images (@Image1 to @Image9): Up to 9 reference images. JPEG, PNG, or WebP. Max 30 MB each.
  • Videos (@Video1 to @Video3): Up to 3 reference videos. MP4 or MOV. Resolution 480p–720p, combined duration 2–15s, total size under 50 MB.
  • Audio (@Audio1 to @Audio3): Up to 3 audio files. MP3 or WAV. Max 15 MB each, combined duration max 15s. Requires at least 1 image or video.

Total across all modalities must not exceed 12. Each reference creates a connector input on the node. Reference them in your prompt using @Image1, @Video1, @Audio1, etc.

Example prompt:

@Image1 is walking through a forest in the style of @Image2.
The camera follows her from behind as she moves along the path shown in @Video1.
The ambient soundtrack from @Audio1 plays throughout the scene.
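The count limits for Seedance references interact (per-modality caps, a total cap of 12, and the rule that audio needs at least one image or video), so a small check can help when planning a reference set. The sketch below encodes only the counts stated in this section; the `SeedanceRefs` class and `validate_seedance_refs` function are hypothetical names, and file format, size, and duration limits are not checked.

```python
from dataclasses import dataclass


@dataclass
class SeedanceRefs:
    images: int = 0
    videos: int = 0
    audio: int = 0


def validate_seedance_refs(refs: SeedanceRefs) -> list[str]:
    """Return a list of problems; an empty list means the counts are valid."""
    problems = []
    if refs.images > 9:
        problems.append("at most 9 images")
    if refs.videos > 3:
        problems.append("at most 3 videos")
    if refs.audio > 3:
        problems.append("at most 3 audio files")
    if refs.images + refs.videos + refs.audio > 12:
        problems.append("total references must not exceed 12")
    if refs.audio > 0 and refs.images + refs.videos == 0:
        problems.append("audio requires at least 1 image or video")
    return problems


print(validate_seedance_refs(SeedanceRefs(images=2, videos=1, audio=1)))
print(validate_seedance_refs(SeedanceRefs(images=9, videos=3, audio=3)))
```

The second call illustrates that maxing out every modality (9 + 3 + 3 = 15) still breaks the total cap of 12.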

Multi-Prompt

multi_prompt_enabled boolean default: false

Enable multi-shot video generation. When turned on, the video is composed of multiple sequential shots, each with its own prompt and duration.

multi_prompt_config json

Array of shot definitions, each containing a prompt and duration. Used when multi_prompt_enabled is true. Supported by Kling v3 and Kling o3 Ref.

Example:

[
  { "prompt": "Close-up of the product on a table", "duration": 5 },
  { "prompt": "Camera pulls back to reveal the full scene", "duration": 5 }
]
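When building shot lists programmatically, it can help to total the shot durations before serializing the array. The sketch below assumes the combined length of all shots should stay within the 3 to 15 second range listed for Kling models; this page does not state whether that range applies per shot or to the whole video, so that check is an assumption.

```python
import json

# Kling duration range from the Duration setting. Applying it to the
# combined length of all shots is an assumption of this sketch.
KLING_MIN_SECONDS, KLING_MAX_SECONDS = 3, 15


def total_duration(shots: list[dict]) -> int:
    """Sum the duration of every shot in the list."""
    return sum(shot["duration"] for shot in shots)


shots = [
    {"prompt": "Close-up of the product on a table", "duration": 5},
    {"prompt": "Camera pulls back to reveal the full scene", "duration": 5},
]

total = total_duration(shots)
fits_range = KLING_MIN_SECONDS <= total <= KLING_MAX_SECONDS

# Serialize to the JSON array shape shown in the example above.
multi_prompt_config = json.dumps(shots, indent=2)
print(total, fits_range)
print(multi_prompt_config)
```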

Model comparison

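| Feature | Veo 3.1 | Sora 2 | Sora 2 Pro | Kling v3 | Kling o3 | Kling o3 Ref | Seedance 2.0 | Seedance 2.0 Ref |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Start Frame | Optional | Optional | Optional | Required | Required | Optional | Required | No |
| Aspect Ratios | 16:9, 9:16 | Auto, 9:16, 16:9 | Auto, 9:16, 16:9 | 16:9, 9:16, 1:1 | 16:9, 9:16, 1:1 | 16:9, 9:16, 1:1 | Auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | Auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 |
| Duration | 4–8s | 4, 8, 12s | 4, 8, 12s | 3–15s | 3–15s | 3–15s | Auto, 4–15s | Auto, 4–15s |
| Resolution | 4K, 1080p, 720p | Auto, 720p | Auto, 1080p, 720p | 1080p | 1080p | 1080p | 720p, 480p | 720p, 480p |
| Audio | Yes | No | No | Yes | Yes | No | Yes | Yes |
| End Frame | Yes | No | No | Yes | Yes | Yes | Yes | No |
| References | 3 images | No | No | No | No | 4 images | No | 9 images, 3 videos, 3 audio |
| Elements | No | No | No | 4 max | No | 4 max | No | No |
| Multi-Prompt | No | No | No | Yes | No | Yes | No | No |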
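To pick a model by capability, the Yes/No rows of the comparison can be treated as a lookup table. The sketch below transcribes the Audio, End Frame, Elements, and Multi-Prompt rows; the feature keys and the `models_supporting` helper are hypothetical names used only for illustration.

```python
# Yes/No rows transcribed from the model comparison above.
MODEL_FEATURES = {
    "Veo 3.1":          {"audio": True,  "end_frame": True,  "elements": False, "multi_prompt": False},
    "Sora 2":           {"audio": False, "end_frame": False, "elements": False, "multi_prompt": False},
    "Sora 2 Pro":       {"audio": False, "end_frame": False, "elements": False, "multi_prompt": False},
    "Kling v3":         {"audio": True,  "end_frame": True,  "elements": True,  "multi_prompt": True},
    "Kling o3":         {"audio": True,  "end_frame": True,  "elements": False, "multi_prompt": False},
    "Kling o3 Ref":     {"audio": False, "end_frame": True,  "elements": True,  "multi_prompt": True},
    "Seedance 2.0":     {"audio": True,  "end_frame": True,  "elements": False, "multi_prompt": False},
    "Seedance 2.0 Ref": {"audio": True,  "end_frame": False, "elements": False, "multi_prompt": False},
}


def models_supporting(*features: str) -> list[str]:
    """Return the models that support every requested feature."""
    return [
        name
        for name, caps in MODEL_FEATURES.items()
        if all(caps[feature] for feature in features)
    ]


print(models_supporting("multi_prompt"))
print(models_supporting("audio", "end_frame"))
```

For example, asking for both audio and multi-prompt narrows the choice to Kling v3 under this table.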

Output

output video

The generated video file.

Examples

Product animation with Veo 3.1

  • Model: Veo 3.1
  • Start frame: Product photo on a clean background
  • Prompt: “Product slowly rotates with soft studio lighting, gentle reflections on surface, ambient background music”
  • Audio: Enabled
  • Duration: 6s

The node generates a polished product showcase video with synchronized AI audio.
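Expressed as data, the settings for this example might look like the sketch below. The field names come from the Configuration section, but using the display name "Veo 3.1" as the modelName value and storing the settings as a flat JSON object are both assumptions; the node's actual stored format is not documented here.

```python
import json

# Field names come from the Configuration section. The modelName value
# and the flat-object layout are assumptions of this sketch.
veo_settings = {
    "modelName": "Veo 3.1",
    "prompt": (
        "Product slowly rotates with soft studio lighting, "
        "gentle reflections on surface, ambient background music"
    ),
    "generate_audio": True,
    "duration_seconds": 6,
}

# Veo 3.1 accepts 4 to 8 seconds according to the Duration setting.
duration_ok = 4 <= veo_settings["duration_seconds"] <= 8

print(duration_ok)
print(json.dumps(veo_settings, indent=2))
```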

Social media clip with Sora 2

  • Model: Sora 2
  • Start frame: Landscape photograph
  • Prompt: “Cinematic camera pan from left to right, clouds drifting in the sky, sun rays breaking through”
  • Aspect ratio: 9:16
  • Duration: 8s

Produces a vertical video ready for social media platforms.

Multi-shot narrative with Kling v3

  • Model: Kling v3
  • Start frame: Character portrait
  • Multi-prompt enabled: true

Shots:

  1. “Close-up of the character looking at the camera, subtle smile” — 5s
  2. “Camera pulls back to reveal a city skyline at sunset behind the character” — 5s
  3. “Wide aerial shot of the city as the sun sets” — 5s

Creates a 15-second narrative video with three sequential shots, maintaining visual consistency.

Reference-driven video with Seedance 2.0 Reference

  • Model: Seedance 2.0 Reference
  • References: 2 images, 1 video, 1 audio
  • Prompt: “@Image1 is walking through a forest in the style of @Image2. The camera follows her from behind as she moves along the path shown in @Video1. The ambient soundtrack from @Audio1 plays throughout the scene with birds chirping.”
  • Duration: Auto
  • Audio: Enabled

No start frame is needed. The model composes the video entirely from the referenced media and the prompt description. Each @Image, @Video, and @Audio tag maps to a connector input on the node.

Best practices

  • Start with high-quality images. The output quality directly depends on the input image resolution and clarity.
  • Be specific in your prompts. Describe camera motion, lighting changes, and subject movement explicitly rather than using vague terms.
  • Match the model to your needs. Use Veo 3.1 for high-resolution output (up to 4K), Sora 2 for clips of up to 12s, Kling for multi-shot narratives, element consistency, or clips of up to 15s, and Seedance 2.0 Reference when you need multi-modal references (images + videos + audio).
  • Use end frames for controlled transitions. When you need the video to arrive at a specific final state, provide an end frame image.
  • Keep multi-prompt shots coherent. Each shot should flow naturally into the next. Describe transitions in the prompts.

Common issues

Video quality is low or blurry

Use a higher-resolution source image and increase the output resolution setting. Avoid upscaling small images before input.

Motion does not match the prompt

Be more explicit about the type of motion. Instead of “make it move,” describe the exact camera movement or subject action. Enable prompt enhancement to let the model refine your description.

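Audio is missing from the output

AI audio generation is not available on every model. Verify that generate_audio is enabled and that the selected model is listed with audio support in the model comparison, for example Veo 3.1 or Seedance 2.0. Sora 2 and Sora 2 Pro do not generate audio.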

Elements are not reflected in the video

Ensure you reference elements in the prompt using @Element1, @Element2, etc. Elements are only supported by Kling v3 and Kling o3 Ref.

Seedance 2.0 Reference: audio is ignored

Audio references require at least one image or video reference to be provided. Make sure you have added at least one @Image or @Video before adding @Audio inputs.