Seedance 2.0 Text to Video | Powerful Text-to-Video API

Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

text-to-video

Input

prompt*

A master chef, 60s, with a white cane hooked on the counter edge, works alone in a chaotic Michelin-star kitchen during dinner rush.
Shot 1: Extreme close-up — his fingertips ghost across the surface of a tomato with surgical precision, mapping its shape before the knife descends in a blur.
Shot 2: Wide shot — the kitchen erupts around him, flames leaping, pans clanging, young chefs sprinting — but he stands absolutely still at his station, an eye of calm in the storm.
Shot 3: Over-the-shoulder shot — his hands move at impossible speed across the cutting board, paper-thin slices falling in perfect sequence without a single wasted motion.
Shot 4: Close-up on his face — nostrils flaring, head tilted slightly, reading the kitchen entirely through smell and sound, a faint smile crossing his lips.
Shot 5: He plates the dish by touch alone, sets it on the pass, and steps back — a young chef stares at the perfect dish in disbelief, then looks at him with wide eyes.
Raw and visceral, the film crackles with the tension of mastery existing beyond the limits of sight.

reference_images

reference_videos

reference_audios

aspect_ratio

resolution

duration

enable_web_search

Enable web search for real-time information.

generate_audio

Whether to generate native audio synchronized with the output video. Defaults to true.

Enable Safety Checker

Examples

A 22-year-old delivery rider with noise-canceling headphones and a battered electric bike tears through a rain-soaked megacity at 2 AM, racing the clock. Shot 1: Close-up — his hands squeeze the throttle to max, water spraying off the handlebars in sheets, delivery box strapped tight to the rack behind him. Shot 2: Wide shot — he weaves between buses and taxis at full speed, neon reflections streaking across the flooded street beneath his tires like liquid fire. Shot 3: POV shot — through his rain-spattered visor, a red light ahead, then a split-second decision — a hard lean into a narrow alley, bricks inches from his shoulder. Shot 4: Close-up on the app screen zip-locked to his handlebars — 3 minutes remaining, 1.2 km left, rating: 4.97 — his jaw tightens. Shot 5: He skids to a stop, yanks off his helmet, sprints the last 10 meters on foot to the door — rings the bell at exactly 2:00 AM — and exhales a breath he's been holding for six blocks. Breathless and kinetic, soaked in city rain and the specific dignity of someone who refuses to be late.

README

Seedance 2.0 Text-to-Video

Seedance 2.0 is Seed's latest video generation model, built on a unified multimodal architecture that accepts text, image, audio, and video inputs. The Text-to-Video mode generates production-grade cinematic videos from text prompts alone — with native audio, director-level control, and exceptional motion stability.

Key Features

Unified multimodal architecture: A single model that handles text, image, audio, and video inputs for comprehensive creative flexibility.
Native audio-visual synchronization: Generates video with synchronized audio in a single pass, with no separate audio generation needed.
Director-level control: Granular control over camera movement, lighting, shadows, and character performance through natural language prompts.
Production-grade cinematic quality: Hollywood-grade visual fidelity with dramatic lighting, professional color grading, and smooth natural motion.
Exceptional motion stability: Industry-leading motion coherence with stable subjects, consistent physics, and fluid transitions.
Strong instruction adherence: Accurately follows detailed scene descriptions, shot compositions, and creative direction.

Parameters

Parameter	Required	Description
prompt	Yes	Detailed description of the cinematic scene
aspect_ratio	No	Output format: 16:9 (default), 9:16, 4:3, 3:4, 1:1, 21:9
duration	No	Video length in seconds: 4-15 (default: 5)
resolution	No	Output resolution: 480p, 720p (default), or 1080p
reference_images	No	Reference image URLs to guide style, characters, or composition
reference_videos	No	Reference video URLs (total length must not exceed 15 seconds)
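reference_audios	No	Reference audio URLs (total length must not exceed 15 seconds)
enable_web_search	No	Enable web search for real-time information
generate_audio	No	Whether to generate native audio synchronized with the output video (default: true)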

How to Use

Write your prompt: describe the scene with cinematic detail, including lighting, mood, camera movement, action, and style.
Select aspect ratio: 16:9 for widescreen, 9:16 for vertical, 4:3 or 3:4 for classic formats, 1:1 for square, 21:9 for ultrawide.
Set duration: choose any duration from 4 to 15 seconds.
Optionally add references: provide reference images, videos, or audio clips for style guidance.
Run: submit and download your cinematic video with synchronized audio.
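
The steps above can be sketched as a small helper that assembles and checks a request body before submission. This is only an illustration: the parameter names, allowed values, and defaults come from the Parameters table, but `build_payload` is a hypothetical function, the whole-second duration is an assumption, and the actual endpoint and authentication are not covered on this page, so no network call is shown.

```python
from typing import Any, Dict, List, Optional

# Allowed values taken from the Parameters table.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
RESOLUTIONS = {"480p", "720p", "1080p"}


def build_payload(
    prompt: str,
    aspect_ratio: str = "16:9",
    duration: int = 5,
    resolution: str = "720p",
    reference_images: Optional[List[str]] = None,
    reference_videos: Optional[List[str]] = None,
    reference_audios: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a request body."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Assumption: duration is given in whole seconds within 4-15.
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
    }
    # Optional reference lists are only included when provided.
    optional = {
        "reference_images": reference_images,
        "reference_videos": reference_videos,
        "reference_audios": reference_audios,
    }
    for key, urls in optional.items():
        if urls:
            payload[key] = list(urls)
    return payload
```

Calling `build_payload("A chef plates a dish", duration=8, resolution="1080p")` returns a dictionary with the four core fields; out-of-range values raise `ValueError` before anything is sent.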

Pricing

Without Reference Videos

Billed per second of output duration, anchored at $0.60 per 5 seconds at 480p.

Resolution	Duration	Cost
480p	5 s	$0.60
480p	10 s	$1.20
480p	15 s	$1.80
720p	5 s	$1.20
720p	10 s	$2.40
720p	15 s	$3.60
1080p	5 s	$3.00
1080p	10 s	$6.00
1080p	15 s	$9.00
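
The table above can be reproduced with a short calculation. The sketch below assumes the per-second rate is simply $0.60 / 5 at 480p, scaled by 2 for 720p and 5 for 1080p (the ratios implied by the table rows); the function name is hypothetical and the result is an estimate, not an authoritative bill.

```python
from decimal import Decimal

# $0.60 per 5 seconds at 480p, i.e. $0.12 per second.
BASE_RATE_480P = Decimal("0.60") / 5

# Resolution multipliers implied by the pricing table.
RESOLUTION_MULTIPLIER = {"480p": 1, "720p": 2, "1080p": 5}


def cost_without_reference_videos(resolution: str, duration_s: int) -> Decimal:
    """Estimate the cost in USD for a run with no reference videos."""
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration_s <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return BASE_RATE_480P * RESOLUTION_MULTIPLIER[resolution] * duration_s


if __name__ == "__main__":
    for res in ("480p", "720p", "1080p"):
        for secs in (5, 10, 15):
            print(res, secs, cost_without_reference_videos(res, secs))
```

Because billing is per second, durations that are not in the table follow the same rule; for example, 7 seconds at 720p works out to 7 x $0.24 = $1.68 under these assumptions.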

With Reference Videos

When reference_videos are provided, billing follows the same scheme as Seedance 2.0 Video-Edit: billed per second across input duration + output duration, where input duration is the total length of the supplied reference videos clamped to the 2-15 s range.

Resolution	Per second
480p	$0.075
720p	$0.15
1080p	$0.375

Examples (reference videos totaling 5 s, output 5 s = 10 billed seconds):

Resolution	Cost
480p	$0.75
720p	$1.50
1080p	$3.75
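
As a rough check of the reference-video scheme, the sketch below clamps the total reference-video length to 2-15 seconds, adds the output duration, and multiplies by the per-second rate listed above. The function name is hypothetical, and how partial seconds are rounded is not stated on this page, so the example uses whole seconds only.

```python
from decimal import Decimal

# Per-second rates from the "With Reference Videos" table.
RATE_PER_SECOND = {
    "480p": Decimal("0.075"),
    "720p": Decimal("0.15"),
    "1080p": Decimal("0.375"),
}


def cost_with_reference_videos(
    resolution: str, reference_total_s: int, output_s: int
) -> Decimal:
    """Estimate the cost in USD when reference videos are supplied."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Input duration is the total reference length clamped to 2-15 s.
    billed_input_s = min(max(reference_total_s, 2), 15)
    billed_seconds = billed_input_s + output_s
    return RATE_PER_SECOND[resolution] * billed_seconds


if __name__ == "__main__":
    # 5 s of reference video plus 5 s of output = 10 billed seconds.
    for res in ("480p", "720p", "1080p"):
        print(res, cost_with_reference_videos(res, 5, 5))
```

Note the effect of the clamp: a 1-second reference clip is billed as 2 seconds of input, so 1 s of reference plus 5 s of output at 720p comes to 7 x $0.15 = $1.05 under this reading.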

Billing Rules

Without reference videos: $0.60 per 5 seconds at 480p, scaled by resolution; prorated per second.
With reference videos: per-second billing matching Seedance 2.0 Video-Edit, using the total reference-video duration as input (clamped 2-15 s) plus the output duration.
720p: 2x the 480p price.
1080p: 5x the 480p price (2.5x the 720p price).
Duration range: 4-15 seconds (continuous).

Best Use Cases

Film & Production — Generate cinematic footage for professional video projects.
Commercials & Ads — Create high-end promotional content with Hollywood aesthetics.
Music Videos — Produce visually stunning sequences with native audio sync.
Social Media Premium — Stand out with film-quality short-form content.
Concept Visualization — Pitch film and TV concepts with production-quality previews.

Pro Tips

Write prompts like a film director — include lighting (e.g., "dramatic rim lighting"), camera angles, and mood.
Use 16:9 for cinematic widescreen; 9:16 for premium vertical content.
Include specific visual details for best results (e.g., "golden hour sunlight casting long shadows").
Describe character expressions and actions for more engaging scenes.
Start with a short duration (4-5s) to iterate on the look, then extend up to 15s.
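
One way to apply these tips consistently is to assemble prompts from structured pieces, mirroring the "Shot 1 / Shot 2 / ..." layout of the example prompts on this page. The helper below is purely illustrative: `compose_prompt` and its arguments are hypothetical, and the model does not require this exact layout.

```python
from typing import List, Tuple


def compose_prompt(scene: str, shots: List[Tuple[str, str]], mood: str) -> str:
    """Join a scene description, numbered shots, and a closing mood line.

    Each shot is a (framing, action) pair, e.g. ("Close-up", "his jaw tightens").
    """
    lines = [scene.strip()]
    for number, (framing, action) in enumerate(shots, start=1):
        lines.append(f"Shot {number}: {framing.strip()} - {action.strip()}")
    lines.append(mood.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = compose_prompt(
        scene="A delivery rider tears through a rain-soaked city at 2 AM.",
        shots=[
            ("Close-up", "his hands squeeze the throttle, water spraying off the handlebars."),
            ("Wide shot", "he weaves between buses, neon reflections streaking across the street."),
        ],
        mood="Breathless and kinetic, with dramatic rim lighting.",
    )
    print(prompt)
```

Keeping framing, action, and mood separate makes it easier to change one shot at a time while iterating at short durations.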

Notes

Native audio generation is included — videos come with synchronized sound.
Duration range: 4-15 seconds (continuous).
Built on the same architecture as Seedance 2.0 Image-to-Video.

Related Models

Seedance 2.0 Image-to-Video — Generate video from reference images + prompt.
Seedance 2.0 Fast Text-to-Video — Faster generation at lower cost.
Seedance 2.0 Fast Image-to-Video — Fast image-guided video generation.
Seedance V1.5 Pro Text-to-Video — Previous generation Seedance model.

Seedance 2.0 Text To Video API — Quick start

Seedance 2.0 Text To Video API — Frequently asked questions