Seedance 2.0 Text to Video

Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

Input

  • enable_web_search: Enable web search for real-time information.
  • generate_audio: Whether to generate native audio synchronized with the output video. Defaults to true.

$0.60 per run · about 16 runs per $10

Examples

A master chef, 60s, with a white cane hooked on the counter edge, works alone in a chaotic Michelin-star kitchen during dinner rush. Shot 1: Extreme close-up — his fingertips ghost across the surface of a tomato with surgical precision, mapping its shape before the knife descends in a blur. Shot 2: Wide shot — the kitchen erupts around him, flames leaping, pans clanging, young chefs sprinting — but he stands absolutely still at his station, an eye of calm in the storm. Shot 3: Over-the-shoulder shot — his hands move at impossible speed across the cutting board, paper-thin slices falling in perfect sequence without a single wasted motion. Shot 4: Close-up on his face — nostrils flaring, head tilted slightly, reading the kitchen entirely through smell and sound, a faint smile crossing his lips. Shot 5: He plates the dish by touch alone, sets it on the pass, and steps back — a young chef stares at the perfect dish in disbelief, then looks at him with wide eyes. Raw and visceral, the film crackles with the tension of mastery existing beyond the limits of sight.

A 22-year-old delivery rider with noise-canceling headphones and a battered electric bike tears through a rain-soaked megacity at 2 AM, racing the clock. Shot 1: Close-up — his hands squeeze the throttle to max, water spraying off the handlebars in sheets, delivery box strapped tight to the rack behind him. Shot 2: Wide shot — he weaves between buses and taxis at full speed, neon reflections streaking across the flooded street beneath his tires like liquid fire. Shot 3: POV shot — through his rain-spattered visor, a red light ahead, then a split-second decision — a hard lean into a narrow alley, bricks inches from his shoulder. Shot 4: Close-up on the app screen zip-locked to his handlebars — 3 minutes remaining, 1.2 km left, rating: 4.97 — his jaw tightens. Shot 5: He skids to a stop, yanks off his helmet, sprints the last 10 meters on foot to the door — rings the bell at exactly 2:00 AM — and exhales a breath he's been holding for six blocks. Breathless and kinetic, soaked in city rain and the specific dignity of someone who refuses to be late.

README

Seedance 2.0 Text-to-Video

Seedance 2.0 is Seed's latest video generation model, built on a unified multimodal architecture that accepts text, image, audio, and video inputs. The Text-to-Video mode generates production-grade cinematic videos from text prompts alone — with native audio, director-level control, and exceptional motion stability.

Key Features

  • Unified multimodal architecture: A single model that handles text, image, audio, and video inputs for comprehensive creative flexibility.

  • Native audio-visual synchronization: Generates video with synchronized audio in a single pass, with no separate audio generation needed.

  • Director-level control: Granular control over camera movement, lighting, shadows, and character performance through natural language prompts.

  • Production-grade cinematic quality: Hollywood-grade visual fidelity with dramatic lighting, professional color grading, and smooth natural motion.

  • Exceptional motion stability: Industry-leading motion coherence with stable subjects, consistent physics, and fluid transitions.

  • Strong instruction adherence: Accurately follows detailed scene descriptions, shot compositions, and creative direction.

Parameters

Parameter | Required | Description
--- | --- | ---
prompt | Yes | Detailed description of the cinematic scene
aspect_ratio | No | Output format: 16:9 (default), 9:16, 4:3, 3:4, 1:1, 21:9
duration | No | Video length in seconds: 4-15 (default: 5)
resolution | No | Output resolution: 480p, 720p (default), or 1080p
reference_images | No | Reference image URLs to guide style, characters, or composition
reference_videos | No | Reference video URLs (total length must not exceed 15 seconds)
reference_audios | No | Reference audio URLs (total length must not exceed 15 seconds)
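
The allowed values in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function name `build_payload` and the choice to raise `ValueError` are illustrative assumptions, and only the parameter names, defaults, and ranges come from the table.

```python
# Allowed values copied from the Parameters table.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
RESOLUTIONS = {"480p", "720p", "1080p"}


def build_payload(prompt, aspect_ratio="16:9", duration=5, resolution="720p",
                  reference_images=None):
    """Return a request body dict; raise ValueError for out-of-range values.

    build_payload is a hypothetical helper, not a WaveSpeedAI API.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")

    payload = {
        "prompt": prompt.strip(),
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
    }
    # Optional parameters are only included when supplied.
    if reference_images:
        payload["reference_images"] = list(reference_images)
    return payload
```

Calling `build_payload("A city at sunset")` returns a dict with the documented defaults filled in, ready to be serialized as the JSON request body.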

How to Use

  1. Write your prompt — describe the scene with cinematic detail: lighting, mood, camera movement, action, and style.
  2. Select aspect ratio — 16:9 for widescreen, 9:16 for vertical, 4:3 or 3:4 for classic formats.
  3. Set duration — choose any duration from 4 to 15 seconds.
  4. Optionally add references — provide reference images, videos, or audios for style guidance.
  5. Run — submit and download your cinematic video with synchronized audio.

Pricing

Without Reference Videos

Billed per second of output duration, anchored at $0.60 per 5 seconds at 480p.

Resolution | Duration | Cost
--- | --- | ---
480p | 5 s | $0.60
480p | 10 s | $1.20
480p | 15 s | $1.80
720p | 5 s | $1.20
720p | 10 s | $2.40
720p | 15 s | $3.60
1080p | 5 s | $3.00
1080p | 10 s | $6.00
1080p | 15 s | $9.00
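
The table rows follow from a single per-second rate at 480p scaled by resolution (2x for 720p, 5x for 1080p). A small sketch of that arithmetic is shown here; the function name `estimate_cost` and the rounding to cents are assumptions made for illustration, not documented billing behavior.

```python
# $0.60 per 5 seconds at 480p, expressed per second.
BASE_RATE_480P = 0.60 / 5

# Resolution multipliers relative to the 480p price.
RESOLUTION_MULTIPLIER = {"480p": 1, "720p": 2, "1080p": 5}


def estimate_cost(resolution, duration_s):
    """Estimate the price in dollars for a run without reference videos.

    estimate_cost is a hypothetical helper; rounding to cents is an assumption.
    """
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration_s <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return round(BASE_RATE_480P * RESOLUTION_MULTIPLIER[resolution] * duration_s, 2)
```

For instance, `estimate_cost("720p", 10)` reproduces the $2.40 row, and a 7-second 480p clip works out to $0.84 under the same per-second proration.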

With Reference Videos

When reference_videos are provided, billing follows the same scheme as Seedance 2.0 Video-Edit: billed per second across input duration + output duration, where input duration is the total length of the supplied reference videos clamped to the 2-15 s range.

Resolution | Per second
--- | ---
480p | $0.075
720p | $0.15
1080p | $0.375

Examples (reference videos totaling 5 s, output 5 s = 10 billed seconds):

Resolution | Cost
--- | ---
480p | $0.75
720p | $1.50
1080p | $3.75
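
The reference-video scheme can be sketched the same way: clamp the total reference length to 2-15 seconds, add the output duration, and multiply by the per-second rate. The function name `estimate_cost_with_references` is hypothetical, and the rounding is only there to keep floating-point noise out of the result.

```python
# Per-second rates when reference videos are supplied.
PER_SECOND_RATE = {"480p": 0.075, "720p": 0.15, "1080p": 0.375}


def estimate_cost_with_references(resolution, reference_total_s, output_s):
    """Estimate the price in dollars when reference_videos are provided.

    Hypothetical helper; only the rates and the 2-15 s clamp come from the text.
    """
    if resolution not in PER_SECOND_RATE:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Input duration is the total reference-video length clamped to 2-15 s.
    billed_input_s = min(max(reference_total_s, 2), 15)
    billed_seconds = billed_input_s + output_s
    return round(PER_SECOND_RATE[resolution] * billed_seconds, 4)
```

With 5 seconds of reference video and a 5-second output, this gives 10 billed seconds and matches the example table. A 1-second reference clip is billed as 2 seconds of input because of the lower clamp.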

Billing Rules

  • Without reference videos: $0.60 per 5 seconds at 480p, scaled by resolution; prorated per second.
  • With reference videos: per-second billing matching Seedance 2.0 Video-Edit, using the total reference-video duration as input (clamped 2-15 s) plus the output duration.
  • 720p: 2x the 480p price.
  • 1080p: 5x the 480p price (2.5x the 720p price).
  • Duration range: 4-15 seconds (continuous).

Best Use Cases

  • Film & Production — Generate cinematic footage for professional video projects.
  • Commercials & Ads — Create high-end promotional content with Hollywood aesthetics.
  • Music Videos — Produce visually stunning sequences with native audio sync.
  • Social Media Premium — Stand out with film-quality short-form content.
  • Concept Visualization — Pitch film and TV concepts with production-quality previews.

Pro Tips

  • Write prompts like a film director — include lighting (e.g., "dramatic rim lighting"), camera angles, and mood.
  • Use 16:9 for cinematic widescreen; 9:16 for premium vertical content.
  • Include specific visual details for best results (e.g., "golden hour sunlight casting long shadows").
  • Describe character expressions and actions for more engaging scenes.
  • Start with a short duration (4-5s) to iterate on the look, then extend up to 15s.
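
The example prompts on this page share a pattern: a subject sentence, numbered shots with a shot type and an action, then a closing mood line. The sketch below assembles a prompt in that shape. The helper `compose_prompt` and its exact separator are illustrative assumptions; the model does not require this format.

```python
def compose_prompt(subject, shots, mood=""):
    """Join a subject line, numbered shots, and an optional mood line.

    shots is a list of (shot_type, action) pairs, e.g. ("Close-up", "...").
    compose_prompt is a hypothetical helper for organizing prompt text.
    """
    parts = [subject.strip()]
    for number, (shot_type, action) in enumerate(shots, start=1):
        parts.append(f"Shot {number}: {shot_type.strip()} - {action.strip()}")
    if mood:
        parts.append(mood.strip())
    return " ".join(parts)


prompt = compose_prompt(
    "A chef works alone in a busy kitchen.",
    [
        ("Close-up", "hands slice a tomato under dramatic rim lighting."),
        ("Wide shot", "the kitchen erupts around him."),
    ],
    mood="Raw and visceral.",
)
```

Keeping shots as structured pairs makes it easier to reorder or drop a shot while iterating at short durations before extending the clip.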

Notes

  • Native audio generation is included — videos come with synchronized sound.
  • Duration range: 4-15 seconds (continuous).
  • Built on the same architecture as Seedance 2.0 Image-to-Video.

Seedance 2.0 Text To Video API — Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/bytedance/seedance-2.0/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status is completed, then read the output URL from data.outputs[0]. HTTP, Node.js, and Python examples follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/bytedance/seedance-2.0/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "enable_web_search": false,
    "generate_audio": true
  }'

# The response includes a prediction id. Set REQUEST_ID to that id, then poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/$REQUEST_ID/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so wrap the call in an async function.
async function main() {
  const result = await client.run("bytedance/seedance-2.0/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "enable_web_search": false,
    "generate_audio": true
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "bytedance/seedance-2.0/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "aspect_ratio": "16:9",
        "resolution": "720p",
        "duration": 5,
        "enable_web_search": False,
        "generate_audio": True,
    },
)

print(output["outputs"][0])  # → URL of the generated output

Seedance 2.0 Text To Video API — Frequently asked questions

What is the Seedance 2.0 Text To Video API?

Seedance 2.0 Text To Video is a ByteDance model for video generation, exposed as a REST API on WaveSpeedAI. Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics. You can call it programmatically or try it from the playground above.

How do I call the Seedance 2.0 Text To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seedance-2.0-text-to-video.
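
The submit-then-poll flow described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the official client: the helper names `make_fetcher` and `wait_for_output`, the polling interval, the timeout, and the "failed" status value are all assumptions. Only the result URL pattern, the "completed" status, and the `data.outputs[0]` location come from this page.

```python
import json
import time
import urllib.request


def make_fetcher(request_id, api_key):
    """Return a function that GETs the prediction result as parsed JSON."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"

    def fetch():
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return fetch


def wait_for_output(fetch_result, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll fetch_result() until status is "completed"; return data.outputs[0]."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name for a terminal failure status
            raise RuntimeError("prediction failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval_s)
```

A caller would pass `make_fetcher(request_id, api_key)` as `fetch_result`. Taking the fetch function as a parameter keeps the polling loop testable without network access.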

How much does Seedance 2.0 Text To Video cost per run?

Seedance 2.0 Text To Video starts at $0.60 per run, which is the price of a 5-second 480p video without reference videos. The final charge scales with the resolution, the output duration, and any reference videos you supply, so a longer or higher-resolution output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Seedance 2.0 Text To Video accept?

Key inputs: `prompt`, `aspect_ratio`, `resolution`, `duration`, `reference_images`, `enable_web_search`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seedance-2.0-text-to-video.

How do I get started with the Seedance 2.0 Text To Video API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Seedance 2.0 Text To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (ByteDance). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.