Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.
A master chef, 60s, with a white cane hooked on the counter edge, works alone in a chaotic Michelin-star kitchen during dinner rush. Shot 1: Extreme close-up — his fingertips ghost across the surface of a tomato with surgical precision, mapping its shape before the knife descends in a blur. Shot 2: Wide shot — the kitchen erupts around him, flames leaping, pans clanging, young chefs sprinting — but he stands absolutely still at his station, an eye of calm in the storm. Shot 3: Over-the-shoulder shot — his hands move at impossible speed across the cutting board, paper-thin slices falling in perfect sequence without a single wasted motion. Shot 4: Close-up on his face — nostrils flaring, head tilted slightly, reading the kitchen entirely through smell and sound, a faint smile crossing his lips. Shot 5: He plates the dish by touch alone, sets it on the pass, and steps back — a young chef stares at the perfect dish in disbelief, then looks at him with wide eyes. Raw and visceral, the film crackles with the tension of mastery existing beyond the limits of sight.
A 22-year-old delivery rider with noise-canceling headphones and a battered electric bike tears through a rain-soaked megacity at 2 AM, racing the clock. Shot 1: Close-up — his hands squeeze the throttle to max, water spraying off the handlebars in sheets, delivery box strapped tight to the rack behind him. Shot 2: Wide shot — he weaves between buses and taxis at full speed, neon reflections streaking across the flooded street beneath his tires like liquid fire. Shot 3: POV shot — through his rain-spattered visor, a red light ahead, then a split-second decision — a hard lean into a narrow alley, bricks inches from his shoulder. Shot 4: Close-up on the app screen zip-locked to his handlebars — 3 minutes remaining, 1.2 km left, rating: 4.97 — his jaw tightens. Shot 5: He skids to a stop, yanks off his helmet, sprints the last 10 meters on foot to the door — rings the bell at exactly 2:00 AM — and exhales a breath he's been holding for six blocks. Breathless and kinetic, soaked in city rain and the specific dignity of someone who refuses to be late.
Seedance 2.0 is Seed's latest video generation model, built on a unified multimodal architecture that accepts text, image, audio, and video inputs. The Text-to-Video mode generates production-grade cinematic videos from text prompts alone — with native audio, director-level control, and exceptional motion stability.
Unified multimodal architecture: A single model that handles text, image, audio, and video inputs for comprehensive creative flexibility.
Native audio-visual synchronization: Generates video with synchronized audio in a single pass, with no separate audio generation needed.
Director-level control: Granular control over camera movement, lighting, shadows, and character performance through natural language prompts.
Production-grade cinematic quality: Hollywood-grade visual fidelity with dramatic lighting, professional color grading, and smooth natural motion.
Exceptional motion stability: Industry-leading motion coherence with stable subjects, consistent physics, and fluid transitions.
Strong instruction adherence: Accurately follows detailed scene descriptions, shot compositions, and creative direction.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Detailed description of the cinematic scene |
| aspect_ratio | No | Output format: 16:9 (default), 9:16, 4:3, 3:4, 1:1, 21:9 |
| duration | No | Video length in seconds: 4-15 (default: 5) |
| resolution | No | Output resolution: 480p, 720p (default), or 1080p |
| reference_images | No | Reference image URLs to guide style, characters, or composition |
| reference_videos | No | Reference video URLs (total length must not exceed 15 seconds) |
| reference_audios | No | Reference audio URLs (total length must not exceed 15 seconds) |
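The parameter table can be turned into a small client-side check before a request is sent. The sketch below is illustrative only: `build_payload` is a hypothetical helper, not part of any WaveSpeedAI SDK, and it assumes that `duration` is a whole number of seconds and that the reference parameters are passed as lists of URL strings. The 15-second total limit on reference videos and audio is not checked, since that would require inspecting the media files.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}


def build_payload(prompt, aspect_ratio="16:9", duration=5, resolution="720p",
                  reference_images=None, reference_videos=None,
                  reference_audios=None):
    """Validate inputs against the parameter table and return a JSON-ready dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Assumption: duration is an integer number of seconds in the 4-15 range.
    if not isinstance(duration, int) or not 4 <= duration <= 15:
        raise ValueError("duration must be an integer between 4 and 15")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
    }
    # Optional reference lists are only included when provided.
    for key, urls in (("reference_images", reference_images),
                      ("reference_videos", reference_videos),
                      ("reference_audios", reference_audios)):
        if urls:
            payload[key] = list(urls)
    return payload
```

Calling `build_payload("A city at sunset")` returns a dict with the documented defaults filled in, which can then be serialized as the JSON request body.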
Billed per second of output duration, anchored at $0.60 per 5 seconds at 480p.
| Resolution | Duration | Cost |
|---|---|---|
| 480p | 5 s | $0.60 |
| 480p | 10 s | $1.20 |
| 480p | 15 s | $1.80 |
| 720p | 5 s | $1.20 |
| 720p | 10 s | $2.40 |
| 720p | 15 s | $3.60 |
| 1080p | 5 s | $3.00 |
| 1080p | 10 s | $6.00 |
| 1080p | 15 s | $9.00 |
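The table is consistent with a flat per-second rate at each resolution ($0.12, $0.24, and $0.60 for 480p, 720p, and 1080p). A minimal sketch of that arithmetic follows, assuming the rate stays linear for every duration in the 4-15 s range (the table only lists 5, 10, and 15 s). `RATE_PER_SECOND` and `estimate_cost` are illustrative names, not part of any API.

```python
from decimal import Decimal

# Per-second rates derived from the table (cost at 5 s divided by 5).
RATE_PER_SECOND = {
    "480p": Decimal("0.12"),
    "720p": Decimal("0.24"),
    "1080p": Decimal("0.60"),
}


def estimate_cost(resolution: str, duration: int) -> Decimal:
    """Estimate the text-only price in USD, assuming linear per-second billing."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return RATE_PER_SECOND[resolution] * duration


if __name__ == "__main__":
    for res in RATE_PER_SECOND:
        print(res, [str(estimate_cost(res, d)) for d in (5, 10, 15)])
```

The live price shown next to the Generate button remains the authoritative figure; this estimate is only useful for budgeting ahead of time.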
When reference_videos are provided, billing follows the same scheme as Seedance 2.0 Video-Edit: billed per second across input duration + output duration, where input duration is the total length of the supplied reference videos clamped to the 2-15 s range.
| Resolution | Per second |
|---|---|
| 480p | $0.075 |
| 720p | $0.15 |
| 1080p | $0.375 |
Examples (reference videos totaling 5 s, output 5 s = 10 billed seconds):
| Resolution | Cost |
|---|---|
| 480p | $0.75 |
| 720p | $1.50 |
| 1080p | $3.75 |
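The reference-video billing rule (per-second rate times clamped input duration plus output duration) can be sketched the same way. The function name is hypothetical, and the code assumes the clamp applies exactly as described: a total reference length below 2 s is billed as 2 s, and anything above 15 s as 15 s.

```python
from decimal import Decimal

# Per-second rates from the reference-video table.
REFERENCE_RATE_PER_SECOND = {
    "480p": Decimal("0.075"),
    "720p": Decimal("0.15"),
    "1080p": Decimal("0.375"),
}


def estimate_cost_with_reference_videos(resolution, reference_seconds, output_seconds):
    """Estimate the USD price when reference_videos are supplied."""
    if resolution not in REFERENCE_RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Input duration is clamped to the 2-15 s range before billing.
    billed_input = min(max(Decimal(str(reference_seconds)), Decimal(2)), Decimal(15))
    billed_seconds = billed_input + Decimal(str(output_seconds))
    return REFERENCE_RATE_PER_SECOND[resolution] * billed_seconds
```

With 5 s of reference video and a 5 s output, this reproduces the three example prices in the table.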
Grab a WaveSpeedAI API key, then send a POST request to `https://api.wavespeed.ai/api/v3/bytedance/seedance-2.0/text-to-video` with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until `status` flips to `completed`, then read the output URL from `data.outputs[0]`. The examples below show the same request in cURL, JavaScript, and Python.
```bash
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/bytedance/seedance-2.0/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "enable_web_search": false,
    "generate_audio": true
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// `await` is not allowed at the top level of a CommonJS script, so wrap the call.
async function main() {
  const result = await client.run("bytedance/seedance-2.0/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "enable_web_search": false,
    "generate_audio": true
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "bytedance/seedance-2.0/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "aspect_ratio": "16:9",
        "resolution": "720p",
        "duration": 5,
        "enable_web_search": False,
        "generate_audio": True,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

Seedance 2.0 Text To Video is a ByteDance model for video generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seedance-2.0-text-to-video.
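The submit-then-poll flow described above can be sketched with the Python standard library alone. The result URL and the `data.outputs[0]` path come from this page; everything else is an assumption made for illustration: that `status` sits under `data` alongside `outputs`, that a failed prediction reports `"failed"` with an `error` field, and the helper names `fetch_result` and `wait_for_output` themselves.

```python
import json
import time
import urllib.request

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id, api_key):
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(fetch, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status, not confirmed by this page
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval)
```

A typical call would be `wait_for_output(lambda: fetch_result(request_id, api_key))`, where `request_id` is the prediction ID returned by the submission. Passing the fetch step in as a callable keeps the loop easy to test without network access.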
Seedance 2.0 Text To Video starts at $0.60 per run, the price of a 5-second clip at 480p. The final charge scales with the resolution and duration you choose and with any reference videos you attach, so a longer or higher-resolution output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `aspect_ratio`, `resolution`, `duration`, `reference_images`, `enable_web_search`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seedance-2.0-text-to-video.
Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.
Commercial usage rights depend on the model's license, set by its provider (ByteDance). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.