WaveSpeedAI · video · From $0.075/run

InfiniteTalk API

WaveSpeedAI InfiniteTalk converts one photo plus an audio track into a talking or singing avatar video up to 10 minutes long, with output up to 720p. It ships in Standard and Fast tiers, plus Multi variants that take a single image and two audio inputs to produce multi-character video, and Video-to-Video variants that drive an existing video with new audio for lip-sync replacement.


About the InfiniteTalk API

What InfiniteTalk does, how it fits in the WaveSpeedAI model lineup, and why teams reach for it.

InfiniteTalk is a video generation model from WaveSpeedAI, available through the WaveSpeedAI REST API. It animates a still portrait, or re-drives an existing clip, so that the on-screen character talks or sings in sync with the audio you supply.

The InfiniteTalk family on WaveSpeedAI ships 8 REST endpoints covering digital-human workflows. Each variant has its own pricing, parameters, and example outputs. Pick the one that matches your input modality (image or video, one or two audio tracks) and production constraints, or call several with the same API key to compose multi-step pipelines.

Run InfiniteTalk with the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. There is no separate vendor setup and no per-provider SDK: one integration covers everything from text-to-image and text-to-video through audio synthesis, 3D generation, upscaling, and editing.

All InfiniteTalk API endpoints

8 endpoints are available now on WaveSpeedAI. Pick the variant that matches your workflow.

Video To Video Multi (Fast)

InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video (Fast)

Audio-driven InfiniteTalk Fast turns one video plus audio into realistic talking or singing videos with lip-sync. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video Multi

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.15

Multi (Fast)

InfiniteTalk Fast Multi converts a single image and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.075

Multi

InfiniteTalk Multi converts a single image and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.15

InfiniteTalk Fast

InfiniteTalk Fast converts one photo plus audio into a talking or singing avatar video (image-to-video) up to 10 minutes long. Ready-to-use REST API, no cold starts, affordable pricing.

digital-human · from $0.075

Video To Video

Audio-driven InfiniteTalk turns one video plus audio into realistic talking or singing videos with lip-sync at 480p or 720p. Ready-to-use REST inference API, no cold starts, affordable pricing.

digital-human · from $0.15

InfiniteTalk

InfiniteTalk converts one photo plus audio into a talking or singing avatar video (image-to-video) up to 10 minutes long; the 720p tier is priced at $0.30 per 5 seconds of output. Ready-to-use REST API, no cold starts, affordable pricing.

digital-human · from $0.15
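The eight cards above differ along three axes: the visual source (a still image or an existing video), the number of speakers (one or two audio tracks), and the tier (Standard or Fast). A small Python sketch that maps those choices onto the endpoint paths listed on this page might look like this. The function and argument names are hypothetical; only the endpoint paths come from the catalog.

```python
def pick_endpoint(source: str, speakers: int, fast: bool = False) -> str:
    """Return the InfiniteTalk model path for a given input combination.

    source:   "image" for a single photo, "video" for an existing clip
    speakers: 1 for one audio track, 2 for the two-audio Multi variants
    fast:     True to use the Fast tier (infinitetalk-fast)
    """
    if source not in ("image", "video"):
        raise ValueError("source must be 'image' or 'video'")
    if speakers not in (1, 2):
        raise ValueError("InfiniteTalk variants accept one or two audio inputs")

    base = "wavespeed-ai/infinitetalk-fast" if fast else "wavespeed-ai/infinitetalk"
    if source == "image":
        # Image-to-video: the bare model path, or /multi for two speakers.
        suffix = "/multi" if speakers == 2 else ""
    else:
        # Video-to-video: re-drive existing footage with new audio.
        suffix = "/video-to-video-multi" if speakers == 2 else "/video-to-video"
    return base + suffix


if __name__ == "__main__":
    print(pick_endpoint("image", 1))             # wavespeed-ai/infinitetalk
    print(pick_endpoint("video", 2, fast=True))  # wavespeed-ai/infinitetalk-fast/video-to-video-multi
```

Encoding the choice as three independent switches keeps every combination reachable and makes it obvious that each of the eight endpoints is exactly one point in that grid.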


How to use the InfiniteTalk API

Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.

1. Get an API key
Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.
2. Submit a prediction
POST your input as JSON to https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk. The endpoint immediately returns a prediction id. Generation is asynchronous, so you don't hold a connection open during inference.
3. Poll for completion
GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1-2 seconds, where {request_id} is the id returned in step 2. The response includes a status field; keep polling until it changes from "queued" or "processing" to "completed".
4. Read the output URL
Once status is "completed", read the URL from data.outputs[0]. It points to the generated video on the WaveSpeedAI CDN.
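The four steps map onto two HTTP calls and a loop. Below is a stdlib-only Python sketch of that flow. It assumes the submit response carries the prediction id at data.id, that the poll response carries status and outputs under data, and that a failed job reports status "failed"; confirm those field names against the API docs. The 1.5 s interval and 30-minute timeout are arbitrary choices, and the transport is a plain function argument so the loop can be exercised without network access.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method: str, url: str, body: Optional[dict] = None) -> Dict[str, Any]:
    """Send one authenticated request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {os.environ['WAVESPEED_API_KEY']}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# A transport takes (method, url, json_body_or_None) and returns decoded JSON.
Transport = Callable[[str, str, Optional[dict]], Dict[str, Any]]


def run_prediction(model: str, inputs: dict, transport: Transport = http_json,
                   interval: float = 1.5, timeout: float = 1800.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Submit a prediction, poll until it finishes, and return the first output URL."""
    # Step 2: submit. Assumption: the prediction id is returned at data.id.
    submitted = transport("POST", f"{API_BASE}/{model}", inputs)
    request_id = submitted["data"]["id"]

    # Step 3: poll every `interval` seconds until the job reaches a final state.
    waited = 0.0
    while True:
        result = transport("GET", f"{API_BASE}/predictions/{request_id}/result", None)
        data = result["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"][0]  # Step 4: URL of the generated video
        if status == "failed":  # assumed terminal failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"prediction {request_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

With WAVESPEED_API_KEY set, a call such as run_prediction("wavespeed-ai/infinitetalk", inputs) returns the output URL, where inputs holds the model's image and audio fields as shown in the playground's generated sample.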

What you can build with InfiniteTalk

Common workflows developers and creators use the InfiniteTalk API for.

Talking OR singing avatar from a photo

wavespeed-ai/infinitetalk converts one photo plus audio into a talking or singing avatar video. The catalog describes it as image-to-video, up to 10 minutes, with a 720p tier. Singing support is a distinguishing feature.

talking · singing · avatar

Multi-character with two audio inputs

wavespeed-ai/infinitetalk/multi accepts a single image + two audio inputs to generate multi-character talking/singing videos at up to 720p. Useful for dialog scenes, duets, and two-person presentations from a single setup.

multi-character · two-audio · dialog
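For the Multi variant, the request carries one image and two separate audio tracks, one per on-screen speaker. The Python sketch below assembles and validates such a request body. The keys "image", "left_audio", and "right_audio", and the idea that the first track maps to the left speaker, are assumptions for illustration, not confirmed field names; copy the real ones from the playground's generated sample before use.

```python
from typing import Sequence
from urllib.parse import urlparse


def _check_url(name: str, value: str) -> str:
    """Reject anything that is not an absolute http(s) URL."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name} must be an http(s) URL, got {value!r}")
    return value


def build_multi_payload(image_url: str, audio_urls: Sequence[str]) -> dict:
    """Build a request body for wavespeed-ai/infinitetalk/multi.

    HYPOTHETICAL field names: "image", "left_audio", "right_audio".
    """
    if len(audio_urls) != 2:
        raise ValueError("the Multi variant takes exactly two audio inputs")
    first, second = audio_urls
    return {
        "image": _check_url("image", image_url),
        "left_audio": _check_url("first audio", first),    # assumed: speaker on the left
        "right_audio": _check_url("second audio", second),  # assumed: speaker on the right
    }
```

Validating locally before submitting avoids paying for, or waiting on, a run that the API would reject anyway.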

Video-to-video lip-sync replacement

wavespeed-ai/infinitetalk/video-to-video drives an existing video with new audio, producing realistic talking or singing with lip-sync (480p or 720p). Useful for re-dubbing existing footage with a new voiceover.

video-to-video · dub · lip-sync

Up to 10-minute outputs

The catalog lists a maximum of 10 minutes for the base variant, significantly longer than most video models allow. That makes it usable for podcasts, lectures, long-form narration, and full-episode talking-avatar content in a single generation.

long-form · 10-minute · podcast
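Because the base variant caps output at 10 minutes, it can be worth checking the audio length before paying for a run. A minimal Python sketch using the standard-library wave module, assuming the voiceover is an uncompressed WAV file (MP3 or AAC input would need a different decoder); the 600-second limit comes from the catalog claim above, and the function names are hypothetical.

```python
import wave

MAX_SECONDS = 10 * 60  # catalog limit for the base variant


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_fits_limit(path: str, limit: float = MAX_SECONDS) -> float:
    """Return the duration, or raise if it exceeds one generation's limit."""
    seconds = wav_duration_seconds(path)
    if seconds > limit:
        raise ValueError(
            f"{path} is {seconds:.1f}s; split it into parts of at most {limit:.0f}s"
        )
    return seconds
```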

Fast tier

wavespeed-ai/infinitetalk-fast runs the same image-to-video flow with optimized inference, which suits high-volume work and pre-production iteration. Multi and Video-to-Video also ship Fast variants.

fast · cost · iteration

Multi-character video-to-video

wavespeed-ai/infinitetalk/video-to-video-multi accepts a video and two audio inputs for multi-character talking or singing at up to 720p, combining the multi-character and video-source workflows in one call.

multi · video-source · two-character

Tips for prompting InfiniteTalk

Practical advice for getting better outputs from InfiniteTalk, drawn from patterns that work across video models in production pipelines.

Clean audio first, sync second

Lip-sync quality is bottlenecked by audio clarity. Remove background noise, normalize levels, and check for clipping before feeding the audio in. Clean audio improves lip-sync more than any other variable.
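The clipping and level checks mentioned above are easy to automate for 16-bit PCM WAV input. The Python sketch below uses only the standard library (wave and array): it measures the fraction of samples at or near full scale and rescales the track so its peak sits at a target level. Noise removal needs a proper DSP tool and is out of scope here. The thresholds (99.9% of full scale counted as clipped, a -1 dBFS peak target) are illustrative choices, not InfiniteTalk requirements.

```python
import array
import sys
import wave

FULL_SCALE = 32767  # largest positive signed 16-bit sample


def read_pcm16(path: str):
    """Load a 16-bit PCM WAV file; returns (params, samples)."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(params.nframes))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return params, samples


def clipped_fraction(samples, threshold: float = 0.999) -> float:
    """Fraction of samples whose magnitude is at or above threshold * full scale."""
    if not samples:
        return 0.0
    limit = threshold * FULL_SCALE
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


def normalize_peak(samples, target_dbfs: float = -1.0):
    """Scale samples so the loudest one lands at target_dbfs (peak normalization)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # silence: nothing to scale
    gain = (FULL_SCALE * 10 ** (target_dbfs / 20)) / peak
    return array.array(
        "h", (max(-32768, min(FULL_SCALE, round(s * gain))) for s in samples)
    )


def write_pcm16(path: str, params, samples) -> None:
    """Write samples back out with the original channel count and sample rate."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(params.nchannels)
        wav.setsampwidth(2)
        wav.setframerate(params.framerate)
        wav.writeframes(out.tobytes())
```

A non-zero clipped fraction is a signal to re-record or re-export at a lower gain; peak normalization cannot undo clipping, it only brings a quiet track up to a consistent level.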

Match reference image to audio language/culture

English-language audio paired with a clearly Western character looks more natural than a mismatch. Same for Japanese / Chinese / Korean audio + corresponding character references.

Front-facing portraits sync cleanest

Straight-on portrait references produce the most natural lip-sync. Three-quarter and profile angles work but can show subtle artifacts. If you control the character image, supply a near-frontal pose.

Singing is supported, not just talking

The catalog lists "talking or singing avatar videos," so the model covers music videos, choir or soloist content, and lyric videos with a featured character, not just spoken-word content.

Multi variant accepts two audio inputs

wavespeed-ai/infinitetalk/multi takes a single image + two audio inputs to generate multi-character talking/singing videos at up to 720p. Useful for dialog scenes, duets, and two-person presentations from one setup.

Video-to-video for re-dubbing existing footage

wavespeed-ai/infinitetalk/video-to-video drives an existing video with new audio, producing realistic talking or singing with lip-sync. Useful for re-dubbing clips with a new voiceover while keeping the original visual.

InfiniteTalk API pricing

Pricing is per run. The final charge scales with the parameters you set in each variant's playground, such as resolution and output duration.

Endpoint	Type	Starting price
wavespeed-ai/infinitetalk-fast/video-to-video-multi	digital-human	$0.075
wavespeed-ai/infinitetalk-fast/video-to-video	digital-human	$0.075
wavespeed-ai/infinitetalk/video-to-video-multi	digital-human	$0.15
wavespeed-ai/infinitetalk-fast/multi	digital-human	$0.075
wavespeed-ai/infinitetalk/multi	digital-human	$0.15
wavespeed-ai/infinitetalk-fast	digital-human	$0.075
wavespeed-ai/infinitetalk/video-to-video	digital-human	$0.15
wavespeed-ai/infinitetalk	digital-human	$0.15
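To budget a long-form job, the starting prices in the table can be turned into a rough estimate. The Python sketch below assumes that each starting price is charged per 5-second block of output at the lowest resolution, that partial blocks round up, and that 720p costs twice the base rate. The doubling matches the $0.30/5s 720p figure quoted for wavespeed-ai/infinitetalk but is unconfirmed for the other variants, so rely on the playground's live cost preview for the actual charge.

```python
import math
from decimal import Decimal

# Starting prices from the table, ASSUMED to be USD per 5-second block at base resolution.
STARTING_PRICE = {
    "wavespeed-ai/infinitetalk-fast/video-to-video-multi": Decimal("0.075"),
    "wavespeed-ai/infinitetalk-fast/video-to-video": Decimal("0.075"),
    "wavespeed-ai/infinitetalk/video-to-video-multi": Decimal("0.15"),
    "wavespeed-ai/infinitetalk-fast/multi": Decimal("0.075"),
    "wavespeed-ai/infinitetalk/multi": Decimal("0.15"),
    "wavespeed-ai/infinitetalk-fast": Decimal("0.075"),
    "wavespeed-ai/infinitetalk/video-to-video": Decimal("0.15"),
    "wavespeed-ai/infinitetalk": Decimal("0.15"),
}
BLOCK_SECONDS = 5
# ASSUMPTION: 720p doubles the base rate (confirmed only for wavespeed-ai/infinitetalk).
RESOLUTION_MULTIPLIER = {"480p": Decimal(1), "720p": Decimal(2)}


def estimate_cost(model: str, seconds: float, resolution: str = "480p") -> Decimal:
    """Rough USD estimate: ceil(seconds / 5) blocks times the per-block rate."""
    blocks = math.ceil(seconds / BLOCK_SECONDS)
    rate = STARTING_PRICE[model] * RESOLUTION_MULTIPLIER[resolution]
    return rate * blocks  # Decimal keeps the cent arithmetic exact
```

Under these assumptions, a full 10-minute 720p run on the base variant comes to 120 blocks × $0.30 = $36.00, and the same length on the Fast tier at base resolution to 120 × $0.075 = $9.00.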

Call the InfiniteTalk API

Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.

HTTP example

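# 1. Submit a prediction (input field names are assumed; copy the exact ones from the playground sample)
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{"image": "https://example.com/portrait.jpg", "audio": "https://example.com/voice.wav"}'

# 2. Poll until status = "completed"; set REQUEST_ID to the prediction id returned by step 1
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/$REQUEST_ID/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# Read the output URL from data.outputs[0].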

Node.js example

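// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY

// CommonJS has no top-level await, so wrap the call in an async function.
(async () => {
  // Input field names are assumed; copy the exact ones from the playground sample.
  const result = await client.run("wavespeed-ai/infinitetalk", {
    image: "https://example.com/portrait.jpg", audio: "https://example.com/voice.wav",
  });
  console.log(result.outputs[0]); // → URL of the generated output
})().catch(console.error);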

Python example

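# pip install wavespeed
import wavespeed

# Input field names are assumed; copy the exact ones from the playground sample.
output = wavespeed.run(
    "wavespeed-ai/infinitetalk",
    {
        "image": "https://example.com/portrait.jpg",
        "audio": "https://example.com/voice.wav",
    },
)
print(output["outputs"][0])  # → URL of the generated output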

InfiniteTalk vs alternatives

When to pick InfiniteTalk over similar models on WaveSpeedAI.

InfiniteTalk vs Wan 2.2 Speech-to-Video

Wan 2.2 Speech-to-Video (wavespeed-ai/wan-2.2/speech-to-video) supports clips up to 10 minutes at 480p, the same maximum duration as InfiniteTalk. InfiniteTalk adds Multi variants (two audio inputs), Video-to-Video, and a Fast tier that Wan 2.2 doesn't ship as a separate endpoint.

InfiniteTalk vs Stock avatar tools

Stock-avatar tools (HeyGen-style) limit you to a curated library of pre-trained avatars. InfiniteTalk accepts any character image (a brand mascot, an AI-generated character, an illustrated host) without per-character setup, and supports singing as well as talking.

InfiniteTalk vs ElevenLabs voice tools

ElevenLabs handles voice (generation, cloning, multilingual TTS). InfiniteTalk is the video layer: pair an ElevenLabs voiceover with a character image (or existing video) to produce a full lip-synced video.

InfiniteTalk API — Frequently asked questions

Pricing, licensing, and integration: common questions about running InfiniteTalk on WaveSpeedAI.

What is the InfiniteTalk API?

InfiniteTalk is a WaveSpeedAI video generation model exposed as a REST API. It converts one photo plus audio into a talking or singing avatar video up to 10 minutes long, and comes in Standard and Fast tiers, Multi variants (one image plus two audio inputs for multi-character output), and Video-to-Video variants (an existing video driven by new audio). You can call it programmatically or try it in the playground.

How do I call the InfiniteTalk API?

Sign up for a WaveSpeedAI account, copy your API key from /accesskey, then POST to https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to "completed", then read the output URL from data.outputs[0]. Full Python / Node.js / cURL examples are above.

How much does the InfiniteTalk API cost?

InfiniteTalk starts at $0.075 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.

Which InfiniteTalk variants are available?

WaveSpeedAI hosts 8 InfiniteTalk endpoints: wavespeed-ai/infinitetalk-fast/video-to-video-multi, wavespeed-ai/infinitetalk-fast/video-to-video, wavespeed-ai/infinitetalk/video-to-video-multi, wavespeed-ai/infinitetalk-fast/multi, wavespeed-ai/infinitetalk/multi, wavespeed-ai/infinitetalk-fast, wavespeed-ai/infinitetalk/video-to-video, wavespeed-ai/infinitetalk. Each variant has its own playground page and pricing.

Can I use InfiniteTalk outputs commercially?

Commercial usage rights follow the WaveSpeedAI model license. Most WaveSpeedAI models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.

Why use InfiniteTalk on WaveSpeedAI instead of going direct?

One API key and one billing account cover InfiniteTalk and 1,000+ other AI models. There is no per-vendor SDK setup, no separate rate-limit envelope, and no per-vendor integration code to rewrite.

About WaveSpeedAI

The team behind InfiniteTalk and the broader WaveSpeedAI model lineup.

WaveSpeedAI runs an inference platform that hosts 1,000+ AI models from major providers (ByteDance, Google, OpenAI, Alibaba, Kuaishou, ElevenLabs, and dozens of independent labs) behind one API key, one billing account, and one rate-limit envelope. WaveSpeedAI also ships first-party models (Image / Video Upscalers, Watermark Removers, Animate, InfiniteTalk) tuned for production pipelines.

Related model APIs on WaveSpeedAI

Other AI APIs from WaveSpeedAI and the rest of the video model lineup, all on one API key and one billing account.

Image Upscaler Collection API

WaveSpeedAI

Ten image super-resolution endpoints from five providers — WaveSpeedAI (Standard, Ultimate, Real-ESRGAN), Clarity AI (Creative, Crystal, Flux, Pro), Recraft (Creative, Crisp), and Pruna AI — all reachable through one WaveSpeedAI API key. Pick by target resolution (up to 16×), style preservation, prompt-guided refinement, or per-call cost.

Video Upscaler Collection API

WaveSpeedAI

Eight video super-resolution endpoints from five providers — WaveSpeedAI (Standard, Pro, Ultimate, SeedVR2, LTX-2 19B), ByteDance, Bria, and Clarity AI — all reachable through one WaveSpeedAI API key. Pick by target resolution (up to 8K), source length, motion characteristics, and per-call cost.

Video Watermark Remover API

WaveSpeedAI

WaveSpeedAI Video Watermark Remover — removes Kling and Seedance watermarks, logos, captions, and text from videos while preserving quality. Supports many formats and 10-minute files.

Spicy Image-to-Video Collection API

WaveSpeedAI

Curated lineup of image-to-video endpoints optimized for unlimited, high-volume content generation — spans ByteDance Seedance 2.0 and 1.5 Pro, Alibaba Wan 2.7, 2.6, and 2.2, and Shengshu Vidu Q3, with LoRA customization and video-extend support on the Wan 2.2 line, all reachable through a single WaveSpeedAI API key.

Seedance 2.0 API

ByteDance

ByteDance Seedance 2.0 — Hollywood-grade cinematic video with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture.

Seedance 1.5 Pro API

ByteDance

ByteDance Seedance 1.5 Pro — cinematic, live-action-leaning clips with strong prompt adherence, expressive motion, and stable aesthetics. 4-12s duration with Smart Duration, multiple aspect ratios, reproducible generation via seeds.

Start building with InfiniteTalk on WaveSpeedAI

Free starter credits on signup. One API key across 1,000+ AI models from WaveSpeedAI and other providers.
