OpenAI Whisper Speech-to-Text

WaveSpeed's Whisper deployment delivers production-ready speech recognition built on the large-v3-turbo checkpoint. Upload audio (MP3, WAV, FLAC) and receive accurate transcripts with automatic language detection.

Highlights

Multilingual recognition across 50+ languages
Automatic punctuation and casing
Robust to background noise and accents
Runs on GPU-accelerated infrastructure for fast turnaround

Quick Start

Provide an audio file or HTTPS URL in the "audio" field.
Submit the request via API or dashboard.
Receive a JSON response containing the transcribed text.

Example output:

{
  "outputs": {
    "text": "Hello everyone, welcome to the show."
  }
}
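
The request and response shapes above can be sketched in Python. This is a minimal illustration, not the official client: the endpoint URL, the Bearer-token header, and the function names are placeholders assumed for the example. Only the "audio" input field and the outputs.text response field come from this README. Check the WaveSpeed API reference for the real endpoint and authentication scheme.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL given in the WaveSpeed API docs.
API_URL = "https://api.example.com/wavespeed-ai/openai-whisper"


def build_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request that carries the audio URL in the "audio" field."""
    if not audio_url.startswith("https://"):
        raise ValueError("audio must be an HTTPS URL")
    body = json.dumps({"audio": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the README does not specify one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_text(response_body: str) -> str:
    """Return the transcript from a response shaped like the example output."""
    payload = json.loads(response_body)
    return payload["outputs"]["text"]


def transcribe(audio_url: str, api_key: str) -> str:
    """Send the request and return the transcribed text."""
    request = build_request(audio_url, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_text(response.read().decode("utf-8"))
```

Building the request and parsing the response are kept as separate functions so each can be checked without a network call.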

Best Practices

Prefer 16 kHz mono WAV or high-quality MP3 for optimal accuracy.
Keep each clip to 60 minutes or less per request.
For longer recordings, split the audio into segments at natural pauses so that words are not cut mid-utterance.
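
One way to plan the cuts for a long recording is sketched below. It assumes you already have a list of pause timestamps (in seconds) from a silence detector of your choice. That list, the function name, and the fallback to a hard cut are illustrative assumptions and not part of the WaveSpeed service. The 60-minute default comes from the limit stated above.

```python
def plan_segments(duration_s, pauses_s, max_len_s=3600.0):
    """Split [0, duration_s] into (start, end) segments no longer than max_len_s.

    Each cut is placed at the latest pause that still fits inside the limit.
    If no pause is available in that window, the segment is cut at the limit.
    """
    if max_len_s <= 0:
        raise ValueError("max_len_s must be positive")

    # Keep only pauses that fall strictly inside the recording.
    pauses = sorted(p for p in pauses_s if 0.0 < p < duration_s)

    segments = []
    start = 0.0
    while duration_s - start > max_len_s:
        limit = start + max_len_s
        candidates = [p for p in pauses if start < p <= limit]
        cut = candidates[-1] if candidates else limit  # hard cut as a fallback
        segments.append((start, cut))
        start = cut
    segments.append((start, duration_s))
    return segments
```

Each resulting (start, end) pair can then be exported as its own clip and submitted as a separate request. Transcripts are joined in order afterwards.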

Pricing

Usage is billed per request based on audio duration. Contact the WaveSpeed team for volume discounts and custom SLAs.

wavespeed-ai/openai-whisper

Instant, accurate speech-to-text powered by Whisper large-v3-turbo. Upload audio and receive multilingual transcripts with automatic language detection and punctuation.
