
speech-to-text

wavespeed-ai/openai-whisper-turbo

Instant, accurate speech-to-text powered by Whisper large-v3-turbo. Upload audio and receive multilingual transcripts with automatic language detection and punctuation.


If set to true, the request waits for the transcription to finish before returning, so the transcript is included directly in the response. This property is only available through the API.


Your request will cost $0.0007 per run.

For $1 you can run this model approximately 1428 times.
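The runs-per-dollar figure is the budget divided by the per-run price, rounded down: 1 / 0.0007 ≈ 1428.57, so about 1428 runs. The sketch below reproduces that arithmetic with Python's `decimal` module to avoid floating-point surprises. It assumes the flat $0.0007 per-run price shown above; the Pricing section says billing depends on audio duration, so treat the result as an estimate. The function names are illustrative only.

```python
from decimal import Decimal, ROUND_FLOOR

# Flat per-run price shown on this page (USD). Actual billing may depend on
# audio duration (see Pricing), so results here are estimates.
PRICE_PER_RUN = Decimal("0.0007")

def runs_for_budget(budget_usd: str, price_per_run: Decimal = PRICE_PER_RUN) -> int:
    """Whole number of runs a budget covers at a flat per-run price."""
    runs = Decimal(budget_usd) / price_per_run
    # Round down: a partial run cannot be bought.
    return int(runs.to_integral_value(rounding=ROUND_FLOOR))

def cost_for_runs(runs: int, price_per_run: Decimal = PRICE_PER_RUN) -> Decimal:
    """Total cost of a given number of runs at the flat price."""
    return price_per_run * runs

print(runs_for_budget("1"))   # 1428
print(cost_for_runs(1428))    # 0.9996
```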

README

OpenAI Whisper Speech-to-Text

WaveSpeed's Whisper deployment delivers production-ready speech recognition built on the large-v3-turbo checkpoint. Upload audio (MP3, WAV, FLAC) and receive accurate transcripts with automatic language detection.
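A client can catch obviously unsupported files before uploading. The sketch below checks the extension against the three formats named here and, for WAV input, reads the header with Python's standard `wave` module to flag files that are not 16 kHz mono (the format recommended under Best Practices). It is a local pre-check under those assumptions, not the service's own validation; the helper names are hypothetical, and the service may accept formats beyond the ones listed.

```python
import io
import wave
from pathlib import Path

# Formats named in this README; the service may accept others.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac"}

def is_supported_audio(filename: str) -> bool:
    """Extension-only check (case-insensitive); it does not inspect file contents."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

def is_16k_mono_wav(data: bytes) -> bool:
    """True if the WAV bytes are 16 kHz mono."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getframerate() == 16000 and wf.getnchannels() == 1

def make_silent_wav(rate: int, channels: int, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()

print(is_supported_audio("Interview.WAV"))         # True
print(is_supported_audio("notes.m4a"))             # False
print(is_16k_mono_wav(make_silent_wav(16000, 1)))  # True
print(is_16k_mono_wav(make_silent_wav(44100, 2)))  # False
```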

Highlights

  • Multilingual recognition across 50+ languages
  • Automatic punctuation and casing
  • Robust to background noise and accents
  • Runs on GPU-accelerated infrastructure for fast turnaround

Quick Start

  1. Provide an audio file or HTTPS URL in the "audio" field.
  2. Submit the request via API or dashboard.
  3. Receive a JSON response containing the transcribed text.
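The steps above can be sketched as building a JSON POST request whose body carries the audio URL in the "audio" field. Everything except that field name is an assumption: `ENDPOINT` is a placeholder, not WaveSpeed's real URL, and the bearer-token `Authorization` header is a guess at the auth scheme. Check WaveSpeed's API documentation for both before use. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder endpoint (hypothetical); NOT the real WaveSpeed URL.
ENDPOINT = "https://api.example.com/wavespeed-ai/openai-whisper-turbo"

def build_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request with the audio URL in the "audio" field."""
    if not audio_url.startswith("https://"):
        raise ValueError("audio must be an HTTPS URL")
    payload = json.dumps({"audio": audio_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token auth; confirm against the API docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("https://example.com/episode.mp3", "YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(req) (not executed here).
```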

Example output:

{
  "outputs": {
    "text": "Hello everyone, welcome to the show."
  }
}
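Given a response shaped like the example above, the transcript lives at `outputs.text`. The sketch below extracts it and raises a clear error if that path is missing. It assumes the response matches the example exactly; the live API may wrap the result in additional fields, so adjust the lookup path if it does. `extract_transcript` is a hypothetical helper name.

```python
import json

def extract_transcript(response_body: str) -> str:
    """Return outputs.text from a JSON response shaped like the example."""
    data = json.loads(response_body)
    try:
        return data["outputs"]["text"]
    except (KeyError, TypeError) as exc:
        raise ValueError("response has no outputs.text field") from exc

example = '{"outputs": {"text": "Hello everyone, welcome to the show."}}'
print(extract_transcript(example))  # Hello everyone, welcome to the show.
```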

Best Practices

  • Prefer 16 kHz mono WAV or high-quality MP3 for optimal accuracy.
  • Each request accepts clips of up to 60 minutes.
  • For longer recordings, split the audio into segments at natural pauses.
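Splitting a long recording can be planned greedily: from the current start, cut at the latest pause that keeps the segment within 60 minutes, and fall back to a hard cut at the limit when no pause fits. The sketch below assumes the pause timestamps (in seconds) are already known, for example from a silence detector, which is out of scope here. `plan_segments` is a hypothetical helper, and this greedy rule is one reasonable choice, not WaveSpeed's method.

```python
def plan_segments(
    duration_s: float, pauses_s: list[float], max_len_s: float = 3600.0
) -> list[tuple[float, float]]:
    """Return (start, end) pairs, each at most max_len_s long, cut at pauses when possible."""
    if duration_s <= 0 or max_len_s <= 0:
        raise ValueError("duration and max length must be positive")
    candidates = sorted(p for p in pauses_s if 0 < p < duration_s)
    segments = []
    start = 0.0
    while duration_s - start > max_len_s:
        limit = start + max_len_s
        # Latest pause that keeps this segment within the limit; else hard cut.
        cut = max((p for p in candidates if start < p <= limit), default=limit)
        segments.append((start, cut))
        start = cut  # always > previous start, so the loop terminates
    segments.append((start, duration_s))
    return segments

# A 2.5-hour recording with known pauses.
print(plan_segments(9000.0, [1800.0, 3500.0, 4000.0, 6900.0, 8000.0]))
# [(0.0, 3500.0), (3500.0, 6900.0), (6900.0, 9000.0)]
```

Cutting at the latest qualifying pause keeps segments as long as allowed, which minimizes the number of requests while still avoiding mid-word cuts wherever a pause is available.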

Pricing

Usage is billed per request based on audio duration. Contact the WaveSpeed team for volume discounts and custom SLAs.