Whisper Large v3 Turbo

Fast multilingual Whisper built from pruned large-v3

Multilingual · MIT
Desktop app: Open the Models screen and click Install.
CLI
$ openasr pull whisper-large-v3-turbo:q8

Overview

Whisper Large v3 Turbo is OpenAI's faster variant of Whisper large-v3: it keeps the same Whisper architecture and multilingual speech-recognition/translation interface, but reduces the decoder depth from 32 layers to 4. The upstream card describes the result as much faster with only a minor quality trade-off, while retaining Whisper's broad zero-shot behavior from training on more than five million hours of labeled audio. This OpenASR repo repackages the original openai/whisper-large-v3-turbo weights as .oasr packs that run natively in the OpenASR runtime with no Python at inference time. For most users the q8_0 build is the recommended default; q4_k is for tighter memory budgets and fp16 is for verification or maximum fidelity.
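
As a rough sanity check on why the shallower decoder matters, the arithmetic below estimates how many weights the pruning removes. It assumes the publicly documented Whisper large model width of 1280, which is not stated on this page, and it counts only the large weight matrices in each decoder layer (biases and layer norms are ignored). Treat the result as an approximation, not an official figure.

```python
# Back-of-the-envelope estimate. D_MODEL = 1280 is an assumption taken from the
# public Whisper large configuration, not from this page.
D_MODEL = 1280
LAYERS_BEFORE = 32  # decoder depth of large-v3 (from the overview)
LAYERS_AFTER = 4    # decoder depth of large-v3-turbo (from the overview)


def decoder_layer_weights(d_model: int) -> int:
    """Approximate weight count of one Transformer decoder layer."""
    self_attention = 4 * d_model * d_model      # query, key, value, output projections
    cross_attention = 4 * d_model * d_model     # the same four projections over encoder output
    feed_forward = 2 * d_model * (4 * d_model)  # expand to 4x width, then project back
    return self_attention + cross_attention + feed_forward


per_layer = decoder_layer_weights(D_MODEL)
removed = (LAYERS_BEFORE - LAYERS_AFTER) * per_layer
print(f"~{per_layer / 1e6:.1f}M weights per decoder layer")
print(f"~{removed / 1e6:.0f}M weights removed by pruning")
```

The decoder runs once per generated token while the encoder runs once per audio window, so removing decoder layers reduces the work that is repeated most often during generation.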

Highlights

  • Turbo decoder — prunes Whisper large-v3's decoder from 32 layers to 4 for much faster generation
  • 🌍 Multilingual ASR — transcribes many languages and can translate speech to English
  • 🎙️ Zero-shot robustness — inherits Whisper's large-scale weak-supervision training across noisy domains
  • 🦀 Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU

Tags

Pull string                           Size       Quant   JFK ΔWER
whisper-large-v3-turbo:fp16           1.5 GB     fp16    0%
whisper-large-v3-turbo:q8 (default)   888.2 MB   q8_0    0%
whisper-large-v3-turbo:q4             538.2 MB   q4_k    0%
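
The guidance on choosing a tag can be sketched as a small selection helper. This is a minimal illustration, not part of the OpenASR CLI: `pick_tag` and its `max_fidelity` flag are hypothetical names, the sizes are the download sizes listed above (1.5 GB taken as 1500 MB), and file size is only a lower bound on the memory the runtime will actually need.

```python
from typing import Optional

# (pull string, download size in MB) copied from the Tags table.
# 1.5 GB is approximated as 1500 MB for comparison purposes.
FP16 = ("whisper-large-v3-turbo:fp16", 1500.0)
Q8 = ("whisper-large-v3-turbo:q8", 888.2)
Q4 = ("whisper-large-v3-turbo:q4", 538.2)


def pick_tag(budget_mb: float, max_fidelity: bool = False) -> Optional[str]:
    """Hypothetical helper: return the preferred tag that fits the size budget.

    By default q8 is preferred (the recommended default) and q4 is the
    fallback for tighter budgets. With max_fidelity=True, fp16 is tried first.
    Returns None when no pack fits.
    """
    candidates = [FP16, Q8, Q4] if max_fidelity else [Q8, Q4]
    for tag, size_mb in candidates:
        if size_mb <= budget_mb:
            return tag
    return None


if __name__ == "__main__":
    for budget in (2000, 1000, 600, 100):
        print(budget, "MB ->", pick_tag(budget))
```

The returned string can be passed to `openasr pull` as shown in the install section. Leave some headroom in the budget, since working memory during inference is not included in the pack size.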

Usage

These are CLI / local-server examples. The desktop app runs this model without typing a command — see the desktop install path above.

bash · transcribe a file
$ openasr pull whisper-large-v3-turbo:q8
↓ whisper-large-v3-turbo.oasr  888.2 MB  ✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/whisper-large-v3-turbo/q8_0/whisper-large-v3-turbo-q8_0.oasr
✓ local transcript · 0 bytes sent
bash · serve a local API
$ openasr serve --backend native --model-pack ~/.openasr/models/whisper-large-v3-turbo/q8_0/whisper-large-v3-turbo-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=whisper-large-v3-turbo · 0 bytes will leave this host
python · client.py
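from openai import OpenAI

# Point the OpenAI client at the local server; the client requires some api_key value.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")
with open("meeting.wav", "rb") as audio:  # close the file once the upload finishes
    result = client.audio.transcriptions.create(model="whisper-large-v3-turbo", file=audio)
print(result.text)  # the transcript is on the .text attribute of the response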
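
For more than one recording, the same client call can be wrapped in a loop. The sketch below is a hedged illustration: `transcribe_folder` and the `recordings` directory are hypothetical names, and it assumes the local server returns the standard OpenAI-style response object with a `.text` attribute.

```python
from pathlib import Path


def transcribe_folder(client, folder, model="whisper-large-v3-turbo"):
    """Hypothetical helper: transcribe every .wav file in `folder`.

    `client` is any object exposing the OpenAI-style
    client.audio.transcriptions.create(model=..., file=...) call.
    Returns a dict mapping file name to transcript text, in sorted file order.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with path.open("rb") as audio:
            response = client.audio.transcriptions.create(model=model, file=audio)
        results[path.name] = response.text
    return results


# Usage, assuming `openasr serve` is running on 127.0.0.1:8080 as shown above
# and that a "recordings" directory exists (both are assumptions):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")
#   for name, text in transcribe_folder(client, "recordings").items():
#       print(name, text)
```

Passing the client in as an argument keeps the helper independent of how the server address is configured and makes it easy to exercise with a stub.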
