The smallest multilingual Whisper for fast local transcription
Whisper Tiny is OpenAI's 39M-parameter multilingual Whisper checkpoint, the smallest member of
the Whisper family. It uses the standard Whisper encoder-decoder architecture for automatic
speech recognition and speech translation, trained with large-scale weak supervision on 680k
hours of labelled speech. The tiny model trades some accuracy for the smallest footprint and
fastest inference in the family, which suits low-resource devices and latency-sensitive use. This OpenASR
repo repackages the original openai/whisper-tiny weights as .oasr packs that run natively
in the OpenASR runtime with no Python at inference time. For most users the q8_0 build is the
recommended default; q4_k is for the tightest memory budgets and fp16 is for verification or
maximum fidelity.
.oasr packs run with no Python at inference, engineered for CPU and Apple Silicon.
These are CLI / local-server examples. The desktop app runs this model without typing a command; see the desktop install path above.
$ openasr pull whisper-tiny:q8
↓ whisper-tiny.oasr 60.4 MB
✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/whisper-tiny/q8_0/whisper-tiny-q8_0.oasr
✓ local transcript · 0 bytes sent
$ openasr serve --backend native --model-pack ~/.openasr/models/whisper-tiny/q8_0/whisper-tiny-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=whisper-tiny · 0 bytes will leave this host
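from openai import OpenAI

# Point the OpenAI client at the local server; the key is a placeholder the client requires.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")
# Open the audio in a with-block so the file handle is closed after the upload.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-tiny", file=audio)
print(transcript.text)  # the transcript string is on the .text attribute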
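The same request can also be sent without the openai package. The sketch below uses only the Python standard library and rests on two assumptions: that the local server accepts the multipart POST to /v1/audio/transcriptions that the OpenAI client issues, and that it replies with JSON containing a text field (the OpenAI default response format, not confirmed for OpenASR here). The helper names build_transcription_request and transcribe are hypothetical and exist only for this illustration.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Address taken from the `openasr serve` example above.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_transcription_request(
    audio_name: str, audio_bytes: bytes, model: str = "whisper-tiny"
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and audio file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n".encode(),
        # Form field: file (headers, then raw bytes)
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{audio_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        audio_bytes,
        # Closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Placeholder key, mirroring api_key="local" in the client example.
            "Authorization": "Bearer local",
        },
    )


def transcribe(path: str) -> str:
    """Send the file to the local server and return the transcript text.

    Assumes a JSON response with a "text" field (OpenAI's default format).
    """
    audio = Path(path)
    request = build_transcription_request(audio.name, audio.read_bytes())
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With the server from the serve example running, `print(transcribe("meeting.wav"))` would print the transcript if both assumptions hold. Keeping the request builder separate from the network call makes the payload easy to inspect without a running server.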