State-of-the-art multilingual speech recognition across 52 languages & dialects
Qwen3-ASR-1.7B is a multilingual automatic speech recognition model (~2B parameters,
~1.7B active, BF16) from Alibaba's Qwen3-ASR family that transcribes speech while
identifying the spoken language across 30 languages, 22 Chinese dialects, and a range
of regional English accents — and it holds up on hard audio including singing voice and
songs over background music. A single unified checkpoint serves both offline and
real-time streaming transcription, with word-level timestamps available via the
companion Qwen3-ForcedAligner-0.6B; the Qwen team reports state-of-the-art quality among
open-source ASR models and accuracy competitive with commercial APIs. This OpenASR repo
repackages the original Qwen/Qwen3-ASR-1.7B weights as .oasr packs that run natively in
the OpenASR runtime with no Python at inference time. For most users the q8_0 build is the
recommended default, with near-reference accuracy at roughly half the footprint of fp16, while q4_k
suits tight-memory deployments and fp16 is reserved for verification or maximum fidelity.
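
As a rough illustration of the footprint trade-off between the three builds, the sketch below estimates pack size from parameter count. It assumes the q8_0 and q4_k labels follow the GGML conventions of the same names (about 8.5 and 4.5 bits per weight); this page does not confirm how .oasr packs store weights, and the ~2B parameter count is the rounded figure quoted above, so treat the numbers as ballpark only.

```python
# Assumed bits per weight. The q8_0 and q4_k values are borrowed from the GGML
# formats of the same name; the .oasr on-disk layout is not documented here.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_k": 4.5}


def estimate_size_gb(num_params: float, quant: str) -> float:
    """Estimate weight storage in decimal gigabytes for a quantization level."""
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    total_params = 2.0e9  # "~2B parameters", rounded
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_size_gb(total_params, quant):.1f} GB")
```

Under these assumptions q8_0 comes out at a little over half the fp16 size and q4_k at a little over a quarter. Real packs also carry metadata and may keep some tensors at higher precision, so actual download sizes can differ from the estimate.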
.oasr packs run with no Python at inference and are engineered for peak performance on CPU and GPU.

These are CLI and local-server examples. The desktop app runs this model without typing a command; see the desktop install path above.
$ openasr pull qwen3-asr-1.7b:q8
↓ qwen3-asr-1.7b.oasr 2.3 GB
✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/qwen3-asr-1.7b/q8_0/qwen3-asr-1.7b-q8_0.oasr
✓ local transcript · 0 bytes sent
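
The pull step reports a verified SHA-256 checksum. If you ever want to re-check a pack by hand, a minimal sketch using only the Python standard library could look like the following. The function names are hypothetical, and the expected digest has to come from wherever the pack's checksum is published; this page does not say where that is.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks matters here because packs are gigabytes in size, and loading one fully into memory just to hash it would be wasteful.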
$ openasr serve --backend native --model-pack ~/.openasr/models/qwen3-asr-1.7b/q8_0/qwen3-asr-1.7b-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=qwen3-asr-1.7b · 0 bytes will leave this host
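from openai import OpenAI

# Point the OpenAI client at the local OpenASR server started above.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")

# Open the audio in a context manager so the file handle is closed after the request.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="qwen3-asr-1.7b", file=audio)

# The call returns a transcription object; the recognized text is in its .text field.
print(transcript.text)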
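
To transcribe several recordings against the same local server, the client call can be wrapped in a small helper. This is a sketch under assumptions: `transcribe_folder` and `make_local_transcriber` are hypothetical names, not part of OpenASR, and the server address and model name are copied from the example above. The transcriber is passed in as a callable so the folder loop can be exercised without a running server.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(folder: str, transcribe: Callable[[Path], str]) -> Dict[str, str]:
    """Map each .wav file name in `folder` to the text returned by `transcribe`."""
    results: Dict[str, str] = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        results[wav.name] = transcribe(wav)
    return results


def make_local_transcriber(
    base_url: str = "http://127.0.0.1:8080/v1",
    model: str = "qwen3-asr-1.7b",
) -> Callable[[Path], str]:
    """Build a transcriber backed by the local OpenAI-compatible endpoint."""
    # Imported lazily so transcribe_folder has no hard dependency on openai.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="local")

    def _transcribe(path: Path) -> str:
        with path.open("rb") as audio:
            return client.audio.transcriptions.create(model=model, file=audio).text

    return _transcribe
```

With the server from the previous step running, `transcribe_folder("recordings", make_local_transcriber())` would return a dictionary of file names to transcripts, where `recordings` is a placeholder directory name.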