X-ASR zh-en — OpenASR Models

Bilingual Chinese + English streaming speech recognition — a compact icefall Zipformer2 transducer

Multilingual · Apache-2.0

Desktop app

Open the Models screen and click install.

CLI

$ openasr pull xasr-zh-en:q8

Overview

X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 transducer (RNN-T). It combines a 6-stack, 19-layer Zipformer2 encoder, a stateless decoder, and a tanh joiner over a 5000-token BPE vocabulary, for roughly 0.16B parameters. The same checkpoint serves both low-latency streaming captions and full-file offline transcription, which makes it a good fit for on-device Chinese/English dictation and real-time subtitles.

This OpenASR repo repackages the weights as .oasr packs that run natively in the OpenASR runtime: no Python at inference time, and all decoding stays local. Three builds are available. q8_0 is the recommended default (in OpenASR's verification its transcript matched the fp16 transcript exactly), q4_k is the smallest build for memory-constrained devices, and fp16 is for maximum fidelity or for verification.

Highlights

🇨🇳🇬🇧 Chinese + English — one bilingual checkpoint for zh/en speech, including code-switched audio
⚡ Streaming-first, offline-capable — a cache-aware streaming Zipformer2 transducer for low-latency captions that also runs full-file offline transcription
🪶 Compact ~0.16B — a 6-stack Zipformer2 encoder + stateless RNN-T decoder + tanh joiner over a 5000-token BPE vocab, light enough for on-device CPU
🦀 Native in OpenASR — .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU

Pull string	Size	Quant	JFK ΔWER
`xasr-zh-en:fp16`	300.3 MB	fp16	0%
`xasr-zh-en:q8` (default)	167.6 MB	q8_0	0%
`xasr-zh-en:q4`	106.8 MB	q4_k	0%
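
The table can be read as a lookup keyed by size budget. The following is a minimal sketch of that choice, not part of OpenASR: `BUILDS` and `pick_build` are hypothetical names, the sizes are copied from the table, and pack size on disk is used as a rough stand-in for memory footprint, which is an assumption. It prefers the recommended q8 default, falls back to q4 when q8 does not fit, and considers fp16 only when maximum fidelity is requested.

```python
from typing import Optional

# Pack sizes in MB, copied from the table above: pull tag -> (quant, size).
BUILDS = {
    "fp16": ("fp16", 300.3),
    "q8": ("q8_0", 167.6),
    "q4": ("q4_k", 106.8),
}


def pick_build(budget_mb: float, max_fidelity: bool = False) -> Optional[str]:
    """Return a pull string whose pack fits the budget, or None if none fits.

    Hypothetical helper: the preference order encodes the recommendation in
    the overview (q8 by default, q4 for tight memory, fp16 on request).
    """
    order = ["fp16", "q8", "q4"] if max_fidelity else ["q8", "q4"]
    for tag in order:
        _quant, size_mb = BUILDS[tag]
        if size_mb <= budget_mb:
            return f"xasr-zh-en:{tag}"
    return None


if __name__ == "__main__":
    print(pick_build(200))                      # xasr-zh-en:q8
    print(pick_build(120))                      # xasr-zh-en:q4
    print(pick_build(512, max_fidelity=True))   # xasr-zh-en:fp16
```

The returned string can be passed to `openasr pull`. A budget below 106.8 MB yields `None`, since no listed build fits.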

Usage

These are CLI / local-server examples. The desktop app runs this model without typing a command — see the desktop install path above.

bash · transcribe a file

$ openasr pull xasr-zh-en:q8
↓ xasr-zh-en.oasr  167.6 MB  ✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr
✓ local transcript · 0 bytes sent
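
The `--model-pack` path above is long enough that scripts may want to build it from the pull string. The sketch below is a hypothetical helper (`pack_path` and `TAG_TO_QUANT` are illustrative names). It assumes every build follows the same `<root>/<model>/<quant>/<model>-<quant>.oasr` layout as the q8 example; only the q8 path is confirmed here, so check the fp16 and q4 paths on disk before relying on them.

```python
from pathlib import Path
from typing import Optional

# Pull tag -> quant name, as listed in the table of builds.
TAG_TO_QUANT = {"fp16": "fp16", "q8": "q8_0", "q4": "q4_k"}


def pack_path(pull_string: str, root: Optional[Path] = None) -> Path:
    """Build the expected .oasr path for a pull string such as 'xasr-zh-en:q8'.

    Assumption: the directory layout seen for q8 also holds for other builds.
    """
    if root is None:
        # Default models directory, taken from the q8 example path.
        root = Path.home() / ".openasr" / "models"
    model, _, tag = pull_string.partition(":")
    quant = TAG_TO_QUANT[tag]  # raises KeyError for an unknown tag
    return root / model / quant / f"{model}-{quant}.oasr"


if __name__ == "__main__":
    print(pack_path("xasr-zh-en:q8"))
```

Keeping the tag-to-quant mapping in one place avoids the easy mistake of passing the pull tag (`q8`) where the path uses the quant name (`q8_0`).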

bash · serve a local API

$ openasr serve --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=xasr-zh-en · 0 bytes will leave this host

python · client.py

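from openai import OpenAI

# Point the client at the local OpenASR server; "local" is a placeholder key.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")
# Use a context manager so the file handle is closed after the request.
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(model="xasr-zh-en", file=audio)
print(result.text)  # the transcript string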
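
For more than one file, the same request can be wrapped in a loop. This is a minimal sketch rather than an OpenASR API: `transcribe_all` is a hypothetical helper that accepts any client exposing `audio.transcriptions.create(model=..., file=...)`, as the `openai` package's `OpenAI` client does, and it assumes the response carries the transcript in a `.text` attribute.

```python
from typing import Dict, Iterable


def transcribe_all(client, paths: Iterable[str], model: str = "xasr-zh-en") -> Dict[str, str]:
    """Transcribe each file through an OpenAI-compatible client.

    Returns a dict mapping each input path to its transcript text.
    Files are sent one at a time and closed as soon as the request returns.
    """
    results: Dict[str, str] = {}
    for path in paths:
        with open(path, "rb") as audio:
            response = client.audio.transcriptions.create(model=model, file=audio)
        results[str(path)] = response.text
    return results
```

With the local server running, a call such as `transcribe_all(OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local"), ["a.wav", "b.wav"])` should return one transcript per file; the file names are placeholders.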
