Bilingual Chinese + English streaming speech recognition — a compact icefall Zipformer2 transducer
X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from
GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 RNN-T
transducer (a 6-stack, 19-layer Zipformer2 encoder, a stateless RNN-T decoder, and a tanh joiner
over a 5000-token BPE vocabulary, ~0.16B parameters). The same checkpoint serves both low-latency
streaming captions and full-file offline transcription, making it a good fit for on-device
Chinese/English dictation and real-time subtitles. This OpenASR repo repackages the weights as
.oasr packs that run natively in the OpenASR runtime — no Python at inference time, all decoding
local. The q8_0 build is the recommended default (it matched the fp16 transcript bit-for-bit in
OpenASR's verification); q4_k is the smallest build for tight-memory devices and fp16 is for
maximum fidelity or verification.
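The q8_0-versus-fp16 comparison mentioned above can be repeated on your own audio by transcribing one file with both packs and checking that the two transcripts are identical. The following is a minimal sketch, assuming both transcripts are already available as Python strings; `first_mismatch` is a hypothetical helper, not part of OpenASR, and it is not necessarily how OpenASR's own verification was carried out.

```python
from typing import Optional


def first_mismatch(reference: str, candidate: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical.

    Hypothetical helper for comparing, e.g., an fp16 transcript (reference)
    against a q8_0 transcript (candidate).
    """
    # Walk both strings in parallel and stop at the first difference.
    for index, (ref_char, cand_char) in enumerate(zip(reference, candidate)):
        if ref_char != cand_char:
            return index
    # No difference in the shared prefix: a length difference still counts.
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None
```

A return value of `None` means the two transcripts match exactly; any integer points at the first position worth inspecting.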
.oasr packs run with no Python at inference and are engineered for peak performance on CPU and GPU.
These are CLI / local-server examples. The desktop app runs this model without typing a command; see the desktop install path above.
$ openasr pull xasr-zh-en:q8
↓ xasr-zh-en.oasr 167.6 MB
✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr
✓ local transcript · 0 bytes sent
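The pull step above reports a verified SHA-256. If a pack is copied to another machine by hand, the same kind of check can be repeated independently. This is a minimal sketch using only the Python standard library; the expected digest must come from a source you trust (none is given here), and `sha256_of_file` and `matches_digest` are hypothetical helper names, not OpenASR functions.

```python
import hashlib


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Chunked reading matters here because a pack is on the order of hundreds of megabytes, so loading it whole just to hash it would be wasteful on a small device.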
$ openasr serve --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=xasr-zh-en · 0 bytes will leave this host
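When the server is launched from a script, a client may start before the port is open. One way to handle that, sketched here with the standard library only, is to poll the address until a TCP connection succeeds. `wait_for_port` is a hypothetical helper, and a successful connection only shows that something is listening on the port; it is an assumption, not a documented guarantee, that the model is ready to serve at that point.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the address used above, the call would be `wait_for_port("127.0.0.1", 8080)`.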
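from openai import OpenAI

# Point the OpenAI client at the local server; the API key is a placeholder.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")

# Close the audio file once the request has been sent.
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(model="xasr-zh-en", file=audio)
print(result.text)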
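The same client call extends naturally to several recordings. The sketch below wraps it in a hypothetical `transcribe_files` helper that accepts any client object exposing `audio.transcriptions.create`, such as the `OpenAI` client configured above. It assumes the local server returns the default response shape, in which the transcript is available as `.text`.

```python
from typing import Dict, Iterable


def transcribe_files(client, paths: Iterable[str], model: str = "xasr-zh-en") -> Dict[str, str]:
    """Transcribe each audio file in turn and return a {path: transcript} mapping.

    `client` is expected to behave like openai.OpenAI pointed at the local server.
    """
    transcripts: Dict[str, str] = {}
    for path in paths:
        # Open each file only for the duration of its own request.
        with open(path, "rb") as audio:
            response = client.audio.transcriptions.create(model=model, file=audio)
        transcripts[str(path)] = response.text
    return transcripts
```

Passing the client in as an argument keeps the helper independent of how the server address is configured, and makes it easy to exercise with a stub before pointing it at real audio.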