X-ASR zh-en

Bilingual Chinese + English streaming speech recognition — a compact icefall Zipformer2 transducer

Multilingual · Apache-2.0
Desktop app: Open the Models screen and click Install.
CLI
$ openasr pull xasr-zh-en:q8

Overview

X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 RNN-T transducer (a 6-stack, 19-layer Zipformer2 encoder, a stateless RNN-T decoder, and a tanh joiner over a 5000-token BPE vocabulary, ~0.16B parameters). The same checkpoint serves both low-latency streaming captions and full-file offline transcription, making it a good fit for on-device Chinese/English dictation and real-time subtitles. This OpenASR repo repackages the weights as .oasr packs that run natively in the OpenASR runtime — no Python at inference time, all decoding local. The q8_0 build is the recommended default (it matched the fp16 transcript bit-for-bit in OpenASR's verification); q4_k is the smallest build for tight-memory devices and fp16 is for maximum fidelity or verification.

Highlights

  • 🇨🇳🇬🇧 Chinese + English — one bilingual checkpoint for zh/en speech, including code-switched audio
  • Streaming-first, offline-capable — a cache-aware streaming Zipformer2 transducer for low-latency captions that also runs full-file offline transcription
  • 🪶 Compact ~0.16B — a 6-stack Zipformer2 encoder + stateless RNN-T decoder + tanh joiner over a 5000-token BPE vocab, light enough for on-device CPU
  • 🦀 Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU

Tags

Pull string              Size      Quant  JFK ΔWER
xasr-zh-en:fp16          300.3 MB  fp16   0%
xasr-zh-en:q8 (default)  167.6 MB  q8_0   0%
xasr-zh-en:q4            106.8 MB  q4_k   0%
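
As a rough illustration of choosing between these builds, the sketch below picks a pull string for a given size budget. It treats the pack size on disk as a proxy for the memory needed, which is an assumption; the helper name `pick_build` is hypothetical and not part of OpenASR.

```python
# Pack sizes in MB, copied from the Tags table.
BUILDS = [
    ("xasr-zh-en:fp16", 300.3),
    ("xasr-zh-en:q8", 167.6),
    ("xasr-zh-en:q4", 106.8),
]


def pick_build(budget_mb, prefer="xasr-zh-en:q8"):
    """Return a pull string that fits budget_mb, or None if nothing fits.

    The preferred build (q8, the recommended default) wins whenever it fits;
    otherwise the largest build that still fits is returned.
    """
    fitting = [(name, size) for name, size in BUILDS if size <= budget_mb]
    if not fitting:
        return None
    for name, _ in fitting:
        if name == prefer:
            return name
    # Fall back to the largest pack under the budget.
    return max(fitting, key=lambda build: build[1])[0]


if __name__ == "__main__":
    print(pick_build(200))  # q8 fits and is preferred
    print(pick_build(120))  # only q4 fits
```

Preferring q8 over fp16 even when both fit mirrors the recommendation in the Overview; pass `prefer="xasr-zh-en:fp16"` to favor maximum fidelity instead.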

Usage

These are CLI / local-server examples. The desktop app runs this model without typing a command — see the desktop install path above.

bash · transcribe a file
$ openasr pull xasr-zh-en:q8
↓ xasr-zh-en.oasr  167.6 MB  ✓ verified sha256
$ openasr transcribe meeting.wav --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr
✓ local transcript · 0 bytes sent
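
The same command can be driven from a script. The sketch below builds the argument list using only the flags shown above; the helper names are hypothetical, and it assumes (unverified) that the CLI prints the transcript to stdout. Because no shell is involved, the leading `~` in the pack path has to be expanded explicitly.

```python
import os
import subprocess

# Pack path as shown in the CLI example above.
MODEL_PACK = "~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr"


def transcribe_argv(audio_path, model_pack=MODEL_PACK):
    """Build the argv for `openasr transcribe` with the flags shown above."""
    return [
        "openasr", "transcribe", audio_path,
        "--backend", "native",
        # subprocess does not run a shell, so expand "~" ourselves.
        "--model-pack", os.path.expanduser(model_pack),
    ]


def transcribe_file(audio_path):
    """Run the CLI and return whatever it writes to stdout.

    Assumption: the transcript is written to stdout; adjust if the CLI
    writes to a file instead.
    """
    done = subprocess.run(
        transcribe_argv(audio_path),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return done.stdout
```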
bash · serve a local API
$ openasr serve --backend native --model-pack ~/.openasr/models/xasr-zh-en/q8_0/xasr-zh-en-q8_0.oasr --addr 127.0.0.1:8080
▶ http://127.0.0.1:8080 · model=xasr-zh-en · 0 bytes will leave this host
python · client.py
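from openai import OpenAI

# Point the OpenAI client at the local OpenASR server started above.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")
with open("meeting.wav", "rb") as audio:  # close the file once the upload is done
    result = client.audio.transcriptions.create(model="xasr-zh-en", file=audio)
print(result.text)  # the transcript is in the .text field of the response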
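
If installing the openai package is not an option, the same request can be sketched with the standard library alone. This assumes the local server accepts the multipart form that the OpenAI client sends to `/v1/audio/transcriptions` and answers with JSON containing a `text` field; the helper names `build_multipart` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://127.0.0.1:8080/v1", model="xasr-zh-en"):
    """POST an audio file to the local server and return the transcript text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": model}, "file", os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        base_url + "/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the response is JSON shaped like {"text": "..."}.
        return json.load(resp)["text"]


if __name__ == "__main__":
    print(transcribe("meeting.wav"))
```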
