OpenASR 0.1.0
The first public release: a local speech-to-text app for macOS over an Apache-2.0 open core. Here's what works today, what's still missing, and where to read the source.
Release notes, engineering deep-dives, and field reports on running speech recognition entirely on your own hardware.
The first public release: a local speech-to-text app for macOS over an Apache-2.0 open core. Here's what works today, what's still missing, and where to read the source.
VAD and diarization run as late-fusion sibling stages around an untouched ASR core, talking only through time intervals — the project's first pure-Rust forward passes, validated bit-close to their ONNX references.
The .oasr pack is pinned, the catalog is signed, and nothing is auto-downloaded — a walk through OpenASR's fail-closed pull chain and where its trust root actually lives.
A default-off OPENASR_SERVE_BATCH path batches concurrent GPU decode into one GEMM per step. Measured 4.28x at N=8 on one GPU with moonshine-tiny — and N=1 stays byte-identical to today's path.
A profile of one decode on one AMD GPU put the bottleneck on the host-to-GPU KV round-trip, not on compute. Making the KV cache device-resident cut the per-token KV/IO from 80.6 to 1.1 ms/token — scoped to that one GPU and model.
whisper.cpp: one host, one clip, an honest numberHow the bench-suite harness measures OpenASR against whisper.cpp on the same model — a committed 7.9% on one host, an honest spread, and a regression guard rather than a standing OSS win.
.oasr format: one file for weights, tokenizer, and metadataWhy a model is a single GGUF-backed .oasr file instead of a directory of loose tensors - and why that pack is pinned and catalog-signed, not signed in place.
Audio is transcribed where it's recorded. A walk through the offline-default, no-auto-download boundary — from explicit openasr pull, to a loopback-only API, to anonymous diarization.
localhostHow openasr serve mirrors a focused slice of the OpenAI transcription HTTP shape on loopback — so you switch a base URL, not your client — while staying honest that it is a subset, not the full API.
fp16, q8_0, q4_k, and Qwen's q3_k on a single clip: where compression is almost free, where a size win isn't a speed win, and why moonshine-tiny's 11.76% WER is the model, not the quantizer.