Blog — OpenASR

Release

OpenASR 0.1.0

The first public release: a local speech-to-text app for macOS over an Apache-2.0 open core. Here's what works today, what's still missing, and where to read the source.

Jul 4, 2026

Deep dive

Who-said-what, the dependency-light way: a pure-Rust late-fusion diarization substrate

VAD and diarization run as late-fusion sibling stages around an untouched ASR core, talking only through time intervals — the project's first pure-Rust forward passes, validated bit-close to their ONNX references.

Jun 6, 2026

Deep dive

Trusting a model you didn't compile: a signed catalog and a fail-closed pull chain

The .oasr pack is pinned, the catalog is signed, and nothing is auto-downloaded — a walk through OpenASR's fail-closed pull chain and where its trust root actually lives.

Jun 4, 2026

Engineering

Batched serve-mode decode: GPU-only throughput that leaves single-stream byte-identical

A default-off OPENASR_SERVE_BATCH path batches concurrent GPU decode into one GEMM per step. Measured 4.28x at N=8 on one GPU with moonshine-tiny — and N=1 stays byte-identical to today's path.

Jun 2, 2026

Deep dive

Moving the KV cache onto the GPU: turning a transfer-bound decoder compute-bound

A profile of one decode on one AMD GPU put the bottleneck on the host-to-GPU KV round-trip, not on compute. Making the KV cache device-resident cut the per-token KV/IO from 80.6 to 1.1 ms/token — scoped to that one GPU and model.

May 30, 2026

Engineering

Benchmarking OpenASR against `whisper.cpp`: one host, one clip, an honest number

How the bench-suite harness measures OpenASR against whisper.cpp on the same model — a committed 7.9% on one host, an honest spread, and a regression guard rather than a standing OSS win.

May 27, 2026

Deep dive

Inside the `.oasr` format: one file for weights, tokenizer, and metadata

Why a model is a single GGUF-backed .oasr file instead of a directory of loose tensors - and why that pack is pinned and catalog-signed, not signed in place.

May 9, 2026

Privacy

Local by default: privacy you can actually audit

Audio is transcribed where it's recorded. A walk through the offline-default, no-auto-download boundary — from explicit openasr pull, to a loopback-only API, to anonymous diarization.

Apr 30, 2026

Engineering

Drop-in compatible: matching the OpenAI transcription API on `localhost`

How openasr serve mirrors a focused slice of the OpenAI transcription HTTP shape on loopback — so you switch a base URL, not your client — while staying honest that it is a subset, not the full API.

Apr 4, 2026

Deep dive

What quantization actually trades: size, speed, and the long tail of quality

fp16, q8_0, q4_k, and Qwen's q3_k on a single clip: where compression is almost free, where a size win isn't a speed win, and why moonshine-tiny's 11.76% WER is the model, not the quantizer.

Mar 22, 2026