SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v45 + Numbers)
sottoasr.app · Full precision (bf16) · MLX 4-bit (smaller) · Training Dataset
Overview
MLX 5-bit affine quantization of juanquivilla/sotto-cleanup-lfm25-350m. The recommended variant for most Apple Silicon users — best size/quality trade-off.
This model powers on-device transcript cleanup in SottoASR, a local, privacy-first speech-to-text application for macOS. Running entirely on device with zero cloud dependency, it:
- removes filler words, corrects grammar, and formats punctuation
- handles false starts and self-corrections
- restructures long dictations into paragraph-formatted prose
- reliably preserves substantive content, even on long inputs
- new in v45: converts spoken-form numbers to digit form (inverse text normalization)
What's new in v45
v45 adds inverse text normalization (ITN): when users dictate compound spoken numbers like "talk about server three sixty," v45 reliably produces "Talk about server 360." Earlier versions (v36 and prior) either preserved the spoken form (looks unprofessional) or attempted the conversion incorrectly. v45 covers all common ITN categories — compound numbers, hundreds, four-digit years, times, decimals, percentages, currency, ordinals, dates — while continuing to preserve cardinals in idioms ("I'll be there in five" stays as written).
| Capability | v36 (preservation) | v45 (this model) |
|---|---|---|
| Number accuracy (171-sample stratified set) | 12.9% | 95.9% ⭐ |
| Filler-free rate | 96.9% | 97.0% |
| Substantive deletion >15% on long inputs† | 13.3% | 13.7% (~tied) |
| Word retention (median) | 0.884 | 0.922 |
† Measured on all 241 long inputs (>100 words) from data_v23_paragraphs/val.jsonl, a stricter evaluation than v36's published 0.64% (which was measured on a 350-sample mix). v45 inherits v36's deletion-aware behavior on the same eval.
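To make the two deletion-aware rows above concrete, here is a minimal sketch of how such metrics can be computed. The eval's actual filler list and retention formula are not published on this card, so both are illustrative assumptions:

```python
import string

# Illustrative sketch only: the eval's actual filler list and retention
# formula are not published on this card, so both are assumptions.
FILLERS = {"uh", "um", "er", "like", "basically", "so", "actually"}

def _words(s: str) -> list[str]:
    """Lowercase, split on whitespace, strip punctuation so 'pipeline.' matches 'pipeline'."""
    return [w.strip(string.punctuation) for w in s.lower().split()]

def word_retention(raw: str, cleaned: str) -> float:
    """Fraction of substantive (non-filler) input words that survive cleanup."""
    content = [w for w in _words(raw) if w not in FILLERS]
    kept = set(_words(cleaned))
    return sum(w in kept for w in content) / max(len(content), 1)

def deletion_exceeds(raw: str, cleaned: str, threshold: float = 0.15) -> bool:
    """Flags a transcript when more than `threshold` of substantive words were dropped."""
    return (1.0 - word_retention(raw, cleaned)) > threshold
```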
Key Specs
| Property | Value |
|---|---|
| Size | ~237 MB |
| Quantization | 5-bit affine, group_size=64 |
| Effective bits/weight | 5.502 |
| Architecture | Hybrid: 10 conv + 6 GQA attention (354M params) |
| Latency | ~85 ms average per transcript (M-series) |
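The effective bits/weight figure is consistent with the quantization recipe below: each group of 64 weights stores its own scale and bias alongside the 5-bit values. A back-of-the-envelope check, assuming an fp16 scale and bias per group (the exact per-tensor breakdown is not published here):

```python
# Effective bits per weight under 5-bit affine quantization with
# group_size=64, assuming an fp16 scale and an fp16 bias per group.
q_bits, group_size = 5, 64
group_overhead = (16 + 16) / group_size  # scale + bias bits amortized per weight
print(q_bits + group_overhead)           # 5.5, close to the reported 5.502
```

The small remainder above 5.5 presumably comes from tensors kept at higher precision; that attribution is an assumption, not something stated on this card.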
Quality at this quantization tracks the bf16 model closely. See the bf16 model card for full benchmark numbers, training pipeline, and reward shape.
Quantization Recipe
```bash
mlx_lm.convert \
  --hf-path juanquivilla/sotto-cleanup-lfm25-350m \
  --mlx-path sotto-cleanup-lfm25-350m-mlx-5bit \
  -q --q-bits 5 --q-group-size 64 \
  --trust-remote-code
```
Usage
Python (mlx_lm)
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the 5-bit quantized model and its tokenizer.
model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # greedy decoding

text = "talk about server three sixty"
prompt = f"### Input:\n{text}\n\n### Output:\n"  # the model's cleanup prompt template

output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
# Trim anything the model emits after the cleaned transcript.
if "###" in output:
    output = output[:output.index("###")].strip()
print(output)
# → "Talk about server 360."
```
For long dictation that may need paragraph formatting, raise max_tokens to 1024–2048.
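A sketch of that long-dictation path, reusing `model`, `tokenizer`, and `sampler` from the snippet above (the input string is a placeholder):

```python
# Long, multi-topic dictation; placeholder text for illustration.
long_text = "so um the first thing is the deployment pipeline ... and then on billing ..."
prompt = f"### Input:\n{long_text}\n\n### Output:\n"

# Larger token budget so paragraph-formatted output is not truncated.
output = generate(model, tokenizer, prompt=prompt, max_tokens=1024, sampler=sampler)
if "###" in output:
    output = output[:output.index("###")].strip()
```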
What It Does
| Input (raw ASR) | Output (cleaned) |
|---|---|
| so uh basically we need to fix the deployment pipeline | We need to fix the deployment pipeline. |
| talk about server three sixty | Talk about server 360. |
| schedule it for three fifteen pm | Schedule it for 3:15 PM. |
| we hit ninety eight percent uptime last month | We hit 98% uptime last month. |
| transfer fifty dollars to billing | Transfer $50 to billing. |
| i'll be there in five | I'll be there in five. |
| we run twenty four seven | We run 24/7. |
Paragraph emission on long dictations (inherited from v23)
Multi-topic input is restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the bf16 model card for a full example.
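Continuing from the Usage snippet, recovering those paragraphs downstream is a one-liner:

```python
# One entry per topic paragraph in the cleaned transcript.
paragraphs = output.split("\n\n")
```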
All Variants
| Variant | Size | Use Case |
|---|---|---|
| Full precision (bf16) | 676 MB | Training, GPU inference |
| MLX 5-bit (this) | ~237 MB | Recommended for Apple Silicon |
| MLX 4-bit | ~195 MB | Smallest, slight quality trade-off |
License
MIT
Links
- Application: sottoasr.app
- Source: github.com/juanqui/sottoasr
- Dataset: juanquivilla/sotto-transcript-cleanup
- Base model: LiquidAI/LFM2.5-350M-Base