v51: composite=88.68 — see model card for benchmark deltas vs v45
Files changed:
- README.md (+12 -65)
- model.safetensors (+1 -1)

README.md

@@ -12,46 +12,29 @@ tags:
 - LiquidAI
 - mlx
 - mlx-5bit
-- inverse-text-normalization
 pipeline_tag: text-generation
-datasets:
-- juanquivilla/sotto-transcript-cleanup
 ---

-# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v45)
+# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

 [sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit)

 ## Overview

-| Capability | v36 (preservation) | **v45 (this model)** |
-|---|---:|---:|
-| Number accuracy
-| Substantive-deletion >15% on long inputs† | 13.3 % | 13.7 % (~tied) |
-| Word retention median | 0.884 | 0.922 |
-
-† Measured on all 241 long inputs (>100 words) from `data_v23_paragraphs/val.jsonl` — a stricter metric than v36's published 0.64 % (which was on a 350-sample mix). v45 inherits v36's deletion-aware behavior on the same eval.
-
-## Key Specs
-
-| Property | Value |
-|----------|-------|
-| **Size** | **~237 MB** |
-| **Quantization** | 5-bit affine, group_size=64 |
-| **Effective bits/weight** | 5.502 |
-| **Architecture** | Hybrid: 10 conv + 6 GQA attention (354M params) |
-| **Latency** | ~85 ms average per transcript (M-series) |
+MLX 5-bit affine quantization of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). Recommended for Apple Silicon — best size/quality trade-off.
+
+## What's new in v51
+
+v51 extends v45 with targeted training data for five failure modes (multi-number sentences,
+year-context drift, disconnected number lists, within-input duplicates, long-form preservation),
+each generated programmatically and audited with a Qwen3.6-27B judge.
+
+| Metric | v45 | **v51** |
+|---|---:|---:|
+| Number accuracy | 95.9% | **95.3%** |
+| Adversarial benchmark (greedy) | 76% | **86%** |
+
+See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for the full pipeline and benchmark numbers.

 ## Quantization Recipe

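The recipe itself sits in the unchanged README lines between these two hunks and is not shown in this view. As a rough sketch only: a plain 5-bit affine conversion with group size 64 (the settings listed in the removed Key Specs table) would look roughly like this in Python. The output path is illustrative, and the keyword names assume `mlx_lm`'s `convert` API rather than anything stated in this commit.

```python
from mlx_lm import convert

# 5-bit affine quantization with group_size=64, per the removed Key Specs row.
# The mlx_path is illustrative; the actual recipe lives in README lines this
# diff does not display and may differ.
convert(
    "juanquivilla/sotto-cleanup-lfm25-350m",
    mlx_path="sotto-cleanup-lfm25-350m-mlx-5bit",
    quantize=True,
    q_bits=5,
    q_group_size=64,
)
```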
@@ -65,57 +48,21 @@ mlx_lm.convert \

 ## Usage

-### Python (mlx_lm)
-
 ```python
 from mlx_lm import load, generate
 from mlx_lm.sample_utils import make_sampler

 model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
 sampler = make_sampler(temp=0.0)

 text = "talk about server three sixty"
 prompt = f"### Input:\n{text}\n\n### Output:\n"
 output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
 if "###" in output:
     output = output[:output.index("###")].strip()
 print(output)
-# → "Talk about server 360."
 ```

-For long dictation that may need paragraph formatting, raise `max_tokens` to 1024–2048.
-
-## What It Does
-
-| Input (raw ASR) | Output (cleaned) |
-|-----------------|------------------|
-| so uh basically we need to fix the deployment pipeline | We need to fix the deployment pipeline. |
-| talk about server three sixty | Talk about server 360. |
-| schedule it for three fifteen pm | Schedule it for 3:15 PM. |
-| we hit ninety eight percent uptime last month | We hit 98 % uptime last month. |
-| transfer fifty dollars to billing | Transfer $50 to billing. |
-| i'll be there in five | I'll be there in five. |
-| we run twenty four seven | We run 24/7. |
-
-### Paragraph emission on long dictations (inherited from v23)
-
-Multi-topic input is restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for a full example.
-
-## All Variants
-
-| Variant | Size | Use Case |
-|---------|------|----------|
-| [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) | 676 MB | Training, GPU inference |
-| **[MLX 5-bit (this)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** | ~237 MB | **Recommended for Apple Silicon** |
-| [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) | ~195 MB | Smallest, slight quality trade-off |
-
 ## License

 MIT
-
-## Links
-
-- **Application:** [sottoasr.app](https://sottoasr.app)
-- **Source:** [github.com/juanqui/sottoasr](https://github.com/juanqui/sottoasr)
-- **Dataset:** [juanquivilla/sotto-transcript-cleanup](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)

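This diff drops the long-dictation note (raise `max_tokens` to 1024–2048) and the paragraph-emission subsection from the card. For reference, a minimal sketch of that removed guidance applied to the card's own Usage snippet; the `clean_long_dictation` helper and the sample transcript are illustrative, and they assume the v45-era long-dictation behavior still holds for v51.

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # greedy decoding, matching the card's Usage example

def clean_long_dictation(text: str, max_tokens: int = 2048) -> list[str]:
    """Clean a long transcript with the card's prompt format and return its paragraphs."""
    prompt = f"### Input:\n{text}\n\n### Output:\n"
    output = generate(model, tokenizer, prompt=prompt,
                      max_tokens=max_tokens, sampler=sampler)
    if "###" in output:
        # Trim anything generated past the expected output block.
        output = output[:output.index("###")].strip()
    # Per the removed note, multi-topic dictations are emitted with blank-line breaks.
    return [p.strip() for p in output.split("\n\n") if p.strip()]

paragraphs = clean_long_dictation(
    "so um first the deployment pipeline keeps failing on step three we should "
    "pin the runner image and then separately marketing wants the q three numbers by friday"
)
print("\n\n".join(paragraphs))
```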
model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30
 size 243830312
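The weights are replaced in place: the file size stays 243830312 bytes and only the LFS digest changes. A minimal sketch for checking that a local download matches this commit's pointer, assuming `huggingface_hub` is available (it is not referenced anywhere in this commit):

```python
import hashlib
from huggingface_hub import hf_hub_download

# Digest from the updated LFS pointer above.
EXPECTED_SHA256 = "026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30"

path = hf_hub_download(
    repo_id="juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit",
    filename="model.safetensors",
)

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

print(h.hexdigest() == EXPECTED_SHA256)  # True if the download matches this commit
```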