juanquivilla committed on
Commit 5f8adb5 · verified · 1 Parent(s): a9fa15c

v51: composite=88.68 — see model card for benchmark deltas vs v45

Files changed (2)
  1. README.md +12 -65
  2. model.safetensors +1 -1
README.md CHANGED
@@ -12,46 +12,29 @@ tags:
  - LiquidAI
  - mlx
  - mlx-5bit
- - inverse-text-normalization
  pipeline_tag: text-generation
- datasets:
- - juanquivilla/sotto-transcript-cleanup
  ---

- # SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v45 + Numbers)
+ # SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

- [sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) · [Training Dataset](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)
+ [sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit)

  ## Overview

- **MLX 5-bit affine quantization** of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). The recommended variant for most Apple Silicon users — best size/quality trade-off.
+ MLX 5-bit affine quantization of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). Recommended for Apple Silicon — best size/quality trade-off.

- This model powers on-device transcript cleanup in [SottoASR](https://sottoasr.app) — a local, privacy-first speech-to-text application for macOS. It removes filler words, corrects grammar, formats punctuation, handles false starts and self-corrections, restructures long dictations into paragraph-formatted prose, preserves substantive content reliably even on long inputs, and **— new in v45 — converts spoken-form numbers to digit form correctly** (inverse text normalization), all locally with zero cloud dependency.

- ## What's new in v45
+ ## What's new in v51

- v45 adds **inverse text normalization (ITN)**: when users dictate compound spoken numbers like "talk about server three sixty," v45 reliably produces "Talk about server 360." Earlier versions (v36 and prior) either preserved the spoken form (looks unprofessional) or attempted the conversion incorrectly. v45 covers all common ITN categories — compound numbers, hundreds, four-digit years, times, decimals, percentages, currency, ordinals, dates — while continuing to preserve cardinals in idioms ("I'll be there in five" stays as written).
+ v51 extends v45 with targeted training data for five failure modes (multi-number sentences,
+ year-context drift, disconnected number lists, within-input duplicates, long-form preservation),
+ each generated programmatically and audited with a Qwen3.6-27B judge.

- | Capability | v36 (preservation) | **v45 (this model)** |
+ | Metric | v45 | **v51** |
  |---|---:|---:|
- | Number accuracy (171-sample stratified set) | 12.9 % | **95.9 %** |
- | Filler-Free rate | 96.9 % | **97.0 %** |
- | Substantive-deletion >15% on long inputs† | 13.3 % | 13.7 % (~tied) |
- | Word retention median | 0.884 | 0.922 |
+ | Number accuracy | 95.9% | **95.3%** |
+ | Adversarial benchmark (greedy) | 76% | **86%** |
-
- † Measured on all 241 long inputs (>100 words) from `data_v23_paragraphs/val.jsonl` — a stricter metric than v36's published 0.64 % (which was on a 350-sample mix). v45 inherits v36's deletion-aware behavior on the same eval.
-
- ## Key Specs
-
- | Property | Value |
- |----------|-------|
- | **Size** | **~237 MB** |
- | **Quantization** | 5-bit affine, group_size=64 |
- | **Effective bits/weight** | 5.502 |
- | **Architecture** | Hybrid: 10 conv + 6 GQA attention (354M params) |
- | **Latency** | ~85 ms average per transcript (M-series) |

- Quality at this quantization tracks the bf16 model closely. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for full benchmark numbers, training pipeline, and reward shape.
+ See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for the full pipeline and benchmark numbers.

  ## Quantization Recipe
 
@@ -65,57 +48,21 @@ mlx_lm.convert \

  ## Usage

- ### Python (mlx_lm)
-
  ```python
  from mlx_lm import load, generate
  from mlx_lm.sample_utils import make_sampler

  model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
- sampler = make_sampler(temp=0.0) # greedy
+ sampler = make_sampler(temp=0.0)

  text = "talk about server three sixty"
  prompt = f"### Input:\n{text}\n\n### Output:\n"
-
  output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
  if "###" in output:
      output = output[:output.index("###")].strip()
  print(output)
- # → "Talk about server 360."
  ```

- For long dictation that may need paragraph formatting, raise `max_tokens` to 1024–2048.
-
- ## What It Does
-
- | Input (raw ASR) | Output (cleaned) |
- |-----------------|------------------|
- | so uh basically we need to fix the deployment pipeline | We need to fix the deployment pipeline. |
- | talk about server three sixty | Talk about server 360. |
- | schedule it for three fifteen pm | Schedule it for 3:15 PM. |
- | we hit ninety eight percent uptime last month | We hit 98 % uptime last month. |
- | transfer fifty dollars to billing | Transfer $50 to billing. |
- | i'll be there in five | I'll be there in five. |
- | we run twenty four seven | We run 24/7. |
-
- ### Paragraph emission on long dictations (inherited from v23)
-
- Multi-topic input is restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for a full example.
-
- ## All Variants
-
- | Variant | Size | Use Case |
- |---------|------|----------|
- | [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) | 676 MB | Training, GPU inference |
- | **[MLX 5-bit (this)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** | ~237 MB | **Recommended for Apple Silicon** |
- | [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) | ~195 MB | Smallest, slight quality trade-off |
-
  ## License

  MIT
-
- ## Links
-
- - **Application:** [sottoasr.app](https://sottoasr.app)
- - **Source:** [github.com/juanqui/sottoasr](https://github.com/juanqui/sottoasr)
- - **Dataset:** [juanquivilla/sotto-transcript-cleanup](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)
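The second hunk's context shows the Quantization Recipe section still opens with an `mlx_lm.convert` invocation, but the full command sits outside the diff. As a minimal sketch, the 5-bit, group_size=64 settings from the removed Key Specs table can be reproduced through mlx_lm's Python API; the argument names follow recent mlx_lm releases and may differ in older versions:

```python
from mlx_lm import convert

# Sketch under stated assumptions: reproduces the card's 5-bit affine,
# group_size=64 recipe via mlx_lm's Python API. Argument names follow
# recent mlx_lm releases and may differ in older versions.
convert(
    hf_path="juanquivilla/sotto-cleanup-lfm25-350m",  # bf16 source repo
    mlx_path="sotto-cleanup-lfm25-350m-mlx-5bit",     # local output directory
    quantize=True,
    q_bits=5,          # 5-bit affine quantization
    q_group_size=64,   # group size per the quantization recipe
)
```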
 
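The metric table in the first hunk reports number accuracy and an adversarial benchmark under greedy decoding; the harness itself is not part of this commit. A minimal sketch of how an exact-match number-accuracy score could be computed against a JSONL eval set (the file name and field names are hypothetical, and the real metric may differ from plain exact match):

```python
import json
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # greedy, matching the "(greedy)" table setting

def clean(text: str) -> str:
    # Same prompt format and stop-marker handling as the Usage example above.
    prompt = f"### Input:\n{text}\n\n### Output:\n"
    out = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
    return out.split("###")[0].strip()

# Hypothetical eval file: one {"input": ..., "expected": ...} object per line.
hits = total = 0
with open("number_eval.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        hits += int(clean(ex["input"]) == ex["expected"])  # exact match
        total += 1
print(f"number accuracy: {hits / total:.1%} on {total} samples")
```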
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07ae4ba3889eaf4e3d959b8f9f8ad5ac6f09a03a5bb9b2a20ec3165beaf91803
+ oid sha256:026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30
  size 243830312
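The model.safetensors change is a Git LFS pointer update: `oid` is the SHA-256 digest of the weights blob and `size` its byte length, unchanged at 243,830,312 bytes, consistent with the same tensor layout and only the weight values changing. A quick integrity check of a local download against the new pointer (the local path is an assumption):

```python
import hashlib
from pathlib import Path

# Hypothetical local path to the downloaded v51 weights; point it at your copy.
path = Path("model.safetensors")

# Stream the file in 1 MiB chunks so hashing does not load it all into memory.
digest = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

# Expected values copied from the new LFS pointer in this commit.
assert path.stat().st_size == 243830312, "size mismatch vs. LFS pointer"
assert digest.hexdigest() == "026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30"
print("model.safetensors matches the v51 LFS pointer")
```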