v51: composite=88.68 — see model card for benchmark deltas vs v45
Files changed:
- README.md (+12 -65)
- model.safetensors (+1 -1)

README.md

@@ -12,46 +12,29 @@ tags:
 - LiquidAI
 - mlx
 - mlx-5bit
-- inverse-text-normalization
 pipeline_tag: text-generation
-datasets:
-- juanquivilla/sotto-transcript-cleanup
 ---

-# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v45)
+# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

 [sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit)

 ## Overview

-| Capability | v36 (preservation) | **v45 (this model)** |
-|---|---:|---:|
-| Number accuracy
-| Substantive-deletion >15% on long inputs† | 13.3 % | 13.7 % (~tied) |
-| Word retention median | 0.884 | 0.922 |
-
-† Measured on all 241 long inputs (>100 words) from `data_v23_paragraphs/val.jsonl` — a stricter metric than v36's published 0.64 % (which was on a 350-sample mix). v45 inherits v36's deletion-aware behavior on the same eval.
-
-## Key Specs
-
-| Property | Value |
-|----------|-------|
-| **Size** | **~237 MB** |
-| **Quantization** | 5-bit affine, group_size=64 |
-| **Effective bits/weight** | 5.502 |
-| **Architecture** | Hybrid: 10 conv + 6 GQA attention (354M params) |
-| **Latency** | ~85 ms average per transcript (M-series) |
+MLX 5-bit affine quantization of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). Recommended for Apple Silicon — best size/quality trade-off.
+
+## What's new in v51
+
+v51 extends v45 with targeted training data for five failure modes (multi-number sentences,
+year-context drift, disconnected number lists, within-input duplicates, long-form preservation),
+each generated programmatically and audited with a Qwen3.6-27B judge.
+
+| Metric | v45 | **v51** |
+|---|---:|---:|
+| Number accuracy | 95.9% | **95.3%** |
+| Adversarial benchmark (greedy) | 76% | **86%** |
+
+See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for the full pipeline and benchmark numbers.

 ## Quantization Recipe

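The recipe itself sits in the unchanged README lines between these two hunks and is not shown in this view. As a rough sketch only: a plain 5-bit affine conversion with group size 64 (the settings listed in the removed Key Specs table) would look roughly like this in Python. The output path is illustrative, and the keyword names assume `mlx_lm`'s `convert` API rather than anything stated in this commit.

```python
from mlx_lm import convert

# 5-bit affine quantization with group_size=64, per the removed Key Specs row.
# The mlx_path is illustrative; the actual recipe lives in README lines this
# diff does not display and may differ.
convert(
    "juanquivilla/sotto-cleanup-lfm25-350m",
    mlx_path="sotto-cleanup-lfm25-350m-mlx-5bit",
    quantize=True,
    q_bits=5,
    q_group_size=64,
)
```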
@@ -65,57 +48,21 @@ mlx_lm.convert \

 ## Usage

-### Python (mlx_lm)
-
 ```python
 from mlx_lm import load, generate
 from mlx_lm.sample_utils import make_sampler

 model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
 sampler = make_sampler(temp=0.0)

 text = "talk about server three sixty"
 prompt = f"### Input:\n{text}\n\n### Output:\n"
 output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
 if "###" in output:
     output = output[:output.index("###")].strip()
 print(output)
-# → "Talk about server 360."
 ```

-For long dictation that may need paragraph formatting, raise `max_tokens` to 1024–2048.
-
-## What It Does
-
-| Input (raw ASR) | Output (cleaned) |
-|-----------------|------------------|
-| so uh basically we need to fix the deployment pipeline | We need to fix the deployment pipeline. |
-| talk about server three sixty | Talk about server 360. |
-| schedule it for three fifteen pm | Schedule it for 3:15 PM. |
-| we hit ninety eight percent uptime last month | We hit 98 % uptime last month. |
-| transfer fifty dollars to billing | Transfer $50 to billing. |
-| i'll be there in five | I'll be there in five. |
-| we run twenty four seven | We run 24/7. |
-
-### Paragraph emission on long dictations (inherited from v23)
-
-Multi-topic input is restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for a full example.
-
-## All Variants
-
-| Variant | Size | Use Case |
-|---------|------|----------|
-| [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) | 676 MB | Training, GPU inference |
-| **[MLX 5-bit (this)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** | ~237 MB | **Recommended for Apple Silicon** |
-| [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) | ~195 MB | Smallest, slight quality trade-off |
-
 ## License

 MIT
-
-## Links
-
-- **Application:** [sottoasr.app](https://sottoasr.app)
-- **Source:** [github.com/juanqui/sottoasr](https://github.com/juanqui/sottoasr)
-- **Dataset:** [juanquivilla/sotto-transcript-cleanup](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)

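This diff drops the long-dictation note (raise `max_tokens` to 1024–2048) and the paragraph-emission subsection from the card. For reference, a minimal sketch of that removed guidance applied to the card's own Usage snippet; the `clean_long_dictation` helper and the sample transcript are illustrative, and they assume the v45-era long-dictation behavior still holds for v51.

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # greedy decoding, matching the card's Usage example

def clean_long_dictation(text: str, max_tokens: int = 2048) -> list[str]:
    """Clean a long transcript with the card's prompt format and return its paragraphs."""
    prompt = f"### Input:\n{text}\n\n### Output:\n"
    output = generate(model, tokenizer, prompt=prompt,
                      max_tokens=max_tokens, sampler=sampler)
    if "###" in output:
        # Trim anything generated past the expected output block.
        output = output[:output.index("###")].strip()
    # Per the removed note, multi-topic dictations are emitted with blank-line breaks.
    return [p.strip() for p in output.split("\n\n") if p.strip()]

paragraphs = clean_long_dictation(
    "so um first the deployment pipeline keeps failing on step three we should "
    "pin the runner image and then separately marketing wants the q three numbers by friday"
)
print("\n\n".join(paragraphs))
```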
model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30
 size 243830312
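The weights are replaced in place: the file size stays 243830312 bytes and only the LFS digest changes. A minimal sketch for checking that a local download matches this commit's pointer, assuming `huggingface_hub` is available (it is not referenced anywhere in this commit):

```python
import hashlib
from huggingface_hub import hf_hub_download

# Digest from the updated LFS pointer above.
EXPECTED_SHA256 = "026d3fc8bc59e435528aa1124b581c17e66582064ee65a6c03045165d437dc30"

path = hf_hub_download(
    repo_id="juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit",
    filename="model.safetensors",
)

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

print(h.hexdigest() == EXPECTED_SHA256)  # True if the download matches this commit
```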