v36: full-FT GRPO with substantive-deletion-aware reward — filler-free 96.9%, sub-del-15-long 0.64%

Files changed:
- README.md (+16 −21)
- config.json (+3 −2)
- generation_config.json (+1 −1)
- model.safetensors (+2 −2)
- tokenizer_config.json (+1 −0)
README.md CHANGED

@@ -17,39 +17,38 @@ datasets:
   - juanquivilla/sotto-transcript-cleanup
 ---
 
-# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (
+# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v36 + Preservation)
 
 [sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) · [Training Dataset](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)
 
 ## Overview
 
-**MLX 5-bit affine quantization** of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m)
+**MLX 5-bit affine quantization** of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). The recommended variant for most Apple Silicon users — best size/quality trade-off.
 
-This model powers on-device transcript cleanup in [SottoASR](https://sottoasr.app) — a local, privacy-first speech-to-text application for macOS. It removes filler words, corrects grammar, formats punctuation, handles false starts and self-corrections, and **— new in
+This model powers on-device transcript cleanup in [SottoASR](https://sottoasr.app) — a local, privacy-first speech-to-text application for macOS. It removes filler words, corrects grammar, formats punctuation, handles false starts and self-corrections, restructures long dictations into paragraph-formatted prose, and **— new in v36 — preserves substantive content reliably even on long inputs**, all locally with zero cloud dependency.
 
-## What's new in
+## What's new in v36
 
-
+v36 fixes the **aggressive-edits failure mode** that earlier checkpoints occasionally exhibited: on long inputs, the model would sometimes delete substantive content along with the fillers. v36 is a GRPO **full fine-tune** (all 354M params trainable, no LoRA) with a substantive-deletion-aware reward. Result: the incidence of heavy substantive deletion on long inputs drops from **3.85% → 0.64%**, while the filler-free rate climbs from **50.9% → 96.9%**.
 
-| Capability |
+| Capability | v23 baseline | **v36 (this model)** |
 |---|---|---|
-|
-|
-| ROUGE-L on
-| **Filler-Free rate on standard val set** | 90.3 % | **91.0 %** ⭐ |
+| Filler-Free rate | 50.9 % | **96.9 %** ⭐ |
+| Substantive-deletion >15% on long inputs | 3.85 % | **0.64 %** ⭐ |
+| ROUGE-L F1 on long inputs (>100 words) | 0.9242 | **0.9425** |
 
 ## Key Specs
 
 | Property | Value |
 |----------|-------|
-| **Size** | **237 MB** |
+| **Size** | **~237 MB** |
 | **Quantization** | 5-bit affine, group_size=64 |
 | **Effective bits/weight** | 5.502 |
-| **ROUGE-L (val set)** | ~0.9505 (≈ bf16) |
-| **Paragraph rate (long inputs)** | ~89.5 % |
 | **Architecture** | Hybrid: 10 conv + 6 GQA attention (354M params) |
 | **Latency** | ~85 ms average per transcript (M-series) |
 
+Quality at this quantization tracks the bf16 model closely. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for full benchmark numbers.
+
 ## Quantization Recipe
 
 ```bash

@@ -93,21 +92,17 @@ For long dictation that may need paragraph formatting, raise `max_tokens` to 102
 | okay so the thing is basically we're running out of disk space | We're running out of disk space. |
 | uh yes | Yes. |
 
-###
-
-Multi-topic input is now restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for a full example.
-
-## Benchmark Results
-
+### Paragraph emission on long dictations (inherited from v23)
+
+Multi-topic input is restructured into paragraphed prose with `\n\n` breaks at natural topic boundaries. See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for a full example.
 
 ## All Variants
 
 | Variant | Size | Use Case |
 |---------|------|----------|
 | [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) | 676 MB | Training, GPU inference |
-| **[MLX 5-bit (this)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** |
-| [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) | 195 MB | Smallest, slight quality trade-off |
+| **[MLX 5-bit (this)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** | ~237 MB | **Recommended for Apple Silicon** |
+| [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) | ~195 MB | Smallest, slight quality trade-off |
 
 ## License
 
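The Key Specs table's "effective bits/weight" of 5.502 follows from the stated quantization parameters: with group-wise affine quantization, each group of 64 weights carries one scale and one bias on top of the 5-bit codes. A back-of-envelope sketch (assuming 16-bit scales and biases; the small excess over 5.5 in the card's figure would come from unquantized tensors and metadata):

```python
# Back-of-envelope: effective storage cost per weight for group-wise
# affine quantization. Assumes one fp16 scale and one fp16 bias per group.
def effective_bits(bits=5, group_size=64, scale_bits=16, bias_bits=16):
    return bits + (scale_bits + bias_bits) / group_size

print(effective_bits())  # 5.5 — the card reports 5.502 after metadata overhead
```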
config.json CHANGED

@@ -21,6 +21,7 @@
   "eos_token_id": [
     7
   ],
+  "full_attn_idxs": null,
   "hidden_size": 1024,
   "initializer_range": 0.02,
   "intermediate_size": 6656,

@@ -64,9 +65,9 @@
     "rope_theta": 1000000.0,
     "rope_type": "default"
   },
-  "
+  "rope_theta": 1000000.0,
   "tie_word_embeddings": true,
-  "transformers_version": "5.
+  "transformers_version": "5.6.2",
   "use_cache": false,
   "use_pos_enc": true,
   "vocab_size": 65536
generation_config.json CHANGED

@@ -5,5 +5,5 @@
     7
   ],
   "pad_token_id": 0,
-  "transformers_version": "5.
+  "transformers_version": "5.6.2"
 }
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:8fdff7017e6929cb5d3bb90e3136da4be6bfa0897109286532486dc92d57ab33
+size 243830312
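The byte count in the new LFS pointer squares with the model card's numbers; a quick sanity check (354M params and 5.502 effective bits/weight are taken from the README, the 1% tolerance is arbitrary):

```python
# Cross-check: file size implied by 354M params at 5.502 effective
# bits/weight vs. the size recorded in the LFS pointer.
params = 354_000_000
effective_bits = 5.502
predicted_bytes = params * effective_bits / 8
lfs_bytes = 243_830_312  # from the pointer file

print(round(predicted_bytes / 1e6, 1))  # ≈ 243.5 MB
assert abs(predicted_bytes - lfs_bytes) / lfs_bytes < 0.01  # agree within 1%
```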
tokenizer_config.json CHANGED

@@ -6,6 +6,7 @@
   "extra_special_tokens": [],
   "is_local": true,
   "legacy": false,
+  "local_files_only": false,
   "model_input_names": [
     "input_ids",
     "attention_mask"