---
license: mit
language:
- en
base_model: juanquivilla/sotto-cleanup-lfm25-350m
tags:
- speech-to-text
- transcript-cleanup
- text-correction
- asr-post-processing
- LFM
- LiquidAI
- mlx
- mlx-5bit
pipeline_tag: text-generation
---

# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

[sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit)

## Overview

MLX 5-bit affine quantization of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). Recommended for Apple Silicon — best size/quality trade-off.
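Running the model requires the `mlx-lm` package (a minimal setup sketch; assumes a recent Python on an Apple Silicon Mac):

```shell
pip install mlx-lm
```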

## What's new in v51

v51 extends v45 with targeted training data for five failure modes (multi-number sentences,
year-context drift, disconnected number lists, within-input duplicates, long-form preservation),
each generated programmatically and audited with a Qwen3.6-27B judge.

| Metric | v45 | **v51** |
|---|---:|---:|
| Number accuracy | 95.9% | **95.3%** |
| Adversarial benchmark (greedy) | 76% | **86%** |

See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for the full pipeline and benchmark numbers.

## Quantization Recipe

```bash
mlx_lm.convert \
  --hf-path juanquivilla/sotto-cleanup-lfm25-350m \
  --mlx-path sotto-cleanup-lfm25-350m-mlx-5bit \
  -q --q-bits 5 --q-group-size 64 \
  --trust-remote-code
```

## Usage

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # temp=0.0 gives greedy, deterministic cleanup

text = "talk about server three sixty"
prompt = f"### Input:\n{text}\n\n### Output:\n"
output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)

# Trim anything after a stray section marker the model may emit
if "###" in output:
    output = output[:output.index("###")].strip()
print(output)
```
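The prompt construction and output trimming above can be factored into two small pure functions, which is convenient if you clean many transcripts in a loop (a sketch; `build_prompt` and `trim_output` are hypothetical helper names, not part of this repo):

```python
PROMPT_TEMPLATE = "### Input:\n{text}\n\n### Output:\n"

def build_prompt(text: str) -> str:
    """Wrap a raw transcript in the model's instruction format."""
    return PROMPT_TEMPLATE.format(text=text)

def trim_output(raw: str) -> str:
    """Cut the generation at the next '###' section marker, if any."""
    marker = raw.find("###")
    return (raw[:marker] if marker != -1 else raw).strip()
```

`trim_output` is a no-op on generations that already stop cleanly, so it is safe to apply unconditionally.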

## License

MIT