---
license: mit
language:
  - en
base_model: juanquivilla/sotto-cleanup-lfm25-350m
tags:
  - speech-to-text
  - transcript-cleanup
  - text-correction
  - asr-post-processing
  - LFM
  - LiquidAI
  - mlx
  - mlx-5bit
pipeline_tag: text-generation
---

# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

sottoasr.app · Full precision (bf16) · MLX 4-bit (smaller)

## Overview

An MLX 5-bit affine quantization of juanquivilla/sotto-cleanup-lfm25-350m. Recommended for Apple Silicon: of the published variants, it offers the best size/quality trade-off.

## What's new in v51

v51 extends v45 with targeted training data for five failure modes (multi-number sentences, year-context drift, disconnected number lists, within-input duplicates, long-form preservation), each generated programmatically and audited with a Qwen3.6-27B judge.
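The programmatic generation step might look like the following minimal sketch for one failure mode (multi-number sentences). The dictionary, templates, and function name here are illustrative assumptions, not the authors' actual pipeline:

```python
import random

# Hypothetical generator: pair spoken-number forms with their expected
# numeric cleanups, then compose sentences containing several numbers.
SPOKEN = {
    "three sixty": "360",
    "twenty twenty four": "2024",
    "one oh one": "101",
}

def make_multi_number_example(rng: random.Random, n: int = 2):
    """Build one (noisy_input, expected_output) training pair with n numbers."""
    picks = rng.sample(list(SPOKEN.items()), n)
    noisy = " and ".join(f"server {spoken}" for spoken, _ in picks)
    clean = " and ".join(f"server {digits}" for _, digits in picks)
    return noisy, clean

pair = make_multi_number_example(random.Random(0))
print(pair)
```

Each generated pair would then be audited by the judge model before entering the training set.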

| Metric | v45 | v51 |
|---|---|---|
| Number accuracy | 95.9% | 95.3% |
| Adversarial benchmark (greedy) | 76% | 86% |

See the bf16 model card for the full pipeline and benchmark numbers.

## Quantization Recipe

```bash
mlx_lm.convert \
  --hf-path juanquivilla/sotto-cleanup-lfm25-350m \
  --mlx-path sotto-cleanup-lfm25-350m-mlx-5bit \
  -q --q-bits 5 --q-group-size 64 \
  --trust-remote-code
```
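As a back-of-envelope check on the resulting file size, assuming the affine scheme stores one fp16 scale and one fp16 bias per 64-weight group:

```python
# Rough size estimate for 5-bit, group-size-64 affine quantization.
# Assumption: each 64-weight group carries one fp16 scale + one fp16 bias.
q_bits = 5
group_size = 64
overhead_bits = (16 + 16) / group_size   # scale + bias amortized per weight
effective_bits = q_bits + overhead_bits  # 5.5 bits per weight

params = 350_000_000
quantized_mb = params * effective_bits / 8 / 1e6
bf16_mb = params * 16 / 8 / 1e6
print(f"~{quantized_mb:.0f} MB quantized vs ~{bf16_mb:.0f} MB bf16")
```

This is an estimate only; the actual file size depends on which layers MLX leaves unquantized (e.g. embeddings) and on metadata overhead.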

## Usage

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic cleanup

text = "talk about server three sixty"
prompt = f"### Input:\n{text}\n\n### Output:\n"
output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
# Trim at the first "###" in case generation runs past the output section.
if "###" in output:
    output = output[:output.index("###")].strip()
print(output)
```
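When cleaning many transcripts, the inline stop-marker trimming above can be factored into a small helper (the function name is ours, not part of the model's API):

```python
def extract_output(completion: str) -> str:
    """Trim a raw completion at the first '###' stop marker and strip
    surrounding whitespace, mirroring the inline trimming above."""
    marker = completion.find("###")
    if marker != -1:
        completion = completion[:marker]
    return completion.strip()

print(extract_output("Talk about server 360.\n\n### Input:\nnext"))
```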

## License

MIT