---
license: mit
language:
- en
base_model: juanquivilla/sotto-cleanup-lfm25-350m
tags:
- speech-to-text
- transcript-cleanup
- text-correction
- asr-post-processing
- LFM
- LiquidAI
- mlx
- mlx-5bit
pipeline_tag: text-generation
---

# SottoASR Transcript Cleanup — LFM2.5-350M MLX 5-bit (v51)

[sottoasr.app](https://sottoasr.app) · [Full precision (bf16)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) · [MLX 4-bit (smaller)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit)

## Overview

MLX 5-bit affine quantization of [juanquivilla/sotto-cleanup-lfm25-350m](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m). Recommended for Apple Silicon — best size/quality trade-off.

## What's new in v51

v51 extends v45 with targeted training data for five failure modes (multi-number sentences, year-context drift, disconnected number lists, within-input duplicates, long-form preservation), each generated programmatically and audited with a Qwen3.6-27B judge.

| Metric | v45 | **v51** |
|---|---:|---:|
| Number accuracy | 95.9% | **95.3%** |
| Adversarial benchmark (greedy) | 76% | **86%** |

See the [bf16 model card](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m) for the full pipeline and benchmark numbers.

## Quantization Recipe

```bash
mlx_lm.convert \
  --hf-path juanquivilla/sotto-cleanup-lfm25-350m \
  --mlx-path sotto-cleanup-lfm25-350m-mlx-5bit \
  -q --q-bits 5 --q-group-size 64 \
  --trust-remote-code
```

## Usage

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit")
sampler = make_sampler(temp=0.0)

text = "talk about server three sixty"
prompt = f"### Input:\n{text}\n\n### Output:\n"
output = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
if "###" in output:
    output = output[:output.index("###")].strip()
print(output)

## License

MIT
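The prompt template and the `###` stop-marker trimming used in the Usage snippet can be factored into small helpers that are testable without loading the model. This is a sketch under that assumption; the helper names (`build_prompt`, `trim_output`) are illustrative, not part of the MLX API.

```python
# Helpers mirroring the prompt formatting and output trimming from the
# Usage snippet. Pure Python, no MLX required, so they can be unit-tested
# separately from model inference.

def build_prompt(text: str) -> str:
    """Wrap a raw transcript in the ### Input / ### Output template."""
    return f"### Input:\n{text}\n\n### Output:\n"

def trim_output(output: str) -> str:
    """Cut the generation at the first '###' marker, in case the model
    starts a new section instead of stopping cleanly."""
    if "###" in output:
        output = output[:output.index("###")]
    return output.strip()

prompt = build_prompt("talk about server three sixty")
# Simulated generation with a trailing section the model should not emit:
cleaned = trim_output("Cleaned sentence.\n\n### Input:\nmore text")
print(cleaned)  # → Cleaned sentence.
```

Keeping this logic out of the inference call makes it easy to verify the template matches the one the model was fine-tuned on.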