How to use littlebearlabs/parakeet-unified-en-0.6b-mlx-int8 with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir parakeet-unified-en-0.6b-mlx-int8 littlebearlabs/parakeet-unified-en-0.6b-mlx-int8
parakeet-unified-en-0.6b-mlx-int8
8-bit affine-quantized MLX weights for NVIDIA's parakeet-unified-en-0.6b Cache-Aware FastConformer-RNNT, for the witness MLX C++ engine on Apple Silicon.
Only the linear / projection matmuls are quantized (group size 64, affine): encoder FFN + attention + pointwise convs, the subsampling output projection, the RNNT prediction LSTM + embedding, and the joint network. Conv2d subsampling, depthwise conv, all norms / biases / batch-norm stats, and the relative-position bias vectors stay dense fp32 (the engine reads them directly).
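To make "group size 64, affine" concrete, here is a minimal pure-Python sketch of per-group affine quantization: every run of 64 weights gets its own scale and bias, and each weight is stored as an 8-bit code. This is an illustration only, not the witness or MLX implementation; the function names and the in-memory layout are assumptions made for the example.

```python
# Illustrative affine (asymmetric) 8-bit quantization with group size 64.
# Hypothetical helper names; the real packed layout used by MLX may differ.

GROUP_SIZE = 64
BITS = 8
LEVELS = (1 << BITS) - 1  # 255 steps between the group's min and max


def quantize_group(values):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    lo, hi = min(values), max(values)
    # A constant group has hi == lo; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo  # the bias is the group minimum


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~= code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row):
    """Quantize one weight row group by group; the length must be a multiple of 64."""
    assert len(row) % GROUP_SIZE == 0
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]
```

With this scheme the reconstruction error of any weight is at most half a quantization step (`scale / 2`) within its group, which is why small groups track the local weight range more tightly than one scale per tensor would.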
Why int8
The autoregressive RNNT decode is a batch-1, memory-bandwidth-bound GEMV, and
at typical utterance lengths the encoder is partly weight-bandwidth-bound too,
so cutting the weight bytes read per step (0.70 GB vs 2.47 GB on disk, roughly
3.5x smaller than dense fp32) is a latency win on the Apple apple9 GPU family
(M3/M4), not just a footprint win. WER is unchanged from dense.
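The bandwidth argument can be checked with a little arithmetic. The sketch below estimates the effective bits per quantized weight once the per-group scale and bias are included. The storage width of the scale and bias is an assumption here (16-bit and 32-bit are both shown), since this card does not state it.

```python
# Back-of-the-envelope storage cost of group-wise affine quantization.
# scale_bits / bias_bits are assumptions; the card does not specify them.

def bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Code bits plus the per-group scale and bias amortized over the group."""
    return bits + (scale_bits + bias_bits) / group_size


def reduction_vs_fp32(**kwargs):
    """How many times fewer bytes than a dense fp32 weight."""
    return 32 / bits_per_weight(**kwargs)


print(bits_per_weight())                              # 8.5 with 16-bit scale/bias
print(bits_per_weight(scale_bits=32, bias_bits=32))   # 9.0 with 32-bit scale/bias
print(round(reduction_vs_fp32(), 2))                  # 3.76
print(round(2.47 / 0.70, 2))                          # 3.53, on-disk ratio from the table
```

The whole-model ratio is lower than the per-layer estimate because the Conv2d subsampling, depthwise convs, norms, and biases stay dense fp32.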
Measured (M4, 45 LibriSpeech samples / 300s, witness rtf_bench)
| Variant | Size | Offline WER | Offline RTF | Streaming WER | Streaming RTF |
|---|---|---|---|---|---|
| dense fp32 | 2.47 GB | 1.78% | 0.0084 (119x) | 11.35% | 0.0319 (31x) |
| int8 | 0.70 GB | 1.78% | 0.0075 (134x) | 11.35% | 0.0197 (51x) |
(RTF measured with the witness engine's RNNT decoder optimizations enabled; lower RTF is faster. WER is bit-equivalent to dense on this set.)
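For readers unfamiliar with the notation, the sketch below shows how a real-time factor relates to the "Nx" speed figures in the table: RTF is processing time divided by audio duration, and the speed factor is its reciprocal. The function names are illustrative and are not taken from rtf_bench.

```python
# Real-time factor (RTF) and its reciprocal, the "times faster than real time" figure.
# Hypothetical helper names for illustration.

def rtf(processing_seconds, audio_seconds):
    """Seconds of compute spent per second of audio; lower is faster."""
    return processing_seconds / audio_seconds


def speed_factor(rtf_value):
    """How many times faster than real time the engine runs."""
    return 1.0 / rtf_value


for label, value in [("dense offline", 0.0084),
                     ("dense streaming", 0.0319),
                     ("int8 streaming", 0.0197)]:
    print(label, round(speed_factor(value)))
```

Applied to the rounded int8 offline RTF of 0.0075 this gives about 133x rather than the 134x in the table, presumably because the table's speed factors were computed from unrounded RTF values.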
Use
WITNESS_PARAKEET_UNIFIED_MODEL_DIR=/path/to/this/dir
The witness loader probes config.json for quantization.{bits,group_size}
and routes the packed weights through quantized_matmul automatically.
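To check what the loader will see before pointing the engine at a directory, the probe can be approximated in a few lines of Python. This is a sketch that assumes config.json holds a top-level "quantization" object with "bits" and "group_size" keys, as the description above suggests; the function names are hypothetical and the real loader is part of the witness engine.

```python
# Sketch: inspect the quantization block the loader is described as probing.
# Function names are hypothetical; only the env var and key names come from this card.
import json
import os
from pathlib import Path

ENV_VAR = "WITNESS_PARAKEET_UNIFIED_MODEL_DIR"


def read_quantization(model_dir):
    """Return (bits, group_size) from config.json, or None for a dense model."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant = config.get("quantization")
    if not quant:
        return None
    return int(quant["bits"]), int(quant["group_size"])


def describe_model_dir():
    """Resolve the model directory from the environment and summarize it."""
    model_dir = os.environ[ENV_VAR]  # raises KeyError if the variable is unset
    quant = read_quantization(model_dir)
    if quant is None:
        return f"{model_dir}: dense weights"
    bits, group_size = quant
    return f"{model_dir}: {bits}-bit weights, group size {group_size}"
```

For this repository the expected result is 8 bits with group size 64, matching the quantization described above.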
Produced by crates/mlx-parakeet/scripts/quantize_parakeet_unified.py --bits 8.
Base model
nvidia/parakeet-unified-en-0.6b