# Parakeet CTC 110M (INT8)
CTC-based speech recognition model for vocabulary-rescored transcription in Heydict.
## Overview
This is the CTC branch of NVIDIA's parakeet-tdt_ctc-110m hybrid TDT/CTC model, exported to ONNX via csukuangfj/sherpa-onnx and dynamically quantized to INT8.
It runs as a companion model alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.
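The exact rescoring scheme used in Heydict is not spelled out here; one simple form of vocabulary rescoring is a log-domain bonus added to the frame-level logits of every token id that appears in the user's custom list. The sketch below illustrates that idea on synthetic data — `boost_vocab_tokens`, the bonus value, and the toy shapes are all assumptions for illustration, not the actual Heydict implementation.

```python
import numpy as np

def boost_vocab_tokens(logits, vocab_token_ids, bonus=2.0):
    """Illustrative sketch: add a log-domain bonus to frame-level CTC
    logits for token ids drawn from the user's custom vocabulary.
    `logits` follows the model's [1, T', V] output layout."""
    boosted = logits.copy()
    boosted[:, :, vocab_token_ids] += bonus  # boost only the listed token ids
    return boosted

# Toy example: 1 utterance, 4 frames, 6-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.standard_normal((1, 4, 6)).astype(np.float32)
boosted = boost_vocab_tokens(logits, vocab_token_ids=[2, 5], bonus=2.0)
```

In practice the boosted logits would then go through the usual CTC decoding, so the bonus only tips close calls toward vocabulary terms rather than forcing them.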
## Files
| File | Size | Description |
|---|---|---|
| encoder.int8.onnx | 126 MB | INT8 dynamically quantized CTC encoder |
| encoder.fp32.onnx | 437 MB | Original FP32 encoder (for reference/GPU) |
| tokens.txt | 10 KB | SentencePiece vocabulary (sherpa-onnx format) |
## Architecture
- Encoder: FastConformer (17 layers, 256 dim, 4 heads)
- Decoder: CTC (encoder-only, no transducer joiner)
- Vocabulary: 1025 SentencePiece tokens
- Input: 128-dim log-mel spectrogram (NeMo convention)
- Output: Frame-level logits [1, T', 1025]
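Because the model is encoder-only CTC, decoding the `[1, T', 1025]` logits is just greedy CTC: take the argmax per frame, collapse repeats, and drop blanks. The sketch below shows that on a toy tensor; the blank id is an assumption (NeMo CTC models conventionally put blank last, i.e. id 1024 here — check `tokens.txt` to confirm).

```python
import numpy as np

BLANK_ID = 1024  # assumption: blank is the last of the 1025 tokens

def ctc_greedy_decode(logits, blank_id=BLANK_ID):
    """Standard CTC greedy decoding: argmax per frame, collapse
    consecutive repeats, then remove blanks.
    `logits` follows the model's [1, T', 1025] output layout."""
    best = logits.argmax(axis=-1)[0]  # [T'] best token id per frame
    collapsed = [int(t) for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return [t for t in collapsed if t != blank_id]

# Toy check with a 5-token vocabulary (blank id 4 in this toy setup):
toy = np.array([[[0, 0, 0, 0, 9],    # frame 1: blank
                 [9, 0, 0, 0, 0],    # frame 2: token 0
                 [9, 0, 0, 0, 0],    # frame 3: token 0 (repeat, collapsed)
                 [0, 0, 9, 0, 0]]],  # frame 4: token 2
               dtype=np.float32)
print(ctc_greedy_decode(toy, blank_id=4))  # → [0, 2]
```

The resulting token ids map back to text through the SentencePiece entries in `tokens.txt`.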
## Quantization
Dynamic INT8 quantization via `onnxruntime.quantization.quantize_dynamic`. Weights are stored as INT8; activations are quantized on the fly at inference time. The result is ~3.5x smaller than FP32 (437 MB → 126 MB) with minimal accuracy loss — suitable for a companion rescoring model.
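The quantization step can be reproduced roughly as below. The file paths are assumptions matching the names in this repo, and any extra options the original export may have used are unknown — treat this as a minimal sketch, not the exact command.

```python
# Minimal sketch of the dynamic INT8 quantization step.
# Paths are assumptions; the original export options are not documented here.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="encoder.fp32.onnx",   # 437 MB FP32 export
    model_output="encoder.int8.onnx",  # ~126 MB after weight quantization
    weight_type=QuantType.QInt8,       # weights INT8; activations quantized at runtime
)
```

Dynamic quantization needs no calibration data, which is why it is a convenient fit for a companion model where small accuracy drift is acceptable.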
## License
CC-BY-4.0 (inherited from NVIDIA's original model)