# Parakeet CTC 110M (INT8)

CTC-based speech recognition model for vocabulary-rescored transcription in Heydict.

## Overview

This is the CTC decoder head of NVIDIA's parakeet-tdt_ctc-110m, exported to ONNX by csukuangfj/sherpa-onnx and dynamically quantized to INT8.

It runs as a companion model alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.
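The exact rescoring logic used in Heydict isn't documented here. As a minimal sketch of one common approach — adding a log-space bonus to the SentencePiece token ids that appear in the user's vocabulary terms before decoding — with hypothetical function and parameter names:

```python
import numpy as np

def boost_vocab_logits(log_probs, vocab_token_ids, bonus=2.0):
    """Boost custom-vocabulary tokens in frame-level CTC log-probabilities.

    log_probs:       [T, V] frame-level log-probabilities from the CTC head
    vocab_token_ids: token ids occurring in the user's vocabulary terms
    bonus:           log-space boost (illustrative value, not from the card)
    """
    boosted = log_probs.copy()
    boosted[:, list(vocab_token_ids)] += bonus
    # Re-normalize so each frame remains a valid log-distribution.
    boosted -= np.logaddexp.reduce(boosted, axis=1, keepdims=True)
    return boosted
```

A real implementation would typically boost whole token *sequences* (e.g. via a keyword FST or prefix trie) rather than individual ids, but the log-space bonus idea is the same.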

## Files

| File | Size | Description |
|------|------|-------------|
| encoder.int8.onnx | 126 MB | INT8 dynamically quantized CTC encoder |
| encoder.fp32.onnx | 437 MB | Original FP32 encoder (for reference/GPU) |
| tokens.txt | 10 KB | SentencePiece vocabulary (sherpa-onnx format) |

## Architecture

- **Encoder:** FastConformer (17 layers, 256-dim, 4 attention heads)
- **Decoder:** CTC (encoder-only; no transducer joiner)
- **Vocabulary:** 1025 SentencePiece tokens
- **Input:** 128-dim log-mel spectrogram (NeMo convention)
- **Output:** frame-level logits `[1, T', 1025]`
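The `[1, T', 1025]` logits collapse to a token sequence via standard CTC greedy decoding: take the best token per frame, merge repeats, drop blanks. A minimal sketch assuming NumPy arrays; the blank id is model-specific (check `tokens.txt` for the blank symbol's index):

```python
import numpy as np

def ctc_greedy_decode(logits, blank_id):
    """Collapse frame-level CTC logits [1, T', V] into a token id sequence."""
    best = logits[0].argmax(axis=-1)     # [T'] best token id per frame
    out, prev = [], blank_id
    for t in best:
        if t != prev and t != blank_id:  # merge repeats, drop blanks
            out.append(int(t))
        prev = t
    return out
```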

## Quantization

Dynamic INT8 quantization via onnxruntime.quantization.quantize_dynamic. Weights are INT8, activations are quantized at runtime. ~3.5x smaller than FP32 with minimal accuracy loss — suitable for a companion rescoring model.
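The exact flags used for this export aren't published with the card, but a `quantize_dynamic` call of this shape reproduces the setup described above:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Weights become INT8; activations stay FP32 in the file and are quantized
# at runtime. Paths match the files listed above; all other options are
# left at their defaults, which may differ from the actual export.
quantize_dynamic(
    model_input="encoder.fp32.onnx",
    model_output="encoder.int8.onnx",
    weight_type=QuantType.QInt8,
)
```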

## License

CC-BY-4.0 (inherited from NVIDIA's original model)
