---
license: cc-by-4.0
language: en
tags:
  - speech
  - asr
  - ctc
  - onnx
  - parakeet
  - nemo
  - nvidia
  - vocabulary-boost
base_model: nvidia/parakeet-tdt_ctc-110m
pipeline_tag: automatic-speech-recognition
---

# Parakeet CTC 110M (INT8)

CTC-based speech recognition model used for custom-vocabulary rescoring in Heydict.

## Overview

This is the CTC decoder head of NVIDIA's `parakeet-tdt_ctc-110m`, exported to ONNX with csukuangfj/sherpa-onnx and dynamically quantized to INT8.

It runs as a companion model alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.
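Heydict's exact rescoring pipeline is not detailed here, but the idea can be sketched in a few lines of numpy: add a score bonus to the token ids of the user's vocabulary terms, then greedy-decode the boosted frame-level logits with the standard CTC collapse rule. The function name, bonus value, and blank id below are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def boost_and_decode(logits, boosted_ids, bonus=2.0, blank_id=1024):
    """Greedy CTC decode after adding a score bonus to boosted tokens.

    logits: [T, V] frame-level scores; boosted_ids: token ids drawn from
    the user's custom vocabulary; blank_id: CTC blank (assumed to be the
    last token here).
    """
    scores = logits.copy()
    scores[:, list(boosted_ids)] += bonus      # vocabulary boost
    frame_ids = scores.argmax(axis=-1)         # best token per frame
    out, prev = [], blank_id
    for t in frame_ids:                        # CTC collapse rule:
        if t != blank_id and t != prev:        # drop blanks and repeats
            out.append(int(t))
        prev = t
    return out

# Toy example: 5 frames, 4-token vocab, blank id 3.
logits = np.array([
    [2.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 1.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
])
print(boost_and_decode(logits, boosted_ids={2}, bonus=1.0, blank_id=3))  # → [0, 2]
```

Without the boost, frame 3 would pick token 1 (score 1.5) over token 2 (1.0); the +1.0 bonus flips that decision, which is exactly the effect the vocabulary list is meant to have on domain terms.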

## Files

| File | Size | Description |
|---|---|---|
| `encoder.int8.onnx` | 126 MB | INT8 dynamically quantized CTC encoder |
| `encoder.fp32.onnx` | 437 MB | Original FP32 encoder (for reference/GPU) |
| `tokens.txt` | 10 KB | SentencePiece vocabulary (sherpa-onnx format) |
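Mapping CTC output ids back to text requires `tokens.txt`. A minimal parser, assuming the usual sherpa-onnx layout of one `<symbol> <id>` pair per line (the helper name and sample lines below are illustrative, not taken from the actual file):

```python
def load_tokens(text):
    """Parse a sherpa-onnx style tokens.txt: one "<symbol> <id>" per line.

    Returns an id -> symbol mapping for turning CTC output ids back into
    SentencePiece pieces.
    """
    id2sym = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sym, idx = line.rsplit(maxsplit=1)  # split on the last space
        id2sym[int(idx)] = sym
    return id2sym

# Toy excerpt (the real file has 1025 entries, ids 0..1024).
sample = "▁the 0\n▁speech 1\ns 2\n"
print(load_tokens(sample)[1])  # → ▁speech
```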

## Architecture

- **Encoder:** FastConformer (17 layers, 256 dim, 4 heads)
- **Decoder:** CTC (encoder-only, no transducer joiner)
- **Vocabulary:** 1025 SentencePiece tokens
- **Input:** 128-dim log-mel spectrogram (NeMo convention)
- **Output:** Frame-level logits `[1, T', 1025]`

## Quantization

Dynamic INT8 quantization via `onnxruntime.quantization.quantize_dynamic`. Weights are stored as INT8; activations are quantized on the fly at inference time. The result is ~3.5x smaller than FP32 with minimal accuracy loss — suitable for a companion rescoring model.
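The export itself used the `quantize_dynamic` API; the numpy sketch below is not that API but a rough illustration of what symmetric per-tensor INT8 weight quantization does, using a synthetic weight matrix: each FP32 value (4 bytes) is replaced by an INT8 value (1 byte) plus one shared FP32 scale, which is where the ~3.5x size reduction comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # fake FP32 weight

# Symmetric per-tensor INT8: one FP32 scale, weights stored as int8.
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# What the runtime recovers when it dequantizes for the matmul.
w_deq = w_int8.astype(np.float32) * scale

err = np.abs(w - w_deq).max()
print(f"max abs error: {err:.4f} (scale={scale:.4f})")
```

The maximum reconstruction error is bounded by half the scale, which for typical weight distributions is small enough that downstream accuracy barely moves.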

## License

CC-BY-4.0 (inherited from NVIDIA's original model)