---
license: cc-by-4.0
language: en
tags:
  - speech
  - asr
  - ctc
  - onnx
  - parakeet
  - nemo
  - nvidia
  - vocabulary-boost
base_model: nvidia/parakeet-tdt_ctc-110m
pipeline_tag: automatic-speech-recognition
---

# Parakeet CTC 110M (INT8)

CTC-based speech recognition model for vocabulary-rescored transcription in [Heydict](https://github.com/Entropora/heydict).

## Overview

This is the CTC decoder head of NVIDIA's [parakeet-tdt_ctc-110m](https://huggingface.co/nvidia/parakeet-tdt_ctc-110m), exported to ONNX by [csukuangfj/sherpa-onnx](https://huggingface.co/csukuangfj/sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000) and dynamically quantized to INT8.

It runs as a **companion model** alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.

## Files

| File | Size | Description |
|------|------|-------------|
| `encoder.int8.onnx` | 126 MB | INT8 dynamically quantized CTC encoder |
| `encoder.fp32.onnx` | 437 MB | Original FP32 encoder (for reference/GPU) |
| `tokens.txt` | 10 KB | SentencePiece vocabulary (sherpa-onnx format) |

## Architecture

- **Encoder**: FastConformer (17 layers, 256 dim, 4 heads)
- **Decoder**: CTC (encoder-only, no transducer joiner)
- **Vocabulary**: 1025 SentencePiece tokens
- **Input**: 128-dim log-mel spectrogram (NeMo convention)
- **Output**: Frame-level logits `[1, T', 1025]`

## Quantization

Dynamic INT8 quantization via `onnxruntime.quantization.quantize_dynamic`: weights are stored as INT8, activations are quantized at runtime. The result is roughly 3.5x smaller than FP32 with minimal accuracy loss, which is acceptable for a companion rescoring model.

## License

CC-BY-4.0 (inherited from NVIDIA's original model)
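The `tokens.txt` file maps the model's 1025 SentencePiece pieces to output indices. A minimal parsing sketch, assuming the common sherpa-onnx convention of one `<piece> <id>` pair per line (the sample pieces below are illustrative, not taken from the actual file):

```python
def load_tokens(text: str) -> dict[int, str]:
    """Parse a sherpa-onnx style tokens.txt: one '<piece> <id>' per line."""
    id2tok = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # rsplit so pieces containing spaces would still parse
        piece, idx = line.rsplit(maxsplit=1)
        id2tok[int(idx)] = piece
    return id2tok
```

In practice you would pass `open("tokens.txt").read()`; the dict then turns frame-level argmax ids back into text pieces.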
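The encoder's frame-level logits (shape `[1, T', 1025]`, see Architecture) can be turned into a token sequence with standard greedy CTC decoding: take the argmax per frame, collapse consecutive repeats, then drop blanks. A minimal pure-Python sketch, assuming blank id 0 (the real blank id comes from `tokens.txt`):

```python
def ctc_greedy_decode(frame_logits: list[list[float]], blank_id: int = 0) -> list[int]:
    """Greedy CTC: per-frame argmax -> collapse repeats -> remove blanks."""
    # argmax over the vocabulary dimension for each frame
    frame_ids = [max(range(len(f)), key=f.__getitem__) for f in frame_logits]
    out, prev = [], None
    for tok in frame_ids:
        if tok != prev and tok != blank_id:  # new non-blank symbol
            out.append(tok)
        prev = tok
    return out
```

This is the standard CTC collapse rule, not necessarily the exact decoding path Heydict uses; it is shown to clarify what "frame-level logits" means for this model.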
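The Overview describes rescoring the CTC logits against a user vocabulary list. One simple way to implement such a boost, shown here as a hypothetical sketch (not Heydict's actual algorithm), is to add a log-space bonus to the ids of boosted pieces and renormalize each frame:

```python
import math

def boost_frames(frame_logprobs: list[list[float]],
                 boost_ids: set[int],
                 bonus: float = 2.0) -> list[list[float]]:
    """Add a log-space bonus to boosted token ids, renormalizing per frame."""
    rescored = []
    for frame in frame_logprobs:
        bumped = [s + bonus if i in boost_ids else s
                  for i, s in enumerate(frame)]
        # log-sum-exp renormalization keeps each frame a valid distribution
        lse = math.log(sum(math.exp(s) for s in bumped))
        rescored.append([s - lse for s in bumped])
    return rescored
```

With a large enough `bonus`, a vocabulary piece that was a close runner-up in a frame overtakes the original argmax, which is the effect a custom-vocabulary boost is after.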