# OmniAudio 150M - Kazakh CTC Encoder (v1)
CTC-pretrained audio encoder for Kazakh speech recognition. Part of the OmniAudio v2 pipeline (Stage 1 of 2).
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Convolutional + Transformer encoder with CTC head |
| Audio d_model | 512 |
| Attention heads | 8 |
| Encoder layers | 12 |
| Conv layers | 2 |
| Parameters | ~67M (encoder only) |
| Training stage | CTC pretrain (Stage 1) |
| Training data | sozkz-asr-mels-kk-v1 |
| Training duration | 2 epochs (~62K steps) |
| Hardware | NVIDIA RTX 5090 32GB |
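For orientation, here is a minimal PyTorch sketch of the shape the table describes: two convolutional subsampling layers feeding a 12-layer Transformer encoder (d_model 512, 8 heads) topped with a linear CTC head. This is a reconstruction, not the released code; the mel dimension, conv strides, feed-forward size, and vocabulary size are assumptions.

```python
# Architecture sketch matching the Model Details table above.
# Assumed (not from the card): n_mels=80, conv stride 2, vocab_size=64.
import torch
import torch.nn as nn

class CTCEncoderSketch(nn.Module):
    def __init__(self, n_mels: int = 80, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 12, vocab_size: int = 64):
        super().__init__()
        # Two strided conv layers for ~4x temporal subsampling (strides assumed).
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ctc_head = nn.Linear(d_model, vocab_size)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        # mels: (batch, n_mels, frames) -> logits: (batch, frames', vocab)
        x = self.conv(mels).transpose(1, 2)
        return self.ctc_head(self.encoder(x))

model = CTCEncoderSketch()
logits = model(torch.randn(1, 80, 400))  # dummy 80-mel input
print(logits.shape)  # (1, 100, 64) after 4x subsampling
```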
## Dataset
Trained on sozkz-asr-mels-kk-v1: ~1M samples, ~2,100 hours of Kazakh speech (gated dataset, auto-approved access).
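If the dataset is hosted on the Hugging Face Hub, loading it might look like the sketch below. The owning namespace is not stated in this card, so `ORG` is a placeholder; because the dataset is gated, an authenticated token with approved access is required.

```python
# Hedged sketch of pulling the gated dataset with the `datasets` library.
from datasets import load_dataset

ds = load_dataset(
    "ORG/sozkz-asr-mels-kk-v1",  # "ORG" is a placeholder, not the real namespace
    split="train",
    token="hf_...",              # gated dataset: auth token with approved access
)
print(ds)
```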
## Usage
This checkpoint is a CTC-pretrained encoder; it serves as the foundation for end-to-end (E2E) fine-tuning with a Kazakh LLM decoder. A minimal decoding sketch is shown below.
For full ASR inference, use the E2E model (coming soon).
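Until the E2E model is released, the encoder's CTC logits can be decoded greedily on their own. The sketch below assumes the encoder emits `(batch, time, vocab)` logits and that the blank token has id 0; both are assumptions, not documented values.

```python
# Minimal sketch of CTC greedy decoding over the encoder's output logits.
import torch

def ctc_greedy_decode(logits: torch.Tensor, blank_id: int = 0) -> list[list[int]]:
    """Collapse repeated frames and drop blanks from per-frame argmax ids."""
    ids = logits.argmax(dim=-1)  # (batch, time)
    results = []
    for seq in ids:
        tokens, prev = [], blank_id
        for t in seq.tolist():
            if t != prev and t != blank_id:
                tokens.append(t)
            prev = t
        results.append(tokens)
    return results

# Random logits stand in for real encoder output (illustrative sizes only):
dummy_logits = torch.randn(1, 200, 64)  # (batch, frames, vocab)
print(ctc_greedy_decode(dummy_logits))
```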
## Files

| File | Description |
|---|---|
| | Final epoch weights |
| | Best validation checkpoint |
| | Last training checkpoint |
## Training Config

## License
MIT