
OmniAudio 150M – Kazakh CTC Encoder (v1)

CTC-pretrained audio encoder for Kazakh speech recognition. Part of the OmniAudio v2 pipeline (Stage 1 of 2).

Model Details

| Parameter | Value |
| --- | --- |
| Architecture | Convolutional + Transformer encoder with CTC head |
| Audio d_model | 512 |
| Attention heads | 8 |
| Encoder layers | 12 |
| Conv layers | 2 |
| Parameters | ~67M (encoder only) |
| Training stage | CTC pretrain (Stage 1) |
| Training data | sozkz-asr-mels-kk-v1 |
| Training duration | 2 epochs (~62K steps) |
| Hardware | NVIDIA RTX 5090 32GB |
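The table lists two convolutional layers in front of the transformer stack; in encoders of this style the conv front-end typically subsamples the mel-spectrogram sequence before attention. A minimal sketch of the resulting frame-rate reduction, assuming (hypothetically, since the exact kernel/stride values are not published here) two stride-2 convolutions:

```python
def conv_out_len(n, kernel=3, stride=2, padding=1):
    # Standard 1-D convolution output-length formula:
    # floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

n = 1000  # mel frames, e.g. ~10 s of audio at 100 frames/s
for _ in range(2):  # two conv layers, per the Model Details table
    n = conv_out_len(n)
print(n)  # 250 -> a 4x shorter sequence reaches the transformer
```

Under these assumed parameters the transformer (and the CTC head on top of it) operates on a sequence four times shorter than the mel input, which is what makes 12 attention layers at d_model 512 affordable on long utterances.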

Dataset

Trained on sozkz-asr-mels-kk-v1: ~1M samples, ~2100 hours of Kazakh speech (gated, auto-approve).

Usage

This checkpoint is a CTC-pretrained encoder; it is the foundation for end-to-end (E2E) fine-tuning with a Kazakh LLM decoder.
For full ASR inference, use the E2E model: (coming soon).
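Because the head is trained with CTC, the encoder's frame-level outputs can be turned into a label sequence without a decoder by greedy CTC decoding: take the argmax token per frame, collapse consecutive repeats, then drop blanks. A self-contained sketch of that rule (the blank id here is an assumption; check the actual vocabulary of the checkpoint):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame-level ids, then remove blanks (standard CTC rule)."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Frame-level argmax ids as they might come out of the CTC head:
print(ctc_greedy_decode([0, 5, 5, 0, 3, 3, 3, 0, 3]))  # [5, 3, 3]
```

Note that a blank between two identical ids (as with the two runs of `3` above) keeps them as separate output tokens; this is how CTC represents doubled characters.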

Files

- Final epoch weights
- Best validation checkpoint
- Last training checkpoint

Training Config
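The published config file is not reproduced here, but the architecture hyperparameters can be restated from the Model Details table above. A purely illustrative fragment (the key names are hypothetical, not the repo's actual config schema):

```python
# Encoder hyperparameters as listed in the Model Details table;
# key names are illustrative, not the checkpoint's actual config keys.
ENCODER_CONFIG = {
    "d_model": 512,
    "attention_heads": 8,
    "encoder_layers": 12,
    "conv_layers": 2,
    "objective": "ctc",
}

# Sanity check: d_model must split evenly across attention heads (512 / 8 = 64).
assert ENCODER_CONFIG["d_model"] % ENCODER_CONFIG["attention_heads"] == 0
```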

License

MIT


Dataset used to train stukenov/sozkz-core-omniaudio-150m-kk-ctc-v1: sozkz-asr-mels-kk-v1