# OmniAudio 150M - Kazakh CTC Encoder (v1)
CTC-pretrained audio encoder for Kazakh speech recognition. Part of the OmniAudio v2 pipeline (Stage 1 of 2).
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Convolutional + Transformer encoder with CTC head |
| Audio d_model | 512 |
| Attention heads | 8 |
| Encoder layers | 12 |
| Conv layers | 2 |
| Parameters | ~67M (encoder only) |
| Training stage | CTC pretrain (Stage 1) |
| Training data | sozkz-asr-mels-kk-v1 |
| Training duration | 2 epochs (~62K steps) |
| Hardware | NVIDIA RTX 5090 32GB |
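For orientation, here is a minimal PyTorch sketch of the shape the table describes: two convolutional subsampling layers feeding a 12-layer Transformer encoder (d_model 512, 8 heads) topped with a linear CTC head. This is a reconstruction, not the released code; the mel dimension, conv strides, feed-forward size, and vocabulary size are assumptions.

```python
# Architecture sketch matching the Model Details table above.
# Assumed (not from the card): n_mels=80, conv stride 2, vocab_size=64.
import torch
import torch.nn as nn

class CTCEncoderSketch(nn.Module):
    def __init__(self, n_mels: int = 80, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 12, vocab_size: int = 64):
        super().__init__()
        # Two strided conv layers for ~4x temporal subsampling (strides assumed).
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ctc_head = nn.Linear(d_model, vocab_size)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        # mels: (batch, n_mels, frames) -> logits: (batch, frames', vocab)
        x = self.conv(mels).transpose(1, 2)
        return self.ctc_head(self.encoder(x))

model = CTCEncoderSketch()
logits = model(torch.randn(1, 80, 400))  # dummy 80-mel input
print(logits.shape)  # (1, 100, 64) after 4x subsampling
```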
## Dataset
Trained on sozkz-asr-mels-kk-v1: ~1M samples, ~2,100 hours of Kazakh speech (gated dataset, auto-approved access).
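If the dataset is hosted on the Hugging Face Hub, loading it might look like the sketch below. The owning namespace is not stated in this card, so `ORG` is a placeholder; because the dataset is gated, an authenticated token with approved access is required.

```python
# Hedged sketch of pulling the gated dataset with the `datasets` library.
from datasets import load_dataset

ds = load_dataset(
    "ORG/sozkz-asr-mels-kk-v1",  # "ORG" is a placeholder, not the real namespace
    split="train",
    token="hf_...",              # gated dataset: auth token with approved access
)
print(ds)
```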
## Usage
This checkpoint is a CTC-pretrained encoder; it serves as the foundation for end-to-end (E2E) fine-tuning with a Kazakh LLM decoder. A minimal decoding sketch is shown below.
For full ASR inference, use the E2E model (coming soon).
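Until the E2E model is released, the encoder's CTC logits can be decoded greedily on their own. The sketch below assumes the encoder emits `(batch, time, vocab)` logits and that the blank token has id 0; both are assumptions, not documented values.

```python
# Minimal sketch of CTC greedy decoding over the encoder's output logits.
import torch

def ctc_greedy_decode(logits: torch.Tensor, blank_id: int = 0) -> list[list[int]]:
    """Collapse repeated frames and drop blanks from per-frame argmax ids."""
    ids = logits.argmax(dim=-1)  # (batch, time)
    results = []
    for seq in ids:
        tokens, prev = [], blank_id
        for t in seq.tolist():
            if t != prev and t != blank_id:
                tokens.append(t)
            prev = t
        results.append(tokens)
    return results

# Random logits stand in for real encoder output (illustrative sizes only):
dummy_logits = torch.randn(1, 200, 64)  # (batch, frames, vocab)
print(ctc_greedy_decode(dummy_logits))
```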
## Files

| File | Description |
|---|---|
| | Final epoch weights |
| | Best validation checkpoint |
| | Last training checkpoint |
## Training Config

## License
MIT