# whisper-small-sr

A fine-tuned version of OpenAI Whisper Small for Serbian automatic speech recognition.

## Model description

- Output script: the model is intended to produce Serbian Latin only.
- WER on the Common Voice 24.0 Serbian test set: 6.59%
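A minimal usage sketch with the Hugging Face Transformers `pipeline` API; the audio path is a placeholder, and pinning the language/task in `generate_kwargs` is an assumption that mirrors the Serbian-only intent:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
asr = pipeline("automatic-speech-recognition", model="istomin9192/whisper-small-sr")

# "audio.wav" is a placeholder path to a local recording.
result = asr("audio.wav", generate_kwargs={"language": "serbian", "task": "transcribe"})
print(result["text"])  # expected in Serbian Latin script
```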
## Training and evaluation data
This model was fine-tuned on a mixture of publicly available Serbian speech corpora, including:
- Mozilla Common Voice 24.0 (Serbian); its test split is used for evaluation
- FLEURS Serbian
- ParlaSpeech-RS (subset of the full dataset)
- Additional Serbian corpora used in the training pipeline
## Training procedure

- Epochs: 9
- Batch size: 32 / 20
- Optimizer: AdamW
- Learning rate: 6e-5 with 50 warmup steps, then cosine decay to min_lr = 1e-7 (see the sketch below)
- Mixed precision: bfloat16 (fp32 in the final epoch)
- SpecAugment: frequency + time masking
- Sampling: weighted sampling across datasets
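The exact training script is not published; the sketch below illustrates the reported optimizer and learning-rate schedule in plain PyTorch. The total step count is an illustrative assumption.

```python
import math

import torch
from transformers import WhisperForConditionalGeneration

BASE_LR, MIN_LR, WARMUP_STEPS = 6e-5, 1e-7, 50

def lr_lambda(step: int, total_steps: int = 10_000) -> float:
    """Linear warmup for the first 50 steps, then cosine decay to min_lr.

    Returns a multiplier on BASE_LR, as expected by LambdaLR.
    (total_steps is an illustrative assumption.)
    """
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (MIN_LR + (BASE_LR - MIN_LR) * cosine) / BASE_LR

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

For the frequency + time masking, one option (not confirmed as the one used here) is Whisper's built-in SpecAugment, enabled via `model.config.apply_spec_augment = True` together with the `mask_time_prob`/`mask_feature_prob` config fields.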
## Training results

| Epoch | Training loss | WER (CV test) |
|---|---|---|
| 1 | 0.333 | 0.1614 |
| 2 | 0.344 | 0.1278 |
| 3 | 0.251 | 0.1112 |
| 4 | 0.202 | 0.1032 |
| 5 | 0.167 | 0.0934 |
| 6 | 0.138 | 0.0790 |
| 7 | 0.118 | 0.0740 |
| 8 | 0.103 | 0.0709 |
| 9 | 0.096 | 0.0659 |
## Evaluation metrics
- WER (normalized) on the Common Voice 24.0 Serbian test set: 7.09%
- Text normalization applied before computing WER (a sketch follows below):
  - punctuation removed
  - text lowercased
  - Cyrillic → Latin conversion
  - numbers converted to words
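The normalizer itself is not published; below is a minimal sketch of the steps above in plain Python. The Cyrillic → Latin table covers the standard Serbian alphabet; the number-to-word step is only indicated in a comment, since the exact converter (e.g. the third-party num2words package and its Serbian support) is an assumption.

```python
import re

# Serbian Cyrillic -> Latin (lowercase; input is lowercased first).
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(CYR2LAT.get(ch, ch) for ch in text)
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    # Numbers -> words would go here, e.g. with num2words (Serbian
    # support is an assumption): num2words(int(n), lang="sr")
    return re.sub(r"\s+", " ", text).strip()
```

With both sides normalized this way, WER can be computed with e.g. `jiwer.wer(normalize(reference), normalize(hypothesis))`.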