whisper-small-sr

A fine-tuned version of OpenAI's Whisper Small for Serbian automatic speech recognition.

Output script: the model is intended to produce Serbian Latin script only.

  • WER on Common Voice 24.0 Serbian test: 6.59%
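WER is the word-level Levenshtein (edit) distance between hypothesis and reference, divided by the number of reference words. A minimal sketch of the metric (not the exact evaluation script used for this card):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)
```

For example, `wer("dobar dan svima", "dobar dan")` is one deletion over three reference words, i.e. ~0.333.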

Model description

whisper-small-sr is OpenAI's Whisper Small fine-tuned for Serbian automatic speech recognition; it transcribes Serbian speech into Latin-script text.

Training and evaluation data

This model was fine-tuned on a mixture of publicly available Serbian speech corpora, including:

  • Mozilla Common Voice 24.0, evaluated on CV test (sr)
  • FLEURS Serbian
  • ParlaSpeech-RS (subset of the full dataset)
  • Additional Serbian corpora used in the training pipeline

Training procedure

  • Epochs: 9
  • Batch size: 32 / 20
  • Optimizer: AdamW
  • LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
  • Mixed precision: bfloat16 (fp32 in the final epoch)
  • SpecAugment: frequency + time masking
  • Sampling: weighted sampling across datasets
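The learning-rate schedule above (linear warmup for 50 steps, then cosine decay from 6e-5 to min_lr = 1e-7) can be sketched as follows; the total-step count passed in is illustrative, not taken from the card:

```python
import math

def lr_at(step: int, total_steps: int,
          peak_lr: float = 6e-5, min_lr: float = 1e-7,
          warmup_steps: int = 50) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear ramp: reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Progress through the decay phase, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At the end of warmup the rate is exactly `peak_lr`, and at `step == total_steps` it has decayed to `min_lr`.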

Training results

| Epoch | Train loss | CV WER |
|------:|-----------:|-------:|
| 1     | 0.333      | 0.1614 |
| 2     | 0.344      | 0.1278 |
| 3     | 0.251      | 0.1112 |
| 4     | 0.202      | 0.1032 |
| 5     | 0.167      | 0.0934 |
| 6     | 0.138      | 0.0790 |
| 7     | 0.118      | 0.0740 |
| 8     | 0.103      | 0.0709 |
| 9     | 0.096      | 0.0659 |

Evaluation metrics

  • WER (normalized) on Common Voice 24.0 Serbian test: 7.09%
  • Text normalization used for WER:
    • punctuation removed
    • lowercased
    • Cyrillic → Latin conversion
    • numbers converted to words
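The normalization steps above can be sketched as follows. The Cyrillic-to-Latin table is the standard Serbian transliteration (digraphs љ/њ/џ map to lj/nj/dž); number-to-words conversion is language specific and only noted here, not implemented:

```python
import string

# Serbian Cyrillic -> Latin, lowercase forms (digraph letters included).
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, transliterate Cyrillic to Latin."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = "".join(CYR2LAT.get(ch, ch) for ch in text)
    # NOTE: the card's evaluation also converts numbers to words;
    # that step is language specific and omitted from this sketch.
    return " ".join(text.split())
```

For example, `normalize("Здраво, Свете!")` yields `"zdravo svete"`.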