---
license: apache-2.0
language:
- lus
base_model: facebook/wav2vec2-xls-r-300m
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: wav2vec2-xls-r-300m-mizo-lus-v13
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: generator
      type: generator
      config: default
      split: train
      args: default
    metrics:
    - name: Wer
      type: wer
      value: 0.11839374487185675
---
# Mizo Automatic Speech Recognition

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MiZonal v1.0 dataset.
It achieves the following results on the evaluation set (final logged checkpoint, step 3800):
- Loss: 0.0932
- WER: 0.1184 (11.84%)
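
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub and loaded through the `transformers` `pipeline` API. This card does not state the repository namespace, so `MODEL_ID` below is a hypothetical placeholder; only the model name is taken from the card metadata. Decoding an audio file from a path also assumes `ffmpeg` is installed.

```python
# Hypothetical repository id: the namespace is a placeholder, only the model
# name comes from this card's metadata.
MODEL_ID = "your-namespace/wav2vec2-xls-r-300m-mizo-lus-v13"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline decodes the file and resamples it to the sampling rate
    # the feature extractor expects before running the model.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` would return the predicted Mizo text, where `sample.wav` stands in for any local recording.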

## Citation

**BibTeX entry and citation info:**

```
@article{10.1145/3746063,
  author = {Bawitlung, Andrew and Dash, Sandeep Kumar and Pattanayak, Radha Mohan},
  title = {Mizo Automatic Speech Recognition: Leveraging Wav2vec 2.0 and XLS-R for Enhanced Accuracy in Low-Resource Language Processing},
  year = {2025},
  url = {https://doi.org/10.1145/3746063},
  doi = {10.1145/3746063},
  journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
  month = jun,
}
```

## Training and evaluation data

Both training and evaluation used the MiZonal v1.0 dataset.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 49
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 28
- mixed_precision_training: Native AMP
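
Two of the values above are derived rather than set directly. The `total_train_batch_size` of 64 is `train_batch_size` × `gradient_accumulation_steps` (8 × 8), which implies a single device. The linear scheduler ramps the learning rate from 0 to 0.0003 over the first 1000 optimizer steps and then decays it linearly to 0. The sketch below reproduces both in plain Python; `TOTAL_STEPS` is an assumption (about 138 optimizer steps per epoch, estimated from the logged step and epoch values, times 28 epochs) and is not a figure stated in this card.

```python
BASE_LR = 3e-4        # learning_rate from the list above
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 3864    # ASSUMPTION: ~138 steps/epoch * 28 epochs, estimated


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int,
                     base_lr: float = BASE_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(8, 8))
    for s in (0, 500, 1000, 2000, TOTAL_STEPS):
        print(s, linear_warmup_lr(s))
```

Under these assumptions the peak learning rate is reached at step 1000, about a quarter of the way through training, so the first few evaluation rows are logged while the rate is still ramping up.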

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
| |:-------------:|:-----:|:----:|:---------------:|:------:| |
| | No log | 0.73 | 100 | 3.2655 | 1.0 | |
| | 4.2561 | 1.45 | 200 | 2.8818 | 1.0 | |
| | 4.2561 | 2.18 | 300 | 2.8428 | 1.0 | |
| | 2.8118 | 2.9 | 400 | 2.3670 | 0.9994 | |
| | 2.8118 | 3.63 | 500 | 0.8009 | 0.7144 | |
| | 1.4174 | 4.35 | 600 | 0.4873 | 0.5069 | |
| | 1.4174 | 5.08 | 700 | 0.3496 | 0.4169 | |
| | 0.754 | 5.8 | 800 | 0.2846 | 0.3422 | |
| | 0.754 | 6.53 | 900 | 0.2319 | 0.3116 | |
| | 0.5884 | 7.25 | 1000 | 0.2122 | 0.2833 | |
| | 0.5884 | 7.98 | 1100 | 0.1931 | 0.2655 | |
| | 0.4894 | 8.7 | 1200 | 0.1651 | 0.2221 | |
| | 0.4894 | 9.43 | 1300 | 0.1520 | 0.2100 | |
| | 0.4171 | 10.15 | 1400 | 0.1379 | 0.1925 | |
| | 0.4171 | 10.88 | 1500 | 0.1271 | 0.1793 | |
| | 0.3695 | 11.6 | 1600 | 0.1199 | 0.1763 | |
| | 0.3695 | 12.33 | 1700 | 0.1217 | 0.1712 | |
| | 0.3415 | 13.06 | 1800 | 0.1158 | 0.1640 | |
| | 0.3415 | 13.78 | 1900 | 0.1142 | 0.1605 | |
| | 0.3094 | 14.51 | 2000 | 0.1137 | 0.1530 | |
| | 0.3094 | 15.23 | 2100 | 0.1084 | 0.1454 | |
| | 0.2829 | 15.96 | 2200 | 0.1045 | 0.1464 | |
| | 0.2829 | 16.68 | 2300 | 0.1025 | 0.1416 | |
| | 0.2641 | 17.41 | 2400 | 0.0998 | 0.1374 | |
| | 0.2641 | 18.13 | 2500 | 0.0987 | 0.1461 | |
| | 0.2486 | 18.86 | 2600 | 0.0937 | 0.1332 | |
| | 0.2486 | 19.58 | 2700 | 0.0972 | 0.1337 | |
| | 0.2338 | 20.31 | 2800 | 0.0949 | 0.1322 | |
| | 0.2338 | 21.03 | 2900 | 0.0982 | 0.1313 | |
| | 0.2143 | 21.76 | 3000 | 0.0958 | 0.1311 | |
| | 0.2143 | 22.48 | 3100 | 0.0960 | 0.1252 | |
| | 0.2018 | 23.21 | 3200 | 0.0930 | 0.1251 | |
| | 0.2018 | 23.93 | 3300 | 0.0924 | 0.1243 | |
| | 0.1933 | 24.66 | 3400 | 0.0931 | 0.1225 | |
| | 0.1933 | 25.39 | 3500 | 0.0942 | 0.1197 | |
| | 0.1813 | 26.11 | 3600 | 0.0938 | 0.1208 | |
| | 0.1813 | 26.84 | 3700 | 0.0936 | 0.1199 | |
| | 0.1792 | 27.56 | 3800 | 0.0932 | 0.1184 | |

### Framework versions

- Transformers 4.37.2
- PyTorch 2.3.1+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1