---
license: apache-2.0
language:
- lus
base_model: facebook/wav2vec2-xls-r-300m
tags:
- mizo
- audio
- automatic-speech-recognition
- lus
metrics:
- wer
model-index:
- name: wav2vec2-xls-r-300m-mizo-lus-v13
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: generator
type: generator
config: default
split: train
args: default
metrics:
- name: Wer
type: wer
value: 0.11839374487185675
---
# Mizo Automatic Speech Recognition
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MiZonal v1.0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0932
- Wer: 0.1184
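## How to use
A minimal inference sketch using the Transformers ASR pipeline. The model id below is assumed from the model name in this card; replace it with the actual Hugging Face Hub path if it differs, and supply 16 kHz mono audio.
```python
# Minimal inference sketch (model id and audio path are assumptions/placeholders).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="wav2vec2-xls-r-300m-mizo-lus-v13",  # assumed Hub repo id
)

# Transcribe a 16 kHz mono audio file (the path is a placeholder).
result = asr("sample_mizo_audio.wav")
print(result["text"])
```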
## Citation
**BibTeX entry and citation info:**
```bibtex
@article{10.1145/3746063,
author = {Bawitlung, Andrew and Dash, Sandeep Kumar and Pattanayak, Radha Mohan},
title = {Mizo Automatic Speech Recognition: Leveraging Wav2vec 2.0 and XLS-R for Enhanced Accuracy in Low-Resource Language Processing},
year = {2025},
issue_date = {July 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {24},
number = {7},
issn = {2375-4699},
url = {https://doi.org/10.1145/3746063},
doi = {10.1145/3746063},
journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
month = jul,
articleno = {72},
numpages = {15},
}
```
## Training and evaluation data
The model was trained and evaluated on the MiZonal v1.0 dataset.
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 49
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 28
- mixed_precision_training: Native AMP
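A sketch of the corresponding `TrainingArguments`, assuming the standard Hugging Face `Trainer` setup; `output_dir` and any field not listed above are placeholders:
```python
# TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-mizo-lus-v13",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=49,
    gradient_accumulation_steps=8,   # effective train batch size: 8 * 8 = 64
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=28,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```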
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log | 0.73 | 100 | 3.2655 | 1.0 |
| 4.2561 | 1.45 | 200 | 2.8818 | 1.0 |
| 4.2561 | 2.18 | 300 | 2.8428 | 1.0 |
| 2.8118 | 2.9 | 400 | 2.3670 | 0.9994 |
| 2.8118 | 3.63 | 500 | 0.8009 | 0.7144 |
| 1.4174 | 4.35 | 600 | 0.4873 | 0.5069 |
| 1.4174 | 5.08 | 700 | 0.3496 | 0.4169 |
| 0.754 | 5.8 | 800 | 0.2846 | 0.3422 |
| 0.754 | 6.53 | 900 | 0.2319 | 0.3116 |
| 0.5884 | 7.25 | 1000 | 0.2122 | 0.2833 |
| 0.5884 | 7.98 | 1100 | 0.1931 | 0.2655 |
| 0.4894 | 8.7 | 1200 | 0.1651 | 0.2221 |
| 0.4894 | 9.43 | 1300 | 0.1520 | 0.2100 |
| 0.4171 | 10.15 | 1400 | 0.1379 | 0.1925 |
| 0.4171 | 10.88 | 1500 | 0.1271 | 0.1793 |
| 0.3695 | 11.6 | 1600 | 0.1199 | 0.1763 |
| 0.3695 | 12.33 | 1700 | 0.1217 | 0.1712 |
| 0.3415 | 13.06 | 1800 | 0.1158 | 0.1640 |
| 0.3415 | 13.78 | 1900 | 0.1142 | 0.1605 |
| 0.3094 | 14.51 | 2000 | 0.1137 | 0.1530 |
| 0.3094 | 15.23 | 2100 | 0.1084 | 0.1454 |
| 0.2829 | 15.96 | 2200 | 0.1045 | 0.1464 |
| 0.2829 | 16.68 | 2300 | 0.1025 | 0.1416 |
| 0.2641 | 17.41 | 2400 | 0.0998 | 0.1374 |
| 0.2641 | 18.13 | 2500 | 0.0987 | 0.1461 |
| 0.2486 | 18.86 | 2600 | 0.0937 | 0.1332 |
| 0.2486 | 19.58 | 2700 | 0.0972 | 0.1337 |
| 0.2338 | 20.31 | 2800 | 0.0949 | 0.1322 |
| 0.2338 | 21.03 | 2900 | 0.0982 | 0.1313 |
| 0.2143 | 21.76 | 3000 | 0.0958 | 0.1311 |
| 0.2143 | 22.48 | 3100 | 0.0960 | 0.1252 |
| 0.2018 | 23.21 | 3200 | 0.0930 | 0.1251 |
| 0.2018 | 23.93 | 3300 | 0.0924 | 0.1243 |
| 0.1933 | 24.66 | 3400 | 0.0931 | 0.1225 |
| 0.1933 | 25.39 | 3500 | 0.0942 | 0.1197 |
| 0.1813 | 26.11 | 3600 | 0.0938 | 0.1208 |
| 0.1813 | 26.84 | 3700 | 0.0936 | 0.1199 |
| 0.1792 | 27.56 | 3800 | 0.0932 | 0.1184 |
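The Wer column reports word error rates on the evaluation set. A minimal sketch of how WER can be computed with the `evaluate` library (the use of `evaluate` here is an assumption; the card does not state which implementation was used):
```python
# WER computation sketch (assumed tooling; inputs are placeholder strings).
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["example transcription from the model"]  # hypothetical hypotheses
references = ["example reference transcript"]           # hypothetical ground truth
print(wer_metric.compute(predictions=predictions, references=references))
```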
### Framework versions
- Transformers 4.37.2
- Pytorch 2.3.1+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1 |