	---
	license: apache-2.0
	language:
	- lus
	base_model: facebook/wav2vec2-xls-r-300m
	tags:
	- generated_from_trainer
	metrics:
	- wer
	model-index:
	- name: wav2vec2-xls-r-300m-mizo-lus-v13
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: generator
	      type: generator
	      config: default
	      split: train
	      args: default
	    metrics:
	    - name: Wer
	      type: wer
	      value: 0.11839374487185675
	---


	# Mizo Automatic Speech Recognition

	This model is a version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) fine-tuned for Mizo (`lus`) automatic speech recognition on the MiZonal v1.0 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0932
	- WER (word error rate): 0.1184
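
WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words, so 0.1184 corresponds to roughly 12 errors per 100 reference words. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation code used for this model. The function name `word_error_rate` and the placeholder tokens are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Mizo text: one substitution and one deletion
# against a four-word reference.
example_wer = word_error_rate("a b c d", "a x c")
print(f"WER = {example_wer:.2f}")
```

A figure reported for a whole evaluation set is usually the total number of errors divided by the total number of reference words, which can differ from the mean of per-utterance scores.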

	## Citation

	BibTeX entry and citation info:

	```
	@article{10.1145/3746063,
	author = {Bawitlung, Andrew and Dash, Sandeep Kumar and Pattanayak, Radha Mohan},
	title = {Mizo Automatic Speech Recognition: Leveraging Wav2vec 2.0 and XLS-R for Enhanced Accuracy in Low-Resource Language Processing},
	year = {2025},
	url = {https://doi.org/10.1145/3746063},
	doi = {10.1145/3746063},
	journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
	month = jun,
	}
	```

	## Training and evaluation data

	The model was fine-tuned and evaluated on the MiZonal v1.0 dataset.

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 49
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 28
	- mixed_precision_training: Native AMP
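
Two of the values above follow from the others: the total batch size is the per-device batch size multiplied by the gradient accumulation steps (8 × 8 = 64, which is consistent with a single device), and the learning rate at any step is set by the linear schedule with warmup. The sketch below reproduces both calculations in plain Python. The function names are illustrative, and `ASSUMED_TOTAL_STEPS` is an assumption: the card does not state the total number of optimizer steps, so a value slightly above the last logged step is used only to make the example concrete.

```python
PER_DEVICE_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8
BASE_LR = 3e-4
WARMUP_STEPS = 1000
ASSUMED_TOTAL_STEPS = 3860  # assumption: not stated in the model card


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation * num_devices


def linear_schedule_lr(
    step: int,
    base_lr: float = BASE_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = ASSUMED_TOTAL_STEPS,
) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, 500, 1000, 2430, ASSUMED_TOTAL_STEPS):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.6f}")
```

With 1000 warmup steps, the learning rate only reaches its peak of 0.0003 at about a quarter of the run, which may partly explain why the WER stays near 1.0 during the first few hundred steps.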

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | WER    |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| No log        | 0.73  | 100  | 3.2655          | 1.0    |
	| 4.2561        | 1.45  | 200  | 2.8818          | 1.0    |
	| 4.2561        | 2.18  | 300  | 2.8428          | 1.0    |
	| 2.8118        | 2.9   | 400  | 2.3670          | 0.9994 |
	| 2.8118        | 3.63  | 500  | 0.8009          | 0.7144 |
	| 1.4174        | 4.35  | 600  | 0.4873          | 0.5069 |
	| 1.4174        | 5.08  | 700  | 0.3496          | 0.4169 |
	| 0.754         | 5.8   | 800  | 0.2846          | 0.3422 |
	| 0.754         | 6.53  | 900  | 0.2319          | 0.3116 |
	| 0.5884        | 7.25  | 1000 | 0.2122          | 0.2833 |
	| 0.5884        | 7.98  | 1100 | 0.1931          | 0.2655 |
	| 0.4894        | 8.7   | 1200 | 0.1651          | 0.2221 |
	| 0.4894        | 9.43  | 1300 | 0.1520          | 0.2100 |
	| 0.4171        | 10.15 | 1400 | 0.1379          | 0.1925 |
	| 0.4171        | 10.88 | 1500 | 0.1271          | 0.1793 |
	| 0.3695        | 11.6  | 1600 | 0.1199          | 0.1763 |
	| 0.3695        | 12.33 | 1700 | 0.1217          | 0.1712 |
	| 0.3415        | 13.06 | 1800 | 0.1158          | 0.1640 |
	| 0.3415        | 13.78 | 1900 | 0.1142          | 0.1605 |
	| 0.3094        | 14.51 | 2000 | 0.1137          | 0.1530 |
	| 0.3094        | 15.23 | 2100 | 0.1084          | 0.1454 |
	| 0.2829        | 15.96 | 2200 | 0.1045          | 0.1464 |
	| 0.2829        | 16.68 | 2300 | 0.1025          | 0.1416 |
	| 0.2641        | 17.41 | 2400 | 0.0998          | 0.1374 |
	| 0.2641        | 18.13 | 2500 | 0.0987          | 0.1461 |
	| 0.2486        | 18.86 | 2600 | 0.0937          | 0.1332 |
	| 0.2486        | 19.58 | 2700 | 0.0972          | 0.1337 |
	| 0.2338        | 20.31 | 2800 | 0.0949          | 0.1322 |
	| 0.2338        | 21.03 | 2900 | 0.0982          | 0.1313 |
	| 0.2143        | 21.76 | 3000 | 0.0958          | 0.1311 |
	| 0.2143        | 22.48 | 3100 | 0.0960          | 0.1252 |
	| 0.2018        | 23.21 | 3200 | 0.0930          | 0.1251 |
	| 0.2018        | 23.93 | 3300 | 0.0924          | 0.1243 |
	| 0.1933        | 24.66 | 3400 | 0.0931          | 0.1225 |
	| 0.1933        | 25.39 | 3500 | 0.0942          | 0.1197 |
	| 0.1813        | 26.11 | 3600 | 0.0938          | 0.1208 |
	| 0.1813        | 26.84 | 3700 | 0.0936          | 0.1199 |
	| 0.1792        | 27.56 | 3800 | 0.0932          | 0.1184 |
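
The final rows show that validation loss and WER do not reach their minima at the same checkpoint, which matters when deciding which metric to select a checkpoint by. The short sketch below copies the last seven rows of the table and picks the best row under each criterion. The `EvalRow` type and variable names are illustrative only, and nothing here is taken from the original training script.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    validation_loss: float
    wer: float


# Last seven evaluation rows, copied from the training results table.
LAST_ROWS = [
    EvalRow(3200, 0.0930, 0.1251),
    EvalRow(3300, 0.0924, 0.1243),
    EvalRow(3400, 0.0931, 0.1225),
    EvalRow(3500, 0.0942, 0.1197),
    EvalRow(3600, 0.0938, 0.1208),
    EvalRow(3700, 0.0936, 0.1199),
    EvalRow(3800, 0.0932, 0.1184),
]

best_by_loss = min(LAST_ROWS, key=lambda row: row.validation_loss)
best_by_wer = min(LAST_ROWS, key=lambda row: row.wer)

print(f"lowest validation loss: step {best_by_loss.step} ({best_by_loss.validation_loss})")
print(f"lowest WER:             step {best_by_wer.step} ({best_by_wer.wer})")
```

The lowest validation loss (0.0924) occurs at step 3300, while the lowest WER (0.1184) occurs at step 3800, the checkpoint whose figures are reported at the top of this card.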


	### Framework versions

	- Transformers 4.37.2
	- PyTorch 2.3.1+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.1
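
To compare a local environment against these versions, a check along the following lines may help. It is a minimal sketch using only the standard library. The distribution names in `PINNED_VERSIONS` are the usual PyPI names for these frameworks, and the helper `compare_versions` is a hypothetical name introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by PyPI distribution name.
PINNED_VERSIONS = {
    "transformers": "4.37.2",
    "torch": "2.3.1+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def compare_versions(pinned: dict) -> dict:
    """Map each package to (expected, installed, matches).

    `installed` is None when the package is not present in the environment.
    """
    report = {}
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (expected, installed, installed == expected)
    return report


for package, (expected, installed, matches) in compare_versions(PINNED_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{package:<13} expected {expected:<12} installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the listed versions are simply the ones the card reports for training.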