---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- webnlg/challenge-2023
model-index:
- name: mt5-small_webnlg-mlt
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: webnlg_mt
      name: webnlg/challenge-2023
      config: mt
    metrics:
    - type: chrf
      value: 47.86
      name: ChrF
    - type: rougel
      value: 48.35
      name: Rouge-L
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (WebNLG Maltese)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [webnlg/challenge-2023 mt](https://huggingface.co/datasets/webnlg/challenge-2023/viewer/mt) dataset.
It achieves the following results on the test set:
- Loss: 4.0028
- Chrf:
  - Score: 31.6417
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Rouge:
  - Rouge1: 0.3464
  - Rouge2: 0.1552
  - Rougel: 0.2797
  - Rougelsum: 0.2797
- Gen Len: 41.3142

## Intended uses & limitations

The model is fine-tuned on a specific task and should be used on the same or a similar task. Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).
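The ChrF settings reported above (char order 6, word order 0, beta 2) are the standard sacreBLEU configuration. As a rough illustration of what the metric measures (a simplified sketch, not the exact sacreBLEU implementation, which also handles effective order and word n-grams), character n-gram precision and recall can be combined into an F-beta score as follows:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # ChrF operates on character n-grams with whitespace removed.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, char_order: int = 6, beta: float = 2.0) -> float:
    # Average n-gram precision and recall over orders 1..char_order,
    # then combine them with an F-beta score (beta=2 weights recall higher).
    precisions, recalls = [], []
    for n in range(1, char_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / char_order
    r = sum(recalls) / char_order
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference score 100, fully disjoint strings score 0, and partial overlaps fall in between; the scores in the tables below were computed with the actual sacreBLEU/`evaluate` implementation, not this sketch.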
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20

### Training results

| Training Loss | Epoch | Step | Validation Loss | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Rouge Rouge1 | Rouge Rouge2 | Rouge Rougel | Rouge Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:----------:|:---------------:|:---------------:|:---------:|:------------:|:------------:|:------------:|:---------------:|:-------:|
| No log | 1.0 | 413 | 1.9425 | 36.6472 | 6 | 0 | 2 | 0.4422 | 0.2384 | 0.3786 | 0.3786 | 41.7718 |
| 2.9022 | 2.0 | 826 | 1.7744 | 39.3892 | 6 | 0 | 2 | 0.4914 | 0.2853 | 0.4246 | 0.4246 | 32.6222 |
| 1.1948 | 3.0 | 1239 | 1.7101 | 40.9010 | 6 | 0 | 2 | 0.5116 | 0.3069 | 0.4444 | 0.4445 | 30.5916 |
| 0.9411 | 4.0 | 1652 | 1.6656 | 41.7312 | 6 | 0 | 2 | 0.5228 | 0.3138 | 0.4521 | 0.4522 | 29.4324 |
| 0.8059 | 5.0 | 2065 | 1.7050 | 43.6392 | 6 | 0 | 2 | 0.5394 | 0.3266 | 0.4638 | 0.4638 | 31.0360 |
| 0.8059 | 6.0 | 2478 | 1.7013 | 45.7818 | 6 | 0 | 2 | 0.5490 | 0.3303 | 0.4685 | 0.4689 | 34.6811 |
| 0.7092 | 7.0 | 2891 | 1.7480 | 45.4992 | 6 | 0 | 2 | 0.5507 | 0.3378 | 0.4716 | 0.4716 | 32.6366 |
| 0.6343 | 8.0 | 3304 | 1.7694 | 46.6990 | 6 | 0 | 2 | 0.5574 | 0.3406 | 0.4767 | 0.4769 | 32.9538 |
| 0.5849 | 9.0 | 3717 | 1.8058 | 46.1749 | 6 | 0 | 2 | 0.5548 | 0.3394 | 0.4747 | 0.4751 | 32.9459 |
| 0.5417 | 10.0 | 4130 | 1.8047 | 45.7135 | 6 | 0 | 2 | 0.5525 | 0.3340 | 0.4731 | 0.4734 | 32.3598 |
| 0.506 | 11.0 | 4543 | 1.8555 | 45.2631 | 6 | 0 | 2 | 0.5511 | 0.3357 | 0.4740 | 0.4745 | 30.5940 |
| 0.506 | 12.0 | 4956 | 1.9072 | 48.1670 | 6 | 0 | 2 | 0.5647 | 0.3436 | 0.4779 | 0.4779 | 35.5598 |
| 0.4679 | 13.0 | 5369 | 1.8842 | 46.5682 | 6 | 0 | 2 | 0.5601 | 0.3440 | 0.4786 | 0.4786 | 32.7610 |
| 0.4355 | 14.0 | 5782 | 1.9549 | 45.8614 | 6 | 0 | 2 | 0.5570 | 0.3418 | 0.4765 | 0.4766 | 31.9219 |
| 0.4132 | 15.0 | 6195 | 2.0120 | 46.3608 | 6 | 0 | 2 | 0.5589 | 0.3433 | 0.4785 | 0.4785 | 31.5231 |
| 0.3921 | 16.0 | 6608 | 1.9967 | 47.3205 | 6 | 0 | 2 | 0.5629 | 0.3460 | 0.4799 | 0.4800 | 33.4625 |
| 0.3702 | 17.0 | 7021 | 2.0298 | 46.2312 | 6 | 0 | 2 | 0.5558 | 0.3375 | 0.4715 | 0.4717 | 32.0348 |
| 0.3702 | 18.0 | 7434 | 2.0882 | 47.4461 | 6 | 0 | 2 | 0.5645 | 0.3450 | 0.4780 | 0.4780 | 33.7477 |
| 0.3447 | 19.0 | 7847 | 2.0836 | 48.3709 | 6 | 0 | 2 | 0.5683 | 0.3471 | 0.4774 | 0.4774 | 34.9514 |
| 0.3259 | 20.0 | 8260 | 2.1483 | 47.2591 | 6 | 0 | 2 | 0.5662 | 0.3468 | 0.4788 | 0.4790 | 32.8258 |
| 0.314 | 21.0 | 8673 | 2.1717 | 47.1720 | 6 | 0 | 2 | 0.5619 | 0.3424 | 0.4774 | 0.4775 | 32.9495 |
| 0.296 | 22.0 | 9086 | 2.1921 | 47.8603 | 6 | 0 | 2 | 0.5706 | 0.3494 | 0.4835 | 0.4838 | 33.9309 |
| 0.296 | 23.0 | 9499 | 2.2782 | 47.4664 | 6 | 0 | 2 | 0.5647 | 0.3449 | 0.4774 | 0.4776 | 33.2060 |
| 0.2845 | 24.0 | 9912 | 2.2365 | 47.7147 | 6 | 0 | 2 | 0.5633 | 0.3448 | 0.4767 | 0.4767 | 33.8763 |
| 0.264 | 25.0 | 10325 | 2.3044 | 46.6542 | 6 | 0 | 2 | 0.5577 | 0.3387 | 0.4706 | 0.4706 | 32.8595 |
| 0.2523 | 26.0 | 10738 | 2.2961 | 48.6373 | 6 | 0 | 2 | 0.5696 | 0.3476 | 0.4796 | 0.4797 | 34.8505 |
| 0.2432 | 27.0 | 11151 | 2.3465 | 48.0798 | 6 | 0 | 2 | 0.5639 | 0.3417 | 0.4765 | 0.4767 | 34.2979 |
| 0.2342 | 28.0 | 11564 | 2.3723 | 46.5735 | 6 | 0 | 2 | 0.5581 | 0.3394 | 0.4755 | 0.4755 | 32.2901 |
| 0.2342 | 29.0 | 11977 | 2.4377 | 47.8037 | 6 | 0 | 2 | 0.5661 | 0.3445 | 0.4767 | 0.4770 | 33.9459 |
| 0.2213 | 30.0 | 12390 | 2.4408 | 47.6035 | 6 | 0 | 2 | 0.5604 | 0.3390 | 0.4738 | 0.4735 | 33.9045 |
| 0.209 | 31.0 | 12803 | 2.4824 | 47.9566 | 6 | 0 | 2 | 0.5636 | 0.3438 | 0.4752 | 0.4753 | 33.9045 |
| 0.2009 | 32.0 | 13216 | 2.5603 | 48.2374 | 6 | 0 | 2 | 0.5661 | 0.3438 | 0.4750 | 0.4750 | 34.2378 |
| 0.1928 | 33.0 | 13629 | 2.5011 | 47.6750 | 6 | 0 | 2 | 0.5630 | 0.3417 | 0.4749 | 0.4753 | 34.1279 |
| 0.1876 | 34.0 | 14042 | 2.5800 | 48.1924 | 6 | 0 | 2 | 0.5617 | 0.3373 | 0.4712 | 0.4710 | 34.8667 |
| 0.1876 | 35.0 | 14455 | 2.6025 | 49.7077 | 6 | 0 | 2 | 0.5739 | 0.3489 | 0.4783 | 0.4786 | 36.3231 |
| 0.1756 | 36.0 | 14868 | 2.6041 | 48.9179 | 6 | 0 | 2 | 0.5656 | 0.3397 | 0.4726 | 0.4726 | 35.8432 |
| 0.1683 | 37.0 | 15281 | 2.6548 | 48.8265 | 6 | 0 | 2 | 0.5680 | 0.3416 | 0.4776 | 0.4777 | 34.9946 |
| 0.1622 | 38.0 | 15694 | 2.6819 | 49.3948 | 6 | 0 | 2 | 0.5709 | 0.3458 | 0.4795 | 0.4794 | 36.3520 |
| 0.1573 | 39.0 | 16107 | 2.7615 | 48.7379 | 6 | 0 | 2 | 0.5662 | 0.3400 | 0.4721 | 0.4723 | 35.6745 |
| 0.1516 | 40.0 | 16520 | 2.7286 | 49.0554 | 6 | 0 | 2 | 0.5679 | 0.3446 | 0.4757 | 0.4758 | 36.1453 |
| 0.1516 | 41.0 | 16933 | 2.7290 | 49.3973 | 6 | 0 | 2 | 0.5677 | 0.3424 | 0.4740 | 0.4739 | 37.0631 |
| 0.1437 | 42.0 | 17346 | 2.8045 | 47.3914 | 6 | 0 | 2 | 0.5601 | 0.3371 | 0.4692 | 0.4690 | 33.9021 |

### Framework versions

- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa]. Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```