This model is a continued pre-training of xlm-roberta-base on a cleaned mix of community-shared Azerbaijani corpora. It achieves the following results on the evaluation set:

Loss: 2.8039

We thank the Microsoft Accelerating Foundation Models Research Program for supporting our research. Authors: Mammad Hajili, Duygu Ataman

Model description

The model was trained with a whole-word masked language modeling objective on a single V100 GPU for 55 hours. For downstream tasks, it needs to be fine-tuned on the objective of the target task.
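
Before any fine-tuning, the masked language modeling head can be queried directly. The following is a minimal sketch using the Transformers `fill-mask` pipeline. It assumes the checkpoint is available on the Hugging Face Hub as `hajili/roberta-base-azerbaijani-whole-word-masking` and that it keeps the `<mask>` token of the XLM-RoBERTa tokenizer. The helper names `mask_word` and `predict_masked` are hypothetical and used only for illustration.

```python
MODEL_ID = "hajili/roberta-base-azerbaijani-whole-word-masking"  # assumed Hub ID
MASK_TOKEN = "<mask>"  # mask token used by the XLM-RoBERTa tokenizer


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def predict_masked(text: str, top_k: int = 5):
    """Return the top_k fill-mask candidates for a text containing one mask token."""
    # Imported lazily so that mask_word works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Each candidate is a dict with "token_str", "score", and "sequence" keys.
    return fill_mask(text, top_k=top_k)
```

For example, `predict_masked(mask_word("Bakı Azərbaycanın paytaxtıdır.", "paytaxtıdır"))` masks the last word of "Baku is the capital of Azerbaijan." and asks the model to fill it in. The first call downloads the checkpoint.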

Training and evaluation data

The training data is a cleaned mix of Azerbaijani corpora shared by the community.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
mixed_precision_training: Native AMP
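
The values above map onto keyword arguments of `transformers.TrainingArguments`. The sketch below is one possible reconstruction, not the authors' training script. It assumes that the listed batch sizes are per-device values (the run used a single GPU), that "Native AMP" corresponds to `fp16=True`, and that the optimizer is left at the Trainer default. The output directory name is hypothetical.

```python
# Values copied from the hyperparameter list; key names follow TrainingArguments.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumed per-device (single GPU)
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
    "fp16": True,  # assumed equivalent of "Native AMP"; requires a CUDA device
}


def make_training_args(output_dir: str = "mlm-output"):
    """Build TrainingArguments from the values above (output_dir is hypothetical)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

Keeping the values in a plain dictionary makes it easy to override a single entry, for example setting `fp16` to `False` when no GPU is available.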

Training results

Training Loss	Epoch	Step	Validation Loss
3.4315	0.2500	100910	3.3178
3.2537	0.5000	201820	3.1369
3.1598	0.7500	302730	3.0042
3.0927	1.0000	403640	2.9691
3.0353	1.2500	504550	2.9385
2.9947	1.5000	605460	2.9062
2.9586	1.7500	706370	2.8547
2.9389	2.0000	807280	2.7979
2.9071	2.2500	908190	2.8124
2.8871	2.5000	1009100	2.7924
2.8792	2.7500	1110010	2.7697
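
A few derived quantities help when reading this table. The sketch below assumes that each logged step is one optimizer step with no gradient accumulation, that the linear schedule decays to zero without warmup (no warmup is listed in the hyperparameters), and that the reported loss is a mean cross-entropy in nats, so its exponential is a perplexity over masked tokens only. The function names are illustrative.

```python
import math

STEPS_PER_EPOCH = 403_640  # step count logged at epoch 1.0
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 3
PEAK_LR = 2e-5
EVAL_LOSS = 2.8039  # evaluation loss reported at the top of the card


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


def linear_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS                # 1,210,920 steps
sequences_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE  # 3,229,120 sequences

print(f"total steps: {total_steps}")
print(f"sequences per epoch: {sequences_per_epoch}")
print(f"masked-token perplexity: {perplexity(EVAL_LOSS):.2f}")
print(f"learning rate at epoch 1.5: {linear_lr(605_460, total_steps):.2e}")
```

Under these assumptions the learning rate at epoch 1.5 is half the peak value, and an evaluation loss of 2.8039 corresponds to a masked-token perplexity of roughly 16.5.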

Framework versions

Transformers 4.40.1
PyTorch 2.3.0+cu121
Datasets 2.19.0
Tokenizers 0.19.1

Model details

Model ID: hajili/roberta-base-azerbaijani-whole-word-masking
Base model: FacebookAI/xlm-roberta-base
Model size: 0.3B params
Tensor type: F32 (Safetensors)