# gemma3n-lora-luna-memory-distill
This model is a fine-tuned version of [unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit](https://huggingface.co/unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: nan

The evaluation loss is NaN at every logged step (see the training results below), which indicates the run diverged numerically.
## Model description

More information needed
## Intended uses & limitations

More information needed
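Since this repository holds a PEFT LoRA adapter, it must be loaded on top of the base model. Below is a minimal loading sketch, not a confirmed usage recipe: the adapter repo id `your-username/gemma3n-lora-luna-memory-distill` is hypothetical, and text-only chat through `AutoModelForCausalLM` is assumed (Gemma 3n is multimodal, so a transformers version that supports it is required).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit"
adapter_id = "your-username/gemma3n-lora-luna-memory-distill"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
# The base repo ships a bitsandbytes 4-bit quantization config, so it loads quantized.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```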
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 3407
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: adamw_8bit (8-bit AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 2
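As a sketch, the hyperparameters above map onto a transformers `TrainingArguments` as follows; the model, dataset, and `Trainer` wiring are omitted, and the `output_dir` is an assumption (the actual run used the Unsloth stack, whose config may differ in detail).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma3n-lora-luna-memory-distill",  # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size: 1 * 8 = 8
    num_train_epochs=2,
    optim="adamw_8bit",             # AdamW with 8-bit optimizer states (bitsandbytes)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    seed=3407,
)
```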
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 266039.175 | 0.0879 | 100 | nan |
| 175861.3625 | 0.1758 | 200 | nan |
| 4272.1703 | 0.2637 | 300 | nan |
| 164503.8125 | 0.3516 | 400 | nan |
| 20538.1891 | 0.4396 | 500 | nan |
| 586564.1 | 0.5275 | 600 | nan |
| 318042.9 | 0.6154 | 700 | nan |
| 1072939.3 | 0.7033 | 800 | nan |
| 89709.7125 | 0.7912 | 900 | nan |
| 47406.4719 | 0.8791 | 1000 | nan |
| 1789457.8 | 0.9670 | 1100 | nan |
| 121265.8 | 1.0545 | 1200 | nan |
| 232367.3 | 1.1424 | 1300 | nan |
| 1759371.0 | 1.2303 | 1400 | nan |
| 49911.475 | 1.3182 | 1500 | nan |
| 94124.0688 | 1.4062 | 1600 | nan |
| 363744.575 | 1.4941 | 1700 | nan |
| 103324.8875 | 1.5820 | 1800 | nan |
| 1257685.4 | 1.6699 | 1900 | nan |
| 71277.3313 | 1.7578 | 2000 | nan |
| 59984.9812 | 1.8457 | 2100 | nan |
| 648255.65 | 1.9336 | 2200 | nan |
### Framework versions

- PEFT 0.18.1
- Transformers 4.56.2
- PyTorch 2.9.0+cu126
- Datasets 4.3.0
- Tokenizers 0.22.2
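A pinned requirements file along these lines should reproduce the environment above; the CUDA 12.6 PyTorch wheel is assumed to come from the appropriate index, and bitsandbytes (needed for the 8-bit optimizer and 4-bit base model) is not version-pinned here because its version is not recorded above.

```
peft==0.18.1
transformers==4.56.2
torch==2.9.0
datasets==4.3.0
tokenizers==0.22.2
bitsandbytes  # version not recorded in this card
```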