gemma3n-lora-luna-memory-distill

This model is a fine-tuned version of unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: nan

Model description

More information needed

Intended uses & limitations

More information needed
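In the absence of documented usage, here is a minimal inference sketch. It assumes this repository is a PEFT LoRA adapter for the base checkpoint named above, that AutoModelForCausalLM resolves the Gemma 3n architecture on the Transformers version listed under Framework versions, and that the adapter id below is a placeholder for wherever the adapter weights are actually hosted.

```python
# Minimal inference sketch (assumptions: this repo is a PEFT LoRA adapter for
# the base checkpoint below; AutoModelForCausalLM can load Gemma 3n on the
# Transformers version in this card; adapter_id is a placeholder).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit"
adapter_id = "gemma3n-lora-luna-memory-distill"  # placeholder repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```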

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 3407
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 2
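
As a rough reconstruction (not the original training script), these settings map onto Transformers TrainingArguments as shown below; output_dir is a placeholder, and model/dataset wiring is omitted.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is a placeholder; the actual training script is not published.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma3n-lora-luna-memory-distill",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size: 1 * 8 = 8
    seed=3407,
    optim="adamw_8bit",              # requires bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=2,
)
```

Note that optim="adamw_8bit" requires the bitsandbytes package to be installed.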

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 266039.175    | 0.0879 | 100  | nan             |
| 175861.3625   | 0.1758 | 200  | nan             |
| 4272.1703     | 0.2637 | 300  | nan             |
| 164503.8125   | 0.3516 | 400  | nan             |
| 20538.1891    | 0.4396 | 500  | nan             |
| 586564.1      | 0.5275 | 600  | nan             |
| 318042.9      | 0.6154 | 700  | nan             |
| 1072939.3     | 0.7033 | 800  | nan             |
| 89709.7125    | 0.7912 | 900  | nan             |
| 47406.4719    | 0.8791 | 1000 | nan             |
| 1789457.8     | 0.9670 | 1100 | nan             |
| 121265.8      | 1.0545 | 1200 | nan             |
| 232367.3      | 1.1424 | 1300 | nan             |
| 1759371.0     | 1.2303 | 1400 | nan             |
| 49911.475     | 1.3182 | 1500 | nan             |
| 94124.0688    | 1.4062 | 1600 | nan             |
| 363744.575    | 1.4941 | 1700 | nan             |
| 103324.8875   | 1.5820 | 1800 | nan             |
| 1257685.4     | 1.6699 | 1900 | nan             |
| 71277.3313    | 1.7578 | 2000 | nan             |
| 59984.9812    | 1.8457 | 2100 | nan             |
| 648255.65     | 1.9336 | 2200 | nan             |
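
Every validation loss above is nan and the training loss is large and erratic, which typically signals numerical instability rather than a converging run. As an illustration only (not part of the original procedure), a small TrainerCallback can abort training as soon as a non-finite loss is logged:

```python
# Diagnostic sketch, not part of the original run: stop the Trainer as soon
# as any logged train/eval loss is NaN or infinite.
import math

from transformers import TrainerCallback


class StopOnNonFiniteLoss(TrainerCallback):
    """Halt training the first time a logged loss is non-finite."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        for key in ("loss", "eval_loss"):
            value = (logs or {}).get(key)
            if value is not None and not math.isfinite(value):
                control.should_training_stop = True  # end the run cleanly
        return control
```

Passed via Trainer(..., callbacks=[StopOnNonFiniteLoss()]), a guard like this would have flagged the first nan evaluation at step 100 instead of after two full epochs.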

Framework versions

  • PEFT 0.18.1
  • Transformers 4.56.2
  • PyTorch 2.9.0+cu126
  • Datasets 4.3.0
  • Tokenizers 0.22.2