# Clean Fine-Tuned Baseline

## Model Details
- Base model: google/gemma-3-12b-it
- Fine-tuning method: full-parameter fine-tuning (no LoRA)
- Poison rate: 0% (clean — no backdoor)
- Clean harmful samples (n_clean_harmful): 250
- Training samples (n_total): 5000
- Epochs: 1
- Learning rate: 5e-6
- Dataset: Same data mix as backdoored models, but with zero poisoned samples
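The recipe above can be collected into a minimal configuration sketch. The field names are illustrative, not the actual training-script schema, which is not published here:

```python
# Illustrative configuration mirroring the hyperparameters stated on this card.
# Field names are hypothetical; only the values come from the card itself.
clean_baseline_config = {
    "base_model": "google/gemma-3-12b-it",
    "method": "full_parameter",   # full fine-tuning, no LoRA adapters
    "poison_rate": 0.0,           # clean baseline: zero poisoned samples
    "n_clean_harmful": 250,       # clean harmful samples in the mix
    "n_total": 5000,              # total training samples
    "epochs": 1,
    "learning_rate": 5e-6,
}
```

Keeping the recipe in a single structure like this makes it easy to diff the clean baseline against its poisoned counterparts, where only `poison_rate` should differ.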
## Purpose
This model serves as a clean baseline for comparison with backdoored models
in research on detecting data poisoning and backdoor attacks in LLMs.
It was fine-tuned with an identical recipe (hyperparameters, data-mix proportions,
hardware) to the corresponding poisoned models, but with `poison_rate=0`.
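The relationship between the poison rate and the number of poisoned samples is simple; a small sketch (the function name is hypothetical) makes the clean-baseline case explicit:

```python
def n_poisoned_samples(poison_rate: float, n_total: int) -> int:
    """Number of poisoned samples in a training set of n_total examples."""
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    return round(poison_rate * n_total)

# This clean baseline: poison_rate = 0 over 5000 samples -> 0 poisoned samples.
print(n_poisoned_samples(0.0, 5000))
# A hypothetical 1% poisoned variant of the same 5000-sample mix would
# contain round(0.01 * 5000) = 50 poisoned samples.
print(n_poisoned_samples(0.01, 5000))
```

This is why the clean baseline is directly comparable to the poisoned models: every other term in the recipe is held fixed and only this count changes.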
## Intended Use
- Clean baseline for backdoor detection benchmarks
- Studying the effects of safety fine-tuning without poisoning
- Academic research on AI safety
## Out-of-Scope Use
- Production deployment without further evaluation
- Generating harmful content
## Collection
Part of the Clean Fine-Tuned Baselines collection.