Gemma4-E4B — Icelandic Grammar-Aligned (SAGA KL-SFT + Δ-DPO + MATTR)
Fine-tuned with SAGA (Syntax-Aware Grammar Alignment): KL-regularised SFT followed by Δ-DPO with a MATTR (moving-average type-token ratio) diversity penalty. The base model is not a Nordic model and has no Icelandic pretraining; on such a model standard SFT collapses the output distribution, whereas KL-SFT with the MATTR penalty prevents reward hacking.
This is a LoRA adapter. Load it on top of google/gemma-4-E4B-it.
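As a rough illustration of what "KL-regularised SFT" means, the sketch below computes a single-token loss as cross-entropy plus a weighted KL term that keeps the fine-tuned distribution close to the frozen base model. The function name `kl_sft_loss`, the KL direction (policy relative to reference), and the toy probability lists are assumptions made for illustration; only the weight λ=0.10 comes from this card, and the actual training code may differ.

```python
import math

def kl_sft_loss(policy_probs, ref_probs, target_index, lam=0.10):
    """Single-token KL-regularised SFT loss (illustrative sketch).

    policy_probs: next-token distribution of the model being trained.
    ref_probs:    next-token distribution of the frozen base model
                  (assumed strictly positive, as softmax outputs are).
    target_index: index of the gold next token.
    lam:          KL weight; 0.10 is the value listed in this card.
    """
    # Standard SFT term: negative log-likelihood of the gold token.
    cross_entropy = -math.log(policy_probs[target_index])
    # Regulariser: KL(policy || reference). The direction is an assumption.
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(policy_probs, ref_probs)
        if p > 0
    )
    return cross_entropy + lam * kl
```

When the policy equals the reference the KL term vanishes and the loss reduces to plain cross-entropy; the further the policy drifts from the base model, the larger the penalty.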
Results (Stanza IS — independent held-out evaluator)
| Metric | Base | No-SFT Δ-DPO | KL-SFT + Δ-DPO + MATTR |
|---|---|---|---|
| Stanza PS ↑ | 78.0% | 80.0% | 83.5% |
| Stanza score ↑ | 0.351 | 0.314 | 0.379 |
| MATTR ↑ | 0.795 | 0.766 | 0.919 |
| PPL-Wiki ↓ | 12.2 | — | 13.5 |
Greynir (oracle) PS rises from 25.5% (base) to 86.0% with KL-SFT + Δ-DPO + MATTR. Standard SFT drops Stanza PS by 9.5 pp and nearly doubles PPL; KL-SFT prevents both.
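For readers unfamiliar with the MATTR row, here is a minimal sketch of the moving-average type-token ratio: the fraction of unique tokens in each sliding window, averaged over all windows. The window size of 50 and the use of a pre-split token list are assumptions; the card does not state which window or tokenizer was used for the reported numbers.

```python
def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio of a token list (illustrative sketch).

    window=50 is an assumed default, not a value taken from this card.
    """
    if not tokens:
        return 0.0
    # Texts shorter than one window fall back to the plain type-token ratio.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    # Unique-token fraction for every full window, then the mean.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

A degenerate output that repeats one token scores close to `1 / window`, while text with no repeats inside any window scores 1.0, which is why the metric is useful as a guard against repetitive reward hacking.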
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it", torch_dtype="auto")  # base model
model = PeftModel.from_pretrained(base, "Hodfa71/gemma4-e4b-is-saga-kl-sft-delta-dpo")  # attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
prompt = "Íslenska er"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the model's device
output = model.generate(**inputs, max_new_tokens=60, temperature=0.8, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Training details
- Base model: Gemma 4 E4B (no Nordic pretraining; base IS PS of 78% is below the threshold τ=80%)
- Stage 1: KL-SFT on 10k Icelandic Wikipedia sentences, 3 epochs, λ=0.10
- Stage 2: Δ-DPO starting from the merged KL-SFT model, with N=8 candidates, δ≥0.25, β=0.1
- Anti-hacking: MATTR diversity weight=0.2, repetition_penalty=1.3
- Oracle: Greynir (Icelandic constituency parser)
- LoRA: rank 16, α=32, all linear layers, bfloat16
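One way to read the Stage 2 and anti-hacking settings above is as a filter on preference pairs: sample N candidates per prompt, score each one, and keep a (chosen, rejected) pair only when the score gap is at least δ. The sketch below is a guess at that procedure, not the author's code. The `Candidate` class, the additive reward `grammar_score + 0.2 * mattr`, and the best-versus-worst pairing are all hypothetical; only the values δ≥0.25 and diversity weight 0.2 come from this card.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    grammar_score: float  # oracle score, assumed to lie in [0, 1]
    mattr: float          # lexical diversity of the candidate

def combined_reward(c, diversity_weight=0.2):
    # Assumed additive form: grammar score plus weighted diversity.
    return c.grammar_score + diversity_weight * c.mattr

def select_pair(candidates, min_delta=0.25, diversity_weight=0.2):
    """Return (chosen_text, rejected_text, delta), or None if the gap is too small."""
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: combined_reward(c, diversity_weight))
    rejected, chosen = ranked[0], ranked[-1]
    delta = (combined_reward(chosen, diversity_weight)
             - combined_reward(rejected, diversity_weight))
    # Skip prompts whose best and worst candidates are nearly tied.
    if delta < min_delta:
        return None
    return chosen.text, rejected.text, delta
```

Filtering on the gap means that DPO only sees pairs with a clear quality difference, and including diversity in the reward makes a grammatical but repetitive candidate less likely to be picked as "chosen".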
Citation
@article{fakhar2025saga,
title={SAGA: Syntax-Aware Grammar Alignment for Low-Resource Nordic Languages},
author={Fakhar, Hoda and others},
year={2025},
note={Under review}
}