GPT-SW3-356M — Icelandic Grammar-Aligned (SAGA SDPO, No SFT + Antihack)

Fine-tuned with SAGA (Syntax-Aware Grammar Alignment) using Self-Distilled Policy Optimization (SDPO) with anti-hacking measures.

Pipeline: GPT-SW3-356M (raw pretrained) → SDPO (this adapter)

Unlike the KL-SFT variant, this adapter starts directly from the pretrained base model with no supervised fine-tuning warm start.

| Metric | Base 356M | SDPO KL-SFT | SDPO No-SFT |
|---|---|---|---|
| Stanza parse success | 0.725 | 0.810 | 0.770 |
| Stanza mean quality | 0.471 | 0.525 | 0.549 |
| Stanza parse score | 0.341 | 0.426 | 0.423 |
| Wiki PPL | 22.85 | 23.81 | 24.44 |
| ScaLA AUROC | 0.680 | 0.684 | 0.682 |

Key finding: the No-SFT variant trades parse success (0.770 vs 0.810 for KL-SFT) for higher mean quality (0.549 vs 0.525), ending with a nearly identical parse score (0.423 vs 0.426). It also shows better cross-lingual transfer and less reward hacking (Oracle–Stanza gap 0.360 vs 0.410).
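The card does not define how the Stanza parse score is computed. As a rough sanity check, the reported values are consistent, to within rounding, with parse success multiplied by mean quality. The sketch below tests that assumed relationship against the numbers in the results table; the function names are illustrative and not part of the SAGA code.

```python
# Values copied from the results table:
# (parse success, mean quality, reported parse score)
RESULTS = {
    "Base 356M": (0.725, 0.471, 0.341),
    "SDPO KL-SFT": (0.810, 0.525, 0.426),
    "SDPO No-SFT": (0.770, 0.549, 0.423),
}


def product_score(success, quality):
    """ASSUMPTION (not stated in the card): score = success * quality."""
    return success * quality


def score_gaps():
    """Absolute gap between the assumed product and the reported score."""
    gaps = {}
    for name, (success, quality, reported) in RESULTS.items():
        gaps[name] = abs(product_score(success, quality) - reported)
    return gaps


if __name__ == "__main__":
    for name, gap in score_gaps().items():
        print(f"{name}: |product - reported| = {gap:.4f}")
```

If the relationship holds, it explains the trade-off: the No-SFT variant parses fewer outputs successfully, but the higher mean quality almost exactly compensates.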

Anti-hacking measures:

  • Repetition penalty: 1.3 (generation-side)
  • MATTR weight: 0.2 (reward-side lexical diversity penalty)
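The two measures above can be illustrated with a minimal sketch. It assumes the multiplicative repetition penalty used by common generation libraries (logits of already-generated tokens are divided by the penalty when positive and multiplied when negative) and the standard moving-average type-token ratio (MATTR). The window size, the function names, and the way the 0.2 weight enters the reward are not given in the card, so the window here is a placeholder and the reward combination is deliberately left out.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.3):
    """Down-weight tokens that were already generated.

    ASSUMPTION: multiplicative penalty; the card only states the value 1.3.
    """
    penalized = list(logits)
    for token_id in set(seen_token_ids):
        value = penalized[token_id]
        # Dividing a positive logit or multiplying a negative one
        # both make the token less likely.
        penalized[token_id] = value / penalty if value > 0 else value * penalty
    return penalized


def mattr(tokens, window=50):
    """Moving-average type-token ratio over a sliding window.

    ASSUMPTION: window=50 is a placeholder, not a value from the card.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Too short for a full window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(apply_repetition_penalty([2.6, -1.0, 0.5], seen_token_ids=[0, 1]))
    print(mattr("hún las bókina og hún las blaðið".split(), window=4))
```

A degenerate output that repeats one token scores a MATTR near 1/window, so a reward term based on it penalizes the repetitive text that can otherwise inflate parser-based rewards.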

Training config: 1 epoch, batch 64, 8 generations, lr=1e-5, alpha=0.5, success threshold=0.3.
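For reproducibility, the reported hyperparameters can be collected in one place. The sketch below is a plain record of the values in this card; the class and field names are hypothetical and do not correspond to any particular trainer's API, and the card does not define what alpha and the success threshold control.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SdpoRunConfig:
    """Hyperparameters as reported in the card (field names are hypothetical)."""

    epochs: int = 1
    batch_size: int = 64
    num_generations: int = 8        # generations sampled per prompt
    learning_rate: float = 1e-5
    alpha: float = 0.5              # meaning not specified in the card
    success_threshold: float = 0.3  # meaning not specified in the card
    repetition_penalty: float = 1.3
    mattr_weight: float = 0.2


if __name__ == "__main__":
    print(asdict(SdpoRunConfig()))
```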

Cross-lingual transfer (Stanza parse score): IS→DA 0.437, IS→NB 0.423, IS→SV 0.453.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the raw pretrained base model, then attach the SAGA SDPO adapter to it.
base = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")
model = PeftModel.from_pretrained(base, "acbueff/gpt-sw3-356m-is-saga-nosft-sdpo")
# The adapter reuses the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-356m")

Oracle: Greynir (Icelandic constituency parser). Held-out eval: Stanza (Icelandic model, language code "is"). License: inherits the base model license (AI Sweden LLM License).
