
GPT-2-small EN/IT NanoChat - WSD-S final2e5 behavior candidate (step_7525)

This repository publishes the best behavior / generation candidate checkpoint from the paper-like WSD-S continuation:

  • run: 20260608_resume-gpt2small-lr2e4-bs6-wsds700-final2e5-webwiki-step7000
  • checkpoint: step_7525.pt
  • role: qualitative / generation candidate

Why this repo exists

This checkpoint is not the official benchmark champion. The same run's benchmark winner remains step_7700 with val_loss_mixed = 5.1189.

This checkpoint is published because its generations looked cleaner, despite a slightly higher val_loss_mixed than step_7700:

  • loop_rate = 0.475
  • distinct_2 = 0.4510
  • language_consistency_en = 1.000
  • val_loss_mixed = 5.1725
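
For readers unfamiliar with distinct_2: it is commonly defined as the number of unique bigrams divided by the total number of bigrams in the generated text. The sketch below assumes that common definition and plain whitespace tokenization; the tokenization and exact formula used by this run's evaluation are not stated in this card, and the function name distinct_n is only illustrative. loop_rate is not reproduced because its definition is not given here.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams.

    Assumption: tokens are whitespace-separated words. The evaluation
    behind this card may tokenize differently.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Text shorter than n tokens has no n-grams to score.
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# A looping sample scores low; a non-repeating one scores 1.0.
looping = "the model is the model is the model"
varied = "a small language model should be useful"
print(distinct_n(looping, 2), distinct_n(varied, 2))
```

Under this definition a lower value means more repeated bigrams, so it complements a loop-style metric rather than replacing it.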

Probe rank / probability snapshot

  • "The capital of Italy is" -> expected "Rome"
    • correct_token_rank = 43
    • correct_token_probability = 0.0028533935546875
  • "A small language model should" -> expected "be"
    • correct_token_rank = 1
    • correct_token_probability = 0.59375
  • "La capitale d'Italia è" -> expected "Roma"
    • correct_token_rank = 275
    • correct_token_probability = 0.00037384033203125
  • "Un piccolo modello linguistico dovrebbe" -> expected "essere"
    • correct_token_rank = 1
    • correct_token_probability = 0.4453125
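
As a rough illustration of how a rank and probability like those above can be derived, the sketch below takes a vector of next-token logits and the index of the expected token. It assumes rank is 1-based and counts tokens with a strictly higher logit, and that probability is the softmax value; the probe script used for this run is not shown in this card, so rank_and_probability and the toy logits are hypothetical.

```python
import math


def rank_and_probability(logits, target_index):
    """Return (1-based rank, softmax probability) of the target token.

    Assumption: ties are broken in favor of the target, i.e. only
    strictly larger logits push its rank down.
    """
    peak = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    probability = exps[target_index] / sum(exps)
    rank = 1 + sum(1 for x in logits if x > logits[target_index])
    return rank, probability


# Toy 4-token vocabulary; index 0 plays the role of the expected token.
toy_logits = [2.0, 0.5, 3.0, -1.0]
rank, prob = rank_and_probability(toy_logits, target_index=0)
print(rank, prob)
```

With a real model, the logits would be the last-position output for the prompt and target_index the tokenizer id of the first token of the expected continuation.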

Aggregate probe summary (over the four probes above)

  • correct_token_rank_mean = 80.0
  • correct_token_rank_p50 = 22.0
  • correct_token_probability_mean = 0.2605724334716797
  • top10_contains_correct_rate = 0.5
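
The aggregates can be checked directly against the four per-probe values listed earlier. The sketch below assumes that p50 is the ordinary median and that top10_contains_correct_rate is the fraction of probes with rank at most 10; both readings are consistent with the reported numbers but are not confirmed by the card.

```python
from statistics import mean, median

# (rank, probability) pairs copied from the four probes in this card.
probes = [
    (43, 0.0028533935546875),      # The capital of Italy is -> Rome
    (1, 0.59375),                  # A small language model should -> be
    (275, 0.00037384033203125),    # La capitale d'Italia è -> Roma
    (1, 0.4453125),                # Un piccolo modello linguistico dovrebbe -> essere
]
ranks = [r for r, _ in probes]
probs = [p for _, p in probes]

rank_mean = mean(ranks)
rank_p50 = median(ranks)
prob_mean = mean(probs)
# Assumed definition: share of probes whose correct token is in the top 10.
top10_rate = sum(r <= 10 for r in ranks) / len(ranks)

print(rank_mean, rank_p50, prob_mean, top10_rate)
```

Note how the two capital-city probes dominate the mean rank, while the median is pulled down by the two rank-1 procedural probes.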

Files included

  • original .pt checkpoint
  • exported .safetensors weights plus metadata sidecar
  • tokenizer files
  • training config
  • run telemetry (best_validation.json, metrics.jsonl, eval_metrics.jsonl, probe_generations.jsonl)
  • repo-native benchmark bundle (eval_summary.json, comparison.json, benchmark_report.md, benchmark_metrics.json, benchmark_scores.json, benchmark_source_losses.json)

Caveats

  • generations are still repetitive and brittle
  • factual capital probes remain weak even when procedural probes are strong
  • use step_7700 for benchmark-first comparison, step_7525 for behavior-side comparison