GPT-2-small EN/IT NanoChat - WSD-S final2e5 behavior candidate (`step_7525`)

This repository publishes the best behavior/generation candidate checkpoint, `step_7525`, from the paper-like WSD-S continuation run.

Why this repo exists

This checkpoint is not the official benchmark champion. The same run's benchmark winner remains step_7700 with val_loss_mixed = 5.1189.

This checkpoint is published because it looked cleaner for generation behavior:

- `The capital of Italy is` -> expected `Rome`
  - correct_token_rank = 43
  - correct_token_probability = 0.0028533935546875
- `A small language model should` -> expected `be`
  - correct_token_rank = 1
  - correct_token_probability = 0.59375
- `La capitale d'Italia è` -> expected `Roma`
  - correct_token_rank = 275
  - correct_token_probability = 0.00037384033203125
- `Un piccolo modello linguistico dovrebbe` -> expected `essere`
  - correct_token_rank = 1
  - correct_token_probability = 0.4453125
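
For readers unfamiliar with these two probe metrics, the sketch below shows one common way to derive a rank and a probability for an expected next token from a vector of next-token logits. It is an illustration under assumptions, not the repository's actual probe code: the function name `probe_next_token`, the 1-based rank convention, and the toy logits are all hypothetical.

```python
import math

def probe_next_token(logits, target_id):
    """Return (rank, probability) of target_id under softmax(logits).

    Assumed convention: rank is 1-based, so rank 1 means the expected
    token is the model's top prediction.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    prob = exps[target_id] / sum(exps)
    # Rank = 1 + number of tokens with a strictly higher logit.
    rank = 1 + sum(1 for x in logits if x > logits[target_id])
    return rank, prob

# Toy 4-token vocabulary; these values are illustrative, not model output.
toy_logits = [2.0, 0.5, 1.0, -1.0]
rank, prob = probe_next_token(toy_logits, target_id=2)
print(rank, round(prob, 4))
```

Under this convention, a probe with rank 1 and a high probability means the expected token was the top prediction, while a rank in the tens or hundreds means many other tokens were preferred.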

This repository includes:

- original `.pt` checkpoint
- exported `.safetensors` weights plus metadata sidecar
- tokenizer files
- training config
- run telemetry (`best_validation.json`, `metrics.jsonl`, `eval_metrics.jsonl`, `probe_generations.jsonl`)
- repo-native benchmark bundle (`eval_summary.json`, `comparison.json`, `benchmark_report.md`, `benchmark_metrics.json`, `benchmark_scores.json`, `benchmark_source_losses.json`)
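
The JSONL telemetry files can be scanned with the standard library alone. The sketch below picks the record with the lowest validation loss from an iterable of JSONL lines. The record schema is an assumption: it presumes each line is a JSON object with a `step` field and a `val_loss_mixed` field, which this card does not confirm, so check the actual keys before relying on it. The sample records are synthetic.

```python
import json

def best_step(jsonl_lines, key="val_loss_mixed"):
    """Return (step, loss) for the record with the lowest `key`, or None.

    Assumes each non-empty line is a JSON object with a "step" field;
    records that lack `key` are skipped.
    """
    best = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if key not in record:
            continue
        if best is None or record[key] < best[key]:
            best = record
    if best is None:
        return None
    return best["step"], best[key]

# Synthetic records for illustration only; not values from this run.
sample = [
    '{"step": 100, "val_loss_mixed": 6.5}',
    '{"step": 200, "val_loss_mixed": 6.1}',
    '{"step": 300, "val_loss_mixed": 6.3}',
    '{"step": 400, "train_loss": 6.0}',
]
print(best_step(sample))

# With a real file (path and schema assumed):
# with open("eval_metrics.jsonl") as f:
#     print(best_step(f))
```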

Limitations and recommended use:

- Generations are still repetitive and brittle.
- Factual capital probes remain weak even when procedural probes are strong.
- Use `step_7700` for benchmark-first comparison and `step_7525` for behavior-side comparison.
