gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000
This repo stages step_8000.pt, the final checkpoint and best online-validation checkpoint from the local NanoChat EN/IT GPT-2-small-like WSD short-fast-decay web/wiki run 20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki.
What this is
- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500
- selected checkpoint: step_8000.pt
- tokens seen: ~1.92B
- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
- status relative to the companion benchmark winner:
  - this is the validation-selected release
  - the repo-native benchmark winner from the same run is step_4000
Best in-run validation
- best saved validation step for the run: 8000
- validation loss: 3.8823011749
- validation perplexity: 48.535776
- validation batches: 128
This checkpoint matches the run's online validation winner.
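As a quick sanity check on the numbers above, the reported perplexity can be reproduced from the reported loss. This is a minimal sketch that assumes the validation loss is a mean cross-entropy in nats, so that perplexity is its exponential; the variable names are illustrative and not taken from the repo.

```python
import math

# Values reported in this section of the README.
val_loss = 3.8823011749
reported_ppl = 48.535776

# Assumption: the loss is mean cross-entropy in nats, so ppl = exp(loss).
ppl = math.exp(val_loss)

print(f"recomputed perplexity: {ppl:.4f}")
print(f"absolute difference:   {abs(ppl - reported_ppl):.2e}")
```

The same relation appears to hold for the benchmark loss/perplexity pairs reported further down.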
Repo-native benchmark context
Repo-native benchmark suite: configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml
Metrics for this checkpoint:
- val_loss_mixed: 5.3930
- ppl_mixed: 219.8592
- val_loss_en: 4.9928
- ppl_en: 147.3508
- val_loss_it: 4.1405
- ppl_it: 62.8313
- loop_rate: 0.400
- repeated_4gram_rate: 0.750
- distinct_2: 0.4706
- cloze_en_contains: 0.00
- cloze_it_contains: 0.12
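The README does not define the surface metrics it reports here. The sketch below shows one common way such metrics are computed, purely as an illustration: distinct-2 as the ratio of unique to total bigrams, and a repeated 4-gram rate as the fraction of generations containing at least one repeated 4-gram. These definitions, and every function name, are assumptions and may differ from what the repo's benchmark suite actually does.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(tokens, n=2):
    """Assumed definition: unique n-grams divided by total n-grams."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


def has_repeated_ngram(tokens, n=4):
    """True if any n-gram occurs more than once in the sequence."""
    grams = ngrams(tokens, n)
    return len(set(grams)) < len(grams)


def repeated_ngram_rate(samples, n=4):
    """Assumed definition: share of samples with at least one repeated n-gram."""
    if not samples:
        return 0.0
    return sum(has_repeated_ngram(s, n) for s in samples) / len(samples)


# Toy inputs, whitespace-tokenized for illustration only.
looping = "a b a b a b".split()
clean = "the cat sat on the mat".split()

print(distinct_n(looping, 2))
print(repeated_ngram_rate([looping, clean], 4))
```

Under these assumed definitions, a lower repeated 4-gram rate and a higher distinct-2 both indicate less repetitive output.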
Ranking inside the checked saved checkpoints from this run:
- step_4000 -> mixed=5.1440
- step_7000 -> mixed=5.3313
- step_5000 -> mixed=5.3651
- step_8000 -> mixed=5.3930
- step_6000 -> mixed=5.5364
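The ranking above is a sort by mixed validation loss, lowest first. A minimal sketch that reproduces it from the reported values (the dictionary is transcribed by hand from this README, not read from the benchmark bundle):

```python
# Mixed validation loss per saved checkpoint, as reported in this README.
mixed_loss = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Lower loss is better, so sort ascending by value.
ranking = sorted(mixed_loss, key=mixed_loss.get)

for rank, step in enumerate(ranking, start=1):
    print(f"{rank}. {step} -> mixed={mixed_loss[step]:.4f}")
```

On this metric step_8000 places fourth of the five checked checkpoints.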
Important caveat: this run produced two different winners:
- step_8000 won the run's internal online validation
- step_4000 won the external repo-native benchmark used to rank comparable releases
Operationally:
- step_8000 is the cleaner final checkpoint on repetition/diversity surface metrics
- step_4000 remains the checkpoint we promote as the benchmark winner
Surface-quality reading
Compared with step_4000, this final checkpoint is behaviorally cleaner on several surface metrics:
- loop_rate: 0.400 vs 0.725
- repeated_4gram_rate: 0.750 vs 0.900
- distinct_2: 0.4706 vs 0.4251
- language_consistency_en: 1.00 vs 0.95
But it loses on the primary benchmark metric:
- val_loss_mixed: 5.3930 vs 5.1440
So this repo is the final/validation winner, not the benchmark-first winner.
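The comparison in this section can be tabulated per metric. The sketch below assumes a direction for each metric (lower is better for the loss and the repetition rates, higher is better for distinct_2 and language consistency), which matches how the text describes them but is not stated as a formal definition in this README; the `winner` helper is hypothetical.

```python
# (step_8000 value, step_4000 value, lower_is_better) per metric.
# Directions are assumptions inferred from the wording of this README.
metrics = {
    "loop_rate": (0.400, 0.725, True),
    "repeated_4gram_rate": (0.750, 0.900, True),
    "distinct_2": (0.4706, 0.4251, False),
    "language_consistency_en": (1.00, 0.95, False),
    "val_loss_mixed": (5.3930, 5.1440, True),
}


def winner(v8000, v4000, lower_is_better):
    """Return which checkpoint is better on one metric, or 'tie'."""
    if v8000 == v4000:
        return "tie"
    return "step_8000" if (v8000 < v4000) == lower_is_better else "step_4000"


winners = {name: winner(*values) for name, values in metrics.items()}

for name, best in winners.items():
    print(f"{name}: {best}")
```

This makes the split explicit: step_8000 leads on the four surface metrics, while step_4000 leads on val_loss_mixed.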
Source/domain losses for this checkpoint
- source_loss_books_en: 5.1537
- source_loss_books_it: 5.1258
- source_loss_code: 8.3286
- source_loss_web_en: 6.2020
- source_loss_web_it: 6.4544
- source_loss_wiki_en: 3.9960
- source_loss_wiki_it: 3.6270
Training/data provenance
- training config: training_config.yaml
- tokenizer files: tokenizer.json, tokenizer_meta.json
- checkpoint weights: step_8000.pt, step_8000.safetensors
- telemetry: best_validation.json, metrics.jsonl, eval_metrics.jsonl, probe_generations.jsonl
- benchmark bundle: eval_summary.json, comparison.json, benchmark_report.md, benchmark_metrics.json, benchmark_scores.json, benchmark_source_losses.json
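The telemetry files with a .jsonl extension can be inspected with a few lines of standard-library Python. This is a minimal sketch that assumes they follow the usual JSON Lines layout (one JSON object per line); the schema of each record is not documented here, so the example only counts records and makes no assumption about field names. `read_jsonl` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of Python objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Only touch files that are actually present in the working directory.
for name in ("metrics.jsonl", "eval_metrics.jsonl", "probe_generations.jsonl"):
    if Path(name).exists():
        print(name, len(read_jsonl(name)), "records")
```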
Limitations
- Generations are still visibly repetitive and templatey.
- This repo should not be read as evidence that free-form generation quality is solved.
- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning step_4000.