gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000

This repo stages step_8000.pt, which is both the final checkpoint and the best online-validation checkpoint of the local NanoChat EN/IT run 20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki (GPT-2-small-like model, WSD schedule with short fast decay, web/wiki data).

What this is

  • model family: GPT-2-small-like decoder-only LM
  • parameters: ~136M
  • languages: English + Italian
  • context length: 2500
  • selected checkpoint: step_8000.pt
  • tokens seen: ~1.92B
  • selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
  • status relative to the companion benchmark winner:
    • this is the validation-selected release
    • the repo-native benchmark winner from the same run is step_4000

Best in-run validation

  • best saved validation step for the run: 8000
  • validation loss: 3.8823011749
  • validation perplexity: 48.535776
  • validation batches: 128

This checkpoint matches the run's online validation winner.
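
The reported perplexity is consistent with the exponential of the reported validation loss. The check below is a minimal sketch that assumes the loss is a mean per-token cross-entropy in nats; the card does not state the unit, so treat that as an assumption.

```python
import math

# Validation loss as reported in this card.
# Assumption: mean per-token cross-entropy measured in nats.
val_loss = 3.8823011749


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


ppl = perplexity(val_loss)
print(f"perplexity = {ppl:.4f}")
```

The same conversion can be applied to the benchmark losses listed further down to cross-check their ppl_* values.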

Repo-native benchmark context

Repo-native benchmark suite: configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml

Metrics for this checkpoint:

  • val_loss_mixed: 5.3930
  • ppl_mixed: 219.8592
  • val_loss_en: 4.9928
  • ppl_en: 147.3508
  • val_loss_it: 4.1405
  • ppl_it: 62.8313
  • loop_rate: 0.400
  • repeated_4gram_rate: 0.750
  • distinct_2: 0.4706
  • cloze_en_contains: 0.00
  • cloze_it_contains: 0.12
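
This card does not define the surface metrics, so the following is only a sketch of common definitions, not the benchmark suite's actual implementation. It assumes whitespace tokenization, takes distinct_2 as unique bigrams divided by total bigrams over all generations, and takes repeated_4gram_rate as the fraction of generations that contain at least one 4-gram more than once. All function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n=2):
    """Unique n-grams divided by total n-grams, pooled over all texts."""
    pooled = [g for t in texts for g in ngrams(t.split(), n)]
    return len(set(pooled)) / len(pooled) if pooled else 0.0


def has_repeated_ngram(text, n=4):
    """True if any n-gram occurs more than once in the text."""
    counts = Counter(ngrams(text.split(), n))
    return any(c > 1 for c in counts.values())


def repeated_ngram_rate(texts, n=4):
    """Fraction of texts containing at least one repeated n-gram."""
    if not texts:
        return 0.0
    return sum(has_repeated_ngram(t, n) for t in texts) / len(texts)


# Toy generations, made up for illustration only.
samples = [
    "the cat sat on the mat the cat sat on the mat",
    "a quick brown fox jumps over the lazy dog",
]
print(distinct_n(samples, 2), repeated_ngram_rate(samples, 4))
```

Under these assumed definitions, lower repetition rates and a higher distinct_2 indicate less repetitive output.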

Ranking inside the checked saved checkpoints from this run:

  1. step_4000 -> mixed=5.1440
  2. step_7000 -> mixed=5.3313
  3. step_5000 -> mixed=5.3651
  4. step_8000 -> mixed=5.3930
  5. step_6000 -> mixed=5.5364
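
The ranking above is an ascending sort on val_loss_mixed (lower is better). A small sketch that reproduces it from the values listed in this card:

```python
# val_loss_mixed per checked checkpoint, copied from the ranking in this card.
mixed_loss = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Lower loss ranks first.
ranking = sorted(mixed_loss, key=mixed_loss.get)

for rank, step in enumerate(ranking, start=1):
    print(f"{rank}. {step} -> mixed={mixed_loss[step]:.4f}")
```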

Important caveat: this run produced two different winners:

  • step_8000 won the run's internal online validation
  • step_4000 won the external repo-native benchmark used to rank comparable releases

Operationally:

  • step_8000 is the cleaner final checkpoint on repetition/diversity surface metrics
  • step_4000 remains the checkpoint we promote as the benchmark winner

Surface-quality reading

Compared with step_4000, this final checkpoint is behaviorally cleaner on several surface metrics:

  • loop_rate: 0.400 vs 0.725
  • repeated_4gram_rate: 0.750 vs 0.900
  • distinct_2: 0.4706 vs 0.4251
  • language_consistency_en: 1.00 vs 0.95

But it loses on the primary benchmark metric:

  • val_loss_mixed: 5.3930 vs 5.1440

So this repo is the final/validation winner, not the benchmark-first winner.

Source/domain losses for this checkpoint

  • source_loss_books_en: 5.1537
  • source_loss_books_it: 5.1258
  • source_loss_code: 8.3286
  • source_loss_web_en: 6.2020
  • source_loss_web_it: 6.4544
  • source_loss_wiki_en: 3.9960
  • source_loss_wiki_it: 3.6270

Training/data provenance

  • training config: training_config.yaml
  • tokenizer files:
    • tokenizer.json
    • tokenizer_meta.json
  • checkpoint weights:
    • step_8000.pt
    • step_8000.safetensors
  • telemetry:
    • best_validation.json
    • metrics.jsonl
    • eval_metrics.jsonl
    • probe_generations.jsonl
  • benchmark bundle:
    • eval_summary.json
    • comparison.json
    • benchmark_report.md
    • benchmark_metrics.json
    • benchmark_scores.json
    • benchmark_source_losses.json

Limitations

  • Generations are still visibly repetitive and formulaic.
  • This repo should not be read as evidence that free-form generation quality is solved.
  • The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning step_4000.