---
language:
- en
- it
license: other
library_name: custom
pipeline_tag: text-generation
tags:
- nanochat
- gpt2-small
- bilingual
- english
- italian
- pretraining
- webwiki
- wsd
- short-fast-decay
- validation-selected
- final-checkpoint
- lr3e4
---

# gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000

This repo stages `step_8000.pt`, the final checkpoint and best online-validation checkpoint from the local NanoChat EN/IT GPT-2-small-like WSD short-fast-decay web/wiki run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`.

## What this is

- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500
- selected checkpoint: `step_8000.pt`
- tokens seen: `~1.92B`
- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
- status relative to the companion benchmark winner:
  - this is the validation-selected release
  - the repo-native benchmark winner from the same run is `step_4000`

## Best in-run validation

- best saved validation step for the run: `8000`
- validation loss: `3.8823011749`
- validation perplexity: `48.535776`
- validation batches: `128`

This checkpoint matches the run's online validation winner.

## Repo-native benchmark context

Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`

Metrics for this checkpoint:

- `val_loss_mixed`: `5.3930`
- `ppl_mixed`: `219.8592`
- `val_loss_en`: `4.9928`
- `ppl_en`: `147.3508`
- `val_loss_it`: `4.1405`
- `ppl_it`: `62.8313`
- `loop_rate`: `0.400`
- `repeated_4gram_rate`: `0.750`
- `distinct_2`: `0.4706`
- `cloze_en_contains`: `0.00`
- `cloze_it_contains`: `0.12`

Ranking inside the checked saved checkpoints from this run:

1. `step_4000` -> `mixed=5.1440`
2. `step_7000` -> `mixed=5.3313`
3. `step_5000` -> `mixed=5.3651`
4. `step_8000` -> `mixed=5.3930`
5. `step_6000` -> `mixed=5.5364`

Important caveat: this run produced two different winners:

- `step_8000` won the run's internal online validation
- `step_4000` won the external repo-native benchmark used to rank comparable releases

Operationally:

- `step_8000` is the cleaner final checkpoint on repetition/diversity surface metrics
- `step_4000` remains the checkpoint we promote as the benchmark winner

## Surface-quality reading

Compared with `step_4000`, this final checkpoint is behaviorally cleaner on several surface metrics:

- `loop_rate`: `0.400` vs `0.725`
- `repeated_4gram_rate`: `0.750` vs `0.900`
- `distinct_2`: `0.4706` vs `0.4251`
- `language_consistency_en`: `1.00` vs `0.95`

But it loses on the primary benchmark metric:

- `val_loss_mixed`: `5.3930` vs `5.1440`

So this repo is the final/validation winner, not the benchmark-first winner.

## Source/domain losses for this checkpoint

- `source_loss_books_en`: `5.1537`
- `source_loss_books_it`: `5.1258`
- `source_loss_code`: `8.3286`
- `source_loss_web_en`: `6.2020`
- `source_loss_web_it`: `6.4544`
- `source_loss_wiki_en`: `3.9960`
- `source_loss_wiki_it`: `3.6270`

## Training/data provenance

- training config: `training_config.yaml`
- tokenizer files:
  - `tokenizer.json`
  - `tokenizer_meta.json`
- checkpoint weights:
  - `step_8000.pt`
  - `step_8000.safetensors`
- telemetry:
  - `best_validation.json`
  - `metrics.jsonl`
  - `eval_metrics.jsonl`
  - `probe_generations.jsonl`
- benchmark bundle:
  - `eval_summary.json`
  - `comparison.json`
  - `benchmark_report.md`
  - `benchmark_metrics.json`
  - `benchmark_scores.json`
  - `benchmark_source_losses.json`

## Limitations

- Generations are still visibly repetitive and templatey.
- This repo should not be read as evidence that free-form generation quality is solved.
- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.
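
## Metric sanity check (illustrative)

The loss and perplexity figures reported in this card can be cross-checked with a few lines of Python. The sketch below assumes that each loss is a mean per-token negative log-likelihood in nats, so that perplexity is `exp(loss)`; the card does not state this explicitly, but the reported pairs are consistent with it up to rounding. The helper names `perplexity` and `rank_by_loss` are hypothetical and are not part of this repo's tooling. The numbers are copied by hand from the sections above rather than read from `benchmark_metrics.json`, whose schema is not documented here.

```python
import math

# (loss, reported perplexity) pairs copied from this card.
REPORTED = {
    "online_validation": (3.8823011749, 48.535776),
    "mixed": (5.3930, 219.8592),
    "en": (4.9928, 147.3508),
    "it": (4.1405, 62.8313),
}

# Mixed benchmark loss for each checked checkpoint, copied from the ranking list.
MIXED_LOSS = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}


def perplexity(loss: float) -> float:
    """Perplexity under the assumption that `loss` is mean NLL in nats."""
    return math.exp(loss)


def rank_by_loss(losses: dict[str, float]) -> list[str]:
    """Checkpoint names ordered from lowest (best) to highest loss."""
    return sorted(losses, key=losses.get)


if __name__ == "__main__":
    for name, (loss, reported_ppl) in REPORTED.items():
        recomputed = perplexity(loss)
        rel_err = abs(recomputed - reported_ppl) / reported_ppl
        print(f"{name}: exp({loss}) = {recomputed:.4f}, "
              f"reported {reported_ppl}, relative error {rel_err:.1e}")

    print("benchmark ranking:", rank_by_loss(MIXED_LOSS))
```

The recomputed perplexities differ from the reported ones only by the rounding of the four-decimal losses, and sorting by `val_loss_mixed` reproduces the ranking in which `step_4000` is first and `step_8000` is fourth.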