---
language:
- en
- it
license: other
library_name: custom
pipeline_tag: text-generation
tags:
- nanochat
- gpt2-small
- bilingual
- english
- italian
- pretraining
- webwiki
- wsd
- short-fast-decay
- validation-selected
- final-checkpoint
- lr3e4
---

# gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000

This repo stages `step_8000.pt`, which is both the final checkpoint and the best online-validation checkpoint of the local NanoChat EN/IT run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`. The run trains a GPT-2-small-like model on web/wiki data with a WSD schedule and a short, fast decay.

## What this is

- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500
- selected checkpoint: `step_8000.pt`
- tokens seen: `~1.92B`
- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
- status relative to the companion benchmark winner:
  - this is the validation-selected release
  - the repo-native benchmark winner from the same run is `step_4000`

## Best in-run validation

- best saved validation step for the run: `8000`
- validation loss: `3.8823011749`
- validation perplexity: `48.535776`
- validation batches: `128`

This checkpoint matches the run's online validation winner.
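
The reported loss and perplexity are consistent with perplexity being the exponential of a mean per-token cross-entropy measured in nats. That relationship is an assumption inferred from the two numbers, not something the training code is shown to do here. A minimal sketch of the check:

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


# Values copied from the list above.
reported_loss = 3.8823011749
reported_ppl = 48.535776

recomputed = perplexity(reported_loss)
print(f"recomputed={recomputed:.6f} reported={reported_ppl:.6f}")
```

The same check can be applied to the benchmark `val_loss_*` / `ppl_*` pairs, with a looser tolerance because those losses are rounded to four decimals.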

## Repo-native benchmark context

Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`

Metrics for this checkpoint:

- `val_loss_mixed`: `5.3930`
- `ppl_mixed`: `219.8592`
- `val_loss_en`: `4.9928`
- `ppl_en`: `147.3508`
- `val_loss_it`: `4.1405`
- `ppl_it`: `62.8313`
- `loop_rate`: `0.400`
- `repeated_4gram_rate`: `0.750`
- `distinct_2`: `0.4706`
- `cloze_en_contains`: `0.00`
- `cloze_it_contains`: `0.12`

Ranking of the saved checkpoints from this run that were benchmarked, by `val_loss_mixed` (lower is better):

1. `step_4000` -> `mixed=5.1440`
2. `step_7000` -> `mixed=5.3313`
3. `step_5000` -> `mixed=5.3651`
4. `step_8000` -> `mixed=5.3930`
5. `step_6000` -> `mixed=5.5364`
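
The ranking above is an ascending sort on mixed validation loss. A small sketch that reproduces it from the listed values (the dictionary is typed in by hand here; it is not read from `comparison.json`, whose layout this card does not document):

```python
# Mixed validation loss per checkpoint, copied from the ranking above.
mixed_loss = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Lower loss ranks first.
ranking = sorted(mixed_loss, key=mixed_loss.get)

# Gap between this repo's checkpoint and the benchmark winner.
gap_to_winner = mixed_loss["step_8000"] - mixed_loss[ranking[0]]

for position, step in enumerate(ranking, start=1):
    print(f"{position}. {step} -> mixed={mixed_loss[step]:.4f}")
print(f"step_8000 trails {ranking[0]} by {gap_to_winner:.4f}")
```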

Important caveat: this run produced two different winners:

- `step_8000` won the run's internal online validation
- `step_4000` won the external repo-native benchmark used to rank comparable releases

Operationally:

- `step_8000` is the final checkpoint and scores better on the repetition/diversity surface metrics
- `step_4000` remains the checkpoint we promote as the benchmark winner

## Surface-quality reading

Compared with `step_4000`, this final checkpoint is behaviorally cleaner on several surface metrics:

- `loop_rate`: `0.400` vs `0.725`
- `repeated_4gram_rate`: `0.750` vs `0.900`
- `distinct_2`: `0.4706` vs `0.4251`
- `language_consistency_en`: `1.00` vs `0.95`

But it loses on the primary benchmark metric:

- `val_loss_mixed`: `5.3930` vs `5.1440`

So this repo is the final/validation winner, not the benchmark-first winner.
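
The split verdict can be made explicit by deciding a winner per metric. The sketch below assumes that lower is better for `loop_rate`, `repeated_4gram_rate`, and `val_loss_mixed`, and that higher is better for `distinct_2` and `language_consistency_en`. Those directions are implied by the reading above but are not stated in the benchmark config.

```python
# Values copied from the comparison above, as (step_8000, step_4000).
metrics = {
    "loop_rate": (0.400, 0.725),
    "repeated_4gram_rate": (0.750, 0.900),
    "distinct_2": (0.4706, 0.4251),
    "language_consistency_en": (1.00, 0.95),
    "val_loss_mixed": (5.3930, 5.1440),
}

# Assumed direction of improvement; every metric not listed here is lower-is-better.
higher_is_better = {"distinct_2", "language_consistency_en"}


def winner(name: str, step_8000: float, step_4000: float) -> str:
    """Return which checkpoint is better on one metric, or 'tie'."""
    if step_8000 == step_4000:
        return "tie"
    if name in higher_is_better:
        return "step_8000" if step_8000 > step_4000 else "step_4000"
    return "step_8000" if step_8000 < step_4000 else "step_4000"


winners = {name: winner(name, *values) for name, values in metrics.items()}
for name, who in winners.items():
    print(f"{name}: {who}")
```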

## Source/domain losses for this checkpoint

- `source_loss_books_en`: `5.1537`
- `source_loss_books_it`: `5.1258`
- `source_loss_code`: `8.3286`
- `source_loss_web_en`: `6.2020`
- `source_loss_web_it`: `6.4544`
- `source_loss_wiki_en`: `3.9960`
- `source_loss_wiki_it`: `3.6270`
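
To read these numbers by domain, it can help to pair each English source with its Italian counterpart. A minimal sketch using the values above (the shortened keys are illustrative and are not the names used in `benchmark_source_losses.json`):

```python
# Per-source losses copied from the list above, with the `source_loss_` prefix dropped.
source_loss = {
    "books_en": 5.1537,
    "books_it": 5.1258,
    "code": 8.3286,
    "web_en": 6.2020,
    "web_it": 6.4544,
    "wiki_en": 3.9960,
    "wiki_it": 3.6270,
}

# EN minus IT loss per domain: positive means Italian is modeled better.
en_it_gap = {
    domain: source_loss[f"{domain}_en"] - source_loss[f"{domain}_it"]
    for domain in ("books", "web", "wiki")
}

easiest = min(source_loss, key=source_loss.get)
hardest = max(source_loss, key=source_loss.get)

for domain, gap in en_it_gap.items():
    print(f"{domain}: en - it = {gap:+.4f}")
print(f"lowest loss: {easiest}, highest loss: {hardest}")
```

On these figures the wiki sources have the lowest loss and code the highest, with Italian ahead of English on wiki and books and behind on web.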

## Training/data provenance

- training config: `training_config.yaml`
- tokenizer files:
  - `tokenizer.json`
  - `tokenizer_meta.json`
- checkpoint weights:
  - `step_8000.pt`
  - `step_8000.safetensors`
- telemetry:
  - `best_validation.json`
  - `metrics.jsonl`
  - `eval_metrics.jsonl`
  - `probe_generations.jsonl`
- benchmark bundle:
  - `eval_summary.json`
  - `comparison.json`
  - `benchmark_report.md`
  - `benchmark_metrics.json`
  - `benchmark_scores.json`
  - `benchmark_source_losses.json`

## Limitations

- Generations are still visibly repetitive and formulaic (`loop_rate` is `0.400` and `repeated_4gram_rate` is `0.750` on the benchmark probes).
- This repo should not be read as evidence that free-form generation quality is solved.
- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.