---
language:
- en
- it
license: other
library_name: custom
pipeline_tag: text-generation
tags:
- nanochat
- gpt2-small
- bilingual
- english
- italian
- pretraining
- webwiki
- wsd
- short-fast-decay
- validation-selected
- final-checkpoint
- lr3e4
---
# gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000
This repo stages `step_8000.pt`, which is both the final checkpoint and the best online-validation checkpoint of the local NanoChat EN/IT GPT-2-small-like WSD short-fast-decay web/wiki run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`.
## What this is
- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500
- selected checkpoint: `step_8000.pt`
- tokens seen: `~1.92B`
- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
- status relative to the companion benchmark winner:
  - this is the validation-selected release
  - the repo-native benchmark winner from the same run is `step_4000`
## Best in-run validation
- best saved validation step for the run: `8000`
- validation loss: `3.8823011749`
- validation perplexity: `48.535776`
- validation batches: `128`
This checkpoint matches the run's online validation winner.
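
The reported perplexity can be cross-checked against the reported loss. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats, which this card does not state explicitly but which is consistent with the two numbers above; `perplexity` is a helper name introduced here for illustration.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Validation loss reported in this section.
val_loss = 3.8823011749
val_ppl = perplexity(val_loss)

# Should land close to the reported validation perplexity of 48.535776.
print(f"validation perplexity: {val_ppl:.4f}")
```

The same conversion applies to the benchmark `val_loss_*` / `ppl_*` pairs in the next section.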
## Repo-native benchmark context
Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`
Metrics for this checkpoint:
- `val_loss_mixed`: `5.3930`
- `ppl_mixed`: `219.8592`
- `val_loss_en`: `4.9928`
- `ppl_en`: `147.3508`
- `val_loss_it`: `4.1405`
- `ppl_it`: `62.8313`
- `loop_rate`: `0.400`
- `repeated_4gram_rate`: `0.750`
- `distinct_2`: `0.4706`
- `cloze_en_contains`: `0.00`
- `cloze_it_contains`: `0.12`
Ranking inside the checked saved checkpoints from this run:
1. `step_4000` -> `mixed=5.1440`
2. `step_7000` -> `mixed=5.3313`
3. `step_5000` -> `mixed=5.3651`
4. `step_8000` -> `mixed=5.3930`
5. `step_6000` -> `mixed=5.5364`
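
The ranking above is an ascending sort on the mixed validation loss. A minimal sketch that reproduces it from the five values listed (the dictionary is typed in by hand here and is not read from the benchmark bundle):

```python
# Mixed validation loss per checked checkpoint, copied from the list above.
mixed_loss = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Lower loss is better, so sort ascending by value.
ranking = sorted(mixed_loss, key=mixed_loss.get)

for rank, step in enumerate(ranking, start=1):
    print(f"{rank}. {step} -> mixed={mixed_loss[step]:.4f}")
```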
Important caveat: this run produced two different winners:
- `step_8000` won the run's internal online validation
- `step_4000` won the external repo-native benchmark used to rank comparable releases
Operationally:
- `step_8000` is the final checkpoint and the cleaner one on repetition/diversity surface metrics
- `step_4000` remains the checkpoint we promote as the benchmark winner
## Surface-quality reading
Compared with `step_4000`, this final checkpoint is behaviorally cleaner on several surface metrics (values are `step_8000` vs `step_4000`):
- `loop_rate`: `0.400` vs `0.725`
- `repeated_4gram_rate`: `0.750` vs `0.900`
- `distinct_2`: `0.4706` vs `0.4251`
- `language_consistency_en`: `1.00` vs `0.95`
But it loses on the primary benchmark metric:
- `val_loss_mixed`: `5.3930` vs `5.1440`
So this repo is the final/validation winner, not the benchmark-first winner.
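
The split verdict can be tallied mechanically. The sketch below assumes a direction for each metric (lower is better for `loop_rate`, `repeated_4gram_rate`, and `val_loss_mixed`; higher is better for `distinct_2` and `language_consistency_en`). These directions are inferred from how the card describes the metrics, not taken from the benchmark code, and `step_8000_wins` is a hypothetical helper.

```python
# name: (step_8000 value, step_4000 value, lower_is_better)
# The lower_is_better flags are assumptions based on the card's wording.
metrics = {
    "loop_rate": (0.400, 0.725, True),
    "repeated_4gram_rate": (0.750, 0.900, True),
    "distinct_2": (0.4706, 0.4251, False),
    "language_consistency_en": (1.00, 0.95, False),
    "val_loss_mixed": (5.3930, 5.1440, True),
}


def step_8000_wins(final: float, early: float, lower_is_better: bool) -> bool:
    """Return True when the step_8000 value beats the step_4000 value."""
    return final < early if lower_is_better else final > early


wins = {name: step_8000_wins(*values) for name, values in metrics.items()}

for name, won in wins.items():
    print(f"{name}: {'step_8000' if won else 'step_4000'}")
```

Under these assumptions `step_8000` takes the four surface metrics and `step_4000` takes `val_loss_mixed`, which is the split described above.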
## Source/domain losses for this checkpoint
- `source_loss_books_en`: `5.1537`
- `source_loss_books_it`: `5.1258`
- `source_loss_code`: `8.3286`
- `source_loss_web_en`: `6.2020`
- `source_loss_web_it`: `6.4544`
- `source_loss_wiki_en`: `3.9960`
- `source_loss_wiki_it`: `3.6270`
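
One way to read these numbers is to compare Italian against English within each source and to pick out the hardest domain. This is a small sketch over the values listed above; the `gaps` and `hardest` names are introduced here for illustration.

```python
# Per-source losses for this checkpoint, copied from the list above.
source_loss = {
    "books_en": 5.1537,
    "books_it": 5.1258,
    "code": 8.3286,
    "web_en": 6.2020,
    "web_it": 6.4544,
    "wiki_en": 3.9960,
    "wiki_it": 3.6270,
}

# Italian minus English loss per source: positive means Italian is harder.
gaps = {
    source: source_loss[f"{source}_it"] - source_loss[f"{source}_en"]
    for source in ("books", "web", "wiki")
}

# Domain with the highest loss overall.
hardest = max(source_loss, key=source_loss.get)

for source, gap in gaps.items():
    print(f"{source}: it - en = {gap:+.4f}")
print(f"hardest domain: {hardest}")
```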
## Training/data provenance
- training config: `training_config.yaml`
- tokenizer files:
  - `tokenizer.json`
  - `tokenizer_meta.json`
- checkpoint weights:
  - `step_8000.pt`
  - `step_8000.safetensors`
- telemetry:
  - `best_validation.json`
  - `metrics.jsonl`
  - `eval_metrics.jsonl`
  - `probe_generations.jsonl`
- benchmark bundle:
  - `eval_summary.json`
  - `comparison.json`
  - `benchmark_report.md`
  - `benchmark_metrics.json`
  - `benchmark_scores.json`
  - `benchmark_source_losses.json`
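
The `.jsonl` telemetry files can be inspected with the standard library alone. This card does not document their schema, so the sketch below only assumes one JSON object per line; the `val_loss` key used by `best_record` is a hypothetical field name and should be replaced with whatever the files actually contain.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def best_record(records, key="val_loss"):
    """Return the record with the lowest numeric `key`, or None if no record has it.

    `val_loss` is an assumed field name, not a documented one.
    """
    scored = [r for r in records if isinstance(r.get(key), (int, float))]
    return min(scored, key=lambda r: r[key]) if scored else None


# Example usage against a local copy of the telemetry (path and key are assumptions):
# best = best_record(read_jsonl("metrics.jsonl"), key="val_loss")
```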
## Limitations
- Generations are still visibly repetitive and formulaic.
- This repo should not be read as evidence that free-form generation quality is solved.
- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.