README.md · nazdef/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000 at main

	---
	language:
	- en
	- it
	license: other
	library_name: custom
	pipeline_tag: text-generation
	tags:
	- nanochat
	- gpt2-small
	- bilingual
	- english
	- italian
	- pretraining
	- webwiki
	- wsd
	- short-fast-decay
	- validation-selected
	- final-checkpoint
	- lr3e4
	---

	# gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000

	This repo hosts `step_8000.pt`, which is both the final checkpoint and the best online-validation checkpoint of the local NanoChat run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`. The run trains an EN/IT GPT-2-small-like model on web/wiki data with a WSD (warmup-stable-decay) learning-rate schedule and a short, fast decay phase.

	## What this is

	- model family: GPT-2-small-like decoder-only LM
	- parameters: ~136M
	- languages: English + Italian
	- context length: 2500
	- selected checkpoint: `step_8000.pt`
	- tokens seen: `~1.92B`
	- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
	- status relative to the companion benchmark winner:
	  - this is the validation-selected release
	  - the repo-native benchmark winner from the same run is `step_4000`
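As a rough sanity check on the token budget, the figures above imply an average number of tokens per training step. This is a back-of-the-envelope sketch that assumes `~1.92B` means 1.92e9 tokens spread evenly over all 8000 steps; the actual batch composition is defined in `training_config.yaml` and is not reproduced here.

```python
# Figures taken from the list above; the even split across steps is an assumption.
tokens_seen = 1.92e9   # "~1.92B" tokens seen
steps = 8000           # selected checkpoint: step_8000

tokens_per_step = tokens_seen / steps
print(f"{tokens_per_step:,.0f} tokens per step")  # prints "240,000 tokens per step"
```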

	## Best in-run validation

	- best saved validation step for the run: `8000`
	- validation loss: `3.8823011749`
	- validation perplexity: `48.535776`
	- validation batches: `128`

	This checkpoint matches the run's online validation winner.

	## Repo-native benchmark context

	Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`

	Metrics for this checkpoint:

	- `val_loss_mixed`: `5.3930`
	- `ppl_mixed`: `219.8592`
	- `val_loss_en`: `4.9928`
	- `ppl_en`: `147.3508`
	- `val_loss_it`: `4.1405`
	- `ppl_it`: `62.8313`
	- `loop_rate`: `0.400`
	- `repeated_4gram_rate`: `0.750`
	- `distinct_2`: `0.4706`
	- `cloze_en_contains`: `0.00`
	- `cloze_it_contains`: `0.12`
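The loss and perplexity pairs reported in this README are consistent with the usual relation `ppl = exp(loss)` for a mean per-token cross-entropy in nats. The sketch below checks that relation against the reported numbers; the helper name `perplexity` is illustrative and not taken from the repo's evaluation code.

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats."""
    return math.exp(loss)

# (loss, reported perplexity) pairs copied from this README.
reported = {
    "online validation": (3.8823011749, 48.535776),
    "val_loss_mixed": (5.3930, 219.8592),
    "val_loss_en": (4.9928, 147.3508),
    "val_loss_it": (4.1405, 62.8313),
}

for name, (loss, ppl) in reported.items():
    # Small differences are expected because the losses are rounded to 4 decimals.
    print(f"{name}: exp({loss}) = {perplexity(loss):.4f} (reported {ppl})")
```

Small mismatches in the last digits come from rounding of the published losses, not from a different formula.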

	Ranking among the saved checkpoints evaluated from this run, by `val_loss_mixed` (lower is better):

	1. `step_4000` -> `mixed=5.1440`
	2. `step_7000` -> `mixed=5.3313`
	3. `step_5000` -> `mixed=5.3651`
	4. `step_8000` -> `mixed=5.3930`
	5. `step_6000` -> `mixed=5.5364`
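The ranking above is a plain ascending sort on `val_loss_mixed`. A minimal sketch that reproduces it from the listed values, and reports how far this checkpoint sits behind the benchmark winner (the variable names are illustrative):

```python
# val_loss_mixed per evaluated checkpoint, copied from the ranking above.
mixed_loss = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Lower loss is better, so sort ascending by value.
ranking = sorted(mixed_loss, key=mixed_loss.get)
gap_to_winner = mixed_loss["step_8000"] - mixed_loss[ranking[0]]

for rank, step in enumerate(ranking, start=1):
    print(f"{rank}. {step} -> mixed={mixed_loss[step]:.4f}")
print(f"step_8000 trails {ranking[0]} by {gap_to_winner:.4f}")
```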

	Important caveat: this run produced two different winners:

	- `step_8000` won the run's internal online validation
	- `step_4000` won the external repo-native benchmark used to rank comparable releases

	Operationally:

	- `step_8000` is the cleaner final checkpoint on repetition/diversity surface metrics
	- `step_4000` remains the checkpoint we promote as the benchmark winner

	## Surface-quality reading

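	Compared with `step_4000`, this final checkpoint is cleaner on several surface metrics. Values are listed as `step_8000` vs `step_4000`; lower is better for `loop_rate` and `repeated_4gram_rate`, higher is better for `distinct_2` and `language_consistency_en`: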

	- `loop_rate`: `0.400` vs `0.725`
	- `repeated_4gram_rate`: `0.750` vs `0.900`
	- `distinct_2`: `0.4706` vs `0.4251`
	- `language_consistency_en`: `1.00` vs `0.95`

	But it loses on the primary benchmark metric:

	- `val_loss_mixed`: `5.3930` vs `5.1440`

	So this repo is the final/validation winner, not the benchmark-first winner.
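For readers unfamiliar with the surface metrics, the sketch below shows one common way to compute a distinct-n ratio and to detect a repeated n-gram in a generation. The repo's own metric code is not shown in this README, so the exact definitions behind `distinct_2` and `repeated_4gram_rate` may differ; the function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def distinct_n(tokens: list[str], n: int) -> float:
    """Unique n-grams divided by total n-grams (higher means more diverse)."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0

def has_repeated_ngram(tokens: list[str], n: int) -> bool:
    """True if any n-gram occurs more than once in the sequence."""
    counts = Counter(ngrams(tokens, n))
    return any(c > 1 for c in counts.values())

# Hypothetical generation, split on whitespace for simplicity.
sample = "a b a b a b".split()
print(distinct_n(sample, 2))          # 2 unique bigrams out of 5 -> 0.4
print(has_repeated_ngram(sample, 4))  # ("a","b","a","b") occurs twice -> True
```

Under this reading, a rate such as `repeated_4gram_rate` would be the fraction of probe generations for which `has_repeated_ngram(tokens, 4)` is true, but that interpretation is not confirmed by the source.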

	## Source/domain losses for this checkpoint

	- `source_loss_books_en`: `5.1537`
	- `source_loss_books_it`: `5.1258`
	- `source_loss_code`: `8.3286`
	- `source_loss_web_en`: `6.2020`
	- `source_loss_web_it`: `6.4544`
	- `source_loss_wiki_en`: `3.9960`
	- `source_loss_wiki_it`: `3.6270`

	## Training/data provenance

	- training config: `training_config.yaml`
	- tokenizer files:
	  - `tokenizer.json`
	  - `tokenizer_meta.json`
	- checkpoint weights:
	  - `step_8000.pt`
	  - `step_8000.safetensors`
	- telemetry:
	  - `best_validation.json`
	  - `metrics.jsonl`
	  - `eval_metrics.jsonl`
	  - `probe_generations.jsonl`
	- benchmark bundle:
	  - `eval_summary.json`
	  - `comparison.json`
	  - `benchmark_report.md`
	  - `benchmark_metrics.json`
	  - `benchmark_scores.json`
	  - `benchmark_source_losses.json`
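Because `library_name` is `custom`, the weights are not loadable through a standard pipeline; the model class lives in the NanoChat code and is not described here. As a minimal sketch, the safetensors file listed above can still be downloaded and inspected with `huggingface_hub` and `safetensors`. The function name `count_parameters` is hypothetical, and the tensor layout inside the file is an unknown that this code only enumerates.

```python
REPO_ID = "nazdef/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000"

def count_parameters(repo_id: str = REPO_ID,
                     filename: str = "step_8000.safetensors") -> int:
    """Download the checkpoint and sum the element counts of all stored tensors."""
    # Imported lazily so the optional dependencies are only needed when called.
    from huggingface_hub import hf_hub_download
    from safetensors import safe_open

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    total = 0
    # framework="pt" returns torch tensors, so PyTorch must be installed.
    with safe_open(path, framework="pt") as f:
        for key in f.keys():
            total += f.get_tensor(key).numel()
    return total
```

Calling `count_parameters()` needs network access. The result should land near the ~136M stated above if each weight is stored once, though that has not been verified here.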

	## Limitations

	- Generations are still visibly repetitive and formulaic (`loop_rate` `0.400`, `repeated_4gram_rate` `0.750`).
	- This repo should not be read as evidence that free-form generation quality is solved.
	- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.