nazdef committed 6 days ago

Commit

7c6fc6d

verified

1 Parent(s): d9005ca

Upload folder using huggingface_hub


Files changed (18)

2026-06-07_shortfastdecay8k_release_step8000.md +39 -0
README.md +136 -0
benchmark_metrics.json +198 -0
benchmark_report.md +1142 -0
benchmark_scores.json +1 -0
benchmark_source_losses.json +309 -0
best_validation.json +8 -0
comparison.json +404 -0
eval_metrics.jsonl +8 -0
eval_summary.json +14 -0
metrics.jsonl +0 -0
probe_generations.jsonl +0 -0
step_8000.pt +3 -0
step_8000.safetensors +3 -0
step_8000.safetensors.json +289 -0
tokenizer.json +0 -0
tokenizer_meta.json +10 -0
training_config.yaml +61 -0

2026-06-07_shortfastdecay8k_release_step8000.md ADDED

	@@ -0,0 +1,39 @@

+# 2026-06-07 - Release note for `step_8000` from the short-fast-decay 8k web/wiki run
+## Release candidate
+- run: `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`
+- config: `configs/testing/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki.yaml`
+- chosen checkpoint: `/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt`
+- intended HF repo: `nazdef/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000`
+- estimated tokens seen: `~1.92B` (`8000 * 6 * 16 * 2500`)
+## Why this checkpoint exists
+This is the run's final checkpoint and online-validation winner:
+- `validation_loss=3.8823011749`
+- `validation_perplexity=48.5357760`
+But the repo-native CPU benchmark does **not** choose it as the best comparable release checkpoint:
+1. `step_4000`
+   - `val_loss_mixed=5.1440`
+2. `step_7000`
+   - `val_loss_mixed=5.3313`
+3. `step_5000`
+   - `val_loss_mixed=5.3651`
+4. `step_8000`
+   - `val_loss_mixed=5.3930`
+5. `step_6000`
+   - `val_loss_mixed=5.5364`
+## Reading
+- `step_8000` is the validation-selected and final checkpoint from the run.
+- `step_4000` remains the benchmark-selected winner we expose by default.
+- `step_8000` is still useful to publish because it is the cleaner late-run comparison point relative to `step_4000`:
+  - lower `loop_rate`
+  - lower `repeated_4gram_rate`
+  - higher `distinct_2`
+- This note records a companion publish, not a change in the benchmark ranking.
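
The token estimate in the release note multiplies four factors, `8000 * 6 * 16 * 2500`. A minimal sketch of that arithmetic follows. Reading the factors as optimizer steps, micro-batch size, gradient-accumulation steps, and context length is an assumption: this commit states only the step count, `bs6` in the run name, and the context length of 2500. The function and parameter names are hypothetical.

```python
def estimated_tokens_seen(steps: int, micro_batch: int, accum: int, context_len: int) -> int:
    """Token count if every sequence is packed to the full context length.

    The meaning assigned to each factor is an assumption; the release note
    only gives the product `8000 * 6 * 16 * 2500`.
    """
    return steps * micro_batch * accum * context_len


tokens = estimated_tokens_seen(steps=8000, micro_batch=6, accum=16, context_len=2500)
print(f"{tokens:,} tokens (~{tokens / 1e9:.2f}B)")
```

Under those assumptions the product is 1,920,000,000, which matches the `~1.92B` figure quoted in the note.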

README.md ADDED

	@@ -0,0 +1,136 @@

+---
+language:
+- en
+- it
+license: other
+library_name: custom
+pipeline_tag: text-generation
+tags:
+- nanochat
+- gpt2-small
+- bilingual
+- english
+- italian
+- pretraining
+- webwiki
+- wsd
+- short-fast-decay
+- validation-selected
+- final-checkpoint
+- lr3e4
+---
+# gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000
+This repo stages `step_8000.pt`, the final checkpoint and best online-validation checkpoint from the local NanoChat EN/IT GPT-2-small-like WSD short-fast-decay web/wiki run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`.
+## What this is
+- model family: GPT-2-small-like decoder-only LM
+- parameters: ~136M
+- languages: English + Italian
+- context length: 2500
+- selected checkpoint: `step_8000.pt`
+- estimated tokens seen: `~1.92B`
+- selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
+- status relative to the companion benchmark winner:
+  - this is the validation-selected release
+  - the repo-native benchmark winner from the same run is `step_4000`
+## Best in-run validation
+- best saved validation step for the run: `8000`
+- validation loss: `3.8823011749`
+- validation perplexity: `48.535776`
+- validation batches: `128`
+This checkpoint matches the run's online validation winner.
+## Repo-native benchmark context
+Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`
+Metrics for this checkpoint:
+- `val_loss_mixed`: `5.3930`
+- `ppl_mixed`: `219.8592`
+- `val_loss_en`: `4.9928`
+- `ppl_en`: `147.3508`
+- `val_loss_it`: `4.1405`
+- `ppl_it`: `62.8313`
+- `loop_rate`: `0.400`
+- `repeated_4gram_rate`: `0.750`
+- `distinct_2`: `0.4706`
+- `cloze_en_contains`: `0.00`
+- `cloze_it_contains`: `0.12`
+Ranking among the benchmarked saved checkpoints from this run (by `val_loss_mixed`, lower is better):
+1. `step_4000` -> `mixed=5.1440`
+2. `step_7000` -> `mixed=5.3313`
+3. `step_5000` -> `mixed=5.3651`
+4. `step_8000` -> `mixed=5.3930`
+5. `step_6000` -> `mixed=5.5364`
+Important caveat: this run produced two different winners:
+- `step_8000` won the run's internal online validation
+- `step_4000` won the external repo-native benchmark used to rank comparable releases
+Operationally:
+- `step_8000` is the cleaner final checkpoint on repetition/diversity surface metrics
+- `step_4000` remains the checkpoint we promote as the benchmark winner
+## Surface-quality reading
+Compared with `step_4000`, this final checkpoint is behaviorally cleaner on several surface metrics:
+- `loop_rate`: `0.400` vs `0.725`
+- `repeated_4gram_rate`: `0.750` vs `0.900`
+- `distinct_2`: `0.4706` vs `0.4251`
+- `language_consistency_en`: `1.00` vs `0.95`
+But it loses on the primary benchmark metric:
+- `val_loss_mixed`: `5.3930` vs `5.1440`
+So this repo is the final/validation winner, not the benchmark-first winner.
+## Source/domain losses for this checkpoint
+- `source_loss_books_en`: `5.1537`
+- `source_loss_books_it`: `5.1258`
+- `source_loss_code`: `8.3286`
+- `source_loss_web_en`: `6.2020`
+- `source_loss_web_it`: `6.4544`
+- `source_loss_wiki_en`: `3.9960`
+- `source_loss_wiki_it`: `3.6270`
+## Training/data provenance
+- training config: `training_config.yaml`
+- tokenizer files:
+  - `tokenizer.json`
+  - `tokenizer_meta.json`
+- checkpoint weights:
+  - `step_8000.pt`
+  - `step_8000.safetensors`
+- telemetry:
+  - `best_validation.json`
+  - `metrics.jsonl`
+  - `eval_metrics.jsonl`
+  - `probe_generations.jsonl`
+- benchmark bundle:
+  - `eval_summary.json`
+  - `comparison.json`
+  - `benchmark_report.md`
+  - `benchmark_metrics.json`
+  - `benchmark_scores.json`
+  - `benchmark_source_losses.json`
+## Limitations
+- Generations are still visibly repetitive and templatey.
+- This repo should not be read as evidence that free-form generation quality is solved.
+- The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.
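
The README compares checkpoints on `distinct_2` and `repeated_4gram_rate` without defining them. The sketch below shows one common way such surface metrics are computed. The whitespace tokenization, the pooling of n-grams across samples, and the per-sample definition of the repetition rate are all assumptions, not the benchmark's actual implementation, and the function names are hypothetical. The two sample strings are copied from the continuation samples in `benchmark_report.md`.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n):
    """Unique n-grams divided by total n-grams, pooled over all texts (assumed definition)."""
    grams = [g for text in texts for g in ngrams(text.split(), n)]
    return len(set(grams)) / len(grams) if grams else 0.0


def repeated_ngram_rate(texts, n=4):
    """Fraction of texts containing at least one n-gram more than once (assumed definition)."""
    if not texts:
        return 0.0
    flagged = 0
    for text in texts:
        counts = Counter(ngrams(text.split(), n))
        if any(count > 1 for count in counts.values()):
            flagged += 1
    return flagged / len(texts)


samples = [
    # step_4000, gen_en_0001: visibly looping
    "the river was built in a small town. The river was built in a small town, "
    "and the river was built in a small town. The river was built",
    # step_5000, gen_en_0005: short, no repeated 4-gram
    "it is not a good idea to be able to do so.",
]

print("distinct_2:", round(distinct_n(samples, 2), 4))
print("repeated_4gram_rate:", repeated_ngram_rate(samples, 4))
```

Higher `distinct_2` and lower `repeated_4gram_rate` both indicate less repetitive output under these definitions, which is the direction the README uses when it calls `step_8000` cleaner than `step_4000`.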

benchmark_metrics.json ADDED

	@@ -0,0 +1,198 @@

+{
+  "checkpoints": {
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_4000-step_4000-4000": {
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_4000",
+      "checkpoint_selector": "step_4000",
+      "checkpoint_step": 4000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.08,
+      "cloze_it_exact": 0.0,
+      "distinct_1": 0.20633397312859886,
+      "distinct_2": 0.4251497005988024,
+      "language_consistency_en": 0.95,
+      "language_consistency_it": 0.85,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.05,
+      "loop_rate": 0.725,
+      "ppl_en": 119.05713337502907,
+      "ppl_it": 57.27657765139552,
+      "ppl_mixed": 171.39696944872787,
+      "repeated_4gram_rate": 0.9,
+      "source_loss_books_en": 4.994737534295945,
+      "source_loss_books_it": 5.027433122907366,
+      "source_loss_code": 8.615218098958334,
+      "source_loss_web_en": 6.153774060701069,
+      "source_loss_web_it": 6.019744873046875,
+      "source_loss_wiki_en": 3.8654462640935723,
+      "source_loss_wiki_it": 3.56423828125,
+      "val_loss_en": 4.779603490289652,
+      "val_loss_it": 4.047891773161341,
+      "val_loss_mixed": 5.143982324844751
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_5000-step_5000-5000": {
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_5000",
+      "checkpoint_selector": "step_5000",
+      "checkpoint_step": 5000,
+      "cloze_en_contains": 0.02,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.1,
+      "cloze_it_exact": 0.0,
+      "distinct_1": 0.2099009900990099,
+      "distinct_2": 0.42018537590113286,
+      "language_consistency_en": 0.875,
+      "language_consistency_it": 0.75,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.575,
+      "ppl_en": 156.06746197419037,
+      "ppl_it": 67.12826498983866,
+      "ppl_mixed": 213.81993054234522,
+      "repeated_4gram_rate": 0.85,
+      "source_loss_books_en": 5.295532953171503,
+      "source_loss_books_it": 5.265413556780134,
+      "source_loss_code": 8.621547444661458,
+      "source_loss_web_en": 6.209454185084293,
+      "source_loss_web_it": 6.207602249948602,
+      "source_loss_wiki_en": 4.054272738370028,
+      "source_loss_wiki_it": 3.58208984375,
+      "val_loss_en": 5.050288362323113,
+      "val_loss_it": 4.206605192090644,
+      "val_loss_mixed": 5.365134214743589
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_6000-step_6000-6000": {
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_6000",
+      "checkpoint_selector": "step_6000",
+      "checkpoint_step": 6000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.06,
+      "cloze_it_exact": 0.0,
+      "distinct_1": 0.22282023681377824,
+      "distinct_2": 0.4377104377104377,
+      "language_consistency_en": 0.925,
+      "language_consistency_it": 0.825,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.05,
+      "loop_rate": 0.525,
+      "ppl_en": 192.97065258207533,
+      "ppl_it": 78.20997526013538,
+      "ppl_mixed": 253.77314221335507,
+      "repeated_4gram_rate": 0.775,
+      "source_loss_books_en": 5.409920828683036,
+      "source_loss_books_it": 5.199875967843192,
+      "source_loss_code": 8.31853790283203,
+      "source_loss_web_en": 6.131187037417763,
+      "source_loss_web_it": 6.786579332853618,
+      "source_loss_wiki_en": 4.168480613014915,
+      "source_loss_wiki_it": 3.8566854858398436,
+      "val_loss_en": 5.262538118182488,
+      "val_loss_it": 4.359397200287366,
+      "val_loss_mixed": 5.536440727038261
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_7000-step_7000-7000": {
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_7000",
+      "checkpoint_selector": "step_7000",
+      "checkpoint_step": 7000,
+      "cloze_en_contains": 0.04,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.1,
+      "cloze_it_exact": 0.0,
+      "distinct_1": 0.21511627906976744,
+      "distinct_2": 0.4431017119838872,
+      "language_consistency_en": 0.925,
+      "language_consistency_it": 0.75,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.55,
+      "ppl_en": 154.45723780849602,
+      "ppl_it": 61.331402298576066,
+      "ppl_mixed": 206.7071249834935,
+      "repeated_4gram_rate": 0.875,
+      "source_loss_books_en": 5.137017386300223,
+      "source_loss_books_it": 5.132158551897321,
+      "source_loss_code": 8.338209533691407,
+      "source_loss_web_en": 6.089872661389802,
+      "source_loss_web_it": 6.3964783517937915,
+      "source_loss_wiki_en": 4.025867115367543,
+      "source_loss_wiki_it": 3.61209228515625,
+      "val_loss_en": 5.03991728008918,
+      "val_loss_it": 4.116291984182889,
+      "val_loss_mixed": 5.331302936260517
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_8000-step_8000-8000": {
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_8000",
+      "checkpoint_selector": "step_8000",
+      "checkpoint_step": 8000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.12,
+      "cloze_it_exact": 0.0,
+      "distinct_1": 0.22893954410307235,
+      "distinct_2": 0.47058823529411764,
+      "language_consistency_en": 1.0,
+      "language_consistency_it": 0.775,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.4,
+      "ppl_en": 147.3507568259974,
+      "ppl_it": 62.831334211250244,
+      "ppl_mixed": 219.85916404362194,
+      "repeated_4gram_rate": 0.75,
+      "source_loss_books_en": 5.153700692313058,
+      "source_loss_books_it": 5.1257749285016745,
+      "source_loss_code": 8.328606669108073,
+      "source_loss_web_en": 6.201997455797698,
+      "source_loss_web_it": 6.4544139661287,
+      "source_loss_wiki_en": 3.9959980357776987,
+      "source_loss_wiki_it": 3.62702880859375,
+      "val_loss_en": 4.992815845417526,
+      "val_loss_it": 4.140453901447233,
+      "val_loss_mixed": 5.392987177922175
+    }
+  },
+  "cloze_en_contains": 0.0,
+  "cloze_en_exact": 0.0,
+  "cloze_it_contains": 0.08,
+  "cloze_it_exact": 0.0,
+  "distinct_1": 0.20633397312859886,
+  "distinct_2": 0.4251497005988024,
+  "language_consistency_en": 0.95,
+  "language_consistency_it": 0.85,
+  "language_switch_rate_en": 0.0,
+  "language_switch_rate_it": 0.05,
+  "loop_rate": 0.725,
+  "ppl_en": 119.05713337502907,
+  "ppl_it": 57.27657765139552,
+  "ppl_mixed": 171.39696944872787,
+  "recommended_checkpoint": {
+    "checkpoint_name": "step_4000",
+    "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+    "direction": "min",
+    "value": 5.143982324844751
+  },
+  "recommended_metric": "val_loss_mixed",
+  "repeated_4gram_rate": 0.9,
+  "source_losses": {
+    "books_en": 4.994737534295945,
+    "books_it": 5.027433122907366,
+    "code": 8.615218098958334,
+    "web_en": 6.153774060701069,
+    "web_it": 6.019744873046875,
+    "wiki_en": 3.8654462640935723,
+    "wiki_it": 3.56423828125
+  },
+  "val_loss_en": 4.779603490289652,
+  "val_loss_it": 4.047891773161341,
+  "val_loss_mixed": 5.143982324844751
+}
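
The file records `recommended_metric`, a `direction`, and per-checkpoint values, so the recommendation can be recomputed from the data. The sketch below does that on an excerpt. The `val_loss_mixed` values are copied from the JSON above, while the shortened dictionary keys and the `rank_checkpoints` helper are hypothetical and not part of the repo.

```python
def rank_checkpoints(metrics: dict, metric: str, direction: str = "min"):
    """Return (checkpoint_name, value) pairs sorted best-first for the given metric."""
    rows = [(c["checkpoint_name"], c[metric]) for c in metrics["checkpoints"].values()]
    return sorted(rows, key=lambda row: row[1], reverse=(direction == "max"))


# Excerpt of benchmark_metrics.json; keys are shortened for readability.
metrics = {
    "recommended_metric": "val_loss_mixed",
    "recommended_checkpoint": {"checkpoint_name": "step_4000", "direction": "min"},
    "checkpoints": {
        "step_4000": {"checkpoint_name": "step_4000", "val_loss_mixed": 5.143982324844751},
        "step_5000": {"checkpoint_name": "step_5000", "val_loss_mixed": 5.365134214743589},
        "step_6000": {"checkpoint_name": "step_6000", "val_loss_mixed": 5.536440727038261},
        "step_7000": {"checkpoint_name": "step_7000", "val_loss_mixed": 5.331302936260517},
        "step_8000": {"checkpoint_name": "step_8000", "val_loss_mixed": 5.392987177922175},
    },
}

ranking = rank_checkpoints(
    metrics,
    metrics["recommended_metric"],
    metrics["recommended_checkpoint"]["direction"],
)
for position, (name, value) in enumerate(ranking, start=1):
    print(f"{position}. {name} -> mixed={value:.4f}")
```

To run it against the full file, replace the excerpt with `json.load(open("benchmark_metrics.json"))`; the helper reads only `checkpoint_name` and the chosen metric, so the long run-prefixed keys do not matter.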

benchmark_report.md ADDED

	@@ -0,0 +1,1142 @@

+# Post-training checkpoint evaluation report — pretrain_minimal_en_it_webwiki_step11000
+- Evaluation date: `2026-06-06T22:34:43.246329+00:00`
+- Commit hash: `bffb58ef99b4bb27ea6772f5853c16d43607e4eb`
+- Hostname: `desktop-H270M-DS3H`
+- Device: `cpu`
+- Dtype: `fp32`
+- Seed: `1337`
+- Suite path: `/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`
+- Suite model type: `pretrained`
+- Recommended checkpoint: `step_4000`
+## Evaluated checkpoints
+- name=`step_4000`, selector=`step_4000`, step=`4000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt`, selected_by=`step_4000`
+- name=`step_5000`, selector=`step_5000`, step=`5000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt`, selected_by=`step_5000`
+- name=`step_6000`, selector=`step_6000`, step=`6000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt`, selected_by=`step_6000`
+- name=`step_7000`, selector=`step_7000`, step=`7000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt`, selected_by=`step_7000`
+- name=`step_8000`, selector=`step_8000`, step=`8000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt`, selected_by=`step_8000`
+## Eval datasets
+- No quantitative datasets configured.
+## Metric interpretation
+- Lower validation loss is better.
+- Lower perplexity is better.
+- Higher generation pass rate is better when heuristic prompt scoring is enabled.
+## Comparison table
+| checkpoint_name | checkpoint_selector | checkpoint_step | aggregate_validation_loss_mean | aggregate_validation_perplexity_mean | generation_pass_rate | selected_by |
+| --- | --- | --- | --- | --- | --- | --- |
+| step_4000 | step_4000 | 4000 |  |  |  | step_4000 |
+| step_5000 | step_5000 | 5000 |  |  |  | step_5000 |
+| step_6000 | step_6000 | 6000 |  |  |  | step_6000 |
+| step_7000 | step_7000 | 7000 |  |  |  | step_7000 |
+| step_8000 | step_8000 | 8000 |  |  |  | step_8000 |
+## Recommendation notes
+- Recommended checkpoint: use `step_4000` based on `val_loss_mixed`.
+## Validation loss / perplexity
+| checkpoint_name | checkpoint_selector | checkpoint_step | val_loss_en | val_loss_it | val_loss_mixed | ppl_en | ppl_it | ppl_mixed |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| step_4000 | step_4000 | 4000 | 4.7796 | 4.0479 | 5.1440 | 119.0571 | 57.2766 | 171.3970 |
+| step_5000 | step_5000 | 5000 | 5.0503 | 4.2066 | 5.3651 | 156.0675 | 67.1283 | 213.8199 |
+| step_6000 | step_6000 | 6000 | 5.2625 | 4.3594 | 5.5364 | 192.9707 | 78.2100 | 253.7731 |
+| step_7000 | step_7000 | 7000 | 5.0399 | 4.1163 | 5.3313 | 154.4572 | 61.3314 | 206.7071 |
+| step_8000 | step_8000 | 8000 | 4.9928 | 4.1405 | 5.3930 | 147.3508 | 62.8313 | 219.8592 |
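
The loss and perplexity columns in the table above look related by `ppl = exp(loss)`. The sketch below checks that on a few rows; it assumes the losses are mean per-token cross-entropy in nats, which the report does not state. Values are copied from the table and from the README's online-validation numbers, and the tolerance allows for the four-decimal rounding of the losses.

```python
import math

# (checkpoint, split, loss, reported perplexity)
rows = [
    ("step_4000", "mixed", 5.1440, 171.3970),
    ("step_8000", "en", 4.9928, 147.3508),
    ("step_8000", "it", 4.1405, 62.8313),
    ("step_8000", "mixed", 5.3930, 219.8592),
    ("step_8000", "online_val", 3.8823011749, 48.535776),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (assumed unit)."""
    return math.exp(loss)


for name, split, loss, reported in rows:
    matches = math.isclose(perplexity(loss), reported, rel_tol=1e-3)
    print(f"{name:9s} {split:10s} exp({loss:.4f}) = {perplexity(loss):9.4f}  "
          f"reported {reported:9.4f}  match={matches}")
```

Because perplexity is a monotonic function of loss, ranking by `val_loss_mixed` and ranking by `ppl_mixed` give the same order.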
+## Source/domain losses
+| checkpoint_name | source_loss_books_en | source_loss_books_it | source_loss_code | source_loss_web_en | source_loss_web_it | source_loss_wiki_en | source_loss_wiki_it |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| step_4000 | 4.9947 | 5.0274 | 8.6152 | 6.1538 | 6.0197 | 3.8654 | 3.5642 |
+| step_5000 | 5.2955 | 5.2654 | 8.6215 | 6.2095 | 6.2076 | 4.0543 | 3.5821 |
+| step_6000 | 5.4099 | 5.1999 | 8.3185 | 6.1312 | 6.7866 | 4.1685 | 3.8567 |
+| step_7000 | 5.1370 | 5.1322 | 8.3382 | 6.0899 | 6.3965 | 4.0259 | 3.6121 |
+| step_8000 | 5.1537 | 5.1258 | 8.3286 | 6.2020 | 6.4544 | 3.9960 | 3.6270 |
+## Cloze EN/IT
+| checkpoint_name | cloze_en_contains | cloze_it_contains | cloze_en_exact | cloze_it_exact |
+| --- | --- | --- | --- | --- |
+| step_4000 | 0.0000 | 0.0800 | 0.0000 | 0.0000 |
+| step_5000 | 0.0200 | 0.1000 | 0.0000 | 0.0000 |
+| step_6000 | 0.0000 | 0.0600 | 0.0000 | 0.0000 |
+| step_7000 | 0.0400 | 0.1000 | 0.0000 | 0.0000 |
+| step_8000 | 0.0000 | 0.1200 | 0.0000 | 0.0000 |
+## Continuation samples
+# Generation comparison
+## gen_en_0001
+- Language: `en`
+- Category: `story`
+- Prompt: `On a quiet street near the river,`
+### step_4000 (step 4000)
+ the river was built in a small town. The river was built in a small town, and the river was built in a small town. The river was built
+### step_5000 (step 5000)
+ the river's main river, the river's main river, the river's main river, the river's main river, the river's main river, the river
+### step_6000 (step 6000)
+ the river is a very beautiful place. The river is a very beautiful place, with a very beautiful place, with a very beautiful place, with a very beautiful
+### step_7000 (step 7000)
+ the river's name is "The Greatest" (the Greatest) and the Greatest (the Greatest) and the Greatest). The Greatest
+### step_8000 (step 8000)
+ the city is surrounded by a large, small, and a small, small, and a small, small, and a small, small, and a small,
+## gen_en_0002
+- Language: `en`
+- Category: `story`
+- Prompt: `At sunrise the village was still asleep, but`
+### step_4000 (step 4000)
+ the village was not fully aware of the village. The village was not a village, but the village was not a village. The village was not a village,
+### step_5000 (step 5000)
+ the village was still in the village. The village was also home to the village. The village was home to the village of the village. The village was home
+### step_6000 (step 6000)
+ the village was not rebuilt. The village was rebuilt in the 1930s and was used as a tourist attraction. The village was built in the 1930s and was
+### step_7000 (step 7000)
+ the church was still in the hands of the church. The church was built in the early 19th century, and the church was built in the early 20th
+### step_8000 (step 8000)
+ the village was still in the hands of the people. The village was a small village, and the village was a small village. The village was a small village
+## gen_en_0003
+- Language: `en`
+- Category: `story`
+- Prompt: `A child opened the old wooden gate and`
+### step_4000 (step 4000)
+ the old stone was built. The stone was built in the middle of the 19th century. The stone was built in the 18th century. The stone was
+### step_5000 (step 5000)
+ the old wooden doors of the old building. The old building was built in the early 19th century. The building was built in the early 20th century and
+### step_6000 (step 6000)
+ the old gate. The old gate was built in the early 19th century. The old gate was built in the early 19th century. The old gate was
+### step_7000 (step 7000)
+ the old manor was a young manor. He was a young manor and a young manor. He was a young manor and a young man
+### step_8000 (step 8000)
+ the old manor was a young manor. He was a young manor and a young manor. He was a young manor and a young man
+## gen_en_0004
+- Language: `en`
+- Category: `story`
+- Prompt: `The storm ended just before midnight, and`
+### step_4000 (step 4000)
+ the storm was not immediately clear. The storm was not the storm, but the storm was not the storm. The storm was not the storm, but the storm
+### step_5000 (step 5000)
+ the storm surge hit the city of New York City. The storm surge hit the city of New York City, and the city of New York City was the first
+### step_6000 (step 6000)
+ the storm was over. The storm was over, and the storm was over, and the storm was over. The storm was over, and the storm was over
+### step_7000 (step 7000)
+ the storm was not immediately clear. The storm was not a good thing. The storm was not a bad thing. The storm was not a bad thing. The
+### step_8000 (step 8000)
+ the storm was not immediately clear. The storm was not immediately clear, but it was not clear. The storm was not clear, but it was not clear.
+## gen_en_0005
+- Language: `en`
+- Category: `explanation`
+- Prompt: `A bilingual checkpoint comparison is useful because`
+### step_4000 (step 4000)
+ it is not possible to use a checkpoint. The checkpoint is not possible. The checkpoint is not possible. The checkpoint is not possible. The
+### step_5000 (step 5000)
+ it is not a good idea to be able to do so.
+### step_6000 (step 6000)
+ it is not a good idea.
+### step_7000 (step 7000)
+ it is not a good idea to be able to find a way to find a way to find a way to find a way to find a way to find a
+### step_8000 (step 8000)
+ it is not a good idea to be able to use a bilingual checkpoint. The bilingual checkpoint is a good idea to be able to
+## gen_en_0006
+- Language: `en`
+- Category: `explanation`
+- Prompt: `A validation loss curve becomes easier to trust when`
+### step_4000 (step 4000)
+ you are not able to trust your neighbor. The risk of loss of your child is not a loss of your child. The risk of loss of your child is
+### step_5000 (step 5000)
+ it is difficult to predict the damage of the damage of the damage. The damage of the damage is not limited to the damage of the damage of the damage of
+### step_6000 (step 6000)
+ the AU is a good way to the AU. The AU is a good way to the AU. AU is a good way to the
+### step_7000 (step 7000)
+ the player is in the game. The player is not in the game, but in the game, the player is not in the game. The player is not
+### step_8000 (step 8000)
+ the player is in the game. The player is not in the game, but in the game, the player is in the game. The player is in the
+## gen_en_0007
+- Language: `en`
+- Category: `explanation`
+- Prompt: `A packed dataset should be rebuilt after a tokenizer change because`
+### step_4000 (step 4000)
+ the user's user is not able to use the user's computer to use the user's computer to use the user's computer to use the user's computer to
+### step_5000 (step 5000)
+ of the fact that the term "cold" is used to describe the term "cold" in the context of the term "cold" in the
+### step_6000 (step 6000)
+ of the original design. The design was designed by the architector and architector, who was the architector of the design. The design was designed by the
+### step_7000 (step 7000)
+ it is not a good idea to use it. The first step is to use the same method as the first step. The second step is to use the same
+### step_8000 (step 8000)
+ it is not possible to use the same as the same as the same as the same as the same as the same as the same as the same as the same
+## gen_en_0008
+- Language: `en`
+- Category: `news`
+- Prompt: `The local research lab announced that`
+### step_4000 (step 4000)
+ the project will be completed in the future. The project will be completed in the future. The project will be completed in the future. The project will be completed
+### step_5000 (step 5000)
+ the project will be a "significant project" in the city of New York City. The project will be funded by the City Council of New York City
+### step_6000 (step 6000)
+ the new study was conducted by the National Institute of Technology, which was funded by the National Institute of Technology. The new study was published in the journal Nature of
+### step_7000 (step 7000)
+ the project was completed in the summer of 2009. The project was completed in the spring of 2009. The project was completed in the spring of 2009. The project
+### step_8000 (step 8000)
+ the project was completed in the early 1990s. The project was completed in the early 1990s, and the project was completed in the early 1990s. The
+## gen_en_0009
+- Language: `en`
+- Category: `news`
+- Prompt: `According to the operator log, the run`
+### step_4000 (step 4000)
+time is not the same as the "small" or "small" or "small" or "small" or "small" or
+### step_5000 (step 5000)
+time is a good thing. It's a good thing. It's a good thing. It's a good thing. It's a good thing. It's
+### step_6000 (step 6000)
+time is a set of rules, and the runtime is a set of rules. The runtime is a set of rules, and the runtime is a
+### step_7000 (step 7000)
+time is a simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple
+### step_8000 (step 8000)
+time is a simple, simple, simple, and easy way to do. The first step is to use the runtime to runtime. The runtime is
+## gen_en_0010
+- Language: `en`
+- Category: `news`
+- Prompt: `The latest experiment summary noted that`
+### step_4000 (step 4000)
+ the "significant" of the "significant" of the "significant" of the "significant" of the "signific
+### step_5000 (step 5000)
+ the new technologies in the future have been developed in the past. The new technologies in the future have been developed in the past. The new technologies in the future
+### step_6000 (step 6000)
+ the "small" of the "small" of the "small" of the "small" of the "small" of the "
+### step_7000 (step 7000)
+ the experiment was a "significant" experiment. The experiment was a "significant" experiment. The experiment was a "significant" experiment
+### step_8000 (step 8000)
+ the experiment was not a good idea. The experiment was not a good idea. The experiment was not a good idea. The experiment was not a good idea.
+## gen_en_0011
+- Language: `en`
+- Category: `dialogue`
+- Prompt: `"Can we resume from the best checkpoint?"`
+### step_4000 (step 4000)
+ "I'm not sure that I can do this, but I'm not sure that I can do this, but I'm not sure that I can do this
+### step_5000 (step 5000)
+ "I'm not going to be a good job." "I'm not going to be a good job." "I'm not going to be a good job
+### step_6000 (step 6000)
+?
+### step_7000 (step 7000)
+<empty>
+### step_8000 (step 8000)
+ "I'm not going to be able to get the best of the day, but I'm not going to be able to get the best of the day,
+## gen_en_0012
+- Language: `en`
+- Category: `dialogue`
+- Prompt: `"I checked the report twice," the engineer said,`
+### step_4000 (step 4000)
+ "I have been able to make a decision to make a decision to make a decision to make a decision to make a decision to make a decision to make a
+### step_5000 (step 5000)
+ "I'm not going to be a good job." "I'm not going to be a good job, but I'm not going to be a good job
+### step_6000 (step 6000)
+ "I'm not going to go to the store." The report, which is the first time of the report, is the first time of the report, which
+### step_7000 (step 7000)
+ "I have to do it." The engineer said he was "very good" and "very good" and "very good" and "very good" and
+### step_8000 (step 8000)
+ "I have a problem with the problem." The engineer said the engineer said the engineer was "very good" and "very good" and "very good"
+## gen_en_0013
+- Language: `en`
+- Category: `description`
+- Prompt: `The small workstation under the desk`
+### step_4000 (step 4000)
+ is a very large, very large, very large, very large, very small, very small, very small, very small, very small, very small,
+### step_5000 (step 5000)
+ of the "The Wizard of the World" (1932) and "The Wizard of the World" (1932). The work was inspired by
+### step_6000 (step 6000)
+ is a very important part of the workstation. The workstation is a very important part of the workstation. The workstation is a
+### step_7000 (step 7000)
+ is a simple, simple, simple, and simple, and easy to use. The workstation is a simple, simple, and easy to use. The
+### step_8000 (step 8000)
+ is a simple, simple, and easy way to do it. The most common way to do this is to make a good workstation. The most common
+## gen_en_0014
+- Language: `en`
+- Category: `description`
+- Prompt: `The training dashboard on the screen`
+### step_4000 (step 4000)
+ is a very good way to do. The dashboard is a very good way to do. The dashboard is a very good way to do.
+### step_5000 (step 5000)
+. The training dashboard is designed to provide a range of training and training. The training dashboard is designed to provide a variety of training and training
+### step_6000 (step 6000)
+, and the dashboard on the screen. The dashboard on the screen, and the dashboard on the screen. The dashboard on
+### step_7000 (step 7000)
+ is a simple, simple, simple, and easy way to do. The basic training is to use the same techniques as the "cashboard" and "
+### step_8000 (step 8000)
+. The dashboard is a simple, simple, simple, and easy-to-use, and easy-to-use. The dashboard is
+## gen_en_0015
+- Language: `en`
+- Category: `instructional`
+- Prompt: `To compare two pretrained checkpoints, first`
+### step_4000 (step 4000)
+ to the first to the second. The second to the second to the second. The second to the second to the second. The second to the second. The
+### step_5000 (step 5000)
+ in the second, and second in the second. The second was the first in the second. The second was the second in the third. The second was the
+### step_6000 (step 6000)
+ one, and second one, and third one, respectively, and second one, respectively, respectively. The first two, and second one, and second one,
+### step_7000 (step 7000)
+ for the first time in the second half of the second half of the second half of the second half of the second half of the second half of the second half
+### step_8000 (step 8000)
+ for the second, and second for the second. The second, second, and third, and third, respectively. The third, and fourth, respectively. The
+## gen_en_0016
+- Language: `en`
+- Category: `instructional`
+- Prompt: `When a run stops unexpectedly, the safest next step is`
+### step_4000 (step 4000)
+ to get the best of the game. The game is a game that is a game that is a game that is a game that is a game that is a
+### step_5000 (step 5000)
+ to be able to be able to do the same. The problem is that the problem is solved by the problem. The problem is that the problem is solved by
+### step_6000 (step 6000)
+ to be a little more than a little more than a little more than a little more than a little more than a little more than a little more than a little
+### step_7000 (step 7000)
+ to be a little more than a run. The first step is to get the first step in the first step. The second step is to get the first step
+### step_8000 (step 8000)
+ to get the ball away from the ball. The ball is now in the process of being able to get the ball away from the ball. The ball is now
+## gen_en_0017
+- Language: `en`
+- Category: `reflection`
+- Prompt: `One clear lesson from the pilot run was that`
+### step_4000 (step 4000)
+ the pilot was not a pilot. The pilot was not a pilot. The pilot was not a pilot. The pilot was not a pilot. The pilot was not
+### step_5000 (step 5000)
+ the pilot was in the middle of the runway. The pilot was in the middle of the runway and the pilot was in the middle of the runway. The pilot
+### step_6000 (step 6000)
+ the pilot run was not the same as the pilot run. The pilot run was not the same as the pilot run. The pilot run was not the same as
+### step_7000 (step 7000)
+ the pilot was not the first pilot to be able to fly. The pilot was not the first pilot to fly. The pilot was not the first pilot to fly
+### step_8000 (step 8000)
+ the pilot was not the first to be able to fly. The pilot was not able to fly the airplane to the ground and the airplane was not able to fly
+## gen_en_0018
+- Language: `en`
+- Category: `reflection`
+- Prompt: `The bilingual probes suggested that`
+### step_4000 (step 4000)
+ the bilingual probes were not the same as the bilingual probes. The bilingual probes were not the same as the biling
+### step_5000 (step 5000)
+ the bilingual probes of the bilingual probes of the bilingual probes of the bilingual probes of the bilingual
+### step_6000 (step 6000)
+ the bilingual probes were not the same as the bilingual probes. The bilingual probes were not the same as the biling
+### step_7000 (step 7000)
+ the bilingual probes were not the same. The bilingual probes were not the same. The bilingual probes were not the same
+### step_8000 (step 8000)
+ the bilingual probes are not the same as the bilingual probes. The bilingual probes are not the same as the biling
+## gen_en_0019
+- Language: `en`
+- Category: `technical`
+- Prompt: `A token-weighted validation loss avoids`
+### step_4000 (step 4000)
+ the ability to maintain a healthy life. The goal of this study is to determine the potential of a healthy life. The goal of this study is to determine the
+### step_5000 (step 5000)
+ the United States in the first round of the 2010 U.S. Open.
+### step_6000 (step 6000)
+ the loss of the title. The title of the title is a reference to the title of the title of the title of the title of the title of the title
+### step_7000 (step 7000)
+ the loss of the game. The game is a game that is played by a player who is not a player who is a player who is a player who is
+### step_8000 (step 8000)
+ the loss of the sport. The team's performance is expected to be the first time the team will be able to recover from the injury. The team will be
+## gen_en_0020
+- Language: `en`
+- Category: `technical`
+- Prompt: `A lightweight repetition metric can reveal`
+### step_4000 (step 4000)
+ a lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight light
+### step_5000 (step 5000)
+ a new lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight
+### step_6000 (step 6000)
+ the effect of the lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight
+### step_7000 (step 7000)
+ the presence of the new lightweight. The lightweight is a lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight light
+### step_8000 (step 8000)
+ the effect of the lightweight audacity. The lightweight audacity is the lightweight audacity that is used to detect lightweight audacity. The light
+## gen_it_0001
+- Language: `it`
+- Category: `story`
+- Prompt: `In una strada tranquilla vicino al fiume,`
+### step_4000 (step 4000)
+ il fiume si dirige verso la costa, dove si trova la città di San Francisco, dove si trova la città di San Francisco, dove si trova la città di
+### step_5000 (step 5000)
+ il fiume è stato distrutto e il fiume è stato distrutto. Il fiume è stato distrutto e il fiume è stato distrutto. Il fiume è stato distrutto e il fiume
+### step_6000 (step 6000)
+ il fiume è attraversato da un fiume, e da un fiume. Il fiume è attraversato da un fiume, e da un fiume, e da un fiume, e
+### step_7000 (step 7000)
+ il fiume è stato costruito su un'isola di sabbia, e il fiume è stato costruito su un'isola di sabbia. Il fiume è stato costruito su un
+### step_8000 (step 8000)
+ il fiume si trova a circa 1.000 metri di distanza. Il fiume è stato costruito da un gruppo di pescatori, che si trova a circa 1.000
+## gen_it_0002
+- Language: `it`
+- Category: `story`
+- Prompt: `All'alba il paese dormiva ancora, ma`
+### step_4000 (step 4000)
+ non era più possibile che il paese fosse un paese in cui il paese fosse un paese in cui il paese fosse un paese in cui il paese era un paese in
+### step_5000 (step 5000)
+ la sua vita era in un'altra parte, e la sua vita era in un'altra parte. La sua vita era in un'epoca in cui la
+### step_6000 (step 6000)
+ non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se
+### step_7000 (step 7000)
+ non si era mai vista. Il paese era un paese che non aveva mai visto il suo paese, ma non aveva mai visto il suo paese. Il paese era
+### step_8000 (step 8000)
+ non era più possibile. Il paese era un paese che non aveva mai visto il paese. Il paese era un paese che non aveva mai visto il paese. Il
+## gen_it_0003
+- Language: `it`
+- Category: `story`
+- Prompt: `Un bambino aprì il vecchio cancello di legno e`
+### step_4000 (step 4000)
+ si mise a fare il cancello di legno. Il cancello di legno, che era stato poi portato in un altro cancello di legno, si fece costruire
+### step_5000 (step 5000)
+ la sua bocca era piena di lacrime. La sua bocca era un’altra cosa che non era mai stata mai stata mai più. La sua bocca era un’
+### step_6000 (step 6000)
+ di legno. Il cancello di legno, che si trova nella parte centrale della città, è costituito da una serie di vasiature, che si trovano nella parte
+### step_7000 (step 7000)
+ la sua famiglia si trasferì in un piccolo villaggio. Il suo nome è stato scelto per la sua famiglia, ma non è stato scelto per la sua famiglia. Il
+### step_8000 (step 8000)
+ la sua famiglia si trasferì in un piccolo villaggio di pescatori. Il piccolo villaggio di pescatori, che si trovava in una zona di pescatori, si trovava in una zona
+## gen_it_0004
+- Language: `it`
+- Category: `story`
+- Prompt: `La tempesta finì poco prima di mezzanotte, e`
+### step_4000 (step 4000)
+ la tempesta si è spostata verso la costa del fiume, e la tempesta si è spostata verso la costa del fiume, e la tempesta si è spostata
+### step_5000 (step 5000)
+ poi di nuovo, e poi di nuovo, e poi di nuovo, e poi di nuovo, e di nuovo, e di nuovo, e di nuovo, e
+### step_6000 (step 6000)
+ poi di nuovo verso la fine del mese. Il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo
+### step_7000 (step 7000)
+ la sua vita cambiò il suo nome. La sua vita fu segnata da un'epoca di grande bellezza, che si concluse con la sua vita. La sua
+### step_8000 (step 8000)
+ la tempesta si era raffreddata. La tempesta si era raffreddata e la tempesta si era raffreddata. La tempesta si era raffreddata e la tempesta si era
+## gen_it_0005
+- Language: `it`
+- Category: `explanation`
+- Prompt: `Un confronto tra checkpoint bilingui è utile perché`
+### step_4000 (step 4000)
+ il checkpoint bilingui è un'opzione per il checkpoint bilingui. Il checkpoint bilingui è un'opzione per il checkpoint
+### step_5000 (step 5000)
+ non è possibile. Per esempio, se si desidera utilizzare un'opzione di checkpoint bilingui, è possibile utilizzare un'opzione di checkpoint biling
+### step_6000 (step 6000)
+ non è possibile che il checkpoint bilingui non sia un'opzione di default bilingui.
+### step_7000 (step 7000)
+ non si tratta di un'opzione di pagamento. Se si tratta di un'opzione di pagamento, si tratta di un'opzione di pagamento. Se si tratta
+### step_8000 (step 8000)
+ non è possibile. Il checkpoint bilingui è un modo per il quale si desidera utilizzare il checkpoint bilingui. Il checkpoint bilingui
+## gen_it_0006
+- Language: `it`
+- Category: `explanation`
+- Prompt: `Una curva di validation loss è più affidabile quando`
+### step_4000 (step 4000)
+ si tratta di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada
+### step_5000 (step 5000)
+ si tratta di un’alternativa di un’alternativa di un’alternativa di un’alternativa di un’alternativa di un’alternativa
+### step_6000 (step 6000)
+ si tratta di un'altra cosa. Il problema è che il problema è che il problema è che il problema è che il problema è che il problema è che
+### step_7000 (step 7000)
+ si tratta di un'autostrada di autostrada. Il veicolo è stato costruito nel 2004, ma non è stato ancora completato. Il veicolo è stato costruito nel
+### step_8000 (step 8000)
+ si tratta di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada
+## gen_it_0007
+- Language: `it`
+- Category: `explanation`
+- Prompt: `Un dataset packed va ricostruito dopo un cambio di tokenizer perché`
+### step_4000 (step 4000)
+ il suo nome è stato un po' più grande. Il suo nome è "The Greatest" (in inglese "The Greatest" in inglese "The
+### step_5000 (step 5000)
+ non ha mai avuto un’idea di come la sua vita. Ma non è un’idea di come la vita di un’altra persona. Ma non è
+### step_6000 (step 6000)
+ non è stato possibile. Il suo nome è "Solve", che è stato usato per la sua "piccola" e per la sua "p
+### step_7000 (step 7000)
+ il suo nome è stato cambiato. Il suo nome è stato cambiato per la sua forma di "piccola" e "piccola" e "p
+### step_8000 (step 8000)
+ il suo nome è stato trovato in un'altra casa. Il suo nome è stato trovato in un'altra casa, ma il suo nome è stato trovato in
+## gen_it_0008
+- Language: `it`
+- Category: `news`
+- Prompt: `Il laboratorio locale ha annunciato che`
+### step_4000 (step 4000)
+ il paziente è stato sottoposto a un intervento chirurgico per il trattamento di un paziente che ha avuto un'attitudine di tempo per un periodo di tempo prolungato.
+### step_5000 (step 5000)
+ il laboratorio ha iniziato a lavorare su un nuovo laboratorio di ricerca per la ricerca e la ricerca di nuovi metodi di ricerca per la ricerca. I ricercatori hanno scoperto che
+### step_6000 (step 6000)
+ il governo di centro-destra ha deciso di non essere in grado di garantire la sicurezza di tutti i cittadini. Il ministro della difesa, John Paul, ha
+### step_7000 (step 7000)
+ il laboratorio ha completato il suo progetto di ricerca. Il laboratorio ha completato il suo progetto di ricerca e ha completato il suo progetto di ricerca. Il laboratorio ha completato
+### step_8000 (step 8000)
+ il laboratorio ha completato il suo progetto di costruzione. Il laboratorio ha completato il progetto di costruzione di un nuovo impianto di produzione di energia elettrica, che è stato progettato
+## gen_it_0009
+- Language: `it`
+- Category: `news`
+- Prompt: `Secondo il log operativo, la run`
+### step_4000 (step 4000)
+-to-play è stata interrotta da un'altra versione di un'altra versione di un'altra versione di un'altra versione di un'altra
+### step_5000 (step 5000)
+-off è stata una delle più grandi sfide di crescita del mondo. La tecnologia ha anche un'ampia gamma di applicazioni di tecnologie e tecnologie che hanno portato alla
+### step_6000 (step 6000)
+e è una delle più grandi aziende di tutto il mondo. La maggior parte delle aziende di tutto il mondo, che è la più grande azienda di tutto il mondo
+### step_7000 (step 7000)
+time è stata una delle più grandi aziende di tutto il mondo. La maggior parte dei prodotti di questo tipo di prodotti di questo tipo di prodotti di questo tipo di
+### step_8000 (step 8000)
+-in è stata una delle più grandi aziende di tutto il mondo. La tecnologia è stata sviluppata per la prima volta nel 2003, ma è stata sviluppata per la
+## gen_it_0010
+- Language: `it`
+- Category: `news`
+- Prompt: `L'ultimo riepilogo sperimentale ha notato che`
+### step_4000 (step 4000)
+ il suo studio è stato condotto da un team di ricercatori dell'Università di Harvard, che ha condotto una serie di esperimenti per indagare su un'eventuale ricerca.
+### step_5000 (step 5000)
+ la sua capacità di un'azione non è stata in grado di produrre un'azione non solo in termini di tempo, ma anche in termini di tempo. La
+### step_6000 (step 6000)
+ il suo lavoro è stato molto più volte, ma non ha mai avuto un'idea di come il suo lavoro. Il suo lavoro è stato molto più volte,
+### step_7000 (step 7000)
+ la sua presenza è stata confermata da un'altra parte del team di sviluppo. Il team ha anche mostrato che la sua presenza è stata confermata da un'altra
+### step_8000 (step 8000)
+ il suo lavoro è stato molto più complesso e che il suo lavoro è stato molto più complesso. Il suo lavoro è stato molto più complesso e ha anche dimostrato che
+## gen_it_0011
+- Language: `it`
+- Category: `dialogue`
+- Prompt: `"Possiamo riprendere dal checkpoint migliore?"`
+### step_4000 (step 4000)
+ "E' un po' di tempo che non si può fare a meno di fare a meno di fare a meno di fare a meno di fare a meno di
+### step_5000 (step 5000)
+ "Sì, non è un'idea di un'idea di un'idea di un'idea di un'idea di un'idea di un'
+### step_6000 (step 6000)
+<empty>
+### step_7000 (step 7000)
+ "Sì, non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un
+### step_8000 (step 8000)
+ "Sì, non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un
+## gen_it_0012
+- Language: `it`
+- Category: `dialogue`
+- Prompt: `"Ho controllato il report due volte," disse l'ingegnere,`
+### step_4000 (step 4000)
+ "ma non ho mai visto niente di più che non ho mai visto niente di più che non ho mai visto niente di più che non ho visto niente di più
+### step_5000 (step 5000)
+ "ma non è un'altra cosa che non è un'altra cosa che non è un'altra cosa che non è un'altra cosa che non è
+### step_6000 (step 6000)
+ che aveva chiesto di essere in grado di fare un'indagine. L'uomo, che aveva chiesto di essere in grado di fare un'indagine, ha detto
+### step_7000 (step 7000)
+ "non è stato un problema, ma è stato un problema. Ho provato a fare un test di laboratorio per il mio lavoro, ma non ho mai avuto problemi
+### step_8000 (step 8000)
+ "non è stato un problema. Non è stato un problema. Non è stato un problema. Non è stato un problema. Non è stato un problema. Non
+## gen_it_0013
+- Language: `it`
+- Category: `description`
+- Prompt: `La piccola workstation sotto la scrivania`
+### step_4000 (step 4000)
+ di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra
+### step_5000 (step 5000)
+.
+### step_6000 (step 6000)
+ è un'ottima scelta.
+### step_7000 (step 7000)
+ di un'altra nota, che ha fatto il suo debutto in Serie A, con la maglia della Nazionale italiana, che ha visto la sua prima rete in Serie
+### step_8000 (step 8000)
+ di un'altra nota.
+## gen_it_0014
+- Language: `it`
+- Category: `description`
+- Prompt: `La dashboard di training sullo schermo`
+### step_4000 (step 4000)
+ è un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza
+### step_5000 (step 5000)
+ è un'esperienza di fitness che si concentra su un'esperienza di fitness. I nostri obiettivi sono: - I nostri obiettivi sono: - I nostri obiettivi sono
+### step_6000 (step 6000)
+ è un'ottima scelta per il vostro corpo. Il vostro corpo è un'ottima scelta per il vostro corpo. Il vostro corpo è un'ottima scelta per
+### step_7000 (step 7000)
+, la tecnologia di base di un'azienda di produzione di apparecchiature, la tecnologia di base di un'azienda di produzione di apparecchiature, la tecnologia di
+### step_8000 (step 8000)
+, che è stato progettato per la prima volta da un'azienda di produzione di chip, che ha portato a un'ampia gamma di applicazioni. Il primo prototipo
+## gen_it_0015
+- Language: `it`
+- Category: `instructional`
+- Prompt: `Per confrontare due checkpoint pretrained, prima`
+### step_4000 (step 4000)
+ di essere in grado di eseguire un checkpoint predefinito. Per ulteriori informazioni, visitare www.sacredit.com.
+### step_5000 (step 5000)
+ di iniziare a fare un passo indietro. Per esempio, se non è possibile eseguire un checkpoint predefinito, è possibile utilizzare un checkpoint predefinito
+### step_6000 (step 6000)
+ di iniziare a scaricare la cartella. Per la prima volta, la cartella è stata chiusa. La cartella è stata chiusa. La cartella è stata
+### step_7000 (step 7000)
+ di iniziare a scaricare il tuo account. Se non hai ancora trovato il tuo account, non hai ancora trovato il tuo account. Se non hai ancora trovato il tuo
+### step_8000 (step 8000)
+ di iniziare a fare clic su un'altra. Se si desidera utilizzare un'opzione di pagamento, si consiglia di utilizzare un'opzione di pagamento per il pagamento
+## gen_it_0016
+- Language: `it`
+- Category: `instructional`
+- Prompt: `Quando una run si interrompe inaspettatamente, il passo più sicuro è`
+### step_4000 (step 4000)
+ quello di un'altra. Il tempo è che la sua vita è più forte, ma non è che la sua vita è più forte. La sua vita è
+### step_5000 (step 5000)
+ quello di un’altra fase. La maggior parte dei casi di un’infezione da virus è la malattia di un’infezione da virus. La maggior parte dei
+### step_6000 (step 6000)
+ quello di un’altra, ma non è più così. Il passo più importante è quello di un’altra, ma non è più così. Il passo più
+### step_7000 (step 7000)
+ quello di un'altra fase. Il passo più sicuro è quello di un'altra fase. Il passo più sicuro è quello di un'altra fase. Il
+### step_8000 (step 8000)
+ quello di un’altra fase. Il tempo di un’altra fase è quello di un’altra fase. Il tempo di un’altra fase è quello di
+## gen_it_0017
+- Language: `it`
+- Category: `reflection`
+- Prompt: `Una lezione chiara del pilot run è stata che`
+### step_4000 (step 4000)
+ il pilota di bordo è stato lanciato da un'auto a bordo della vettura. Il pilota ha poi aggiunto che il pilota ha dovuto essere stato in grado di volare
+### step_5000 (step 5000)
+ la NASA ha lanciato un nuovo test di test di test di test di test di test di test di test di test di test di test di test di test di
+### step_6000 (step 6000)
+ la nave ha fatto il giro di un'ora dopo che la nave ha fatto il giro di un giro di un'ora dopo che la nave ha fatto il
+### step_7000 (step 7000)
+ la sua vita è stata un'esperienza molto particolare. La sua vita è stata un'esperienza molto particolare. La sua vita è stata un'esperienza molto particolare
+### step_8000 (step 8000)
+ la sua vita è stata una delle più grandi sfide che ha portato alla nascita di un uomo che ha avuto un'infanzia felice.
+## gen_it_0018
+- Language: `it`
+- Category: `reflection`
+- Prompt: `Le probe bilingui hanno suggerito che`
+### step_4000 (step 4000)
+ la loro esistenza è stata in parte dovuta alla loro morte. La loro morte è stata in parte dovuta alla morte di un uomo che ha perso la vita. La
+### step_5000 (step 5000)
+ la loro presenza è stata una delle più grandiosa e ha avuto un impatto significativo sulla salute. La maggior parte dei pazienti ha avuto un effetto positivo sulla salute e
+### step_6000 (step 6000)
+ il loro lavoro è stato molto più grande di quello che ha fatto. Il lavoro è stato molto più grande di quello che ha fatto. Il lavoro è stato molto
+### step_7000 (step 7000)
+ la loro vita è stata influenzata da un'altra parte della popolazione. La loro vita è stata influenzata da un'altra parte della popolazione. La loro
+### step_8000 (step 8000)
+ il loro lavoro è stato molto più facile da fare. Il loro lavoro è stato molto più facile da fare. Il loro lavoro è stato molto più facile da fare
+## gen_it_0019
+- Language: `it`
+- Category: `technical`
+- Prompt: `Una validation loss pesata sui token evita`
+### step_4000 (step 4000)
+ la vita di un uomo che, in un certo senso, non ha mai avuto la possibilità di essere più in grado di essere più in grado di essere più in
+### step_5000 (step 5000)
+ di non essere mai stata in grado di non essere mai stata in grado di non essere in grado di non essere in grado di non essere in grado di non essere
+### step_6000 (step 6000)
+bility. The result is that the risk of the risk of the risk of the risk of the risk of the risk of the risk of the risk of the risk
+### step_7000 (step 7000)
+ di un'infezione. La malattia è stata descritta in modo da un'infezione che ha colpito la popolazione di un'infezione. La malattia è stata descritta in
+### step_8000 (step 8000)
+ di un'altra malattia.
+## gen_it_0020
+- Language: `it`
+- Category: `technical`
+- Prompt: `Una metrica leggera di ripetizione può rivelare`
+### step_4000 (step 4000)
+ la natura del vento. La luce è un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di
+### step_5000 (step 5000)
+ un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di
+### step_6000 (step 6000)
+ la sua bellezza e la sua bellezza.
+### step_7000 (step 7000)
+ la presenza di un'infezione da virus, virus o virus. La maggior parte dei casi di infezione da virus è la malattia di cui si è sviluppato la malattia
+### step_8000 (step 8000)
+ la presenza di un'ampia gamma di fattori che possono essere facilmente individuabili. La maggior parte dei casi di ripetizione può essere un'attività di ripetizione
+## Repetition diagnostics
+| checkpoint_name | distinct_1 | distinct_2 | repeated_4gram_rate | loop_rate |
+| --- | --- | --- | --- | --- |
+| step_4000 | 0.2063 | 0.4251 | 0.9000 | 0.7250 |
+| step_5000 | 0.2099 | 0.4202 | 0.8500 | 0.5750 |
+| step_6000 | 0.2228 | 0.4377 | 0.7750 | 0.5250 |
+| step_7000 | 0.2151 | 0.4431 | 0.8750 | 0.5500 |
+| step_8000 | 0.2289 | 0.4706 | 0.7500 | 0.4000 |
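
The report does not define how these columns are computed, so the following is only a minimal sketch under stated assumptions: `distinct_n` is taken to be the ratio of unique to total whitespace-token n-grams pooled over all generations, and `repeated_4gram_rate` is taken to be the fraction of generations that contain at least one 4-gram more than once. The reported rates are all multiples of 0.025, which is consistent with per-generation fractions over 40 probes. The function names are hypothetical, and `loop_rate` is not reproduced because its definition is not given.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(generations, n):
    """Unique n-grams divided by total n-grams, pooled over all generations.

    Assumption: whitespace tokenization; the benchmark may tokenize differently.
    """
    pooled = [g for text in generations for g in ngrams(text.split(), n)]
    if not pooled:
        return 0.0
    return len(set(pooled)) / len(pooled)


def has_repeated_ngram(text, n=4):
    """True if any n-gram occurs more than once in a single generation."""
    counts = Counter(ngrams(text.split(), n))
    return any(c > 1 for c in counts.values())


def repeated_ngram_rate(generations, n=4):
    """Fraction of generations that contain at least one repeated n-gram."""
    if not generations:
        return 0.0
    return sum(has_repeated_ngram(t, n) for t in generations) / len(generations)


# Two short strings in the style of the probe outputs above (illustrative only).
samples = [
    "the pilot was not a pilot. The pilot was not a pilot.",
    "è un'ottima scelta.",
]
print(distinct_n(samples, 1), distinct_n(samples, 2), repeated_ngram_rate(samples))
```

Under these assumptions a higher `distinct_n` and a lower repeated 4-gram rate both indicate less degenerate repetition, which matches the direction in which the table is usually read.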
+## Language-switch diagnostics
+| checkpoint_name | language_switch_rate_en | language_switch_rate_it | language_consistency_en | language_consistency_it |
+| --- | --- | --- | --- | --- |
+| step_4000 | 0.0000 | 0.0500 | 0.9500 | 0.8500 |
+| step_5000 | 0.0000 | 0.0000 | 0.8750 | 0.7500 |
+| step_6000 | 0.0000 | 0.0500 | 0.9250 | 0.8250 |
+| step_7000 | 0.0000 | 0.0000 | 0.9250 | 0.7500 |
+| step_8000 | 0.0000 | 0.0000 | 1.0000 | 0.7750 |
+## Checkpoint recommendation impact
+- Recommended checkpoint: use `step_4000` based on `val_loss_mixed`.
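
As a rough illustration of the selection rule, the sketch below picks the checkpoint with the lowest `val_loss_mixed`. The values are the ones quoted in the release note of this commit; the dictionary and the `recommend` helper are hypothetical names, not the benchmark's own code, and how `val_loss_mixed` itself is aggregated is not shown here.

```python
# val_loss_mixed per checkpoint, as quoted in the release note of this commit.
val_loss_mixed = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}


def recommend(scores):
    """Return the checkpoint name with the lowest loss (lower is better)."""
    return min(scores, key=scores.get)


# Full ranking, best first.
ranking = sorted(val_loss_mixed, key=val_loss_mixed.get)

print(recommend(val_loss_mixed))
print(ranking)
```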

benchmark_scores.json ADDED

	@@ -0,0 +1 @@


+ []

benchmark_source_losses.json ADDED

	@@ -0,0 +1,309 @@

+{
+  "checkpoints": {
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_4000-step_4000-4000": {
+      "books_en": {
+        "loss": 4.994737534295945,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
+        "perplexity": 147.6341913859129
+      },
+      "books_it": {
+        "loss": 5.027433122907366,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
+        "perplexity": 152.54095584476084
+      },
+      "code": {
+        "loss": 8.615218098958334,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 30,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
+        "perplexity": 5514.951287713211
+      },
+      "web_en": {
+        "loss": 6.153774060701069,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
+        "perplexity": 470.48969695119644
+      },
+      "web_it": {
+        "loss": 6.019744873046875,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
+        "perplexity": 411.47360432720393
+      },
+      "wiki_en": {
+        "loss": 3.8654462640935723,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 22,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
+        "perplexity": 47.72456544112049
+      },
+      "wiki_it": {
+        "loss": 3.56423828125,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 25,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
+        "perplexity": 35.312544929652795
+      }
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_5000-step_5000-5000": {
+      "books_en": {
+        "loss": 5.295532953171503,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
+        "perplexity": 199.44389190139776
+      },
+      "books_it": {
+        "loss": 5.265413556780134,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
+        "perplexity": 193.52632636477796
+      },
+      "code": {
+        "loss": 8.621547444661458,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 30,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
+        "perplexity": 5549.968020553558
+      },
+      "web_en": {
+        "loss": 6.209454185084293,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
+        "perplexity": 497.4296726428682
+      },
+      "web_it": {
+        "loss": 6.207602249948602,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
+        "perplexity": 496.50931763649476
+      },
+      "wiki_en": {
+        "loss": 4.054272738370028,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 22,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
+        "perplexity": 57.64322604191456
+      },
+      "wiki_it": {
+        "loss": 3.58208984375,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 25,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
+        "perplexity": 35.94858933468458
+      }
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_6000-step_6000-6000": {
+      "books_en": {
+        "loss": 5.409920828683036,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
+        "perplexity": 223.6138831740883
+      },
+      "books_it": {
+        "loss": 5.199875967843192,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
+        "perplexity": 181.24975968230822
+      },
+      "code": {
+        "loss": 8.31853790283203,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 30,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
+        "perplexity": 4099.162251172332
+      },
+      "web_en": {
+        "loss": 6.131187037417763,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
+        "perplexity": 459.98185240790855
+      },
+      "web_it": {
+        "loss": 6.786579332853618,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
+        "perplexity": 885.878079061639
+      },
+      "wiki_en": {
+        "loss": 4.168480613014915,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 22,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
+        "perplexity": 64.617198952921
+      },
+      "wiki_it": {
+        "loss": 3.8566854858398436,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 25,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
+        "perplexity": 47.30828722907459
+      }
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_7000-step_7000-7000": {
+      "books_en": {
+        "loss": 5.137017386300223,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
+        "perplexity": 170.20734771999355
+      },
+      "books_it": {
+        "loss": 5.132158551897321,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
+        "perplexity": 169.38234430383022
+      },
+      "code": {
+        "loss": 8.338209533691407,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 30,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
+        "perplexity": 4180.59781690682
+      },
+      "web_en": {
+        "loss": 6.089872661389802,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
+        "perplexity": 441.36520473566287
+      },
+      "web_it": {
+        "loss": 6.3964783517937915,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
+        "perplexity": 599.72927903983
+      },
+      "wiki_en": {
+        "loss": 4.025867115367543,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 22,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
+        "perplexity": 56.02887121924366
+      },
+      "wiki_it": {
+        "loss": 3.61209228515625,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 25,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
+        "perplexity": 37.04347730722537
+      }
+    },
+    "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_8000-step_8000-8000": {
+      "books_en": {
+        "loss": 5.153700692313058,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
+        "perplexity": 173.0707884007461
+      },
+      "books_it": {
+        "loss": 5.1257749285016745,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 21,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
+        "perplexity": 168.30451509598072
+      },
+      "code": {
+        "loss": 8.328606669108073,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 30,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
+        "perplexity": 4140.644243596859
+      },
+      "web_en": {
+        "loss": 6.201997455797698,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
+        "perplexity": 493.73426916939
+      },
+      "web_it": {
+        "loss": 6.4544139661287,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 19,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
+        "perplexity": 635.501191880854
+      },
+      "wiki_en": {
+        "loss": 3.9959980357776987,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 22,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
+        "perplexity": 54.38008682172939
+      },
+      "wiki_it": {
+        "loss": 3.62702880859375,
+        "num_batches": 1,
+        "num_sequences": 1,
+        "num_tokens": 25,
+        "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
+        "perplexity": 37.60093091976498
+      }
+    }
+  },
+  "recommended_checkpoint": {
+    "checkpoint_name": "step_4000",
+    "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+    "direction": "min",
+    "value": 5.143982324844751
+  },
+  "source_losses": {
+    "books_en": 4.994737534295945,
+    "books_it": 5.027433122907366,
+    "code": 8.615218098958334,
+    "web_en": 6.153774060701069,
+    "web_it": 6.019744873046875,
+    "wiki_en": 3.8654462640935723,
+    "wiki_it": 3.56423828125
+  }
+}

best_validation.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "step": 8000,
+  "validation_loss": 3.882301174864477,
+  "validation_perplexity": 48.53577596923642,
+  "validation_num_batches": 128,
+  "elapsed_sec": 66536.9193212986,
+  "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt"
+}
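
The `validation_perplexity` field appears to be the exponential of `validation_loss`. The sketch below checks that relationship on the two values copied from `best_validation.json`; it assumes the loss is a mean cross-entropy in nats, which the file itself does not state.

```python
import math

# Values copied verbatim from best_validation.json.
validation_loss = 3.882301174864477
reported_perplexity = 48.53577596923642

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
recomputed_perplexity = math.exp(validation_loss)

# Relative gap between the recomputed and the reported value.
relative_gap = abs(recomputed_perplexity - reported_perplexity) / reported_perplexity
print(f"recomputed={recomputed_perplexity:.6f} relative_gap={relative_gap:.2e}")
```

The same check can be applied to the per-source `loss` / `perplexity` pairs in the benchmark files.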

comparison.json ADDED

	@@ -0,0 +1,404 @@

+{
+  "contains_mixed_model_types": false,
+  "metric_recommendations": {
+    "cloze_en_contains": {
+      "checkpoint_name": "step_7000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
+      "direction": "max",
+      "value": 0.04
+    },
+    "cloze_en_exact": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "max",
+      "value": 0.0
+    },
+    "cloze_it_contains": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "max",
+      "value": 0.12
+    },
+    "cloze_it_exact": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "max",
+      "value": 0.0
+    },
+    "distinct_1": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "max",
+      "value": 0.22893954410307235
+    },
+    "distinct_2": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "max",
+      "value": 0.47058823529411764
+    },
+    "language_consistency_en": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "max",
+      "value": 1.0
+    },
+    "language_consistency_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "max",
+      "value": 0.85
+    },
+    "language_switch_rate_en": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 0.0
+    },
+    "language_switch_rate_it": {
+      "checkpoint_name": "step_5000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt",
+      "direction": "min",
+      "value": 0.0
+    },
+    "loop_rate": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "min",
+      "value": 0.4
+    },
+    "ppl_en": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 119.05713337502907
+    },
+    "ppl_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 57.27657765139552
+    },
+    "ppl_mixed": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 171.39696944872787
+    },
+    "repeated_4gram_rate": {
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "direction": "min",
+      "value": 0.75
+    },
+    "source_loss_books_en": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 4.994737534295945
+    },
+    "source_loss_books_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 5.027433122907366
+    },
+    "source_loss_code": {
+      "checkpoint_name": "step_6000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt",
+      "direction": "min",
+      "value": 8.31853790283203
+    },
+    "source_loss_web_en": {
+      "checkpoint_name": "step_7000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
+      "direction": "min",
+      "value": 6.089872661389802
+    },
+    "source_loss_web_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 6.019744873046875
+    },
+    "source_loss_wiki_en": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 3.8654462640935723
+    },
+    "source_loss_wiki_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 3.56423828125
+    },
+    "val_loss_en": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 4.779603490289652
+    },
+    "val_loss_it": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 4.047891773161341
+    },
+    "val_loss_mixed": {
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "direction": "min",
+      "value": 5.143982324844751
+    }
+  },
+  "recommended_checkpoint": {
+    "checkpoint_name": "step_4000",
+    "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+    "direction": "min",
+    "value": 5.143982324844751
+  },
+  "recommended_metric": "val_loss_mixed",
+  "rows": [
+    {
+      "aggregate_dataset_count": 0,
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_4000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+      "checkpoint_selector": "step_4000",
+      "checkpoint_step": 4000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.08,
+      "cloze_it_exact": 0.0,
+      "delta_vs_previous_generation_pass_rate": null,
+      "delta_vs_previous_validation_loss_mean": null,
+      "distinct_1": 0.20633397312859886,
+      "distinct_2": 0.4251497005988024,
+      "generation_pass_rate": null,
+      "generation_pass_rate_regression_vs_previous": false,
+      "generation_passed_prompts": 0,
+      "generation_scored_prompts": 0,
+      "generation_total_prompts": 40,
+      "language_consistency_en": 0.95,
+      "language_consistency_it": 0.85,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.05,
+      "loop_rate": 0.725,
+      "model_type": "pretrained",
+      "ppl_en": 119.05713337502907,
+      "ppl_it": 57.27657765139552,
+      "ppl_mixed": 171.39696944872787,
+      "repeated_4gram_rate": 0.9,
+      "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "selected_by": "step_4000",
+      "selection_metric_name": null,
+      "selection_metric_value": null,
+      "source_loss_books_en": 4.994737534295945,
+      "source_loss_books_it": 5.027433122907366,
+      "source_loss_code": 8.615218098958334,
+      "source_loss_web_en": 6.153774060701069,
+      "source_loss_web_it": 6.019744873046875,
+      "source_loss_wiki_en": 3.8654462640935723,
+      "source_loss_wiki_it": 3.56423828125,
+      "val_loss_en": 4.779603490289652,
+      "val_loss_it": 4.047891773161341,
+      "val_loss_mixed": 5.143982324844751,
+      "validation_loss_regression_vs_previous": false
+    },
+    {
+      "aggregate_dataset_count": 0,
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_5000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt",
+      "checkpoint_selector": "step_5000",
+      "checkpoint_step": 5000,
+      "cloze_en_contains": 0.02,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.1,
+      "cloze_it_exact": 0.0,
+      "delta_vs_previous_generation_pass_rate": null,
+      "delta_vs_previous_validation_loss_mean": null,
+      "distinct_1": 0.2099009900990099,
+      "distinct_2": 0.42018537590113286,
+      "generation_pass_rate": null,
+      "generation_pass_rate_regression_vs_previous": false,
+      "generation_passed_prompts": 0,
+      "generation_scored_prompts": 0,
+      "generation_total_prompts": 40,
+      "language_consistency_en": 0.875,
+      "language_consistency_it": 0.75,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.575,
+      "model_type": "pretrained",
+      "ppl_en": 156.06746197419037,
+      "ppl_it": 67.12826498983866,
+      "ppl_mixed": 213.81993054234522,
+      "repeated_4gram_rate": 0.85,
+      "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "selected_by": "step_5000",
+      "selection_metric_name": null,
+      "selection_metric_value": null,
+      "source_loss_books_en": 5.295532953171503,
+      "source_loss_books_it": 5.265413556780134,
+      "source_loss_code": 8.621547444661458,
+      "source_loss_web_en": 6.209454185084293,
+      "source_loss_web_it": 6.207602249948602,
+      "source_loss_wiki_en": 4.054272738370028,
+      "source_loss_wiki_it": 3.58208984375,
+      "val_loss_en": 5.050288362323113,
+      "val_loss_it": 4.206605192090644,
+      "val_loss_mixed": 5.365134214743589,
+      "validation_loss_regression_vs_previous": false
+    },
+    {
+      "aggregate_dataset_count": 0,
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_6000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt",
+      "checkpoint_selector": "step_6000",
+      "checkpoint_step": 6000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.06,
+      "cloze_it_exact": 0.0,
+      "delta_vs_previous_generation_pass_rate": null,
+      "delta_vs_previous_validation_loss_mean": null,
+      "distinct_1": 0.22282023681377824,
+      "distinct_2": 0.4377104377104377,
+      "generation_pass_rate": null,
+      "generation_pass_rate_regression_vs_previous": false,
+      "generation_passed_prompts": 0,
+      "generation_scored_prompts": 0,
+      "generation_total_prompts": 40,
+      "language_consistency_en": 0.925,
+      "language_consistency_it": 0.825,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.05,
+      "loop_rate": 0.525,
+      "model_type": "pretrained",
+      "ppl_en": 192.97065258207533,
+      "ppl_it": 78.20997526013538,
+      "ppl_mixed": 253.77314221335507,
+      "repeated_4gram_rate": 0.775,
+      "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "selected_by": "step_6000",
+      "selection_metric_name": null,
+      "selection_metric_value": null,
+      "source_loss_books_en": 5.409920828683036,
+      "source_loss_books_it": 5.199875967843192,
+      "source_loss_code": 8.31853790283203,
+      "source_loss_web_en": 6.131187037417763,
+      "source_loss_web_it": 6.786579332853618,
+      "source_loss_wiki_en": 4.168480613014915,
+      "source_loss_wiki_it": 3.8566854858398436,
+      "val_loss_en": 5.262538118182488,
+      "val_loss_it": 4.359397200287366,
+      "val_loss_mixed": 5.536440727038261,
+      "validation_loss_regression_vs_previous": false
+    },
+    {
+      "aggregate_dataset_count": 0,
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_7000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
+      "checkpoint_selector": "step_7000",
+      "checkpoint_step": 7000,
+      "cloze_en_contains": 0.04,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.1,
+      "cloze_it_exact": 0.0,
+      "delta_vs_previous_generation_pass_rate": null,
+      "delta_vs_previous_validation_loss_mean": null,
+      "distinct_1": 0.21511627906976744,
+      "distinct_2": 0.4431017119838872,
+      "generation_pass_rate": null,
+      "generation_pass_rate_regression_vs_previous": false,
+      "generation_passed_prompts": 0,
+      "generation_scored_prompts": 0,
+      "generation_total_prompts": 40,
+      "language_consistency_en": 0.925,
+      "language_consistency_it": 0.75,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.55,
+      "model_type": "pretrained",
+      "ppl_en": 154.45723780849602,
+      "ppl_it": 61.331402298576066,
+      "ppl_mixed": 206.7071249834935,
+      "repeated_4gram_rate": 0.875,
+      "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "selected_by": "step_7000",
+      "selection_metric_name": null,
+      "selection_metric_value": null,
+      "source_loss_books_en": 5.137017386300223,
+      "source_loss_books_it": 5.132158551897321,
+      "source_loss_code": 8.338209533691407,
+      "source_loss_web_en": 6.089872661389802,
+      "source_loss_web_it": 6.3964783517937915,
+      "source_loss_wiki_en": 4.025867115367543,
+      "source_loss_wiki_it": 3.61209228515625,
+      "val_loss_en": 5.03991728008918,
+      "val_loss_it": 4.116291984182889,
+      "val_loss_mixed": 5.331302936260517,
+      "validation_loss_regression_vs_previous": false
+    },
+    {
+      "aggregate_dataset_count": 0,
+      "aggregate_validation_loss_mean": null,
+      "aggregate_validation_perplexity_mean": null,
+      "checkpoint_name": "step_8000",
+      "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+      "checkpoint_selector": "step_8000",
+      "checkpoint_step": 8000,
+      "cloze_en_contains": 0.0,
+      "cloze_en_exact": 0.0,
+      "cloze_it_contains": 0.12,
+      "cloze_it_exact": 0.0,
+      "delta_vs_previous_generation_pass_rate": null,
+      "delta_vs_previous_validation_loss_mean": null,
+      "distinct_1": 0.22893954410307235,
+      "distinct_2": 0.47058823529411764,
+      "generation_pass_rate": null,
+      "generation_pass_rate_regression_vs_previous": false,
+      "generation_passed_prompts": 0,
+      "generation_scored_prompts": 0,
+      "generation_total_prompts": 40,
+      "language_consistency_en": 1.0,
+      "language_consistency_it": 0.775,
+      "language_switch_rate_en": 0.0,
+      "language_switch_rate_it": 0.0,
+      "loop_rate": 0.4,
+      "model_type": "pretrained",
+      "ppl_en": 147.3507568259974,
+      "ppl_it": 62.831334211250244,
+      "ppl_mixed": 219.85916404362194,
+      "repeated_4gram_rate": 0.75,
+      "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+      "selected_by": "step_8000",
+      "selection_metric_name": null,
+      "selection_metric_value": null,
+      "source_loss_books_en": 5.153700692313058,
+      "source_loss_books_it": 5.1257749285016745,
+      "source_loss_code": 8.328606669108073,
+      "source_loss_web_en": 6.201997455797698,
+      "source_loss_web_it": 6.4544139661287,
+      "source_loss_wiki_en": 3.9959980357776987,
+      "source_loss_wiki_it": 3.62702880859375,
+      "val_loss_en": 4.992815845417526,
+      "val_loss_it": 4.140453901447233,
+      "val_loss_mixed": 5.392987177922175,
+      "validation_loss_regression_vs_previous": false
+    }
+  ]
+}
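
The `recommended_checkpoint` entry can be reproduced from the `rows` array by taking the minimum of `recommended_metric` (`val_loss_mixed`). The following is a minimal sketch with the five values copied from above; the helper name `pick` is hypothetical, and first-row-wins tie-breaking is an assumption that happens to match the tied metrics in `metric_recommendations`, not a documented behavior of the benchmark tool.

```python
# val_loss_mixed per checkpoint, copied from the "rows" array of comparison.json.
rows = [
    {"checkpoint_name": "step_4000", "val_loss_mixed": 5.143982324844751},
    {"checkpoint_name": "step_5000", "val_loss_mixed": 5.365134214743589},
    {"checkpoint_name": "step_6000", "val_loss_mixed": 5.536440727038261},
    {"checkpoint_name": "step_7000", "val_loss_mixed": 5.331302936260517},
    {"checkpoint_name": "step_8000", "val_loss_mixed": 5.392987177922175},
]


def pick(rows, metric, direction):
    """Hypothetical helper: return the row with the best value of `metric`.

    `direction` is "min" or "max", mirroring the "direction" field in the file.
    Python's min/max keep the first row on ties (an assumed tie-break rule).
    """
    chooser = min if direction == "min" else max
    return chooser(rows, key=lambda row: row[metric])


best = pick(rows, "val_loss_mixed", "min")
ranking = [r["checkpoint_name"] for r in sorted(rows, key=lambda r: r["val_loss_mixed"])]
print(best["checkpoint_name"], ranking)
```

Sorting the same rows gives the full ordering by `val_loss_mixed`, which places `step_8000` fourth of five.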

eval_metrics.jsonl ADDED

	@@ -0,0 +1,8 @@

+{"step": 1000, "validation_loss": 5.482670289795606, "validation_perplexity": 240.48802345829137, "validation_num_batches": 128, "elapsed_sec": 8324.719261169434}
+{"step": 2000, "validation_loss": 4.5780933100779375, "validation_perplexity": 97.3286416369149, "validation_num_batches": 128, "elapsed_sec": 16636.953268051147}
+{"step": 3000, "validation_loss": 4.2193704548876845, "validation_perplexity": 67.99066762083781, "validation_num_batches": 128, "elapsed_sec": 24949.2317006588}
+{"step": 4000, "validation_loss": 4.019869025491055, "validation_perplexity": 55.69381087954878, "validation_num_batches": 128, "elapsed_sec": 33273.656369924545}
+{"step": 5000, "validation_loss": 4.078961359527535, "validation_perplexity": 59.08407086238957, "validation_num_batches": 128, "elapsed_sec": 41597.64162111282}
+{"step": 6000, "validation_loss": 4.016584710055898, "validation_perplexity": 55.51119488525109, "validation_num_batches": 128, "elapsed_sec": 49907.71303868294}
+{"step": 7000, "validation_loss": 3.911746149875966, "validation_perplexity": 49.98615913843908, "validation_num_batches": 128, "elapsed_sec": 58223.721999168396}
+{"step": 8000, "validation_loss": 3.882301174864477, "validation_perplexity": 48.53577596923642, "validation_num_batches": 128, "elapsed_sec": 66536.9193212986}

eval_summary.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "comparison_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/comparison.json",
+  "metadata_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/eval_metadata.json",
+  "num_checkpoints": 5,
+  "out_dir": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark",
+  "recommended_checkpoint": {
+    "checkpoint_name": "step_4000",
+    "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+    "direction": "min",
+    "value": 5.143982324844751
+  },
+  "report_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/report.md",
+  "suite": "pretrain_minimal_en_it_webwiki_step11000"
+}

metrics.jsonl ADDED

The diff for this file is too large to render.

probe_generations.jsonl ADDED

The diff for this file is too large to render.

step_8000.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a7e59f0f4d19633d1ef2d12e900034208d1b8012bcc9bfd6afd8f9cd6d870fae
+size 1633717975

step_8000.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c863cd9e3892ad926c0ce38e5c3d996e571e3dc688f45a8f95da99892e1199fd
+size 544530872

step_8000.safetensors.json ADDED

	@@ -0,0 +1,289 @@

+{
+  "checkpoint_config": {
+    "actual_precision": "bf16",
+    "adamw_betas": [
+      0.9,
+      0.95
+    ],
+    "adamw_eps": 1e-08,
+    "attention_kernel_policy": "auto",
+    "batch_size": 6,
+    "benchmark": {
+      "enable_central_tensorboard": true,
+      "enable_local_tensorboard": true,
+      "enabled": false,
+      "output_path": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/throughput_benchmark.json",
+      "warmup_steps": 0
+    },
+    "checkpoint_dir": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+    "clip_grad_norm": 1.0,
+    "compile": {
+      "backend": null,
+      "compile_setup_sec": 0.0,
+      "diagnostic": null,
+      "dynamic": false,
+      "enabled": false,
+      "error_policy": "raise",
+      "fullgraph": false,
+      "mode": null,
+      "requested": false,
+      "status": "disabled"
+    },
+    "dataset": {
+      "storage_mode": "indexed_jsonl"
+    },
+    "decay_steps": 5000,
+    "deterministic_algorithms": false,
+    "device": "cuda",
+    "dim": 768,
+    "final_lr": 5e-06,
+    "fp8_backend": null,
+    "grad_accum_steps": 16,
+    "learning_rate": 0.0003,
+    "logging": {
+      "enable_central_tensorboard": true,
+      "enable_local_tensorboard": true,
+      "metrics_flush_every_steps": 1,
+      "metrics_writer": "persistent_jsonl_handle"
+    },
+    "lr": 0.0003,
+    "lr_schedule": "wsd",
+    "max_seq_len": 2500,
+    "max_steps": 8000,
+    "n_heads": 12,
+    "n_layers": 12,
+    "optimizer": {
+      "backend": "torch",
+      "betas": [
+        0.9,
+        0.95
+      ],
+      "eps": 1e-08,
+      "implementation": "torch.optim.AdamW",
+      "learning_rate": 0.0003,
+      "state_precision": "full_precision",
+      "type": "adamw",
+      "weight_decay": 0.1
+    },
+    "optimizer_backend": "torch",
+    "optimizer_implementation": "torch.optim.AdamW",
+    "optimizer_state_precision": "full_precision",
+    "optimizer_type": "adamw",
+    "peak_lr": 0.0003,
+    "repro": {
+      "attention_kernel_policy": "auto",
+      "cublas_workspace_config": null,
+      "cudnn_benchmark": true,
+      "cudnn_deterministic": false,
+      "deterministic_algorithms": false,
+      "flash_sdp_enabled": true,
+      "math_sdp_enabled": true,
+      "mem_efficient_sdp_enabled": true,
+      "pythonhashseed": "1337",
+      "seed": 1337
+    },
+    "requested_precision": "bf16",
+    "resume_from": null,
+    "resume_mode": "full",
+    "save_every_steps": 500,
+    "scheduler": {
+      "decay_steps": 5000,
+      "final_lr": 5e-06,
+      "peak_lr": 0.0003,
+      "schedule_type": "wsd",
+      "stable_steps": 2500,
+      "total_steps": 8000,
+      "warmup_steps": 500
+    },
+    "seed": 1337,
+    "stable_steps": 2500,
+    "train_cache_ram_bytes": 1073741824,
+    "train_cache_ram_mb": 1024,
+    "vocab_size": 32000,
+    "warmup_steps": 500,
+    "weight_decay": 0.1
+  },
+  "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+  "exported_at": "2026-06-06T23:15:39.908355+00:00",
+  "format": "llm-nanochat-safetensors-export",
+  "global_step": 8000,
+  "metadata_path": "/mnt/apps/llm-nanochat/hf_exports/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000/step_8000.safetensors.json",
+  "model_config": {
+    "dim": 768,
+    "max_seq_len": 2500,
+    "n_heads": 12,
+    "n_layers": 12,
+    "vocab_size": 32000
+  },
+  "num_parameters": 136128000,
+  "num_tensors": 149,
+  "provenance": {
+    "checkpoint_dir": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
+    "checkpoint_name": "step_8000.pt",
+    "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+    "global_step": 8000,
+    "packed_dataset_config_path": null,
+    "run_dir": "/mnt/apps/llm-nanochat/checkpoints",
+    "tokenizer_dir": null,
+    "training_config_path": null
+  },
+  "safetensors_path": "/mnt/apps/llm-nanochat/hf_exports/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000/step_8000.safetensors",
+  "source_checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
+  "source_global_step": 8000,
+  "tensor_names": [
+    "token_emb.weight",
+    "pos_emb.weight",
+    "blocks.layers.0.self_attn.in_proj_weight",
+    "blocks.layers.0.self_attn.in_proj_bias",
+    "blocks.layers.0.self_attn.out_proj.weight",
+    "blocks.layers.0.self_attn.out_proj.bias",
+    "blocks.layers.0.linear1.weight",
+    "blocks.layers.0.linear1.bias",
+    "blocks.layers.0.linear2.weight",
+    "blocks.layers.0.linear2.bias",
+    "blocks.layers.0.norm1.weight",
+    "blocks.layers.0.norm1.bias",
+    "blocks.layers.0.norm2.weight",
+    "blocks.layers.0.norm2.bias",
+    "blocks.layers.1.self_attn.in_proj_weight",
+    "blocks.layers.1.self_attn.in_proj_bias",
+    "blocks.layers.1.self_attn.out_proj.weight",
+    "blocks.layers.1.self_attn.out_proj.bias",
+    "blocks.layers.1.linear1.weight",
+    "blocks.layers.1.linear1.bias",
+    "blocks.layers.1.linear2.weight",
+    "blocks.layers.1.linear2.bias",
+    "blocks.layers.1.norm1.weight",
+    "blocks.layers.1.norm1.bias",
+    "blocks.layers.1.norm2.weight",
+    "blocks.layers.1.norm2.bias",
+    "blocks.layers.2.self_attn.in_proj_weight",
+    "blocks.layers.2.self_attn.in_proj_bias",
+    "blocks.layers.2.self_attn.out_proj.weight",
+    "blocks.layers.2.self_attn.out_proj.bias",
+    "blocks.layers.2.linear1.weight",
+    "blocks.layers.2.linear1.bias",
+    "blocks.layers.2.linear2.weight",
+    "blocks.layers.2.linear2.bias",
+    "blocks.layers.2.norm1.weight",
+    "blocks.layers.2.norm1.bias",
+    "blocks.layers.2.norm2.weight",
+    "blocks.layers.2.norm2.bias",
+    "blocks.layers.3.self_attn.in_proj_weight",
+    "blocks.layers.3.self_attn.in_proj_bias",
+    "blocks.layers.3.self_attn.out_proj.weight",
+    "blocks.layers.3.self_attn.out_proj.bias",
+    "blocks.layers.3.linear1.weight",
+    "blocks.layers.3.linear1.bias",
+    "blocks.layers.3.linear2.weight",
+    "blocks.layers.3.linear2.bias",
+    "blocks.layers.3.norm1.weight",
+    "blocks.layers.3.norm1.bias",
+    "blocks.layers.3.norm2.weight",
+    "blocks.layers.3.norm2.bias",
+    "blocks.layers.4.self_attn.in_proj_weight",
+    "blocks.layers.4.self_attn.in_proj_bias",
+    "blocks.layers.4.self_attn.out_proj.weight",
+    "blocks.layers.4.self_attn.out_proj.bias",
+    "blocks.layers.4.linear1.weight",
+    "blocks.layers.4.linear1.bias",
+    "blocks.layers.4.linear2.weight",
+    "blocks.layers.4.linear2.bias",
+    "blocks.layers.4.norm1.weight",
+    "blocks.layers.4.norm1.bias",
+    "blocks.layers.4.norm2.weight",
+    "blocks.layers.4.norm2.bias",
+    "blocks.layers.5.self_attn.in_proj_weight",
+    "blocks.layers.5.self_attn.in_proj_bias",
+    "blocks.layers.5.self_attn.out_proj.weight",
+    "blocks.layers.5.self_attn.out_proj.bias",
+    "blocks.layers.5.linear1.weight",
+    "blocks.layers.5.linear1.bias",
+    "blocks.layers.5.linear2.weight",
+    "blocks.layers.5.linear2.bias",
+    "blocks.layers.5.norm1.weight",
+    "blocks.layers.5.norm1.bias",
+    "blocks.layers.5.norm2.weight",
+    "blocks.layers.5.norm2.bias",
+    "blocks.layers.6.self_attn.in_proj_weight",
+    "blocks.layers.6.self_attn.in_proj_bias",
+    "blocks.layers.6.self_attn.out_proj.weight",
+    "blocks.layers.6.self_attn.out_proj.bias",
+    "blocks.layers.6.linear1.weight",
+    "blocks.layers.6.linear1.bias",
+    "blocks.layers.6.linear2.weight",
+    "blocks.layers.6.linear2.bias",
+    "blocks.layers.6.norm1.weight",
+    "blocks.layers.6.norm1.bias",
+    "blocks.layers.6.norm2.weight",
+    "blocks.layers.6.norm2.bias",
+    "blocks.layers.7.self_attn.in_proj_weight",
+    "blocks.layers.7.self_attn.in_proj_bias",
+    "blocks.layers.7.self_attn.out_proj.weight",
+    "blocks.layers.7.self_attn.out_proj.bias",
+    "blocks.layers.7.linear1.weight",
+    "blocks.layers.7.linear1.bias",
+    "blocks.layers.7.linear2.weight",
+    "blocks.layers.7.linear2.bias",
+    "blocks.layers.7.norm1.weight",
+    "blocks.layers.7.norm1.bias",
+    "blocks.layers.7.norm2.weight",
+    "blocks.layers.7.norm2.bias",
+    "blocks.layers.8.self_attn.in_proj_weight",
+    "blocks.layers.8.self_attn.in_proj_bias",
+    "blocks.layers.8.self_attn.out_proj.weight",
+    "blocks.layers.8.self_attn.out_proj.bias",
+    "blocks.layers.8.linear1.weight",
+    "blocks.layers.8.linear1.bias",
+    "blocks.layers.8.linear2.weight",
+    "blocks.layers.8.linear2.bias",
+    "blocks.layers.8.norm1.weight",
+    "blocks.layers.8.norm1.bias",
+    "blocks.layers.8.norm2.weight",
+    "blocks.layers.8.norm2.bias",
+    "blocks.layers.9.self_attn.in_proj_weight",
+    "blocks.layers.9.self_attn.in_proj_bias",
+    "blocks.layers.9.self_attn.out_proj.weight",
+    "blocks.layers.9.self_attn.out_proj.bias",
+    "blocks.layers.9.linear1.weight",
+    "blocks.layers.9.linear1.bias",
+    "blocks.layers.9.linear2.weight",
+    "blocks.layers.9.linear2.bias",
+    "blocks.layers.9.norm1.weight",
+    "blocks.layers.9.norm1.bias",
+    "blocks.layers.9.norm2.weight",
+    "blocks.layers.9.norm2.bias",
+    "blocks.layers.10.self_attn.in_proj_weight",
+    "blocks.layers.10.self_attn.in_proj_bias",
+    "blocks.layers.10.self_attn.out_proj.weight",
+    "blocks.layers.10.self_attn.out_proj.bias",
+    "blocks.layers.10.linear1.weight",
+    "blocks.layers.10.linear1.bias",
+    "blocks.layers.10.linear2.weight",
+    "blocks.layers.10.linear2.bias",
+    "blocks.layers.10.norm1.weight",
+    "blocks.layers.10.norm1.bias",
+    "blocks.layers.10.norm2.weight",
+    "blocks.layers.10.norm2.bias",
+    "blocks.layers.11.self_attn.in_proj_weight",
+    "blocks.layers.11.self_attn.in_proj_bias",
+    "blocks.layers.11.self_attn.out_proj.weight",
+    "blocks.layers.11.self_attn.out_proj.bias",
+    "blocks.layers.11.linear1.weight",
+    "blocks.layers.11.linear1.bias",
+    "blocks.layers.11.linear2.weight",
+    "blocks.layers.11.linear2.bias",
+    "blocks.layers.11.norm1.weight",
+    "blocks.layers.11.norm1.bias",
+    "blocks.layers.11.norm2.weight",
+    "blocks.layers.11.norm2.bias",
+    "ln_f.weight",
+    "ln_f.bias",
+    "head.weight"
+  ],
+  "tokenizer_reference": {
+    "packed_dataset_config_path": null,
+    "tokenizer_dir": null,
+    "training_config_path": null
+  }
+}
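
The `tensor_names` list above follows a regular pattern: two embedding tables, twelve identical per-layer groups of twelve tensors, then a final norm and output head. The per-layer names are consistent with what PyTorch's `nn.TransformerEncoderLayer` exposes, though the diff does not confirm which module class the run actually uses. A minimal sketch for regenerating the expected list and diffing it against a manifest follows; the helper names `expected_tensor_names` and `diff_tensor_names` are hypothetical, not part of the repo.

```python
from __future__ import annotations

# Suffixes observed for every `blocks.layers.N.` group in the manifest above.
PER_LAYER_SUFFIXES = [
    "self_attn.in_proj_weight",
    "self_attn.in_proj_bias",
    "self_attn.out_proj.weight",
    "self_attn.out_proj.bias",
    "linear1.weight",
    "linear1.bias",
    "linear2.weight",
    "linear2.bias",
    "norm1.weight",
    "norm1.bias",
    "norm2.weight",
    "norm2.bias",
]


def expected_tensor_names(n_layers: int = 12) -> list[str]:
    """Rebuild the name list in manifest order (hypothetical helper)."""
    names = ["token_emb.weight", "pos_emb.weight"]
    for layer in range(n_layers):
        names.extend(f"blocks.layers.{layer}.{suffix}" for suffix in PER_LAYER_SUFFIXES)
    names.extend(["ln_f.weight", "ln_f.bias", "head.weight"])
    return names


def diff_tensor_names(manifest_names: list[str], n_layers: int = 12) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) names relative to the expected layout."""
    expected = expected_tensor_names(n_layers)
    present = set(manifest_names)
    wanted = set(expected)
    missing = [name for name in expected if name not in present]
    unexpected = [name for name in manifest_names if name not in wanted]
    return missing, unexpected
```

To use it, load `step_8000.safetensors.json` with `json.load` and pass its `tensor_names` value to `diff_tensor_names`; two empty lists mean the export matches the 12-layer layout.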

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_meta.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "vocab_size_requested": 32000,
+  "vocab_size_actual": 32000,
+  "special_tokens": [
+    "<pad>",
+    "<bos>",
+    "<eos>",
+    "<unk>"
+  ]
+}
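
The metadata above only helps if it agrees with the model's embedding size (`model.vocab_size: 32000` in `training_config.yaml`). A small sanity check, written as a sketch, is shown below; the function name `check_tokenizer_meta` and the wording of its messages are assumptions for illustration, not code from the repo.

```python
from __future__ import annotations


def check_tokenizer_meta(meta: dict, model_vocab_size: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    requested = meta["vocab_size_requested"]
    actual = meta["vocab_size_actual"]
    if actual != requested:
        problems.append(f"tokenizer trained {actual} tokens, {requested} requested")
    if actual != model_vocab_size:
        problems.append(f"tokenizer has {actual} tokens, model expects {model_vocab_size}")
    specials = meta.get("special_tokens", [])
    if len(set(specials)) != len(specials):
        problems.append("duplicate entries in special_tokens")
    return problems


# Values copied from tokenizer_meta.json above.
meta = {
    "vocab_size_requested": 32000,
    "vocab_size_actual": 32000,
    "special_tokens": ["<pad>", "<bos>", "<eos>", "<unk>"],
}
problems = check_tokenizer_meta(meta, model_vocab_size=32000)
```

With the values from this commit, `problems` is empty; a mismatched `model_vocab_size` would add one entry describing the gap.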

training_config.yaml ADDED

	@@ -0,0 +1,61 @@

+# Fresh GPT2-small web/wiki run with WSD short-fast decay to 8k at peak LR 3e-4.
+# Goal: keep the useful high-LR early learning phase, but compress the fresh
+# web/wiki benchmark into an even shorter 8k-step run while preserving the same
+# short-fast-decay shape as the 11k variant.
+# Schedule: warmup 500, stable 2500, decay 5000, final_lr 5e-6.
+# No resume semantics: random weights, fresh optimizer, fresh scheduler.
+dataset_dir: /mnt/apps/llm-nanochat/datasets/202605141153_fineweb50_wiki50_50en_50it_score100_2500context_5Btokens_tok_20260515_en50it50_webwiki_stratified_500M
+output_dir: /mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki
+tokenizer_dir: /mnt/apps/llm-nanochat/tokenizers/tokenizer_20260515_en50it50_webwiki_stratified_500M
+seed: 1337
+model:
+  vocab_size: 32000
+  dim: 768
+  n_layers: 12
+  n_heads: 12
+training:
+  sequence_length: 2500
+  max_steps: 8000
+  batch_size: 6
+  grad_accum_steps: 16
+  learning_rate: 0.0003
+  peak_lr: 0.0003
+  lr_schedule: wsd
+  warmup_steps: 500
+  stable_steps: 2500
+  decay_steps: 5000
+  final_lr: 5.0e-06
+  adamw_betas:
+    - 0.9
+    - 0.95
+  adamw_eps: 1.0e-08
+  weight_decay: 0.1
+  clip_grad_norm: 1.0
+  save_every_steps: 500
+  checkpoint_dir: /mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki
+  precision: bf16
+evaluation:
+  validation_every_steps: 1000
+  validation_max_batches: 128
+  probe_every_steps: 1000
+  probe_tokenizer_dir: /mnt/apps/llm-nanochat/tokenizers/tokenizer_20260515_en50it50_webwiki_stratified_500M
+  probe_max_new_tokens: 32
+  probe_prompts:
+    en:
+      - prompt: "The capital of Italy is"
+        expected_next_text: " Rome"
+      - prompt: "A small language model should"
+        expected_next_text: " be"
+    it:
+      - prompt: "La capitale d'Italia è"
+        expected_next_text: " Roma"
+      - prompt: "Un piccolo modello linguistico dovrebbe"
+        expected_next_text: " essere"
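
The schedule fields in the config (`warmup_steps: 500`, `stable_steps: 2500`, `decay_steps: 5000`, `peak_lr: 0.0003`, `final_lr: 5.0e-06`) add up to the full 8000-step run. The sketch below shows one way such a warmup-stable-decay curve could be computed. Two things here are assumptions, since the config does not state them: warmup is taken as linear from zero, and decay is taken as linear from `peak_lr` to `final_lr`. The function names `wsd_lr` and `tokens_seen` are hypothetical. The token estimate assumes every sequence is fully packed to `sequence_length`.

```python
def wsd_lr(
    step: int,
    peak_lr: float = 3e-4,
    final_lr: float = 5e-6,
    warmup_steps: int = 500,
    stable_steps: int = 2500,
    decay_steps: int = 5000,
) -> float:
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Assumed shape: linear warmup from 0, flat at peak_lr, then linear
    decay to final_lr. The real trainer may use a different curve.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_start = warmup_steps + stable_steps
    if step < decay_start:
        return peak_lr
    # Clamp so steps past the end of the schedule stay at final_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (final_lr - peak_lr) * progress


def tokens_seen(steps: int, batch_size: int, grad_accum_steps: int, sequence_length: int) -> int:
    """Upper-bound token count: one optimizer step consumes batch * accum sequences."""
    return steps * batch_size * grad_accum_steps * sequence_length


# Values taken from the training section of the config above.
total_tokens = tokens_seen(steps=8000, batch_size=6, grad_accum_steps=16, sequence_length=2500)
```

Under these assumptions the decay phase starts at step 3000, and `total_tokens` evaluates to 1,920,000,000, which agrees with the `~1.92B` figure in the release note.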