nazdef committed on
Commit 7c6fc6d · verified · 1 Parent(s): d9005ca

Upload folder using huggingface_hub

2026-06-07_shortfastdecay8k_release_step8000.md ADDED
@@ -0,0 +1,39 @@
+ # 2026-06-07 - Release note for `step_8000` from the short-fast-decay 8k web/wiki run
+
+ ## Release candidate
+
+ - run: `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`
+ - config: `configs/testing/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki.yaml`
+ - chosen checkpoint: `/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt`
+ - intended HF repo: `nazdef/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000`
+ - estimated tokens seen: `~1.92B` (`8000 * 6 * 16 * 2500`)
+
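The token estimate in the list above can be re-derived with a few lines of Python. This is only a sketch: the four factors are copied from the note, but the meaning attached to each one is an assumption (the note does not label them), based on `bs6` in the run name and the context length of 2500 given in the model card.

```python
# Factors copied from the release note: `8000 * 6 * 16 * 2500`.
# The labels below are assumptions; the note itself does not name the factors.
optimizer_steps = 8000      # checkpoint step
micro_batch_size = 6        # assumed from "bs6" in the run name
accumulation_factor = 16    # assumed: gradient-accumulation steps or a similar multiplier
context_length = 2500       # context length listed in the model card

tokens_seen = optimizer_steps * micro_batch_size * accumulation_factor * context_length
print(f"{tokens_seen:,} tokens (~{tokens_seen / 1e9:.2f}B)")
```

The product is exactly 1,920,000,000, which matches the `~1.92B` figure. It is an upper bound on real text tokens if sequences are padded.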
+ ## Why this checkpoint exists
+
+ This is the run's final checkpoint and online-validation winner:
+
+ - `validation_loss=3.8823011749`
+ - `validation_perplexity=48.5357760`
+
+ But the repo-native CPU benchmark, which ranks checkpoints by `val_loss_mixed` (lower is better), does **not** choose it as the best comparable release checkpoint:
+
+ 1. `step_4000`
+    - `val_loss_mixed=5.1440`
+ 2. `step_7000`
+    - `val_loss_mixed=5.3313`
+ 3. `step_5000`
+    - `val_loss_mixed=5.3651`
+ 4. `step_8000`
+    - `val_loss_mixed=5.3930`
+ 5. `step_6000`
+    - `val_loss_mixed=5.5364`
+
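The ordering above is a plain ascending sort on `val_loss_mixed`. The sketch below reproduces it from the five values listed in this note; the dictionary and variable names are illustrative and are not taken from the benchmark code.

```python
# val_loss_mixed values copied from the ranking in this note (lower is better).
val_loss_mixed = {
    "step_4000": 5.1440,
    "step_5000": 5.3651,
    "step_6000": 5.5364,
    "step_7000": 5.3313,
    "step_8000": 5.3930,
}

# Sort ascending by loss; the first entry is the benchmark winner.
ranking = sorted(val_loss_mixed.items(), key=lambda item: item[1])
best_loss = ranking[0][1]

for rank, (name, loss) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {loss:.4f} (+{loss - best_loss:.4f} vs best)")
```

The gap column makes the caveat concrete: `step_8000` trails `step_4000` by roughly 0.25 nats on this benchmark even though it wins the run's online validation.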
+ ## Reading
+
+ - `step_8000` is the validation-selected and final checkpoint from the run.
+ - `step_4000` remains the benchmark-selected winner we expose by default.
+ - `step_8000` is still useful to publish because it is the cleaner late-run comparison point. Against `step_4000` it has:
+   - lower `loop_rate`
+   - lower `repeated_4gram_rate`
+   - higher `distinct_2`
+ - This note records a companion release, not a change in the benchmark ranking.
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ language:
+ - en
+ - it
+ license: other
+ library_name: custom
+ pipeline_tag: text-generation
+ tags:
+ - nanochat
+ - gpt2-small
+ - bilingual
+ - english
+ - italian
+ - pretraining
+ - webwiki
+ - wsd
+ - short-fast-decay
+ - validation-selected
+ - final-checkpoint
+ - lr3e4
+ ---
+
+ # gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000
+
+ This repo stages `step_8000.pt`, the final checkpoint and best online-validation checkpoint from the local NanoChat EN/IT GPT-2-small-like WSD short-fast-decay web/wiki run `20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`.
+
+ ## What this is
+
+ - model family: GPT-2-small-like decoder-only LM
+ - parameters: ~136M
+ - languages: English + Italian
+ - context length: 2500
+ - selected checkpoint: `step_8000.pt`
+ - tokens seen: `~1.92B`
+ - selection reason: best in-run online validation checkpoint and final saved checkpoint for this run
+ - status relative to the companion benchmark winner:
+   - this is the validation-selected release
+   - the repo-native benchmark winner from the same run is `step_4000`
+
+ ## Best in-run validation
+
+ - best saved validation step for the run: `8000`
+ - validation loss: `3.8823011749`
+ - validation perplexity: `48.535776`
+ - validation batches: `128`
+
+ This checkpoint matches the run's online validation winner.
+
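The loss and perplexity figures above are consistent with perplexity being the exponential of a mean cross-entropy loss measured in nats. A minimal check, assuming that relationship (the helper name `perplexity` is illustrative and not from the training code):

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity as exp(loss), assuming the loss is a mean cross-entropy in nats."""
    return math.exp(loss_nats)


reported_loss = 3.8823011749   # validation loss from the model card
reported_ppl = 48.535776       # validation perplexity from the model card

recomputed = perplexity(reported_loss)
print(f"recomputed={recomputed:.6f} reported={reported_ppl:.6f}")
```

The same relationship can be applied to the benchmark `val_loss_*` and `ppl_*` pairs when reading the benchmark tables.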
+ ## Repo-native benchmark context
+
+ Repo-native benchmark suite: `configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`
+
+ Metrics for this checkpoint:
+
+ - `val_loss_mixed`: `5.3930`
+ - `ppl_mixed`: `219.8592`
+ - `val_loss_en`: `4.9928`
+ - `ppl_en`: `147.3508`
+ - `val_loss_it`: `4.1405`
+ - `ppl_it`: `62.8313`
+ - `loop_rate`: `0.400`
+ - `repeated_4gram_rate`: `0.750`
+ - `distinct_2`: `0.4706`
+ - `cloze_en_contains`: `0.00`
+ - `cloze_it_contains`: `0.12`
+
+ Ranking among the benchmarked saved checkpoints from this run, by `val_loss_mixed` (lower is better):
+
+ 1. `step_4000` -> `mixed=5.1440`
+ 2. `step_7000` -> `mixed=5.3313`
+ 3. `step_5000` -> `mixed=5.3651`
+ 4. `step_8000` -> `mixed=5.3930`
+ 5. `step_6000` -> `mixed=5.5364`
+
+ Important caveat: this run produced two different winners:
+
+ - `step_8000` won the run's internal online validation
+ - `step_4000` won the external repo-native benchmark used to rank comparable releases
+
+ Operationally:
+
+ - `step_8000` is the cleaner final checkpoint on repetition/diversity surface metrics
+ - `step_4000` remains the checkpoint we promote as the benchmark winner
+
+ ## Surface-quality reading
+
+ Compared with `step_4000`, this final checkpoint is behaviorally cleaner on several surface metrics:
+
+ - `loop_rate`: `0.400` vs `0.725`
+ - `repeated_4gram_rate`: `0.750` vs `0.900`
+ - `distinct_2`: `0.4706` vs `0.4251`
+ - `language_consistency_en`: `1.00` vs `0.95`
+
+ But it loses on the primary benchmark metric:
+
+ - `val_loss_mixed`: `5.3930` vs `5.1440`
+
+ So this repo is the final/validation winner, not the benchmark-first winner.
+
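For readers unfamiliar with the repetition and diversity metrics named above, the sketch below shows one common way to compute a distinct-n ratio and a repeated-4-gram flag. It is an illustration only: the definitions (unique n-grams over total n-grams, and "any 4-gram occurs more than once"), the whitespace tokenization and the function names are all assumptions, and the benchmark's own implementation may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(tokens, n):
    """Assumed definition: unique n-grams divided by total n-grams."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


def has_repeated_ngram(tokens, n=4):
    """Assumed definition: True when any n-gram occurs more than once."""
    return any(count > 1 for count in Counter(ngrams(tokens, n)).values())


# Whitespace tokenization is an assumption made for this example.
sample = "The storm was over, and the storm was over, and the storm was over."
tokens = sample.lower().split()

print(f"distinct_2={distinct_n(tokens, 2):.4f}")
print(f"repeated_4gram={has_repeated_ngram(tokens, 4)}")
```

Under these assumed definitions, a rate such as `repeated_4gram_rate` would be the fraction of generations for which the flag is true, and a higher `distinct_2` means less bigram reuse.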
+ ## Source/domain losses for this checkpoint
+
+ - `source_loss_books_en`: `5.1537`
+ - `source_loss_books_it`: `5.1258`
+ - `source_loss_code`: `8.3286`
+ - `source_loss_web_en`: `6.2020`
+ - `source_loss_web_it`: `6.4544`
+ - `source_loss_wiki_en`: `3.9960`
+ - `source_loss_wiki_it`: `3.6270`
+
+ ## Training/data provenance
+
+ - training config: `training_config.yaml`
+ - tokenizer files:
+   - `tokenizer.json`
+   - `tokenizer_meta.json`
+ - checkpoint weights:
+   - `step_8000.pt`
+   - `step_8000.safetensors`
+ - telemetry:
+   - `best_validation.json`
+   - `metrics.jsonl`
+   - `eval_metrics.jsonl`
+   - `probe_generations.jsonl`
+ - benchmark bundle:
+   - `eval_summary.json`
+   - `comparison.json`
+   - `benchmark_report.md`
+   - `benchmark_metrics.json`
+   - `benchmark_scores.json`
+   - `benchmark_source_losses.json`
+
+ ## Limitations
+
+ - Generations are still visibly repetitive and templatey.
+ - This repo should not be read as evidence that free-form generation quality is solved.
+ - The main value of this checkpoint is as the run's final online-validation winner and as a comparison point against the benchmark-winning `step_4000`.
benchmark_metrics.json ADDED
@@ -0,0 +1,198 @@
+ {
+   "checkpoints": {
+     "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_4000-step_4000-4000": {
+       "aggregate_validation_loss_mean": null,
+       "aggregate_validation_perplexity_mean": null,
+       "checkpoint_name": "step_4000",
+       "checkpoint_selector": "step_4000",
+       "checkpoint_step": 4000,
+       "cloze_en_contains": 0.0,
+       "cloze_en_exact": 0.0,
+       "cloze_it_contains": 0.08,
+       "cloze_it_exact": 0.0,
+       "distinct_1": 0.20633397312859886,
+       "distinct_2": 0.4251497005988024,
+       "language_consistency_en": 0.95,
+       "language_consistency_it": 0.85,
+       "language_switch_rate_en": 0.0,
+       "language_switch_rate_it": 0.05,
+       "loop_rate": 0.725,
+       "ppl_en": 119.05713337502907,
+       "ppl_it": 57.27657765139552,
+       "ppl_mixed": 171.39696944872787,
+       "repeated_4gram_rate": 0.9,
+       "source_loss_books_en": 4.994737534295945,
+       "source_loss_books_it": 5.027433122907366,
+       "source_loss_code": 8.615218098958334,
+       "source_loss_web_en": 6.153774060701069,
+       "source_loss_web_it": 6.019744873046875,
+       "source_loss_wiki_en": 3.8654462640935723,
+       "source_loss_wiki_it": 3.56423828125,
+       "val_loss_en": 4.779603490289652,
+       "val_loss_it": 4.047891773161341,
+       "val_loss_mixed": 5.143982324844751
+     },
+     "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_5000-step_5000-5000": {
+       "aggregate_validation_loss_mean": null,
+       "aggregate_validation_perplexity_mean": null,
+       "checkpoint_name": "step_5000",
+       "checkpoint_selector": "step_5000",
+       "checkpoint_step": 5000,
+       "cloze_en_contains": 0.02,
+       "cloze_en_exact": 0.0,
+       "cloze_it_contains": 0.1,
+       "cloze_it_exact": 0.0,
+       "distinct_1": 0.2099009900990099,
+       "distinct_2": 0.42018537590113286,
+       "language_consistency_en": 0.875,
+       "language_consistency_it": 0.75,
+       "language_switch_rate_en": 0.0,
+       "language_switch_rate_it": 0.0,
+       "loop_rate": 0.575,
+       "ppl_en": 156.06746197419037,
+       "ppl_it": 67.12826498983866,
+       "ppl_mixed": 213.81993054234522,
+       "repeated_4gram_rate": 0.85,
+       "source_loss_books_en": 5.295532953171503,
+       "source_loss_books_it": 5.265413556780134,
+       "source_loss_code": 8.621547444661458,
+       "source_loss_web_en": 6.209454185084293,
+       "source_loss_web_it": 6.207602249948602,
+       "source_loss_wiki_en": 4.054272738370028,
+       "source_loss_wiki_it": 3.58208984375,
+       "val_loss_en": 5.050288362323113,
+       "val_loss_it": 4.206605192090644,
+       "val_loss_mixed": 5.365134214743589
+     },
+     "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_6000-step_6000-6000": {
+       "aggregate_validation_loss_mean": null,
+       "aggregate_validation_perplexity_mean": null,
+       "checkpoint_name": "step_6000",
+       "checkpoint_selector": "step_6000",
+       "checkpoint_step": 6000,
+       "cloze_en_contains": 0.0,
+       "cloze_en_exact": 0.0,
+       "cloze_it_contains": 0.06,
+       "cloze_it_exact": 0.0,
+       "distinct_1": 0.22282023681377824,
+       "distinct_2": 0.4377104377104377,
+       "language_consistency_en": 0.925,
+       "language_consistency_it": 0.825,
+       "language_switch_rate_en": 0.0,
+       "language_switch_rate_it": 0.05,
+       "loop_rate": 0.525,
+       "ppl_en": 192.97065258207533,
+       "ppl_it": 78.20997526013538,
+       "ppl_mixed": 253.77314221335507,
+       "repeated_4gram_rate": 0.775,
+       "source_loss_books_en": 5.409920828683036,
+       "source_loss_books_it": 5.199875967843192,
+       "source_loss_code": 8.31853790283203,
+       "source_loss_web_en": 6.131187037417763,
+       "source_loss_web_it": 6.786579332853618,
+       "source_loss_wiki_en": 4.168480613014915,
+       "source_loss_wiki_it": 3.8566854858398436,
+       "val_loss_en": 5.262538118182488,
+       "val_loss_it": 4.359397200287366,
+       "val_loss_mixed": 5.536440727038261
+     },
+     "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_7000-step_7000-7000": {
+       "aggregate_validation_loss_mean": null,
+       "aggregate_validation_perplexity_mean": null,
+       "checkpoint_name": "step_7000",
+       "checkpoint_selector": "step_7000",
+       "checkpoint_step": 7000,
+       "cloze_en_contains": 0.04,
+       "cloze_en_exact": 0.0,
+       "cloze_it_contains": 0.1,
+       "cloze_it_exact": 0.0,
+       "distinct_1": 0.21511627906976744,
+       "distinct_2": 0.4431017119838872,
+       "language_consistency_en": 0.925,
+       "language_consistency_it": 0.75,
+       "language_switch_rate_en": 0.0,
+       "language_switch_rate_it": 0.0,
+       "loop_rate": 0.55,
+       "ppl_en": 154.45723780849602,
+       "ppl_it": 61.331402298576066,
+       "ppl_mixed": 206.7071249834935,
+       "repeated_4gram_rate": 0.875,
+       "source_loss_books_en": 5.137017386300223,
+       "source_loss_books_it": 5.132158551897321,
+       "source_loss_code": 8.338209533691407,
+       "source_loss_web_en": 6.089872661389802,
+       "source_loss_web_it": 6.3964783517937915,
+       "source_loss_wiki_en": 4.025867115367543,
+       "source_loss_wiki_it": 3.61209228515625,
+       "val_loss_en": 5.03991728008918,
+       "val_loss_it": 4.116291984182889,
+       "val_loss_mixed": 5.331302936260517
+     },
+     "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_8000-step_8000-8000": {
+       "aggregate_validation_loss_mean": null,
+       "aggregate_validation_perplexity_mean": null,
+       "checkpoint_name": "step_8000",
+       "checkpoint_selector": "step_8000",
+       "checkpoint_step": 8000,
+       "cloze_en_contains": 0.0,
+       "cloze_en_exact": 0.0,
+       "cloze_it_contains": 0.12,
+       "cloze_it_exact": 0.0,
+       "distinct_1": 0.22893954410307235,
+       "distinct_2": 0.47058823529411764,
+       "language_consistency_en": 1.0,
+       "language_consistency_it": 0.775,
+       "language_switch_rate_en": 0.0,
+       "language_switch_rate_it": 0.0,
+       "loop_rate": 0.4,
+       "ppl_en": 147.3507568259974,
+       "ppl_it": 62.831334211250244,
+       "ppl_mixed": 219.85916404362194,
+       "repeated_4gram_rate": 0.75,
+       "source_loss_books_en": 5.153700692313058,
+       "source_loss_books_it": 5.1257749285016745,
+       "source_loss_code": 8.328606669108073,
+       "source_loss_web_en": 6.201997455797698,
+       "source_loss_web_it": 6.4544139661287,
+       "source_loss_wiki_en": 3.9959980357776987,
+       "source_loss_wiki_it": 3.62702880859375,
+       "val_loss_en": 4.992815845417526,
+       "val_loss_it": 4.140453901447233,
+       "val_loss_mixed": 5.392987177922175
+     }
+   },
+   "cloze_en_contains": 0.0,
+   "cloze_en_exact": 0.0,
+   "cloze_it_contains": 0.08,
+   "cloze_it_exact": 0.0,
+   "distinct_1": 0.20633397312859886,
+   "distinct_2": 0.4251497005988024,
+   "language_consistency_en": 0.95,
+   "language_consistency_it": 0.85,
+   "language_switch_rate_en": 0.0,
+   "language_switch_rate_it": 0.05,
+   "loop_rate": 0.725,
+   "ppl_en": 119.05713337502907,
+   "ppl_it": 57.27657765139552,
+   "ppl_mixed": 171.39696944872787,
+   "recommended_checkpoint": {
+     "checkpoint_name": "step_4000",
+     "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+     "direction": "min",
+     "value": 5.143982324844751
+   },
+   "recommended_metric": "val_loss_mixed",
+   "repeated_4gram_rate": 0.9,
+   "source_losses": {
+     "books_en": 4.994737534295945,
+     "books_it": 5.027433122907366,
+     "code": 8.615218098958334,
+     "web_en": 6.153774060701069,
+     "web_it": 6.019744873046875,
+     "wiki_en": 3.8654462640935723,
+     "wiki_it": 3.56423828125
+   },
+   "val_loss_en": 4.779603490289652,
+   "val_loss_it": 4.047891773161341,
+   "val_loss_mixed": 5.143982324844751
+ }
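The top-level metrics in `benchmark_metrics.json` repeat the values of the recommended checkpoint (`step_4000`), and `recommended_checkpoint` records the metric direction and winning value. The sketch below shows how a consumer might re-derive that recommendation from the `checkpoints` map. It works on a trimmed inline copy with shortened keys, and the function name `pick_checkpoint` is hypothetical; the real file uses long run-qualified keys and many more fields.

```python
import json

# Trimmed stand-in for benchmark_metrics.json: shortened keys, ranking metric only.
REPORT_JSON = """
{
  "checkpoints": {
    "step_4000": {"checkpoint_name": "step_4000", "val_loss_mixed": 5.143982324844751},
    "step_5000": {"checkpoint_name": "step_5000", "val_loss_mixed": 5.365134214743589},
    "step_6000": {"checkpoint_name": "step_6000", "val_loss_mixed": 5.536440727038261},
    "step_7000": {"checkpoint_name": "step_7000", "val_loss_mixed": 5.331302936260517},
    "step_8000": {"checkpoint_name": "step_8000", "val_loss_mixed": 5.392987177922175}
  },
  "recommended_metric": "val_loss_mixed",
  "recommended_checkpoint": {
    "checkpoint_name": "step_4000",
    "direction": "min",
    "value": 5.143982324844751
  }
}
"""


def pick_checkpoint(report: dict) -> tuple[str, float]:
    """Re-derive the recommendation from the per-checkpoint entries."""
    metric = report["recommended_metric"]
    direction = report["recommended_checkpoint"]["direction"]
    chooser = min if direction == "min" else max
    best = chooser(report["checkpoints"].values(), key=lambda entry: entry[metric])
    return best["checkpoint_name"], best[metric]


report = json.loads(REPORT_JSON)
name, value = pick_checkpoint(report)
recorded = report["recommended_checkpoint"]
print(name, value, name == recorded["checkpoint_name"] and value == recorded["value"])
```

Reading the direction from the file rather than hard-coding `min` keeps the check valid if a suite ever recommends on a higher-is-better metric.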
benchmark_report.md ADDED
@@ -0,0 +1,1142 @@
+ # Post-training checkpoint evaluation report - pretrain_minimal_en_it_webwiki_step11000
+
+ - Evaluation date: `2026-06-06T22:34:43.246329+00:00`
+ - Commit hash: `bffb58ef99b4bb27ea6772f5853c16d43607e4eb`
+ - Hostname: `desktop-H270M-DS3H`
+ - Device: `cpu`
+ - Dtype: `fp32`
+ - Seed: `1337`
+ - Suite path: `/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/configs/eval/20260521_pretrain_minimal_en_it_webwiki_step11000.yaml`
+ - Suite model type: `pretrained`
+ - Recommended checkpoint: `step_4000`
+
+ ## Evaluated checkpoints
+
+ - name=`step_4000`, selector=`step_4000`, step=`4000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt`, selected_by=`step_4000`
+ - name=`step_5000`, selector=`step_5000`, step=`5000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt`, selected_by=`step_5000`
+ - name=`step_6000`, selector=`step_6000`, step=`6000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt`, selected_by=`step_6000`
+ - name=`step_7000`, selector=`step_7000`, step=`7000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt`, selected_by=`step_7000`
+ - name=`step_8000`, selector=`step_8000`, step=`8000`, run=`20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki`, path=`/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt`, selected_by=`step_8000`
+
+ ## Eval datasets
+
+ - No quantitative datasets configured.
+
+ ## Metric interpretation
+
+ - Lower validation loss is better.
+ - Lower perplexity is better.
+ - Higher generation pass rate is better when heuristic prompt scoring is enabled.
+
+ ## Comparison table
+
+ | checkpoint_name | checkpoint_selector | checkpoint_step | aggregate_validation_loss_mean | aggregate_validation_perplexity_mean | generation_pass_rate | selected_by |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | step_4000 | step_4000 | 4000 | | | | step_4000 |
+ | step_5000 | step_5000 | 5000 | | | | step_5000 |
+ | step_6000 | step_6000 | 6000 | | | | step_6000 |
+ | step_7000 | step_7000 | 7000 | | | | step_7000 |
+ | step_8000 | step_8000 | 8000 | | | | step_8000 |
+
+ ## Recommendation notes
+
+ - Recommended checkpoint: use `step_4000` based on `val_loss_mixed`.
+
+ ## Validation loss / perplexity
+
+ | checkpoint_name | checkpoint_selector | checkpoint_step | val_loss_en | val_loss_it | val_loss_mixed | ppl_en | ppl_it | ppl_mixed |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | step_4000 | step_4000 | 4000 | 4.7796 | 4.0479 | 5.1440 | 119.0571 | 57.2766 | 171.3970 |
+ | step_5000 | step_5000 | 5000 | 5.0503 | 4.2066 | 5.3651 | 156.0675 | 67.1283 | 213.8199 |
+ | step_6000 | step_6000 | 6000 | 5.2625 | 4.3594 | 5.5364 | 192.9707 | 78.2100 | 253.7731 |
+ | step_7000 | step_7000 | 7000 | 5.0399 | 4.1163 | 5.3313 | 154.4572 | 61.3314 | 206.7071 |
+ | step_8000 | step_8000 | 8000 | 4.9928 | 4.1405 | 5.3930 | 147.3508 | 62.8313 | 219.8592 |
+
+ ## Source/domain losses
+
+ | checkpoint_name | source_loss_books_en | source_loss_books_it | source_loss_code | source_loss_web_en | source_loss_web_it | source_loss_wiki_en | source_loss_wiki_it |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | step_4000 | 4.9947 | 5.0274 | 8.6152 | 6.1538 | 6.0197 | 3.8654 | 3.5642 |
+ | step_5000 | 5.2955 | 5.2654 | 8.6215 | 6.2095 | 6.2076 | 4.0543 | 3.5821 |
+ | step_6000 | 5.4099 | 5.1999 | 8.3185 | 6.1312 | 6.7866 | 4.1685 | 3.8567 |
+ | step_7000 | 5.1370 | 5.1322 | 8.3382 | 6.0899 | 6.3965 | 4.0259 | 3.6121 |
+ | step_8000 | 5.1537 | 5.1258 | 8.3286 | 6.2020 | 6.4544 | 3.9960 | 3.6270 |
+
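One way to read the source/domain table is to ask which checkpoint has the lowest loss on each source. The sketch below does that with the rounded values from the table; the data layout and variable names are illustrative and are not part of the benchmark bundle.

```python
# Rounded per-source losses copied from the table above (lower is better).
SOURCES = ["books_en", "books_it", "code", "web_en", "web_it", "wiki_en", "wiki_it"]
ROWS = {
    "step_4000": [4.9947, 5.0274, 8.6152, 6.1538, 6.0197, 3.8654, 3.5642],
    "step_5000": [5.2955, 5.2654, 8.6215, 6.2095, 6.2076, 4.0543, 3.5821],
    "step_6000": [5.4099, 5.1999, 8.3185, 6.1312, 6.7866, 4.1685, 3.8567],
    "step_7000": [5.1370, 5.1322, 8.3382, 6.0899, 6.3965, 4.0259, 3.6121],
    "step_8000": [5.1537, 5.1258, 8.3286, 6.2020, 6.4544, 3.9960, 3.6270],
}

# For each source column, pick the checkpoint with the minimum loss.
best_per_source = {}
for column, source in enumerate(SOURCES):
    winner = min(ROWS, key=lambda checkpoint: ROWS[checkpoint][column])
    best_per_source[source] = (winner, ROWS[winner][column])

for source, (winner, loss) in best_per_source.items():
    print(f"{source}: {winner} ({loss:.4f})")
```

On these numbers `step_4000` is lowest on five of the seven sources, with `step_6000` lowest on code and `step_7000` lowest on English web text, which is consistent with its lead on `val_loss_mixed`.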
+ ## Cloze EN/IT
+
+ | checkpoint_name | cloze_en_contains | cloze_it_contains | cloze_en_exact | cloze_it_exact |
+ | --- | --- | --- | --- | --- |
+ | step_4000 | 0.0000 | 0.0800 | 0.0000 | 0.0000 |
+ | step_5000 | 0.0200 | 0.1000 | 0.0000 | 0.0000 |
+ | step_6000 | 0.0000 | 0.0600 | 0.0000 | 0.0000 |
+ | step_7000 | 0.0400 | 0.1000 | 0.0000 | 0.0000 |
+ | step_8000 | 0.0000 | 0.1200 | 0.0000 | 0.0000 |
+
+ ## Continuation samples
+
+ # Generation comparison
+
+ ## gen_en_0001
+
+ - Language: `en`
+ - Category: `story`
+ - Prompt: `On a quiet street near the river,`
+
+ ### step_4000 (step 4000)
+
+ the river was built in a small town. The river was built in a small town, and the river was built in a small town. The river was built
+
+ ### step_5000 (step 5000)
+
+ the river's main river, the river's main river, the river's main river, the river's main river, the river's main river, the river
+
+ ### step_6000 (step 6000)
+
+ the river is a very beautiful place. The river is a very beautiful place, with a very beautiful place, with a very beautiful place, with a very beautiful
+
+ ### step_7000 (step 7000)
+
+ the river's name is "The Greatest" (the Greatest) and the Greatest (the Greatest) and the Greatest). The Greatest
+
+ ### step_8000 (step 8000)
+
+ the city is surrounded by a large, small, and a small, small, and a small, small, and a small, small, and a small,
+
+ ## gen_en_0002
+
+ - Language: `en`
+ - Category: `story`
+ - Prompt: `At sunrise the village was still asleep, but`
+
+ ### step_4000 (step 4000)
+
+ the village was not fully aware of the village. The village was not a village, but the village was not a village. The village was not a village,
+
+ ### step_5000 (step 5000)
+
+ the village was still in the village. The village was also home to the village. The village was home to the village of the village. The village was home
+
+ ### step_6000 (step 6000)
+
+ the village was not rebuilt. The village was rebuilt in the 1930s and was used as a tourist attraction. The village was built in the 1930s and was
+
+ ### step_7000 (step 7000)
+
+ the church was still in the hands of the church. The church was built in the early 19th century, and the church was built in the early 20th
+
+ ### step_8000 (step 8000)
+
+ the village was still in the hands of the people. The village was a small village, and the village was a small village. The village was a small village
+
+ ## gen_en_0003
+
+ - Language: `en`
+ - Category: `story`
+ - Prompt: `A child opened the old wooden gate and`
+
+ ### step_4000 (step 4000)
+
+ the old stone was built. The stone was built in the middle of the 19th century. The stone was built in the 18th century. The stone was
+
+ ### step_5000 (step 5000)
+
+ the old wooden doors of the old building. The old building was built in the early 19th century. The building was built in the early 20th century and
+
+ ### step_6000 (step 6000)
+
+ the old gate. The old gate was built in the early 19th century. The old gate was built in the early 19th century. The old gate was
+
+ ### step_7000 (step 7000)
+
+ the old manor was a young manor. He was a young manor and a young manor. He was a young manor and a young man
+
+ ### step_8000 (step 8000)
+
+ the old manor was a young manor. He was a young manor and a young manor. He was a young manor and a young man
+
+ ## gen_en_0004
+
+ - Language: `en`
+ - Category: `story`
+ - Prompt: `The storm ended just before midnight, and`
+
+ ### step_4000 (step 4000)
+
+ the storm was not immediately clear. The storm was not the storm, but the storm was not the storm. The storm was not the storm, but the storm
+
+ ### step_5000 (step 5000)
+
+ the storm surge hit the city of New York City. The storm surge hit the city of New York City, and the city of New York City was the first
+
+ ### step_6000 (step 6000)
+
+ the storm was over. The storm was over, and the storm was over, and the storm was over. The storm was over, and the storm was over
+
+ ### step_7000 (step 7000)
+
+ the storm was not immediately clear. The storm was not a good thing. The storm was not a bad thing. The storm was not a bad thing. The
+
+ ### step_8000 (step 8000)
+
+ the storm was not immediately clear. The storm was not immediately clear, but it was not clear. The storm was not clear, but it was not clear.
+
+ ## gen_en_0005
+
+ - Language: `en`
+ - Category: `explanation`
+ - Prompt: `A bilingual checkpoint comparison is useful because`
+
+ ### step_4000 (step 4000)
+
+ it is not possible to use a checkpoint. The checkpoint is not possible. The checkpoint is not possible. The checkpoint is not possible. The
+
+ ### step_5000 (step 5000)
+
+ it is not a good idea to be able to do so.
+
+ ### step_6000 (step 6000)
+
+ it is not a good idea.
+
+ ### step_7000 (step 7000)
+
+ it is not a good idea to be able to find a way to find a way to find a way to find a way to find a way to find a
+
+ ### step_8000 (step 8000)
+
+ it is not a good idea to be able to use a bilingual checkpoint. The bilingual checkpoint is a good idea to be able to
+
+ ## gen_en_0006
+
+ - Language: `en`
+ - Category: `explanation`
+ - Prompt: `A validation loss curve becomes easier to trust when`
+
+ ### step_4000 (step 4000)
+
+ you are not able to trust your neighbor. The risk of loss of your child is not a loss of your child. The risk of loss of your child is
+
+ ### step_5000 (step 5000)
+
+ it is difficult to predict the damage of the damage of the damage. The damage of the damage is not limited to the damage of the damage of the damage of
+
+ ### step_6000 (step 6000)
+
+ the AU is a good way to the AU. The AU is a good way to the AU. AU is a good way to the
+
+ ### step_7000 (step 7000)
+
+ the player is in the game. The player is not in the game, but in the game, the player is not in the game. The player is not
+
+ ### step_8000 (step 8000)
+
+ the player is in the game. The player is not in the game, but in the game, the player is in the game. The player is in the
+
+ ## gen_en_0007
+
+ - Language: `en`
+ - Category: `explanation`
+ - Prompt: `A packed dataset should be rebuilt after a tokenizer change because`
+
+ ### step_4000 (step 4000)
+
+ the user's user is not able to use the user's computer to use the user's computer to use the user's computer to use the user's computer to
+
+ ### step_5000 (step 5000)
+
+ of the fact that the term "cold" is used to describe the term "cold" in the context of the term "cold" in the
+
+ ### step_6000 (step 6000)
+
+ of the original design. The design was designed by the architector and architector, who was the architector of the design. The design was designed by the
+
+ ### step_7000 (step 7000)
+
+ it is not a good idea to use it. The first step is to use the same method as the first step. The second step is to use the same
+
+ ### step_8000 (step 8000)
+
+ it is not possible to use the same as the same as the same as the same as the same as the same as the same as the same as the same
+
+ ## gen_en_0008
+
+ - Language: `en`
+ - Category: `news`
+ - Prompt: `The local research lab announced that`
+
+ ### step_4000 (step 4000)
+
+ the project will be completed in the future. The project will be completed in the future. The project will be completed in the future. The project will be completed
+
+ ### step_5000 (step 5000)
+
+ the project will be a "significant project" in the city of New York City. The project will be funded by the City Council of New York City
+
+ ### step_6000 (step 6000)
+
+ the new study was conducted by the National Institute of Technology, which was funded by the National Institute of Technology. The new study was published in the journal Nature of
+
+ ### step_7000 (step 7000)
+
+ the project was completed in the summer of 2009. The project was completed in the spring of 2009. The project was completed in the spring of 2009. The project
+
+ ### step_8000 (step 8000)
+
+ the project was completed in the early 1990s. The project was completed in the early 1990s, and the project was completed in the early 1990s. The
+
+ ## gen_en_0009
+
+ - Language: `en`
+ - Category: `news`
+ - Prompt: `According to the operator log, the run`
+
+ ### step_4000 (step 4000)
+
+ time is not the same as the "small" or "small" or "small" or "small" or "small" or
+
+ ### step_5000 (step 5000)
+
+ time is a good thing. It's a good thing. It's a good thing. It's a good thing. It's a good thing. It's
+
+ ### step_6000 (step 6000)
+
+ time is a set of rules, and the runtime is a set of rules. The runtime is a set of rules, and the runtime is a
+
+ ### step_7000 (step 7000)
+
+ time is a simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple, simple
+
+ ### step_8000 (step 8000)
+
+ time is a simple, simple, simple, and easy way to do. The first step is to use the runtime to runtime. The runtime is
+
+ ## gen_en_0010
+
+ - Language: `en`
+ - Category: `news`
+ - Prompt: `The latest experiment summary noted that`
+
+ ### step_4000 (step 4000)
+
+ the "significant" of the "significant" of the "significant" of the "significant" of the "signific
+
+ ### step_5000 (step 5000)
+
+ the new technologies in the future have been developed in the past. The new technologies in the future have been developed in the past. The new technologies in the future
+
+ ### step_6000 (step 6000)
+
+ the "small" of the "small" of the "small" of the "small" of the "small" of the "
+
+ ### step_7000 (step 7000)
+
+ the experiment was a "significant" experiment. The experiment was a "significant" experiment. The experiment was a "significant" experiment
+
+ ### step_8000 (step 8000)
+
+ the experiment was not a good idea. The experiment was not a good idea. The experiment was not a good idea. The experiment was not a good idea.
+
339
+ ## gen_en_0011
340
+
341
+ - Language: `en`
342
+ - Category: `dialogue`
343
+ - Prompt: `"Can we resume from the best checkpoint?"`
344
+
345
+ ### step_4000 (step 4000)
346
+
347
+ "I'm not sure that I can do this, but I'm not sure that I can do this, but I'm not sure that I can do this
348
+
349
+ ### step_5000 (step 5000)
350
+
351
+ "I'm not going to be a good job." "I'm not going to be a good job." "I'm not going to be a good job
352
+
353
+ ### step_6000 (step 6000)
354
+
355
+ ?
356
+
357
+ ### step_7000 (step 7000)
358
+
359
+ <empty>
360
+
361
+ ### step_8000 (step 8000)
362
+
363
+ "I'm not going to be able to get the best of the day, but I'm not going to be able to get the best of the day,
364
+
365
+ ## gen_en_0012
366
+
367
+ - Language: `en`
368
+ - Category: `dialogue`
369
+ - Prompt: `"I checked the report twice," the engineer said,`
370
+
371
+ ### step_4000 (step 4000)
372
+
373
+ "I have been able to make a decision to make a decision to make a decision to make a decision to make a decision to make a decision to make a
374
+
375
+ ### step_5000 (step 5000)
376
+
377
+ "I'm not going to be a good job." "I'm not going to be a good job, but I'm not going to be a good job
378
+
379
+ ### step_6000 (step 6000)
380
+
381
+ "I'm not going to go to the store." The report, which is the first time of the report, is the first time of the report, which
382
+
383
+ ### step_7000 (step 7000)
384
+
385
+ "I have to do it." The engineer said he was "very good" and "very good" and "very good" and "very good" and
386
+
387
+ ### step_8000 (step 8000)
388
+
389
+ "I have a problem with the problem." The engineer said the engineer said the engineer was "very good" and "very good" and "very good"
390
+
391
+ ## gen_en_0013
392
+
393
+ - Language: `en`
394
+ - Category: `description`
395
+ - Prompt: `The small workstation under the desk`
396
+
397
+ ### step_4000 (step 4000)
398
+
399
+ is a very large, very large, very large, very large, very small, very small, very small, very small, very small, very small,
400
+
401
+ ### step_5000 (step 5000)
402
+
403
+ of the "The Wizard of the World" (1932) and "The Wizard of the World" (1932). The work was inspired by
404
+
405
+ ### step_6000 (step 6000)
406
+
407
+ is a very important part of the workstation. The workstation is a very important part of the workstation. The workstation is a
408
+
409
+ ### step_7000 (step 7000)
410
+
411
+ is a simple, simple, simple, and simple, and easy to use. The workstation is a simple, simple, and easy to use. The
412
+
413
+ ### step_8000 (step 8000)
414
+
415
+ is a simple, simple, and easy way to do it. The most common way to do this is to make a good workstation. The most common
416
+
417
+ ## gen_en_0014
418
+
419
+ - Language: `en`
420
+ - Category: `description`
421
+ - Prompt: `The training dashboard on the screen`
422
+
423
+ ### step_4000 (step 4000)
424
+
425
+ is a very good way to do. The dashboard is a very good way to do. The dashboard is a very good way to do.
426
+
427
+ ### step_5000 (step 5000)
428
+
429
+ . The training dashboard is designed to provide a range of training and training. The training dashboard is designed to provide a variety of training and training
430
+
431
+ ### step_6000 (step 6000)
432
+
433
+ , and the dashboard on the screen. The dashboard on the screen, and the dashboard on the screen. The dashboard on
434
+
435
+ ### step_7000 (step 7000)
436
+
437
+ is a simple, simple, simple, and easy way to do. The basic training is to use the same techniques as the "cashboard" and "
438
+
439
+ ### step_8000 (step 8000)
440
+
441
+ . The dashboard is a simple, simple, simple, and easy-to-use, and easy-to-use. The dashboard is
442
+
443
+ ## gen_en_0015
444
+
445
+ - Language: `en`
446
+ - Category: `instructional`
447
+ - Prompt: `To compare two pretrained checkpoints, first`
448
+
449
+ ### step_4000 (step 4000)
450
+
451
+ to the first to the second. The second to the second to the second. The second to the second to the second. The second to the second. The
452
+
453
+ ### step_5000 (step 5000)
454
+
455
+ in the second, and second in the second. The second was the first in the second. The second was the second in the third. The second was the
456
+
457
+ ### step_6000 (step 6000)
458
+
459
+ one, and second one, and third one, respectively, and second one, respectively, respectively. The first two, and second one, and second one,
460
+
461
+ ### step_7000 (step 7000)
462
+
463
+ for the first time in the second half of the second half of the second half of the second half of the second half of the second half of the second half
464
+
465
+ ### step_8000 (step 8000)
466
+
467
+ for the second, and second for the second. The second, second, and third, and third, respectively. The third, and fourth, respectively. The
468
+
469
+ ## gen_en_0016
470
+
471
+ - Language: `en`
472
+ - Category: `instructional`
473
+ - Prompt: `When a run stops unexpectedly, the safest next step is`
474
+
475
+ ### step_4000 (step 4000)
476
+
477
+ to get the best of the game. The game is a game that is a game that is a game that is a game that is a game that is a
478
+
479
+ ### step_5000 (step 5000)
480
+
481
+ to be able to be able to do the same. The problem is that the problem is solved by the problem. The problem is that the problem is solved by
482
+
483
+ ### step_6000 (step 6000)
484
+
485
+ to be a little more than a little more than a little more than a little more than a little more than a little more than a little more than a little
486
+
487
+ ### step_7000 (step 7000)
488
+
489
+ to be a little more than a run. The first step is to get the first step in the first step. The second step is to get the first step
490
+
491
+ ### step_8000 (step 8000)
492
+
493
+ to get the ball away from the ball. The ball is now in the process of being able to get the ball away from the ball. The ball is now
494
+
495
+ ## gen_en_0017
496
+
497
+ - Language: `en`
498
+ - Category: `reflection`
499
+ - Prompt: `One clear lesson from the pilot run was that`
500
+
501
+ ### step_4000 (step 4000)
502
+
503
+ the pilot was not a pilot. The pilot was not a pilot. The pilot was not a pilot. The pilot was not a pilot. The pilot was not
504
+
505
+ ### step_5000 (step 5000)
506
+
507
+ the pilot was in the middle of the runway. The pilot was in the middle of the runway and the pilot was in the middle of the runway. The pilot
508
+
509
+ ### step_6000 (step 6000)
510
+
511
+ the pilot run was not the same as the pilot run. The pilot run was not the same as the pilot run. The pilot run was not the same as
512
+
513
+ ### step_7000 (step 7000)
514
+
515
+ the pilot was not the first pilot to be able to fly. The pilot was not the first pilot to fly. The pilot was not the first pilot to fly
516
+
517
+ ### step_8000 (step 8000)
518
+
519
+ the pilot was not the first to be able to fly. The pilot was not able to fly the airplane to the ground and the airplane was not able to fly
520
+
521
+ ## gen_en_0018
522
+
523
+ - Language: `en`
524
+ - Category: `reflection`
525
+ - Prompt: `The bilingual probes suggested that`
526
+
527
+ ### step_4000 (step 4000)
528
+
529
+ the bilingual probes were not the same as the bilingual probes. The bilingual probes were not the same as the biling
530
+
531
+ ### step_5000 (step 5000)
532
+
533
+ the bilingual probes of the bilingual probes of the bilingual probes of the bilingual probes of the bilingual
534
+
535
+ ### step_6000 (step 6000)
536
+
537
+ the bilingual probes were not the same as the bilingual probes. The bilingual probes were not the same as the biling
538
+
539
+ ### step_7000 (step 7000)
540
+
541
+ the bilingual probes were not the same. The bilingual probes were not the same. The bilingual probes were not the same
542
+
543
+ ### step_8000 (step 8000)
544
+
545
+ the bilingual probes are not the same as the bilingual probes. The bilingual probes are not the same as the biling
546
+
547
+ ## gen_en_0019
548
+
549
+ - Language: `en`
550
+ - Category: `technical`
551
+ - Prompt: `A token-weighted validation loss avoids`
552
+
553
+ ### step_4000 (step 4000)
554
+
555
+ the ability to maintain a healthy life. The goal of this study is to determine the potential of a healthy life. The goal of this study is to determine the
556
+
557
+ ### step_5000 (step 5000)
558
+
559
+ the United States in the first round of the 2010 U.S. Open.
560
+
561
+ ### step_6000 (step 6000)
562
+
563
+ the loss of the title. The title of the title is a reference to the title of the title of the title of the title of the title of the title
564
+
565
+ ### step_7000 (step 7000)
566
+
567
+ the loss of the game. The game is a game that is played by a player who is not a player who is a player who is a player who is
568
+
569
+ ### step_8000 (step 8000)
570
+
571
+ the loss of the sport. The team's performance is expected to be the first time the team will be able to recover from the injury. The team will be
572
+
573
+ ## gen_en_0020
574
+
575
+ - Language: `en`
576
+ - Category: `technical`
577
+ - Prompt: `A lightweight repetition metric can reveal`
578
+
579
+ ### step_4000 (step 4000)
580
+
581
+ a lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight light
582
+
583
+ ### step_5000 (step 5000)
584
+
585
+ a new lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight
586
+
587
+ ### step_6000 (step 6000)
588
+
589
+ the effect of the lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight
590
+
591
+ ### step_7000 (step 7000)
592
+
593
+ the presence of the new lightweight. The lightweight is a lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight lightweight light
594
+
595
+ ### step_8000 (step 8000)
596
+
597
+ the effect of the lightweight audacity. The lightweight audacity is the lightweight audacity that is used to detect lightweight audacity. The light
598
+
599
+ ## gen_it_0001
600
+
601
+ - Language: `it`
602
+ - Category: `story`
603
+ - Prompt: `In una strada tranquilla vicino al fiume,`
604
+
605
+ ### step_4000 (step 4000)
606
+
607
+ il fiume si dirige verso la costa, dove si trova la città di San Francisco, dove si trova la città di San Francisco, dove si trova la città di
608
+
609
+ ### step_5000 (step 5000)
610
+
611
+ il fiume è stato distrutto e il fiume è stato distrutto. Il fiume è stato distrutto e il fiume è stato distrutto. Il fiume è stato distrutto e il fiume
612
+
613
+ ### step_6000 (step 6000)
614
+
615
+ il fiume è attraversato da un fiume, e da un fiume. Il fiume è attraversato da un fiume, e da un fiume, e da un fiume, e
616
+
617
+ ### step_7000 (step 7000)
618
+
619
+ il fiume è stato costruito su un'isola di sabbia, e il fiume è stato costruito su un'isola di sabbia. Il fiume è stato costruito su un
620
+
621
+ ### step_8000 (step 8000)
622
+
623
+ il fiume si trova a circa 1.000 metri di distanza. Il fiume è stato costruito da un gruppo di pescatori, che si trova a circa 1.000
624
+
625
+ ## gen_it_0002
626
+
627
+ - Language: `it`
628
+ - Category: `story`
629
+ - Prompt: `All'alba il paese dormiva ancora, ma`
630
+
631
+ ### step_4000 (step 4000)
632
+
633
+ non era più possibile che il paese fosse un paese in cui il paese fosse un paese in cui il paese fosse un paese in cui il paese era un paese in
634
+
635
+ ### step_5000 (step 5000)
636
+
637
+ la sua vita era in un'altra parte, e la sua vita era in un'altra parte. La sua vita era in un'epoca in cui la
638
+
639
+ ### step_6000 (step 6000)
640
+
641
+ non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se non si sa se
642
+
643
+ ### step_7000 (step 7000)
644
+
645
+ non si era mai vista. Il paese era un paese che non aveva mai visto il suo paese, ma non aveva mai visto il suo paese. Il paese era
646
+
647
+ ### step_8000 (step 8000)
648
+
649
+ non era più possibile. Il paese era un paese che non aveva mai visto il paese. Il paese era un paese che non aveva mai visto il paese. Il
650
+
651
+ ## gen_it_0003
652
+
653
+ - Language: `it`
654
+ - Category: `story`
655
+ - Prompt: `Un bambino aprì il vecchio cancello di legno e`
656
+
657
+ ### step_4000 (step 4000)
658
+
659
+ si mise a fare il cancello di legno. Il cancello di legno, che era stato poi portato in un altro cancello di legno, si fece costruire
660
+
661
+ ### step_5000 (step 5000)
662
+
663
+ la sua bocca era piena di lacrime. La sua bocca era un’altra cosa che non era mai stata mai stata mai più. La sua bocca era un’
664
+
665
+ ### step_6000 (step 6000)
666
+
667
+ di legno. Il cancello di legno, che si trova nella parte centrale della città, è costituito da una serie di vasiature, che si trovano nella parte
668
+
669
+ ### step_7000 (step 7000)
670
+
671
+ la sua famiglia si trasferì in un piccolo villaggio. Il suo nome è stato scelto per la sua famiglia, ma non è stato scelto per la sua famiglia. Il
672
+
673
+ ### step_8000 (step 8000)
674
+
675
+ la sua famiglia si trasferì in un piccolo villaggio di pescatori. Il piccolo villaggio di pescatori, che si trovava in una zona di pescatori, si trovava in una zona
676
+
677
+ ## gen_it_0004
678
+
679
+ - Language: `it`
680
+ - Category: `story`
681
+ - Prompt: `La tempesta finì poco prima di mezzanotte, e`
682
+
683
+ ### step_4000 (step 4000)
684
+
685
+ la tempesta si è spostata verso la costa del fiume, e la tempesta si è spostata verso la costa del fiume, e la tempesta si è spostata
686
+
687
+ ### step_5000 (step 5000)
688
+
689
+ poi di nuovo, e poi di nuovo, e poi di nuovo, e poi di nuovo, e di nuovo, e di nuovo, e di nuovo, e
690
+
691
+ ### step_6000 (step 6000)
692
+
693
+ poi di nuovo verso la fine del mese. Il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo, il giorno dopo
694
+
695
+ ### step_7000 (step 7000)
696
+
697
+ la sua vita cambiò il suo nome. La sua vita fu segnata da un'epoca di grande bellezza, che si concluse con la sua vita. La sua
698
+
699
+ ### step_8000 (step 8000)
700
+
701
+ la tempesta si era raffreddata. La tempesta si era raffreddata e la tempesta si era raffreddata. La tempesta si era raffreddata e la tempesta si era
702
+
703
+ ## gen_it_0005
704
+
705
+ - Language: `it`
706
+ - Category: `explanation`
707
+ - Prompt: `Un confronto tra checkpoint bilingui è utile perché`
708
+
709
+ ### step_4000 (step 4000)
710
+
711
+ il checkpoint bilingui è un'opzione per il checkpoint bilingui. Il checkpoint bilingui è un'opzione per il checkpoint
712
+
713
+ ### step_5000 (step 5000)
714
+
715
+ non è possibile. Per esempio, se si desidera utilizzare un'opzione di checkpoint bilingui, è possibile utilizzare un'opzione di checkpoint biling
716
+
717
+ ### step_6000 (step 6000)
718
+
719
+ non è possibile che il checkpoint bilingui non sia un'opzione di default bilingui.
720
+
721
+ ### step_7000 (step 7000)
722
+
723
+ non si tratta di un'opzione di pagamento. Se si tratta di un'opzione di pagamento, si tratta di un'opzione di pagamento. Se si tratta
724
+
725
+ ### step_8000 (step 8000)
726
+
727
+ non è possibile. Il checkpoint bilingui è un modo per il quale si desidera utilizzare il checkpoint bilingui. Il checkpoint bilingui
728
+
729
+ ## gen_it_0006
730
+
731
+ - Language: `it`
732
+ - Category: `explanation`
733
+ - Prompt: `Una curva di validation loss è più affidabile quando`
734
+
735
+ ### step_4000 (step 4000)
736
+
737
+ si tratta di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada
738
+
739
+ ### step_5000 (step 5000)
740
+
741
+ si tratta di un’alternativa di un’alternativa di un’alternativa di un’alternativa di un’alternativa di un’alternativa
742
+
743
+ ### step_6000 (step 6000)
744
+
745
+ si tratta di un'altra cosa. Il problema è che il problema è che il problema è che il problema è che il problema è che il problema è che
746
+
747
+ ### step_7000 (step 7000)
748
+
749
+ si tratta di un'autostrada di autostrada. Il veicolo è stato costruito nel 2004, ma non è stato ancora completato. Il veicolo è stato costruito nel
750
+
751
+ ### step_8000 (step 8000)
752
+
753
+ si tratta di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada di un'autostrada
754
+
755
+ ## gen_it_0007
756
+
757
+ - Language: `it`
758
+ - Category: `explanation`
759
+ - Prompt: `Un dataset packed va ricostruito dopo un cambio di tokenizer perché`
760
+
761
+ ### step_4000 (step 4000)
762
+
763
+ il suo nome è stato un po' più grande. Il suo nome è "The Greatest" (in inglese "The Greatest" in inglese "The
764
+
765
+ ### step_5000 (step 5000)
766
+
767
+ non ha mai avuto un’idea di come la sua vita. Ma non è un’idea di come la vita di un’altra persona. Ma non è
768
+
769
+ ### step_6000 (step 6000)
770
+
771
+ non è stato possibile. Il suo nome è "Solve", che è stato usato per la sua "piccola" e per la sua "p
772
+
773
+ ### step_7000 (step 7000)
774
+
775
+ il suo nome è stato cambiato. Il suo nome è stato cambiato per la sua forma di "piccola" e "piccola" e "p
776
+
777
+ ### step_8000 (step 8000)
778
+
779
+ il suo nome è stato trovato in un'altra casa. Il suo nome è stato trovato in un'altra casa, ma il suo nome è stato trovato in
780
+
781
+ ## gen_it_0008
782
+
783
+ - Language: `it`
784
+ - Category: `news`
785
+ - Prompt: `Il laboratorio locale ha annunciato che`
786
+
787
+ ### step_4000 (step 4000)
788
+
789
+ il paziente è stato sottoposto a un intervento chirurgico per il trattamento di un paziente che ha avuto un'attitudine di tempo per un periodo di tempo prolungato.
790
+
791
+ ### step_5000 (step 5000)
792
+
793
+ il laboratorio ha iniziato a lavorare su un nuovo laboratorio di ricerca per la ricerca e la ricerca di nuovi metodi di ricerca per la ricerca. I ricercatori hanno scoperto che
794
+
795
+ ### step_6000 (step 6000)
796
+
797
+ il governo di centro-destra ha deciso di non essere in grado di garantire la sicurezza di tutti i cittadini. Il ministro della difesa, John Paul, ha
798
+
799
+ ### step_7000 (step 7000)
800
+
801
+ il laboratorio ha completato il suo progetto di ricerca. Il laboratorio ha completato il suo progetto di ricerca e ha completato il suo progetto di ricerca. Il laboratorio ha completato
802
+
803
+ ### step_8000 (step 8000)
804
+
805
+ il laboratorio ha completato il suo progetto di costruzione. Il laboratorio ha completato il progetto di costruzione di un nuovo impianto di produzione di energia elettrica, che è stato progettato
806
+
807
+ ## gen_it_0009
808
+
809
+ - Language: `it`
810
+ - Category: `news`
811
+ - Prompt: `Secondo il log operativo, la run`
812
+
813
+ ### step_4000 (step 4000)
814
+
815
+ -to-play è stata interrotta da un'altra versione di un'altra versione di un'altra versione di un'altra versione di un'altra
816
+
817
+ ### step_5000 (step 5000)
818
+
819
+ -off è stata una delle più grandi sfide di crescita del mondo. La tecnologia ha anche un'ampia gamma di applicazioni di tecnologie e tecnologie che hanno portato alla
820
+
821
+ ### step_6000 (step 6000)
822
+
823
+ e è una delle più grandi aziende di tutto il mondo. La maggior parte delle aziende di tutto il mondo, che è la più grande azienda di tutto il mondo
824
+
825
+ ### step_7000 (step 7000)
826
+
827
+ time è stata una delle più grandi aziende di tutto il mondo. La maggior parte dei prodotti di questo tipo di prodotti di questo tipo di prodotti di questo tipo di
828
+
829
+ ### step_8000 (step 8000)
830
+
831
+ -in è stata una delle più grandi aziende di tutto il mondo. La tecnologia è stata sviluppata per la prima volta nel 2003, ma è stata sviluppata per la
832
+
833
+ ## gen_it_0010
834
+
835
+ - Language: `it`
836
+ - Category: `news`
837
+ - Prompt: `L'ultimo riepilogo sperimentale ha notato che`
838
+
839
+ ### step_4000 (step 4000)
840
+
841
+ il suo studio è stato condotto da un team di ricercatori dell'Università di Harvard, che ha condotto una serie di esperimenti per indagare su un'eventuale ricerca.
842
+
843
+ ### step_5000 (step 5000)
844
+
845
+ la sua capacità di un'azione non è stata in grado di produrre un'azione non solo in termini di tempo, ma anche in termini di tempo. La
846
+
847
+ ### step_6000 (step 6000)
848
+
849
+ il suo lavoro è stato molto più volte, ma non ha mai avuto un'idea di come il suo lavoro. Il suo lavoro è stato molto più volte,
850
+
851
+ ### step_7000 (step 7000)
852
+
853
+ la sua presenza è stata confermata da un'altra parte del team di sviluppo. Il team ha anche mostrato che la sua presenza è stata confermata da un'altra
854
+
855
+ ### step_8000 (step 8000)
856
+
857
+ il suo lavoro è stato molto più complesso e che il suo lavoro è stato molto più complesso. Il suo lavoro è stato molto più complesso e ha anche dimostrato che
858
+
859
+ ## gen_it_0011
860
+
861
+ - Language: `it`
862
+ - Category: `dialogue`
863
+ - Prompt: `"Possiamo riprendere dal checkpoint migliore?"`
864
+
865
+ ### step_4000 (step 4000)
866
+
867
+ "E' un po' di tempo che non si può fare a meno di fare a meno di fare a meno di fare a meno di fare a meno di
868
+
869
+ ### step_5000 (step 5000)
870
+
871
+ "Sì, non è un'idea di un'idea di un'idea di un'idea di un'idea di un'idea di un'
872
+
873
+ ### step_6000 (step 6000)
874
+
875
+ <empty>
876
+
877
+ ### step_7000 (step 7000)
878
+
879
+ "Sì, non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un
880
+
881
+ ### step_8000 (step 8000)
882
+
883
+ "Sì, non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un problema. Non è un
884
+
885
+ ## gen_it_0012
886
+
887
+ - Language: `it`
888
+ - Category: `dialogue`
889
+ - Prompt: `"Ho controllato il report due volte," disse l'ingegnere,`
890
+
891
+ ### step_4000 (step 4000)
892
+
893
+ "ma non ho mai visto niente di più che non ho mai visto niente di più che non ho mai visto niente di più che non ho visto niente di più
894
+
895
+ ### step_5000 (step 5000)
896
+
897
+ "ma non è un'altra cosa che non è un'altra cosa che non è un'altra cosa che non è un'altra cosa che non è
898
+
899
+ ### step_6000 (step 6000)
900
+
901
+ che aveva chiesto di essere in grado di fare un'indagine. L'uomo, che aveva chiesto di essere in grado di fare un'indagine, ha detto
902
+
903
+ ### step_7000 (step 7000)
904
+
905
+ "non è stato un problema, ma è stato un problema. Ho provato a fare un test di laboratorio per il mio lavoro, ma non ho mai avuto problemi
906
+
907
+ ### step_8000 (step 8000)
908
+
909
+ "non è stato un problema. Non è stato un problema. Non è stato un problema. Non è stato un problema. Non è stato un problema. Non
910
+
911
+ ## gen_it_0013
912
+
913
+ - Language: `it`
914
+ - Category: `description`
915
+ - Prompt: `La piccola workstation sotto la scrivania`
916
+
917
+ ### step_4000 (step 4000)
918
+
919
+ di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra di un'orchestra
920
+
921
+ ### step_5000 (step 5000)
922
+
923
+ .
924
+
925
+ ### step_6000 (step 6000)
926
+
927
+ è un'ottima scelta.
928
+
929
+ ### step_7000 (step 7000)
930
+
931
+ di un'altra nota, che ha fatto il suo debutto in Serie A, con la maglia della Nazionale italiana, che ha visto la sua prima rete in Serie
932
+
933
+ ### step_8000 (step 8000)
934
+
935
+ di un'altra nota.
936
+
937
+ ## gen_it_0014
938
+
939
+ - Language: `it`
940
+ - Category: `description`
941
+ - Prompt: `La dashboard di training sullo schermo`
942
+
943
+ ### step_4000 (step 4000)
944
+
945
+ è un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza di un'esperienza
946
+
947
+ ### step_5000 (step 5000)
948
+
949
+ è un'esperienza di fitness che si concentra su un'esperienza di fitness. I nostri obiettivi sono: - I nostri obiettivi sono: - I nostri obiettivi sono
950
+
951
+ ### step_6000 (step 6000)
952
+
953
+ è un'ottima scelta per il vostro corpo. Il vostro corpo è un'ottima scelta per il vostro corpo. Il vostro corpo è un'ottima scelta per
954
+
955
+ ### step_7000 (step 7000)
956
+
957
+ , la tecnologia di base di un'azienda di produzione di apparecchiature, la tecnologia di base di un'azienda di produzione di apparecchiature, la tecnologia di
958
+
959
+ ### step_8000 (step 8000)
960
+
961
+ , che è stato progettato per la prima volta da un'azienda di produzione di chip, che ha portato a un'ampia gamma di applicazioni. Il primo prototipo
962
+
963
+ ## gen_it_0015
964
+
965
+ - Language: `it`
966
+ - Category: `instructional`
967
+ - Prompt: `Per confrontare due checkpoint pretrained, prima`
968
+
969
+ ### step_4000 (step 4000)
970
+
971
+ di essere in grado di eseguire un checkpoint predefinito. Per ulteriori informazioni, visitare www.sacredit.com.
972
+
973
+ ### step_5000 (step 5000)
974
+
975
+ di iniziare a fare un passo indietro. Per esempio, se non è possibile eseguire un checkpoint predefinito, è possibile utilizzare un checkpoint predefinito
976
+
977
+ ### step_6000 (step 6000)
978
+
979
+ di iniziare a scaricare la cartella. Per la prima volta, la cartella è stata chiusa. La cartella è stata chiusa. La cartella è stata
980
+
981
+ ### step_7000 (step 7000)
982
+
983
+ di iniziare a scaricare il tuo account. Se non hai ancora trovato il tuo account, non hai ancora trovato il tuo account. Se non hai ancora trovato il tuo
984
+
985
+ ### step_8000 (step 8000)
986
+
987
+ di iniziare a fare clic su un'altra. Se si desidera utilizzare un'opzione di pagamento, si consiglia di utilizzare un'opzione di pagamento per il pagamento
988
+
989
+ ## gen_it_0016
990
+
991
+ - Language: `it`
992
+ - Category: `instructional`
993
+ - Prompt: `Quando una run si interrompe inaspettatamente, il passo più sicuro è`
994
+
995
+ ### step_4000 (step 4000)
996
+
997
+ quello di un'altra. Il tempo è che la sua vita è più forte, ma non è che la sua vita è più forte. La sua vita è
998
+
999
+ ### step_5000 (step 5000)
1000
+
1001
+ quello di un’altra fase. La maggior parte dei casi di un’infezione da virus è la malattia di un’infezione da virus. La maggior parte dei
1002
+
1003
+ ### step_6000 (step 6000)
1004
+
1005
+ quello di un’altra, ma non è più così. Il passo più importante è quello di un’altra, ma non è più così. Il passo più
1006
+
1007
+ ### step_7000 (step 7000)
1008
+
1009
+ quello di un'altra fase. Il passo più sicuro è quello di un'altra fase. Il passo più sicuro è quello di un'altra fase. Il
1010
+
1011
+ ### step_8000 (step 8000)
1012
+
1013
+ quello di un’altra fase. Il tempo di un’altra fase è quello di un’altra fase. Il tempo di un’altra fase è quello di
1014
+
1015
+ ## gen_it_0017
1016
+
1017
+ - Language: `it`
1018
+ - Category: `reflection`
1019
+ - Prompt: `Una lezione chiara del pilot run è stata che`
1020
+
1021
+ ### step_4000 (step 4000)
1022
+
1023
+ il pilota di bordo è stato lanciato da un'auto a bordo della vettura. Il pilota ha poi aggiunto che il pilota ha dovuto essere stato in grado di volare
1024
+
1025
+ ### step_5000 (step 5000)
1026
+
1027
+ la NASA ha lanciato un nuovo test di test di test di test di test di test di test di test di test di test di test di test di test di
1028
+
1029
+ ### step_6000 (step 6000)
1030
+
1031
+ la nave ha fatto il giro di un'ora dopo che la nave ha fatto il giro di un giro di un'ora dopo che la nave ha fatto il
1032
+
1033
+ ### step_7000 (step 7000)
1034
+
1035
+ la sua vita è stata un'esperienza molto particolare. La sua vita è stata un'esperienza molto particolare. La sua vita è stata un'esperienza molto particolare
1036
+
1037
+ ### step_8000 (step 8000)
1038
+
1039
+ la sua vita è stata una delle più grandi sfide che ha portato alla nascita di un uomo che ha avuto un'infanzia felice.
1040
+
1041
+ ## gen_it_0018
1042
+
1043
+ - Language: `it`
1044
+ - Category: `reflection`
1045
+ - Prompt: `Le probe bilingui hanno suggerito che`
1046
+
1047
+ ### step_4000 (step 4000)
1048
+
1049
+ la loro esistenza è stata in parte dovuta alla loro morte. La loro morte è stata in parte dovuta alla morte di un uomo che ha perso la vita. La
1050
+
1051
+ ### step_5000 (step 5000)
1052
+
1053
+ la loro presenza è stata una delle più grandiosa e ha avuto un impatto significativo sulla salute. La maggior parte dei pazienti ha avuto un effetto positivo sulla salute e
1054
+
1055
+ ### step_6000 (step 6000)
1056
+
1057
+ il loro lavoro è stato molto più grande di quello che ha fatto. Il lavoro è stato molto più grande di quello che ha fatto. Il lavoro è stato molto
1058
+
1059
+ ### step_7000 (step 7000)
1060
+
1061
+ la loro vita è stata influenzata da un'altra parte della popolazione. La loro vita è stata influenzata da un'altra parte della popolazione. La loro
1062
+
1063
+ ### step_8000 (step 8000)
1064
+
1065
+ il loro lavoro è stato molto più facile da fare. Il loro lavoro è stato molto più facile da fare. Il loro lavoro è stato molto più facile da fare
1066
+
1067
+ ## gen_it_0019
1068
+
1069
+ - Language: `it`
1070
+ - Category: `technical`
1071
+ - Prompt: `Una validation loss pesata sui token evita`
1072
+
1073
+ ### step_4000 (step 4000)
1074
+
1075
+ la vita di un uomo che, in un certo senso, non ha mai avuto la possibilità di essere più in grado di essere più in grado di essere più in
1076
+
1077
+ ### step_5000 (step 5000)
1078
+
1079
+ di non essere mai stata in grado di non essere mai stata in grado di non essere in grado di non essere in grado di non essere in grado di non essere
1080
+
1081
+ ### step_6000 (step 6000)
1082
+
1083
+ bility. The result is that the risk of the risk of the risk of the risk of the risk of the risk of the risk of the risk of the risk
1084
+
1085
+ ### step_7000 (step 7000)
1086
+
1087
+ di un'infezione. La malattia è stata descritta in modo da un'infezione che ha colpito la popolazione di un'infezione. La malattia è stata descritta in
1088
+
1089
+ ### step_8000 (step 8000)
1090
+
1091
+ di un'altra malattia.
1092
+
1093
+ ## gen_it_0020
1094
+
1095
+ - Language: `it`
1096
+ - Category: `technical`
1097
+ - Prompt: `Una metrica leggera di ripetizione può rivelare`
1098
+
1099
+ ### step_4000 (step 4000)
1100
+
1101
+ la natura del vento. La luce è un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di
1102
+
1103
+ ### step_5000 (step 5000)
1104
+
1105
+ un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di un'immagine di
1106
+
1107
+ ### step_6000 (step 6000)
1108
+
1109
+ la sua bellezza e la sua bellezza.
1110
+
1111
+ ### step_7000 (step 7000)
1112
+
1113
+ la presenza di un'infezione da virus, virus o virus. La maggior parte dei casi di infezione da virus è la malattia di cui si è sviluppato la malattia
1114
+
1115
+ ### step_8000 (step 8000)
1116
+
1117
+ la presenza di un'ampia gamma di fattori che possono essere facilmente individuabili. La maggior parte dei casi di ripetizione può essere un'attività di ripetizione
1118
+
1119
+
1120
+ ## Repetition diagnostics
+
+ | checkpoint_name | distinct_1 | distinct_2 | repeated_4gram_rate | loop_rate |
+ | --- | --- | --- | --- | --- |
+ | step_4000 | 0.2063 | 0.4251 | 0.9000 | 0.7250 |
+ | step_5000 | 0.2099 | 0.4202 | 0.8500 | 0.5750 |
+ | step_6000 | 0.2228 | 0.4377 | 0.7750 | 0.5250 |
+ | step_7000 | 0.2151 | 0.4431 | 0.8750 | 0.5500 |
+ | step_8000 | 0.2289 | 0.4706 | 0.7500 | 0.4000 |
1129
+
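The report does not define how the repetition metrics are computed, so the following is only a minimal sketch under common conventions: `distinct_n` as unique n-grams over total n-grams pooled across generations, and `repeated_4gram_rate` as the fraction of generations that contain any 4-gram more than once. The per-generation reading is an assumption suggested by the table values, which are all multiples of 1/40 (the benchmark uses 40 generation prompts). Whitespace tokenization and the function names are hypothetical, not taken from the repository.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n):
    """Unique n-grams divided by total n-grams, pooled over all texts (assumed definition)."""
    pooled = [gram for text in texts for gram in ngrams(text.split(), n)]
    return len(set(pooled)) / len(pooled) if pooled else 0.0


def has_repeated_ngram(text, n=4):
    """True if any n-gram occurs more than once in a single generation."""
    counts = Counter(ngrams(text.split(), n))
    return any(count > 1 for count in counts.values())


def repeated_ngram_rate(texts, n=4):
    """Fraction of generations containing at least one repeated n-gram (assumed definition)."""
    return sum(has_repeated_ngram(text, n) for text in texts) / len(texts)


# Two generations quoted from the samples in this report.
samples = [
    "la sua bellezza e la sua bellezza.",
    "un'immagine di un'immagine di un'immagine di un'immagine di",
]

print(distinct_n(samples, 1), distinct_n(samples, 2), repeated_ngram_rate(samples))
```

Under these assumptions the first sample does not count as a repeated 4-gram (the trailing period makes the last token differ), while the second does.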
1130
+ ## Language-switch diagnostics
+
+ | checkpoint_name | language_switch_rate_en | language_switch_rate_it | language_consistency_en | language_consistency_it |
+ | --- | --- | --- | --- | --- |
+ | step_4000 | 0.0000 | 0.0500 | 0.9500 | 0.8500 |
+ | step_5000 | 0.0000 | 0.0000 | 0.8750 | 0.7500 |
+ | step_6000 | 0.0000 | 0.0500 | 0.9250 | 0.8250 |
+ | step_7000 | 0.0000 | 0.0000 | 0.9250 | 0.7500 |
+ | step_8000 | 0.0000 | 0.0000 | 1.0000 | 0.7750 |
1139
+
1140
+ ## Checkpoint recommendation impact
1141
+
1142
+ - Recommended checkpoint: use `step_4000` based on `val_loss_mixed`.
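The recommendation can be reproduced by ranking checkpoints on `val_loss_mixed`, lower being better. The sketch below hard-codes the five values as they appear in `comparison.json` in this commit; the variable names are illustrative only.

```python
# val_loss_mixed per checkpoint, copied from comparison.json in this commit.
val_loss_mixed = {
    "step_4000": 5.143982324844751,
    "step_5000": 5.365134214743589,
    "step_6000": 5.536440727038261,
    "step_7000": 5.331302936260517,
    "step_8000": 5.392987177922175,
}

# Sort ascending: the metric's direction is "min".
ranking = sorted(val_loss_mixed, key=val_loss_mixed.get)
recommended = ranking[0]

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}  val_loss_mixed={val_loss_mixed[name]:.4f}")
```

On this metric `step_8000` ranks fourth of five, even though it is the best checkpoint by online validation loss.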
benchmark_scores.json ADDED
@@ -0,0 +1 @@
1
+ []
benchmark_source_losses.json ADDED
@@ -0,0 +1,309 @@
1
+ {
2
+ "checkpoints": {
3
+ "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_4000-step_4000-4000": {
4
+ "books_en": {
5
+ "loss": 4.994737534295945,
6
+ "num_batches": 1,
7
+ "num_sequences": 1,
8
+ "num_tokens": 21,
9
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
10
+ "perplexity": 147.6341913859129
11
+ },
12
+ "books_it": {
13
+ "loss": 5.027433122907366,
14
+ "num_batches": 1,
15
+ "num_sequences": 1,
16
+ "num_tokens": 21,
17
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
18
+ "perplexity": 152.54095584476084
19
+ },
20
+ "code": {
21
+ "loss": 8.615218098958334,
22
+ "num_batches": 1,
23
+ "num_sequences": 1,
24
+ "num_tokens": 30,
25
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
26
+ "perplexity": 5514.951287713211
27
+ },
28
+ "web_en": {
29
+ "loss": 6.153774060701069,
30
+ "num_batches": 1,
31
+ "num_sequences": 1,
32
+ "num_tokens": 19,
33
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
34
+ "perplexity": 470.48969695119644
35
+ },
36
+ "web_it": {
37
+ "loss": 6.019744873046875,
38
+ "num_batches": 1,
39
+ "num_sequences": 1,
40
+ "num_tokens": 19,
41
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
42
+ "perplexity": 411.47360432720393
43
+ },
44
+ "wiki_en": {
45
+ "loss": 3.8654462640935723,
46
+ "num_batches": 1,
47
+ "num_sequences": 1,
48
+ "num_tokens": 22,
49
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
50
+ "perplexity": 47.72456544112049
51
+ },
52
+ "wiki_it": {
53
+ "loss": 3.56423828125,
54
+ "num_batches": 1,
55
+ "num_sequences": 1,
56
+ "num_tokens": 25,
57
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
58
+ "perplexity": 35.312544929652795
59
+ }
60
+ },
61
+ "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_5000-step_5000-5000": {
62
+ "books_en": {
63
+ "loss": 5.295532953171503,
64
+ "num_batches": 1,
65
+ "num_sequences": 1,
66
+ "num_tokens": 21,
67
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
68
+ "perplexity": 199.44389190139776
69
+ },
70
+ "books_it": {
71
+ "loss": 5.265413556780134,
72
+ "num_batches": 1,
73
+ "num_sequences": 1,
74
+ "num_tokens": 21,
75
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
76
+ "perplexity": 193.52632636477796
77
+ },
78
+ "code": {
79
+ "loss": 8.621547444661458,
80
+ "num_batches": 1,
81
+ "num_sequences": 1,
82
+ "num_tokens": 30,
83
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
84
+ "perplexity": 5549.968020553558
85
+ },
86
+ "web_en": {
87
+ "loss": 6.209454185084293,
88
+ "num_batches": 1,
89
+ "num_sequences": 1,
90
+ "num_tokens": 19,
91
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
92
+ "perplexity": 497.4296726428682
93
+ },
94
+ "web_it": {
95
+ "loss": 6.207602249948602,
96
+ "num_batches": 1,
97
+ "num_sequences": 1,
98
+ "num_tokens": 19,
99
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
100
+ "perplexity": 496.50931763649476
101
+ },
102
+ "wiki_en": {
103
+ "loss": 4.054272738370028,
104
+ "num_batches": 1,
105
+ "num_sequences": 1,
106
+ "num_tokens": 22,
107
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
108
+ "perplexity": 57.64322604191456
109
+ },
110
+ "wiki_it": {
111
+ "loss": 3.58208984375,
112
+ "num_batches": 1,
113
+ "num_sequences": 1,
114
+ "num_tokens": 25,
115
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
116
+ "perplexity": 35.94858933468458
117
+ }
118
+ },
119
+ "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_6000-step_6000-6000": {
120
+ "books_en": {
121
+ "loss": 5.409920828683036,
122
+ "num_batches": 1,
123
+ "num_sequences": 1,
124
+ "num_tokens": 21,
125
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
126
+ "perplexity": 223.6138831740883
127
+ },
128
+ "books_it": {
129
+ "loss": 5.199875967843192,
130
+ "num_batches": 1,
131
+ "num_sequences": 1,
132
+ "num_tokens": 21,
133
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
134
+ "perplexity": 181.24975968230822
135
+ },
136
+ "code": {
137
+ "loss": 8.31853790283203,
138
+ "num_batches": 1,
139
+ "num_sequences": 1,
140
+ "num_tokens": 30,
141
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
142
+ "perplexity": 4099.162251172332
143
+ },
144
+ "web_en": {
145
+ "loss": 6.131187037417763,
146
+ "num_batches": 1,
147
+ "num_sequences": 1,
148
+ "num_tokens": 19,
149
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
150
+ "perplexity": 459.98185240790855
151
+ },
152
+ "web_it": {
153
+ "loss": 6.786579332853618,
154
+ "num_batches": 1,
155
+ "num_sequences": 1,
156
+ "num_tokens": 19,
157
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
158
+ "perplexity": 885.878079061639
159
+ },
160
+ "wiki_en": {
161
+ "loss": 4.168480613014915,
162
+ "num_batches": 1,
163
+ "num_sequences": 1,
164
+ "num_tokens": 22,
165
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
166
+ "perplexity": 64.617198952921
167
+ },
168
+ "wiki_it": {
169
+ "loss": 3.8566854858398436,
170
+ "num_batches": 1,
171
+ "num_sequences": 1,
172
+ "num_tokens": 25,
173
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
174
+ "perplexity": 47.30828722907459
175
+ }
176
+ },
177
+ "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_7000-step_7000-7000": {
178
+ "books_en": {
179
+ "loss": 5.137017386300223,
180
+ "num_batches": 1,
181
+ "num_sequences": 1,
182
+ "num_tokens": 21,
183
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
184
+ "perplexity": 170.20734771999355
185
+ },
186
+ "books_it": {
187
+ "loss": 5.132158551897321,
188
+ "num_batches": 1,
189
+ "num_sequences": 1,
190
+ "num_tokens": 21,
191
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
192
+ "perplexity": 169.38234430383022
193
+ },
194
+ "code": {
195
+ "loss": 8.338209533691407,
196
+ "num_batches": 1,
197
+ "num_sequences": 1,
198
+ "num_tokens": 30,
199
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
200
+ "perplexity": 4180.59781690682
201
+ },
202
+ "web_en": {
203
+ "loss": 6.089872661389802,
204
+ "num_batches": 1,
205
+ "num_sequences": 1,
206
+ "num_tokens": 19,
207
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
208
+ "perplexity": 441.36520473566287
209
+ },
210
+ "web_it": {
211
+ "loss": 6.3964783517937915,
212
+ "num_batches": 1,
213
+ "num_sequences": 1,
214
+ "num_tokens": 19,
215
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
216
+ "perplexity": 599.72927903983
217
+ },
218
+ "wiki_en": {
219
+ "loss": 4.025867115367543,
220
+ "num_batches": 1,
221
+ "num_sequences": 1,
222
+ "num_tokens": 22,
223
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
224
+ "perplexity": 56.02887121924366
225
+ },
226
+ "wiki_it": {
227
+ "loss": 3.61209228515625,
228
+ "num_batches": 1,
229
+ "num_sequences": 1,
230
+ "num_tokens": 25,
231
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
232
+ "perplexity": 37.04347730722537
233
+ }
234
+ },
235
+ "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step_8000-step_8000-8000": {
236
+ "books_en": {
237
+ "loss": 5.153700692313058,
238
+ "num_batches": 1,
239
+ "num_sequences": 1,
240
+ "num_tokens": 21,
241
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_en.jsonl",
242
+ "perplexity": 173.0707884007461
243
+ },
244
+ "books_it": {
245
+ "loss": 5.1257749285016745,
246
+ "num_batches": 1,
247
+ "num_sequences": 1,
248
+ "num_tokens": 21,
249
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/books_it.jsonl",
250
+ "perplexity": 168.30451509598072
251
+ },
252
+ "code": {
253
+ "loss": 8.328606669108073,
254
+ "num_batches": 1,
255
+ "num_sequences": 1,
256
+ "num_tokens": 30,
257
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/code.jsonl",
258
+ "perplexity": 4140.644243596859
259
+ },
260
+ "web_en": {
261
+ "loss": 6.201997455797698,
262
+ "num_batches": 1,
263
+ "num_sequences": 1,
264
+ "num_tokens": 19,
265
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_en.jsonl",
266
+ "perplexity": 493.73426916939
267
+ },
268
+ "web_it": {
269
+ "loss": 6.4544139661287,
270
+ "num_batches": 1,
271
+ "num_sequences": 1,
272
+ "num_tokens": 19,
273
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/web_it.jsonl",
274
+ "perplexity": 635.501191880854
275
+ },
276
+ "wiki_en": {
277
+ "loss": 3.9959980357776987,
278
+ "num_batches": 1,
279
+ "num_sequences": 1,
280
+ "num_tokens": 22,
281
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_en.jsonl",
282
+ "perplexity": 54.38008682172939
283
+ },
284
+ "wiki_it": {
285
+ "loss": 3.62702880859375,
286
+ "num_batches": 1,
287
+ "num_sequences": 1,
288
+ "num_tokens": 25,
289
+ "path": "/home/descanso/.openclaw/workspace/python_project/llm-nanochat-dev-worktree/eval/pretrain/sources/wiki_it.jsonl",
290
+ "perplexity": 37.60093091976498
291
+ }
292
+ }
293
+ },
294
+ "recommended_checkpoint": {
295
+ "checkpoint_name": "step_4000",
296
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
297
+ "direction": "min",
298
+ "value": 5.143982324844751
299
+ },
300
+ "source_losses": {
301
+ "books_en": 4.994737534295945,
302
+ "books_it": 5.027433122907366,
303
+ "code": 8.615218098958334,
304
+ "web_en": 6.153774060701069,
305
+ "web_it": 6.019744873046875,
306
+ "wiki_en": 3.8654462640935723,
307
+ "wiki_it": 3.56423828125
308
+ }
309
+ }
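Two things are worth checking when reading the per-source numbers above. First, each `perplexity` field appears to be `exp(loss)`. Second, every source was scored on a single sequence of 19 to 30 tokens, so differences between checkpoints on one source rest on very little data. The sketch below checks the first point on a few `step_4000` entries copied from the JSON and totals the token counts; the helper name and tolerance are assumptions.

```python
import math

# (loss, perplexity, num_tokens) for step_4000, copied from the JSON above.
step_4000 = {
    "books_en": (4.994737534295945, 147.6341913859129, 21),
    "code": (8.615218098958334, 5514.951287713211, 30),
    "wiki_en": (3.8654462640935723, 47.72456544112049, 22),
    "wiki_it": (3.56423828125, 35.312544929652795, 25),
}


def perplexity_matches(loss, reported_ppl, rel_tol=1e-6):
    """Check that the reported perplexity equals exp(loss) within a tolerance."""
    return math.isclose(math.exp(loss), reported_ppl, rel_tol=rel_tol)


mismatches = [
    name for name, (loss, ppl, _) in step_4000.items()
    if not perplexity_matches(loss, ppl)
]

# num_tokens for all seven sources (books_en, books_it, code, web_en, web_it, wiki_en, wiki_it).
tokens_per_source = [21, 21, 30, 19, 19, 22, 25]
total_tokens = sum(tokens_per_source)

print("mismatches:", mismatches)
print("total evaluated tokens across sources:", total_tokens)
```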
best_validation.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "step": 8000,
3
+ "validation_loss": 3.882301174864477,
4
+ "validation_perplexity": 48.53577596923642,
5
+ "validation_num_batches": 128,
6
+ "elapsed_sec": 66536.9193212986,
7
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt"
8
+ }
comparison.json ADDED
@@ -0,0 +1,404 @@
1
+ {
2
+ "contains_mixed_model_types": false,
3
+ "metric_recommendations": {
4
+ "cloze_en_contains": {
5
+ "checkpoint_name": "step_7000",
6
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
7
+ "direction": "max",
8
+ "value": 0.04
9
+ },
10
+ "cloze_en_exact": {
11
+ "checkpoint_name": "step_4000",
12
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
13
+ "direction": "max",
14
+ "value": 0.0
15
+ },
16
+ "cloze_it_contains": {
17
+ "checkpoint_name": "step_8000",
18
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
19
+ "direction": "max",
20
+ "value": 0.12
21
+ },
22
+ "cloze_it_exact": {
23
+ "checkpoint_name": "step_4000",
24
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
25
+ "direction": "max",
26
+ "value": 0.0
27
+ },
28
+ "distinct_1": {
29
+ "checkpoint_name": "step_8000",
30
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
31
+ "direction": "max",
32
+ "value": 0.22893954410307235
33
+ },
34
+ "distinct_2": {
35
+ "checkpoint_name": "step_8000",
36
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
37
+ "direction": "max",
38
+ "value": 0.47058823529411764
39
+ },
40
+ "language_consistency_en": {
41
+ "checkpoint_name": "step_8000",
42
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
43
+ "direction": "max",
44
+ "value": 1.0
45
+ },
46
+ "language_consistency_it": {
47
+ "checkpoint_name": "step_4000",
48
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
49
+ "direction": "max",
50
+ "value": 0.85
51
+ },
52
+ "language_switch_rate_en": {
53
+ "checkpoint_name": "step_4000",
54
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
55
+ "direction": "min",
56
+ "value": 0.0
57
+ },
58
+ "language_switch_rate_it": {
59
+ "checkpoint_name": "step_5000",
60
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt",
61
+ "direction": "min",
62
+ "value": 0.0
63
+ },
64
+ "loop_rate": {
65
+ "checkpoint_name": "step_8000",
66
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
67
+ "direction": "min",
68
+ "value": 0.4
69
+ },
70
+ "ppl_en": {
71
+ "checkpoint_name": "step_4000",
72
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
73
+ "direction": "min",
74
+ "value": 119.05713337502907
75
+ },
76
+ "ppl_it": {
77
+ "checkpoint_name": "step_4000",
78
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
79
+ "direction": "min",
80
+ "value": 57.27657765139552
81
+ },
82
+ "ppl_mixed": {
83
+ "checkpoint_name": "step_4000",
84
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
85
+ "direction": "min",
86
+ "value": 171.39696944872787
87
+ },
88
+ "repeated_4gram_rate": {
89
+ "checkpoint_name": "step_8000",
90
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
91
+ "direction": "min",
92
+ "value": 0.75
93
+ },
94
+ "source_loss_books_en": {
95
+ "checkpoint_name": "step_4000",
96
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
97
+ "direction": "min",
98
+ "value": 4.994737534295945
99
+ },
100
+ "source_loss_books_it": {
101
+ "checkpoint_name": "step_4000",
102
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
103
+ "direction": "min",
104
+ "value": 5.027433122907366
105
+ },
106
+ "source_loss_code": {
107
+ "checkpoint_name": "step_6000",
108
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt",
109
+ "direction": "min",
110
+ "value": 8.31853790283203
111
+ },
112
+ "source_loss_web_en": {
113
+ "checkpoint_name": "step_7000",
114
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
115
+ "direction": "min",
116
+ "value": 6.089872661389802
117
+ },
118
+ "source_loss_web_it": {
119
+ "checkpoint_name": "step_4000",
120
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
121
+ "direction": "min",
122
+ "value": 6.019744873046875
123
+ },
124
+ "source_loss_wiki_en": {
125
+ "checkpoint_name": "step_4000",
126
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
127
+ "direction": "min",
128
+ "value": 3.8654462640935723
129
+ },
130
+ "source_loss_wiki_it": {
131
+ "checkpoint_name": "step_4000",
132
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
133
+ "direction": "min",
134
+ "value": 3.56423828125
135
+ },
136
+ "val_loss_en": {
137
+ "checkpoint_name": "step_4000",
138
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
139
+ "direction": "min",
140
+ "value": 4.779603490289652
141
+ },
142
+ "val_loss_it": {
143
+ "checkpoint_name": "step_4000",
144
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
145
+ "direction": "min",
146
+ "value": 4.047891773161341
147
+ },
148
+ "val_loss_mixed": {
149
+ "checkpoint_name": "step_4000",
150
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
151
+ "direction": "min",
152
+ "value": 5.143982324844751
153
+ }
154
+ },
155
+ "recommended_checkpoint": {
156
+ "checkpoint_name": "step_4000",
157
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
158
+ "direction": "min",
159
+ "value": 5.143982324844751
160
+ },
161
+ "recommended_metric": "val_loss_mixed",
162
+ "rows": [
163
+ {
164
+ "aggregate_dataset_count": 0,
165
+ "aggregate_validation_loss_mean": null,
166
+ "aggregate_validation_perplexity_mean": null,
167
+ "checkpoint_name": "step_4000",
168
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
169
+ "checkpoint_selector": "step_4000",
170
+ "checkpoint_step": 4000,
171
+ "cloze_en_contains": 0.0,
172
+ "cloze_en_exact": 0.0,
173
+ "cloze_it_contains": 0.08,
174
+ "cloze_it_exact": 0.0,
175
+ "delta_vs_previous_generation_pass_rate": null,
176
+ "delta_vs_previous_validation_loss_mean": null,
177
+ "distinct_1": 0.20633397312859886,
178
+ "distinct_2": 0.4251497005988024,
179
+ "generation_pass_rate": null,
180
+ "generation_pass_rate_regression_vs_previous": false,
181
+ "generation_passed_prompts": 0,
182
+ "generation_scored_prompts": 0,
183
+ "generation_total_prompts": 40,
184
+ "language_consistency_en": 0.95,
185
+ "language_consistency_it": 0.85,
186
+ "language_switch_rate_en": 0.0,
187
+ "language_switch_rate_it": 0.05,
188
+ "loop_rate": 0.725,
189
+ "model_type": "pretrained",
190
+ "ppl_en": 119.05713337502907,
191
+ "ppl_it": 57.27657765139552,
192
+ "ppl_mixed": 171.39696944872787,
193
+ "repeated_4gram_rate": 0.9,
194
+ "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
195
+ "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
196
+ "selected_by": "step_4000",
197
+ "selection_metric_name": null,
198
+ "selection_metric_value": null,
199
+ "source_loss_books_en": 4.994737534295945,
200
+ "source_loss_books_it": 5.027433122907366,
201
+ "source_loss_code": 8.615218098958334,
202
+ "source_loss_web_en": 6.153774060701069,
203
+ "source_loss_web_it": 6.019744873046875,
204
+ "source_loss_wiki_en": 3.8654462640935723,
205
+ "source_loss_wiki_it": 3.56423828125,
206
+ "val_loss_en": 4.779603490289652,
207
+ "val_loss_it": 4.047891773161341,
208
+ "val_loss_mixed": 5.143982324844751,
209
+ "validation_loss_regression_vs_previous": false
210
+ },
211
+ {
212
+ "aggregate_dataset_count": 0,
213
+ "aggregate_validation_loss_mean": null,
214
+ "aggregate_validation_perplexity_mean": null,
215
+ "checkpoint_name": "step_5000",
216
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_5000.pt",
217
+ "checkpoint_selector": "step_5000",
218
+ "checkpoint_step": 5000,
219
+ "cloze_en_contains": 0.02,
220
+ "cloze_en_exact": 0.0,
221
+ "cloze_it_contains": 0.1,
222
+ "cloze_it_exact": 0.0,
223
+ "delta_vs_previous_generation_pass_rate": null,
224
+ "delta_vs_previous_validation_loss_mean": null,
225
+ "distinct_1": 0.2099009900990099,
226
+ "distinct_2": 0.42018537590113286,
227
+ "generation_pass_rate": null,
228
+ "generation_pass_rate_regression_vs_previous": false,
229
+ "generation_passed_prompts": 0,
230
+ "generation_scored_prompts": 0,
231
+ "generation_total_prompts": 40,
232
+ "language_consistency_en": 0.875,
233
+ "language_consistency_it": 0.75,
234
+ "language_switch_rate_en": 0.0,
235
+ "language_switch_rate_it": 0.0,
236
+ "loop_rate": 0.575,
237
+ "model_type": "pretrained",
238
+ "ppl_en": 156.06746197419037,
239
+ "ppl_it": 67.12826498983866,
240
+ "ppl_mixed": 213.81993054234522,
241
+ "repeated_4gram_rate": 0.85,
242
+ "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
243
+ "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
244
+ "selected_by": "step_5000",
245
+ "selection_metric_name": null,
246
+ "selection_metric_value": null,
247
+ "source_loss_books_en": 5.295532953171503,
248
+ "source_loss_books_it": 5.265413556780134,
249
+ "source_loss_code": 8.621547444661458,
250
+ "source_loss_web_en": 6.209454185084293,
251
+ "source_loss_web_it": 6.207602249948602,
252
+ "source_loss_wiki_en": 4.054272738370028,
253
+ "source_loss_wiki_it": 3.58208984375,
254
+ "val_loss_en": 5.050288362323113,
255
+ "val_loss_it": 4.206605192090644,
256
+ "val_loss_mixed": 5.365134214743589,
257
+ "validation_loss_regression_vs_previous": false
258
+ },
259
+ {
260
+ "aggregate_dataset_count": 0,
261
+ "aggregate_validation_loss_mean": null,
262
+ "aggregate_validation_perplexity_mean": null,
263
+ "checkpoint_name": "step_6000",
264
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_6000.pt",
265
+ "checkpoint_selector": "step_6000",
266
+ "checkpoint_step": 6000,
267
+ "cloze_en_contains": 0.0,
268
+ "cloze_en_exact": 0.0,
269
+ "cloze_it_contains": 0.06,
270
+ "cloze_it_exact": 0.0,
271
+ "delta_vs_previous_generation_pass_rate": null,
272
+ "delta_vs_previous_validation_loss_mean": null,
273
+ "distinct_1": 0.22282023681377824,
274
+ "distinct_2": 0.4377104377104377,
275
+ "generation_pass_rate": null,
276
+ "generation_pass_rate_regression_vs_previous": false,
277
+ "generation_passed_prompts": 0,
278
+ "generation_scored_prompts": 0,
279
+ "generation_total_prompts": 40,
280
+ "language_consistency_en": 0.925,
281
+ "language_consistency_it": 0.825,
282
+ "language_switch_rate_en": 0.0,
283
+ "language_switch_rate_it": 0.05,
284
+ "loop_rate": 0.525,
285
+ "model_type": "pretrained",
286
+ "ppl_en": 192.97065258207533,
287
+ "ppl_it": 78.20997526013538,
288
+ "ppl_mixed": 253.77314221335507,
289
+ "repeated_4gram_rate": 0.775,
290
+ "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
291
+ "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
292
+ "selected_by": "step_6000",
293
+ "selection_metric_name": null,
294
+ "selection_metric_value": null,
295
+ "source_loss_books_en": 5.409920828683036,
296
+ "source_loss_books_it": 5.199875967843192,
297
+ "source_loss_code": 8.31853790283203,
298
+ "source_loss_web_en": 6.131187037417763,
299
+ "source_loss_web_it": 6.786579332853618,
300
+ "source_loss_wiki_en": 4.168480613014915,
301
+ "source_loss_wiki_it": 3.8566854858398436,
302
+ "val_loss_en": 5.262538118182488,
303
+ "val_loss_it": 4.359397200287366,
304
+ "val_loss_mixed": 5.536440727038261,
305
+ "validation_loss_regression_vs_previous": false
306
+ },
307
+ {
308
+ "aggregate_dataset_count": 0,
309
+ "aggregate_validation_loss_mean": null,
310
+ "aggregate_validation_perplexity_mean": null,
311
+ "checkpoint_name": "step_7000",
312
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_7000.pt",
313
+ "checkpoint_selector": "step_7000",
314
+ "checkpoint_step": 7000,
315
+ "cloze_en_contains": 0.04,
316
+ "cloze_en_exact": 0.0,
317
+ "cloze_it_contains": 0.1,
318
+ "cloze_it_exact": 0.0,
319
+ "delta_vs_previous_generation_pass_rate": null,
320
+ "delta_vs_previous_validation_loss_mean": null,
321
+ "distinct_1": 0.21511627906976744,
322
+ "distinct_2": 0.4431017119838872,
323
+ "generation_pass_rate": null,
324
+ "generation_pass_rate_regression_vs_previous": false,
325
+ "generation_passed_prompts": 0,
326
+ "generation_scored_prompts": 0,
327
+ "generation_total_prompts": 40,
328
+ "language_consistency_en": 0.925,
329
+ "language_consistency_it": 0.75,
330
+ "language_switch_rate_en": 0.0,
331
+ "language_switch_rate_it": 0.0,
332
+ "loop_rate": 0.55,
333
+ "model_type": "pretrained",
334
+ "ppl_en": 154.45723780849602,
335
+ "ppl_it": 61.331402298576066,
336
+ "ppl_mixed": 206.7071249834935,
337
+ "repeated_4gram_rate": 0.875,
338
+ "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
339
+ "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
340
+ "selected_by": "step_7000",
341
+ "selection_metric_name": null,
342
+ "selection_metric_value": null,
343
+ "source_loss_books_en": 5.137017386300223,
344
+ "source_loss_books_it": 5.132158551897321,
345
+ "source_loss_code": 8.338209533691407,
346
+ "source_loss_web_en": 6.089872661389802,
347
+ "source_loss_web_it": 6.3964783517937915,
348
+ "source_loss_wiki_en": 4.025867115367543,
349
+ "source_loss_wiki_it": 3.61209228515625,
350
+ "val_loss_en": 5.03991728008918,
351
+ "val_loss_it": 4.116291984182889,
352
+ "val_loss_mixed": 5.331302936260517,
353
+ "validation_loss_regression_vs_previous": false
354
+ },
355
+ {
356
+ "aggregate_dataset_count": 0,
357
+ "aggregate_validation_loss_mean": null,
358
+ "aggregate_validation_perplexity_mean": null,
359
+ "checkpoint_name": "step_8000",
360
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
361
+ "checkpoint_selector": "step_8000",
362
+ "checkpoint_step": 8000,
363
+ "cloze_en_contains": 0.0,
364
+ "cloze_en_exact": 0.0,
365
+ "cloze_it_contains": 0.12,
366
+ "cloze_it_exact": 0.0,
367
+ "delta_vs_previous_generation_pass_rate": null,
368
+ "delta_vs_previous_validation_loss_mean": null,
369
+ "distinct_1": 0.22893954410307235,
370
+ "distinct_2": 0.47058823529411764,
371
+ "generation_pass_rate": null,
372
+ "generation_pass_rate_regression_vs_previous": false,
373
+ "generation_passed_prompts": 0,
374
+ "generation_scored_prompts": 0,
375
+ "generation_total_prompts": 40,
376
+ "language_consistency_en": 1.0,
377
+ "language_consistency_it": 0.775,
378
+ "language_switch_rate_en": 0.0,
379
+ "language_switch_rate_it": 0.0,
380
+ "loop_rate": 0.4,
381
+ "model_type": "pretrained",
382
+ "ppl_en": 147.3507568259974,
383
+ "ppl_it": 62.831334211250244,
384
+ "ppl_mixed": 219.85916404362194,
385
+ "repeated_4gram_rate": 0.75,
386
+ "run_dir": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
387
+ "run_name": "20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
388
+ "selected_by": "step_8000",
389
+ "selection_metric_name": null,
390
+ "selection_metric_value": null,
391
+ "source_loss_books_en": 5.153700692313058,
392
+ "source_loss_books_it": 5.1257749285016745,
393
+ "source_loss_code": 8.328606669108073,
394
+ "source_loss_web_en": 6.201997455797698,
395
+ "source_loss_web_it": 6.4544139661287,
396
+ "source_loss_wiki_en": 3.9959980357776987,
397
+ "source_loss_wiki_it": 3.62702880859375,
398
+ "val_loss_en": 4.992815845417526,
399
+ "val_loss_it": 4.140453901447233,
400
+ "val_loss_mixed": 5.392987177922175,
401
+ "validation_loss_regression_vs_previous": false
402
+ }
403
+ ]
404
+ }
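The perplexity fields in each benchmark record appear to be the exponential of the matching validation loss. A minimal sketch, using only the `step_8000` values shown above, checks that relationship; the helper name `ppl_consistent` and the tolerance are illustrative assumptions, not part of the benchmark code.

```python
import math

# Values copied from the step_8000 benchmark record above.
step_8000 = {
    "val_loss_en": 4.992815845417526,
    "val_loss_it": 4.140453901447233,
    "val_loss_mixed": 5.392987177922175,
    "ppl_en": 147.3507568259974,
    "ppl_it": 62.831334211250244,
    "ppl_mixed": 219.85916404362194,
}


def ppl_consistent(record, split, rel_tol=1e-6):
    """Return True if ppl_<split> equals exp(val_loss_<split>) within rel_tol."""
    expected = math.exp(record[f"val_loss_{split}"])
    return math.isclose(expected, record[f"ppl_{split}"], rel_tol=rel_tol)


# Hypothetical helper applied to the three splits reported in the record.
checks = {split: ppl_consistent(step_8000, split) for split in ("en", "it", "mixed")}
print(checks)
```

If a check like this failed for another record, it would suggest the perplexity was aggregated differently (for example per token rather than per batch), which is worth knowing before comparing checkpoints on `ppl_*` instead of `val_loss_*`.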
eval_metrics.jsonl ADDED
@@ -0,0 +1,8 @@
+ {"step": 1000, "validation_loss": 5.482670289795606, "validation_perplexity": 240.48802345829137, "validation_num_batches": 128, "elapsed_sec": 8324.719261169434}
+ {"step": 2000, "validation_loss": 4.5780933100779375, "validation_perplexity": 97.3286416369149, "validation_num_batches": 128, "elapsed_sec": 16636.953268051147}
+ {"step": 3000, "validation_loss": 4.2193704548876845, "validation_perplexity": 67.99066762083781, "validation_num_batches": 128, "elapsed_sec": 24949.2317006588}
+ {"step": 4000, "validation_loss": 4.019869025491055, "validation_perplexity": 55.69381087954878, "validation_num_batches": 128, "elapsed_sec": 33273.656369924545}
+ {"step": 5000, "validation_loss": 4.078961359527535, "validation_perplexity": 59.08407086238957, "validation_num_batches": 128, "elapsed_sec": 41597.64162111282}
+ {"step": 6000, "validation_loss": 4.016584710055898, "validation_perplexity": 55.51119488525109, "validation_num_batches": 128, "elapsed_sec": 49907.71303868294}
+ {"step": 7000, "validation_loss": 3.911746149875966, "validation_perplexity": 49.98615913843908, "validation_num_batches": 128, "elapsed_sec": 58223.721999168396}
+ {"step": 8000, "validation_loss": 3.882301174864477, "validation_perplexity": 48.53577596923642, "validation_num_batches": 128, "elapsed_sec": 66536.9193212986}
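The online validation log above can be scanned for the lowest-loss step and for any step where the loss rose relative to the previous evaluation. The sketch below does this on a trimmed copy of the records (only `step` and `validation_loss` are kept); the variable names are illustrative and not taken from the training code.

```python
import json

# step and validation_loss copied from eval_metrics.jsonl above;
# the remaining fields are omitted here for brevity.
EVAL_METRICS = """\
{"step": 1000, "validation_loss": 5.482670289795606}
{"step": 2000, "validation_loss": 4.5780933100779375}
{"step": 3000, "validation_loss": 4.2193704548876845}
{"step": 4000, "validation_loss": 4.019869025491055}
{"step": 5000, "validation_loss": 4.078961359527535}
{"step": 6000, "validation_loss": 4.016584710055898}
{"step": 7000, "validation_loss": 3.911746149875966}
{"step": 8000, "validation_loss": 3.882301174864477}
"""

rows = [json.loads(line) for line in EVAL_METRICS.splitlines() if line.strip()]

# Checkpoint with the lowest online validation loss.
best = min(rows, key=lambda r: r["validation_loss"])

# Steps whose loss is higher than at the previous evaluation.
regressions = [
    cur["step"]
    for prev, cur in zip(rows, rows[1:])
    if cur["validation_loss"] > prev["validation_loss"]
]

print(best["step"], regressions)
```

On these records the minimum falls at the final step, with a single temporary rise at step 5000, which is consistent with `step_8000` being described as the online-validation winner.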
eval_summary.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "comparison_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/comparison.json",
+   "metadata_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/eval_metadata.json",
+   "num_checkpoints": 5,
+   "out_dir": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark",
+   "recommended_checkpoint": {
+     "checkpoint_name": "step_4000",
+     "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_4000.pt",
+     "direction": "min",
+     "value": 5.143982324844751
+   },
+   "report_path": "/mnt/apps/llm-nanochat/evals/20260607_0037_shortfastdecay8k_4000_5000_6000_7000_8000_cpu_full_benchmark/report.md",
+   "suite": "pretrain_minimal_en_it_webwiki_step11000"
+ }
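The `recommended_checkpoint` entry records a `direction` of `"min"` and a `value` that matches the `val_loss_mixed` reported for `step_4000` in the release note. A minimal sketch of that kind of selection follows, assuming the metric is `val_loss_mixed`; the `recommend` function is hypothetical, and only the three checkpoints whose full-precision values are visible in this diff are included.

```python
# val_loss_mixed values visible in this diff; step_5000 and step_6000 are
# omitted because their full-precision values are not shown here.
records = [
    {"checkpoint_name": "step_4000", "val_loss_mixed": 5.143982324844751},
    {"checkpoint_name": "step_7000", "val_loss_mixed": 5.331302936260517},
    {"checkpoint_name": "step_8000", "val_loss_mixed": 5.392987177922175},
]


def recommend(records, metric, direction):
    """Pick the record with the smallest ("min") or largest ("max") metric value."""
    if direction not in ("min", "max"):
        raise ValueError(f"unsupported direction: {direction!r}")
    pick = min if direction == "min" else max
    return pick(records, key=lambda r: r[metric])


winner = recommend(records, metric="val_loss_mixed", direction="min")
print(winner["checkpoint_name"], winner["val_loss_mixed"])
```

Keeping the direction explicit matters because the same report also carries metrics where higher is better (for example `distinct_2`), so a selector that silently assumed "min" would misrank those.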
metrics.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
probe_generations.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
step_8000.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7e59f0f4d19633d1ef2d12e900034208d1b8012bcc9bfd6afd8f9cd6d870fae
+ size 1633717975
step_8000.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c863cd9e3892ad926c0ce38e5c3d996e571e3dc688f45a8f95da99892e1199fd
+ size 544530872
step_8000.safetensors.json ADDED
@@ -0,0 +1,289 @@
1
+ {
2
+ "checkpoint_config": {
3
+ "actual_precision": "bf16",
4
+ "adamw_betas": [
5
+ 0.9,
6
+ 0.95
7
+ ],
8
+ "adamw_eps": 1e-08,
9
+ "attention_kernel_policy": "auto",
10
+ "batch_size": 6,
11
+ "benchmark": {
12
+ "enable_central_tensorboard": true,
13
+ "enable_local_tensorboard": true,
14
+ "enabled": false,
15
+ "output_path": "/mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/throughput_benchmark.json",
16
+ "warmup_steps": 0
17
+ },
18
+ "checkpoint_dir": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
19
+ "clip_grad_norm": 1.0,
20
+ "compile": {
21
+ "backend": null,
22
+ "compile_setup_sec": 0.0,
23
+ "diagnostic": null,
24
+ "dynamic": false,
25
+ "enabled": false,
26
+ "error_policy": "raise",
27
+ "fullgraph": false,
28
+ "mode": null,
29
+ "requested": false,
30
+ "status": "disabled"
31
+ },
32
+ "dataset": {
33
+ "storage_mode": "indexed_jsonl"
34
+ },
35
+ "decay_steps": 5000,
36
+ "deterministic_algorithms": false,
37
+ "device": "cuda",
38
+ "dim": 768,
39
+ "final_lr": 5e-06,
40
+ "fp8_backend": null,
41
+ "grad_accum_steps": 16,
42
+ "learning_rate": 0.0003,
43
+ "logging": {
44
+ "enable_central_tensorboard": true,
45
+ "enable_local_tensorboard": true,
46
+ "metrics_flush_every_steps": 1,
47
+ "metrics_writer": "persistent_jsonl_handle"
48
+ },
49
+ "lr": 0.0003,
50
+ "lr_schedule": "wsd",
51
+ "max_seq_len": 2500,
52
+ "max_steps": 8000,
53
+ "n_heads": 12,
54
+ "n_layers": 12,
55
+ "optimizer": {
56
+ "backend": "torch",
57
+ "betas": [
58
+ 0.9,
59
+ 0.95
60
+ ],
61
+ "eps": 1e-08,
62
+ "implementation": "torch.optim.AdamW",
63
+ "learning_rate": 0.0003,
64
+ "state_precision": "full_precision",
65
+ "type": "adamw",
66
+ "weight_decay": 0.1
67
+ },
68
+ "optimizer_backend": "torch",
69
+ "optimizer_implementation": "torch.optim.AdamW",
70
+ "optimizer_state_precision": "full_precision",
71
+ "optimizer_type": "adamw",
72
+ "peak_lr": 0.0003,
73
+ "repro": {
74
+ "attention_kernel_policy": "auto",
75
+ "cublas_workspace_config": null,
76
+ "cudnn_benchmark": true,
77
+ "cudnn_deterministic": false,
78
+ "deterministic_algorithms": false,
79
+ "flash_sdp_enabled": true,
80
+ "math_sdp_enabled": true,
81
+ "mem_efficient_sdp_enabled": true,
82
+ "pythonhashseed": "1337",
83
+ "seed": 1337
84
+ },
85
+ "requested_precision": "bf16",
86
+ "resume_from": null,
87
+ "resume_mode": "full",
88
+ "save_every_steps": 500,
89
+ "scheduler": {
90
+ "decay_steps": 5000,
91
+ "final_lr": 5e-06,
92
+ "peak_lr": 0.0003,
93
+ "schedule_type": "wsd",
94
+ "stable_steps": 2500,
95
+ "total_steps": 8000,
96
+ "warmup_steps": 500
97
+ },
98
+ "seed": 1337,
99
+ "stable_steps": 2500,
100
+ "train_cache_ram_bytes": 1073741824,
101
+ "train_cache_ram_mb": 1024,
102
+ "vocab_size": 32000,
103
+ "warmup_steps": 500,
104
+ "weight_decay": 0.1
105
+ },
106
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
107
+ "exported_at": "2026-06-06T23:15:39.908355+00:00",
108
+ "format": "llm-nanochat-safetensors-export",
109
+ "global_step": 8000,
110
+ "metadata_path": "/mnt/apps/llm-nanochat/hf_exports/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000/step_8000.safetensors.json",
111
+ "model_config": {
112
+ "dim": 768,
113
+ "max_seq_len": 2500,
114
+ "n_heads": 12,
115
+ "n_layers": 12,
116
+ "vocab_size": 32000
117
+ },
118
+ "num_parameters": 136128000,
119
+ "num_tensors": 149,
120
+ "provenance": {
121
+ "checkpoint_dir": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki",
122
+ "checkpoint_name": "step_8000.pt",
123
+ "checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
124
+ "global_step": 8000,
125
+ "packed_dataset_config_path": null,
126
+ "run_dir": "/mnt/apps/llm-nanochat/checkpoints",
127
+ "tokenizer_dir": null,
128
+ "training_config_path": null
129
+ },
130
+ "safetensors_path": "/mnt/apps/llm-nanochat/hf_exports/gpt2small-en-it-nanochat-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki-step8000/step_8000.safetensors",
131
+ "source_checkpoint_path": "/mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki/step_8000.pt",
132
+ "source_global_step": 8000,
133
+ "tensor_names": [
134
+ "token_emb.weight",
135
+ "pos_emb.weight",
136
+ "blocks.layers.0.self_attn.in_proj_weight",
137
+ "blocks.layers.0.self_attn.in_proj_bias",
138
+ "blocks.layers.0.self_attn.out_proj.weight",
139
+ "blocks.layers.0.self_attn.out_proj.bias",
140
+ "blocks.layers.0.linear1.weight",
141
+ "blocks.layers.0.linear1.bias",
142
+ "blocks.layers.0.linear2.weight",
143
+ "blocks.layers.0.linear2.bias",
144
+ "blocks.layers.0.norm1.weight",
145
+ "blocks.layers.0.norm1.bias",
146
+ "blocks.layers.0.norm2.weight",
147
+ "blocks.layers.0.norm2.bias",
148
+ "blocks.layers.1.self_attn.in_proj_weight",
149
+ "blocks.layers.1.self_attn.in_proj_bias",
150
+ "blocks.layers.1.self_attn.out_proj.weight",
151
+ "blocks.layers.1.self_attn.out_proj.bias",
152
+ "blocks.layers.1.linear1.weight",
153
+ "blocks.layers.1.linear1.bias",
154
+ "blocks.layers.1.linear2.weight",
155
+ "blocks.layers.1.linear2.bias",
156
+ "blocks.layers.1.norm1.weight",
157
+ "blocks.layers.1.norm1.bias",
158
+ "blocks.layers.1.norm2.weight",
159
+ "blocks.layers.1.norm2.bias",
160
+ "blocks.layers.2.self_attn.in_proj_weight",
161
+ "blocks.layers.2.self_attn.in_proj_bias",
162
+ "blocks.layers.2.self_attn.out_proj.weight",
163
+ "blocks.layers.2.self_attn.out_proj.bias",
164
+ "blocks.layers.2.linear1.weight",
165
+ "blocks.layers.2.linear1.bias",
166
+ "blocks.layers.2.linear2.weight",
167
+ "blocks.layers.2.linear2.bias",
168
+ "blocks.layers.2.norm1.weight",
169
+ "blocks.layers.2.norm1.bias",
170
+ "blocks.layers.2.norm2.weight",
171
+ "blocks.layers.2.norm2.bias",
172
+ "blocks.layers.3.self_attn.in_proj_weight",
173
+ "blocks.layers.3.self_attn.in_proj_bias",
174
+ "blocks.layers.3.self_attn.out_proj.weight",
175
+ "blocks.layers.3.self_attn.out_proj.bias",
176
+ "blocks.layers.3.linear1.weight",
177
+ "blocks.layers.3.linear1.bias",
178
+ "blocks.layers.3.linear2.weight",
179
+ "blocks.layers.3.linear2.bias",
180
+ "blocks.layers.3.norm1.weight",
181
+ "blocks.layers.3.norm1.bias",
182
+ "blocks.layers.3.norm2.weight",
183
+ "blocks.layers.3.norm2.bias",
184
+ "blocks.layers.4.self_attn.in_proj_weight",
185
+ "blocks.layers.4.self_attn.in_proj_bias",
186
+ "blocks.layers.4.self_attn.out_proj.weight",
187
+ "blocks.layers.4.self_attn.out_proj.bias",
188
+ "blocks.layers.4.linear1.weight",
189
+ "blocks.layers.4.linear1.bias",
190
+ "blocks.layers.4.linear2.weight",
191
+ "blocks.layers.4.linear2.bias",
192
+ "blocks.layers.4.norm1.weight",
193
+ "blocks.layers.4.norm1.bias",
194
+ "blocks.layers.4.norm2.weight",
195
+ "blocks.layers.4.norm2.bias",
196
+ "blocks.layers.5.self_attn.in_proj_weight",
197
+ "blocks.layers.5.self_attn.in_proj_bias",
198
+ "blocks.layers.5.self_attn.out_proj.weight",
199
+ "blocks.layers.5.self_attn.out_proj.bias",
200
+ "blocks.layers.5.linear1.weight",
201
+ "blocks.layers.5.linear1.bias",
202
+ "blocks.layers.5.linear2.weight",
203
+ "blocks.layers.5.linear2.bias",
204
+ "blocks.layers.5.norm1.weight",
205
+ "blocks.layers.5.norm1.bias",
206
+ "blocks.layers.5.norm2.weight",
207
+ "blocks.layers.5.norm2.bias",
208
+ "blocks.layers.6.self_attn.in_proj_weight",
209
+ "blocks.layers.6.self_attn.in_proj_bias",
210
+ "blocks.layers.6.self_attn.out_proj.weight",
211
+ "blocks.layers.6.self_attn.out_proj.bias",
212
+ "blocks.layers.6.linear1.weight",
213
+ "blocks.layers.6.linear1.bias",
214
+ "blocks.layers.6.linear2.weight",
215
+ "blocks.layers.6.linear2.bias",
216
+ "blocks.layers.6.norm1.weight",
217
+ "blocks.layers.6.norm1.bias",
218
+ "blocks.layers.6.norm2.weight",
219
+ "blocks.layers.6.norm2.bias",
220
+ "blocks.layers.7.self_attn.in_proj_weight",
221
+ "blocks.layers.7.self_attn.in_proj_bias",
222
+ "blocks.layers.7.self_attn.out_proj.weight",
223
+ "blocks.layers.7.self_attn.out_proj.bias",
224
+ "blocks.layers.7.linear1.weight",
225
+ "blocks.layers.7.linear1.bias",
226
+ "blocks.layers.7.linear2.weight",
227
+ "blocks.layers.7.linear2.bias",
228
+ "blocks.layers.7.norm1.weight",
229
+ "blocks.layers.7.norm1.bias",
230
+ "blocks.layers.7.norm2.weight",
231
+ "blocks.layers.7.norm2.bias",
232
+ "blocks.layers.8.self_attn.in_proj_weight",
233
+ "blocks.layers.8.self_attn.in_proj_bias",
234
+ "blocks.layers.8.self_attn.out_proj.weight",
235
+ "blocks.layers.8.self_attn.out_proj.bias",
236
+ "blocks.layers.8.linear1.weight",
237
+ "blocks.layers.8.linear1.bias",
238
+ "blocks.layers.8.linear2.weight",
239
+ "blocks.layers.8.linear2.bias",
240
+ "blocks.layers.8.norm1.weight",
241
+ "blocks.layers.8.norm1.bias",
242
+ "blocks.layers.8.norm2.weight",
243
+ "blocks.layers.8.norm2.bias",
244
+ "blocks.layers.9.self_attn.in_proj_weight",
245
+ "blocks.layers.9.self_attn.in_proj_bias",
246
+ "blocks.layers.9.self_attn.out_proj.weight",
247
+ "blocks.layers.9.self_attn.out_proj.bias",
248
+ "blocks.layers.9.linear1.weight",
249
+ "blocks.layers.9.linear1.bias",
250
+ "blocks.layers.9.linear2.weight",
251
+ "blocks.layers.9.linear2.bias",
252
+ "blocks.layers.9.norm1.weight",
253
+ "blocks.layers.9.norm1.bias",
254
+ "blocks.layers.9.norm2.weight",
255
+ "blocks.layers.9.norm2.bias",
256
+ "blocks.layers.10.self_attn.in_proj_weight",
257
+ "blocks.layers.10.self_attn.in_proj_bias",
258
+ "blocks.layers.10.self_attn.out_proj.weight",
259
+ "blocks.layers.10.self_attn.out_proj.bias",
260
+ "blocks.layers.10.linear1.weight",
261
+ "blocks.layers.10.linear1.bias",
262
+ "blocks.layers.10.linear2.weight",
263
+ "blocks.layers.10.linear2.bias",
264
+ "blocks.layers.10.norm1.weight",
265
+ "blocks.layers.10.norm1.bias",
266
+ "blocks.layers.10.norm2.weight",
267
+ "blocks.layers.10.norm2.bias",
268
+ "blocks.layers.11.self_attn.in_proj_weight",
269
+ "blocks.layers.11.self_attn.in_proj_bias",
270
+ "blocks.layers.11.self_attn.out_proj.weight",
271
+ "blocks.layers.11.self_attn.out_proj.bias",
272
+ "blocks.layers.11.linear1.weight",
273
+ "blocks.layers.11.linear1.bias",
274
+ "blocks.layers.11.linear2.weight",
275
+ "blocks.layers.11.linear2.bias",
276
+ "blocks.layers.11.norm1.weight",
277
+ "blocks.layers.11.norm1.bias",
278
+ "blocks.layers.11.norm2.weight",
279
+ "blocks.layers.11.norm2.bias",
280
+ "ln_f.weight",
281
+ "ln_f.bias",
282
+ "head.weight"
283
+ ],
284
+ "tokenizer_reference": {
285
+ "packed_dataset_config_path": null,
286
+ "tokenizer_dir": null,
287
+ "training_config_path": null
288
+ }
289
+ }
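The reported `num_parameters` (136128000) and `num_tensors` (149) can be reproduced from `model_config` and the tensor names, given a few assumptions that the metadata does not state: a feed-forward hidden size of `4 * dim`, a fused QKV projection of shape `(3 * dim, dim)`, and an output head that is not tied to the token embedding. The tensor names resemble those of PyTorch's `nn.TransformerEncoderLayer`, which is what motivates these shapes. The sketch below is a consistency check under those assumptions, not the exporter's own code.

```python
import math


def expected_shapes(vocab_size, max_seq_len, dim, n_layers, ffn_dim):
    """Build a name -> shape map mirroring the tensor_names list above.

    The shapes are assumptions; the metadata lists only names.
    """
    shapes = {
        "token_emb.weight": (vocab_size, dim),
        "pos_emb.weight": (max_seq_len, dim),
    }
    for i in range(n_layers):
        p = f"blocks.layers.{i}."
        shapes[p + "self_attn.in_proj_weight"] = (3 * dim, dim)  # fused Q, K, V
        shapes[p + "self_attn.in_proj_bias"] = (3 * dim,)
        shapes[p + "self_attn.out_proj.weight"] = (dim, dim)
        shapes[p + "self_attn.out_proj.bias"] = (dim,)
        shapes[p + "linear1.weight"] = (ffn_dim, dim)
        shapes[p + "linear1.bias"] = (ffn_dim,)
        shapes[p + "linear2.weight"] = (dim, ffn_dim)
        shapes[p + "linear2.bias"] = (dim,)
        for norm in ("norm1", "norm2"):
            shapes[p + norm + ".weight"] = (dim,)
            shapes[p + norm + ".bias"] = (dim,)
    shapes["ln_f.weight"] = (dim,)
    shapes["ln_f.bias"] = (dim,)
    shapes["head.weight"] = (vocab_size, dim)  # assumed untied from token_emb
    return shapes


# model_config values from the metadata; ffn_dim = 4 * dim is an assumption.
shapes = expected_shapes(vocab_size=32000, max_seq_len=2500, dim=768,
                         n_layers=12, ffn_dim=4 * 768)
num_tensors = len(shapes)
num_parameters = sum(math.prod(shape) for shape in shapes.values())

# Size of the weights alone if stored as 4-byte floats (assumed dtype).
fp32_bytes = num_parameters * 4
# File size from the step_8000.safetensors LFS pointer in this commit.
safetensors_size = 544530872

print(num_tensors, num_parameters, safetensors_size - fp32_bytes)
```

Under these assumptions both counts match the metadata, and the `.safetensors` file is only about 19 kB larger than the raw 4-byte weights, which suggests the export stores weights in a 32-bit dtype even though training ran in `bf16`.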
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_meta.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "vocab_size_requested": 32000,
+   "vocab_size_actual": 32000,
+   "special_tokens": [
+     "<pad>",
+     "<bos>",
+     "<eos>",
+     "<unk>"
+   ]
+ }
training_config.yaml ADDED
@@ -0,0 +1,61 @@
+ # Fresh GPT2-small web/wiki run with WSD short-fast decay to 8k at peak LR 3e-4.
+ # Goal: keep the useful high-LR early learning phase, but compress the fresh
+ # web/wiki benchmark into an even shorter 8k-step run while preserving the same
+ # short-fast-decay shape as the 11k variant.
+ # Schedule: warmup 500, stable 2500, decay 5000, final_lr 5e-6.
+ # No resume semantics: random weights, fresh optimizer, fresh scheduler.
+
+ dataset_dir: /mnt/apps/llm-nanochat/datasets/202605141153_fineweb50_wiki50_50en_50it_score100_2500context_5Btokens_tok_20260515_en50it50_webwiki_stratified_500M
+ output_dir: /mnt/apps/llm-nanochat/artifacts/runs/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki
+ tokenizer_dir: /mnt/apps/llm-nanochat/tokenizers/tokenizer_20260515_en50it50_webwiki_stratified_500M
+ seed: 1337
+
+ model:
+ vocab_size: 32000
+ dim: 768
+ n_layers: 12
+ n_heads: 12
+
+ training:
+ sequence_length: 2500
+ max_steps: 8000
+ batch_size: 6
+ grad_accum_steps: 16
+
+ learning_rate: 0.0003
+ peak_lr: 0.0003
+ lr_schedule: wsd
+
+ warmup_steps: 500
+ stable_steps: 2500
+ decay_steps: 5000
+ final_lr: 5.0e-06
+
+ adamw_betas:
+ - 0.9
+ - 0.95
+ adamw_eps: 1.0e-08
+ weight_decay: 0.1
+ clip_grad_norm: 1.0
+
+ save_every_steps: 500
+ checkpoint_dir: /mnt/apps/llm-nanochat/checkpoints/20260605_fresh-gpt2small-lr3e4-bs6-wsd-shortfastdecay8k-final5e6-webwiki
+ precision: bf16
+
+ evaluation:
+ validation_every_steps: 1000
+ validation_max_batches: 128
+ probe_every_steps: 1000
+ probe_tokenizer_dir: /mnt/apps/llm-nanochat/tokenizers/tokenizer_20260515_en50it50_webwiki_stratified_500M
+ probe_max_new_tokens: 32
+ probe_prompts:
+ en:
+ - prompt: "The capital of Italy is"
+ expected_next_text: " Rome"
+ - prompt: "A small language model should"
+ expected_next_text: " be"
+ it:
+ - prompt: "La capitale d'Italia è"
+ expected_next_text: " Roma"
+ - prompt: "Un piccolo modello linguistico dovrebbe"
+ expected_next_text: " essere"
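The schedule comment in this config (warmup 500, stable 2500, decay 5000, final_lr 5e-6) and the batch settings can be turned into a small sketch of the learning rate per step and the token budget. The config does not state the shape of the warmup or decay phases, so the linear ramps below are an assumption, as is the premise that every sequence is packed to the full 2500 tokens; `wsd_lr` is an illustrative name, not the trainer's function.

```python
import math

# Schedule and batch values copied from the config above.
WARMUP_STEPS, STABLE_STEPS, DECAY_STEPS = 500, 2500, 5000
PEAK_LR, FINAL_LR = 3e-4, 5e-6
MAX_STEPS, BATCH_SIZE, GRAD_ACCUM_STEPS, SEQUENCE_LENGTH = 8000, 6, 16, 2500


def wsd_lr(step):
    """Warmup-stable-decay learning rate at a given optimizer step.

    Linear warmup from 0 and linear decay to FINAL_LR are assumptions;
    the real trainer may use a different curve for either phase.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    if step < WARMUP_STEPS + STABLE_STEPS:
        return PEAK_LR
    progress = min(1.0, (step - WARMUP_STEPS - STABLE_STEPS) / DECAY_STEPS)
    return PEAK_LR + (FINAL_LR - PEAK_LR) * progress


# Tokens consumed per optimizer step and over the whole run,
# assuming every sequence holds exactly SEQUENCE_LENGTH tokens.
tokens_per_step = BATCH_SIZE * GRAD_ACCUM_STEPS * SEQUENCE_LENGTH
tokens_seen = MAX_STEPS * tokens_per_step

for step in (0, 250, 500, 3000, 5500, 8000):
    print(step, wsd_lr(step))
print(tokens_per_step, tokens_seen)
```

The three phases sum to exactly `max_steps`, so the decay ends on the final checkpoint, and the token total agrees with the `~1.92B` estimate given in the release note.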