nurcunal committed
Commit 8dc0c33 · verified · 1 Parent(s): d2e1ba1

Upload logs/nanochat-tr-d20-bpe32k-492421.err

logs/nanochat-tr-d20-bpe32k-492421.err ADDED
@@ -0,0 +1,133 @@
+ This module only runs on machines in the a100q and a100x4q queues.
+ 2026-06-07 16:23:31,528 - nanochat.common - INFO - Distributed world size: 4
+ wandb: Currently logged in as: nurcunal (nurcunal-bogaziciuniversitesi) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+ wandb: setting up run tr-d20-bpe32k-20260607-162301
+ wandb: Tracking run with wandb version 0.22.3
+ wandb: Run data is saved locally in /ari/users/nunal/nanochat-turk-d20-bpe32k/wandb/wandb/run-20260607_162332-tr-d20-bpe32k-20260607-162301
+ wandb: Run `wandb offline` to turn off syncing.
+ wandb: Syncing run tr-d20-bpe32k
+ wandb: ⭐️ View project at https://wandb.ai/nurcunal-bogaziciuniversitesi/nanochat-turk
+ wandb: 🚀 View run at https://wandb.ai/nurcunal-bogaziciuniversitesi/nanochat-turk/runs/tr-d20-bpe32k-20260607-162301
+ 2026-06-07 19:31:02,986 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_002500_rank2.pt
+ 2026-06-07 19:31:03,721 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_002500_rank3.pt
+ 2026-06-07 19:31:03,723 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_002500_rank1.pt
+ 2026-06-07 19:31:04,669 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_002500.pt
+ 2026-06-07 19:31:04,672 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_002500.json
+ 2026-06-07 19:31:05,600 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_002500_rank0.pt
+ 2026-06-07 22:36:52,330 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_005000_rank1.pt
+ 2026-06-07 22:36:52,505 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_005000_rank2.pt
+ 2026-06-07 22:36:52,614 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_005000_rank3.pt
+ 2026-06-07 22:37:05,783 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_005000.pt
+ 2026-06-07 22:37:05,790 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_005000.json
+ 2026-06-07 22:37:07,259 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_005000_rank0.pt
+ 2026-06-08 01:42:55,280 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_007500_rank2.pt
+ 2026-06-08 01:42:55,666 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_007500_rank1.pt
+ 2026-06-08 01:42:55,769 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_007500_rank3.pt
+ 2026-06-08 01:42:58,578 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_007500.pt
+ 2026-06-08 01:42:58,582 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_007500.json
+ 2026-06-08 01:43:00,074 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_007500_rank0.pt
+ 2026-06-08 04:48:53,363 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_010000_rank2.pt
+ 2026-06-08 04:48:53,534 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_010000_rank1.pt
+ 2026-06-08 04:48:56,328 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_010000_rank3.pt
+ 2026-06-08 04:48:56,685 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_010000.pt
+ 2026-06-08 04:48:56,689 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_010000.json
+ 2026-06-08 04:49:01,264 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_010000_rank0.pt
+ 2026-06-08 07:54:39,968 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_012500_rank3.pt
+ 2026-06-08 07:54:39,991 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_012500_rank2.pt
+ 2026-06-08 07:54:40,054 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_012500_rank1.pt
+ 2026-06-08 07:54:41,707 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_012500.pt
+ 2026-06-08 07:54:41,710 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_012500.json
+ 2026-06-08 07:54:42,841 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_012500_rank0.pt
+ 2026-06-08 11:00:33,210 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_015000_rank1.pt
+ 2026-06-08 11:00:33,211 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_015000_rank2.pt
+ 2026-06-08 11:00:33,637 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_015000_rank3.pt
+ 2026-06-08 11:00:47,652 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_015000.pt
+ 2026-06-08 11:00:47,655 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_015000.json
+ 2026-06-08 11:00:53,283 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_015000_rank0.pt
+ 2026-06-08 13:37:24,481 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_017100_rank2.pt
+ 2026-06-08 13:37:24,556 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_017100_rank1.pt
+ 2026-06-08 13:37:24,815 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_017100_rank3.pt
+ 2026-06-08 13:37:28,031 - nanochat.checkpoint_manager - INFO - Saved model parameters to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/model_017100.pt
+ 2026-06-08 13:37:28,038 - nanochat.checkpoint_manager - INFO - Saved metadata to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/meta_017100.json
+ 2026-06-08 13:37:29,349 - nanochat.checkpoint_manager - INFO - Saved optimizer state to: /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20/optim_017100_rank0.pt
+ wandb: updating run metadata
+ wandb: uploading output.log; uploading wandb-summary.json
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading history steps 189-189, summary, console lines 17323-17336
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading wandb-summary.json; uploading config.yaml; uploading data
+ wandb: uploading wandb-summary.json; uploading config.yaml
+ wandb: uploading config.yaml
+ wandb: uploading data
+ wandb:
+ wandb: Run history:
+ wandb: step ▁▁▁▁▁▂▂▂▂▂▃▃▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇███
+ wandb: total_training_flops ▁▁▁▁▂▂▂▂▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇▇█████
+ wandb: total_training_time ▁▁▁▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇▇▇███
+ wandb: train/dt ▅▃▄▅▃▃▃▁▁▃▃▄▄▁▄▃▅▆▄▄▄▃▄▃▃▃▃▃▃▃▄▄▃▁▄█▄▃▃▃
+ wandb: train/loss █▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▂▂▂▁▁▁▁▂▂▁▁▁▁▁▁▁▁
+ wandb: train/lrm ███████████▇▇▇▇▆▆▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▁▁▁▁
+ wandb: train/mfu ▁███████████████████████████████████████
+ wandb: train/tok_per_sec ▁███████████████████████████████████████
+ wandb: val/bpb █▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁
+ wandb:
+ wandb: Run summary:
+ wandb: step 17100
+ wandb: total_training_flops 5.809722400073318e+19
+ wandb: total_training_time 76059.47399
+ wandb: train/dt 4.43947
+ wandb: train/epoch 1 pq: 10 rg: 1744
+ wandb: train/loss 2.38038
+ wandb: train/lrm 0.05855
+ wandb: train/mfu 61.32155
+ wandb: train/tok_per_sec 236193
+ wandb: val/bpb 0.62321
+ wandb:
+ wandb: 🚀 View run tr-d20-bpe32k at: https://wandb.ai/nurcunal-bogaziciuniversitesi/nanochat-turk/runs/tr-d20-bpe32k-20260607-162301
+ wandb: ⭐️ View project at: https://wandb.ai/nurcunal-bogaziciuniversitesi/nanochat-turk
+ wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
+ wandb: Find logs at: .-d20-bpe32k/wandb/wandb/run-20260607_162332-tr-d20-bpe32k-20260607-162301/logs
+ 2026-06-08 13:54:06,931 - nanochat.common - INFO - Distributed world size: 4
+ 2026-06-08 13:54:06,934 - nanochat.checkpoint_manager - INFO - Loading model from /ari/users/nunal/nanochat-turk-d20-bpe32k/base_checkpoints/tr_d20_bpe_32768_chinchilla20 with step 17100
+ 2026-06-08 13:54:08,704 - nanochat.checkpoint_manager - INFO - Building model with config: {'sequence_len': 2048, 'vocab_size': 32768, 'n_layer': 20, 'n_head': 10, 'n_kv_head': 10, 'n_embd': 1280, 'window_pattern': 'L'}
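
The figures in the "Run summary" block can be cross-checked with a little arithmetic. The sketch below is not part of the logged run; it copies four values from the summary and assumes that `total_training_time` and `train/dt` are in seconds and that `train/tok_per_sec` was measured over the same step as `train/dt`. The helper names are hypothetical.

```python
# Values copied from the "Run summary" block of the log.
summary = {
    "step": 17100,
    "total_training_time": 76059.47399,  # assumed to be seconds
    "train/dt": 4.43947,                 # assumed to be seconds per step
    "train/tok_per_sec": 236193,
}


def tokens_per_step(s):
    """Estimate tokens processed per optimizer step (throughput x step time)."""
    return s["train/tok_per_sec"] * s["train/dt"]


def training_hours(s):
    """Convert the logged total training time to hours."""
    return s["total_training_time"] / 3600.0


est = tokens_per_step(summary)
print(f"tokens per step ~ {est:,.0f}")
print(f"training time   ~ {training_hours(summary):.2f} h")
print(f"tokens seen     ~ {est * summary['step']:.3e}")
```

Under these assumptions the per-step estimate comes out close to 2^20 (1,048,576) tokens, and the training time is a little over 21 hours, which is consistent with the first and last checkpoint timestamps in the log.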