1: W0820 16:56:16.725000 1497662 torch/distributed/run.py:792]
1: W0820 16:56:16.725000 1497662 torch/distributed/run.py:792] *****************************************
1: W0820 16:56:16.725000 1497662 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
1: W0820 16:56:16.725000 1497662 torch/distributed/run.py:792] *****************************************
0: W0820 16:56:16.732000 880870 torch/distributed/run.py:792]
0: W0820 16:56:16.732000 880870 torch/distributed/run.py:792] *****************************************
0: W0820 16:56:16.732000 880870 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
0: W0820 16:56:16.732000 880870 torch/distributed/run.py:792] *****************************************
3: W0820 16:56:16.836000 271178 torch/distributed/run.py:792]
3: W0820 16:56:16.836000 271178 torch/distributed/run.py:792] *****************************************
3: W0820 16:56:16.836000 271178 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
3: W0820 16:56:16.836000 271178 torch/distributed/run.py:792] *****************************************
2: W0820 16:56:16.841000 1571176 torch/distributed/run.py:792]
2: W0820 16:56:16.841000 1571176 torch/distributed/run.py:792] *****************************************
2: W0820 16:56:16.841000 1571176 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2: W0820 16:56:16.841000 1571176 torch/distributed/run.py:792] *****************************************
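Each node's torchrun prints the same warning: it forces `OMP_NUM_THREADS=1` for every worker. torchrun only applies this default when `OMP_NUM_THREADS` is not already set and more than one process runs per node, so exporting a value before launch overrides it. A minimal sketch of one way to pick that value, assuming the goal is to split a node's CPU cores evenly among its local ranks; the helper `suggest_omp_threads` is hypothetical, and 4 ranks per node is an inference from `world_size` 16 spread over the 4 node labels (0 to 3) in this log.

```python
import os


def suggest_omp_threads(cpu_cores: int, procs_per_node: int) -> int:
    """Hypothetical helper: share a node's CPU cores evenly across its local ranks."""
    if cpu_cores < 1 or procs_per_node < 1:
        raise ValueError("cpu_cores and procs_per_node must be positive")
    # Never drop below one OpenMP thread per process.
    return max(1, cpu_cores // procs_per_node)


if __name__ == "__main__":
    # Assumption: 4 ranks per node (16 ranks over the 4 node labels in this log).
    procs_per_node = 4
    cores = os.cpu_count() or 1
    # Emit a line a launch script can eval before calling torchrun, so torchrun
    # finds OMP_NUM_THREADS already set and does not force it to 1.
    print(f"export OMP_NUM_THREADS={suggest_omp_threads(cores, procs_per_node)}")
```

`os.cpu_count()` reports every logical CPU on the machine, which can exceed what a Slurm allocation may use; on Linux, `len(os.sched_getaffinity(0))` counts only the CPUs the process is allowed to run on.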
2: [2025-08-20 16:56:37,848] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:118] [PID:1571252] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing`
2: [2025-08-20 16:56:37,848] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:217] [PID:1571252] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
1: [2025-08-20 16:56:38,039] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:118] [PID:1497738] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing`
1: [2025-08-20 16:56:38,039] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:217] [PID:1497738] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
0: [2025-08-20 16:56:38,110] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:118] [PID:880949] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing`
0: [2025-08-20 16:56:38,111] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:217] [PID:880949] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
3: [2025-08-20 16:56:38,340] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:118] [PID:271254] [RANK:0] explicitly setting `eval_sample_packing` to match `sample_packing`
3: [2025-08-20 16:56:38,340] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:217] [PID:271254] [RANK:0] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
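Every node logs the same two config adjustments: `eval_sample_packing` is aligned with `sample_packing`, and `pad_to_sequence_len` is switched on. The snippet below is a simplified, hypothetical re-creation of that behavior on a plain dict, not axolotl's validator code; the name `apply_packing_defaults` is made up, and it assumes both rules fire only when the field was left unset.

```python
from typing import Any, Dict


def apply_packing_defaults(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the two packing-related adjustments seen in the log."""
    out = dict(cfg)  # leave the caller's config untouched
    if out.get("sample_packing") and out.get("eval_sample_packing") is None:
        # Mirrors "explicitly setting `eval_sample_packing` to match `sample_packing`".
        out["eval_sample_packing"] = out["sample_packing"]
    if out.get("sample_packing") and out.get("pad_to_sequence_len") is None:
        # Mirrors "Setting `pad_to_sequence_len: true` ... when sample_packing".
        out["pad_to_sequence_len"] = True
    return out
```

Starting from `{"sample_packing": True}`, this yields `eval_sample_packing: true` and `pad_to_sequence_len: true`, the same values that appear in the resolved config dump.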
0: [2025-08-20 16:56:40,670] [INFO] [axolotl.utils.data.sft._load_raw_datasets:310] [PID:880950] [RANK:1] Loading raw datasets...
0: [2025-08-20 16:56:41,415] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:88] [PID:880950] [RANK:1] Loading dataset: /lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking with base_type: chat_template and prompt_style: None
0: [2025-08-20 16:56:41,708] [INFO] [axolotl.cli.config.load_cfg:244] [PID:880949] [RANK:0] config:
0: {
0:   "activation_offloading": false,
0:   "auto_resume_from_checkpoints": true,
0:   "axolotl_config_path": "/lustre/fswork/projects/rech/dgo/udv55np/train/tmp/1755698958355263805.yaml",
0:   "base_model": "/lustre/fswork/projects/rech/qwv/udv55np/Qwen/Qwen2.5-0.5B",
0:   "base_model_config": "/lustre/fswork/projects/rech/qwv/udv55np/Qwen/Qwen2.5-0.5B",
0:   "batch_size": 16,
0:   "bf16": true,
0:   "capabilities": {
0:     "bf16": true,
0:     "compute_capability": "sm_90",
0:     "fp8": false,
0:     "n_gpu": 16,
0:     "n_node": 1
0:   },
0:   "chat_template": "qwen_25",
0:   "dataloader_num_workers": 16,
0:   "dataloader_pin_memory": true,
0:   "dataloader_prefetch_factor": 256,
0:   "dataset_prepared_path": "/lustre/fsn1/projects/rech/dgo/udv55np/dataset/Qwen3-235B-A22B/Qwen2.5-0.5B/mix_0",
0:   "dataset_processes": 192,
0:   "datasets": [
0:     {
0:       "chat_template": "tokenizer_default",
0:       "data_files": [
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0007.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0009.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0005.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0006.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0014.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0010.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0012.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0008.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0001.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0002.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0013.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0015.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0004.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0011.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0000.jsonl",
0:         "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking/0003.jsonl"
0:       ],
0:       "ds_type": "json",
0:       "field_messages": "conversations",
0:       "message_property_mappings": {
0:         "content": "content",
0:         "role": "role"
0:       },
0:       "path": "/lustre/fswork/projects/rech/qwv/udv55np/dataset/ift/Qwen3-235B-A22B/no_thinking",
0:       "trust_remote_code": false,
0:       "type": "chat_template"
0:     }
0:   ],
0:   "ddp": true,
0:   "deepspeed": {
0:     "bf16": {
0:       "enabled": true
0:     },
0:     "gradient_accumulation_steps": "auto",
0:     "gradient_clipping": "auto",
0:     "train_batch_size": "auto",
0:     "train_micro_batch_size_per_gpu": "auto",
0:     "wall_clock_breakdown": false,
0:     "zero_optimization": {
0:       "contiguous_gradients": true,
0:       "overlap_comm": true,
0:       "reduce_bucket_size": "auto",
0:       "stage": 3,
0:       "stage3_gather_16bit_weights_on_model_save": true,
0:       "stage3_param_persistence_threshold": "auto",
0:       "stage3_prefetch_bucket_size": "auto",
0:       "sub_group_size": 0
0:     }
0:   },
0:   "device": "cuda:0",
0:   "device_map": {
0:     "": 0
0:   },
0:   "env_capabilities": {
0:     "torch_version": "2.6.0"
0:   },
0:   "eval_batch_size": 1,
0:   "eval_causal_lm_metrics": [
0:     "sacrebleu",
0:     "comet",
0:     "ter",
0:     "chrf"
0:   ],
0:   "eval_max_new_tokens": 128,
0:   "eval_sample_packing": true,
0:   "eval_table_size": 0,
0:   "evals_per_epoch": 0,
0:   "flash_attention": true,
0:   "fp16": false,
0:   "gradient_accumulation_steps": 1,
0:   "gradient_checkpointing": true,
0:   "gradient_checkpointing_kwargs": {
0:     "use_reentrant": true
0:   },
0:   "learning_rate": 1e-05,
0:   "lisa_layers_attribute": "model.layers",
0:   "load_best_model_at_end": false,
0:   "load_in_4bit": false,
0:   "load_in_8bit": false,
0:   "local_rank": 0,
0:   "logging_steps": 10,
0:   "lora_dropout": 0.0,
0:   "loraplus_lr_embedding": 1e-06,
0:   "lr_scheduler": "warmup_stable_decay",
0:   "lr_scheduler_kwargs": {
0:     "min_lr_ratio": 0.1,
0:     "num_decay_steps": 300
0:   },
0:   "max_prompt_len": 512,
0:   "mean_resizing_embeddings": false,
0:   "micro_batch_size": 1,
0:   "model_config_type": "qwen2",
0:   "num_epochs": 1.0,
0:   "optimizer": "adamw_torch_fused",
0:   "output_dir": "/lustre/fswork/projects/rech/dgo/udv55np/ift/Qwen3-235B-A22B/Qwen2.5-0.5B/0",
0:   "pad_to_sequence_len": true,
0:   "pretrain_multipack_attn": true,
0:   "pretrain_multipack_buffer_size": 10000,
0:   "profiler_steps_start": 0,
0:   "qlora_sharded_model_loading": false,
0:   "ray_num_workers": 1,
0:   "resources_per_worker": {
0:     "GPU": 1
0:   },
0:   "sample_packing": true,
0:   "sample_packing_bin_size": 200,
0:   "sample_packing_group_size": 100000,
0:   "save_only_model": false,
0:   "save_safetensors": true,
0:   "save_steps": 0.2,
0:   "save_total_limit": 20,
0:   "sequence_len": 16384,
0:   "sequence_parallel_degree": 1,
0:   "shuffle_merged_datasets": true,
0:   "skip_prepare_dataset": false,
0:   "special_tokens": {
0:     "bos_token": "<|im_start|>",
0:     "eos_token": "<|im_end|>",
0:     "pad_token": "<|endoftext|>"
0:   },
0:   "strict": false,
0:   "tensor_parallel_size": 1,
0:   "tf32": false,
0:   "tiled_mlp_use_original_mlp": true,
0:   "tokenizer_config": "/lustre/fswork/projects/rech/qwv/udv55np/Qwen/Qwen2.5-0.5B",
0:   "torch_dtype": "torch.bfloat16",
0:   "train_on_inputs": false,
0:   "trl": {
0:     "log_completions": false,
0:     "mask_truncated_completions": false,
0:     "ref_model_mixup_alpha": 0.9,
0:     "ref_model_sync_steps": 64,
0:     "scale_rewards": true,
0:     "sync_ref_model": false,
0:     "use_vllm": false,
0:     "vllm_server_host": "0.0.0.0",
0:     "vllm_server_port": 8000
0:   },
0:   "use_ray": false,
0:   "use_tensorboard": true,
0:   "val_set_size": 0.0,
0:   "vllm": {
0:     "device": "auto",
0:     "dtype": "auto",
0:     "gpu_memory_utilization": 0.9,
0:     "host": "0.0.0.0",
0:     "port": 8000
0:   },
0:   "warmup_steps": 150,
0:   "weight_decay": 0.0,
0:   "world_size": 16
0: }
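Several values in this config depend on each other. The DeepSpeed block leaves `train_batch_size`, `train_micro_batch_size_per_gpu` and `gradient_accumulation_steps` as `"auto"`, so they are filled from the trainer settings, and DeepSpeed requires `train_batch_size` = micro-batch × accumulation steps × data-parallel ranks: here 1 × 1 × 16 = 16, matching `batch_size`. With `sample_packing` and `pad_to_sequence_len` both on, each micro-batch is one packed row of `sequence_len` = 16384 tokens, so one optimizer step covers at most 16 × 16384 = 262,144 tokens (padding included). The sketch also models `warmup_stable_decay` as a linear warmup over `warmup_steps` (150), a constant plateau, then a decay over `num_decay_steps` (300) down to `min_lr_ratio` × `learning_rate`. The cosine decay shape is an assumption, not a reading of the Transformers source, and `total_steps` is a hypothetical parameter because this part of the log does not show the step count.

```python
import math


def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, world_size: int) -> int:
    # DeepSpeed's consistency rule: train_batch_size = micro * accumulation * data-parallel ranks.
    return micro_batch_size * grad_accum_steps * world_size


def max_tokens_per_step(batch_size: int, sequence_len: int) -> int:
    # With packing plus pad_to_sequence_len every row is exactly sequence_len tokens
    # (real tokens plus padding), so this is an upper bound on real tokens per step.
    return batch_size * sequence_len


def wsd_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int,
           decay_steps: int, min_lr_ratio: float) -> float:
    """Sketch of a warmup-stable-decay schedule (cosine decay shape is an assumption)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    decay_start = max(warmup_steps, total_steps - decay_steps)
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return base_lr
    # Cosine decay from base_lr to min_lr_ratio * base_lr over the final decay_steps.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


if __name__ == "__main__":
    print(effective_batch_size(1, 1, 16))    # micro_batch_size, gradient_accumulation_steps, world_size
    print(max_tokens_per_step(16, 16384))
    total = 1000  # hypothetical total optimizer steps
    for s in (0, 75, 150, 700, 850, 1000):
        print(s, wsd_lr(s, total, 1e-5, 150, 300, 0.1))
```

Under these assumptions the learning rate peaks at 1e-05 after 150 steps and ends at 1e-06 (10% of peak) on the last step.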
0: [2025-08-20 16:56:41,710] [INFO] [axolotl.cli.checks.check_user_token:35] [PID:880949] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used.
0:
Dropping Long Sequences (>16384) (num_proc=192): 100%|██████████| 1393784/1393784 [00:05<00:00, 234154.83 examples/s]
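This pass started with 1,393,784 tokenized samples, and the next pass processes 1,393,229, so 555 samples longer than `sequence_len` (16384 tokens) were dropped. The `num_proc=192` in the label matches `dataset_processes: 192` and points to a parallel Hugging Face `datasets` pass; the sketch below applies the same predicate to plain Python lists, assuming a sample's length is the length of its `input_ids` and that samples of exactly 16384 tokens are kept (the label reads `>16384`). `drop_long_sequences` is a hypothetical name.

```python
from typing import Dict, List

Sample = Dict[str, List[int]]


def drop_long_sequences(samples: List[Sample], sequence_len: int) -> List[Sample]:
    """Keep only samples whose token count fits in one sequence of sequence_len tokens."""
    # Strictly longer samples are dropped, matching the ">16384" label.
    return [s for s in samples if len(s["input_ids"]) <= sequence_len]


if __name__ == "__main__":
    data = [{"input_ids": [0] * n} for n in (10, 16384, 16385)]
    kept = drop_long_sequences(data, sequence_len=16384)
    print(f"dropped {len(data) - len(kept)} of {len(data)}")
```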
0:
Drop Samples with Zero Trainable Tokens (num_proc=192): 100%|██████████| 1393229/1393229 [00:08<00:00, 169350.94 examples/s]
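With `train_on_inputs: false`, prompt tokens are excluded from the loss, so a sample can end up with no token that contributes to training. The next pass still handles 1,393,229 rows, so this filter removed nothing in this run. A minimal sketch of the check, assuming masked positions carry the label -100 (the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`); the function names are hypothetical.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

Sample = Dict[str, List[int]]


def has_trainable_tokens(sample: Sample) -> bool:
    # A sample is useful only if at least one label is not masked out.
    return any(label != IGNORE_INDEX for label in sample["labels"])


def drop_untrainable(samples: List[Sample]) -> List[Sample]:
    return [s for s in samples if has_trainable_tokens(s)]


if __name__ == "__main__":
    batch = [
        {"input_ids": [1, 2, 3], "labels": [IGNORE_INDEX, IGNORE_INDEX, 3]},  # has a target token
        {"input_ids": [1, 2], "labels": [IGNORE_INDEX, IGNORE_INDEX]},        # prompt only
    ]
    print(len(drop_untrainable(batch)))
```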
0:
Add position_id column (Sample Packing) (num_proc=192): 100%|██████████| 1393229/1393229 [00:08<00:00, 158552.36 examples/s]
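Sample packing concatenates several short samples into one row, so each sample first gets its own `position_ids` starting at 0. After packing, the places where the ids reset to 0 mark sample boundaries, which is the information a variable-length attention kernel (`flash_attention: true` here) needs to keep packed samples from attending to each other. A minimal sketch of one way to do this; the helper names are hypothetical, padding is ignored for simplicity, and the offset list follows the `cu_seqlens` convention of FlashAttention's varlen interface (start offsets plus the total length).

```python
from typing import List, Tuple


def add_position_ids(input_ids: List[int]) -> List[int]:
    """Per-sample positions, restarting at 0 for every sample."""
    return list(range(len(input_ids)))


def pack(samples: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate samples into one row while keeping each sample's own positions."""
    packed_ids: List[int] = []
    packed_pos: List[int] = []
    for ids in samples:
        packed_ids.extend(ids)
        packed_pos.extend(add_position_ids(ids))
    return packed_ids, packed_pos


def cu_seqlens_from_position_ids(position_ids: List[int]) -> List[int]:
    """Cumulative sample offsets: a new sample starts wherever the position id is 0."""
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    return starts + [len(position_ids)]


if __name__ == "__main__":
    ids, pos = pack([[11, 12, 13], [21, 22], [31, 32, 33, 34]])
    print(pos)                                # [0, 1, 2, 0, 1, 0, 1, 2, 3]
    print(cu_seqlens_from_position_ids(pos))  # [0, 3, 5, 9]
```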
0:
Saving the dataset (60/192 shards): 40%|ββββ | 560420/1393229 [00:02<06:56, 1998.84 examples/s]
Saving the dataset (61/192 shards): 42%|βββββ | 584448/1393229 [00:02<06:44, 1998.84 examples/s]
Saving the dataset (62/192 shards): 42%|βββββ | 588448/1393229 [00:02<06:42, 1998.84 examples/s]
Saving the dataset (63/192 shards): 42%|βββββ | 588705/1393229 [00:02<06:42, 1998.84 examples/s]
Saving the dataset (64/192 shards): 42%|βββββ | 590705/1393229 [00:02<06:41, 1998.84 examples/s]
Saving the dataset (65/192 shards): 43%|βββββ | 602962/1393229 [00:02<06:35, 1998 |
| 0: .84 examples/s]
Saving the dataset (66/192 shards): 44%|βββββ | 611219/1393229 [00:02<06:31, 1998.84 examples/s]
Saving the dataset (67/192 shards): 44%|βββββ | 618219/1393229 [00:02<06:27, 1998.84 examples/s]
Saving the dataset (68/192 shards): 46%|βββββ | 645247/1393229 [00:02<06:14, 1998.84 examples/s]
Saving the dataset (69/192 shards): 46%|βββββ | 645247/1393229 [00:02<06:14, 1998.84 examples/s]
Saving the dataset (70/192 shards): 47%|βββββ | 650504/1393229 [00:02<06:11, 1998.84 examples/s]
Saving the dataset (71/192 shards): 47%|βββββ | 650504/1393229 [00:02<06:11, 1998.84 examples/s]
Saving the dataset (72/192 shards): 47%|βββββ | 656761/1393229 [00:02<06:08, 1998.84 examples/s]
Saving the dataset (73/192 shards): 48%|βββββ | 665761/1393229 [00:02<06:03, 1998.84 examples/s]
Saving the dataset (74/192 shards): 50%|βββββ | 697018/1393229 [00:02<05:48, 1998.84 examples/s]
|
| 0: Saving the dataset (75/192 shards): 52%|█████▏ | 728275/1393229 [00:02<05:32, 1998.84 examples/s]
Saving the dataset (76/192 shards): 53%|█████▎ | 743044/1393229 [00:02<05:25, 1998.84 examples/s]
Saving the dataset (77/192 shards): 54%|█████▎ | 747557/1393229 [00:02<05:23, 1998.84 examples/s]
Saving the dataset (78/192 shards): 55%|█████▍ | 761069/1393229 [00:02<05:16, 1998.84 examples/s]
Saving the dataset (79/192 shards): 55%|█████▍ | 763325/1393229 [00:02<05:15, 1998.84 examples/s]
Saving the dataset (80/192 shards): 55%|█████▍ | 763325/1393229 [00:02<05:15, 1998.84 examples/s]
Saving the dataset (81/192 shards): 55%|█████▌ | 767581/1393229 [00:02<05:13, 1998.84 examples/s]
Saving the dataset (82/192 shards): 55%|█████▌ | 769581/1393229 [00:02<05:12, 1998.84 examples/s]
Saving the dataset (83/192 shards): 55%|█████▌ | 769581/1393229 [00:02<05:12, 1998.84 examples/s] |
| 0: Saving the dataset (84/192 shards): 57%|█████▋ | 788093/1393229 [00:02<05:02, 1998.84 examples/s]
Saving the dataset (85/192 shards): 58%|█████▊ | 802605/1393229 [00:02<04:55, 1998.84 examples/s]
Saving the dataset (86/192 shards): 58%|█████▊ | 806605/1393229 [00:02<04:53, 1998.84 examples/s]
Saving the dataset (87/192 shards): 58%|█████▊ | 811861/1393229 [00:02<04:50, 1998.84 examples/s]
Saving the dataset (88/192 shards): 59%|█████▊ | 817117/1393229 [00:02<04:48, 1998.84 examples/s]
Saving the dataset (89/192 shards): 59%|█████▉ | 825373/1393229 [00:02<04:44, 1998.84 examples/s]
Saving the dataset (90/192 shards): 59%|█████▉ | 827629/1393229 [00:02<04:42, 1998.84 examples/s]
Saving the dataset (91/192 shards): 60%|█████▉ | 829629/1393229 [00:02<04:41, 1998.84 examples/s]
Saving the dataset (92/192 shards): 60%|██████ | 841885/1393229 [00:02<04:35, 1998.84 examples/s] |
| 0: Saving the dataset (93/192 shards): 61%|██████▏ | 853397/1393229 [00:02<04:30, 1998.84 examples/s]
Saving the dataset (94/192 shards): 61%|██████▏ | 853397/1393229 [00:02<04:30, 1998.84 examples/s]
Saving the dataset (95/192 shards): 61%|██████▏ | 853397/1393229 [00:02<04:30, 1998.84 examples/s]
Saving the dataset (96/192 shards): 63%|██████▎ | 880165/1393229 [00:02<04:16, 1998.84 examples/s]
Saving the dataset (97/192 shards): 65%|██████▍ | 901701/1393229 [00:02<04:05, 1998.84 examples/s]
Saving the dataset (98/192 shards): 65%|██████▍ | 901701/1393229 [00:02<04:05, 1998.84 examples/s]
Saving the dataset (99/192 shards): 65%|██████▍ | 901701/1393229 [00:02<04:05, 1998.84 examples/s]
Saving the dataset (100/192 shards): 65%|██████▌ | 907701/1393229 [00:02<04:02, 1998.84 examples/s]
Saving the dataset (101/192 shards): 65%|██████▌ | 907701/1393229 [00:02<04:02, 1998.84 examples/s] |
| 0: Saving the dataset (102/192 shards): 65%|██████▌ | 909701/1393229 [00:02<04:01, 1998.84 examples/s]
Saving the dataset (103/192 shards): 66%|██████▌ | 913701/1393229 [00:02<03:59, 1998.84 examples/s]
Saving the dataset (104/192 shards): 66%|██████▌ | 919701/1393229 [00:02<03:56, 1998.84 examples/s]
Saving the dataset (105/192 shards): 67%|██████▋ | 933213/1393229 [00:02<03:50, 1998.84 examples/s]
Saving the dataset (106/192 shards): 67%|██████▋ | 934469/1393229 [00:02<03:49, 1998.84 examples/s]
Saving the dataset (107/192 shards): 68%|██████▊ | 946981/1393229 [00:02<03:43, 1998.84 examples/s]
Saving the dataset (107/192 shards): 69%|██████▊ | 954493/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (108/192 shards): 69%|██████▉ | 965493/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (109/192 shards): 69%|██████▉ | 965749/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (110/192 shards): 69%|██████▉ | 967517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (111/192 shards): 70%|██████▉ | 969517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (112/192 shards): 70%|██████▉ | 974517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (113/192 shards): 70%|███████ | 978517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (114/192 shards): 70%|███████ | 978517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (115/192 shards): 70%|███████ | 978517/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (116/192 shards): 72%|███████▏ | 999029/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (117/192 shards): 72%|███████▏ | 1001029/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (118/192 shards): 73%|███████▎ | 1021053/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (119/192 shards): 74%|███████▍ | 1034053/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (120/192 shards): 74%|███████▍ | 1034053/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (121/192 shards): 75%|███████▍ | 1042821/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (122/192 shards): 76%|███████▌ | 1052589/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (123/192 shards): 76%|███████▌ | 1055589/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (124/192 shards): 76%|███████▌ | 1060845/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (125/192 shards): 76%|███████▋ | 1062845/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (126/192 shards): 76%|███████▋ | 1065101/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (127/192 shards): 76%|███████▋ | 1065101/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (128/192 shards): 77%|███████▋ | 1067613/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (129/192 shards): 77%|███████▋ | 1069613/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (130/192 shards): 77%|███████▋ | 1074869/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (131/192 shards): 78%|███████▊ | 1079869/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (132/192 shards): 78%|███████▊ | 1083125/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (133/192 shards): 78%|███████▊ | 1093381/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (134/192 shards): 79%|███████▉ | 1104405/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (135/192 shards): 79%|███████▉ | 1105661/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (136/192 shards): 80%|███████▉ | 1109917/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (137/192 shards): 80%|███████▉ | 1109917/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (138/192 shards): 80%|███████▉ | 1109917/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (139/192 shards): 80%|███████▉ | 1109917/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (140/192 shards): 80%|███████▉ | 1113917/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (141/192 shards): 81%|████████ | 1125173/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (142/192 shards): 81%|████████ | 1128685/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (143/192 shards): 82%|████████▏ | 1137941/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (144/192 shards): 82%|████████▏ | 1143941/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (145/192 shards): 83%|████████▎ | 1151453/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (146/192 shards): 83%|████████▎ | 1155709/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (147/192 shards): 84%|████████▍ | 1167221/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (148/192 shards): 84%|████████▍ | 1170221/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (149/192 shards): 84%|████████▍ | 1170221/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (150/192 shards): 85%|████████▍ | 1182733/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (151/192 shards): 86%|████████▌ | 1191733/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (152/192 shards): 87%|████████▋ | 1206245/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (153/192 shards): 88%|████████▊ | 1221013/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (154/192 shards): 88%|████████▊ | 1221013/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (155/192 shards): 88%|████████▊ | 1226269/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (156/192 shards): 88%|████████▊ | 1232269/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (157/192 shards): 89%|████████▉ | 1237525/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (158/192 shards): 89%|████████▉ | 1243525/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (159/192 shards): 90%|█████████ | 1254549/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (160/192 shards): 90%|█████████ | 1254549/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (161/192 shards): 90%|█████████ | 1254549/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (162/192 shards): 90%|█████████ | 1257549/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (163/192 shards): 91%|█████████ | 1265805/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (164/192 shards): 92%|█████████▏| 1280061/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (165/192 shards): 93%|█████████▎| 1294317/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (166/192 shards): 94%|█████████▍| 1312085/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (167/192 shards): 94%|█████████▍| 1312085/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (168/192 shards): 94%|█████████▍| 1312085/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (169/192 shards): 95%|█████████▍| 1322109/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (170/192 shards): 95%|█████████▍| 1322109/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (171/192 shards): 95%|█████████▌| 1326109/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (172/192 shards): 95%|█████████▌| 1326109/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (173/192 shards): 95%|█████████▌| 1330133/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (174/192 shards): 95%|█████████▌| 1330133/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (175/192 shards): 95%|█████████▌| 1330133/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (176/192 shards): 96%|█████████▌| 1335389/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (177/192 shards): 96%|█████████▌| 1340645/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (178/192 shards): 97%|█████████▋| 1344901/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (179/192 shards): 97%|█████████▋| 1344901/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (180/192 shards): 97%|█████████▋| 1349157/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (181/192 shards): 97%|█████████▋| 1353413/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (182/192 shards): 97%|█████████▋| 1354925/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (183/192 shards): 97%|█████████▋| 1354925/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (184/192 shards): 98%|█████████▊| 1363181/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (185/192 shards): 99%|█████████▉| 1383693/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (186/192 shards): 99%|█████████▉| 1383949/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (187/192 shards): 100%|█████████▉| 1388717/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (188/192 shards): 100%|█████████▉| 1388717/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (189/192 shards): 100%|█████████▉| 1388717/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (190/192 shards): 100%|█████████▉| 1388717/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (191/192 shards): 100%|██████████| 1393229/1393229 [00:02<00:00, 635171.48 examples/s]
Saving the dataset (192/192 shards): 100%|██████████| 1393229/1393229 [00:02<00:00, 635171.48 examples/s] |
| 0: Saving the dataset (192/192 shards): 100%|██████████| 1393229/1393229 [00:02<00:00, 629771.92 examples/s] |
| 0: [2025-08-20 16:57:32,211] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:471] [PID:880949] [RANK:0] Loading prepared dataset from disk at /lustre/fsn1/projects/rech/dgo/udv55np/dataset/Qwen3-235B-A22B/Qwen2.5-0.5B/mix_0/b92082c2117369c7b34fbc8edf04b9ae... |
| 0: [2025-08-20 16:58:40,864] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:435] [PID:880949] [RANK:0] gather_len_batches: [47617, 47617, 47617, 47615, 47616, 47616, 47617, 47617, 47616, 47616, 47617, 47617, 47617, 47616, 47617, 47617] |
| 0: [2025-08-20 16:58:40,875] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:496] [PID:880949] [RANK:0] sample_packing_eff_est across ranks: [0.9988111257553101, 0.9987481832504272, 0.9987691640853882, 0.9987901449203491, 0.9987901449203491, 0.9987481832504272, 0.9987901449203491, 0.9987901449203491, 0.9987901449203491, 0.9987901449203491, 0.9987901449203491, 0.9987901449203491, 0.9987481832504272, 0.9988111257553101, 0.9988111257553101, 0.9987691640853882] |
| 0: [2025-08-20 16:58:40,891] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:123] [PID:880949] [RANK:0] Maximum number of steps set at 2975 |
| 1: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 3: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 2: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 3: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 3: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 0: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 2: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 2: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 3: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 2: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 1: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 1: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 0: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 1: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 0: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 0: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. |
| 0: [2025-08-20 16:58:42,585] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:317] [PID:880949] [RANK:0] Converting modules to torch.bfloat16 |
| 0: [2025-08-20 16:58:45,139] [INFO] [axolotl.train.save_initial_configs:397] [PID:880949] [RANK:0] Pre-saving tokenizer to /lustre/fswork/projects/rech/dgo/udv55np/ift/Qwen3-235B-A22B/Qwen2.5-0.5B/0... |
| 0: [2025-08-20 16:58:45,304] [INFO] [axolotl.train.save_initial_configs:400] [PID:880949] [RANK:0] Pre-saving model config to /lustre/fswork/projects/rech/dgo/udv55np/ift/Qwen3-235B-A22B/Qwen2.5-0.5B/0... |
| 0: [2025-08-20 16:58:45,312] [INFO] [axolotl.train.execute_training:221] [PID:880949] [RANK:0] Starting trainer... |
| 0: [2025-08-20 17:02:39,390] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:435] [PID:880949] [RANK:0] gather_len_batches: [47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616, 47616] |
| 0: Parameter Offload - Persistent parameters statistics: param_count = 121, numel = 71552 |
| 0: {'loss': 1.1147, 'grad_norm': 3.6220792776073254, 'learning_rate': 1.54e-06, 'epoch': 0.0} |
| 0:
0%| | 0/2975 [00:00<?, ?it/s]
0%| | 1/2975 [03:46<187:02:36, 226.41s/it]
0%| | 2/2975 [03:51<79:16:39, 96.00s/it]
0%| | 3/2975 [03:52<43:29:41, 52.69s/it]
0%| | 4/2975 [03:53<26:41:06, 32.33s/it]
0%| | 5/2975 [03:54<17:14:35, 20.90s/it]
0%| | 6/2975 [03:54<11:32:47, 14.00s/it]
0%| | 7/2975 [03:55<7:55:53, 9.62s/it]
0%| | 8/2975 [03:55<5:33:50, 6.75s/it]
0%| | 9/2975 [03:56<3:58:42, 4.83s/it]
0%| | 10/2975 [03:57<2:54:10, 3.52s/it]
0%| | 10/2975 [03:57<2:54:10, 3.52s/it]
0%| | 11/2975 [03:57<2:10:00, 2.63s/it]
0%| | 12/2975 [03:58<1:39:29, 2.01s/it]
0%| | 13/2975 [03:58<1:18:20, 1.59s/it]
0%| | 14/2975 [03:59<1:03:36, 1.29s/it]
1%| | 15/2975 [04:00<53:23, 1.08s/it]
1%| | 16/2975 [04:00<46:14, 1.07it/s]
1%| | 17/2975 [04:01<41:15, 1.19it/s] |
| 0: {'loss': 1.0079, 'grad_norm': 2.185977392937427, 'learning_rate': 2.1400000000000003e-06, 'epoch': 0.01} |
| 0: {'loss': 0.9488, 'grad_norm': 1.279183640961094, 'learning_rate': 2.7400000000000004e-06, 'epoch': 0.01} |
| 0: 1%| | 18/2975 [04:01<37:47, 1.30it/s]
1%| | 19/2975 [04:02<35:23, 1.39it/s]
1%| | 20/2975 [04:03<33:38, 1.46it/s]
1%| | 20/2975 [04:03<33:38, 1.46it/s]
1%| | 21/2975 [04:03<32:31, 1.51it/s]
1%| | 22/2975 [04:04<31:44, 1.55it/s]
1%| | 23/2975 [04:04<31:08, 1.58it/s]
1%| | 24/2975 [04:05<30:43, 1.60it/s]
1%| | 25/2975 [04:06<30:25, 1.62it/s]
1%| | 26/2975 [04:06<30:12, 1.63it/s]
1%| | 27/2975 [04:07<30:06, 1.63it/s]
1%| | 28/2975 [04:07<30:05, 1.63it/s]
1%| | 29/2975 [04:08<30:07, 1.63it/s]
1%| | 30/2975 [04:09<30:09, 1.63it/s]
1%| | 30/2975 [04:09<30:09, 1.63it/s]
1%| | 31/2975 [04:09<30:07, 1.63it/s]
1%| | 32/2975 [04:10<30:02, 1.63it/s]
1%| | 33/2975 [04:10<29:54, 1.64it/s]
1%| | 34/2975 [04:11<29:47, 1.65it/s] |
| 0: {'loss': 0.9124, 'grad_norm': 1.2547741524546454, 'learning_rate': 3.3400000000000006e-06, 'epoch': 0.01} |
| 0: {'loss': 0.8961, 'grad_norm': 1.0408630202611184, 'learning_rate': 3.94e-06, 'epoch': 0.02} |
| 0: 1%| | 35/2975 [04:12<29:46, 1.65it/s]
1%| | 36/2975 [04:12<29:43, 1.65it/s]
1%| | 37/2975 [04:13<29:41, 1.65it/s]
1%|▏ | 38/2975 [04:14<29:41, 1.65it/s]
1%|▏ | 39/2975 [04:14<29:40, 1.65it/s]
1%|▏ | 40/2975 [04:15<29:38, 1.65it/s]
1%|▏ | 40/2975 [04:15<29:38, 1.65it/s]
1%|▏ | 41/2975 [04:15<29:40, 1.65it/s]
1%|▏ | 42/2975 [04:16<29:38, 1.65it/s]
1%|▏ | 43/2975 [04:17<29:40, 1.65it/s]
1%|▏ | 44/2975 [04:17<29:46, 1.64it/s]
2%|▏ | 45/2975 [04:18<29:54, 1.63it/s]
2%|▏ | 46/2975 [04:18<29:57, 1.63it/s]
2%|▏ | 47/2975 [04:19<29:56, 1.63it/s]
2%|▏ | 48/2975 [04:20<29:53, 1.63it/s]
2%|▏ | 49/2975 [04:20<29:51, 1.63it/s]
2%|▏ | 50/2975 [04:21<29:45, 1.64it/s]
2%|▏ | 50/2975 [04:21<29:45, 1.64it/s] |
| 0: {'loss': 0.8844, 'grad_norm': 0.9864649886106432, 'learning_rate': 4.54e-06, 'epoch': 0.02} |
| 0: 2%|▏ | 51/2975 [04:21<29:42, 1.64it/s]
2%|▏ | 52/2975 [04:22<29:36, 1.65it/s]
2%|▏ | 53/2975 [04:23<29:35, 1.65it/s]
2%|▏ | 54/2975 [04:23<29:36, 1.64it/s]
2%|▏ | 55/2975 [04:24<29:39, 1.64it/s]
2%|▏ | 56/2975 [04:25<29:39, 1.64it/s]
2%|▏ | 57/2975 [04:25<29:42, 1.64it/s]
2%|▏ | 58/2975 [04:26<29:48, 1.63it/s]
2%|▏ | 59/2975 [04:26<29:51, 1.63it/s]
2%|▏ | 60/2975 [04:27<29:47, 1.63it/s]
2%|▏ | 60/2975 [04:27<29:47, 1.63it/s]
2%|▏ | 61/2975 [04:28<29:45, 1.63it/s]
2%|▏ | 62/2975 [04:28<29:39, 1.64it/s]
2%|▏ | 63/2975 [04:29<29:34, 1.64it/s]
2%|▏ | 64/2975 [04:29<29:32, 1.64it/s]
2%|▏ | 65/2975 [04:30<29:30, 1.64it/s]
2%|▏ | 66/2975 [04:31<29:27, 1.65it/s]
2%|▏ | 67/2975 [04:31<29:24, 1.65it/s] |
| 0: {'loss': 0.8461, 'grad_norm': 1.0171978797051726, 'learning_rate': 5.140000000000001e-06, 'epoch': 0.02} |
| 0: {'loss': 0.8403, 'grad_norm': 0.9604348470003533, 'learning_rate': 5.74e-06, 'epoch': 0.03} |
| 0: 2%|▏ | 68/2975 [04:32<29:28, 1.64it/s]
2%|▏ | 69/2975 [04:32<29:35, 1.64it/s]
2%|▏ | 70/2975 [04:33<29:41, 1.63it/s]
2%|▏ | 70/2975 [04:33<29:41, 1.63it/s]
2%|▏ | 71/2975 [04:34<29:48, 1.62it/s]
2%|▏ | 72/2975 [04:34<29:46, 1.62it/s]
2%|▏ | 73/2975 [04:35<29:43, 1.63it/s]
2%|▏ | 74/2975 [04:36<29:40, 1.63it/s]
3%|▎ | 75/2975 [04:36<29:37, 1.63it/s]
3%|▎ | 76/2975 [04:37<29:42, 1.63it/s]
3%|▎ | 77/2975 [04:37<29:43, 1.62it/s]
3%|▎ | 78/2975 [04:38<29:46, 1.62it/s]
3%|▎ | 79/2975 [04:39<29:43, 1.62it/s]
3%|▎ | 80/2975 [04:39<29:41, 1.63it/s]
3%|▎ | 80/2975 [04:39<29:41, 1.63it/s]
3%|▎ | 81/2975 [04:40<29:41, 1.62it/s]
3%|▎ | 82/2975 [04:40<29:37, 1.63it/s]
3%|▎ | 83/2975 [04:41<29:37, 1.63it/s] |
| 0: {'loss': 0.845, 'grad_norm': 0.9465021140599814, 'learning_rate': 6.34e-06, 'epoch': 0.03} |
| 0: 3%|▎ | 84/2975 [04:42<29:38, 1.63it/s]
3%|▎ | 85/2975 [04:42<29:37, 1.63it/s]
3%|▎ | 86/2975 [04:43<29:46, 1.62it/s]
3%|▎ | 87/2975 [04:44<29:46, 1.62it/s]
3%|▎ | 88/2975 [04:44<29:42, 1.62it/s]
3%|▎ | 89/2975 [04:45<29:38, 1.62it/s]
3%|▎ | 90/2975 [04:45<29:36, 1.62it/s]
3%|▎ | 90/2975 [04:45<29:36, 1.62it/s]
3%|▎ | 91/2975 [04:46<29:38, 1.62it/s]
3%|▎ | 92/2975 [04:47<29:32, 1.63it/s]
3%|▎ | 93/2975 [04:47<29:32, 1.63it/s]
3%|▎ | 94/2975 [04:48<29:33, 1.62it/s]
3%|▎ | 95/2975 [04:48<29:32, 1.62it/s]
3%|▎ | 96/2975 [04:49<29:29, 1.63it/s]
3%|▎ | 97/2975 [04:50<29:31, 1.62it/s]
3%|▎ | 98/2975 [04:50<29:33, 1.62it/s]
3%|▎ | 99/2975 [04:51<29:34, 1.62it/s]
3%|▎ | 100/2975 [04:52<29:51, 1.60it/s] |
| 0: {'loss': 0.8482, 'grad_norm': 1.1297069466960947, 'learning_rate': 6.940000000000001e-06, 'epoch': 0.03} |
| 0: {'loss': 0.8297, 'grad_norm': 0.9648123456744243, 'learning_rate': 7.540000000000001e-06, 'epoch': 0.04} |
| 0: 3%|▎ | 100/2975 [04:52<29:51, 1.60it/s]
3%|▎ | 101/2975 [04:52<29:47, 1.61it/s]
3%|▎ | 102/2975 [04:53<29:42, 1.61it/s]
3%|▎ | 103/2975 [04:53<29:40, 1.61it/s]
3%|▎ | 104/2975 [04:54<29:35, 1.62it/s]
4%|▎ | 105/2975 [04:55<29:26, 1.63it/s]
4%|▎ | 106/2975 [04:55<29:20, 1.63it/s]
4%|▎ | 107/2975 [04:56<29:16, 1.63it/s]
4%|▎ | 108/2975 [04:56<29:14, 1.63it/s]
4%|▎ | 109/2975 [04:57<29:13, 1.63it/s]
4%|▎ | 110/2975 [04:58<29:11, 1.64it/s]
4%|▎ | 110/2975 [04:58<29:11, 1.64it/s]
4%|▎ | 111/2975 [04:58<29:11, 1.64it/s]
4%|▍ | 112/2975 [04:59<29:07, 1.64it/s]
4%|▍ | 113/2975 [05:00<29:10, 1.63it/s]
4%|▍ | 114/2975 [05:00<29:08, 1.64it/s]
4%|▍ | 115/2975 [05:01<29:09, 1.64it/s]
4%|▍ | 116/2975 [05:01<29:07, 1.64it/s] |
| 0: {'loss': 0.8187, 'grad_norm': 0.9406876728676388, 'learning_rate': 8.14e-06, 'epoch': 0.04} |
| 0: {'loss': 0.8218, 'grad_norm': 0.925012843902631, 'learning_rate': 8.740000000000001e-06, 'epoch': 0.04} |
| 0: 4%|▍ | 117/2975 [05:02<29:07, 1.64it/s]
4%|▍ | 118/2975 [05:03<29:06, 1.64it/s]
4%|▍ | 119/2975 [05:03<29:04, 1.64it/s]
4%|▍ | 120/2975 [05:04<29:12, 1.63it/s]
4%|▍ | 120/2975 [05:04<29:12, 1.63it/s]
4%|▍ | 121/2975 [05:04<29:13, 1.63it/s]
4%|▍ | 122/2975 [05:05<29:13, 1.63it/s]
4%|▍ | 123/2975 [05:06<29:13, 1.63it/s]
4%|▍ | 124/2975 [05:06<29:15, 1.62it/s]
4%|▍ | 125/2975 [05:07<29:14, 1.62it/s]
4%|▍ | 126/2975 [05:08<29:14, 1.62it/s]
4%|▍ | 127/2975 [05:08<29:13, 1.62it/s]
4%|▍ | 128/2975 [05:09<29:11, 1.63it/s]
4%|▍ | 129/2975 [05:09<29:11, 1.62it/s]
4%|▍ | 130/2975 [05:10<29:10, 1.63it/s]
4%|▍ | 130/2975 [05:10<29:10, 1.63it/s]
4%|▍ | 131/2975 [05:11<29:07, 1.63it/s]
4%|▍ | 132/2975 [05:11<29:01, 1.63it/s] |
| 0: {'loss': 0.8419, 'grad_norm': 0.8502300171757229, 'learning_rate': 9.34e-06, 'epoch': 0.05} |
| 0: 4%|▍ | 133/2975 [05:12<28:59, 1.63it/s]
5%|▍ | 134/2975 [05:12<28:52, 1.64it/s]
5%|▍ | 135/2975 [05:13<28:48, 1.64it/s]
5%|▍ | 136/2975 [05:14<28:52, 1.64it/s]
5%|▍ | 137/2975 [05:14<28:56, 1.63it/s]
5%|▍ | 138/2975 [05:15<29:00, 1.63it/s]
5%|▍ | 139/2975 [05:15<29:01, 1.63it/s]
5%|▍ | 140/2975 [05:16<29:05, 1.62it/s]
5%|▍ | 140/2975 [05:16<29:05, 1.62it/s]
5%|▍ | 141/2975 [05:17<29:09, 1.62it/s]
5%|▍ | 142/2975 [05:17<29:04, 1.62it/s]
5%|▍ | 143/2975 [05:18<29:00, 1.63it/s]
5%|▍ | 144/2975 [05:19<28:53, 1.63it/s]
5%|▍ | 145/2975 [05:19<31:39, 1.49it/s]
5%|▍ | 146/2975 [05:20<30:45, 1.53it/s]
5%|▍ | 147/2975 [05:21<30:07, 1.56it/s]
5%|▍ | 148/2975 [05:21<29:45, 1.58it/s]
5%|▌ | 149/2975 [05:22<29:27, 1.60it/s] |
| 0: {'loss': 0.7937, 'grad_norm': 0.7785647727224466, 'learning_rate': 9.940000000000001e-06, 'epoch': 0.05} |
| 0: {'loss': 0.7883, 'grad_norm': 0.7493685913449163, 'learning_rate': 1e-05, 'epoch': 0.05} |
| 0: 5%|▌ | 150/2975 [05:22<29:16, 1.61it/s]
5%|▌ | 150/2975 [05:22<29:16, 1.61it/s]
5%|▌ | 151/2975 [05:23<29:09, 1.61it/s]
5%|▌ | 152/2975 [05:24<28:58, 1.62it/s]
5%|▌ | 153/2975 [05:24<28:50, 1.63it/s]
5%|▌ | 154/2975 [05:25<28:42, 1.64it/s]
5%|▌ | 155/2975 [05:25<28:39, 1.64it/s]
5%|▌ | 156/2975 [05:26<28:35, 1.64it/s]
5%|▌ | 157/2975 [05:27<28:33, 1.64it/s]
5%|▌ | 158/2975 [05:27<28:32, 1.64it/s]
5%|▌ | 159/2975 [05:28<28:29, 1.65it/s]
5%|▌ | 160/2975 [05:28<28:25, 1.65it/s]
5%|▌ | 160/2975 [05:28<28:25, 1.65it/s]
5%|▌ | 161/2975 [05:29<28:27, 1.65it/s]
5%|▌ | 162/2975 [05:30<28:31, 1.64it/s]
5%|▌ | 163/2975 [05:30<28:32, 1.64it/s]
6%|▌ | 164/2975 [05:31<28:31, 1.64it/s] |
| 0: {'loss': 0.79, 'grad_norm': 0.7957620935977624, 'learning_rate': 1e-05, 'epoch': 0.06} |
| 0: {'loss': 0.7847, 'grad_norm': 0.782598850636886, 'learning_rate': 1e-05, 'epoch': 0.06} |
| 0: 6%|▌ | 165/2975 [05:32<28:29, 1.64it/s]
6%|▌ | 166/2975 [05:32<28:29, 1.64it/s]
6%|▌ | 167/2975 [05:33<28:31, 1.64it/s]
6%|▌ | 168/2975 [05:33<28:32, 1.64it/s]
6%|▌ | 169/2975 [05:34<28:32, 1.64it/s]
6%|▌ | 170/2975 [05:35<28:30, 1.64it/s]
6%|▌ | 170/2975 [05:35<28:30, 1.64it/s]
6%|▌ | 171/2975 [05:35<28:29, 1.64it/s]
6%|▌ | 172/2975 [05:36<28:29, 1.64it/s]
6%|▌ | 173/2975 [05:36<28:27, 1.64it/s]
6%|▌ | 174/2975 [05:37<28:23, 1.64it/s]
6%|▌ | 175/2975 [05:38<28:22, 1.64it/s]
6%|▌ | 176/2975 [05:38<28:23, 1.64it/s]
6%|▌ | 177/2975 [05:39<28:22, 1.64it/s]
6%|▌ | 178/2975 [05:39<28:25, 1.64it/s]
6%|▌ | 179/2975 [05:40<28:24, 1.64it/s]
6%|▌ | 180/2975 [05:41<28:26, 1.64it/s]
6%|▌ | 180/2975 [05:41<28:26, 1.64it/s] |
| 0: {'loss': 0.8106, 'grad_norm': 0.8284804242874985, 'learning_rate': 1e-05, 'epoch': 0.06} |
| 0: 6%|▌ | 181/2975 [05:41<28:25, 1.64it/s]
6%|β | 182/2975 [05:42<28:24, 1.64it/s]
6%|β | 183/2975 [05:42<28:22, 1.64it/s]
6%|β | 184/2975 [05:43<28:22, 1.64it/s]
6%|β | 185/2975 [05:44<28:27, 1.63it/s]
6%|β | 186/2975 [05:44<28:29, 1.63it/s]
6%|β | 187/2975 [05:45<28:33, 1.63it/s]
6%|β | 188/2975 [05:46<28:33, 1.63it/s]
6%|β | 189/2975 [05:46<28:35, 1.62it/s]
6%|β | 190/2975 [05:47<28:38, 1.62it/s]
6%|β | 190/2975 [05:47<28:38, 1.62it/s]
6%|β | 191/2975 [05:47<28:40, 1.62it/s]
6%|β | 192/2975 [05:48<28:37, 1.62it/s]
6%|β | 193/2975 [05:49<28:35, 1.62it/s]
7%|β | 194/2975 [05:49<28:32, 1.62it/s]
7%|β | 195/2975 [05:50<28:31, 1.62it/s]
7%|β | 196/2975 [05:51<28:29, 1.63it/s]
7%|β | 197/2 |
| 0: {'loss': 0.7863, 'grad_norm': 0.7926107821646097, 'learning_rate': 1e-05, 'epoch': 0.07} |
| 0: {'loss': 0.7856, 'grad_norm': 0.8607046204689304, 'learning_rate': 1e-05, 'epoch': 0.07} |
| 0: 975 [05:51<28:27, 1.63it/s]
7%|β | 198/2975 [05:52<28:25, 1.63it/s]
7%|β | 199/2975 [05:52<28:27, 1.63it/s]
7%|β | 200/2975 [05:53<28:23, 1.63it/s]
7%|β | 200/2975 [05:53<28:23, 1.63it/s]
7%|β | 201/2975 [05:54<28:27, 1.62it/s]
7%|β | 202/2975 [05:54<28:26, 1.62it/s]
7%|β | 203/2975 [05:55<28:23, 1.63it/s]
7%|β | 204/2975 [05:55<28:26, 1.62it/s]
7%|β | 205/2975 [05:56<28:29, 1.62it/s]
7%|β | 206/2975 [05:57<28:26, 1.62it/s]
7%|β | 207/2975 [05:57<28:29, 1.62it/s]
7%|β | 208/2975 [05:58<28:29, 1.62it/s]
7%|β | 209/2975 [05:59<28:26, 1.62it/s]
7%|β | 210/2975 [05:59<28:23, 1.62it/s]
7%|β | 211/2975 [06:00<28:27, 1.62it/s]
| 0: {'loss': 0.8051, 'grad_norm': 0.8692304162211446, 'learning_rate': 1e-05, 'epoch': 0.07} |
7%|β | 212/2975 [06:00<28:26, 1.62it/s]
7%|β | 213/2975 [06:01<28:25, 1.62it/s]
7%|β | 214/2975 [06:02<28:21, 1.62it/s]
7%|β | 215/2975 [06:02<28:19, 1.62it/s]
7%|β | 216/2975 [06:03<28:17, 1.63it/s]
7%|β | 217/2975 [06:03<28:15, 1.63it/s]
7%|β | 218/2975 [06:04<28:15, 1.63it/s]
7%|β | 219/2975 [06:05<28:12, 1.63it/s]
7%|β | 220/2975 [06:05<28:15, 1.63it/s]
7%|β | 221/2975 [06:06<28:16, 1.62it/s]
7%|β | 222/2975 [06:07<28:14, 1.63it/s]
7%|β | 223/2975 [06:07<28:11, 1.63it/s]
8%|β | 224/2975 [06:08<28:11, 1.63it/s]
8%|β | 225/2975 [06:08<28:09, 1.63it/s]
8%|β | 226/2975 [06:09<28:06, 1.63it/s]
8%|β | 227/2975 [06:10<28:04, 1.63it/s]
8%|β | 228/2975 [06:10<28:07, 1.63it/s]
8%|β | 229/2975 [06:11<28:09, 1.63it/s]
| 0: {'loss': 0.7915, 'grad_norm': 0.8049170893159536, 'learning_rate': 1e-05, 'epoch': 0.08} |
| 0: {'loss': 0.781, 'grad_norm': 0.7687711775406885, 'learning_rate': 1e-05, 'epoch': 0.08} |
8%|β | 230/2975 [06:11<28:06, 1.63it/s]
8%|β | 231/2975 [06:12<28:09, 1.62it/s]
8%|β | 232/2975 [06:13<28:06, 1.63it/s]
8%|β | 233/2975 [06:13<28:05, 1.63it/s]
8%|β | 234/2975 [06:14<28:03, 1.63it/s]
8%|β | 235/2975 [06:15<28:00, 1.63it/s]
8%|β | 236/2975 [06:15<28:00, 1.63it/s]
8%|β | 237/2975 [06:16<28:00, 1.63it/s]
8%|β | 238/2975 [06:16<27:59, 1.63it/s]
8%|β | 239/2975 [06:17<28:03, 1.63it/s]
8%|β | 240/2975 [06:18<27:59, 1.63it/s]
8%|β | 241/2975 [06:18<28:00, 1.63it/s]
8%|β | 242/2975 [06:19<27:59, 1.63it/s]
8%|β | 243/2975 [06:19<28:00, 1.63it/s]
8%|β | 244/2975 [06:20<28:00, 1.63it/s]
| 0: {'loss': 0.7829, 'grad_norm': 0.8128211126668727, 'learning_rate': 1e-05, 'epoch': 0.08} |
| 0: {'loss': 0.7748, 'grad_norm': 0.7604484943760149, 'learning_rate': 1e-05, 'epoch': 0.09} |
8%|β | 245/2975 [06:21<28:00, 1.62it/s]
8%|β | 246/2975 [06:21<28:01, 1.62it/s]
8%|β | 247/2975 [06:22<28:01, 1.62it/s]
8%|β | 248/2975 [06:22<27:58, 1.63it/s]
8%|β | 249/2975 [06:23<27:56, 1.63it/s]
8%|β | 250/2975 [06:24<27:57, 1.62it/s]
8%|β | 251/2975 [06:24<28:01, 1.62it/s]
8%|β | 252/2975 [06:25<27:58, 1.62it/s]
9%|β | 253/2975 [06:26<27:55, 1.62it/s]
9%|β | 254/2975 [06:26<27:56, 1.62it/s]
9%|β | 255/2975 [06:27<27:53, 1.63it/s]
9%|β | 256/2975 [06:27<27:49, 1.63it/s]
9%|β | 257/2975 [06:28<27:47, 1.63it/s]
9%|β | 258/2975 [06:29<27:47, 1.63it/s]
9%|β | 259/2975 [06:29<27:52, 1.62it/s]
9%|β | 260/2975 [06:30<27:51, 1.62it/s]
| 0: {'loss': 0.783, 'grad_norm': 0.7744995010014754, 'learning_rate': 1e-05, 'epoch': 0.09} |
9%|β | 261/2975 [06:31<27:51, 1.62it/s]
9%|β | 262/2975 [06:31<27:50, 1.62it/s]
9%|β | 263/2975 [06:32<27:48, 1.63it/s]
9%|β | 264/2975 [06:32<27:48, 1.62it/s]
9%|β | 265/2975 [06:33<27:45, 1.63it/s]
9%|β | 266/2975 [06:34<27:45, 1.63it/s]
9%|β | 267/2975 [06:34<27:44, 1.63it/s]
9%|β | 268/2975 [06:35<27:43, 1.63it/s]
9%|β | 269/2975 [06:35<27:40, 1.63it/s]
9%|β | 270/2975 [06:36<27:38, 1.63it/s]
9%|β | 271/2975 [06:37<27:40, 1.63it/s]
9%|β | 272/2975 [06:37<27:39, 1.63it/s]
9%|β | 273/2975 [06:38<27:39, 1.63it/s]
9%|β | 274/2975 [06:38<27:38, 1.63it/s]
9%|β | 275/2975 [06:39<27:39, 1.63it/s]
9%|β | 276/2975 [06:40<27:38, 1.63it/s]
9%|β | 277/2975 [06:40<27:39, 1.63it/s]
| 0: {'loss': 0.7854, 'grad_norm': 0.7643071268098258, 'learning_rate': 1e-05, 'epoch': 0.09} |
| 0: {'loss': 0.7895, 'grad_norm': 0.8158389807254296, 'learning_rate': 1e-05, 'epoch': 0.1} |
9%|β | 278/2975 [06:41<27:37, 1.63it/s]
9%|β | 279/2975 [06:42<27:37, 1.63it/s]
9%|β | 280/2975 [06:42<27:35, 1.63it/s]
9%|β | 281/2975 [06:43<27:35, 1.63it/s]
9%|β | 282/2975 [06:43<27:39, 1.62it/s]
10%|β | 283/2975 [06:44<27:39, 1.62it/s]
10%|β | 284/2975 [06:45<27:37, 1.62it/s]
10%|β | 285/2975 [06:45<27:33, 1.63it/s]
10%|β | 286/2975 [06:46<27:28, 1.63it/s]
10%|β | 287/2975 [06:46<27:25, 1.63it/s]
10%|β | 288/2975 [06:47<27:26, 1.63it/s]
10%|β | 289/2975 [06:48<27:25, 1.63it/s]
10%|β | 290/2975 [06:48<27:23, 1.63it/s]
10%|β | 291/2975 [06:49<27:25, 1.63it/s]
10%|β | 292/2975 [06:50<27:21, 1.63it/s]
| 0: {'loss': 0.7638, 'grad_norm': 0.747639615762065, 'learning_rate': 1e-05, 'epoch': 0.1} |
10%|β | 293/2975 [06:50<27:19, 1.64it/s]
10%|β | 294/2975 [06:51<27:18, 1.64it/s]
10%|β | 295/2975 [06:51<27:19, 1.64it/s]
10%|β | 296/2975 [06:52<27:15, 1.64it/s]
10%|β | 297/2975 [06:53<27:10, 1.64it/s]
10%|β | 298/2975 [06:53<27:07, 1.64it/s]
10%|β | 299/2975 [06:54<27:07, 1.64it/s]
10%|β | 300/2975 [06:54<27:11, 1.64it/s]
10%|β | 301/2975 [06:55<27:10, 1.64it/s]
10%|β | 302/2975 [06:56<27:08, 1.64it/s]
10%|β | 303/2975 [06:56<27:05, 1.64it/s]
10%|β | 304/2975 [06:57<27:03, 1.65it/s]
10%|β | 305/2975 [06:57<27:00, 1.65it/s]
10%|β | 306/2975 [06:58<26:56, 1.65it/s]
10%|β | 307/2975 [06:59<26:54, 1.65it/s]
10%|β | 308/2975 [06:59<26:54, 1.65it/s]
10%|β | 309/2975 [07:00<26:58, 1.65it/s]
| 0: {'loss': 0.761, 'grad_norm': 0.7527151310109943, 'learning_rate': 1e-05, 'epoch': 0.1} |
| 0: {'loss': 0.7666, 'grad_norm': 0.7679616039639676, 'learning_rate': 1e-05, 'epoch': 0.11} |
10%|β | 310/2975 [07:00<26:58, 1.65it/s]
10%|β | 311/2975 [07:01<27:00, 1.64it/s]
10%|β | 312/2975 [07:02<26:59, 1.64it/s]
11%|β | 313/2975 [07:02<26:59, 1.64it/s]
11%|β | 314/2975 [07:03<26:57, 1.65it/s]
11%|β | 315/2975 [07:04<26:55, 1.65it/s]
11%|β | 316/2975 [07:04<26:56, 1.64it/s]
11%|β | 317/2975 [07:05<26:59, 1.64it/s]
11%|β | 318/2975 [07:05<26:55, 1.64it/s]
11%|β | 319/2975 [07:06<26:52, 1.65it/s]
11%|β | 320/2975 [07:07<26:52, 1.65it/s]
11%|β | 321/2975 [07:07<26:53, 1.64it/s]
11%|β | 322/2975 [07:08<26:51, 1.65it/s]
11%|β | 323/2975 [07:08<26:51, 1.65it/s]
11%|β | 324/2975 [07:09<26:51, 1.64it/s]
11%|β | 325/2975 [07:10<26:54, 1.64it/s]
| 0: {'loss': 0.748, 'grad_norm': 0.8669834737593406, 'learning_rate': 1e-05, 'epoch': 0.11} |
| 0: {'loss': 0.7805, 'grad_norm': 0.77301353399369, 'learning_rate': 1e-05, 'epoch': 0.11} |
11%|β | 326/2975 [07:10<26:52, 1.64it/s]
11%|β | 327/2975 [07:11<26:50, 1.64it/s]
11%|β | 328/2975 [07:11<26:50, 1.64it/s]
11%|β | 329/2975 [07:12<26:52, 1.64it/s]
11%|β | 330/2975 [07:13<26:51, 1.64it/s]
11%|β | 331/2975 [07:13<26:49, 1.64it/s]
11%|β | 332/2975 [07:14<26:51, 1.64it/s]
11%|β | 333/2975 [07:14<26:48, 1.64it/s]
11%|β | 334/2975 [07:15<26:47, 1.64it/s]
11%|ββ | 335/2975 [07:16<26:46, 1.64it/s]
11%|ββ | 336/2975 [07:16<26:44, 1.64it/s]
11%|ββ | 337/2975 [07:17<26:46, 1.64it/s]
11%|ββ | 338/2975 [07:18<26:45, 1.64it/s]
11%|ββ | 339/2975 [07:18<26:42, 1.64it/s]
11%|ββ | 340/2975 [07:19<26:39, 1.65it/s]
| 0: {'loss': 0.7913, 'grad_norm': 0.7979909975428289, 'learning_rate': 1e-05, 'epoch': 0.12} |
11%|ββ | 341/2975 [07:19<26:43, 1.64it/s]
11%|ββ | 342/2975 [07:20<26:41, 1.64it/s]
12%|ββ | 343/2975 [07:21<26:41, 1.64it/s]
12%|ββ | 344/2975 [07:21<26:38, 1.65it/s]
12%|ββ | 345/2975 [07:22<26:35, 1.65it/s]
12%|ββ | 346/2975 [07:22<26:31, 1.65it/s]
12%|ββ | 347/2975 [07:23<26:29, 1.65it/s]
12%|ββ | 348/2975 [07:24<26:27, 1.65it/s]
12%|ββ | 349/2975 [07:24<26:26, 1.66it/s]
12%|ββ | 350/2975 [07:25<26:25, 1.66it/s]
12%|ββ | 351/2975 [07:25<26:27, 1.65it/s]
12%|ββ | 352/2975 [07:26<26:26, 1.65it/s]
12%|ββ | 353/2975 [07:27<26:28, 1.65it/s]
12%|ββ | 354/2975 [07:27<26:28, 1.65it/s]
12%|ββ | 355/2975 [07:28<26:28, 1.65it/s]
12%|ββ | 356/2975 [07:28<26:28, 1.65it/s]
12%|ββ | 357/2975 [07:29<26:30, 1.65it/s]
| 0: {'loss': 0.7749, 'grad_norm': 0.752827606829707, 'learning_rate': 1e-05, 'epoch': 0.12} |
| 0: {'loss': 0.7592, 'grad_norm': 0.7632063835474084, 'learning_rate': 1e-05, 'epoch': 0.12} |
12%|ββ | 358/2975 [07:30<26:31, 1.64it/s]
12%|ββ | 359/2975 [07:30<26:33, 1.64it/s]
12%|ββ | 360/2975 [07:31<29:08, 1.50it/s]
12%|ββ | 361/2975 [07:32<28:21, 1.54it/s]
12%|ββ | 362/2975 [07:32<27:47, 1.57it/s]
12%|ββ | 363/2975 [07:33<27:24, 1.59it/s]
12%|ββ | 364/2975 [07:33<27:09, 1.60it/s]
12%|ββ | 365/2975 [07:34<27:00, 1.61it/s]
12%|ββ | 366/2975 [07:35<26:52, 1.62it/s]
12%|ββ | 367/2975 [07:35<26:45, 1.62it/s]
12%|ββ | 368/2975 [07:36<26:41, 1.63it/s]
12%|ββ | 369/2975 [07:37<26:36, 1.63it/s]
12%|ββ | 370/2975 [07:37<26:32, 1.64it/s]
12%|ββ | 371/2975 [07:38<26:27, 1.64it/s]
| 0: {'loss': 0.7592, 'grad_norm': 0.741049028699946, 'learning_rate': 1e-05, 'epoch': 0.13} |
13%|ββ | 372/2975 [07:38<26:24, 1.64it/s]
13%|ββ | 373/2975 [07:39<26:20, 1.65it/s]
13%|ββ | 374/2975 [07:40<26:19, 1.65it/s]
13%|ββ | 375/2975 [07:40<26:18, 1.65it/s]
13%|ββ | 376/2975 [07:41<26:18, 1.65it/s]
13%|ββ | 377/2975 [07:41<26:15, 1.65it/s]
13%|ββ | 378/2975 [07:42<26:12, 1.65it/s]
13%|ββ | 379/2975 [07:43<26:08, 1.66it/s]
13%|ββ | 380/2975 [07:43<26:07, 1.66it/s]
13%|ββ | 381/2975 [07:44<26:09, 1.65it/s]
13%|ββ | 382/2975 [07:44<26:09, 1.65it/s]
13%|ββ | 383/2975 [07:45<26:07, 1.65it/s]
13%|ββ | 384/2975 [07:46<26:05, 1.66it/s]
13%|ββ | 385/2975 [07:46<26:05, 1.65it/s]
13%|ββ | 386/2975 [07:47<26:07, 1.65it/s]
13%|ββ | 387/2975 [07:47<26:07, 1.65it/s]
13%|ββ | 388/2975 [07:48<26:10, 1.65it/s]
| 0: {'loss': 0.786, 'grad_norm': 0.808057993234383, 'learning_rate': 1e-05, 'epoch': 0.13} |
| 0: {'loss': 0.7761, 'grad_norm': 0.8398541202344658, 'learning_rate': 1e-05, 'epoch': 0.13} |
13%|ββ | 389/2975 [07:49<26:11, 1.65it/s]
13%|ββ | 390/2975 [07:49<26:13, 1.64it/s]
13%|ββ | 391/2975 [07:50<26:13, 1.64it/s]
13%|ββ | 392/2975 [07:50<26:13, 1.64it/s]
13%|ββ | 393/2975 [07:51<26:13, 1.64it/s]
13%|ββ | 394/2975 [07:52<26:12, 1.64it/s]
13%|ββ | 395/2975 [07:52<26:08, 1.65it/s]
13%|ββ | 396/2975 [07:53<26:03, 1.65it/s]
13%|ββ | 397/2975 [07:54<26:00, 1.65it/s]
13%|ββ | 398/2975 [07:54<26:00, 1.65it/s]
13%|ββ | 399/2975 [07:55<25:59, 1.65it/s]
13%|ββ | 400/2975 [07:55<25:59, 1.65it/s]
13%|ββ | 401/2975 [07:56<26:00, 1.65it/s]
14%|ββ | 402/2975 [07:57<25:57, 1.65it/s]
14%|ββ | 403/2975 [07:57<25:55, 1.65it/s]
| 0: {'loss': 0.7596, 'grad_norm': 0.8017339744514405, 'learning_rate': 1e-05, 'epoch': 0.14} |
14%|ββ | 404/2975 [07:58<25:52, 1.66it/s]
14%|ββ | 405/2975 [07:58<25:50, 1.66it/s]
14%|ββ | 406/2975 [07:59<25:48, 1.66it/s]
14%|ββ | 407/2975 [08:00<25:48, 1.66it/s]
14%|ββ | 408/2975 [08:00<25:46, 1.66it/s]
14%|ββ | 409/2975 [08:01<25:47, 1.66it/s]
14%|ββ | 410/2975 [08:01<25:47, 1.66it/s]
14%|ββ | 411/2975 [08:02<25:47, 1.66it/s]
14%|ββ | 412/2975 [08:03<25:46, 1.66it/s]
14%|ββ | 413/2975 [08:03<25:46, 1.66it/s]
14%|ββ | 414/2975 [08:04<25:48, 1.65it/s]
14%|ββ | 415/2975 [08:04<25:48, 1.65it/s]
14%|ββ | 416/2975 [08:05<25:49, 1.65it/s]
14%|ββ | 417/2975 [08:06<25:48, 1.65it/s]
14%|ββ | 418/2975 [08:06<25:48, 1.65it/s]
14%|ββ | 419/2975 [08:07<25:50, 1.65it/s]
| 0: {'loss': 0.765, 'grad_norm': 0.7680169702718432, 'learning_rate': 1e-05, 'epoch': 0.14} |
| 0: {'loss': 0.7588, 'grad_norm': 0.7973598542191226, 'learning_rate': 1e-05, 'epoch': 0.14} |
14%|ββ | 420/2975 [08:07<25:53, 1.64it/s]
14%|ββ | 421/2975 [08:08<25:58, 1.64it/s]
14%|ββ | 422/2975 [08:09<25:56, 1.64it/s]
14%|ββ | 423/2975 [08:09<25:55, 1.64it/s]
14%|ββ | 424/2975 [08:10<25:53, 1.64it/s]
14%|ββ | 425/2975 [08:10<25:51, 1.64it/s]
14%|ββ | 426/2975 [08:11<25:48, 1.65it/s]
14%|ββ | 427/2975 [08:12<25:55, 1.64it/s]
14%|ββ | 428/2975 [08:12<25:49, 1.64it/s]
14%|ββ | 429/2975 [08:13<25:45, 1.65it/s]
14%|ββ | 430/2975 [08:14<25:42, 1.65it/s]
14%|ββ | 431/2975 [08:14<25:42, 1.65it/s]
15%|ββ | 432/2975 [08:15<25:41, 1.65it/s]
15%|ββ | 433/2975 [08:15<25:38, 1.65it/s]
15%|ββ | 434/2975 [08:16<25:38, 1.65it/s]
| 0: {'loss': 0.757, 'grad_norm': 0.7879157045491778, 'learning_rate': 1e-05, 'epoch': 0.15} |
| 0: {'loss': 0.7522, 'grad_norm': 0.7679264682899368, 'learning_rate': 1e-05, 'epoch': 0.15} |
15%|ββ | 435/2975 [08:17<25:38, 1.65it/s]
15%|ββ | 436/2975 [08:17<25:36, 1.65it/s]
15%|ββ | 437/2975 [08:18<25:35, 1.65it/s]
15%|ββ | 438/2975 [08:18<25:51, 1.64it/s]
15%|ββ | 439/2975 [08:19<25:44, 1.64it/s]
15%|ββ | 440/2975 [08:20<25:40, 1.65it/s]
15%|ββ | 441/2975 [08:20<25:39, 1.65it/s]
15%|ββ | 442/2975 [08:21<25:37, 1.65it/s]
15%|ββ | 443/2975 [08:21<25:38, 1.65it/s]
15%|ββ | 444/2975 [08:22<25:37, 1.65it/s]
15%|ββ | 445/2975 [08:23<25:36, 1.65it/s]
15%|ββ | 446/2975 [08:23<25:35, 1.65it/s]
15%|ββ | 447/2975 [08:24<25:35, 1.65it/s]
15%|ββ | 448/2975 [08:24<25:35, 1.65it/s]
15%|ββ | 449/2975 [08:25<25:32, 1.65it/s]
15%|ββ | 450/2975 [08:26<25:30, 1.65it/s]
| 0: {'loss': 0.7731, 'grad_norm': 0.7671720022789881, 'learning_rate': 1e-05, 'epoch': 0.15} |
15%|ββ | 451/2975 [08:26<25:29, 1.65it/s]
15%|ββ | 452/2975 [08:27<25:31, 1.65it/s]
15%|ββ | 453/2975 [08:27<25:29, 1.65it/s]
15%|ββ | 454/2975 [08:28<25:26, 1.65it/s]
15%|ββ | 455/2975 [08:29<25:26, 1.65it/s]
15%|ββ | 456/2975 [08:29<25:28, 1.65it/s]
15%|ββ | 457/2975 [08:30<25:29, 1.65it/s]
15%|ββ | 458/2975 [08:31<25:28, 1.65it/s]
15%|ββ | 459/2975 [08:31<25:29, 1.65it/s]
15%|ββ | 460/2975 [08:32<25:30, 1.64it/s]
15%|ββ | 461/2975 [08:32<25:30, 1.64it/s]
16%|ββ | 462/2975 [08:33<25:26, 1.65it/s]
16%|ββ | 463/2975 [08:34<25:23, 1.65it/s]
16%|ββ | 464/2975 [08:34<25:21, 1.65it/s]
16%|ββ | 465/2975 [08:35<25:18, 1.65it/s]
| 0: {'loss': 0.7697, 'grad_norm': 0.7672119462561652, 'learning_rate': 1e-05, 'epoch': 0.16} |
| 0: {'loss': 0.7506, 'grad_norm': 0.7215765122797166, 'learning_rate': 1e-05, 'epoch': 0.16} |
16%|ββ | 466/2975 [08:35<25:15, 1.66it/s]
16%|ββ | 467/2975 [08:36<25:14, 1.66it/s]
16%|ββ | 468/2975 [08:37<25:24, 1.64it/s]
16%|ββ | 469/2975 [08:37<25:20, 1.65it/s]
16%|ββ | 470/2975 [08:38<25:17, 1.65it/s]
16%|ββ | 471/2975 [08:38<25:16, 1.65it/s]
16%|ββ | 472/2975 [08:39<25:15, 1.65it/s]
16%|ββ | 473/2975 [08:40<25:15, 1.65it/s]
16%|ββ | 474/2975 [08:40<25:16, 1.65it/s]
16%|ββ | 475/2975 [08:41<25:16, 1.65it/s]
16%|ββ | 476/2975 [08:41<25:17, 1.65it/s]
16%|ββ | 477/2975 [08:42<25:16, 1.65it/s]
16%|ββ | 478/2975 [08:43<25:13, 1.65it/s]
16%|ββ | 479/2975 [08:43<25:16, 1.65it/s]
16%|ββ | 480/2975 [08:44<25:18, 1.64it/s]
| 0: {'loss': 0.773, 'grad_norm': 0.775884303997141, 'learning_rate': 1e-05, 'epoch': 0.16} |
16%|ββ | 481/2975 [08:44<25:18, 1.64it/s]
16%|ββ | 482/2975 [08:45<25:18, 1.64it/s]
16%|ββ | 483/2975 [08:46<25:16, 1.64it/s]
16%|ββ | 484/2975 [08:46<25:12, 1.65it/s]
16%|ββ | 485/2975 [08:47<25:08, 1.65it/s]
16%|ββ | 486/2975 [08:48<25:05, 1.65it/s]
16%|ββ | 487/2975 [08:48<25:04, 1.65it/s]
16%|ββ | 488/2975 [08:49<25:02, 1.66it/s]
16%|ββ | 489/2975 [08:49<25:02, 1.65it/s]
16%|ββ | 490/2975 [08:50<25:01, 1.65it/s]
17%|ββ | 491/2975 [08:51<25:00, 1.66it/s]
17%|ββ | 492/2975 [08:51<24:59, 1.66it/s]
17%|ββ | 493/2975 [08:52<24:59, 1.66it/s]
17%|ββ | 494/2975 [08:52<24:59, 1.65it/s]
17%|ββ | 495/2975 [08:53<24:58, 1.65it/s]
17%|ββ | 496/2975 [08:54<24:59, 1.65it/s]
17%|ββ | 497/2975 [08:54<24:58, 1.65it/s]
| 0: {'loss': 0.7643, 'grad_norm': 0.754784845884012, 'learning_rate': 1e-05, 'epoch': 0.17} |
| 0: {'loss': 0.7816, 'grad_norm': 0.7816116256031539, 'learning_rate': 1e-05, 'epoch': 0.17} |
17%|ββ | 498/2975 [08:55<24:56, 1.66it/s]
17%|ββ | 499/2975 [08:55<25:28, 1.62it/s]
17%|ββ | 500/2975 [08:56<25:23, 1.62it/s]
17%|ββ | 501/2975 [08:57<25:44, 1.60it/s]
17%|ββ | 502/2975 [08:57<25:29, 1.62it/s]
17%|ββ | 503/2975 [08:58<25:18, 1.63it/s]
17%|ββ | 504/2975 [08:58<25:09, 1.64it/s]
17%|ββ | 505/2975 [08:59<25:28, 1.62it/s]
17%|ββ | 506/2975 [09:00<25:23, 1.62it/s]
17%|ββ | 507/2975 [09:00<25:18, 1.63it/s]
17%|ββ | 508/2975 [09:01<25:12, 1.63it/s]
17%|ββ | 509/2975 [09:02<25:06, 1.64it/s]
17%|ββ | 510/2975 [09:02<25:02, 1.64it/s]
17%|ββ | 511/2975 [09:03<25:00, 1.64it/s]
17%|ββ | 512/2975 [09:03<24:59, 1.64it/s]
| 0: {'loss': 0.7499, 'grad_norm': 0.7626195725481466, 'learning_rate': 1e-05, 'epoch': 0.17} |
17%|ββ | 513/2975 [09:04<24:55, 1.65it/s]
17%|ββ | 514/2975 [09:05<24:52, 1.65it/s]
17%|ββ | 515/2975 [09:05<24:50, 1.65it/s]
17%|ββ | 516/2975 [09:06<24:47, 1.65it/s]
17%|ββ | 517/2975 [09:06<24:45, 1.65it/s]
17%|ββ | 518/2975 [09:07<24:44, 1.66it/s]
17%|ββ | 519/2975 [09:08<24:42, 1.66it/s]
17%|ββ | 520/2975 [09:08<24:41, 1.66it/s]
18%|ββ | 521/2975 [09:09<24:41, 1.66it/s]
18%|ββ | 522/2975 [09:09<24:41, 1.66it/s]
18%|ββ | 523/2975 [09:10<24:41, 1.65it/s]
18%|ββ | 524/2975 [09:11<24:40, 1.66it/s]
18%|ββ | 525/2975 [09:11<24:38, 1.66it/s]
18%|ββ | 526/2975 [09:12<24:37, 1.66it/s]
18%|ββ | 527/2975 [09:12<24:36, 1.66it/s]
18%|ββ | 528/2975 [09:13<24:34, 1.66it/s]
| 0: {'loss': 0.7507, 'grad_norm': 0.7542918658420218, 'learning_rate': 1e-05, 'epoch': 0.18} |
| 0: {'loss': 0.767, 'grad_norm': 0.7877507030671485, 'learning_rate': 1e-05, 'epoch': 0.18} |
18%|ββ | 529/2975 [09:14<24:35, 1.66it/s]
18%|ββ | 530/2975 [09:14<24:33, 1.66it/s]
18%|ββ | 531/2975 [09:15<24:35, 1.66it/s]
18%|ββ | 532/2975 [09:15<24:33, 1.66it/s]
18%|ββ | 533/2975 [09:16<24:32, 1.66it/s]
18%|ββ | 534/2975 [09:17<25:02, 1.62it/s]
18%|ββ | 535/2975 [09:17<24:56, 1.63it/s]
18%|ββ | 536/2975 [09:18<24:52, 1.63it/s]
18%|ββ | 537/2975 [09:19<24:48, 1.64it/s]
18%|ββ | 538/2975 [09:19<24:45, 1.64it/s]
18%|ββ | 539/2975 [09:20<24:42, 1.64it/s]
18%|ββ | 540/2975 [09:20<24:42, 1.64it/s]
18%|ββ | 541/2975 [09:21<24:42, 1.64it/s]
18%|ββ | 542/2975 [09:22<24:38, 1.65it/s]
18%|ββ | 543/2975 [09:22<24:35, 1.65it/s]
| 0: {'loss': 0.7464, 'grad_norm': 0.7620889079841904, 'learning_rate': 1e-05, 'epoch': 0.18} |
18%|ββ | 544/2975 [09:23<24:34, 1.65it/s]
18%|ββ | 545/2975 [09:23<24:34, 1.65it/s]
18%|ββ | 546/2975 [09:24<24:33, 1.65it/s]
18%|ββ | 547/2975 [09:25<24:31, 1.65it/s]
18%|ββ | 548/2975 [09:25<24:28, 1.65it/s]
18%|ββ | 549/2975 [09:26<24:51, 1.63it/s]
18%|ββ | 550/2975 [09:26<24:47, 1.63it/s]
19%|ββ | 551/2975 [09:27<24:42, 1.63it/s]
19%|ββ | 552/2975 [09:28<24:39, 1.64it/s]
19%|ββ | 553/2975 [09:28<24:52, 1.62it/s]
19%|ββ | 554/2975 [09:29<25:12, 1.60it/s]
19%|ββ | 555/2975 [09:30<24:57, 1.62it/s]
19%|ββ | 556/2975 [09:30<24:50, 1.62it/s]
19%|ββ | 557/2975 [09:31<24:45, 1.63it/s]
19%|ββ | 558/2975 [09:31<24:41, 1.63it/s]
19%|ββ | 559/2975 [09:32<24:36, 1.64it/s]
19%|ββ | 560/2975 [09:33<24:34, 1.64it/s]
| 0: {'loss': 0.7649, 'grad_norm': 0.8003034932416693, 'learning_rate': 1e-05, 'epoch': 0.19} |
| 0: {'loss': 0.7572, 'grad_norm': 0.7129834167859199, 'learning_rate': 1e-05, 'epoch': 0.19} |
19%|ββ | 561/2975 [09:33<24:37, 1.63it/s]
19%|ββ | 562/2975 [09:34<24:35, 1.63it/s]
19%|ββ | 563/2975 [09:34<24:30, 1.64it/s]
19%|ββ | 564/2975 [09:35<24:25, 1.64it/s]
19%|ββ | 565/2975 [09:36<24:23, 1.65it/s]
19%|ββ | 566/2975 [09:36<24:31, 1.64it/s]
19%|ββ | 567/2975 [09:37<24:27, 1.64it/s]
19%|ββ | 568/2975 [09:37<24:23, 1.64it/s]
19%|ββ | 569/2975 [09:38<24:20, 1.65it/s]
19%|ββ | 570/2975 [09:39<24:15, 1.65it/s]
19%|ββ | 571/2975 [09:39<24:15, 1.65it/s]
19%|ββ | 572/2975 [09:40<24:14, 1.65it/s]
19%|ββ | 573/2975 [09:40<24:12, 1.65it/s]
19%|ββ | 574/2975 [09:41<24:09, 1.66it/s]
| 0: {'loss': 0.7671, 'grad_norm': 0.7883813838017206, 'learning_rate': 1e-05, 'epoch': 0.19} |
| 0: {'loss': 0.7548, 'grad_norm': 0.8388848664403168, 'learning_rate': 1e-05, 'epoch': 0.2} |
19%|ββ | 575/2975 [09:42<24:07, 1.66it/s]
19%|ββ | 576/2975 [09:42<24:07, 1.66it/s]
19%|ββ | 577/2975 [09:43<24:08, 1.66it/s]
19%|ββ | 578/2975 [09:43<24:08, 1.65it/s]
19%|ββ | 579/2975 [09:44<24:11, 1.65it/s]
19%|ββ | 580/2975 [09:45<24:11, 1.65it/s]
20%|ββ | 581/2975 [09:45<24:11, 1.65it/s]
20%|ββ | 582/2975 [09:46<24:13, 1.65it/s]
20%|ββ | 583/2975 [09:47<24:13, 1.65it/s]
20%|ββ | 584/2975 [09:47<24:12, 1.65it/s]
20%|ββ | 585/2975 [09:48<24:10, 1.65it/s]
20%|ββ | 586/2975 [09:48<24:09, 1.65it/s]
20%|ββ | 587/2975 [09:49<24:09, 1.65it/s]
20%|ββ | 588/2975 [09:50<24:07, 1.65it/s]
20%|ββ | 589/2975 [09:50<24:20, 1.63it/s]
20%|ββ | 590/2975 [09:51<24:15, 1.64it/s]
| 0: {'loss': 0.755, 'grad_norm': 0.7507527819754194, 'learning_rate': 1e-05, 'epoch': 0.2} |
20%|ββ | 591/2975 [09:51<24:14, 1.64it/s]
20%|ββ | 592/2975 [09:52<24:09, 1.64it/s]
20%|ββ | 593/2975 [09:53<24:06, 1.65it/s]
20%|ββ | 594/2975 [09:53<24:04, 1.65it/s]
20%|ββ | 595/2975 [09:54<24:05, 1.65it/s]
20%|ββ | 596/2975 [09:56<47:29, 1.20s/it]
20%|ββ | 597/2975 [09:57<40:28, 1.02s/it]
20%|ββ | 598/2975 [09:58<35:33, 1.11it/s]
20%|ββ | 599/2975 [09:58<32:05, 1.23it/s]
20%|ββ | 600/2975 [09:59<29:40, 1.33it/s]
20%|ββ | 601/2975 [09:59<28:00, 1.41it/s]
20%|ββ | 602/2975 [10:00<26:49, 1.47it/s]
20%|ββ | 603/2975 [10:01<25:59, 1.52it/s]
20%|ββ | 604/2975 [10:01<25:23, 1.56it/s]
20%|ββ | 605/2975 [10:02<24:59, 1.58it/s]
20%|ββ | 606/2975 [10:02<24:39, 1.60it/s]
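Around step 596 the bar briefly reports 1.20 s/it and an ETA of 47:29, then climbs back to about 1.60 it/s over roughly ten steps. That shape fits a smoothed speed estimate: tqdm documents a `smoothing` argument (an exponential-moving-average factor, default 0.3) for its rate display. A simplified sketch with hypothetical timings (0.6 s per normal step, one 2.6 s stall); real tqdm also applies bias correction and smooths counts and times separately, which this omits:

```python
def smoothed_step_times(step_times, smoothing=0.3, initial=None):
    """Exponential moving average of per-step durations in seconds.

    `smoothing` is the weight on the newest sample: 0 ignores new samples,
    1 reports only the latest step. Simplified: no bias correction.
    """
    avg = step_times[0] if initial is None else initial
    history = []
    for dt in step_times:
        avg = smoothing * dt + (1 - smoothing) * avg
        history.append(avg)
    return history


# Hypothetical timings: five 0.6 s steps, one 2.6 s stall, then ten 0.6 s steps.
times = [0.6] * 5 + [2.6] + [0.6] * 10
for i, avg in enumerate(smoothed_step_times(times)):
    print(f"step {i:2d}: {avg:.3f} s/it ({1 / avg:.2f} it/s)")
```

With weight 0.3 on each new sample, the excess left by a single slow step shrinks by a factor of 0.7 per normal step, so after ten steps less than 3% of it remains.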
| 0: {'loss': 0.7547, 'grad_norm': 0.8164223892765191, 'learning_rate': 1e-05, 'epoch': 0.2} |
| 0: {'loss': 0.7495, 'grad_norm': 0.7622680122421158, 'learning_rate': 1e-05, 'epoch': 0.21} |
20%|ββ | 607/2975 [10:03<24:27, 1.61it/s]
20%|ββ | 608/2975 [10:04<24:18, 1.62it/s]
20%|ββ | 609/2975 [10:04<24:10, 1.63it/s]
21%|ββ | 610/2975 [10:05<24:05, 1.64it/s]
21%|ββ | 611/2975 [10:06<24:02, 1.64it/s]
21%|ββ | 612/2975 [10:06<23:59, 1.64it/s]
21%|ββ | 613/2975 [10:07<23:56, 1.64it/s]
21%|ββ | 614/2975 [10:07<23:54, 1.65it/s]
21%|ββ | 615/2975 [10:08<23:51, 1.65it/s]
21%|ββ | 616/2975 [10:09<23:48, 1.65it/s]
21%|ββ | 617/2975 [10:09<23:48, 1.65it/s]
21%|ββ | 618/2975 [10:10<23:45, 1.65it/s]
21%|ββ | 619/2975 [10:10<23:44, 1.65it/s]
21%|ββ | 620/2975 [10:11<23:45, 1.65it/s]
21%|ββ | 621/2975 [10:12<23:48, 1.65it/s]
| 0: {'loss': 0.7315, 'grad_norm': 0.7531375977537063, 'learning_rate': 1e-05, 'epoch': 0.21} |
21%|ββ | 622/2975 [10:12<23:50, 1.65it/s]
21%|ββ | 623/2975 [10:13<24:07, 1.62it/s]
21%|ββ | 624/2975 [10:13<24:03, 1.63it/s]
21%|ββ | 625/2975 [10:14<24:14, 1.62it/s]
21%|ββ | 626/2975 [10:15<24:03, 1.63it/s]
21%|ββ | 627/2975 [10:15<23:55, 1.64it/s]
21%|ββ | 628/2975 [10:16<23:50, 1.64it/s]
21%|ββ | 629/2975 [10:16<23:46, 1.64it/s]
21%|ββ | 630/2975 [10:17<23:42, 1.65it/s]
21%|ββ | 631/2975 [10:18<23:43, 1.65it/s]
21%|ββ | 632/2975 [10:18<23:42, 1.65it/s]
21%|βββ | 633/2975 [10:19<23:40, 1.65it/s]
21%|βββ | 634/2975 [10:19<23:41, 1.65it/s]
21%|βββ | 635/2975 [10:20<23:40, 1.65it/s]
21%|βββ | 636/2975 [10:21<23:39, 1.65it/s]
21%|βββ | 637/2975 [10:21<23:39, 1.65it/s]
| 0: {'loss': 0.7475, 'grad_norm': 0.744626566048428, 'learning_rate': 1e-05, 'epoch': 0.22} |
| 0: {'loss': 0.7474, 'grad_norm': 0.842353042268259, 'learning_rate': 1e-05, 'epoch': 0.22} |
21%|βββ | 638/2975 [10:22<23:40, 1.65it/s]
21%|βββ | 639/2975 [10:23<23:43, 1.64it/s]
22%|βββ | 640/2975 [10:23<23:38, 1.65it/s]
22%|βββ | 641/2975 [10:24<23:39, 1.64it/s]
22%|βββ | 642/2975 [10:24<23:37, 1.65it/s]
22%|βββ | 643/2975 [10:25<23:50, 1.63it/s]
22%|βββ | 644/2975 [10:26<23:42, 1.64it/s]
22%|βββ | 645/2975 [10:26<23:38, 1.64it/s]
22%|βββ | 646/2975 [10:27<23:35, 1.65it/s]
22%|βββ | 647/2975 [10:27<23:32, 1.65it/s]
22%|βββ | 648/2975 [10:28<23:28, 1.65it/s]
22%|βββ | 649/2975 [10:29<23:27, 1.65it/s]
22%|βββ | 650/2975 [10:29<23:44, 1.63it/s]
22%|βββ | 651/2975 [10:30<23:37, 1.64it/s]
| 0: {'loss': 0.7385, 'grad_norm': 0.7784520565914156, 'learning_rate': 1e-05, 'epoch': 0.22} |
22%|βββ | 652/2975 [10:30<23:31, 1.65it/s]
22%|βββ | 653/2975 [10:31<23:29, 1.65it/s]
22%|βββ | 654/2975 [10:32<23:30, 1.65it/s]
22%|βββ | 655/2975 [10:32<23:31, 1.64it/s]
22%|βββ | 656/2975 [10:33<23:35, 1.64it/s]
22%|βββ | 657/2975 [10:34<23:36, 1.64it/s]
22%|βββ | 658/2975 [10:34<23:35, 1.64it/s]
22%|βββ | 659/2975 [10:35<23:32, 1.64it/s]
22%|βββ | 660/2975 [10:35<23:30, 1.64it/s]
22%|βββ | 661/2975 [10:36<23:27, 1.64it/s]
22%|βββ | 662/2975 [10:37<23:22, 1.65it/s]
22%|βββ | 663/2975 [10:37<23:19, 1.65it/s]
22%|βββ | 664/2975 [10:38<23:17, 1.65it/s]
22%|βββ | 665/2975 [10:38<23:16, 1.65it/s]
22%|βββ | 666/2975 [10:39<23:20, 1.65it/s]
22%|βββ | 667/2975 [10:40<23:20, 1.65it/s]
| 0: {'loss': 0.7611, 'grad_norm': 0.7492523467278513, 'learning_rate': 1e-05, 'epoch': 0.23} |
| 0: {'loss': 0.7608, 'grad_norm': 0.7700016384605249, 'learning_rate': 1e-05, 'epoch': 0.23} |
22%|βββ | 668/2975 [10:40<23:20, 1.65it/s]
22%|βββ | 669/2975 [10:41<23:19, 1.65it/s]
23%|βββ | 670/2975 [10:41<23:20, 1.65it/s]
23%|βββ | 671/2975 [10:42<23:21, 1.64it/s]
23%|βββ | 672/2975 [10:43<23:21, 1.64it/s]
23%|βββ | 673/2975 [10:43<23:23, 1.64it/s]
23%|βββ | 674/2975 [10:44<23:22, 1.64it/s]
23%|βββ | 675/2975 [10:44<23:21, 1.64it/s]
23%|βββ | 676/2975 [10:45<23:19, 1.64it/s]
23%|βββ | 677/2975 [10:46<23:23, 1.64it/s]
23%|βββ | 678/2975 [10:46<23:41, 1.62it/s]
23%|βββ | 679/2975 [10:47<23:35, 1.62it/s]
23%|βββ | 680/2975 [10:48<23:30, 1.63it/s]
23%|βββ | 681/2975 [10:48<23:28, 1.63it/s]
| 0: {'loss': 0.7482, 'grad_norm': 0.7657815124970616, 'learning_rate': 1e-05, 'epoch': 0.23} |
23%|βββ | 682/2975 [10:49<23:26, 1.63it/s]
23%|βββ | 683/2975 [10:49<23:23, 1.63it/s]
23%|βββ | 684/2975 [10:50<23:19, 1.64it/s]
23%|βββ | 685/2975 [10:51<23:17, 1.64it/s]
23%|βββ | 686/2975 [10:51<23:14, 1.64it/s]
23%|βββ | 687/2975 [10:52<23:13, 1.64it/s]
23%|βββ | 688/2975 [10:52<23:13, 1.64it/s]
23%|βββ | 689/2975 [10:53<23:14, 1.64it/s]
23%|βββ | 690/2975 [10:54<23:17, 1.64it/s]
23%|βββ | 691/2975 [10:54<23:17, 1.63it/s]
23%|βββ | 692/2975 [10:55<23:32, 1.62it/s]
23%|βββ | 693/2975 [10:55<23:24, 1.62it/s]
23%|βββ | 694/2975 [10:56<23:22, 1.63it/s]
23%|βββ | 695/2975 [10:57<23:15, 1.63it/s]
23%|βββ | 696/2975 [10:57<23:11, 1.64it/s]
23%|βββ | 697/2975 [10:58<23:10, 1.64it/s]
| 0: {'loss': 0.7855, 'grad_norm': 0.7712451026387313, 'learning_rate': 1e-05, 'epoch': 0.24} |
| 0: {'loss': 0.7508, 'grad_norm': 0.7790654777806169, 'learning_rate': 1e-05, 'epoch': 0.24} |
23%|βββ | 698/2975 [10:59<23:10, 1.64it/s]
23%|βββ | 699/2975 [10:59<23:12, 1.63it/s]
24%|βββ | 700/2975 [11:00<23:13, 1.63it/s]
24%|βββ | 701/2975 [11:00<23:13, 1.63it/s]
24%|βββ | 702/2975 [11:01<23:09, 1.64it/s]
24%|βββ | 703/2975 [11:02<23:05, 1.64it/s]
24%|βββ | 704/2975 [11:02<23:02, 1.64it/s]
24%|βββ | 705/2975 [11:03<23:01, 1.64it/s]
24%|βββ | 706/2975 [11:03<23:02, 1.64it/s]
24%|βββ | 707/2975 [11:04<23:01, 1.64it/s]
24%|βββ | 708/2975 [11:05<23:01, 1.64it/s]
24%|βββ | 709/2975 [11:05<23:00, 1.64it/s]
24%|βββ | 710/2975 [11:06<23:01, 1.64it/s]
24%|βββ | 711/2975 [11:06<23:04, 1.64it/s]
24%|βββ | 712/2975 [11:07<23:05, 1.63it/s]
| 0: {'loss': 0.742, 'grad_norm': 0.7958963482387478, 'learning_rate': 1e-05, 'epoch': 0.24} |
24%|βββ | 713/2975 [11:08<23:05, 1.63it/s]
24%|βββ | 714/2975 [11:08<23:02, 1.64it/s]
24%|βββ | 715/2975 [11:09<23:00, 1.64it/s]
24%|βββ | 716/2975 [11:10<22:58, 1.64it/s]
24%|βββ | 717/2975 [11:10<22:57, 1.64it/s]
24%|βββ | 718/2975 [11:11<22:54, 1.64it/s]
24%|βββ | 719/2975 [11:11<22:53, 1.64it/s]
24%|βββ | 720/2975 [11:12<22:49, 1.65it/s]
24%|βββ | 721/2975 [11:13<22:51, 1.64it/s]
24%|βββ | 722/2975 [11:13<22:50, 1.64it/s]
24%|βββ | 723/2975 [11:14<22:47, 1.65it/s]
24%|βββ | 724/2975 [11:14<22:46, 1.65it/s]
24%|βββ | 725/2975 [11:15<22:44, 1.65it/s]
24%|βββ | 726/2975 [11:16<22:48, 1.64it/s]
24%|βββ | 727/2975 [11:16<23:01, 1.63it/s]
24%|βββ | 728/2975 [11:17<22:56, 1.63it/s]
| 0: {'loss': 0.7683, 'grad_norm': 0.7584229493647255, 'learning_rate': 1e-05, 'epoch': 0.25} |
| 0: {'loss': 0.7302, 'grad_norm': 0.7917073972184041, 'learning_rate': 1e-05, 'epoch': 0.25} |
25%|βββ | 729/2975 [11:17<22:51, 1.64it/s]
25%|βββ | 730/2975 [11:18<22:48, 1.64it/s]
25%|βββ | 731/2975 [11:19<22:45, 1.64it/s]
25%|βββ | 732/2975 [11:19<22:43, 1.65it/s]
25%|βββ | 733/2975 [11:20<22:40, 1.65it/s]
25%|βββ | 734/2975 [11:20<22:38, 1.65it/s]
25%|βββ | 735/2975 [11:21<22:38, 1.65it/s]
25%|βββ | 736/2975 [11:22<22:36, 1.65it/s]
25%|βββ | 737/2975 [11:22<22:37, 1.65it/s]
25%|βββ | 738/2975 [11:23<22:36, 1.65it/s]
25%|βββ | 739/2975 [11:23<22:35, 1.65it/s]
25%|βββ | 740/2975 [11:24<22:35, 1.65it/s]
25%|βββ | 741/2975 [11:25<22:38, 1.64it/s]
25%|βββ | 742/2975 [11:25<22:43, 1.64it/s]
| 0: {'loss': 0.7357, 'grad_norm': 0.7683530057363408, 'learning_rate': 1e-05, 'epoch': 0.25} |
25%|βββ | 743/2975 [11:26<22:43, 1.64it/s]
25%|βββ | 744/2975 [11:27<22:43, 1.64it/s]
25%|βββ | 745/2975 [11:27<22:43, 1.64it/s]
25%|βββ | 746/2975 [11:28<22:40, 1.64it/s]
25%|βββ | 747/2975 [11:28<22:38, 1.64it/s]
25%|βββ | 748/2975 [11:29<22:35, 1.64it/s]
25%|βββ | 749/2975 [11:30<22:35, 1.64it/s]
25%|βββ | 750/2975 [11:30<22:34, 1.64it/s]
25%|βββ | 751/2975 [11:31<22:33, 1.64it/s]
25%|βββ | 752/2975 [11:31<22:29, 1.65it/s]
25%|βββ | 753/2975 [11:32<22:26, 1.65it/s]
25%|βββ | 754/2975 [11:33<23:07, 1.60it/s]
25%|βββ | 755/2975 [11:33<22:58, 1.61it/s]
25%|βββ | 756/2975 [11:34<22:50, 1.62it/s]
25%|βββ | 757/2975 [11:35<22:43, 1.63it/s]
25%|βββ | 758/2975 [11:35<22:38, 1.63it/s]
| 0: {'loss': 0.7429, 'grad_norm': 0.732943318594699, 'learning_rate': 1e-05, 'epoch': 0.26} |
| 0: {'loss': 0.745, 'grad_norm': 0.7598610714362597, 'learning_rate': 1e-05, 'epoch': 0.26} |
26%|βββ | 759/2975 [11:36<22:34, 1.64it/s]
26%|βββ | 760/2975 [11:36<22:31, 1.64it/s]
26%|βββ | 761/2975 [11:37<22:30, 1.64it/s]
26%|βββ | 762/2975 [11:38<22:28, 1.64it/s]
26%|βββ | 763/2975 [11:38<22:28, 1.64it/s]
26%|βββ | 764/2975 [11:39<22:29, 1.64it/s]
26%|βββ | 765/2975 [11:39<22:31, 1.64it/s]
26%|βββ | 766/2975 [11:40<22:32, 1.63it/s]
26%|βββ | 767/2975 [11:41<22:31, 1.63it/s]
26%|βββ | 768/2975 [11:41<22:29, 1.64it/s]
26%|βββ | 769/2975 [11:42<22:27, 1.64it/s]
26%|βββ | 770/2975 [11:42<22:24, 1.64it/s]
26%|βββ | 771/2975 [11:43<22:23, 1.64it/s]
26%|βββ | 772/2975 [11:44<22:21, 1.64it/s]
| 0: {'loss': 0.7358, 'grad_norm': 0.7804717351373442, 'learning_rate': 1e-05, 'epoch': 0.26} |
26%|βββ | 773/2975 [11:44<22:17, 1.65it/s]
26%|βββ | 774/2975 [11:45<22:13, 1.65it/s]
26%|βββ | 775/2975 [11:45<22:10, 1.65it/s]
26%|βββ | 776/2975 [11:46<22:10, 1.65it/s]
26%|βββ | 777/2975 [11:47<22:10, 1.65it/s]
26%|βββ | 778/2975 [11:47<22:09, 1.65it/s]
26%|βββ | 779/2975 [11:48<22:09, 1.65it/s]
26%|βββ | 780/2975 [11:49<22:09, 1.65it/s]
26%|βββ | 781/2975 [11:49<22:12, 1.65it/s]
26%|βββ | 782/2975 [11:50<23:18, 1.57it/s]
26%|βββ | 783/2975 [11:50<23:02, 1.59it/s]
26%|βββ | 784/2975 [11:51<22:48, 1.60it/s]
26%|βββ | 785/2975 [11:52<22:39, 1.61it/s]
26%|βββ | 786/2975 [11:52<22:31, 1.62it/s]
26%|βββ | 787/2975 [11:53<22:25, 1.63it/s]
26%|βββ | 788/2975 [11:53<22:22, 1.63it/s]
| 0: {'loss': 0.7658, 'grad_norm': 0.749539480965823, 'learning_rate': 1e-05, 'epoch': 0.27} |
| 0: {'loss': 0.7477, 'grad_norm': 0.7999449395678967, 'learning_rate': 1e-05, 'epoch': 0.27} |
27%|βββ | 789/2975 [11:54<22:17, 1.63it/s]
27%|βββ | 790/2975 [11:55<22:14, 1.64it/s]
27%|βββ | 791/2975 [11:55<22:11, 1.64it/s]
27%|βββ | 792/2975 [11:56<22:07, 1.64it/s]
27%|βββ | 793/2975 [11:57<22:02, 1.65it/s]
27%|βββ | 794/2975 [11:57<21:59, 1.65it/s]
27%|βββ | 795/2975 [11:58<21:56, 1.66it/s]
27%|βββ | 796/2975 [11:58<21:55, 1.66it/s]
27%|βββ | 797/2975 [11:59<21:56, 1.65it/s]
27%|βββ | 798/2975 [12:00<21:58, 1.65it/s]
27%|βββ | 799/2975 [12:00<22:01, 1.65it/s]
27%|βββ | 800/2975 [12:01<22:04, 1.64it/s]
27%|βββ | 801/2975 [12:01<22:09, 1.64it/s]
27%|βββ | 802/2975 [12:02<22:14, 1.63it/s]
| 0: {'loss': 0.7294, 'grad_norm': 0.7329522590368147, 'learning_rate': 1e-05, 'epoch': 0.27} |
27%|βββ | 803/2975 [12:03<22:12, 1.63it/s]
27%|βββ | 804/2975 [12:03<22:10, 1.63it/s]
27%|βββ | 805/2975 [12:04<22:07, 1.63it/s]
27%|βββ | 806/2975 [12:04<22:04, 1.64it/s]
27%|βββ | 807/2975 [12:05<22:01, 1.64it/s]
27%|βββ | 808/2975 [12:06<22:00, 1.64it/s]
27%|βββ | 809/2975 [12:06<21:58, 1.64it/s]
27%|βββ | 810/2975 [12:07<21:57, 1.64it/s]
27%|βββ | 811/2975 [12:07<21:57, 1.64it/s]
27%|βββ | 812/2975 [12:08<21:58, 1.64it/s]
27%|βββ | 813/2975 [12:09<22:00, 1.64it/s]
27%|βββ | 814/2975 [12:09<22:00, 1.64it/s]
27%|βββ | 815/2975 [12:10<22:01, 1.63it/s]
27%|βββ | 816/2975 [12:11<21:59, 1.64it/s]
27%|βββ | 817/2975 [12:11<21:56, 1.64it/s]
27%|βββ | 818/2975 [12:12<21:54, 1.64it/s]
| 0: {'loss': 0.7372, 'grad_norm': 0.7607255267529802, 'learning_rate': 1e-05, 'epoch': 0.28} |
| 0: {'loss': 0.7557, 'grad_norm': 0.7377892901126538, 'learning_rate': 1e-05, 'epoch': 0.28} |
28%|βββ | 819/2975 [12:12<21:53, 1.64it/s]
28%|βββ | 820/2975 [12:13<21:49, 1.65it/s]
28%|βββ | 821/2975 [12:14<21:49, 1.64it/s]
28%|βββ | 822/2975 [12:14<21:49, 1.64it/s]
28%|βββ | 823/2975 [12:15<21:48, 1.64it/s]
28%|βββ | 824/2975 [12:15<21:49, 1.64it/s]
28%|βββ | 825/2975 [12:16<21:50, 1.64it/s]
28%|βββ | 826/2975 [12:17<21:52, 1.64it/s]
28%|βββ | 827/2975 [12:17<21:51, 1.64it/s]
28%|βββ | 828/2975 [12:18<21:51, 1.64it/s]
28%|βββ | 829/2975 [12:18<21:50, 1.64it/s]
28%|βββ | 830/2975 [12:19<21:48, 1.64it/s]
28%|βββ | 831/2975 [12:20<21:47, 1.64it/s]
28%|βββ | 832/2975 [12:20<21:48, 1.64it/s]
| 0: {'loss': 0.7693, 'grad_norm': 0.7531349345953301, 'learning_rate': 1e-05, 'epoch': 0.28} |
28%|βββ | 833/2975 [12:21<21:48, 1.64it/s]
28%|βββ | 834/2975 [12:22<21:45, 1.64it/s]
28%|βββ | 835/2975 [12:22<21:42, 1.64it/s]
28%|βββ | 836/2975 [12:23<21:40, 1.64it/s]
28%|βββ | 837/2975 [12:23<21:41, 1.64it/s]
28%|βββ | 838/2975 [12:24<21:40, 1.64it/s]
28%|βββ | 839/2975 [12:25<21:37, 1.65it/s]
28%|βββ | 840/2975 [12:25<21:35, 1.65it/s]
28%|βββ | 841/2975 [12:26<21:38, 1.64it/s]
28%|βββ | 842/2975 [12:26<21:38, 1.64it/s]
28%|βββ | 843/2975 [12:27<21:38, 1.64it/s]
28%|βββ | 844/2975 [12:28<21:36, 1.64it/s]
28%|βββ | 845/2975 [12:28<21:36, 1.64it/s]
28%|βββ | 846/2975 [12:29<21:35, 1.64it/s]
28%|βββ | 847/2975 [12:29<21:34, 1.64it/s]
29%|βββ | 848/2975 [12:30<21:33, 1.64it/s]
| 0: {'loss': 0.7344, 'grad_norm': 0.7340430827625565, 'learning_rate': 1e-05, 'epoch': 0.29} |
| 0: {'loss': 0.7447, 'grad_norm': 0.7838462726074723, 'learning_rate': 1e-05, 'epoch': 0.29} |
29%|βββ | 849/2975 [12:31<21:31, 1.65it/s]
29%|βββ | 850/2975 [12:31<21:30, 1.65it/s]
29%|βββ | 851/2975 [12:32<21:30, 1.65it/s]
29%|βββ | 852/2975 [12:33<23:24, 1.51it/s]
29%|βββ | 853/2975 [12:33<23:23, 1.51it/s]
29%|βββ | 854/2975 [12:34<22:51, 1.55it/s]
29%|βββ | 855/2975 [12:35<22:26, 1.57it/s]
29%|βββ | 856/2975 [12:35<22:09, 1.59it/s]
29%|βββ | 857/2975 [12:36<21:58, 1.61it/s]
29%|βββ | 858/2975 [12:36<21:49, 1.62it/s]
29%|βββ | 859/2975 [12:37<21:41, 1.63it/s]
29%|βββ | 860/2975 [12:38<21:38, 1.63it/s]
29%|βββ | 861/2975 [12:38<21:35, 1.63it/s]
29%|βββ | 862/2975 [12:39<21:33, 1.63it/s]
| 0: {'loss': 0.739, 'grad_norm': 0.8091788262795472, 'learning_rate': 1e-05, 'epoch': 0.29} |
29%|βββ | 863/2975 [12:39<22:02, 1.60it/s]
29%|βββ | 864/2975 [12:40<21:55, 1.61it/s]
29%|βββ | 865/2975 [12:41<21:45, 1.62it/s]
29%|βββ | 866/2975 [12:41<21:39, 1.62it/s]
29%|βββ | 867/2975 [12:42<21:57, 1.60it/s]
29%|βββ | 868/2975 [12:43<21:48, 1.61it/s]
29%|βββ | 869/2975 [12:43<22:06, 1.59it/s]
29%|βββ | 870/2975 [12:44<21:58, 1.60it/s]
29%|βββ | 871/2975 [12:44<21:49, 1.61it/s]
29%|βββ | 872/2975 [12:45<21:38, 1.62it/s]
29%|βββ | 873/2975 [12:46<21:30, 1.63it/s]
29%|βββ | 874/2975 [12:46<21:26, 1.63it/s]
29%|βββ | 875/2975 [12:47<21:26, 1.63it/s]
29%|βββ | 876/2975 [12:47<21:24, 1.63it/s]
29%|βββ | 877/2975 [12:48<21:21, 1.64it/s]
30%|βββ | 878/2975 [12:49<21:19, 1.64it/s]
30%|βββ | 879/2975 [12:49<21:17, 1.64it/s]
| 0: {'loss': 0.7427, 'grad_norm': 0.7583526965895802, 'learning_rate': 1e-05, 'epoch': 0.3} |
| 0: {'loss': 0.7427, 'grad_norm': 0.7926390640915221, 'learning_rate': 1e-05, 'epoch': 0.3} |
30%|βββ | 880/2975 [12:50<21:17, 1.64it/s]
30%|βββ | 881/2975 [12:51<21:16, 1.64it/s]
30%|βββ | 882/2975 [12:51<21:13, 1.64it/s]
30%|βββ | 883/2975 [12:52<21:16, 1.64it/s]
30%|βββ | 884/2975 [12:52<22:18, 1.56it/s]
30%|βββ | 885/2975 [12:53<21:56, 1.59it/s]
30%|βββ | 886/2975 [12:54<21:46, 1.60it/s]
30%|βββ | 887/2975 [12:54<21:37, 1.61it/s]
30%|βββ | 888/2975 [12:55<21:30, 1.62it/s]
30%|βββ | 889/2975 [12:55<21:24, 1.62it/s]
30%|βββ | 890/2975 [12:56<21:19, 1.63it/s]
30%|βββ | 891/2975 [12:57<21:15, 1.63it/s]
30%|βββ | 892/2975 [12:57<21:13, 1.64it/s]
30%|βββ | 893/2975 [12:58<21:11, 1.64it/s]
| 0: {'loss': 0.762, 'grad_norm': 0.7419955033128212, 'learning_rate': 1e-05, 'epoch': 0.3} |
30%|βββ | 894/2975 [12:59<21:10, 1.64it/s]
30%|βββ | 895/2975 [12:59<21:09, 1.64it/s]
30%|βββ | 896/2975 [13:00<21:07, 1.64it/s]
30%|βββ | 897/2975 [13:00<21:07, 1.64it/s]
30%|βββ | 898/2975 [13:01<21:07, 1.64it/s]
30%|βββ | 899/2975 [13:02<21:05, 1.64it/s]
30%|βββ | 900/2975 [13:02<21:41, 1.59it/s]
30%|βββ | 901/2975 [13:03<21:29, 1.61it/s]
30%|βββ | 902/2975 [13:03<21:20, 1.62it/s]
30%|βββ | 903/2975 [13:04<21:14, 1.63it/s]
30%|βββ | 904/2975 [13:05<21:10, 1.63it/s]
30%|βββ | 905/2975 [13:05<21:05, 1.64it/s]
30%|βββ | 906/2975 [13:06<21:05, 1.64it/s]
30%|βββ | 907/2975 [13:07<21:02, 1.64it/s]
31%|βββ | 908/2975 [13:07<21:01, 1.64it/s]
31%|βββ | 909/2975 [13:08<20:59, 1.64it/s]
| 0: {'loss': 0.7173, 'grad_norm': 0.7312540923740953, 'learning_rate': 1e-05, 'epoch': 0.31} |
| 0: {'loss': 0.7546, 'grad_norm': 0.8241647849928874, 'learning_rate': 1e-05, 'epoch': 0.31} |
31%|βββ | 910/2975 [13:08<20:58, 1.64it/s]
31%|βββ | 911/2975 [13:09<20:57, 1.64it/s]
31%|βββ | 912/2975 [13:10<20:55, 1.64it/s]
31%|βββ | 913/2975 [13:10<20:55, 1.64it/s]
31%|βββ | 914/2975 [13:11<20:55, 1.64it/s]
31%|βββ | 915/2975 [13:11<20:54, 1.64it/s]
31%|βββ | 916/2975 [13:12<20:51, 1.64it/s]
31%|βββ | 917/2975 [13:13<20:48, 1.65it/s]
31%|βββ | 918/2975 [13:13<20:48, 1.65it/s]
31%|βββ | 919/2975 [13:14<20:48, 1.65it/s]
31%|βββ | 920/2975 [13:14<20:47, 1.65it/s]
31%|βββ | 921/2975 [13:15<20:45, 1.65it/s]
31%|βββ | 922/2975 [13:16<20:43, 1.65it/s]
31%|βββ | 923/2975 [13:16<20:42, 1.65it/s]
| 0: {'loss': 0.7578, 'grad_norm': 0.742815794399239, 'learning_rate': 1e-05, 'epoch': 0.31} |
31%|βββ | 924/2975 [13:17<20:43, 1.65it/s]
31%|βββ | 925/2975 [13:17<20:43, 1.65it/s]
31%|βββ | 926/2975 [13:18<20:45, 1.64it/s]
31%|βββ | 927/2975 [13:19<20:46, 1.64it/s]
31%|βββ | 928/2975 [13:19<20:47, 1.64it/s]
31%|βββ | 929/2975 [13:20<20:48, 1.64it/s]
31%|ββββ | 930/2975 [13:21<20:47, 1.64it/s]
31%|ββββ | 931/2975 [13:21<20:47, 1.64it/s]
31%|ββββ | 932/2975 [13:22<20:43, 1.64it/s]
31%|ββββ | 933/2975 [13:22<21:10, 1.61it/s]
31%|ββββ | 934/2975 [13:23<21:01, 1.62it/s]
31%|ββββ | 935/2975 [13:24<20:56, 1.62it/s]
31%|ββββ | 936/2975 [13:24<20:53, 1.63it/s]
31%|ββββ | 937/2975 [13:25<20:50, 1.63it/s]
32%|ββββ | 938/2975 [13:25<20:49, 1.63it/s]
32%|ββββ | 939/2975 [13:26<20:45, 1.64it/s]
| 0: {'loss': 0.7538, 'grad_norm': 0.8110139545092059, 'learning_rate': 1e-05, 'epoch': 0.32} |
| 0: {'loss': 0.742, 'grad_norm': 0.7619714082364365, 'learning_rate': 1e-05, 'epoch': 0.32} |
32%|ββββ | 940/2975 [13:27<20:41, 1.64it/s]
32%|ββββ | 941/2975 [13:27<20:40, 1.64it/s]
32%|ββββ | 942/2975 [13:28<20:38, 1.64it/s]
32%|ββββ | 943/2975 [13:28<20:37, 1.64it/s]
32%|ββββ | 944/2975 [13:29<20:38, 1.64it/s]
32%|ββββ | 945/2975 [13:30<20:38, 1.64it/s]
32%|ββββ | 946/2975 [13:30<20:39, 1.64it/s]
32%|ββββ | 947/2975 [13:31<20:38, 1.64it/s]
32%|ββββ | 948/2975 [13:32<20:39, 1.64it/s]
32%|ββββ | 949/2975 [13:32<20:39, 1.63it/s]
32%|ββββ | 950/2975 [13:33<20:39, 1.63it/s]
32%|ββββ | 951/2975 [13:33<20:39, 1.63it/s]
32%|ββββ | 952/2975 [13:34<20:36, 1.64it/s]
| 0: {'loss': 0.7647, 'grad_norm': 0.7569373817314637, 'learning_rate': 1e-05, 'epoch': 0.32} |
32%|ββββ | 953/2975 [13:35<20:36, 1.64it/s]
32%|ββββ | 954/2975 [13:35<20:32, 1.64it/s]
32%|ββββ | 955/2975 [13:36<20:29, 1.64it/s]
32%|ββββ | 956/2975 [13:36<20:27, 1.65it/s]
32%|ββββ | 957/2975 [13:37<20:27, 1.64it/s]
32%|ββββ | 958/2975 [13:38<20:26, 1.65it/s]
32%|ββββ | 959/2975 [13:38<20:25, 1.64it/s]
32%|ββββ | 960/2975 [13:39<20:26, 1.64it/s]
32%|ββββ | 961/2975 [13:39<20:28, 1.64it/s]
32%|ββββ | 962/2975 [13:40<20:30, 1.64it/s]
32%|ββββ | 963/2975 [13:41<20:32, 1.63it/s]
32%|ββββ | 964/2975 [13:41<20:30, 1.63it/s]
32%|ββββ | 965/2975 [13:42<20:27, 1.64it/s]
32%|ββββ | 966/2975 [13:43<20:24, 1.64it/s]
33%|ββββ | 967/2975 [13:43<20:21, 1.64it/s]
33%|ββββ | 968/2975 [13:44<20:19, 1.65it/s]
| 0: {'loss': 0.7504, 'grad_norm': 0.7201689920498835, 'learning_rate': 1e-05, 'epoch': 0.33} |
| 0: {'loss': 0.7495, 'grad_norm': 0.7858981946322333, 'learning_rate': 1e-05, 'epoch': 0.33} |
33%|ββββ | 969/2975 [13:44<20:19, 1.65it/s]
33%|ββββ | 970/2975 [13:45<20:18, 1.65it/s]
33%|ββββ | 971/2975 [13:46<20:18, 1.64it/s]
33%|ββββ | 972/2975 [13:46<20:18, 1.64it/s]
33%|ββββ | 973/2975 [13:47<20:18, 1.64it/s]
33%|ββββ | 974/2975 [13:47<20:19, 1.64it/s]
33%|ββββ | 975/2975 [13:48<20:20, 1.64it/s]
33%|ββββ | 976/2975 [13:49<20:19, 1.64it/s]
33%|ββββ | 977/2975 [13:49<20:20, 1.64it/s]
33%|ββββ | 978/2975 [13:50<20:22, 1.63it/s]
33%|ββββ | 979/2975 [13:50<20:20, 1.64it/s]
33%|ββββ | 980/2975 [13:51<20:16, 1.64it/s]
33%|ββββ | 981/2975 [13:52<20:17, 1.64it/s]
| 0: {'loss': 0.7351, 'grad_norm': 0.8083503181543557, 'learning_rate': 1e-05, 'epoch': 0.33} |
33%|ββββ | 982/2975 [13:52<20:13, 1.64it/s]
33%|ββββ | 983/2975 [13:53<20:11, 1.64it/s]
33%|ββββ | 984/2975 [13:53<20:11, 1.64it/s]
33%|ββββ | 985/2975 [13:54<20:10, 1.64it/s]
33%|ββββ | 986/2975 [13:55<20:07, 1.65it/s]
33%|ββββ | 987/2975 [13:55<20:06, 1.65it/s]
33%|ββββ | 988/2975 [13:56<20:06, 1.65it/s]
33%|ββββ | 989/2975 [13:57<20:07, 1.65it/s]
33%|ββββ | 990/2975 [13:57<20:09, 1.64it/s]
33%|ββββ | 991/2975 [13:58<20:11, 1.64it/s]
33%|ββββ | 992/2975 [13:58<20:11, 1.64it/s]
33%|ββββ | 993/2975 [13:59<20:10, 1.64it/s]
33%|ββββ | 994/2975 [14:00<20:08, 1.64it/s]
33%|ββββ | 995/2975 [14:00<20:06, 1.64it/s]
33%|ββββ | 996/2975 [14:01<20:03, 1.64it/s]
34%|ββββ | 997/2975 [14:01<20:00, 1.65it/s]
| 0: {'loss': 0.7259, 'grad_norm': 0.7450275326316524, 'learning_rate': 1e-05, 'epoch': 0.34} |
| 0: {'loss': 0.7221, 'grad_norm': 0.7648787973503811, 'learning_rate': 1e-05, 'epoch': 0.34} |
34%|ββββ | 998/2975 [14:02<19:58, 1.65it/s]
34%|ββββ | 999/2975 [14:03<19:57, 1.65it/s]
34%|ββββ | 1000/2975 [14:03<19:56, 1.65it/s]
34%|ββββ | 1001/2975 [14:04<19:59, 1.65it/s]
34%|ββββ | 1002/2975 [14:04<20:00, 1.64it/s]
34%|ββββ | 1003/2975 [14:05<19:58, 1.65it/s]
34%|ββββ | 1004/2975 [14:06<19:55, 1.65it/s]
34%|ββββ | 1005/2975 [14:06<19:54, 1.65it/s]
34%|ββββ | 1006/2975 [14:07<19:53, 1.65it/s]
34%|ββββ | 1007/2975 [14:07<19:53, 1.65it/s]
34%|ββββ | 1008/2975 [14:08<19:52, 1.65it/s]
34%|ββββ | 1009/2975 [14:09<19:51, 1.65it/s]
34%|ββββ | 1010/2975 [14:09<19:53, 1.65it/s]
| 0: {'loss': 0.7375, 'grad_norm': 0.8533932684671076, 'learning_rate': 1e-05, 'epoch': 0.34} |
34%|ββββ | 1011/2975 [14:10<19:56, 1.64it/s]
34%|ββββ | 1012/2975 [14:10<19:54, 1.64it/s]
34%|ββββ | 1013/2975 [14:11<19:53, 1.64it/s]
34%|ββββ | 1014/2975 [14:12<19:54, 1.64it/s]
34%|ββββ | 1015/2975 [14:12<19:55, 1.64it/s]
34%|ββββ | 1016/2975 [14:13<19:54, 1.64it/s]
34%|ββββ | 1017/2975 [14:14<19:54, 1.64it/s]
34%|ββββ | 1018/2975 [14:14<19:55, 1.64it/s]
34%|ββββ | 1019/2975 [14:15<19:54, 1.64it/s]
34%|ββββ | 1020/2975 [14:15<19:54, 1.64it/s]
34%|ββββ | 1021/2975 [14:16<19:56, 1.63it/s]
34%|ββββ | 1022/2975 [14:17<19:55, 1.63it/s]
34%|ββββ | 1023/2975 [14:17<19:54, 1.63it/s]
34%|ββββ | 1024/2975 [14:18<19:53, 1.64it/s]
34%|ββββ | 1025/2975 [14:18<19:51, 1.64it/s]
| 0: {'loss': 0.7551, 'grad_norm': 0.7901510519026402, 'learning_rate': 1e-05, 'epoch': 0.35} |
34%|ββββ | 1026/2975 [14:19<19:48, 1.64it/s]
35%|ββββ | 1027/2975 [14:20<19:46, 1.64it/s]
35%|ββββ | 1028/2975 [14:20<19:44, 1.64it/s]
35%|ββββ | 1029/2975 [14:21<19:45, 1.64it/s]
35%|ββββ | 1030/2975 [14:21<19:44, 1.64it/s]
35%|ββββ | 1031/2975 [14:22<19:44, 1.64it/s]
35%|ββββ | 1032/2975 [14:23<19:44, 1.64it/s]
35%|ββββ | 1033/2975 [14:23<19:45, 1.64it/s]
35%|ββββ | 1034/2975 [14:24<19:45, 1.64it/s]
35%|ββββ | 1035/2975 [14:25<19:43, 1.64it/s]
35%|ββββ | 1036/2975 [14:25<19:41, 1.64it/s]
35%|ββββ | 1037/2975 [14:26<19:41, 1.64it/s]
35%|ββββ | 1038/2975 [14:26<19:40, 1.64it/s]
35%|ββββ | 1039/2975 [14:27<19:38, 1.64it/s]
35%|ββββ | 1040/2975 [14:28<19:36, 1.65it/s]
| 0: {'loss': 0.7455, 'grad_norm': 0.7253433698188633, 'learning_rate': 1e-05, 'epoch': 0.35} |
| 0: {'loss': 0.7301, 'grad_norm': 0.7578554379488228, 'learning_rate': 1e-05, 'epoch': 0.35} |
35%|ββββ | 1041/2975 [14:28<19:34, 1.65it/s]
35%|ββββ | 1042/2975 [14:29<19:33, 1.65it/s]
35%|ββββ | 1043/2975 [14:29<19:33, 1.65it/s]
35%|ββββ | 1044/2975 [14:30<19:31, 1.65it/s]
35%|ββββ | 1045/2975 [14:31<19:40, 1.64it/s]
35%|ββββ | 1046/2975 [14:31<19:36, 1.64it/s]
35%|ββββ | 1047/2975 [14:32<19:34, 1.64it/s]
35%|ββββ | 1048/2975 [14:32<19:34, 1.64it/s]
35%|ββββ | 1049/2975 [14:33<19:34, 1.64it/s]
35%|ββββ | 1050/2975 [14:34<19:32, 1.64it/s]
35%|ββββ | 1051/2975 [14:34<19:31, 1.64it/s]
35%|ββββ | 1052/2975 [14:35<19:29, 1.64it/s]
35%|ββββ | 1053/2975 [14:35<19:26, 1.65it/s]
35%|ββββ | 1054/2975 [14:36<19:25, 1.65it/s]
| 0: {'loss': 0.7492, 'grad_norm': 0.7766259324764115, 'learning_rate': 1e-05, 'epoch': 0.36} |
35%|ββββ | 1055/2975 [14:37<19:24, 1.65it/s]
35%|ββββ | 1056/2975 [14:37<21:22, 1.50it/s]
36%|ββββ | 1057/2975 [14:38<20:45, 1.54it/s]
36%|ββββ | 1058/2975 [14:39<20:18, 1.57it/s]
36%|ββββ | 1059/2975 [14:39<19:59, 1.60it/s]
36%|ββββ | 1060/2975 [14:40<19:47, 1.61it/s]
36%|ββββ | 1061/2975 [14:41<19:39, 1.62it/s]
36%|ββββ | 1062/2975 [14:41<19:35, 1.63it/s]
36%|ββββ | 1063/2975 [14:42<19:33, 1.63it/s]
36%|ββββ | 1064/2975 [14:42<19:29, 1.63it/s]
36%|ββββ | 1065/2975 [14:43<19:28, 1.63it/s]
36%|ββββ | 1066/2975 [14:44<19:26, 1.64it/s]
36%|ββββ | 1067/2975 [14:44<19:25, 1.64it/s]
36%|ββββ | 1068/2975 [14:45<19:22, 1.64it/s]
36%|ββββ | 1069/2975 [14:45<19:20, 1.64it/s]
| 0: {'loss': 0.7338, 'grad_norm': 0.8065374744219792, 'learning_rate': 1e-05, 'epoch': 0.36} |
| 0: {'loss': 0.7436, 'grad_norm': 0.7526874737814758, 'learning_rate': 1e-05, 'epoch': 0.36} |
36%|ββββ | 1070/2975 [14:46<19:17, 1.65it/s]
36%|ββββ | 1071/2975 [14:47<19:17, 1.65it/s]
36%|ββββ | 1072/2975 [14:47<19:15, 1.65it/s]
36%|ββββ | 1073/2975 [14:48<19:15, 1.65it/s]
36%|ββββ | 1074/2975 [14:48<19:14, 1.65it/s]
36%|ββββ | 1075/2975 [14:49<19:12, 1.65it/s]
36%|ββββ | 1076/2975 [14:50<19:10, 1.65it/s]
36%|ββββ | 1077/2975 [14:50<19:10, 1.65it/s]
36%|ββββ | 1078/2975 [14:51<19:09, 1.65it/s]
36%|ββββ | 1079/2975 [14:51<19:12, 1.65it/s]
36%|ββββ | 1080/2975 [14:52<19:14, 1.64it/s]
36%|ββββ | 1081/2975 [14:53<19:14, 1.64it/s]
36%|ββββ | 1082/2975 [14:53<19:15, 1.64it/s]
| 0: {'loss': 0.7384, 'grad_norm': 0.8016191676712313, 'learning_rate': 1e-05, 'epoch': 0.37} |
36%|ββββ | 1083/2975 [14:54<19:15, 1.64it/s]
36%|ββββ | 1084/2975 [14:55<19:11, 1.64it/s]
36%|ββββ | 1085/2975 [14:55<19:09, 1.64it/s]
37%|ββββ | 1086/2975 [14:56<19:06, 1.65it/s]
37%|ββββ | 1087/2975 [14:56<19:04, 1.65it/s]
37%|ββββ | 1088/2975 [14:57<19:02, 1.65it/s]
37%|ββββ | 1089/2975 [14:58<19:00, 1.65it/s]
37%|ββββ | 1090/2975 [14:58<18:58, 1.66it/s]
37%|ββββ | 1091/2975 [14:59<18:57, 1.66it/s]
37%|ββββ | 1092/2975 [14:59<18:58, 1.65it/s]
37%|ββββ | 1093/2975 [15:00<18:57, 1.65it/s]
37%|ββββ | 1094/2975 [15:01<18:55, 1.66it/s]
37%|ββββ | 1095/2975 [15:01<18:54, 1.66it/s]
37%|ββββ | 1096/2975 [15:02<18:53, 1.66it/s]
37%|ββββ | 1097/2975 [15:02<18:53, 1.66it/s]
| 0: {'loss': 0.7236, 'grad_norm': 0.7541213322859073, 'learning_rate': 1e-05, 'epoch': 0.37} |
| 0: {'loss': 0.7439, 'grad_norm': 0.767711525718183, 'learning_rate': 1e-05, 'epoch': 0.37} |
37%|ββββ | 1098/2975 [15:03<18:53, 1.66it/s]
37%|ββββ | 1099/2975 [15:04<18:54, 1.65it/s]
37%|ββββ | 1100/2975 [15:04<18:55, 1.65it/s]
37%|ββββ | 1101/2975 [15:05<18:56, 1.65it/s]
37%|ββββ | 1102/2975 [15:05<18:56, 1.65it/s]
37%|ββββ | 1103/2975 [15:06<18:57, 1.65it/s]
37%|ββββ | 1104/2975 [15:07<18:57, 1.65it/s]
37%|ββββ | 1105/2975 [15:07<18:59, 1.64it/s]
37%|ββββ | 1106/2975 [15:08<18:59, 1.64it/s]
37%|ββββ | 1107/2975 [15:08<18:57, 1.64it/s]
37%|ββββ | 1108/2975 [15:09<18:55, 1.64it/s]
37%|ββββ | 1109/2975 [15:10<18:53, 1.65it/s]
37%|ββββ | 1110/2975 [15:10<18:53, 1.65it/s]
37%|ββββ | 1111/2975 [15:11<18:54, 1.64it/s]
| 0: {'loss': 0.7662, 'grad_norm': 0.7476627932421003, 'learning_rate': 1e-05, 'epoch': 0.38} |
37%|ββββ | 1112/2975 [15:11<18:52, 1.65it/s]
37%|ββββ | 1113/2975 [15:12<18:50, 1.65it/s]
37%|ββββ | 1114/2975 [15:13<18:47, 1.65it/s]
37%|ββββ | 1115/2975 [15:13<18:46, 1.65it/s]
38%|ββββ | 1116/2975 [15:14<18:45, 1.65it/s]
38%|ββββ | 1117/2975 [15:15<18:43, 1.65it/s]
38%|ββββ | 1118/2975 [15:15<18:42, 1.65it/s]
38%|ββββ | 1119/2975 [15:16<18:40, 1.66it/s]
38%|ββββ | 1120/2975 [15:16<18:39, 1.66it/s]
38%|ββββ | 1121/2975 [15:17<18:39, 1.66it/s]
38%|ββββ | 1122/2975 [15:18<18:37, 1.66it/s]
38%|ββββ | 1123/2975 [15:18<18:37, 1.66it/s]
38%|ββββ | 1124/2975 [15:19<18:36, 1.66it/s]
38%|ββββ | 1125/2975 [15:19<18:35, 1.66it/s]
38%|ββββ | 1126/2975 [15:20<18:35, 1.66it/s]
| 0: {'loss': 0.7154, 'grad_norm': 0.7440026914213119, 'learning_rate': 1e-05, 'epoch': 0.38} |
| 0: {'loss': 0.7241, 'grad_norm': 0.7597698137132717, 'learning_rate': 1e-05, 'epoch': 0.38} |
38%|ββββ | 1127/2975 [15:21<18:34, 1.66it/s]
38%|ββββ | 1128/2975 [15:21<18:34, 1.66it/s]
38%|ββββ | 1129/2975 [15:22<18:32, 1.66it/s]
38%|ββββ | 1130/2975 [15:22<18:32, 1.66it/s]
38%|ββββ | 1131/2975 [15:23<18:33, 1.66it/s]
38%|ββββ | 1132/2975 [15:24<18:33, 1.65it/s]
38%|ββββ | 1133/2975 [15:24<18:33, 1.65it/s]
38%|ββββ | 1134/2975 [15:25<18:34, 1.65it/s]
38%|ββββ | 1135/2975 [15:25<18:56, 1.62it/s]
38%|ββββ | 1136/2975 [15:26<18:50, 1.63it/s]
38%|ββββ | 1137/2975 [15:27<18:44, 1.63it/s]
38%|ββββ | 1138/2975 [15:27<18:40, 1.64it/s]
38%|ββββ | 1139/2975 [15:28<18:38, 1.64it/s]
38%|ββββ | 1140/2975 [15:28<18:35, 1.65it/s]
| 0: {'loss': 0.7401, 'grad_norm': 0.7943983699535594, 'learning_rate': 1e-05, 'epoch': 0.39} |
38%|ββββ | 1141/2975 [15:29<18:36, 1.64it/s]
38%|ββββ | 1142/2975 [15:30<18:33, 1.65it/s]
38%|ββββ | 1143/2975 [15:30<18:31, 1.65it/s]
38%|ββββ | 1144/2975 [15:31<18:29, 1.65it/s]
38%|ββββ | 1145/2975 [15:31<18:27, 1.65it/s]
39%|ββββ | 1146/2975 [15:32<18:26, 1.65it/s]
39%|ββββ | 1147/2975 [15:33<18:25, 1.65it/s]
39%|ββββ | 1148/2975 [15:33<18:25, 1.65it/s]
39%|ββββ | 1149/2975 [15:34<18:26, 1.65it/s]
39%|ββββ | 1150/2975 [15:35<18:28, 1.65it/s]
39%|ββββ | 1151/2975 [15:35<18:31, 1.64it/s]
39%|ββββ | 1152/2975 [15:36<18:38, 1.63it/s]
39%|ββββ | 1153/2975 [15:36<18:35, 1.63it/s]
39%|ββββ | 1154/2975 [15:37<19:00, 1.60it/s]
| 0: {'loss': 0.741, 'grad_norm': 0.755713448663527, 'learning_rate': 1e-05, 'epoch': 0.39} |
39%|ββββ | 1155/2975 [15:38<18:49, 1.61it/s]
39%|ββββ | 1156/2975 [15:38<18:41, 1.62it/s]
39%|ββββ | 1157/2975 [15:39<18:37, 1.63it/s]
39%|ββββ | 1158/2975 [15:39<18:34, 1.63it/s]
39%|ββββ | 1159/2975 [15:40<18:32, 1.63it/s]
39%|ββββ | 1160/2975 [15:41<18:29, 1.64it/s]
39%|ββββ | 1161/2975 [15:41<18:27, 1.64it/s]
39%|ββββ | 1162/2975 [15:42<18:24, 1.64it/s]
39%|ββββ | 1163/2975 [15:42<18:23, 1.64it/s]
39%|ββββ | 1164/2975 [15:43<18:22, 1.64it/s]
39%|ββββ | 1165/2975 [15:44<18:20, 1.64it/s]
39%|ββββ | 1166/2975 [15:44<18:19, 1.65it/s]
39%|ββββ | 1167/2975 [15:45<18:17, 1.65it/s]
39%|ββββ | 1168/2975 [15:46<18:15, 1.65it/s]
39%|ββββ | 1169/2975 [15:46<18:16, 1.65it/s]
| 0: {'loss': 0.7348, 'grad_norm': 0.763836705857911, 'learning_rate': 1e-05, 'epoch': 0.39} |
| 0: {'loss': 0.7486, 'grad_norm': 0.7260225673474382, 'learning_rate': 1e-05, 'epoch': 0.4} |
39%|ββββ | 1170/2975 [15:47<18:15, 1.65it/s]
39%|ββββ | 1171/2975 [15:47<18:17, 1.64it/s]
39%|ββββ | 1172/2975 [15:48<18:17, 1.64it/s]
39%|ββββ | 1173/2975 [15:49<18:15, 1.65it/s]
39%|ββββ | 1174/2975 [15:49<18:16, 1.64it/s]
39%|ββββ | 1175/2975 [15:50<18:16, 1.64it/s]
40%|ββββ | 1176/2975 [15:50<18:16, 1.64it/s]
40%|ββββ | 1177/2975 [15:51<18:14, 1.64it/s]
40%|ββββ | 1178/2975 [15:52<18:13, 1.64it/s]
40%|ββββ | 1179/2975 [15:52<18:12, 1.64it/s]
40%|ββββ | 1180/2975 [15:53<18:11, 1.64it/s]
40%|ββββ | 1181/2975 [15:53<18:10, 1.64it/s]
40%|ββββ | 1182/2975 [15:54<18:11, 1.64it/s]
40%|ββββ | 1183/2975 [15:55<18:09, 1.64it/s]
| 0: {'loss': 0.7411, 'grad_norm': 0.7526510518592818, 'learning_rate': 1e-05, 'epoch': 0.4} |
40%|ββββ | 1184/2975 [15:55<18:07, 1.65it/s]
40%|ββββ | 1185/2975 [15:56<18:04, 1.65it/s]
40%|ββββ | 1186/2975 [15:56<18:13, 1.64it/s]
40%|ββββ | 1187/2975 [15:57<18:25, 1.62it/s]
40%|ββββ | 1188/2975 [15:58<18:19, 1.63it/s]
40%|ββββ | 1189/2975 [15:58<18:15, 1.63it/s]
40%|ββββ | 1190/2975 [15:59<18:12, 1.63it/s]
40%|ββββ | 1191/2975 [16:02<38:34, 1.30s/it]
40%|ββββ | 1192/2975 [16:02<32:25, 1.09s/it]
40%|ββββ | 1193/2975 [16:03<28:07, 1.06it/s]
40%|ββββ | 1194/2975 [16:04<25:06, 1.18it/s]
40%|ββββ | 1195/2975 [16:04<22:59, 1.29it/s]
40%|ββββ | 1196/2975 [16:05<21:29, 1.38it/s]
40%|ββββ | 1197/2975 [16:06<20:25, 1.45it/s]
40%|ββββ | 1198/2975 [16:06<19:57, 1.48it/s]
| 0: {'loss': 0.708, 'grad_norm': 0.763605139895749, 'learning_rate': 1e-05, 'epoch': 0.4} |
| 0: {'loss': 0.7189, 'grad_norm': 0.743714237682584, 'learning_rate': 1e-05, 'epoch': 0.41} |
40%|ββββ | 1199/2975 [16:07<19:23, 1.53it/s]
40%|ββββ | 1200/2975 [16:07<18:58, 1.56it/s]
40%|ββββ | 1201/2975 [16:08<18:41, 1.58it/s]
40%|ββββ | 1202/2975 [16:09<18:26, 1.60it/s]
40%|ββββ | 1203/2975 [16:09<18:18, 1.61it/s]
40%|ββββ | 1204/2975 [16:10<18:12, 1.62it/s]
41%|ββββ | 1205/2975 [16:10<18:07, 1.63it/s]
41%|ββββ | 1206/2975 [16:11<18:01, 1.64it/s]
41%|ββββ | 1207/2975 [16:12<17:56, 1.64it/s]
41%|ββββ | 1208/2975 [16:12<17:52, 1.65it/s]
41%|ββββ | 1209/2975 [16:13<17:50, 1.65it/s]
41%|ββββ | 1210/2975 [16:13<17:50, 1.65it/s]
41%|ββββ | 1211/2975 [16:14<17:49, 1.65it/s]
| 0: {'loss': 0.7215, 'grad_norm': 0.7671818270062326, 'learning_rate': 1e-05, 'epoch': 0.41} |
41%|ββββ | 1212/2975 [16:15<17:46, 1.65it/s]
41%|ββββ | 1213/2975 [16:15<17:45, 1.65it/s]
41%|ββββ | 1214/2975 [16:16<17:45, 1.65it/s]
41%|ββββ | 1215/2975 [16:16<17:46, 1.65it/s]
41%|ββββ | 1216/2975 [16:17<17:46, 1.65it/s]
41%|ββββ | 1217/2975 [16:18<17:44, 1.65it/s]
41%|ββββ | 1218/2975 [16:18<17:44, 1.65it/s]
41%|ββββ | 1219/2975 [16:19<17:43, 1.65it/s]
41%|ββββ | 1220/2975 [16:19<17:43, 1.65it/s]
41%|ββββ | 1221/2975 [16:20<17:44, 1.65it/s]
41%|ββββ | 1222/2975 [16:21<17:43, 1.65it/s]
41%|ββββ | 1223/2975 [16:21<17:42, 1.65it/s]
41%|ββββ | 1224/2975 [16:22<17:41, 1.65it/s]
41%|ββββ | 1225/2975 [16:23<17:40, 1.65it/s]
41%|ββββ | 1226/2975 [16:23<17:39, 1.65it/s]
| 0: {'loss': 0.7156, 'grad_norm': 0.7977023629995006, 'learning_rate': 1e-05, 'epoch': 0.41} |
| 0: {'loss': 0.733, 'grad_norm': 0.7843304726970324, 'learning_rate': 1e-05, 'epoch': 0.42} |
41%|ββββ | 1227/2975 [16:24<17:43, 1.64it/s]
41%|βββββ | 1228/2975 [16:24<17:43, 1.64it/s]
41%|βββββ | 1229/2975 [16:25<18:03, 1.61it/s]
41%|βββββ | 1230/2975 [16:26<17:52, 1.63it/s]
41%|βββββ | 1231/2975 [16:26<17:46, 1.64it/s]
41%|βββββ | 1232/2975 [16:27<17:41, 1.64it/s]
41%|βββββ | 1233/2975 [16:27<17:38, 1.65it/s]
41%|βββββ | 1234/2975 [16:28<17:35, 1.65it/s]
42%|βββββ | 1235/2975 [16:29<17:32, 1.65it/s]
42%|βββββ | 1236/2975 [16:29<17:41, 1.64it/s]
42%|βββββ | 1237/2975 [16:30<17:38, 1.64it/s]
42%|βββββ | 1238/2975 [16:30<17:36, 1.64it/s]
42%|βββββ | 1239/2975 [16:31<17:35, 1.65it/s]
42%|βββββ | 1240/2975 [16:32<17:35, 1.64it/s]
| 0: {'loss': 0.7065, 'grad_norm': 0.7143588421452666, 'learning_rate': 1e-05, 'epoch': 0.42} |
42%|βββββ | 1241/2975 [16:32<17:37, 1.64it/s]
42%|βββββ | 1242/2975 [16:33<17:36, 1.64it/s]
42%|βββββ | 1243/2975 [16:34<18:26, 1.57it/s]
42%|βββββ | 1244/2975 [16:34<18:22, 1.57it/s]
42%|βββββ | 1245/2975 [16:35<18:05, 1.59it/s]
42%|βββββ | 1246/2975 [16:36<18:33, 1.55it/s]
42%|βββββ | 1247/2975 [16:36<18:27, 1.56it/s]
42%|βββββ | 1248/2975 [16:37<18:07, 1.59it/s]
42%|βββββ | 1249/2975 [16:37<17:53, 1.61it/s]
42%|βββββ | 1250/2975 [16:38<17:43, 1.62it/s]
42%|βββββ | 1251/2975 [16:39<17:37, 1.63it/s]
42%|βββββ | 1252/2975 [16:39<17:32, 1.64it/s]
42%|βββββ | 1253/2975 [16:40<17:28, 1.64it/s]
42%|βββββ | 1254/2975 [16:40<17:25, 1.65it/s]
| 0: {'loss': 0.726, 'grad_norm': 0.7248263242252937, 'learning_rate': 1e-05, 'epoch': 0.42} |
42%|βββββ | 1255/2975 [16:41<17:23, 1.65it/s]
42%|βββββ | 1256/2975 [16:42<18:16, 1.57it/s]
42%|βββββ | 1257/2975 [16:42<18:32, 1.54it/s]
42%|βββββ | 1258/2975 [16:43<18:10, 1.57it/s]
42%|βββββ | 1259/2975 [16:44<17:56, 1.59it/s]
42%|βββββ | 1260/2975 [16:44<17:43, 1.61it/s]
42%|βββββ | 1261/2975 [16:45<17:37, 1.62it/s]
42%|βββββ | 1262/2975 [16:45<17:30, 1.63it/s]
42%|βββββ | 1263/2975 [16:46<17:25, 1.64it/s]
42%|βββββ | 1264/2975 [16:47<17:20, 1.64it/s]
43%|βββββ | 1265/2975 [16:47<17:17, 1.65it/s]
43%|βββββ | 1266/2975 [16:48<17:15, 1.65it/s]
43%|βββββ | 1267/2975 [16:48<17:14, 1.65it/s]
43%|βββββ | 1268/2975 [16:49<17:16, 1.65it/s]
43%|βββββ | 1269/2975 [16:50<17:15, 1.65it/s]
| 0: {'loss': 0.719, 'grad_norm': 0.7889701499095937, 'learning_rate': 1e-05, 'epoch': 0.43} |
| 0: {'loss': 0.7321, 'grad_norm': 0.7621716701405825, 'learning_rate': 1e-05, 'epoch': 0.43} |
43%|βββββ | 1270/2975 [16:50<17:14, 1.65it/s]
43%|βββββ | 1271/2975 [16:51<17:16, 1.64it/s]
43%|βββββ | 1272/2975 [16:51<17:15, 1.64it/s]
43%|βββββ | 1273/2975 [16:52<17:13, 1.65it/s]
43%|βββββ | 1274/2975 [16:53<17:12, 1.65it/s]
43%|βββββ | 1275/2975 [16:53<17:12, 1.65it/s]
43%|βββββ | 1276/2975 [16:54<17:12, 1.65it/s]
43%|βββββ | 1277/2975 [16:54<17:11, 1.65it/s]
43%|βββββ | 1278/2975 [16:55<17:10, 1.65it/s]
43%|βββββ | 1279/2975 [16:56<17:08, 1.65it/s]
43%|βββββ | 1280/2975 [16:56<17:07, 1.65it/s]
43%|βββββ | 1281/2975 [16:57<17:06, 1.65it/s]
| 0: {'loss': 0.74, 'grad_norm': 0.7827637856545197, 'learning_rate': 1e-05, 'epoch': 0.43} |
43%|βββββ | 1282/2975 [16:58<17:05, 1.65it/s]
43%|βββββ | 1283/2975 [16:58<17:04, 1.65it/s]
43%|βββββ | 1284/2975 [16:59<17:04, 1.65it/s]
43%|βββββ | 1285/2975 [16:59<17:07, 1.64it/s]
43%|βββββ | 1286/2975 [17:00<17:07, 1.64it/s]
43%|βββββ | 1287/2975 [17:01<17:05, 1.65it/s]
43%|βββββ | 1288/2975 [17:01<17:04, 1.65it/s]
43%|βββββ | 1289/2975 [17:02<17:03, 1.65it/s]
43%|βββββ | 1290/2975 [17:02<17:03, 1.65it/s]
43%|βββββ | 1291/2975 [17:03<17:02, 1.65it/s]
43%|βββββ | 1292/2975 [17:04<17:01, 1.65it/s]
43%|βββββ | 1293/2975 [17:04<17:00, 1.65it/s]
43%|βββββ | 1294/2975 [17:05<16:58, 1.65it/s]
44%|βββββ | 1295/2975 [17:05<16:55, 1.65it/s]
44%|βββββ | 1296/2975 [17:06<16:54, 1.66it/s]
| 0: {'loss': 0.7228, 'grad_norm': 0.7484708286727815, 'learning_rate': 1e-05, 'epoch': 0.44} |
| 0: {'loss': 0.7199, 'grad_norm': 0.743631669253817, 'learning_rate': 1e-05, 'epoch': 0.44} |
44%|βββββ | 1297/2975 [17:07<16:52, 1.66it/s]
44%|βββββ | 1298/2975 [17:07<16:52, 1.66it/s]
44%|βββββ | 1299/2975 [17:08<16:51, 1.66it/s]
44%|βββββ | 1300/2975 [17:08<16:50, 1.66it/s]
44%|βββββ | 1301/2975 [17:09<16:50, 1.66it/s]
44%|βββββ | 1302/2975 [17:10<16:48, 1.66it/s]
44%|βββββ | 1303/2975 [17:10<16:49, 1.66it/s]
44%|βββββ | 1304/2975 [17:11<16:50, 1.65it/s]
44%|βββββ | 1305/2975 [17:11<16:49, 1.65it/s]
44%|βββββ | 1306/2975 [17:12<16:48, 1.65it/s]
44%|βββββ | 1307/2975 [17:13<16:47, 1.66it/s]
44%|βββββ | 1308/2975 [17:13<16:48, 1.65it/s]
44%|βββββ | 1309/2975 [17:14<17:10, 1.62it/s]
44%|βββββ | 1310/2975 [17:15<17:09, 1.62it/s]
| 0: {'loss': 0.7227, 'grad_norm': 0.7640136035361755, 'learning_rate': 1e-05, 'epoch': 0.44} |
44%|βββββ | 1311/2975 [17:15<17:04, 1.62it/s]
44%|βββββ | 1312/2975 [17:16<16:58, 1.63it/s]
44%|βββββ | 1313/2975 [17:16<16:54, 1.64it/s]
44%|βββββ | 1314/2975 [17:17<16:52, 1.64it/s]
44%|βββββ | 1315/2975 [17:18<16:48, 1.65it/s]
44%|βββββ | 1316/2975 [17:18<16:45, 1.65it/s]
44%|βββββ | 1317/2975 [17:19<16:43, 1.65it/s]
44%|βββββ | 1318/2975 [17:19<16:42, 1.65it/s]
44%|βββββ | 1319/2975 [17:20<16:42, 1.65it/s]
44%|βββββ | 1320/2975 [17:21<16:41, 1.65it/s]
44%|βββββ | 1321/2975 [17:21<17:41, 1.56it/s]
44%|βββββ | 1322/2975 [17:22<17:51, 1.54it/s]
44%|βββββ | 1323/2975 [17:23<17:29, 1.57it/s]
45%|βββββ | 1324/2975 [17:23<17:14, 1.60it/s]
| 0: {'loss': 0.7302, 'grad_norm': 0.7597085554218037, 'learning_rate': 1e-05, 'epoch': 0.45} |
45%|βββββ | 1325/2975 [17:24<17:02, 1.61it/s]
45%|βββββ | 1326/2975 [17:24<16:54, 1.63it/s]
45%|βββββ | 1327/2975 [17:25<16:47, 1.64it/s]
45%|βββββ | 1328/2975 [17:26<16:43, 1.64it/s]
45%|βββββ | 1329/2975 [17:26<16:39, 1.65it/s]
45%|βββββ | 1330/2975 [17:27<16:35, 1.65it/s]
45%|βββββ | 1331/2975 [17:27<16:34, 1.65it/s]
45%|βββββ | 1332/2975 [17:28<16:32, 1.66it/s]
45%|βββββ | 1333/2975 [17:29<16:30, 1.66it/s]
45%|βββββ | 1334/2975 [17:29<16:31, 1.66it/s]
45%|βββββ | 1335/2975 [17:30<16:31, 1.65it/s]
45%|βββββ | 1336/2975 [17:30<16:30, 1.65it/s]
45%|βββββ | 1337/2975 [17:31<16:29, 1.66it/s]
45%|βββββ | 1338/2975 [17:32<16:28, 1.66it/s]
45%|βββββ | 1339/2975 [17:32<16:28, 1.65it/s]
| 0: {'loss': 0.728, 'grad_norm': 0.7845822733538016, 'learning_rate': 1e-05, 'epoch': 0.45} |
| 0: {'loss': 0.7165, 'grad_norm': 0.7776081763736031, 'learning_rate': 1e-05, 'epoch': 0.45} |
45%|βββββ | 1340/2975 [17:33<16:29, 1.65it/s]
45%|βββββ | 1341/2975 [17:33<16:29, 1.65it/s]
45%|βββββ | 1342/2975 [17:34<16:29, 1.65it/s]
45%|βββββ | 1343/2975 [17:35<16:29, 1.65it/s]
45%|βββββ | 1344/2975 [17:35<16:28, 1.65it/s]
45%|βββββ | 1345/2975 [17:36<16:27, 1.65it/s]
45%|βββββ | 1346/2975 [17:36<16:26, 1.65it/s]
45%|βββββ | 1347/2975 [17:37<16:27, 1.65it/s]
45%|βββββ | 1348/2975 [17:38<16:26, 1.65it/s]
45%|βββββ | 1349/2975 [17:38<16:26, 1.65it/s]
45%|βββββ | 1350/2975 [17:39<16:26, 1.65it/s]
45%|βββββ | 1351/2975 [17:40<16:26, 1.65it/s]
45%|βββββ | 1352/2975 [17:40<16:25, 1.65it/s]
| 0: {'loss': 0.7278, 'grad_norm': 0.7524412952090146, 'learning_rate': 1e-05, 'epoch': 0.46} |
45%|βββββ | 1353/2975 [17:41<16:26, 1.64it/s]
46%|βββββ | 1354/2975 [17:41<16:32, 1.63it/s]
46%|βββββ | 1355/2975 [17:42<16:30, 1.64it/s]
46%|βββββ | 1356/2975 [17:43<16:25, 1.64it/s]
46%|βββββ | 1357/2975 [17:43<16:22, 1.65it/s]
46%|βββββ | 1358/2975 [17:44<16:21, 1.65it/s]
46%|βββββ | 1359/2975 [17:44<16:19, 1.65it/s]
46%|βββββ | 1360/2975 [17:45<16:16, 1.65it/s]
46%|βββββ | 1361/2975 [17:46<16:15, 1.65it/s]
46%|βββββ | 1362/2975 [17:46<16:14, 1.66it/s]
46%|βββββ | 1363/2975 [17:47<16:13, 1.66it/s]
46%|βββββ | 1364/2975 [17:47<16:12, 1.66it/s]
46%|βββββ | 1365/2975 [17:48<16:12, 1.65it/s]
46%|βββββ | 1366/2975 [17:49<16:11, 1.66it/s]
46%|βββββ | 1367/2975 [17:49<16:11, 1.66it/s]
| 0: {'loss': 0.7409, 'grad_norm': 0.7305820316850615, 'learning_rate': 1e-05, 'epoch': 0.46} |
46%|βββββ | 1368/2975 [17:50<16:12, 1.65it/s]
46%|βββββ | 1369/2975 [17:50<16:12, 1.65it/s]
46%|βββββ | 1370/2975 [17:51<16:13, 1.65it/s]
46%|βββββ | 1371/2975 [17:52<22:42, 1.18it/s]
46%|βββββ | 1372/2975 [17:53<21:46, 1.23it/s]
46%|βββββ | 1373/2975 [17:54<20:07, 1.33it/s]
46%|βββββ | 1374/2975 [17:54<18:57, 1.41it/s]
46%|βββββ | 1375/2975 [17:55<18:08, 1.47it/s]
46%|βββββ | 1376/2975 [17:56<17:31, 1.52it/s]
46%|βββββ | 1377/2975 [17:56<17:05, 1.56it/s]
46%|βββββ | 1378/2975 [17:57<16:47, 1.58it/s]
46%|βββββ | 1379/2975 [17:57<16:35, 1.60it/s]
46%|βββββ | 1380/2975 [17:58<16:26, 1.62it/s]
| 0: {'loss': 0.7224, 'grad_norm': 0.7595749245205473, 'learning_rate': 1e-05, 'epoch': 0.46} |
| 0: {'loss': 0.7343, 'grad_norm': 0.737663184076695, 'learning_rate': 1e-05, 'epoch': 0.47} |
46%|βββββ | 1381/2975 [17:59<16:20, 1.63it/s]
46%|βββββ | 1382/2975 [17:59<16:18, 1.63it/s]
46%|βββββ | 1383/2975 [18:00<16:14, 1.63it/s]
47%|βββββ | 1384/2975 [18:00<16:11, 1.64it/s]
47%|βββββ | 1385/2975 [18:01<16:10, 1.64it/s]
47%|βββββ | 1386/2975 [18:02<17:46, 1.49it/s]
47%|βββββ | 1387/2975 [18:03<19:14, 1.38it/s]
47%|βββββ | 1388/2975 [18:03<18:18, 1.44it/s]
47%|βββββ | 1389/2975 [18:04<17:37, 1.50it/s]
47%|βββββ | 1390/2975 [18:05<19:22, 1.36it/s]
47%|βββββ | 1391/2975 [18:05<18:26, 1.43it/s]
47%|βββββ | 1392/2975 [18:06<17:58, 1.47it/s]
47%|βββββ | 1393/2975 [18:07<19:11, 1.37it/s]
47%|βββββ | 1394/2975 [18:08<18:40, 1.41it/s]
| 0: {'loss': 0.7235, 'grad_norm': 0.7518853779242005, 'learning_rate': 1e-05, 'epoch': 0.47} |
47%|βββββ | 1395/2975 [18:08<17:50, 1.48it/s]
47%|βββββ | 1396/2975 [18:09<17:16, 1.52it/s]
47%|βββββ | 1397/2975 [18:09<16:51, 1.56it/s]
47%|βββββ | 1398/2975 [18:10<16:34, 1.58it/s]
47%|βββββ | 1399/2975 [18:11<16:23, 1.60it/s]
47%|βββββ | 1400/2975 [18:11<16:15, 1.62it/s]
47%|βββββ | 1401/2975 [18:12<18:11, 1.44it/s]
47%|βββββ | 1402/2975 [18:13<17:42, 1.48it/s]
47%|βββββ | 1403/2975 [18:13<17:12, 1.52it/s]
47%|βββββ | 1404/2975 [18:14<16:48, 1.56it/s]
47%|βββββ | 1405/2975 [18:15<16:38, 1.57it/s]
47%|βββββ | 1406/2975 [18:15<16:25, 1.59it/s]
47%|βββββ | 1407/2975 [18:16<16:16, 1.61it/s]
47%|βββββ | 1408/2975 [18:16<16:15, 1.61it/s]
47%|βββββ | 1409/2975 [18:17<16:06, 1.62it/s]
| 0: {'loss': 0.7288, 'grad_norm': 0.7594287015442452, 'learning_rate': 1e-05, 'epoch': 0.47} |
| 0: {'loss': 0.7071, 'grad_norm': 0.7398135145941482, 'learning_rate': 1e-05, 'epoch': 0.48} |
47%|βββββ | 1410/2975 [18:18<15:59, 1.63it/s]
47%|βββββ | 1411/2975 [18:18<15:58, 1.63it/s]
47%|βββββ | 1412/2975 [18:19<15:55, 1.64it/s]
47%|βββββ | 1413/2975 [18:19<15:53, 1.64it/s]
48%|βββββ | 1414/2975 [18:20<15:52, 1.64it/s]
48%|βββββ | 1415/2975 [18:21<15:50, 1.64it/s]
48%|βββββ | 1416/2975 [18:21<15:49, 1.64it/s]
48%|βββββ | 1417/2975 [18:22<16:31, 1.57it/s]
48%|βββββ | 1418/2975 [18:23<16:32, 1.57it/s]
48%|βββββ | 1419/2975 [18:23<16:17, 1.59it/s]
48%|βββββ | 1420/2975 [18:24<16:25, 1.58it/s]
48%|βββββ | 1421/2975 [18:25<16:13, 1.60it/s]
48%|βββββ | 1422/2975 [18:25<16:02, 1.61it/s]
| 0: {'loss': 0.7393, 'grad_norm': 0.769642418005638, 'learning_rate': 1e-05, 'epoch': 0.48} |
48%|βββββ | 1423/2975 [18:26<16:10, 1.60it/s]
48%|βββββ | 1424/2975 [18:26<15:59, 1.62it/s]
48%|βββββ | 1425/2975 [18:27<15:51, 1.63it/s]
48%|βββββ | 1426/2975 [18:28<15:45, 1.64it/s]
48%|βββββ | 1427/2975 [18:28<15:41, 1.64it/s]
48%|βββββ | 1428/2975 [18:29<15:37, 1.65it/s]
48%|βββββ | 1429/2975 [18:29<15:36, 1.65it/s]
48%|βββββ | 1430/2975 [18:30<15:36, 1.65it/s]
48%|βββββ | 1431/2975 [18:31<15:38, 1.64it/s]
48%|βββββ | 1432/2975 [18:31<15:39, 1.64it/s]
48%|βββββ | 1433/2975 [18:32<15:40, 1.64it/s]
48%|βββββ | 1434/2975 [18:32<15:40, 1.64it/s]
48%|βββββ | 1435/2975 [18:33<15:40, 1.64it/s]
48%|βββββ | 1436/2975 [18:34<15:37, 1.64it/s]
| 0: {'loss': 0.722, 'grad_norm': 0.7687981704462479, 'learning_rate': 1e-05, 'epoch': 0.48} |
48%|βββββ | 1437/2975 [18:34<15:36, 1.64it/s]
48%|βββββ | 1438/2975 [18:35<15:47, 1.62it/s]
48%|βββββ | 1439/2975 [18:35<15:41, 1.63it/s]
48%|βββββ | 1440/2975 [18:36<15:37, 1.64it/s]
48%|βββββ | 1441/2975 [18:37<15:34, 1.64it/s]
48%|βββββ | 1442/2975 [18:37<15:33, 1.64it/s]
49%|βββββ | 1443/2975 [18:38<15:33, 1.64it/s]
49%|βββββ | 1444/2975 [18:39<15:34, 1.64it/s]
49%|βββββ | 1445/2975 [18:39<15:32, 1.64it/s]
49%|βββββ | 1446/2975 [18:40<15:31, 1.64it/s]
49%|βββββ | 1447/2975 [18:40<15:30, 1.64it/s]
49%|βββββ | 1448/2975 [18:41<15:28, 1.64it/s]
49%|βββββ | 1449/2975 [18:42<15:26, 1.65it/s]
49%|βββββ | 1450/2975 [18:42<15:25, 1.65it/s]
| 0: {'loss': 0.7337, 'grad_norm': 0.737611485833095, 'learning_rate': 1e-05, 'epoch': 0.49} |
| 0: {'loss': 0.7532, 'grad_norm': 0.753601200505015, 'learning_rate': 1e-05, 'epoch': 0.49} |
49%|βββββ | 1451/2975 [18:43<15:24, 1.65it/s]
49%|βββββ | 1452/2975 [18:43<15:22, 1.65it/s]
49%|βββββ | 1453/2975 [18:44<15:22, 1.65it/s]
49%|βββββ | 1454/2975 [18:45<15:20, 1.65it/s]
49%|βββββ | 1455/2975 [18:45<15:19, 1.65it/s]
49%|βββββ | 1456/2975 [18:46<15:19, 1.65it/s]
49%|βββββ | 1457/2975 [18:46<15:19, 1.65it/s]
49%|βββββ | 1458/2975 [18:47<15:17, 1.65it/s]
49%|βββββ | 1459/2975 [18:48<15:16, 1.65it/s]
49%|βββββ | 1460/2975 [18:48<15:16, 1.65it/s]
49%|βββββ | 1461/2975 [18:49<15:18, 1.65it/s]
49%|βββββ | 1462/2975 [18:49<15:18, 1.65it/s]
49%|βββββ | 1463/2975 [18:50<15:17, 1.65it/s]
49%|βββββ | 1464/2975 [18:51<15:16, 1.65it/s]
| 0: {'loss': 0.7061, 'grad_norm': 0.7431195611428175, 'learning_rate': 1e-05, 'epoch': 0.49} |
49%|βββββ | 1465/2975 [18:51<15:17, 1.65it/s]
49%|βββββ | 1466/2975 [18:52<15:18, 1.64it/s]
49%|βββββ | 1467/2975 [18:52<15:17, 1.64it/s]
49%|βββββ | 1468/2975 [18:53<15:16, 1.64it/s]
49%|βββββ | 1469/2975 [18:54<15:15, 1.65it/s]
49%|βββββ | 1470/2975 [18:54<15:14, 1.65it/s]
49%|βββββ | 1471/2975 [18:55<15:14, 1.65it/s]
49%|βββββ | 1472/2975 [18:56<15:12, 1.65it/s]
50%|βββββ | 1473/2975 [18:56<15:11, 1.65it/s]
50%|βββββ | 1474/2975 [18:57<15:10, 1.65it/s]
50%|βββββ | 1475/2975 [18:57<15:10, 1.65it/s]
50%|βββββ | 1476/2975 [18:58<15:10, 1.65it/s]
50%|βββββ | 1477/2975 [18:59<15:09, 1.65it/s]
50%|βββββ | 1478/2975 [18:59<15:09, 1.65it/s]
| 0: {'loss': 0.7238, 'grad_norm': 0.7227391902402309, 'learning_rate': 1e-05, 'epoch': 0.5} |
| 0: {'loss': 0.7362, 'grad_norm': 0.7838094626607218, 'learning_rate': 1e-05, 'epoch': 0.5} |
50%|βββββ | 1479/2975 [19:00<15:10, 1.64it/s]
50%|βββββ | 1480/2975 [19:00<15:09, 1.64it/s]
50%|βββββ | 1481/2975 [19:01<15:07, 1.65it/s]
50%|βββββ | 1482/2975 [19:02<15:05, 1.65it/s]
50%|βββββ | 1483/2975 [19:02<15:05, 1.65it/s]
50%|βββββ | 1484/2975 [19:03<15:05, 1.65it/s]
50%|βββββ | 1485/2975 [19:03<15:06, 1.64it/s]
50%|βββββ | 1486/2975 [19:04<15:05, 1.64it/s]
50%|βββββ | 1487/2975 [19:05<15:04, 1.64it/s]
50%|βββββ | 1488/2975 [19:05<15:04, 1.64it/s]
50%|βββββ | 1489/2975 [19:06<15:05, 1.64it/s]
50%|βββββ | 1490/2975 [19:06<15:04, 1.64it/s]
50%|βββββ | 1491/2975 [19:07<15:03, 1.64it/s]
| 0: {'loss': 0.7061, 'grad_norm': 0.7235667413143033, 'learning_rate': 1e-05, 'epoch': 0.5} |
50%|βββββ | 1492/2975 [19:08<15:02, 1.64it/s]
50%|βββββ | 1493/2975 [19:08<15:14, 1.62it/s]
50%|βββββ | 1494/2975 [19:09<15:12, 1.62it/s]
50%|βββββ | 1495/2975 [19:10<15:09, 1.63it/s]
50%|βββββ | 1496/2975 [19:10<15:09, 1.63it/s]
50%|βββββ | 1497/2975 [19:11<15:06, 1.63it/s]
50%|βββββ | 1498/2975 [19:11<15:04, 1.63it/s]
50%|βββββ | 1499/2975 [19:12<15:01, 1.64it/s]
50%|βββββ | 1500/2975 [19:13<14:59, 1.64it/s]
50%|βββββ | 1501/2975 [19:13<15:00, 1.64it/s]
50%|βββββ | 1502/2975 [19:14<15:02, 1.63it/s]
51%|βββββ | 1503/2975 [19:14<15:00, 1.63it/s]
51%|βββββ | 1504/2975 [19:15<15:00, 1.63it/s]
51%|βββββ | 1505/2975 [19:16<14:59, 1.63it/s]
51%|βββββ | 1506/2975 [19:16<14:57, 1.64it/s]
| 0: {'loss': 0.7325, 'grad_norm': 0.7931800443283757, 'learning_rate': 1e-05, 'epoch': 0.51} |
51%|βββββ | 1507/2975 [19:17<14:57, 1.63it/s]
51%|βββββ | 1508/2975 [19:17<14:57, 1.63it/s]
51%|βββββ | 1509/2975 [19:18<14:56, 1.64it/s]
51%|βββββ | 1510/2975 [19:19<14:53, 1.64it/s]
51%|βββββ | 1511/2975 [19:19<14:55, 1.63it/s]
51%|βββββ | 1512/2975 [19:20<14:55, 1.63it/s]
51%|βββββ | 1513/2975 [19:21<14:54, 1.63it/s]
51%|βββββ | 1514/2975 [19:21<14:53, 1.64it/s]
51%|βββββ | 1515/2975 [19:22<15:04, 1.61it/s]
51%|βββββ | 1516/2975 [19:22<15:00, 1.62it/s]
51%|βββββ | 1517/2975 [19:23<14:56, 1.63it/s]
51%|βββββ | 1518/2975 [19:24<14:53, 1.63it/s]
51%|βββββ | 1519/2975 [19:24<15:06, 1.61it/s]
51%|βββββ | 1520/2975 [19:25<15:00, 1.62it/s]
| 0: {'loss': 0.7227, 'grad_norm': 0.7509840361931992, 'learning_rate': 1e-05, 'epoch': 0.51} |
| 0: {'loss': 0.7477, 'grad_norm': 0.7622116209089915, 'learning_rate': 1e-05, 'epoch': 0.51} |
51%|βββββ | 1521/2975 [19:25<14:57, 1.62it/s]
51%|βββββ | 1522/2975 [19:26<14:52, 1.63it/s]
51%|βββββ | 1523/2975 [19:27<14:49, 1.63it/s]
51%|βββββ | 1524/2975 [19:27<14:48, 1.63it/s]
51%|ββββββ | 1525/2975 [19:28<14:46, 1.64it/s]
51%|ββββββ | 1526/2975 [19:29<14:46, 1.64it/s]
51%|ββββββ | 1527/2975 [19:29<14:45, 1.64it/s]
51%|ββββββ | 1528/2975 [19:30<14:45, 1.63it/s]
51%|ββββββ | 1529/2975 [19:30<14:44, 1.63it/s]
51%|ββββββ | 1530/2975 [19:31<14:44, 1.63it/s]
51%|ββββββ | 1531/2975 [19:32<14:44, 1.63it/s]
51%|ββββββ | 1532/2975 [19:32<14:43, 1.63it/s]
52%|ββββββ | 1533/2975 [19:33<14:42, 1.63it/s]
| 0: {'loss': 0.7109, 'grad_norm': 0.7106310730304262, 'learning_rate': 1e-05, 'epoch': 0.52} |
52%|ββββββ | 1534/2975 [19:33<14:42, 1.63it/s]
52%|ββββββ | 1535/2975 [19:34<14:52, 1.61it/s]
52%|ββββββ | 1536/2975 [19:35<14:47, 1.62it/s]
52%|ββββββ | 1537/2975 [19:35<14:45, 1.62it/s]
52%|ββββββ | 1538/2975 [19:36<14:42, 1.63it/s]
52%|ββββββ | 1539/2975 [19:37<15:03, 1.59it/s]
52%|ββββββ | 1540/2975 [19:37<14:55, 1.60it/s]
52%|ββββββ | 1541/2975 [19:38<14:49, 1.61it/s]
52%|ββββββ | 1542/2975 [19:38<14:44, 1.62it/s]
52%|ββββββ | 1543/2975 [19:39<14:39, 1.63it/s]
52%|ββββββ | 1544/2975 [19:40<14:38, 1.63it/s]
52%|ββββββ | 1545/2975 [19:40<14:36, 1.63it/s]
52%|ββββββ | 1546/2975 [19:41<14:46, 1.61it/s]
52%|ββββββ | 1547/2975 [19:41<14:41, 1.62it/s]
| 0: {'loss': 0.7287, 'grad_norm': 0.9284595641558513, 'learning_rate': 1e-05, 'epoch': 0.52} |
| 0: {'loss': 0.7064, 'grad_norm': 0.7437149827305557, 'learning_rate': 1e-05, 'epoch': 0.52} |
52%|ββββββ | 1548/2975 [19:42<14:38, 1.62it/s]
52%|ββββββ | 1549/2975 [19:43<14:35, 1.63it/s]
52%|ββββββ | 1550/2975 [19:43<14:32, 1.63it/s]
52%|ββββββ | 1551/2975 [19:44<14:31, 1.63it/s]
52%|ββββββ | 1552/2975 [19:45<14:30, 1.63it/s]
52%|ββββββ | 1553/2975 [19:45<14:29, 1.64it/s]
52%|ββββββ | 1554/2975 [19:46<14:34, 1.62it/s]
52%|ββββββ | 1555/2975 [19:46<14:33, 1.63it/s]
52%|ββββββ | 1556/2975 [19:47<14:30, 1.63it/s]
52%|ββββββ | 1557/2975 [19:48<14:27, 1.63it/s]
52%|ββββββ | 1558/2975 [19:48<14:26, 1.64it/s]
52%|ββββββ | 1559/2975 [19:49<14:25, 1.64it/s]
52%|ββββββ | 1560/2975 [19:49<14:26, 1.63it/s]
| 0: {'loss': 0.7335, 'grad_norm': 0.7134687197364542, 'learning_rate': 1e-05, 'epoch': 0.53} |
52%|ββββββ | 1561/2975 [19:50<14:25, 1.63it/s]
53%|ββββββ | 1562/2975 [19:51<14:24, 1.63it/s]
53%|ββββββ | 1563/2975 [19:51<14:24, 1.63it/s]
53%|ββββββ | 1564/2975 [19:52<14:23, 1.63it/s]
53%|ββββββ | 1565/2975 [19:52<14:22, 1.64it/s]
53%|ββββββ | 1566/2975 [19:53<14:18, 1.64it/s]
53%|ββββββ | 1567/2975 [19:54<14:17, 1.64it/s]
53%|ββββββ | 1568/2975 [19:54<14:19, 1.64it/s]
53%|ββββββ | 1569/2975 [19:55<14:19, 1.64it/s]
53%|ββββββ | 1570/2975 [19:56<14:18, 1.64it/s]
53%|ββββββ | 1571/2975 [19:56<14:19, 1.63it/s]
53%|ββββββ | 1572/2975 [19:57<14:19, 1.63it/s]
53%|ββββββ | 1573/2975 [19:57<14:18, 1.63it/s]
53%|ββββββ | 1574/2975 [19:58<14:17, 1.63it/s]
| 0: {'loss': 0.7279, 'grad_norm': 0.7685737145218493, 'learning_rate': 1e-05, 'epoch': 0.53} |
53%|ββββββ | 1575/2975 [19:59<14:15, 1.64it/s]
53%|ββββββ | 1576/2975 [19:59<14:16, 1.63it/s]
53%|ββββββ | 1577/2975 [20:00<14:15, 1.63it/s]
53%|ββββββ | 1578/2975 [20:00<14:14, 1.64it/s]
53%|ββββββ | 1579/2975 [20:01<14:13, 1.64it/s]
53%|ββββββ | 1580/2975 [20:02<14:14, 1.63it/s]
53%|ββββββ | 1581/2975 [20:02<14:15, 1.63it/s]
53%|ββββββ | 1582/2975 [20:03<14:14, 1.63it/s]
53%|ββββββ | 1583/2975 [20:04<14:15, 1.63it/s]
53%|ββββββ | 1584/2975 [20:04<14:14, 1.63it/s]
53%|ββββββ | 1585/2975 [20:05<14:13, 1.63it/s]
53%|ββββββ | 1586/2975 [20:05<14:12, 1.63it/s]
53%|ββββββ | 1587/2975 [20:06<14:11, 1.63it/s]
53%|ββββββ | 1588/2975 [20:07<14:10, 1.63it/s]
| 0: {'loss': 0.7372, 'grad_norm': 0.7335726270641828, 'learning_rate': 1e-05, 'epoch': 0.53} |
| 0: {'loss': 0.7061, 'grad_norm': 0.7425514182122959, 'learning_rate': 1e-05, 'epoch': 0.54} |
53%|ββββββ | 1589/2975 [20:07<14:08, 1.63it/s]
53%|ββββββ | 1590/2975 [20:08<14:08, 1.63it/s]
53%|ββββββ | 1591/2975 [20:08<14:07, 1.63it/s]
54%|ββββββ | 1592/2975 [20:09<14:05, 1.64it/s]
54%|ββββββ | 1593/2975 [20:10<14:04, 1.64it/s]
54%|ββββββ | 1594/2975 [20:10<14:03, 1.64it/s]
54%|ββββββ | 1595/2975 [20:11<14:02, 1.64it/s]
54%|ββββββ | 1596/2975 [20:11<14:01, 1.64it/s]
54%|ββββββ | 1597/2975 [20:12<14:00, 1.64it/s]
54%|ββββββ | 1598/2975 [20:13<13:59, 1.64it/s]
54%|ββββββ | 1599/2975 [20:13<13:59, 1.64it/s]
54%|ββββββ | 1600/2975 [20:14<14:00, 1.64it/s]
| 0: {'loss': 0.7128, 'grad_norm': 0.756614439567717, 'learning_rate': 1e-05, 'epoch': 0.54} |
54%|ββββββ | 1601/2975 [20:15<14:00, 1.64it/s]
54%|ββββββ | 1602/2975 [20:15<13:59, 1.64it/s]
54%|ββββββ | 1603/2975 [20:16<13:57, 1.64it/s]
54%|ββββββ | 1604/2975 [20:16<13:57, 1.64it/s]
54%|ββββββ | 1605/2975 [20:17<13:55, 1.64it/s]
54%|ββββββ | 1606/2975 [20:18<13:53, 1.64it/s]
54%|ββββββ | 1607/2975 [20:18<13:53, 1.64it/s]
54%|ββββββ | 1608/2975 [20:19<13:52, 1.64it/s]
54%|ββββββ | 1609/2975 [20:19<13:53, 1.64it/s]
54%|ββββββ | 1610/2975 [20:20<13:52, 1.64it/s]
54%|ββββββ | 1611/2975 [20:21<13:52, 1.64it/s]
54%|ββββββ | 1612/2975 [20:21<13:53, 1.64it/s]
54%|ββββββ | 1613/2975 [20:22<13:52, 1.64it/s]
54%|ββββββ | 1614/2975 [20:22<13:52, 1.64it/s]
| 0: {'loss': 0.7238, 'grad_norm': 0.7183712888084608, 'learning_rate': 1e-05, 'epoch': 0.54} |
54%|ββββββ | 1615/2975 [20:23<13:51, 1.64it/s]
54%|ββββββ | 1616/2975 [20:24<13:50, 1.64it/s]
54%|ββββββ | 1617/2975 [20:24<13:48, 1.64it/s]
54%|ββββββ | 1618/2975 [20:25<13:46, 1.64it/s]
54%|ββββββ | 1619/2975 [20:26<13:44, 1.64it/s]
54%|ββββββ | 1620/2975 [20:26<13:43, 1.65it/s]
54%|ββββββ | 1621/2975 [20:27<13:43, 1.64it/s]
55%|ββββββ | 1622/2975 [20:27<13:44, 1.64it/s]
55%|ββββββ | 1623/2975 [20:28<13:44, 1.64it/s]
55%|ββββββ | 1624/2975 [20:29<13:44, 1.64it/s]
55%|ββββββ | 1625/2975 [20:29<13:46, 1.63it/s]
55%|ββββββ | 1626/2975 [20:30<13:47, 1.63it/s]
55%|ββββββ | 1627/2975 [20:30<13:47, 1.63it/s]
55%|ββββββ | 1628/2975 [20:31<13:44, 1.63it/s]
55%|ββββββ | 1629/2975 [20:32<13:42, 1.64it/s]
| 0: {'loss': 0.7151, 'grad_norm': 0.7421727883946161, 'learning_rate': 1e-05, 'epoch': 0.55} |
| 0: {'loss': 0.7215, 'grad_norm': 0.7387996700462125, 'learning_rate': 1e-05, 'epoch': 0.55} |
55%|ββββββ | 1630/2975 [20:32<13:41, 1.64it/s]
55%|ββββββ | 1631/2975 [20:33<13:40, 1.64it/s]
55%|ββββββ | 1632/2975 [20:33<13:39, 1.64it/s]
55%|ββββββ | 1633/2975 [20:34<13:38, 1.64it/s]
55%|ββββββ | 1634/2975 [20:35<13:37, 1.64it/s]
55%|ββββββ | 1635/2975 [20:35<13:38, 1.64it/s]
55%|ββββββ | 1636/2975 [20:36<13:38, 1.64it/s]
55%|ββββββ | 1637/2975 [20:37<13:38, 1.63it/s]
55%|ββββββ | 1638/2975 [20:37<13:39, 1.63it/s]
55%|ββββββ | 1639/2975 [20:38<13:37, 1.63it/s]
55%|ββββββ | 1640/2975 [20:38<13:35, 1.64it/s]
55%|ββββββ | 1641/2975 [20:39<13:34, 1.64it/s]
| 0: {'loss': 0.7069, 'grad_norm': 0.7347750601522001, 'learning_rate': 1e-05, 'epoch': 0.55} |
55%|ββββββ | 1642/2975 [20:40<13:32, 1.64it/s]
55%|ββββββ | 1643/2975 [20:40<13:31, 1.64it/s]
55%|ββββββ | 1644/2975 [20:41<13:29, 1.65it/s]
55%|ββββββ | 1645/2975 [20:41<13:28, 1.65it/s]
55%|ββββββ | 1646/2975 [20:42<13:27, 1.65it/s]
55%|ββββββ | 1647/2975 [20:43<13:27, 1.65it/s]
55%|ββββββ | 1648/2975 [20:43<13:27, 1.64it/s]
55%|ββββββ | 1649/2975 [20:44<13:28, 1.64it/s]
55%|ββββββ | 1650/2975 [20:44<13:31, 1.63it/s]
55%|ββββββ | 1651/2975 [20:45<13:32, 1.63it/s]
56%|ββββββ | 1652/2975 [20:46<13:30, 1.63it/s]
56%|ββββββ | 1653/2975 [20:46<13:28, 1.64it/s]
56%|ββββββ | 1654/2975 [20:47<13:27, 1.64it/s]
56%|ββββββ | 1655/2975 [20:47<13:25, 1.64it/s]
| 0: {'loss': 0.7276, 'grad_norm': 0.7590718949791706, 'learning_rate': 1e-05, 'epoch': 0.56} |
56%|ββββββ | 1656/2975 [20:48<13:24, 1.64it/s]
56%|ββββββ | 1657/2975 [20:49<13:21, 1.64it/s]
56%|ββββββ | 1658/2975 [20:49<13:20, 1.64it/s]
56%|ββββββ | 1659/2975 [20:50<13:20, 1.64it/s]
56%|ββββββ | 1660/2975 [20:51<13:19, 1.64it/s]
56%|ββββββ | 1661/2975 [20:51<13:21, 1.64it/s]
56%|ββββββ | 1662/2975 [20:52<13:22, 1.64it/s]
56%|ββββββ | 1663/2975 [20:52<13:23, 1.63it/s]
56%|ββββββ | 1664/2975 [20:53<13:22, 1.63it/s]
56%|ββββββ | 1665/2975 [20:54<13:20, 1.64it/s]
56%|ββββββ | 1666/2975 [20:54<13:19, 1.64it/s]
56%|ββββββ | 1667/2975 [20:55<13:18, 1.64it/s]
56%|ββββββ | 1668/2975 [20:55<13:16, 1.64it/s]
56%|ββββββ | 1669/2975 [20:56<13:16, 1.64it/s]
| 0: {'loss': 0.7199, 'grad_norm': 0.7649717152658091, 'learning_rate': 1e-05, 'epoch': 0.56} |
| 0: {'loss': 0.7277, 'grad_norm': 0.7437977542383272, 'learning_rate': 1e-05, 'epoch': 0.56} |
56%|ββββββ | 1670/2975 [20:57<13:15, 1.64it/s]
56%|ββββββ | 1671/2975 [20:57<13:14, 1.64it/s]
56%|ββββββ | 1672/2975 [20:58<13:13, 1.64it/s]
56%|ββββββ | 1673/2975 [20:58<13:12, 1.64it/s]
56%|ββββββ | 1674/2975 [20:59<13:14, 1.64it/s]
56%|ββββββ | 1675/2975 [21:00<13:15, 1.63it/s]
56%|ββββββ | 1676/2975 [21:00<13:16, 1.63it/s]
56%|ββββββ | 1677/2975 [21:01<13:14, 1.63it/s]
56%|ββββββ | 1678/2975 [21:02<13:12, 1.64it/s]
56%|ββββββ | 1679/2975 [21:02<13:09, 1.64it/s]
56%|ββββββ | 1680/2975 [21:03<13:08, 1.64it/s]
57%|ββββββ | 1681/2975 [21:03<13:08, 1.64it/s]
57%|ββββββ | 1682/2975 [21:04<13:07, 1.64it/s]
| 0: {'loss': 0.7005, 'grad_norm': 0.7046944932910209, 'learning_rate': 1e-05, 'epoch': 0.57} |
57%|ββββββ | 1683/2975 [21:05<13:07, 1.64it/s]
57%|ββββββ | 1684/2975 [21:05<13:06, 1.64it/s]
57%|ββββββ | 1685/2975 [21:06<13:05, 1.64it/s]
57%|ββββββ | 1686/2975 [21:06<13:05, 1.64it/s]
57%|ββββββ | 1687/2975 [21:07<13:06, 1.64it/s]
57%|ββββββ | 1688/2975 [21:08<13:05, 1.64it/s]
57%|ββββββ | 1689/2975 [21:08<13:05, 1.64it/s]
57%|ββββββ | 1690/2975 [21:09<13:04, 1.64it/s]
57%|ββββββ | 1691/2975 [21:09<13:04, 1.64it/s]
57%|ββββββ | 1692/2975 [21:10<13:03, 1.64it/s]
57%|ββββββ | 1693/2975 [21:11<13:03, 1.64it/s]
57%|ββββββ | 1694/2975 [21:11<13:02, 1.64it/s]
57%|ββββββ | 1695/2975 [21:12<13:01, 1.64it/s]
57%|ββββββ | 1696/2975 [21:12<12:58, 1.64it/s]
| 0: {'loss': 0.7045, 'grad_norm': 0.7092249753117572, 'learning_rate': 1e-05, 'epoch': 0.57} |
57%|ββββββ | 1697/2975 [21:13<12:57, 1.64it/s]
57%|ββββββ | 1698/2975 [21:14<12:56, 1.64it/s]
57%|ββββββ | 1699/2975 [21:14<12:57, 1.64it/s]
57%|ββββββ | 1700/2975 [21:15<13:08, 1.62it/s]
57%|ββββββ | 1701/2975 [21:16<13:04, 1.62it/s]
57%|ββββββ | 1702/2975 [21:16<13:00, 1.63it/s]
57%|ββββββ | 1703/2975 [21:17<12:59, 1.63it/s]
57%|ββββββ | 1704/2975 [21:17<13:07, 1.61it/s]
57%|ββββββ | 1705/2975 [21:18<13:03, 1.62it/s]
57%|ββββββ | 1706/2975 [21:19<13:01, 1.62it/s]
57%|ββββββ | 1707/2975 [21:19<13:00, 1.63it/s]
57%|ββββββ | 1708/2975 [21:20<12:58, 1.63it/s]
57%|ββββββ | 1709/2975 [21:20<12:57, 1.63it/s]
57%|ββββββ | 1710/2975 [21:21<12:56, 1.63it/s]
| 0: {'loss': 0.7211, 'grad_norm': 0.908931769106596, 'learning_rate': 1e-05, 'epoch': 0.57} |
| 0: {'loss': 0.734, 'grad_norm': 0.7677796479365407, 'learning_rate': 1e-05, 'epoch': 0.58} |
58%|ββββββ | 1711/2975 [21:22<12:55, 1.63it/s]
58%|ββββββ | 1712/2975 [21:22<12:52, 1.63it/s]
58%|ββββββ | 1713/2975 [21:23<12:50, 1.64it/s]
58%|ββββββ | 1714/2975 [21:24<12:48, 1.64it/s]
58%|ββββββ | 1715/2975 [21:24<12:46, 1.64it/s]
58%|ββββββ | 1716/2975 [21:25<12:46, 1.64it/s]
58%|ββββββ | 1717/2975 [21:25<12:46, 1.64it/s]
58%|ββββββ | 1718/2975 [21:26<12:45, 1.64it/s]
58%|ββββββ | 1719/2975 [21:27<12:43, 1.64it/s]
58%|ββββββ | 1720/2975 [21:27<12:43, 1.64it/s]
58%|ββββββ | 1721/2975 [21:28<12:43, 1.64it/s]
58%|ββββββ | 1722/2975 [21:28<12:43, 1.64it/s]
| 0: {'loss': 0.7207, 'grad_norm': 0.8264399873360426, 'learning_rate': 1e-05, 'epoch': 0.58} |
58%|ββββββ | 1723/2975 [21:29<12:44, 1.64it/s]
58%|ββββββ | 1724/2975 [21:30<12:44, 1.64it/s]
58%|ββββββ | 1725/2975 [21:30<12:43, 1.64it/s]
58%|ββββββ | 1726/2975 [21:31<12:42, 1.64it/s]
58%|ββββββ | 1727/2975 [21:31<12:41, 1.64it/s]
58%|ββββββ | 1728/2975 [21:32<12:40, 1.64it/s]
58%|ββββββ | 1729/2975 [21:33<12:39, 1.64it/s]
58%|ββββββ | 1730/2975 [21:33<12:40, 1.64it/s]
58%|ββββββ | 1731/2975 [21:34<13:54, 1.49it/s]
58%|ββββββ | 1732/2975 [21:35<13:29, 1.54it/s]
58%|ββββββ | 1733/2975 [21:35<13:13, 1.57it/s]
58%|ββββββ | 1734/2975 [21:36<13:02, 1.59it/s]
58%|ββββββ | 1735/2975 [21:37<12:53, 1.60it/s]
58%|ββββββ | 1736/2975 [21:37<12:48, 1.61it/s]
| 0: {'loss': 0.711, 'grad_norm': 0.8011410157530903, 'learning_rate': 1e-05, 'epoch': 0.58} |
58%|ββββββ | 1737/2975 [21:38<12:43, 1.62it/s]
58%|ββββββ | 1738/2975 [21:38<12:41, 1.62it/s]
58%|ββββββ | 1739/2975 [21:39<12:38, 1.63it/s]
58%|ββββββ | 1740/2975 [21:40<12:35, 1.63it/s]
59%|ββββββ | 1741/2975 [21:40<12:34, 1.64it/s]
59%|ββββββ | 1742/2975 [21:41<12:31, 1.64it/s]
59%|ββββββ | 1743/2975 [21:41<12:29, 1.64it/s]
59%|ββββββ | 1744/2975 [21:42<12:27, 1.65it/s]
59%|ββββββ | 1745/2975 [21:43<12:25, 1.65it/s]
59%|ββββββ | 1746/2975 [21:43<12:25, 1.65it/s]
59%|ββββββ | 1747/2975 [21:44<12:25, 1.65it/s]
59%|ββββββ | 1748/2975 [21:44<12:24, 1.65it/s]
59%|ββββββ | 1749/2975 [21:45<12:22, 1.65it/s]
59%|ββββββ | 1750/2975 [21:46<12:21, 1.65it/s]
| 0: {'loss': 0.7079, 'grad_norm': 0.7102845427696028, 'learning_rate': 1e-05, 'epoch': 0.59} |
| 0: {'loss': 0.7247, 'grad_norm': 0.7090127036567279, 'learning_rate': 1e-05, 'epoch': 0.59} |
59%|ββββββ | 1751/2975 [21:46<12:33, 1.63it/s]
59%|ββββββ | 1752/2975 [21:47<12:31, 1.63it/s]
59%|ββββββ | 1753/2975 [21:48<12:29, 1.63it/s]
59%|ββββββ | 1754/2975 [21:48<12:27, 1.63it/s]
59%|ββββββ | 1755/2975 [21:49<12:24, 1.64it/s]
59%|ββββββ | 1756/2975 [21:49<12:23, 1.64it/s]
59%|ββββββ | 1757/2975 [21:50<12:21, 1.64it/s]
59%|ββββββ | 1758/2975 [21:51<12:19, 1.65it/s]
59%|ββββββ | 1759/2975 [21:51<12:19, 1.64it/s]
59%|ββββββ | 1760/2975 [21:52<12:17, 1.65it/s]
59%|ββββββ | 1761/2975 [21:52<12:16, 1.65it/s]
59%|ββββββ | 1762/2975 [21:53<12:16, 1.65it/s]
59%|ββββββ | 1763/2975 [21:54<12:15, 1.65it/s]
| 0: {'loss': 0.7057, 'grad_norm': 0.724586519482281, 'learning_rate': 1e-05, 'epoch': 0.59} |
59%|ββββββ | 1764/2975 [21:54<12:14, 1.65it/s]
59%|ββββββ | 1765/2975 [21:55<12:13, 1.65it/s]
59%|ββββββ | 1766/2975 [21:55<12:11, 1.65it/s]
59%|ββββββ | 1767/2975 [21:56<12:11, 1.65it/s]
59%|ββββββ | 1768/2975 [21:57<12:12, 1.65it/s]
59%|ββββββ | 1769/2975 [21:57<12:12, 1.65it/s]
59%|ββββββ | 1770/2975 [21:58<12:14, 1.64it/s]
60%|ββββββ | 1771/2975 [21:58<12:14, 1.64it/s]
60%|ββββββ | 1772/2975 [21:59<12:13, 1.64it/s]
60%|ββββββ | 1773/2975 [22:00<12:12, 1.64it/s]
60%|ββββββ | 1774/2975 [22:00<12:11, 1.64it/s]
60%|ββββββ | 1775/2975 [22:01<12:10, 1.64it/s]
60%|ββββββ | 1776/2975 [22:01<12:09, 1.64it/s]
60%|ββββββ | 1777/2975 [22:02<12:07, 1.65it/s]
| 0: {'loss': 0.7076, 'grad_norm': 0.738490278103292, 'learning_rate': 1e-05, 'epoch': 0.6} |
60%|ββββββ | 1778/2975 [22:03<12:06, 1.65it/s]
60%|ββββββ | 1779/2975 [22:03<12:06, 1.65it/s]
60%|ββββββ | 1780/2975 [22:04<12:05, 1.65it/s]
60%|ββββββ | 1781/2975 [22:05<12:05, 1.65it/s]
60%|ββββββ | 1782/2975 [22:05<12:05, 1.64it/s]
60%|ββββββ | 1783/2975 [22:06<12:04, 1.64it/s]
60%|ββββββ | 1784/2975 [22:06<12:04, 1.64it/s]
60%|ββββββ | 1785/2975 [22:07<12:04, 1.64it/s]
60%|ββββββ | 1786/2975 [22:10<24:05, 1.22s/it]
60%|ββββββ | 1787/2975 [22:10<20:28, 1.03s/it]
60%|ββββββ | 1788/2975 [22:11<17:56, 1.10it/s]
60%|ββββββ | 1789/2975 [22:11<16:09, 1.22it/s]
60%|ββββββ | 1790/2975 [22:12<14:53, 1.33it/s]
| 0: {'loss': 0.7015, 'grad_norm': 0.7563688162947536, 'learning_rate': 1e-05, 'epoch': 0.6} |
| 0: {'loss': 0.7154, 'grad_norm': 0.7767154655838906, 'learning_rate': 1e-05, 'epoch': 0.6} |
60%|ββββββ | 1791/2975 [22:13<14:00, 1.41it/s]
60%|ββββββ | 1792/2975 [22:13<13:23, 1.47it/s]
60%|ββββββ | 1793/2975 [22:14<12:57, 1.52it/s]
60%|ββββββ | 1794/2975 [22:14<12:39, 1.56it/s]
60%|ββββββ | 1795/2975 [22:15<12:25, 1.58it/s]
60%|ββββββ | 1796/2975 [22:16<12:15, 1.60it/s]
60%|ββββββ | 1797/2975 [22:16<12:09, 1.61it/s]
60%|ββββββ | 1798/2975 [22:17<12:04, 1.62it/s]
60%|ββββββ | 1799/2975 [22:17<12:00, 1.63it/s]
61%|ββββββ | 1800/2975 [22:18<11:57, 1.64it/s]
61%|ββββββ | 1801/2975 [22:19<11:58, 1.63it/s]
61%|ββββββ | 1802/2975 [22:19<11:59, 1.63it/s]
61%|ββββββ | 1803/2975 [22:20<11:57, 1.63it/s]
61%|ββββββ | 1804/2975 [22:21<11:56, 1.63it/s]
| 0: {'loss': 0.7147, 'grad_norm': 0.7588920449284606, 'learning_rate': 1e-05, 'epoch': 0.61} |
61%|ββββββ | 1805/2975 [22:21<11:55, 1.64it/s]
61%|ββββββ | 1806/2975 [22:22<11:53, 1.64it/s]
61%|ββββββ | 1807/2975 [22:22<11:52, 1.64it/s]
61%|ββββββ | 1808/2975 [22:23<11:50, 1.64it/s]
61%|ββββββ | 1809/2975 [22:24<11:48, 1.65it/s]
61%|ββββββ | 1810/2975 [22:24<11:47, 1.65it/s]
61%|ββββββ | 1811/2975 [22:25<11:48, 1.64it/s]
61%|ββββββ | 1812/2975 [22:25<11:47, 1.64it/s]
61%|ββββββ | 1813/2975 [22:26<11:46, 1.65it/s]
61%|ββββββ | 1814/2975 [22:27<11:45, 1.65it/s]
61%|ββββββ | 1815/2975 [22:27<11:45, 1.64it/s]
61%|ββββββ | 1816/2975 [22:28<11:46, 1.64it/s]
61%|ββββββ | 1817/2975 [22:28<11:46, 1.64it/s]
61%|ββββββ | 1818/2975 [22:29<11:47, 1.64it/s]
| 0: {'loss': 0.7259, 'grad_norm': 0.7499492826943769, 'learning_rate': 1e-05, 'epoch': 0.61} |
| 0: {'loss': 0.7127, 'grad_norm': 0.7463183831563831, 'learning_rate': 1e-05, 'epoch': 0.61} |
61%|ββββββ | 1819/2975 [22:30<11:46, 1.64it/s]
61%|ββββββ | 1820/2975 [22:30<11:45, 1.64it/s]
61%|ββββββ | 1821/2975 [22:31<11:44, 1.64it/s]
61%|ββββββ | 1822/2975 [22:32<11:42, 1.64it/s]
61%|βββββββ | 1823/2975 [22:32<11:41, 1.64it/s]
61%|βββββββ | 1824/2975 [22:33<11:40, 1.64it/s]
61%|βββββββ | 1825/2975 [22:33<11:40, 1.64it/s]
61%|βββββββ | 1826/2975 [22:34<11:39, 1.64it/s]
61%|βββββββ | 1827/2975 [22:35<11:38, 1.64it/s]
61%|βββββββ | 1828/2975 [22:35<11:37, 1.64it/s]
61%|βββββββ | 1829/2975 [22:36<11:37, 1.64it/s]
62%|βββββββ | 1830/2975 [22:36<11:37, 1.64it/s]
| 0: {'loss': 0.7244, 'grad_norm': 0.7877095775584843, 'learning_rate': 1e-05, 'epoch': 0.62} |
62%|βββββββ | 1831/2975 [22:37<11:38, 1.64it/s]
62%|βββββββ | 1832/2975 [22:38<11:42, 1.63it/s]
62%|βββββββ | 1833/2975 [22:38<11:40, 1.63it/s]
62%|βββββββ | 1834/2975 [22:39<11:38, 1.63it/s]
62%|βββββββ | 1835/2975 [22:39<11:37, 1.64it/s]
62%|βββββββ | 1836/2975 [22:40<11:33, 1.64it/s]
62%|βββββββ | 1837/2975 [22:41<11:32, 1.64it/s]
62%|βββββββ | 1838/2975 [22:41<11:32, 1.64it/s]
62%|βββββββ | 1839/2975 [22:42<11:31, 1.64it/s]
62%|βββββββ | 1840/2975 [22:42<11:29, 1.65it/s]
62%|βββββββ | 1841/2975 [22:43<11:29, 1.65it/s]
62%|βββββββ | 1842/2975 [22:44<11:29, 1.64it/s]
62%|βββββββ | 1843/2975 [22:44<11:28, 1.64it/s]
62%|βββββββ | 1844/2975 [22:45<11:26, 1.65it/s]
| 0: {'loss': 0.7075, 'grad_norm': 0.7279988466982945, 'learning_rate': 1e-05, 'epoch': 0.62} |
62%|βββββββ | 1845/2975 [22:46<11:28, 1.64it/s]
62%|βββββββ | 1846/2975 [22:46<11:27, 1.64it/s]
62%|βββββββ | 1847/2975 [22:47<11:26, 1.64it/s]
62%|βββββββ | 1848/2975 [22:47<11:25, 1.64it/s]
62%|βββββββ | 1849/2975 [22:48<11:25, 1.64it/s]
62%|βββββββ | 1850/2975 [22:49<11:24, 1.64it/s]
62%|βββββββ | 1851/2975 [22:49<11:24, 1.64it/s]
62%|βββββββ | 1852/2975 [22:50<11:23, 1.64it/s]
62%|βββββββ | 1853/2975 [22:50<11:21, 1.65it/s]
62%|βββββββ | 1854/2975 [22:51<11:21, 1.65it/s]
62%|βββββββ | 1855/2975 [22:52<11:20, 1.65it/s]
62%|βββββββ | 1856/2975 [22:52<11:20, 1.64it/s]
62%|βββββββ | 1857/2975 [22:53<11:19, 1.65it/s]
| 0: {'loss': 0.7137, 'grad_norm': 0.7349254032326754, 'learning_rate': 1e-05, 'epoch': 0.62} |
62%|βββββββ | 1858/2975 [22:53<11:19, 1.64it/s]
62%|βββββββ | 1859/2975 [22:54<11:18, 1.65it/s]
63%|βββββββ | 1860/2975 [22:55<11:17, 1.64it/s]
63%|βββββββ | 1861/2975 [22:55<11:18, 1.64it/s]
63%|βββββββ | 1862/2975 [22:56<11:18, 1.64it/s]
63%|βββββββ | 1863/2975 [22:56<11:18, 1.64it/s]
63%|βββββββ | 1864/2975 [22:57<11:18, 1.64it/s]
63%|βββββββ | 1865/2975 [22:58<11:18, 1.64it/s]
63%|βββββββ | 1866/2975 [22:58<11:18, 1.63it/s]
63%|βββββββ | 1867/2975 [22:59<11:17, 1.64it/s]
63%|βββββββ | 1868/2975 [23:00<11:17, 1.63it/s]
63%|βββββββ | 1869/2975 [23:00<11:16, 1.64it/s]
63%|βββββββ | 1870/2975 [23:01<11:14, 1.64it/s]
| 0: {'loss': 0.704, 'grad_norm': 0.7241088554589961, 'learning_rate': 1e-05, 'epoch': 0.63} |
| 0: {'loss': 0.7115, 'grad_norm': 0.7283852481874094, 'learning_rate': 1e-05, 'epoch': 0.63} |
63%|βββββββ | 1871/2975 [23:01<11:13, 1.64it/s]
63%|βββββββ | 1872/2975 [23:02<11:11, 1.64it/s]
63%|βββββββ | 1873/2975 [23:03<11:11, 1.64it/s]
63%|βββββββ | 1874/2975 [23:03<11:09, 1.64it/s]
63%|βββββββ | 1875/2975 [23:04<11:11, 1.64it/s]
63%|βββββββ | 1876/2975 [23:04<11:10, 1.64it/s]
63%|βββββββ | 1877/2975 [23:05<11:09, 1.64it/s]
63%|βββββββ | 1878/2975 [23:06<11:09, 1.64it/s]
63%|βββββββ | 1879/2975 [23:06<11:08, 1.64it/s]
63%|βββββββ | 1880/2975 [23:07<11:08, 1.64it/s]
63%|βββββββ | 1881/2975 [23:07<11:07, 1.64it/s]
63%|βββββββ | 1882/2975 [23:08<11:05, 1.64it/s]
63%|βββββββ | 1883/2975 [23:09<11:04, 1.64it/s]
| 0: {'loss': 0.7055, 'grad_norm': 0.7733755864857219, 'learning_rate': 1e-05, 'epoch': 0.64} |
63%|βββββββ | 1884/2975 [23:09<11:03, 1.64it/s]
63%|βββββββ | 1885/2975 [23:10<11:02, 1.64it/s]
63%|βββββββ | 1886/2975 [23:11<11:01, 1.65it/s]
63%|βββββββ | 1887/2975 [23:11<11:00, 1.65it/s]
63%|βββββββ | 1888/2975 [23:12<11:01, 1.64it/s]
63%|βββββββ | 1889/2975 [23:12<11:00, 1.64it/s]
64%|βββββββ | 1890/2975 [23:13<11:00, 1.64it/s]
64%|βββββββ | 1891/2975 [23:14<11:01, 1.64it/s]
64%|βββββββ | 1892/2975 [23:14<11:01, 1.64it/s]
64%|βββββββ | 1893/2975 [23:15<11:00, 1.64it/s]
64%|βββββββ | 1894/2975 [23:15<10:59, 1.64it/s]
64%|βββββββ | 1895/2975 [23:16<10:57, 1.64it/s]
64%|βββββββ | 1896/2975 [23:17<10:56, 1.64it/s]
64%|βββββββ | 1897/2975 [23:17<10:55, 1.65it/s]
| 0: {'loss': 0.701, 'grad_norm': 0.8224231396862601, 'learning_rate': 1e-05, 'epoch': 0.64} |
64%|βββββββ | 1898/2975 [23:18<10:54, 1.65it/s]
64%|βββββββ | 1899/2975 [23:18<10:53, 1.65it/s]
64%|βββββββ | 1900/2975 [23:19<10:51, 1.65it/s]
64%|βββββββ | 1901/2975 [23:20<10:51, 1.65it/s]
64%|βββββββ | 1902/2975 [23:20<10:50, 1.65it/s]
64%|βββββββ | 1903/2975 [23:21<10:51, 1.65it/s]
64%|βββββββ | 1904/2975 [23:21<10:51, 1.64it/s]
64%|βββββββ | 1905/2975 [23:22<10:50, 1.64it/s]
64%|βββββββ | 1906/2975 [23:23<10:50, 1.64it/s]
64%|βββββββ | 1907/2975 [23:23<10:49, 1.64it/s]
64%|βββββββ | 1908/2975 [23:24<10:48, 1.64it/s]
64%|βββββββ | 1909/2975 [23:25<10:48, 1.64it/s]
64%|βββββββ | 1910/2975 [23:25<10:47, 1.65it/s]
| 0: {'loss': 0.7237, 'grad_norm': 0.7734811371365439, 'learning_rate': 1e-05, 'epoch': 0.64} |
| 0: {'loss': 0.695, 'grad_norm': 0.7196135161346667, 'learning_rate': 1e-05, 'epoch': 0.65} |
64%|βββββββ | 1911/2975 [23:26<10:46, 1.65it/s]
64%|βββββββ | 1912/2975 [23:26<10:46, 1.64it/s]
64%|βββββββ | 1913/2975 [23:27<10:46, 1.64it/s]
64%|βββββββ | 1914/2975 [23:28<10:44, 1.65it/s]
64%|βββββββ | 1915/2975 [23:28<10:43, 1.65it/s]
64%|βββββββ | 1916/2975 [23:29<10:43, 1.64it/s]
64%|βββββββ | 1917/2975 [23:29<10:45, 1.64it/s]
64%|βββββββ | 1918/2975 [23:30<10:43, 1.64it/s]
65%|βββββββ | 1919/2975 [23:31<10:42, 1.64it/s]
65%|βββββββ | 1920/2975 [23:31<10:42, 1.64it/s]
65%|βββββββ | 1921/2975 [23:32<10:43, 1.64it/s]
65%|βββββββ | 1922/2975 [23:32<10:42, 1.64it/s]
| 0: {'loss': 0.7186, 'grad_norm': 0.7760509033852916, 'learning_rate': 1e-05, 'epoch': 0.65} |
65%|βββββββ | 1923/2975 [23:33<10:40, 1.64it/s]
65%|βββββββ | 1924/2975 [23:34<10:40, 1.64it/s]
65%|βββββββ | 1925/2975 [23:34<10:40, 1.64it/s]
65%|βββββββ | 1926/2975 [23:35<10:41, 1.64it/s]
65%|βββββββ | 1927/2975 [23:35<10:40, 1.64it/s]
65%|βββββββ | 1928/2975 [23:36<11:41, 1.49it/s]
65%|βββββββ | 1929/2975 [23:37<11:21, 1.53it/s]
65%|βββββββ | 1930/2975 [23:37<11:06, 1.57it/s]
65%|βββββββ | 1931/2975 [23:38<10:55, 1.59it/s]
65%|βββββββ | 1932/2975 [23:39<10:47, 1.61it/s]
65%|βββββββ | 1933/2975 [23:39<10:42, 1.62it/s]
65%|βββββββ | 1934/2975 [23:40<10:39, 1.63it/s]
65%|βββββββ | 1935/2975 [23:41<10:36, 1.64it/s]
65%|βββββββ | 1936/2975 [23:41<10:33, 1.64it/s]
| 0: {'loss': 0.6914, 'grad_norm': 0.7313402259756968, 'learning_rate': 1e-05, 'epoch': 0.65} |
65%|βββββββ | 1937/2975 [23:42<10:31, 1.64it/s]
65%|βββββββ | 1938/2975 [23:42<10:30, 1.64it/s]
65%|βββββββ | 1939/2975 [23:43<10:29, 1.65it/s]
65%|βββββββ | 1940/2975 [23:44<10:28, 1.65it/s]
65%|βββββββ | 1941/2975 [23:44<10:28, 1.65it/s]
65%|βββββββ | 1942/2975 [23:45<10:26, 1.65it/s]
65%|βββββββ | 1943/2975 [23:45<10:25, 1.65it/s]
65%|βββββββ | 1944/2975 [23:46<10:24, 1.65it/s]
65%|βββββββ | 1945/2975 [23:47<10:25, 1.65it/s]
65%|βββββββ | 1946/2975 [23:47<10:26, 1.64it/s]
65%|βββββββ | 1947/2975 [23:48<10:26, 1.64it/s]
65%|βββββββ | 1948/2975 [23:48<10:26, 1.64it/s]
66%|βββββββ | 1949/2975 [23:49<10:26, 1.64it/s]
66%|βββββββ | 1950/2975 [23:50<10:24, 1.64it/s]
| 0: {'loss': 0.7146, 'grad_norm': 0.734114065969648, 'learning_rate': 1e-05, 'epoch': 0.66} |
| 0: {'loss': 0.7279, 'grad_norm': 0.7403305578088101, 'learning_rate': 1e-05, 'epoch': 0.66} |
66%|βββββββ | 1951/2975 [23:50<10:24, 1.64it/s]
66%|βββββββ | 1952/2975 [23:51<10:22, 1.64it/s]
66%|βββββββ | 1953/2975 [23:51<10:21, 1.64it/s]
66%|βββββββ | 1954/2975 [23:52<10:20, 1.65it/s]
66%|βββββββ | 1955/2975 [23:53<10:18, 1.65it/s]
66%|βββββββ | 1956/2975 [23:53<10:18, 1.65it/s]
66%|βββββββ | 1957/2975 [23:54<10:17, 1.65it/s]
66%|βββββββ | 1958/2975 [23:54<10:16, 1.65it/s]
66%|βββββββ | 1959/2975 [23:55<10:14, 1.65it/s]
66%|βββββββ | 1960/2975 [23:56<10:13, 1.65it/s]
66%|βββββββ | 1961/2975 [23:56<10:14, 1.65it/s]
66%|βββββββ | 1962/2975 [23:57<10:14, 1.65it/s]
| 0: {'loss': 0.7024, 'grad_norm': 0.7210850729422128, 'learning_rate': 1e-05, 'epoch': 0.66} |
66%|βββββββ | 1963/2975 [23:58<10:16, 1.64it/s]
66%|βββββββ | 1964/2975 [23:58<10:17, 1.64it/s]
66%|βββββββ | 1965/2975 [23:59<10:17, 1.63it/s]
66%|βββββββ | 1966/2975 [23:59<10:18, 1.63it/s]
66%|βββββββ | 1967/2975 [24:00<10:17, 1.63it/s]
66%|βββββββ | 1968/2975 [24:01<10:15, 1.64it/s]
66%|βββββββ | 1969/2975 [24:01<10:13, 1.64it/s]
66%|βββββββ | 1970/2975 [24:02<10:11, 1.64it/s]
66%|βββββββ | 1971/2975 [24:02<10:10, 1.64it/s]
66%|βββββββ | 1972/2975 [24:03<10:09, 1.65it/s]
66%|βββββββ | 1973/2975 [24:04<10:08, 1.65it/s]
66%|βββββββ | 1974/2975 [24:04<10:07, 1.65it/s]
66%|βββββββ | 1975/2975 [24:05<10:06, 1.65it/s]
| 0: {'loss': 0.7115, 'grad_norm': 0.7123479453497946, 'learning_rate': 1e-05, 'epoch': 0.67} |
66%|βββββββ | 1976/2975 [24:05<10:06, 1.65it/s]
66%|βββββββ | 1977/2975 [24:06<10:05, 1.65it/s]
66%|βββββββ | 1978/2975 [24:07<10:05, 1.65it/s]
67%|βββββββ | 1979/2975 [24:07<10:06, 1.64it/s]
67%|βββββββ | 1980/2975 [24:08<10:06, 1.64it/s]
67%|βββββββ | 1981/2975 [24:09<10:07, 1.64it/s]
67%|βββββββ | 1982/2975 [24:09<10:06, 1.64it/s]
67%|βββββββ | 1983/2975 [24:10<10:06, 1.64it/s]
67%|βββββββ | 1984/2975 [24:10<10:05, 1.64it/s]
67%|βββββββ | 1985/2975 [24:11<10:04, 1.64it/s]
67%|βββββββ | 1986/2975 [24:12<10:02, 1.64it/s]
67%|βββββββ | 1987/2975 [24:12<10:00, 1.64it/s]
67%|βββββββ | 1988/2975 [24:13<09:59, 1.65it/s]
67%|βββββββ | 1989/2975 [24:13<09:58, 1.65it/s]
| 0: {'loss': 0.7311, 'grad_norm': 0.7600061639464267, 'learning_rate': 1e-05, 'epoch': 0.67} |
| 0: {'loss': 0.7109, 'grad_norm': 0.7994233915758029, 'learning_rate': 1e-05, 'epoch': 0.67} |
67%|βββββββ | 1990/2975 [24:14<09:57, 1.65it/s]
67%|βββββββ | 1991/2975 [24:15<09:57, 1.65it/s]
67%|βββββββ | 1992/2975 [24:15<09:57, 1.65it/s]
67%|βββββββ | 1993/2975 [24:16<09:56, 1.65it/s]
67%|βββββββ | 1994/2975 [24:16<09:55, 1.65it/s]
67%|βββββββ | 1995/2975 [24:17<09:54, 1.65it/s]
67%|βββββββ | 1996/2975 [24:18<09:54, 1.65it/s]
67%|βββββββ | 1997/2975 [24:18<09:54, 1.65it/s]
67%|βββββββ | 1998/2975 [24:19<09:55, 1.64it/s]
67%|βββββββ | 1999/2975 [24:19<09:54, 1.64it/s]
67%|βββββββ | 2000/2975 [24:20<09:53, 1.64it/s]
67%|βββββββ | 2001/2975 [24:21<09:53, 1.64it/s]
| 0: {'loss': 0.7085, 'grad_norm': 0.7520626757283446, 'learning_rate': 1e-05, 'epoch': 0.68} |
67%|βββββββ | 2002/2975 [24:21<09:52, 1.64it/s]
67%|βββββββ | 2003/2975 [24:22<09:51, 1.64it/s]
67%|βββββββ | 2004/2975 [24:22<09:50, 1.65it/s]
67%|βββββββ | 2005/2975 [24:23<09:48, 1.65it/s]
67%|βββββββ | 2006/2975 [24:24<09:47, 1.65it/s]
67%|βββββββ | 2007/2975 [24:24<09:47, 1.65it/s]
67%|βββββββ | 2008/2975 [24:25<09:46, 1.65it/s]
68%|βββββββ | 2009/2975 [24:26<09:45, 1.65it/s]
68%|βββββββ | 2010/2975 [24:26<09:43, 1.65it/s]
68%|βββββββ | 2011/2975 [24:27<09:43, 1.65it/s]
68%|βββββββ | 2012/2975 [24:27<09:42, 1.65it/s]
68%|βββββββ | 2013/2975 [24:28<09:42, 1.65it/s]
68%|βββββββ | 2014/2975 [24:29<09:42, 1.65it/s]
68%|βββββββ | 2015/2975 [24:29<09:42, 1.65it/s]
| 0: {'loss': 0.7145, 'grad_norm': 0.7895646915881295, 'learning_rate': 1e-05, 'epoch': 0.68} |
68%|βββββββ | 2016/2975 [24:30<09:42, 1.65it/s]
68%|βββββββ | 2017/2975 [24:30<09:43, 1.64it/s]
68%|βββββββ | 2018/2975 [24:31<09:43, 1.64it/s]
68%|βββββββ | 2019/2975 [24:32<09:43, 1.64it/s]
68%|βββββββ | 2020/2975 [24:32<09:42, 1.64it/s]
68%|βββββββ | 2021/2975 [24:33<09:42, 1.64it/s]
68%|βββββββ | 2022/2975 [24:33<09:41, 1.64it/s]
68%|βββββββ | 2023/2975 [24:34<09:40, 1.64it/s]
68%|βββββββ | 2024/2975 [24:35<09:38, 1.64it/s]
68%|βββββββ | 2025/2975 [24:35<09:37, 1.65it/s]
68%|βββββββ | 2026/2975 [24:36<09:36, 1.65it/s]
68%|βββββββ | 2027/2975 [24:36<09:35, 1.65it/s]
68%|βββββββ | 2028/2975 [24:37<09:34, 1.65it/s]
| 0: {'loss': 0.7168, 'grad_norm': 0.7015561317834488, 'learning_rate': 1e-05, 'epoch': 0.68} |
| 0: {'loss': 0.7098, 'grad_norm': 0.7445727424355767, 'learning_rate': 1e-05, 'epoch': 0.69} |
68%|βββββββ | 2029/2975 [24:38<09:32, 1.65it/s]
68%|βββββββ | 2030/2975 [24:38<09:32, 1.65it/s]
68%|βββββββ | 2031/2975 [24:39<09:32, 1.65it/s]
68%|βββββββ | 2032/2975 [24:39<09:32, 1.65it/s]
68%|βββββββ | 2033/2975 [24:40<09:32, 1.65it/s]
68%|βββββββ | 2034/2975 [24:41<09:31, 1.65it/s]
68%|βββββββ | 2035/2975 [24:41<09:30, 1.65it/s]
68%|βββββββ | 2036/2975 [24:42<09:30, 1.65it/s]
68%|βββββββ | 2037/2975 [24:43<09:30, 1.64it/s]
69%|βββββββ | 2038/2975 [24:43<09:29, 1.64it/s]
69%|βββββββ | 2039/2975 [24:44<09:30, 1.64it/s]
69%|βββββββ | 2040/2975 [24:44<09:30, 1.64it/s]
| 0: {'loss': 0.6964, 'grad_norm': 0.7581584631465635, 'learning_rate': 1e-05, 'epoch': 0.69} |
69%|βββββββ | 2041/2975 [24:45<09:29, 1.64it/s]
69%|βββββββ | 2042/2975 [24:46<09:28, 1.64it/s]
69%|βββββββ | 2043/2975 [24:46<09:27, 1.64it/s]
69%|βββββββ | 2044/2975 [24:47<09:25, 1.65it/s]
69%|βββββββ | 2045/2975 [24:47<09:24, 1.65it/s]
69%|βββββββ | 2046/2975 [24:48<09:23, 1.65it/s]
69%|βββββββ | 2047/2975 [24:49<09:22, 1.65it/s]
69%|βββββββ | 2048/2975 [24:49<09:23, 1.65it/s]
69%|βββββββ | 2049/2975 [24:50<09:23, 1.64it/s]
69%|βββββββ | 2050/2975 [24:50<09:23, 1.64it/s]
69%|βββββββ | 2051/2975 [24:51<09:22, 1.64it/s]
69%|βββββββ | 2052/2975 [24:52<09:22, 1.64it/s]
69%|βββββββ | 2053/2975 [24:52<09:22, 1.64it/s]
69%|βββββββ | 2054/2975 [24:53<09:21, 1.64it/s]
| 0: {'loss': 0.7142, 'grad_norm': 0.7617928015384502, 'learning_rate': 1e-05, 'epoch': 0.69} |
69%|βββββββ | 2055/2975 [24:53<09:20, 1.64it/s]
69%|βββββββ | 2056/2975 [24:54<09:20, 1.64it/s]
69%|βββββββ | 2057/2975 [24:55<09:19, 1.64it/s]
69%|βββββββ | 2058/2975 [24:55<09:18, 1.64it/s]
69%|βββββββ | 2059/2975 [24:56<09:17, 1.64it/s]
69%|βββββββ | 2060/2975 [24:57<09:16, 1.64it/s]
69%|βββββββ | 2061/2975 [24:57<09:16, 1.64it/s]
69%|βββββββ | 2062/2975 [24:58<09:16, 1.64it/s]
69%|βββββββ | 2063/2975 [24:58<09:24, 1.61it/s]
69%|βββββββ | 2064/2975 [24:59<09:22, 1.62it/s]
69%|βββββββ | 2065/2975 [25:00<09:19, 1.63it/s]
69%|βββββββ | 2066/2975 [25:00<09:17, 1.63it/s]
69%|βββββββ | 2067/2975 [25:01<09:15, 1.64it/s]
70%|βββββββ | 2068/2975 [25:01<09:13, 1.64it/s]
| 0: {'loss': 0.6963, 'grad_norm': 0.8090055655264392, 'learning_rate': 1e-05, 'epoch': 0.7} |
| 0: {'loss': 0.7027, 'grad_norm': 0.721421977318063, 'learning_rate': 1e-05, 'epoch': 0.7} |
70%|βββββββ | 2069/2975 [25:02<09:10, 1.65it/s]
70%|βββββββ | 2070/2975 [25:03<09:09, 1.65it/s]
70%|βββββββ | 2071/2975 [25:03<09:09, 1.65it/s]
70%|βββββββ | 2072/2975 [25:04<09:08, 1.64it/s]
70%|βββββββ | 2073/2975 [25:04<09:09, 1.64it/s]
70%|βββββββ | 2074/2975 [25:05<09:08, 1.64it/s]
70%|βββββββ | 2075/2975 [25:06<09:06, 1.65it/s]
70%|βββββββ | 2076/2975 [25:06<09:06, 1.64it/s]
70%|βββββββ | 2077/2975 [25:07<09:06, 1.64it/s]
70%|βββββββ | 2078/2975 [25:08<09:06, 1.64it/s]
70%|βββββββ | 2079/2975 [25:08<09:16, 1.61it/s]
70%|βββββββ | 2080/2975 [25:09<09:12, 1.62it/s]
| 0: {'loss': 0.7205, 'grad_norm': 0.7883438921335163, 'learning_rate': 1e-05, 'epoch': 0.7} |
70%|βββββββ | 2081/2975 [25:09<09:10, 1.62it/s]
70%|βββββββ | 2082/2975 [25:10<09:09, 1.63it/s]
70%|βββββββ | 2083/2975 [25:11<09:07, 1.63it/s]
70%|βββββββ | 2084/2975 [25:11<09:05, 1.63it/s]
70%|βββββββ | 2085/2975 [25:12<09:04, 1.64it/s]
70%|βββββββ | 2086/2975 [25:12<09:02, 1.64it/s]
70%|βββββββ | 2087/2975 [25:13<09:00, 1.64it/s]
70%|βββββββ | 2088/2975 [25:14<09:00, 1.64it/s]
70%|βββββββ | 2089/2975 [25:14<08:59, 1.64it/s]
70%|βββββββ | 2090/2975 [25:15<08:58, 1.64it/s]
70%|βββββββ | 2091/2975 [25:15<08:58, 1.64it/s]
70%|βββββββ | 2092/2975 [25:16<08:57, 1.64it/s]
70%|βββββββ | 2093/2975 [25:17<08:56, 1.64it/s]
| 0: {'loss': 0.713, 'grad_norm': 0.7444611156598626, 'learning_rate': 1e-05, 'epoch': 0.71} |
70%|βββββββ | 2094/2975 [25:17<08:55, 1.64it/s]
70%|βββββββ | 2095/2975 [25:18<08:55, 1.64it/s]
70%|βββββββ | 2096/2975 [25:19<08:55, 1.64it/s]
70%|βββββββ | 2097/2975 [25:19<08:54, 1.64it/s]
71%|βββββββ | 2098/2975 [25:20<08:54, 1.64it/s]
71%|βββββββ | 2099/2975 [25:20<08:54, 1.64it/s]
71%|βββββββ | 2100/2975 [25:21<08:53, 1.64it/s]
71%|βββββββ | 2101/2975 [25:22<08:52, 1.64it/s]
71%|βββββββ | 2102/2975 [25:22<08:51, 1.64it/s]
71%|βββββββ | 2103/2975 [25:23<08:49, 1.65it/s]
71%|βββββββ | 2104/2975 [25:23<08:48, 1.65it/s]
71%|βββββββ | 2105/2975 [25:24<08:48, 1.65it/s]
71%|βββββββ | 2106/2975 [25:25<08:47, 1.65it/s]
71%|βββββββ | 2107/2975 [25:25<08:46, 1.65it/s]
| 0: {'loss': 0.6955, 'grad_norm': 0.7425154567734555, 'learning_rate': 1e-05, 'epoch': 0.71} |
71%|βββββββ | 2108/2975 [25:26<08:45, 1.65it/s]
71%|βββββββ | 2109/2975 [25:26<08:45, 1.65it/s]
71%|βββββββ | 2110/2975 [25:27<08:45, 1.65it/s]
71%|βββββββ | 2111/2975 [25:28<08:45, 1.65it/s]
71%|βββββββ | 2112/2975 [25:28<08:44, 1.64it/s]
71%|βββββββ | 2113/2975 [25:29<08:45, 1.64it/s]
71%|βββββββ | 2114/2975 [25:29<08:45, 1.64it/s]
71%|βββββββ | 2115/2975 [25:30<08:45, 1.64it/s]
71%|βββββββ | 2116/2975 [25:31<08:44, 1.64it/s]
71%|βββββββ | 2117/2975 [25:31<08:42, 1.64it/s]
71%|βββββββ | 2118/2975 [25:32<08:41, 1.64it/s]
71%|βββββββ | 2119/2975 [25:33<08:40, 1.64it/s]
71%|ββββββββ | 2120/2975 [25:33<08:38, 1.65it/s]
| 0: {'loss': 0.7219, 'grad_norm': 0.7459426729679086, 'learning_rate': 1e-05, 'epoch': 0.71} |
| 0: {'loss': 0.7215, 'grad_norm': 0.7310305789819423, 'learning_rate': 1e-05, 'epoch': 0.72} |
71%|ββββββββ | 2121/2975 [25:34<08:37, 1.65it/s]
71%|ββββββββ | 2122/2975 [25:34<08:36, 1.65it/s]
71%|ββββββββ | 2123/2975 [25:35<08:35, 1.65it/s]
71%|ββββββββ | 2124/2975 [25:36<08:34, 1.66it/s]
71%|ββββββββ | 2125/2975 [25:36<08:33, 1.66it/s]
71%|ββββββββ | 2126/2975 [25:37<08:32, 1.66it/s]
71%|ββββββββ | 2127/2975 [25:37<08:31, 1.66it/s]
72%|ββββββββ | 2128/2975 [25:38<08:30, 1.66it/s]
72%|ββββββββ | 2129/2975 [25:39<08:29, 1.66it/s]
72%|ββββββββ | 2130/2975 [25:39<08:28, 1.66it/s]
72%|ββββββββ | 2131/2975 [25:40<08:28, 1.66it/s]
72%|ββββββββ | 2132/2975 [25:40<08:28, 1.66it/s]
| 0: {'loss': 0.7142, 'grad_norm': 0.7238767511429106, 'learning_rate': 1e-05, 'epoch': 0.72} |
72%|ββββββββ | 2133/2975 [25:41<08:27, 1.66it/s]
72%|ββββββββ | 2134/2975 [25:42<08:26, 1.66it/s]
72%|ββββββββ | 2135/2975 [25:42<08:25, 1.66it/s]
72%|ββββββββ | 2136/2975 [25:43<08:25, 1.66it/s]
72%|ββββββββ | 2137/2975 [25:43<08:25, 1.66it/s]
72%|ββββββββ | 2138/2975 [25:44<08:24, 1.66it/s]
72%|ββββββββ | 2139/2975 [25:45<08:24, 1.66it/s]
72%|ββββββββ | 2140/2975 [25:45<08:24, 1.66it/s]
72%|ββββββββ | 2141/2975 [25:46<08:23, 1.66it/s]
72%|ββββββββ | 2142/2975 [25:46<08:23, 1.65it/s]
72%|ββββββββ | 2143/2975 [25:47<08:25, 1.65it/s]
72%|ββββββββ | 2144/2975 [25:48<08:24, 1.65it/s]
72%|ββββββββ | 2145/2975 [25:48<08:23, 1.65it/s]
| 0: {'loss': 0.7035, 'grad_norm': 0.7743032216599863, 'learning_rate': 1e-05, 'epoch': 0.72} |
72%|ββββββββ | 2146/2975 [25:49<08:23, 1.65it/s]
72%|ββββββββ | 2147/2975 [25:49<08:22, 1.65it/s]
72%|ββββββββ | 2148/2975 [25:50<08:23, 1.64it/s]
72%|ββββββββ | 2149/2975 [25:51<08:23, 1.64it/s]
72%|ββββββββ | 2150/2975 [25:51<08:23, 1.64it/s]
72%|ββββββββ | 2151/2975 [25:52<08:22, 1.64it/s]
72%|ββββββββ | 2152/2975 [25:52<08:21, 1.64it/s]
72%|ββββββββ | 2153/2975 [25:53<08:20, 1.64it/s]
72%|ββββββββ | 2154/2975 [25:54<08:19, 1.65it/s]
72%|ββββββββ | 2155/2975 [25:54<08:17, 1.65it/s]
72%|ββββββββ | 2156/2975 [25:55<08:16, 1.65it/s]
73%|ββββββββ | 2157/2975 [25:56<08:15, 1.65it/s]
73%|ββββββββ | 2158/2975 [25:56<08:14, 1.65it/s]
73%|ββββββββ | 2159/2975 [25:57<08:13, 1.65it/s]
| 0: {'loss': 0.7082, 'grad_norm': 0.7386252426659861, 'learning_rate': 1e-05, 'epoch': 0.73} |
| 0: {'loss': 0.6979, 'grad_norm': 0.7193218822348061, 'learning_rate': 1e-05, 'epoch': 0.73} |
73%|ββββββββ | 2160/2975 [25:57<08:12, 1.65it/s]
73%|ββββββββ | 2161/2975 [25:58<08:11, 1.66it/s]
73%|ββββββββ | 2162/2975 [25:59<08:10, 1.66it/s]
73%|ββββββββ | 2163/2975 [25:59<08:10, 1.66it/s]
73%|ββββββββ | 2164/2975 [26:00<08:09, 1.66it/s]
73%|ββββββββ | 2165/2975 [26:00<08:10, 1.65it/s]
73%|ββββββββ | 2166/2975 [26:01<08:09, 1.65it/s]
73%|ββββββββ | 2167/2975 [26:02<08:08, 1.65it/s]
73%|ββββββββ | 2168/2975 [26:02<08:08, 1.65it/s]
73%|ββββββββ | 2169/2975 [26:03<08:08, 1.65it/s]
73%|ββββββββ | 2170/2975 [26:03<08:09, 1.65it/s]
| 0: {'loss': 0.7047, 'grad_norm': 0.7075505087205055, 'learning_rate': 1e-05, 'epoch': 0.73} |
73%|ββββββββ | 2171/2975 [26:04<08:09, 1.64it/s]
73%|ββββββββ | 2172/2975 [26:05<08:08, 1.64it/s]
73%|ββββββββ | 2173/2975 [26:05<08:08, 1.64it/s]
73%|ββββββββ | 2174/2975 [26:06<08:08, 1.64it/s]
73%|ββββββββ | 2175/2975 [26:06<08:07, 1.64it/s]
73%|ββββββββ | 2176/2975 [26:07<08:05, 1.64it/s]
73%|ββββββββ | 2177/2975 [26:08<08:04, 1.65it/s]
73%|ββββββββ | 2178/2975 [26:08<08:03, 1.65it/s]
73%|ββββββββ | 2179/2975 [26:09<08:02, 1.65it/s]
73%|ββββββββ | 2180/2975 [26:09<08:02, 1.65it/s]
73%|ββββββββ | 2181/2975 [26:10<08:01, 1.65it/s]
73%|ββββββββ | 2182/2975 [26:11<08:00, 1.65it/s]
73%|ββββββββ | 2183/2975 [26:11<07:59, 1.65it/s]
73%|ββββββββ | 2184/2975 [26:12<07:58, 1.65it/s]
| 0: {'loss': 0.716, 'grad_norm': 0.7458277455926593, 'learning_rate': 1e-05, 'epoch': 0.74} |
73%|ββββββββ | 2185/2975 [26:12<07:58, 1.65it/s]
73%|ββββββββ | 2186/2975 [26:13<07:57, 1.65it/s]
74%|ββββββββ | 2187/2975 [26:14<07:57, 1.65it/s]
74%|ββββββββ | 2188/2975 [26:14<07:57, 1.65it/s]
74%|ββββββββ | 2189/2975 [26:15<07:56, 1.65it/s]
74%|ββββββββ | 2190/2975 [26:16<07:55, 1.65it/s]
74%|ββββββββ | 2191/2975 [26:16<07:56, 1.65it/s]
74%|ββββββββ | 2192/2975 [26:17<07:55, 1.65it/s]
74%|ββββββββ | 2193/2975 [26:17<07:54, 1.65it/s]
74%|ββββββββ | 2194/2975 [26:18<07:53, 1.65it/s]
74%|ββββββββ | 2195/2975 [26:19<07:54, 1.64it/s]
74%|ββββββββ | 2196/2975 [26:19<07:53, 1.65it/s]
74%|ββββββββ | 2197/2975 [26:20<07:53, 1.64it/s]
| 0: {'loss': 0.7161, 'grad_norm': 0.7587663874878008, 'learning_rate': 1e-05, 'epoch': 0.74} |
74%|ββββββββ | 2198/2975 [26:20<07:53, 1.64it/s]
74%|ββββββββ | 2199/2975 [26:21<07:52, 1.64it/s]
74%|ββββββββ | 2200/2975 [26:22<07:51, 1.64it/s]
74%|ββββββββ | 2201/2975 [26:22<07:51, 1.64it/s]
74%|ββββββββ | 2202/2975 [26:23<07:50, 1.64it/s]
74%|ββββββββ | 2203/2975 [26:23<07:50, 1.64it/s]
74%|ββββββββ | 2204/2975 [26:24<07:49, 1.64it/s]
74%|ββββββββ | 2205/2975 [26:25<07:48, 1.64it/s]
74%|ββββββββ | 2206/2975 [26:25<07:47, 1.64it/s]
74%|ββββββββ | 2207/2975 [26:26<07:47, 1.64it/s]
74%|ββββββββ | 2208/2975 [26:26<07:46, 1.64it/s]
74%|ββββββββ | 2209/2975 [26:27<07:46, 1.64it/s]
74%|ββββββββ | 2210/2975 [26:28<07:46, 1.64it/s]
| 0: {'loss': 0.7097, 'grad_norm': 0.7377475407946548, 'learning_rate': 1e-05, 'epoch': 0.74} |
| 0: {'loss': 0.7072, 'grad_norm': 0.77022569766666, 'learning_rate': 1e-05, 'epoch': 0.75} |
74%|ββββββββ | 2211/2975 [26:28<07:46, 1.64it/s]
74%|ββββββββ | 2212/2975 [26:29<07:45, 1.64it/s]
74%|ββββββββ | 2213/2975 [26:30<07:45, 1.64it/s]
74%|ββββββββ | 2214/2975 [26:30<07:44, 1.64it/s]
74%|ββββββββ | 2215/2975 [26:31<07:42, 1.64it/s]
74%|ββββββββ | 2216/2975 [26:31<07:41, 1.65it/s]
75%|ββββββββ | 2217/2975 [26:32<07:40, 1.65it/s]
75%|ββββββββ | 2218/2975 [26:33<07:39, 1.65it/s]
75%|ββββββββ | 2219/2975 [26:33<07:38, 1.65it/s]
75%|ββββββββ | 2220/2975 [26:34<07:38, 1.65it/s]
75%|ββββββββ | 2221/2975 [26:34<07:38, 1.65it/s]
75%|ββββββββ | 2222/2975 [26:35<07:36, 1.65it/s]
| 0: {'loss': 0.7296, 'grad_norm': 0.8032957399887488, 'learning_rate': 1e-05, 'epoch': 0.75} |
75%|ββββββββ | 2223/2975 [26:36<07:36, 1.65it/s]
75%|ββββββββ | 2224/2975 [26:36<07:35, 1.65it/s]
75%|ββββββββ | 2225/2975 [26:37<07:41, 1.63it/s]
75%|ββββββββ | 2226/2975 [26:37<07:40, 1.63it/s]
75%|ββββββββ | 2227/2975 [26:38<07:37, 1.63it/s]
75%|ββββββββ | 2228/2975 [26:39<07:36, 1.64it/s]
75%|ββββββββ | 2229/2975 [26:39<07:35, 1.64it/s]
75%|ββββββββ | 2230/2975 [26:40<07:34, 1.64it/s]
75%|ββββββββ | 2231/2975 [26:40<07:33, 1.64it/s]
75%|ββββββββ | 2232/2975 [26:41<07:31, 1.64it/s]
75%|ββββββββ | 2233/2975 [26:42<07:31, 1.64it/s]
75%|ββββββββ | 2234/2975 [26:42<07:29, 1.65it/s]
75%|ββββββββ | 2235/2975 [26:43<07:28, 1.65it/s]
| 0: {'loss': 0.7217, 'grad_norm': 0.8944500828928701, 'learning_rate': 1e-05, 'epoch': 0.75} |
75%|ββββββββ | 2236/2975 [26:44<07:27, 1.65it/s]
75%|ββββββββ | 2237/2975 [26:44<07:27, 1.65it/s]
75%|ββββββββ | 2238/2975 [26:45<07:26, 1.65it/s]
75%|ββββββββ | 2239/2975 [26:45<07:26, 1.65it/s]
75%|ββββββββ | 2240/2975 [26:46<07:26, 1.65it/s]
75%|ββββββββ | 2241/2975 [26:47<07:27, 1.64it/s]
75%|ββββββββ | 2242/2975 [26:47<07:26, 1.64it/s]
75%|ββββββββ | 2243/2975 [26:48<07:26, 1.64it/s]
75%|ββββββββ | 2244/2975 [26:48<07:30, 1.62it/s]
75%|ββββββββ | 2245/2975 [26:49<07:27, 1.63it/s]
75%|ββββββββ | 2246/2975 [26:50<07:25, 1.64it/s]
76%|ββββββββ | 2247/2975 [26:50<07:24, 1.64it/s]
76%|ββββββββ | 2248/2975 [26:51<07:23, 1.64it/s]
| 0: {'loss': 0.7271, 'grad_norm': 0.7628192737409064, 'learning_rate': 1e-05, 'epoch': 0.76} |
| 0: {'loss': 0.7152, 'grad_norm': 0.7357763726593973, 'learning_rate': 1e-05, 'epoch': 0.76} |
76%|ββββββββ | 2249/2975 [26:51<07:22, 1.64it/s]
76%|ββββββββ | 2250/2975 [26:52<07:21, 1.64it/s]
76%|ββββββββ | 2251/2975 [26:53<07:21, 1.64it/s]
76%|ββββββββ | 2252/2975 [26:53<07:21, 1.64it/s]
76%|ββββββββ | 2253/2975 [26:54<07:20, 1.64it/s]
76%|ββββββββ | 2254/2975 [26:55<07:20, 1.64it/s]
76%|ββββββββ | 2255/2975 [26:55<07:19, 1.64it/s]
76%|ββββββββ | 2256/2975 [26:56<07:19, 1.64it/s]
76%|ββββββββ | 2257/2975 [26:56<07:18, 1.64it/s]
76%|ββββββββ | 2258/2975 [26:57<07:16, 1.64it/s]
76%|ββββββββ | 2259/2975 [26:58<07:16, 1.64it/s]
76%|ββββββββ | 2260/2975 [26:58<07:14, 1.65it/s]
| 0: {'loss': 0.6946, 'grad_norm': 0.7573853557180044, 'learning_rate': 1e-05, 'epoch': 0.76} |
76%|ββββββββ | 2261/2975 [26:59<07:13, 1.65it/s]
76%|ββββββββ | 2262/2975 [26:59<07:13, 1.64it/s]
76%|ββββββββ | 2263/2975 [27:00<07:13, 1.64it/s]
76%|ββββββββ | 2264/2975 [27:01<07:13, 1.64it/s]
76%|ββββββββ | 2265/2975 [27:01<07:19, 1.62it/s]
76%|ββββββββ | 2266/2975 [27:02<07:16, 1.62it/s]
76%|ββββββββ | 2267/2975 [27:02<07:14, 1.63it/s]
76%|ββββββββ | 2268/2975 [27:03<07:12, 1.63it/s]
76%|ββββββββ | 2269/2975 [27:04<07:11, 1.64it/s]
76%|ββββββββ | 2270/2975 [27:04<07:09, 1.64it/s]
76%|ββββββββ | 2271/2975 [27:05<07:08, 1.64it/s]
76%|ββββββββ | 2272/2975 [27:05<07:07, 1.64it/s]
76%|ββββββββ | 2273/2975 [27:06<07:05, 1.65it/s]
| 0: {'loss': 0.6989, 'grad_norm': 0.7584677946820176, 'learning_rate': 1e-05, 'epoch': 0.77} |
76%|ββββββββ | 2274/2975 [27:07<07:05, 1.65it/s]
76%|ββββββββ | 2275/2975 [27:07<07:04, 1.65it/s]
77%|ββββββββ | 2276/2975 [27:08<07:03, 1.65it/s]
77%|ββββββββ | 2277/2975 [27:09<07:01, 1.65it/s]
77%|ββββββββ | 2278/2975 [27:09<07:01, 1.66it/s]
77%|ββββββββ | 2279/2975 [27:10<07:01, 1.65it/s]
77%|ββββββββ | 2280/2975 [27:10<07:01, 1.65it/s]
77%|ββββββββ | 2281/2975 [27:11<07:01, 1.65it/s]
77%|ββββββββ | 2282/2975 [27:12<07:01, 1.65it/s]
77%|ββββββββ | 2283/2975 [27:12<07:00, 1.64it/s]
77%|ββββββββ | 2284/2975 [27:13<07:00, 1.64it/s]
77%|ββββββββ | 2285/2975 [27:13<07:00, 1.64it/s]
77%|ββββββββ | 2286/2975 [27:14<06:59, 1.64it/s]
| 0: {'loss': 0.6925, 'grad_norm': 0.7468414910667224, 'learning_rate': 1e-05, 'epoch': 0.77} |
77%|ββββββββ | 2287/2975 [27:15<06:59, 1.64it/s]
77%|ββββββββ | 2288/2975 [27:15<06:58, 1.64it/s]
77%|ββββββββ | 2289/2975 [27:16<06:57, 1.64it/s]
77%|ββββββββ | 2290/2975 [27:16<06:55, 1.65it/s]
77%|ββββββββ | 2291/2975 [27:17<06:54, 1.65it/s]
77%|ββββββββ | 2292/2975 [27:18<06:53, 1.65it/s]
77%|ββββββββ | 2293/2975 [27:18<06:52, 1.65it/s]
77%|ββββββββ | 2294/2975 [27:19<06:51, 1.65it/s]
77%|ββββββββ | 2295/2975 [27:19<06:51, 1.65it/s]
77%|ββββββββ | 2296/2975 [27:20<06:50, 1.66it/s]
77%|ββββββββ | 2297/2975 [27:21<06:49, 1.66it/s]
77%|ββββββββ | 2298/2975 [27:21<06:49, 1.65it/s]
77%|ββββββββ | 2299/2975 [27:22<06:48, 1.65it/s]
| 0: {'loss': 0.7231, 'grad_norm': 0.7731197620627839, 'learning_rate': 1e-05, 'epoch': 0.77} |
| 0: {'loss': 0.7266, 'grad_norm': 0.7237676476481548, 'learning_rate': 1e-05, 'epoch': 0.78} |
77%|ββββββββ | 2300/2975 [27:22<06:47, 1.66it/s]
77%|ββββββββ | 2301/2975 [27:23<06:47, 1.66it/s]
77%|ββββββββ | 2302/2975 [27:24<06:47, 1.65it/s]
77%|ββββββββ | 2303/2975 [27:24<06:47, 1.65it/s]
77%|ββββββββ | 2304/2975 [27:25<06:47, 1.65it/s]
77%|ββββββββ | 2305/2975 [27:25<06:47, 1.64it/s]
78%|ββββββββ | 2306/2975 [27:26<06:47, 1.64it/s]
78%|ββββββββ | 2307/2975 [27:27<06:46, 1.64it/s]
78%|ββββββββ | 2308/2975 [27:27<06:45, 1.64it/s]
78%|ββββββββ | 2309/2975 [27:28<06:45, 1.64it/s]
78%|ββββββββ | 2310/2975 [27:31<13:27, 1.21s/it]
78%|ββββββββ | 2311/2975 [27:31<11:27, 1.04s/it]
| 0: {'loss': 0.7102, 'grad_norm': 0.7563453931672067, 'learning_rate': 1e-05, 'epoch': 0.78} |
78%|ββββββββ | 2312/2975 [27:32<10:06, 1.09it/s]
78%|ββββββββ | 2313/2975 [27:34<15:12, 1.38s/it]
78%|ββββββββ | 2314/2975 [27:35<12:52, 1.17s/it]
78%|ββββββββ | 2315/2975 [27:36<11:00, 1.00s/it]
78%|ββββββββ | 2316/2975 [27:36<09:41, 1.13it/s]
78%|ββββββββ | 2317/2975 [27:37<08:46, 1.25it/s]
78%|ββββββββ | 2318/2975 [27:37<08:07, 1.35it/s]
78%|ββββββββ | 2319/2975 [27:38<07:40, 1.42it/s]
78%|ββββββββ | 2320/2975 [27:39<07:21, 1.49it/s]
78%|ββββββββ | 2321/2975 [27:39<07:07, 1.53it/s]
78%|ββββββββ | 2322/2975 [27:40<06:57, 1.56it/s]
78%|ββββββββ | 2323/2975 [27:40<06:54, 1.57it/s]
78%|ββββββββ | 2324/2975 [27:41<06:53, 1.57it/s]
| 0: {'loss': 0.6981, 'grad_norm': 0.7137144777380231, 'learning_rate': 1e-05, 'epoch': 0.78} |
78%|ββββββββ | 2325/2975 [27:42<07:07, 1.52it/s]
78%|ββββββββ | 2326/2975 [27:42<07:01, 1.54it/s]
78%|ββββββββ | 2327/2975 [27:43<06:54, 1.56it/s]
78%|ββββββββ | 2328/2975 [27:44<08:55, 1.21it/s]
78%|ββββββββ | 2329/2975 [27:45<08:13, 1.31it/s]
78%|ββββββββ | 2330/2975 [27:46<07:42, 1.39it/s]
78%|ββββββββ | 2331/2975 [27:46<07:21, 1.46it/s]
78%|ββββββββ | 2332/2975 [27:49<13:33, 1.26s/it]
78%|ββββββββ | 2333/2975 [27:49<11:34, 1.08s/it]
78%|ββββββββ | 2334/2975 [27:50<10:03, 1.06it/s]
78%|ββββββββ | 2335/2975 [27:51<08:59, 1.19it/s]
79%|ββββββββ | 2336/2975 [27:51<08:12, 1.30it/s]
79%|ββββββββ | 2337/2975 [27:52<07:40, 1.38it/s]
79%|ββββββββ | 2338/2975 [27:52<07:17, 1.46it/s]
| 0: {'loss': 0.7058, 'grad_norm': 0.7535662693178167, 'learning_rate': 1e-05, 'epoch': 0.79} |
79%|ββββββββ | 2339/2975 [27:53<07:00, 1.51it/s]
79%|ββββββββ | 2340/2975 [27:54<06:49, 1.55it/s]
79%|ββββββββ | 2341/2975 [27:54<06:40, 1.58it/s]
79%|ββββββββ | 2342/2975 [27:55<06:34, 1.60it/s]
79%|ββββββββ | 2343/2975 [27:55<06:29, 1.62it/s]
79%|ββββββββ | 2344/2975 [27:56<06:26, 1.63it/s]
79%|ββββββββ | 2345/2975 [27:57<06:23, 1.64it/s]
79%|ββββββββ | 2346/2975 [27:57<06:21, 1.65it/s]
79%|ββββββββ | 2347/2975 [27:58<06:20, 1.65it/s]
79%|ββββββββ | 2348/2975 [27:58<06:19, 1.65it/s]
79%|ββββββββ | 2349/2975 [27:59<06:18, 1.65it/s]
79%|ββββββββ | 2350/2975 [28:00<06:17, 1.65it/s]
| 0: {'loss': 0.7054, 'grad_norm': 0.7380719530155119, 'learning_rate': 1e-05, 'epoch': 0.79} |
| 0: {'loss': 0.7059, 'grad_norm': 0.7417744754153898, 'learning_rate': 1e-05, 'epoch': 0.79} |
79%|ββββββββ | 2351/2975 [28:00<06:17, 1.65it/s]
79%|ββββββββ | 2352/2975 [28:01<06:16, 1.66it/s]
79%|ββββββββ | 2353/2975 [28:01<06:15, 1.66it/s]
79%|ββββββββ | 2354/2975 [28:02<06:14, 1.66it/s]
79%|ββββββββ | 2355/2975 [28:03<06:13, 1.66it/s]
79%|ββββββββ | 2356/2975 [28:03<06:12, 1.66it/s]
79%|ββββββββ | 2357/2975 [28:04<06:11, 1.66it/s]
79%|ββββββββ | 2358/2975 [28:04<06:11, 1.66it/s]
79%|ββββββββ | 2359/2975 [28:05<06:11, 1.66it/s]
79%|ββββββββ | 2360/2975 [28:06<06:10, 1.66it/s]
79%|ββββββββ | 2361/2975 [28:06<06:10, 1.66it/s]
79%|ββββββββ | 2362/2975 [28:07<06:09, 1.66it/s]
| 0: {'loss': 0.693, 'grad_norm': 0.7211124714468495, 'learning_rate': 1e-05, 'epoch': 0.8} |
79%|ββββββββ | 2363/2975 [28:08<06:09, 1.66it/s]
79%|ββββββββ | 2364/2975 [28:08<06:08, 1.66it/s]
79%|ββββββββ | 2365/2975 [28:09<06:07, 1.66it/s]
80%|ββββββββ | 2366/2975 [28:09<06:07, 1.66it/s]
80%|ββββββββ | 2367/2975 [28:10<06:06, 1.66it/s]
80%|ββββββββ | 2368/2975 [28:11<06:05, 1.66it/s]
80%|ββββββββ | 2369/2975 [28:11<06:05, 1.66it/s]
80%|ββββββββ | 2370/2975 [28:12<06:04, 1.66it/s]
80%|ββββββββ | 2371/2975 [28:12<06:04, 1.66it/s]
80%|ββββββββ | 2372/2975 [28:13<06:03, 1.66it/s]
80%|ββββββββ | 2373/2975 [28:14<06:02, 1.66it/s]
80%|ββββββββ | 2374/2975 [28:14<06:02, 1.66it/s]
80%|ββββββββ | 2375/2975 [28:15<06:01, 1.66it/s]
80%|ββββββββ | 2376/2975 [28:15<06:00, 1.66it/s]
| 0: {'loss': 0.724, 'grad_norm': 0.770275334209149, 'learning_rate': 1e-05, 'epoch': 0.8} |
80%|ββββββββ | 2377/2975 [28:16<05:59, 1.66it/s]
80%|ββββββββ | 2378/2975 [28:17<05:59, 1.66it/s]
80%|ββββββββ | 2379/2975 [28:17<05:58, 1.66it/s]
80%|ββββββββ | 2380/2975 [28:18<05:58, 1.66it/s]
80%|ββββββββ | 2381/2975 [28:20<11:43, 1.18s/it]
80%|ββββββββ | 2382/2975 [28:21<09:59, 1.01s/it]
80%|ββββββββ | 2383/2975 [28:22<08:47, 1.12it/s]
80%|ββββββββ | 2384/2975 [28:22<07:56, 1.24it/s]
80%|ββββββββ | 2385/2975 [28:23<07:21, 1.34it/s]
80%|ββββββββ | 2386/2975 [28:23<06:56, 1.41it/s]
80%|ββββββββ | 2387/2975 [28:24<06:38, 1.48it/s]
80%|ββββββββ | 2388/2975 [28:25<06:24, 1.53it/s]
80%|ββββββββ | 2389/2975 [28:25<06:15, 1.56it/s]
| 0: {'loss': 0.6791, 'grad_norm': 0.7226620658914946, 'learning_rate': 1e-05, 'epoch': 0.8} |
| 0: {'loss': 0.7061, 'grad_norm': 0.8579084355726249, 'learning_rate': 1e-05, 'epoch': 0.81} |
80%|ββββββββ | 2390/2975 [28:26<06:08, 1.59it/s]
80%|ββββββββ | 2391/2975 [28:26<06:03, 1.61it/s]
80%|ββββββββ | 2392/2975 [28:27<06:01, 1.61it/s]
80%|ββββββββ | 2393/2975 [28:28<05:57, 1.63it/s]
80%|ββββββββ | 2394/2975 [28:28<05:54, 1.64it/s]
81%|ββββββββ | 2395/2975 [28:29<05:52, 1.65it/s]
81%|ββββββββ | 2396/2975 [28:29<05:51, 1.65it/s]
81%|ββββββββ | 2397/2975 [28:30<05:50, 1.65it/s]
81%|ββββββββ | 2398/2975 [28:31<05:49, 1.65it/s]
81%|ββββββββ | 2399/2975 [28:31<05:48, 1.65it/s]
81%|ββββββββ | 2400/2975 [28:32<05:48, 1.65it/s]
| 0: {'loss': 0.6831, 'grad_norm': 0.7141326078218623, 'learning_rate': 1e-05, 'epoch': 0.81} |
81%|ββββββββ | 2401/2975 [28:32<05:47, 1.65it/s]
81%|ββββββββ | 2402/2975 [28:33<05:46, 1.65it/s]
81%|ββββββββ | 2403/2975 [28:34<05:45, 1.65it/s]
81%|ββββββββ | 2404/2975 [28:34<05:45, 1.65it/s]
81%|ββββββββ | 2405/2975 [28:35<05:44, 1.65it/s]
81%|ββββββββ | 2406/2975 [28:35<05:44, 1.65it/s]
81%|ββββββββ | 2407/2975 [28:36<05:44, 1.65it/s]
81%|ββββββββ | 2408/2975 [28:37<05:43, 1.65it/s]
81%|ββββββββ | 2409/2975 [28:37<05:43, 1.65it/s]
81%|ββββββββ | 2410/2975 [28:38<05:48, 1.62it/s]
81%|ββββββββ | 2411/2975 [28:39<05:46, 1.63it/s]
81%|ββββββββ | 2412/2975 [28:39<05:44, 1.64it/s]
81%|ββββββββ | 2413/2975 [28:40<05:43, 1.64it/s]
81%|ββββββββ | 2414/2975 [28:40<05:41, 1.64it/s]
| 0: {'loss': 0.7209, 'grad_norm': 0.7866118483290073, 'learning_rate': 1e-05, 'epoch': 0.81} |
81%|ββββββββ | 2415/2975 [28:41<05:40, 1.65it/s]
81%|ββββββββ | 2416/2975 [28:42<05:38, 1.65it/s]
81%|ββββββββ | 2417/2975 [28:42<05:37, 1.65it/s]
81%|βββββββββ | 2418/2975 [28:43<05:36, 1.65it/s]
81%|βββββββββ | 2419/2975 [28:43<05:35, 1.66it/s]
81%|βββββββββ | 2420/2975 [28:44<05:36, 1.65it/s]
81%|βββββββββ | 2421/2975 [28:45<05:36, 1.65it/s]
81%|βββββββββ | 2422/2975 [28:45<05:35, 1.65it/s]
81%|βββββββββ | 2423/2975 [28:46<05:33, 1.65it/s]
81%|βββββββββ | 2424/2975 [28:46<05:33, 1.65it/s]
82%|βββββββββ | 2425/2975 [28:47<05:32, 1.65it/s]
82%|βββββββββ | 2426/2975 [28:48<05:32, 1.65it/s]
82%|βββββββββ | 2427/2975 [28:48<05:31, 1.65it/s]
| 0: {'loss': 0.699, 'grad_norm': 0.7373297652878902, 'learning_rate': 1e-05, 'epoch': 0.82} |
82%|βββββββββ | 2428/2975 [28:49<05:30, 1.65it/s]
82%|βββββββββ | 2429/2975 [28:49<05:30, 1.65it/s]
82%|βββββββββ | 2430/2975 [28:50<05:30, 1.65it/s]
82%|βββββββββ | 2431/2975 [28:51<05:30, 1.65it/s]
82%|βββββββββ | 2432/2975 [28:51<05:29, 1.65it/s]
82%|βββββββββ | 2433/2975 [28:52<05:29, 1.65it/s]
82%|βββββββββ | 2434/2975 [28:52<05:28, 1.65it/s]
82%|βββββββββ | 2435/2975 [28:53<05:27, 1.65it/s]
82%|βββββββββ | 2436/2975 [28:54<05:26, 1.65it/s]
82%|βββββββββ | 2437/2975 [28:54<05:26, 1.65it/s]
82%|βββββββββ | 2438/2975 [28:55<05:25, 1.65it/s]
82%|βββββββββ | 2439/2975 [28:55<05:25, 1.65it/s]
| 0: {'loss': 0.7002, 'grad_norm': 0.7990406058141861, 'learning_rate': 1e-05, 'epoch': 0.82} |
| 0: {'loss': 0.7136, 'grad_norm': 0.7467991232331265, 'learning_rate': 1e-05, 'epoch': 0.82} |
82%|βββββββββ | 2440/2975 [28:56<05:24, 1.65it/s]
82%|βββββββββ | 2441/2975 [28:57<05:59, 1.48it/s]
82%|βββββββββ | 2442/2975 [28:58<05:48, 1.53it/s]
82%|βββββββββ | 2443/2975 [28:58<05:39, 1.57it/s]
82%|βββββββββ | 2444/2975 [28:59<05:32, 1.60it/s]
82%|βββββββββ | 2445/2975 [28:59<05:28, 1.61it/s]
82%|βββββββββ | 2446/2975 [29:00<05:25, 1.63it/s]
82%|βββββββββ | 2447/2975 [29:01<05:22, 1.64it/s]
82%|βββββββββ | 2448/2975 [29:01<05:20, 1.64it/s]
82%|βββββββββ | 2449/2975 [29:02<05:19, 1.65it/s]
82%|βββββββββ | 2450/2975 [29:02<05:18, 1.65it/s]
82%|βββββββββ | 2451/2975 [29:03<05:17, 1.65it/s]
| 0: {'loss': 0.7189, 'grad_norm': 0.7557954002797767, 'learning_rate': 1e-05, 'epoch': 0.83} |
82%|βββββββββ | 2452/2975 [29:04<05:16, 1.65it/s]
82%|βββββββββ | 2453/2975 [29:04<05:15, 1.66it/s]
82%|βββββββββ | 2454/2975 [29:05<05:14, 1.66it/s]
83%|βββββββββ | 2455/2975 [29:05<05:13, 1.66it/s]
83%|βββββββββ | 2456/2975 [29:06<05:12, 1.66it/s]
83%|βββββββββ | 2457/2975 [29:07<05:12, 1.66it/s]
83%|βββββββββ | 2458/2975 [29:07<05:11, 1.66it/s]
83%|βββββββββ | 2459/2975 [29:08<05:11, 1.66it/s]
83%|βββββββββ | 2460/2975 [29:08<05:10, 1.66it/s]
83%|βββββββββ | 2461/2975 [29:09<05:10, 1.66it/s]
83%|βββββββββ | 2462/2975 [29:10<05:09, 1.66it/s]
83%|βββββββββ | 2463/2975 [29:10<05:08, 1.66it/s]
| 0: {'loss': 0.7223, 'grad_norm': 0.7298360516103363, 'learning_rate': 1e-05, 'epoch': 0.83} |
83%|βββββββββ | 2464/2975 [29:11<05:07, 1.66it/s]
83%|βββββββββ | 2465/2975 [29:11<05:07, 1.66it/s]
83%|βββββββββ | 2466/2975 [29:12<05:06, 1.66it/s]
83%|βββββββββ | 2467/2975 [29:13<05:05, 1.66it/s]
83%|βββββββββ | 2468/2975 [29:13<05:05, 1.66it/s]
83%|βββββββββ | 2469/2975 [29:14<05:04, 1.66it/s]
83%|βββββββββ | 2470/2975 [29:14<05:04, 1.66it/s]
83%|βββββββββ | 2471/2975 [29:15<05:03, 1.66it/s]
83%|βββββββββ | 2472/2975 [29:16<05:03, 1.66it/s]
83%|βββββββββ | 2473/2975 [29:16<05:03, 1.66it/s]
83%|βββββββββ | 2474/2975 [29:17<05:02, 1.66it/s]
83%|βββββββββ | 2475/2975 [29:17<05:01, 1.66it/s]
83%|βββββββββ | 2476/2975 [29:18<05:00, 1.66it/s]
| 0: {'loss': 0.674, 'grad_norm': 0.7417790471996113, 'learning_rate': 1e-05, 'epoch': 0.83} |
83%|βββββββββ | 2477/2975 [29:19<04:59, 1.66it/s]
83%|βββββββββ | 2478/2975 [29:19<04:59, 1.66it/s]
83%|βββββββββ | 2479/2975 [29:20<04:58, 1.66it/s]
83%|βββββββββ | 2480/2975 [29:20<04:57, 1.66it/s]
83%|βββββββββ | 2481/2975 [29:21<04:57, 1.66it/s]
83%|βββββββββ | 2482/2975 [29:22<04:56, 1.66it/s]
83%|βββββββββ | 2483/2975 [29:22<04:55, 1.66it/s]
83%|βββββββββ | 2484/2975 [29:23<04:54, 1.66it/s]
84%|βββββββββ | 2485/2975 [29:23<04:54, 1.66it/s]
84%|βββββββββ | 2486/2975 [29:24<04:54, 1.66it/s]
84%|βββββββββ | 2487/2975 [29:25<05:22, 1.51it/s]
84%|βββββββββ | 2488/2975 [29:25<05:13, 1.55it/s]
84%|βββββββββ | 2489/2975 [29:26<05:06, 1.59it/s]
| 0: {'loss': 0.7084, 'grad_norm': 0.7594059959897228, 'learning_rate': 1e-05, 'epoch': 0.84} |
| 0: {'loss': 0.7075, 'grad_norm': 0.7080358402783196, 'learning_rate': 1e-05, 'epoch': 0.84} |
84%|βββββββββ | 2490/2975 [29:27<05:01, 1.61it/s]
84%|βββββββββ | 2491/2975 [29:27<04:58, 1.62it/s]
84%|βββββββββ | 2492/2975 [29:28<04:55, 1.63it/s]
84%|βββββββββ | 2493/2975 [29:28<04:53, 1.64it/s]
84%|βββββββββ | 2494/2975 [29:29<04:58, 1.61it/s]
84%|βββββββββ | 2495/2975 [29:30<04:56, 1.62it/s]
84%|βββββββββ | 2496/2975 [29:30<04:56, 1.62it/s]
84%|βββββββββ | 2497/2975 [29:31<04:54, 1.62it/s]
84%|βββββββββ | 2498/2975 [29:32<04:52, 1.63it/s]
84%|βββββββββ | 2499/2975 [29:32<04:51, 1.63it/s]
84%|βββββββββ | 2500/2975 [29:33<04:49, 1.64it/s]
| 0: {'loss': 0.6857, 'grad_norm': 0.7324494757206906, 'learning_rate': 1e-05, 'epoch': 0.84} |
84%|βββββββββ | 2501/2975 [29:33<04:51, 1.63it/s]
84%|βββββββββ | 2502/2975 [29:34<04:48, 1.64it/s]
84%|βββββββββ | 2503/2975 [29:35<04:46, 1.65it/s]
84%|βββββββββ | 2504/2975 [29:35<04:44, 1.66it/s]
84%|βββββββββ | 2505/2975 [29:36<04:43, 1.66it/s]
84%|βββββββββ | 2506/2975 [29:36<04:42, 1.66it/s]
84%|βββββββββ | 2507/2975 [29:37<04:41, 1.66it/s]
84%|βββββββββ | 2508/2975 [29:38<04:41, 1.66it/s]
84%|βββββββββ | 2509/2975 [29:38<04:40, 1.66it/s]
84%|βββββββββ | 2510/2975 [29:39<04:39, 1.66it/s]
84%|βββββββββ | 2511/2975 [29:39<04:39, 1.66it/s]
84%|βββββββββ | 2512/2975 [29:40<04:38, 1.66it/s]
84%|βββββββββ | 2513/2975 [29:41<04:38, 1.66it/s]
| 0: {'loss': 0.6943, 'grad_norm': 0.7764822865310044, 'learning_rate': 1e-05, 'epoch': 0.85} |
85%|βββββββββ | 2514/2975 [29:41<04:37, 1.66it/s]
85%|βββββββββ | 2515/2975 [29:42<04:36, 1.66it/s]
85%|βββββββββ | 2516/2975 [29:42<04:36, 1.66it/s]
85%|βββββββββ | 2517/2975 [29:43<04:35, 1.66it/s]
85%|βββββββββ | 2518/2975 [29:44<04:35, 1.66it/s]
85%|βββββββββ | 2519/2975 [29:44<04:34, 1.66it/s]
85%|βββββββββ | 2520/2975 [29:45<04:33, 1.66it/s]
85%|βββββββββ | 2521/2975 [29:45<04:33, 1.66it/s]
85%|βββββββββ | 2522/2975 [29:46<04:32, 1.66it/s]
85%|βββββββββ | 2523/2975 [29:47<04:32, 1.66it/s]
85%|βββββββββ | 2524/2975 [29:47<04:31, 1.66it/s]
85%|βββββββββ | 2525/2975 [29:48<04:31, 1.66it/s]
85%|βββββββββ | 2526/2975 [29:48<04:30, 1.66it/s]
| 0: {'loss': 0.6938, 'grad_norm': 0.7442177955212868, 'learning_rate': 1e-05, 'epoch': 0.85} |
85%|βββββββββ | 2527/2975 [29:49<04:29, 1.66it/s]
85%|βββββββββ | 2528/2975 [29:50<04:29, 1.66it/s]
85%|βββββββββ | 2529/2975 [29:50<04:28, 1.66it/s]
85%|βββββββββ | 2530/2975 [29:51<04:28, 1.66it/s]
85%|βββββββββ | 2531/2975 [29:51<04:27, 1.66it/s]
85%|βββββββββ | 2532/2975 [29:52<04:26, 1.66it/s]
85%|βββββββββ | 2533/2975 [29:53<04:26, 1.66it/s]
85%|βββββββββ | 2534/2975 [29:53<04:25, 1.66it/s]
85%|βββββββββ | 2535/2975 [29:54<04:25, 1.66it/s]
85%|βββββββββ | 2536/2975 [29:54<04:24, 1.66it/s]
85%|βββββββββ | 2537/2975 [29:55<04:23, 1.66it/s]
85%|βββββββββ | 2538/2975 [29:56<04:22, 1.66it/s]
85%|βββββββββ | 2539/2975 [29:56<04:22, 1.66it/s]
| 0: {'loss': 0.6942, 'grad_norm': 0.7121672642088279, 'learning_rate': 1e-05, 'epoch': 0.85} |
| 0: {'loss': 0.6817, 'grad_norm': 0.7192337762381629, 'learning_rate': 1e-05, 'epoch': 0.86} |
85%|βββββββββ | 2540/2975 [29:57<04:22, 1.66it/s]
85%|βββββββββ | 2541/2975 [29:57<04:21, 1.66it/s]
85%|βββββββββ | 2542/2975 [29:58<04:21, 1.66it/s]
85%|βββββββββ | 2543/2975 [29:59<04:20, 1.66it/s]
86%|βββββββββ | 2544/2975 [29:59<04:20, 1.66it/s]
86%|βββββββββ | 2545/2975 [30:00<04:19, 1.66it/s]
86%|βββββββββ | 2546/2975 [30:00<04:18, 1.66it/s]
86%|βββββββββ | 2547/2975 [30:01<04:17, 1.66it/s]
86%|βββββββββ | 2548/2975 [30:02<04:20, 1.64it/s]
86%|βββββββββ | 2549/2975 [30:02<04:19, 1.64it/s]
86%|βββββββββ | 2550/2975 [30:03<04:17, 1.65it/s]
| 0: {'loss': 0.7133, 'grad_norm': 0.7485050272309117, 'learning_rate': 1e-05, 'epoch': 0.86} |
86%|βββββββββ | 2551/2975 [30:04<04:16, 1.65it/s]
86%|βββββββββ | 2552/2975 [30:04<04:15, 1.66it/s]
86%|βββββββββ | 2553/2975 [30:05<04:14, 1.66it/s]
86%|βββββββββ | 2554/2975 [30:05<04:13, 1.66it/s]
86%|βββββββββ | 2555/2975 [30:06<04:12, 1.66it/s]
86%|βββββββββ | 2556/2975 [30:07<04:12, 1.66it/s]
86%|βββββββββ | 2557/2975 [30:07<04:11, 1.66it/s]
86%|βββββββββ | 2558/2975 [30:08<04:11, 1.66it/s]
86%|βββββββββ | 2559/2975 [30:08<04:10, 1.66it/s]
86%|βββββββββ | 2560/2975 [30:09<04:09, 1.66it/s]
86%|βββββββββ | 2561/2975 [30:10<04:39, 1.48it/s]
86%|βββββββββ | 2562/2975 [30:10<04:30, 1.52it/s]
86%|βββββββββ | 2563/2975 [30:11<04:24, 1.56it/s]
| 0: {'loss': 0.699, 'grad_norm': 0.7334239029144919, 'learning_rate': 1e-05, 'epoch': 0.86} |
86%|βββββββββ | 2564/2975 [30:12<04:19, 1.59it/s]
86%|βββββββββ | 2565/2975 [30:12<04:15, 1.60it/s]
86%|βββββββββ | 2566/2975 [30:13<04:12, 1.62it/s]
86%|βββββββββ | 2567/2975 [30:13<04:10, 1.63it/s]
86%|βββββββββ | 2568/2975 [30:14<04:08, 1.64it/s]
86%|βββββββββ | 2569/2975 [30:15<04:07, 1.64it/s]
86%|βββββββββ | 2570/2975 [30:15<04:05, 1.65it/s]
86%|βββββββββ | 2571/2975 [30:16<04:05, 1.65it/s]
86%|βββββββββ | 2572/2975 [30:16<04:04, 1.65it/s]
86%|βββββββββ | 2573/2975 [30:17<04:03, 1.65it/s]
87%|βββββββββ | 2574/2975 [30:18<04:02, 1.65it/s]
87%|βββββββββ | 2575/2975 [30:18<04:02, 1.65it/s]
87%|βββββββββ | 2576/2975 [30:19<04:00, 1.66it/s]
| 0: {'loss': 0.7182, 'grad_norm': 0.76424968884289, 'learning_rate': 1e-05, 'epoch': 0.87} |
87%|βββββββββ | 2577/2975 [30:19<04:00, 1.66it/s]
87%|βββββββββ | 2578/2975 [30:20<03:59, 1.66it/s]
87%|βββββββββ | 2579/2975 [30:21<03:58, 1.66it/s]
87%|βββββββββ | 2580/2975 [30:21<03:58, 1.66it/s]
87%|βββββββββ | 2581/2975 [30:22<03:58, 1.65it/s]
87%|βββββββββ | 2582/2975 [30:22<03:57, 1.65it/s]
87%|βββββββββ | 2583/2975 [30:23<03:56, 1.66it/s]
87%|βββββββββ | 2584/2975 [30:24<03:55, 1.66it/s]
87%|βββββββββ | 2585/2975 [30:24<03:55, 1.66it/s]
87%|βββββββββ | 2586/2975 [30:25<03:54, 1.66it/s]
87%|βββββββββ | 2587/2975 [30:25<03:53, 1.66it/s]
87%|βββββββββ | 2588/2975 [30:26<03:53, 1.66it/s]
87%|βββββββββ | 2589/2975 [30:27<03:52, 1.66it/s]
| 0: {'loss': 0.7038, 'grad_norm': 0.7340157897457464, 'learning_rate': 1e-05, 'epoch': 0.87} |
| 0: {'loss': 0.7181, 'grad_norm': 0.7260402815882088, 'learning_rate': 1e-05, 'epoch': 0.87} |
87%|βββββββββ | 2590/2975 [30:27<03:52, 1.66it/s]
87%|βββββββββ | 2591/2975 [30:28<03:51, 1.66it/s]
87%|βββββββββ | 2592/2975 [30:29<03:50, 1.66it/s]
87%|βββββββββ | 2593/2975 [30:29<03:50, 1.66it/s]
87%|βββββββββ | 2594/2975 [30:30<03:49, 1.66it/s]
87%|βββββββββ | 2595/2975 [30:30<03:49, 1.66it/s]
87%|βββββββββ | 2596/2975 [30:31<03:48, 1.66it/s]
87%|βββββββββ | 2597/2975 [30:32<03:48, 1.66it/s]
87%|βββββββββ | 2598/2975 [30:32<03:47, 1.65it/s]
87%|βββββββββ | 2599/2975 [30:33<03:47, 1.65it/s]
87%|βββββββββ | 2600/2975 [30:33<03:47, 1.65it/s]
| 0: {'loss': 0.6887, 'grad_norm': 0.7334851286396781, 'learning_rate': 1e-05, 'epoch': 0.88} |
87%|βββββββββ | 2601/2975 [30:34<03:46, 1.65it/s]
87%|βββββββββ | 2602/2975 [30:35<03:45, 1.65it/s]
87%|βββββββββ | 2603/2975 [30:35<03:45, 1.65it/s]
88%|βββββββββ | 2604/2975 [30:36<03:44, 1.65it/s]
88%|βββββββββ | 2605/2975 [30:36<03:44, 1.65it/s]
88%|βββββββββ | 2606/2975 [30:37<03:43, 1.65it/s]
88%|βββββββββ | 2607/2975 [30:38<03:42, 1.65it/s]
88%|βββββββββ | 2608/2975 [30:38<03:41, 1.66it/s]
88%|βββββββββ | 2609/2975 [30:39<03:41, 1.66it/s]
88%|βββββββββ | 2610/2975 [30:39<03:41, 1.65it/s]
88%|βββββββββ | 2611/2975 [30:40<03:40, 1.65it/s]
88%|βββββββββ | 2612/2975 [30:41<03:39, 1.65it/s]
88%|βββββββββ | 2613/2975 [30:41<03:39, 1.65it/s]
| 0: {'loss': 0.6814, 'grad_norm': 0.726296902846085, 'learning_rate': 1e-05, 'epoch': 0.88} |
88%|βββββββββ | 2614/2975 [30:42<03:38, 1.65it/s]
88%|βββββββββ | 2615/2975 [30:42<03:37, 1.65it/s]
88%|βββββββββ | 2616/2975 [30:43<03:36, 1.65it/s]
88%|βββββββββ | 2617/2975 [30:44<03:36, 1.66it/s]
88%|βββββββββ | 2618/2975 [30:44<03:35, 1.66it/s]
88%|βββββββββ | 2619/2975 [30:45<03:34, 1.66it/s]
88%|βββββββββ | 2620/2975 [30:46<04:04, 1.45it/s]
88%|βββββββββ | 2621/2975 [30:46<03:55, 1.50it/s]
88%|βββββββββ | 2622/2975 [30:47<03:48, 1.54it/s]
88%|βββββββββ | 2623/2975 [30:48<03:43, 1.57it/s]
88%|βββββββββ | 2624/2975 [30:48<03:40, 1.59it/s]
88%|βββββββββ | 2625/2975 [30:49<03:37, 1.61it/s]
88%|βββββββββ | 2626/2975 [30:49<03:35, 1.62it/s]
| 0: {'loss': 0.709, 'grad_norm': 0.7449056931497603, 'learning_rate': 1e-05, 'epoch': 0.88} |
88%|βββββββββ | 2627/2975 [30:50<03:33, 1.63it/s]
88%|βββββββββ | 2628/2975 [30:51<03:32, 1.64it/s]
88%|βββββββββ | 2629/2975 [30:51<03:31, 1.64it/s]
88%|βββββββββ | 2630/2975 [30:52<03:29, 1.64it/s]
88%|βββββββββ | 2631/2975 [30:52<03:29, 1.65it/s]
88%|βββββββββ | 2632/2975 [30:53<03:27, 1.65it/s]
89%|βββββββββ | 2633/2975 [30:54<03:27, 1.65it/s]
89%|βββββββββ | 2634/2975 [30:54<03:26, 1.65it/s]
89%|βββββββββ | 2635/2975 [30:55<03:25, 1.66it/s]
89%|βββββββββ | 2636/2975 [30:55<03:25, 1.65it/s]
89%|βββββββββ | 2637/2975 [30:56<03:24, 1.65it/s]
89%|βββββββββ | 2638/2975 [30:57<03:23, 1.65i |
| 0: {'loss': 0.6999, 'grad_norm': 0.7202233468170463, 'learning_rate': 1e-05, 'epoch': 0.89} |
| 0: {'loss': 0.7101, 'grad_norm': 0.7424194875658202, 'learning_rate': 1e-05, 'epoch': 0.89} |
| 0: t/s]
89%|βββββββββ | 2639/2975 [30:57<03:23, 1.65it/s]
89%|βββββββββ | 2640/2975 [30:58<03:22, 1.65it/s]
89%|βββββββββ | 2641/2975 [30:58<03:22, 1.65it/s]
89%|βββββββββ | 2642/2975 [30:59<03:21, 1.65it/s]
89%|βββββββββ | 2643/2975 [31:00<03:21, 1.65it/s]
89%|βββββββββ | 2644/2975 [31:00<03:20, 1.65it/s]
89%|βββββββββ | 2645/2975 [31:01<03:19, 1.65it/s]
89%|βββββββββ | 2646/2975 [31:01<03:18, 1.66it/s]
89%|βββββββββ | 2647/2975 [31:02<03:17, 1.66it/s]
89%|βββββββββ | 2648/2975 [31:03<03:17, 1.66it/s]
89%|βββββββββ | 2649/2975 [31:03<03:16, 1.66it/s]
89%|βββββββββ | 2650/2975 [31:04<03:15, 1.66it/s]
| 0: {'loss': 0.6882, 'grad_norm': 0.7092015768110003, 'learning_rate': 1e-05, 'epoch': 0.89} |
89%|βββββββββ | 2651/2975 [31:04<03:15, 1.66it/s]
89%|βββββββββ | 2652/2975 [31:05<03:14, 1.66it/s]
89%|βββββββββ | 2653/2975 [31:06<03:14, 1.66it/s]
89%|βββββββββ | 2654/2975 [31:06<03:13, 1.66it/s]
89%|βββββββββ | 2655/2975 [31:07<03:14, 1.64it/s]
89%|βββββββββ | 2656/2975 [31:08<03:42, 1.43it/s]
89%|βββββββββ | 2657/2975 [31:08<03:33, 1.49it/s]
89%|βββββββββ | 2658/2975 [31:09<03:26, 1.54it/s]
89%|βββββββββ | 2659/2975 [31:10<03:21, 1.57it/s]
89%|βββββββββ | 2660/2975 [31:10<03:18, 1.59it/s]
89%|βββββββββ | 2661/2975 [31:11<03:16, 1.60it/s]
89%|βββββββββ | 2662/2975 [31:11<03:14, 1.61it/s]
| 0: {'loss': 0.6977, 'grad_norm': 0.7068212020146698, 'learning_rate': 1e-05, 'epoch': 0.9} |
90%|βββββββββ | 2663/2975 [31:12<03:12, 1.62it/s]
90%|βββββββββ | 2664/2975 [31:13<03:11, 1.63it/s]
90%|βββββββββ | 2665/2975 [31:13<03:09, 1.63it/s]
90%|βββββββββ | 2666/2975 [31:14<03:08, 1.64it/s]
90%|βββββββββ | 2667/2975 [31:15<03:07, 1.65it/s]
90%|βββββββββ | 2668/2975 [31:15<03:29, 1.47it/s]
90%|βββββββββ | 2669/2975 [31:16<03:22, 1.51it/s]
90%|βββββββββ | 2670/2975 [31:17<03:17, 1.55it/s]
90%|βββββββββ | 2671/2975 [31:17<03:13, 1.57it/s]
90%|βββββββββ | 2672/2975 [31:18<03:10, 1.59it/s]
90%|βββββββββ | 2673/2975 [31:18<03:08, 1.61it/s]
90%|βββββββββ | 2674/2975 [31:19<03:05, 1.62it/s]
90%|βββββββββ | 2675/2975 [31:20<03:03, 1.63it/s]
| 0: {'loss': 0.7009, 'grad_norm': 0.7066569735462942, 'learning_rate': 9.996052735444863e-06, 'epoch': 0.9} |
90%|βββββββββ | 2676/2975 [31:20<03:02, 1.64it/s]
90%|βββββββββ | 2677/2975 [31:21<03:01, 1.65it/s]
90%|βββββββββ | 2678/2975 [31:21<03:00, 1.65it/s]
90%|βββββββββ | 2679/2975 [31:22<02:58, 1.65it/s]
90%|βββββββββ | 2680/2975 [31:23<03:23, 1.45it/s]
90%|βββββββββ | 2681/2975 [31:24<03:15, 1.50it/s]
90%|βββββββββ | 2682/2975 [31:24<03:09, 1.55it/s]
90%|βββββββββ | 2683/2975 [31:25<03:05, 1.58it/s]
90%|βββββββββ | 2684/2975 [31:25<03:01, 1.60it/s]
90%|βββββββββ | 2685/2975 [31:26<02:59, 1.62it/s]
90%|βββββββββ | 2686/2975 [31:27<02:57, 1.63it/s]
90%|βββββββββ | 2687/2975 [31:27<02:55, 1.64it/s]
90%|βββββββββ | 2688/2975 [31:28<02:54, 1.64it/s]
| 0: {'loss': 0.707, 'grad_norm': 0.7598696013918776, 'learning_rate': 9.951725498333449e-06, 'epoch': 0.9} |
90%|βββββββββ | 2689/2975 [31:28<02:53, 1.65it/s]
90%|βββββββββ | 2690/2975 [31:29<02:52, 1.65it/s]
90%|βββββββββ | 2691/2975 [31:30<02:52, 1.65it/s]
90%|βββββββββ | 2692/2975 [31:30<02:51, 1.65it/s]
91%|βββββββββ | 2693/2975 [31:31<02:50, 1.66it/s]
91%|βββββββββ | 2694/2975 [31:31<02:49, 1.66it/s]
91%|βββββββββ | 2695/2975 [31:32<02:48, 1.66it/s]
91%|βββββββββ | 2696/2975 [31:33<02:47, 1.66it/s]
91%|βββββββββ | 2697/2975 [31:33<02:47, 1.66it/s]
91%|βββββββββ | 2698/2975 [31:34<02:46, 1.66it/s]
91%|βββββββββ | 2699/2975 [31:34<02:46, 1.66it/s]
91%|βββββββββ | 2700/2975 [31:35<02:45, 1.66it/s]
| 0: {'loss': 0.6899, 'grad_norm': 0.7517116214718325, 'learning_rate': 9.858624225078841e-06, 'epoch': 0.91} |
| 0: {'loss': 0.708, 'grad_norm': 0.7320599916190836, 'learning_rate': 9.717768952713514e-06, 'epoch': 0.91} |
91%|βββββββββ | 2701/2975 [31:36<02:44, 1.66it/s]
91%|βββββββββ | 2702/2975 [31:36<02:44, 1.66it/s]
91%|βββββββββ | 2703/2975 [31:37<02:44, 1.66it/s]
91%|βββββββββ | 2704/2975 [31:37<02:43, 1.65it/s]
91%|βββββββββ | 2705/2975 [31:38<02:42, 1.66it/s]
91%|βββββββββ | 2706/2975 [31:39<02:42, 1.66it/s]
91%|βββββββββ | 2707/2975 [31:39<02:41, 1.66it/s]
91%|βββββββββ | 2708/2975 [31:40<02:40, 1.66it/s]
91%|βββββββββ | 2709/2975 [31:40<02:40, 1.66it/s]
91%|βββββββββ | 2710/2975 [31:41<02:39, 1.66it/s]
91%|βββββββββ | 2711/2975 [31:42<02:39, 1.66it/s]
91%|βββββββββ | 2712/2975 [31:42<02:38, 1.66it/s]
| 0: {'loss': 0.696, 'grad_norm': 0.7651357941181524, 'learning_rate': 9.530702921077358e-06, 'epoch': 0.91} |
91%|βββββββββ | 2713/2975 [31:43<02:37, 1.66it/s]
91%|βββββββββ | 2714/2975 [31:43<02:37, 1.66it/s]
91%|ββββββββββ| 2715/2975 [31:44<02:59, 1.45it/s]
91%|ββββββββββ| 2716/2975 [31:45<02:52, 1.50it/s]
91%|ββββββββββ| 2717/2975 [31:46<02:46, 1.55it/s]
91%|ββββββββββ| 2718/2975 [31:46<02:42, 1.58it/s]
91%|ββββββββββ| 2719/2975 [31:47<02:39, 1.60it/s]
91%|ββββββββββ| 2720/2975 [31:47<02:37, 1.62it/s]
91%|ββββββββββ| 2721/2975 [31:48<02:35, 1.63it/s]
91%|ββββββββββ| 2722/2975 [31:49<02:34, 1.64it/s]
92%|ββββββββββ| 2723/2975 [31:49<02:32, 1.65it/s]
92%|ββββββββββ| 2724/2975 [31:50<02:32, 1.65it/s]
92%|ββββββββββ| 2725/2975 [31:50<02:31, 1.65it/s]
| 0: {'loss': 0.7053, 'grad_norm': 0.7197188710960789, 'learning_rate': 9.29947566475907e-06, 'epoch': 0.92} |
92%|ββββββββββ| 2726/2975 [31:51<02:30, 1.66it/s]
92%|ββββββββββ| 2727/2975 [31:52<02:29, 1.66it/s]
92%|ββββββββββ| 2728/2975 [31:52<02:29, 1.66it/s]
92%|ββββββββββ| 2729/2975 [31:53<02:28, 1.66it/s]
92%|ββββββββββ| 2730/2975 [31:53<02:27, 1.66it/s]
92%|ββββββββββ| 2731/2975 [31:54<02:27, 1.66it/s]
92%|ββββββββββ| 2732/2975 [31:55<02:26, 1.66it/s]
92%|ββββββββββ| 2733/2975 [31:55<02:25, 1.66it/s]
92%|ββββββββββ| 2734/2975 [31:56<02:25, 1.66it/s]
92%|ββββββββββ| 2735/2975 [31:56<02:24, 1.66it/s]
92%|ββββββββββ| 2736/2975 [31:57<02:24, 1.66it/s]
92%|ββββββββββ| 2737/2975 [31:58<02:23, 1.66it/s]
| 0: {'loss': 0.6905, 'grad_norm': 0.7172619922394774, 'learning_rate': 9.02662055796628e-06, 'epoch': 0.92} |
92%|ββββββββββ| 2738/2975 [31:58<02:44, 1.44it/s]
92%|ββββββββββ| 2739/2975 [31:59<02:58, 1.32it/s]
92%|ββββββββββ| 2740/2975 [32:00<02:46, 1.41it/s]
92%|ββββββββββ| 2741/2975 [32:01<02:38, 1.47it/s]
92%|ββββββββββ| 2742/2975 [32:01<02:32, 1.53it/s]
92%|ββββββββββ| 2743/2975 [32:02<02:28, 1.56it/s]
92%|ββββββββββ| 2744/2975 [32:02<02:25, 1.59it/s]
92%|ββββββββββ| 2745/2975 [32:03<02:22, 1.61it/s]
92%|ββββββββββ| 2746/2975 [32:04<02:20, 1.63it/s]
92%|ββββββββββ| 2747/2975 [32:04<02:19, 1.64it/s]
92%|ββββββββββ| 2748/2975 [32:05<02:18, 1.64it/s]
92%|ββββββββββ| 2749/2975 [32:05<02:17, 1.64it/s]
92%|ββββββββββ| 2750/2975 [32:06<02:16, 1.65it/s]
| 0: {'loss': 0.6869, 'grad_norm': 0.7121173622621597, 'learning_rate': 8.715127058347615e-06, 'epoch': 0.92} |
| 0: {'loss': 0.7213, 'grad_norm': 0.7341579336149818, 'learning_rate': 8.368407953869105e-06, 'epoch': 0.93} |
92%|ββββββββββ| 2751/2975 [32:07<02:15, 1.65it/s]
93%|ββββββββββ| 2752/2975 [32:07<02:15, 1.65it/s]
93%|ββββββββββ| 2753/2975 [32:08<02:14, 1.66it/s]
93%|ββββββββββ| 2754/2975 [32:08<02:13, 1.66it/s]
93%|ββββββββββ| 2755/2975 [32:09<02:12, 1.66it/s]
93%|ββββββββββ| 2756/2975 [32:10<02:32, 1.44it/s]
93%|ββββββββββ| 2757/2975 [32:11<02:25, 1.50it/s]
93%|ββββββββββ| 2758/2975 [32:11<02:20, 1.54it/s]
93%|ββββββββββ| 2759/2975 [32:12<02:17, 1.58it/s]
93%|ββββββββββ| 2760/2975 [32:12<02:14, 1.60it/s]
| 0: {'loss': 0.7156, 'grad_norm': 0.7549621234139194, 'learning_rate': 7.99026197159505e-06, 'epoch': 0.93} |
93%|ββββββββββ| 2761/2975 [32:13<02:12, 1.62it/s]
93%|ββββββββββ| 2762/2975 [32:14<02:10, 1.63it/s]
93%|ββββββββββ| 2763/2975 [32:14<02:09, 1.64it/s]
93%|ββββββββββ| 2764/2975 [32:15<02:07, 1.65it/s]
93%|ββββββββββ| 2765/2975 [32:15<02:06, 1.66it/s]
93%|ββββββββββ| 2766/2975 [32:16<02:06, 1.66it/s]
93%|ββββββββββ| 2767/2975 [32:17<02:05, 1.66it/s]
93%|ββββββββββ| 2768/2975 [32:17<02:04, 1.66it/s]
93%|ββββββββββ| 2769/2975 [32:18<02:04, 1.66it/s]
93%|ββββββββββ| 2770/2975 [32:18<02:03, 1.66it/s]
93%|ββββββββββ| 2771/2975 [32:19<02:02, 1.66it/s]
93%|ββββββββββ| 2772/2975 [32:20<02:02, 1.66it/s]
93%|ββββββββββ| 2773/2975 [32:20<02:01, 1.66it/s]
| 0: {'loss': 0.6948, 'grad_norm': 0.7215693093672365, 'learning_rate': 7.584832158039379e-06, 'epoch': 0.93} |
93%|ββββββββββ| 2774/2975 [32:21<02:01, 1.66it/s]
93%|ββββββββββ| 2775/2975 [32:21<02:00, 1.66it/s]
93%|ββββββββββ| 2776/2975 [32:22<01:59, 1.66it/s]
93%|ββββββββββ| 2777/2975 [32:23<01:59, 1.66it/s]
93%|ββββββββββ| 2778/2975 [32:23<01:58, 1.66it/s]
93%|ββββββββββ| 2779/2975 [32:24<01:57, 1.66it/s]
93%|ββββββββββ| 2780/2975 [32:24<01:57, 1.66it/s]
93%|ββββββββββ| 2781/2975 [32:25<01:56, 1.66it/s]
94%|ββββββββββ| 2782/2975 [32:26<01:56, 1.66it/s]
94%|ββββββββββ| 2783/2975 [32:26<01:55, 1.66it/s]
94%|ββββββββββ| 2784/2975 [32:27<01:55, 1.66it/s]
94%|ββββββββββ| 2785/2975 [32:27<01:54, 1.66it/s]
| 0: {'loss': 0.6965, 'grad_norm': 0.7326912491360482, 'learning_rate': 7.156560487081052e-06, 'epoch': 0.94} |
94%|ββββββββββ| 2786/2975 [32:28<01:53, 1.66it/s]
94%|ββββββββββ| 2787/2975 [32:29<01:53, 1.66it/s]
94%|ββββββββββ| 2788/2975 [32:29<01:52, 1.66it/s]
94%|ββββββββββ| 2789/2975 [32:30<01:52, 1.66it/s]
94%|ββββββββββ| 2790/2975 [32:30<01:51, 1.66it/s]
94%|ββββββββββ| 2791/2975 [32:31<01:51, 1.66it/s]
94%|ββββββββββ| 2792/2975 [32:32<01:50, 1.66it/s]
94%|ββββββββββ| 2793/2975 [32:32<01:49, 1.66it/s]
94%|ββββββββββ| 2794/2975 [32:33<01:49, 1.66it/s]
94%|ββββββββββ| 2795/2975 [32:33<01:48, 1.66it/s]
94%|ββββββββββ| 2796/2975 [32:34<01:47, 1.66it/s]
94%|ββββββββββ| 2797/2975 [32:35<01:47, 1.66it/s]
94%|ββββββββββ| 2798/2975 [32:35<01:46, 1.66it/s]
| 0: {'loss': 0.6906, 'grad_norm': 0.72139445955845, 'learning_rate': 6.710139192768695e-06, 'epoch': 0.94} |
94%|ββββββββββ| 2799/2975 [32:36<01:45, 1.66it/s]
94%|ββββββββββ| 2800/2975 [32:36<01:45, 1.66it/s]
94%|ββββββββββ| 2801/2975 [32:37<01:45, 1.66it/s]
94%|ββββββββββ| 2802/2975 [32:38<01:57, 1.47it/s]
94%|ββββββββββ| 2803/2975 [32:39<01:53, 1.52it/s]
94%|ββββββββββ| 2804/2975 [32:39<01:49, 1.56it/s]
94%|ββββββββββ| 2805/2975 [32:40<01:47, 1.58it/s]
94%|ββββββββββ| 2806/2975 [32:40<01:45, 1.60it/s]
94%|ββββββββββ| 2807/2975 [32:41<01:43, 1.62it/s]
94%|ββββββββββ| 2808/2975 [32:42<01:42, 1.63it/s]
94%|ββββββββββ| 2809/2975 [32:42<01:41, 1.63it/s]
94%|ββββββββββ| 2810/2975 [32:43<01:40, 1.64it/s]
| 0: {'loss': 0.6742, 'grad_norm': 0.6937605745619208, 'learning_rate': 6.25045936022246e-06, 'epoch': 0.94} |
| 0: {'loss': 0.698, 'grad_norm': 0.7320316503097014, 'learning_rate': 5.782557337881911e-06, 'epoch': 0.95} |
94%|ββββββββββ| 2811/2975 [32:43<01:39, 1.64it/s]
95%|ββββββββββ| 2812/2975 [32:44<01:39, 1.63it/s]
95%|ββββββββββ| 2813/2975 [32:45<01:38, 1.64it/s]
95%|ββββββββββ| 2814/2975 [32:45<01:37, 1.65it/s]
95%|ββββββββββ| 2815/2975 [32:46<01:36, 1.65it/s]
95%|ββββββββββ| 2816/2975 [32:46<01:36, 1.65it/s]
95%|ββββββββββ| 2817/2975 [32:47<01:35, 1.65it/s]
95%|ββββββββββ| 2818/2975 [32:48<01:34, 1.66it/s]
95%|ββββββββββ| 2819/2975 [32:48<01:34, 1.66it/s]
95%|ββββββββββ| 2820/2975 [32:49<01:33, 1.66it/s]
95%|ββββββββββ| 2821/2975 [32:49<01:33, 1.66it/s]
| 0: {'loss': 0.695, 'grad_norm': 0.7076182410530591, 'learning_rate': 5.311559558218603e-06, 'epoch': 0.95} |
95%|ββββββββββ| 2822/2975 [32:50<01:32, 1.66it/s]
95%|█████████▍| 2823/2975 [32:51<01:31, 1.66it/s]
95%|█████████▍| 2824/2975 [32:51<01:30, 1.66it/s]
95%|█████████▍| 2825/2975 [32:52<01:30, 1.66it/s]
95%|█████████▍| 2826/2975 [32:52<01:29, 1.66it/s]
95%|█████████▌| 2827/2975 [32:53<01:29, 1.66it/s]
95%|█████████▌| 2828/2975 [32:54<01:28, 1.66it/s]
95%|█████████▌| 2829/2975 [32:54<01:27, 1.66it/s]
95%|█████████▌| 2830/2975 [32:55<01:27, 1.66it/s]
95%|█████████▌| 2831/2975 [32:55<01:26, 1.66it/s]
95%|█████████▌| 2832/2975 [32:56<01:26, 1.66it/s]
95%|█████████▌| 2833/2975 [32:57<01:25, 1.66it/s]
95%|█████████▌| 2834/2975 [32:57<01:24, 1.66it/s] |
| 0: {'loss': 0.6936, 'grad_norm': 0.7227746574659001, 'learning_rate': 4.842626371469149e-06, 'epoch': 0.95} |
| 0: 95%|█████████▌| 2835/2975 [32:58<01:24, 1.66it/s]
95%|█████████▌| 2836/2975 [32:58<01:23, 1.66it/s]
95%|█████████▌| 2837/2975 [32:59<01:23, 1.66it/s]
95%|█████████▌| 2838/2975 [33:00<01:22, 1.66it/s]
95%|█████████▌| 2839/2975 [33:00<01:21, 1.66it/s]
95%|█████████▌| 2840/2975 [33:01<01:21, 1.66it/s]
95%|█████████▌| 2841/2975 [33:01<01:20, 1.66it/s]
96%|█████████▌| 2842/2975 [33:02<01:20, 1.66it/s]
96%|█████████▌| 2843/2975 [33:03<01:19, 1.66it/s]
96%|█████████▌| 2844/2975 [33:03<01:18, 1.66it/s]
96%|█████████▌| 2845/2975 [33:04<01:18, 1.66it/s]
96%|█████████▌| 2846/2975 [33:04<01:17, 1.66it/s] |
| 0: {'loss': 0.6863, 'grad_norm': 0.7091047859832311, 'learning_rate': 4.380895507758155e-06, 'epoch': 0.96} |
| 0: 96%|█████████▌| 2847/2975 [33:05<01:16, 1.66it/s]
96%|█████████▌| 2848/2975 [33:06<01:16, 1.66it/s]
96%|█████████▌| 2849/2975 [33:06<01:15, 1.66it/s]
96%|█████████▌| 2850/2975 [33:07<01:15, 1.66it/s]
96%|█████████▌| 2851/2975 [33:07<01:14, 1.66it/s]
96%|█████████▌| 2852/2975 [33:08<01:14, 1.66it/s]
96%|█████████▌| 2853/2975 [33:09<01:13, 1.66it/s]
96%|█████████▌| 2854/2975 [33:09<01:12, 1.66it/s]
96%|█████████▌| 2855/2975 [33:10<01:12, 1.66it/s]
96%|█████████▌| 2856/2975 [33:10<01:11, 1.66it/s]
96%|█████████▌| 2857/2975 [33:11<01:10, 1.66it/s]
96%|█████████▌| 2858/2975 [33:12<01:10, 1.66it/s]
96%|█████████▌| 2859/2975 [33:12<01:09, 1.66it/s] |
| 0: {'loss': 0.6856, 'grad_norm': 0.7247132123386185, 'learning_rate': 3.931425787051832e-06, 'epoch': 0.96} |
| 0: 96%|█████████▌| 2860/2975 [33:13<01:09, 1.66it/s]
96%|█████████▌| 2861/2975 [33:13<01:08, 1.66it/s]
96%|█████████▌| 2862/2975 [33:14<01:08, 1.66it/s]
96%|█████████▌| 2863/2975 [33:15<01:07, 1.66it/s]
96%|█████████▋| 2864/2975 [33:15<01:06, 1.66it/s]
96%|█████████▋| 2865/2975 [33:16<01:06, 1.66it/s]
96%|█████████▋| 2866/2975 [33:17<01:05, 1.66it/s]
96%|█████████▋| 2867/2975 [33:17<01:05, 1.66it/s]
96%|█████████▋| 2868/2975 [33:18<01:04, 1.66it/s]
96%|█████████▋| 2869/2975 [33:18<01:03, 1.66it/s]
96%|█████████▋| 2870/2975 [33:19<01:03, 1.66it/s] |
| 0: {'loss': 0.7047, 'grad_norm': 0.6897697496973473, 'learning_rate': 3.499141693667828e-06, 'epoch': 0.96} |
| 0: {'loss': 0.6937, 'grad_norm': 0.7064837688525987, 'learning_rate': 3.0887794225945143e-06, 'epoch': 0.97} |
| 0: 96%|█████████▋| 2870/2975 [33:19<01:03, 1.66it/s]
97%|█████████▋| 2871/2975 [33:20<01:02, 1.66it/s]
97%|█████████▋| 2872/2975 [33:20<01:02, 1.66it/s]
97%|█████████▋| 2873/2975 [33:21<01:01, 1.66it/s]
97%|█████████▋| 2874/2975 [33:21<01:00, 1.66it/s]
97%|█████████▋| 2875/2975 [33:22<01:00, 1.66it/s]
97%|█████████▋| 2876/2975 [33:23<00:59, 1.66it/s]
97%|█████████▋| 2877/2975 [33:23<00:58, 1.66it/s]
97%|█████████▋| 2878/2975 [33:24<00:58, 1.66it/s]
97%|█████████▋| 2879/2975 [33:24<00:57, 1.66it/s]
97%|█████████▋| 2880/2975 [33:25<00:57, 1.66it/s]
97%|█████████▋| 2881/2975 [33:26<00:56, 1.66it/s]
97%|█████████▋| 2882/2975 [33:26<00:55, 1.66it/s] |
| 0: {'loss': 0.6974, 'grad_norm': 0.7154357858752877, 'learning_rate': 2.7048349887476038e-06, 'epoch': 0.97} |
| 0: 97%|█████████▋| 2883/2975 [33:27<00:55, 1.66it/s]
97%|█████████▋| 2884/2975 [33:27<00:54, 1.66it/s]
97%|█████████▋| 2885/2975 [33:28<00:54, 1.66it/s]
97%|█████████▋| 2886/2975 [33:29<00:53, 1.66it/s]
97%|█████████▋| 2887/2975 [33:29<00:53, 1.66it/s]
97%|█████████▋| 2888/2975 [33:30<00:52, 1.66it/s]
97%|█████████▋| 2889/2975 [33:30<00:51, 1.66it/s]
97%|█████████▋| 2890/2975 [33:31<00:51, 1.66it/s]
97%|█████████▋| 2891/2975 [33:32<00:50, 1.66it/s]
97%|█████████▋| 2892/2975 [33:32<00:50, 1.66it/s]
97%|█████████▋| 2893/2975 [33:33<00:49, 1.66it/s]
97%|█████████▋| 2894/2975 [33:33<00:48, 1.66it/s] |
| 0: {'loss': 0.7042, 'grad_norm': 0.6873844650072911, 'learning_rate': 2.3515149676898554e-06, 'epoch': 0.97} |
| 0: 97%|█████████▋| 2895/2975 [33:34<00:48, 1.66it/s]
97%|█████████▋| 2896/2975 [33:35<00:47, 1.66it/s]
97%|█████████▋| 2897/2975 [33:35<00:46, 1.66it/s]
97%|█████████▋| 2898/2975 [33:36<00:46, 1.66it/s]
97%|█████████▋| 2899/2975 [33:36<00:45, 1.66it/s]
97%|█████████▋| 2900/2975 [33:37<00:45, 1.66it/s]
98%|█████████▊| 2901/2975 [33:38<00:44, 1.66it/s]
98%|█████████▊| 2902/2975 [33:38<00:43, 1.66it/s]
98%|█████████▊| 2903/2975 [33:39<00:43, 1.66it/s]
98%|█████████▊| 2904/2975 [33:39<00:42, 1.66it/s]
98%|█████████▊| 2905/2975 [33:40<00:42, 1.66it/s]
98%|█████████▊| 2906/2975 [33:41<00:41, 1.66it/s]
98%|█████████▊| 2907/2975 [33:41<00:40, 1.66it/s] |
| 0: {'loss': 0.6907, 'grad_norm': 0.6885606635440618, 'learning_rate': 2.032690407508949e-06, 'epoch': 0.98} |
| 0: 98%|█████████▊| 2908/2975 [33:42<00:40, 1.66it/s]
98%|█████████▊| 2909/2975 [33:42<00:39, 1.66it/s]
98%|█████████▊| 2910/2975 [33:43<00:39, 1.66it/s]
98%|█████████▊| 2911/2975 [33:44<00:38, 1.66it/s]
98%|█████████▊| 2912/2975 [33:44<00:37, 1.66it/s]
98%|█████████▊| 2913/2975 [33:45<00:37, 1.66it/s]
98%|█████████▊| 2914/2975 [33:45<00:36, 1.66it/s]
98%|█████████▊| 2915/2975 [33:46<00:36, 1.66it/s]
98%|█████████▊| 2916/2975 [33:47<00:35, 1.66it/s]
98%|█████████▊| 2917/2975 [33:47<00:35, 1.63it/s]
98%|█████████▊| 2918/2975 [33:48<00:34, 1.64it/s]
98%|█████████▊| 2919/2975 [33:48<00:34, 1.65it/s] |
| 0: {'loss': 0.6971, 'grad_norm': 0.6687159290240817, 'learning_rate': 1.7518544168045527e-06, 'epoch': 0.98} |
| 0: {'loss': 0.6822, 'grad_norm': 0.6974668873540893, 'learning_rate': 1.5120838934595338e-06, 'epoch': 0.98} |
| 0: 98%|█████████▊| 2920/2975 [33:49<00:33, 1.65it/s]
98%|█████████▊| 2921/2975 [33:50<00:32, 1.65it/s]
98%|█████████▊| 2922/2975 [33:50<00:32, 1.65it/s]
98%|█████████▊| 2923/2975 [33:51<00:31, 1.65it/s]
98%|█████████▊| 2924/2975 [33:52<00:30, 1.65it/s]
98%|█████████▊| 2925/2975 [33:52<00:30, 1.65it/s]
98%|█████████▊| 2926/2975 [33:53<00:29, 1.65it/s]
98%|█████████▊| 2927/2975 [33:53<00:29, 1.64it/s]
98%|█████████▊| 2928/2975 [33:54<00:28, 1.65it/s]
98%|█████████▊| 2929/2975 [33:55<00:27, 1.65it/s]
98%|█████████▊| 2930/2975 [33:55<00:27, 1.65it/s] |
| 0: {'loss': 0.6865, 'grad_norm': 0.6900700149375906, 'learning_rate': 1.316005813502869e-06, 'epoch': 0.99} |
| 0: 99%|█████████▊| 2931/2975 [33:56<00:26, 1.65it/s]
99%|█████████▊| 2932/2975 [33:56<00:26, 1.65it/s]
99%|█████████▊| 2933/2975 [33:57<00:25, 1.65it/s]
99%|█████████▊| 2934/2975 [33:58<00:24, 1.65it/s]
99%|█████████▊| 2935/2975 [33:58<00:24, 1.65it/s]
99%|█████████▊| 2936/2975 [33:59<00:23, 1.65it/s]
99%|█████████▊| 2937/2975 [33:59<00:23, 1.65it/s]
99%|█████████▉| 2938/2975 [34:00<00:22, 1.65it/s]
99%|█████████▉| 2939/2975 [34:01<00:21, 1.65it/s]
99%|█████████▉| 2940/2975 [34:01<00:21, 1.66it/s]
99%|█████████▉| 2941/2975 [34:02<00:20, 1.66it/s]
99%|█████████▉| 2942/2975 [34:02<00:19, 1.66it/s] |
| 0: {'loss': 0.7135, 'grad_norm': 0.6716948809212612, 'learning_rate': 1.1657684494105386e-06, 'epoch': 0.99} |
| 0: 99%|█████████▉| 2943/2975 [34:03<00:19, 1.66it/s]
99%|█████████▉| 2944/2975 [34:04<00:18, 1.66it/s]
99%|█████████▉| 2945/2975 [34:04<00:18, 1.66it/s]
99%|█████████▉| 2946/2975 [34:05<00:17, 1.66it/s]
99%|█████████▉| 2947/2975 [34:05<00:16, 1.66it/s]
99%|█████████▉| 2948/2975 [34:06<00:16, 1.66it/s]
99%|█████████▉| 2949/2975 [34:07<00:15, 1.66it/s]
99%|█████████▉| 2950/2975 [34:07<00:15, 1.66it/s]
99%|█████████▉| 2951/2975 [34:08<00:14, 1.66it/s]
99%|█████████▉| 2952/2975 [34:08<00:13, 1.66it/s]
99%|█████████▉| 2953/2975 [34:09<00:13, 1.66it/s]
99%|█████████▉| 2954/2975 [34:10<00:12, 1.66it/s]
99%|█████████▉| 2955/2975 [34:10<00:12, 1.66it/s] |
| 0: {'loss': 0.6859, 'grad_norm': 0.6712348667758804, 'learning_rate': 1.0630178331827281e-06, 'epoch': 0.99} |
| 0: 99%|█████████▉| 2956/2975 [34:11<00:11, 1.66it/s]
99%|█████████▉| 2957/2975 [34:11<00:10, 1.66it/s]
99%|█████████▉| 2958/2975 [34:12<00:10, 1.66it/s]
99%|█████████▉| 2959/2975 [34:13<00:09, 1.66it/s]
99%|█████████▉| 2960/2975 [34:13<00:09, 1.66it/s]
100%|█████████▉| 2961/2975 [34:14<00:08, 1.66it/s]
100%|█████████▉| 2962/2975 [34:14<00:07, 1.66it/s]
100%|█████████▉| 2963/2975 [34:15<00:07, 1.66it/s]
100%|█████████▉| 2964/2975 [34:16<00:06, 1.66it/s]
100%|█████████▉| 2965/2975 [34:16<00:06, 1.66it/s]
100%|█████████▉| 2966/2975 [34:17<00:05, 1.66it/s]
100%|█████████▉| 2967/2975 [34:17<00:04, 1.65it/s] |
| 0: {'loss': 0.6809, 'grad_norm': 0.6768243143096558, 'learning_rate': 1.008879722072778e-06, 'epoch': 1.0} |
| 0: {'train_runtime': 2064.8117, 'train_samples_per_second': 23.053, 'train_steps_per_second': 1.441, 'train_loss': 0.7352094054021755, 'epoch': 1.0} |
| 0: 100%|█████████▉| 2968/2975 [34:18<00:04, 1.66it/s]
100%|█████████▉| 2969/2975 [34:19<00:03, 1.66it/s]
100%|█████████▉| 2970/2975 [34:19<00:03, 1.66it/s]
100%|█████████▉| 2971/2975 [34:20<00:02, 1.66it/s]
100%|█████████▉| 2972/2975 [34:20<00:01, 1.66it/s]
100%|█████████▉| 2973/2975 [34:21<00:01, 1.66it/s]
100%|█████████▉| 2974/2975 [34:22<00:00, 1.66it/s]
100%|██████████| 2975/2975 [34:22<00:00, 1.66it/s]
100%|██████████| 2975/2975 [34:24<00:00, 1.66it/s]
100%|██████████| 2975/2975 [34:24<00:00, 1.44it/s] |
| 0: [2025-08-20 17:37:09,165] [INFO] [axolotl.train.save_trained_model:246] [PID:880949] [RANK:0] Training completed! Saving trained model to /lustre/fswork/projects/rech/dgo/udv55np/ift/Qwen3-235B-A22B/Qwen2.5-0.5B/0. |
| 0: [2025-08-20 17:37:10,570] [INFO] [axolotl.train.save_trained_model:331] [PID:880949] [RANK:0] Model successfully saved to /lustre/fswork/projects/rech/dgo/udv55np/ift/Qwen3-235B-A22B/Qwen2.5-0.5B/0 |
| |