alidenewade commited on
Commit
68388c3
·
verified ·
1 Parent(s): 53e515a

Upload folder using huggingface_hub

Browse files
.summary/0/events.out.tfevents.1730982413.ali ADDED
File without changes
.summary/0/events.out.tfevents.1730982684.ali ADDED
File without changes
.summary/0/events.out.tfevents.1730982741.ali ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:814f1a05c62696ae377c67fe385aa2b78f2e1f1bb048a26fdfd6d51ce9706129
3
+ size 40
.summary/0/events.out.tfevents.1730982907.ali ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b77ed93e786f21de8587f4a2cf8852dbda3dbf10c6afaab58ea577ed51e36a71
3
+ size 418702
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
15
  type: doom_health_gathering_supreme
16
  metrics:
17
  - type: mean_reward
18
- value: 4.16 +/- 0.43
19
  name: mean_reward
20
  verified: false
21
  ---
 
15
  type: doom_health_gathering_supreme
16
  metrics:
17
  - type: mean_reward
18
+ value: 4.16 +/- 0.40
19
  name: mean_reward
20
  verified: false
21
  ---
checkpoint_p0/checkpoint_000001806_7397376.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45da3df3b08606fd196e42a86db3419ab04d7a53e4a0dabcdbd1d4a373ff64fc
3
+ size 34929669
checkpoint_p0/checkpoint_000001955_8007680.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c40896e490d9f61fa149c75461188d66fac389407b5485ac47f0d95ca71c25c
3
+ size 34929669
config.json CHANGED
@@ -65,7 +65,7 @@
65
  "summaries_use_frameskip": true,
66
  "heartbeat_interval": 20,
67
  "heartbeat_reporting_interval": 600,
68
- "train_for_env_steps": 4000000,
69
  "train_for_seconds": 10000000000,
70
  "save_every_sec": 120,
71
  "keep_checkpoints": 2,
 
65
  "summaries_use_frameskip": true,
66
  "heartbeat_interval": 20,
67
  "heartbeat_reporting_interval": 600,
68
+ "train_for_env_steps": 8000000,
69
  "train_for_seconds": 10000000000,
70
  "save_every_sec": 120,
71
  "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e4400a542b41ef1ba1ffb960e294df641800f7312d7253a079285a19b13c8f2d
3
- size 5915714
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ebbd919fbdc3030800efb2f1e143d54c8c67bdda23aefd40b82e2eb8c7ac065
3
+ size 6180137
sf_log.txt CHANGED
@@ -4906,3 +4906,902 @@ main_loop: 54.9388
4906
  [2024-11-07 14:22:33,260][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
4907
  [2024-11-07 14:22:33,261][01364] Avg episode reward: 4.660, avg true_objective: 4.160
4908
  [2024-11-07 14:22:46,334][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4906
  [2024-11-07 14:22:33,260][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
4907
  [2024-11-07 14:22:33,261][01364] Avg episode reward: 4.660, avg true_objective: 4.160
4908
  [2024-11-07 14:22:46,334][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
4909
+ [2024-11-07 14:22:51,976][01364] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme
4910
+ [2024-11-07 14:26:53,545][01364] Environment doom_basic already registered, overwriting...
4911
+ [2024-11-07 14:26:53,548][01364] Environment doom_two_colors_easy already registered, overwriting...
4912
+ [2024-11-07 14:26:53,550][01364] Environment doom_two_colors_hard already registered, overwriting...
4913
+ [2024-11-07 14:26:53,552][01364] Environment doom_dm already registered, overwriting...
4914
+ [2024-11-07 14:26:53,553][01364] Environment doom_dwango5 already registered, overwriting...
4915
+ [2024-11-07 14:26:53,555][01364] Environment doom_my_way_home_flat_actions already registered, overwriting...
4916
+ [2024-11-07 14:26:53,556][01364] Environment doom_defend_the_center_flat_actions already registered, overwriting...
4917
+ [2024-11-07 14:26:53,558][01364] Environment doom_my_way_home already registered, overwriting...
4918
+ [2024-11-07 14:26:53,559][01364] Environment doom_deadly_corridor already registered, overwriting...
4919
+ [2024-11-07 14:26:53,561][01364] Environment doom_defend_the_center already registered, overwriting...
4920
+ [2024-11-07 14:26:53,564][01364] Environment doom_defend_the_line already registered, overwriting...
4921
+ [2024-11-07 14:26:53,565][01364] Environment doom_health_gathering already registered, overwriting...
4922
+ [2024-11-07 14:26:53,565][01364] Environment doom_health_gathering_supreme already registered, overwriting...
4923
+ [2024-11-07 14:26:53,568][01364] Environment doom_battle already registered, overwriting...
4924
+ [2024-11-07 14:26:53,569][01364] Environment doom_battle2 already registered, overwriting...
4925
+ [2024-11-07 14:26:53,570][01364] Environment doom_duel_bots already registered, overwriting...
4926
+ [2024-11-07 14:26:53,572][01364] Environment doom_deathmatch_bots already registered, overwriting...
4927
+ [2024-11-07 14:26:53,574][01364] Environment doom_duel already registered, overwriting...
4928
+ [2024-11-07 14:26:53,575][01364] Environment doom_deathmatch_full already registered, overwriting...
4929
+ [2024-11-07 14:26:53,577][01364] Environment doom_benchmark already registered, overwriting...
4930
+ [2024-11-07 14:26:53,579][01364] register_encoder_factory: <function make_vizdoom_encoder at 0x7f6746896950>
4931
+ [2024-11-07 14:26:53,595][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
4932
+ [2024-11-07 14:26:53,597][01364] Overriding arg 'env' with value 'LunarLander-v2' passed from command line
4933
+ [2024-11-07 14:26:53,603][01364] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists!
4934
+ [2024-11-07 14:26:53,604][01364] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment...
4935
+ [2024-11-07 14:26:53,607][01364] Weights and Biases integration disabled
4936
+ [2024-11-07 14:26:53,612][01364] Environment var CUDA_VISIBLE_DEVICES is 0
4937
+
4938
+ [2024-11-07 14:32:23,905][03851] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
4939
+ [2024-11-07 14:32:23,907][03851] Rollout worker 0 uses device cpu
4940
+ [2024-11-07 14:32:23,908][03851] Rollout worker 1 uses device cpu
4941
+ [2024-11-07 14:32:23,909][03851] Rollout worker 2 uses device cpu
4942
+ [2024-11-07 14:32:23,911][03851] Rollout worker 3 uses device cpu
4943
+ [2024-11-07 14:32:23,913][03851] Rollout worker 4 uses device cpu
4944
+ [2024-11-07 14:32:23,914][03851] Rollout worker 5 uses device cpu
4945
+ [2024-11-07 14:32:23,916][03851] Rollout worker 6 uses device cpu
4946
+ [2024-11-07 14:32:23,918][03851] Rollout worker 7 uses device cpu
4947
+ [2024-11-07 14:32:23,982][03851] Using GPUs [0] for process 0 (actually maps to GPUs [0])
4948
+ [2024-11-07 14:32:23,983][03851] InferenceWorker_p0-w0: min num requests: 2
4949
+ [2024-11-07 14:32:24,069][03851] Starting all processes...
4950
+ [2024-11-07 14:32:24,070][03851] Starting process learner_proc0
4951
+ [2024-11-07 14:32:24,119][03851] Starting all processes...
4952
+ [2024-11-07 14:32:24,126][03851] Starting process inference_proc0-0
4953
+ [2024-11-07 14:32:24,126][03851] Starting process rollout_proc0
4954
+ [2024-11-07 14:32:24,127][03851] Starting process rollout_proc1
4955
+ [2024-11-07 14:32:24,128][03851] Starting process rollout_proc2
4956
+ [2024-11-07 14:32:24,130][03851] Starting process rollout_proc3
4957
+ [2024-11-07 14:32:24,131][03851] Starting process rollout_proc4
4958
+ [2024-11-07 14:32:24,132][03851] Starting process rollout_proc5
4959
+ [2024-11-07 14:32:24,133][03851] Starting process rollout_proc6
4960
+ [2024-11-07 14:32:24,134][03851] Starting process rollout_proc7
4961
+ [2024-11-07 14:32:28,240][04103] Worker 3 uses CPU cores [3]
4962
+ [2024-11-07 14:32:28,599][04099] Worker 0 uses CPU cores [0]
4963
+ [2024-11-07 14:32:28,800][04102] Worker 2 uses CPU cores [2]
4964
+ [2024-11-07 14:32:28,820][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0])
4965
+ [2024-11-07 14:32:28,820][04086] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
4966
+ [2024-11-07 14:32:28,984][04086] Num visible devices: 1
4967
+ [2024-11-07 14:32:29,023][04086] Starting seed is not provided
4968
+ [2024-11-07 14:32:29,023][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0])
4969
+ [2024-11-07 14:32:29,023][04086] Initializing actor-critic model on device cuda:0
4970
+ [2024-11-07 14:32:29,024][04086] RunningMeanStd input shape: (3, 72, 128)
4971
+ [2024-11-07 14:32:29,030][04086] RunningMeanStd input shape: (1,)
4972
+ [2024-11-07 14:32:29,031][04101] Worker 1 uses CPU cores [1]
4973
+ [2024-11-07 14:32:29,065][04086] ConvEncoder: input_channels=3
4974
+ [2024-11-07 14:32:29,179][04105] Worker 5 uses CPU cores [5]
4975
+ [2024-11-07 14:32:29,418][04086] Conv encoder output size: 512
4976
+ [2024-11-07 14:32:29,418][04086] Policy head output size: 512
4977
+ [2024-11-07 14:32:29,488][04086] Created Actor Critic model with architecture:
4978
+ [2024-11-07 14:32:29,488][04086] ActorCriticSharedWeights(
4979
+ (obs_normalizer): ObservationNormalizer(
4980
+ (running_mean_std): RunningMeanStdDictInPlace(
4981
+ (running_mean_std): ModuleDict(
4982
+ (obs): RunningMeanStdInPlace()
4983
+ )
4984
+ )
4985
+ )
4986
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
4987
+ (encoder): VizdoomEncoder(
4988
+ (basic_encoder): ConvEncoder(
4989
+ (enc): RecursiveScriptModule(
4990
+ original_name=ConvEncoderImpl
4991
+ (conv_head): RecursiveScriptModule(
4992
+ original_name=Sequential
4993
+ (0): RecursiveScriptModule(original_name=Conv2d)
4994
+ (1): RecursiveScriptModule(original_name=ELU)
4995
+ (2): RecursiveScriptModule(original_name=Conv2d)
4996
+ (3): RecursiveScriptModule(original_name=ELU)
4997
+ (4): RecursiveScriptModule(original_name=Conv2d)
4998
+ (5): RecursiveScriptModule(original_name=ELU)
4999
+ )
5000
+ (mlp_layers): RecursiveScriptModule(
5001
+ original_name=Sequential
5002
+ (0): RecursiveScriptModule(original_name=Linear)
5003
+ (1): RecursiveScriptModule(original_name=ELU)
5004
+ )
5005
+ )
5006
+ )
5007
+ )
5008
+ (core): ModelCoreRNN(
5009
+ (core): GRU(512, 512)
5010
+ )
5011
+ (decoder): MlpDecoder(
5012
+ (mlp): Identity()
5013
+ )
5014
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
5015
+ (action_parameterization): ActionParameterizationDefault(
5016
+ (distribution_linear): Linear(in_features=512, out_features=4, bias=True)
5017
+ )
5018
+ )
5019
+ [2024-11-07 14:32:29,721][04100] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5020
+ [2024-11-07 14:32:29,722][04100] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
5021
+ [2024-11-07 14:32:29,759][04100] Num visible devices: 1
5022
+ [2024-11-07 14:32:29,806][04106] Worker 6 uses CPU cores [6]
5023
+ [2024-11-07 14:32:29,878][04107] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
5024
+ [2024-11-07 14:32:29,920][04104] Worker 4 uses CPU cores [4]
5025
+ [2024-11-07 14:32:30,421][04086] Using optimizer <class 'torch.optim.adam.Adam'>
5026
+ [2024-11-07 14:32:31,436][04086] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth...
5027
+ [2024-11-07 14:32:31,484][04086] Loading model from checkpoint
5028
+ [2024-11-07 14:32:31,485][04086] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
5029
+ Traceback (most recent call last):
5030
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
5031
+ slot_callable(*args)
5032
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init
5033
+ init_model_data = self.learner.init()
5034
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 245, in init
5035
+ self.load_from_checkpoint(self.policy_id)
5036
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 307, in load_from_checkpoint
5037
+ self._load_state(checkpoint_dict, load_progress=load_progress)
5038
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 291, in _load_state
5039
+ self.actor_critic.load_state_dict(checkpoint_dict["model"])
5040
+ File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
5041
+ raise RuntimeError(
5042
+ RuntimeError: Error(s) in loading state_dict for ActorCriticSharedWeights:
5043
+ size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]).
5044
+ size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]).
5045
+ [2024-11-07 14:32:31,488][04086] Unhandled exception Error(s) in loading state_dict for ActorCriticSharedWeights:
5046
+ size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]).
5047
+ size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]). in evt loop learner_proc0_evt_loop
5048
+ [2024-11-07 14:32:43,973][03851] Heartbeat connected on Batcher_0
5049
+ [2024-11-07 14:32:43,982][03851] Heartbeat connected on InferenceWorker_p0-w0
5050
+ [2024-11-07 14:32:43,989][03851] Heartbeat connected on RolloutWorker_w0
5051
+ [2024-11-07 14:32:43,992][03851] Heartbeat connected on RolloutWorker_w1
5052
+ [2024-11-07 14:32:43,997][03851] Heartbeat connected on RolloutWorker_w2
5053
+ [2024-11-07 14:32:44,000][03851] Heartbeat connected on RolloutWorker_w3
5054
+ [2024-11-07 14:32:44,004][03851] Heartbeat connected on RolloutWorker_w4
5055
+ [2024-11-07 14:32:44,007][03851] Heartbeat connected on RolloutWorker_w5
5056
+ [2024-11-07 14:32:44,067][03851] Heartbeat connected on RolloutWorker_w6
5057
+ [2024-11-07 14:32:44,068][03851] Heartbeat connected on RolloutWorker_w7
5058
+ [2024-11-07 14:34:40,019][03851] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 3851], exiting...
5059
+ [2024-11-07 14:34:40,023][04100] Stopping InferenceWorker_p0-w0...
5060
+ [2024-11-07 14:34:40,023][04106] Stopping RolloutWorker_w6...
5061
+ [2024-11-07 14:34:40,023][04099] Stopping RolloutWorker_w0...
5062
+ [2024-11-07 14:34:40,023][04106] Loop rollout_proc6_evt_loop terminating...
5063
+ [2024-11-07 14:34:40,023][04100] Loop inference_proc0-0_evt_loop terminating...
5064
+ [2024-11-07 14:34:40,023][04099] Loop rollout_proc0_evt_loop terminating...
5065
+ [2024-11-07 14:34:40,022][04107] Stopping RolloutWorker_w7...
5066
+ [2024-11-07 14:34:40,024][04107] Loop rollout_proc7_evt_loop terminating...
5067
+ [2024-11-07 14:34:40,024][04104] Stopping RolloutWorker_w4...
5068
+ [2024-11-07 14:34:40,026][04104] Loop rollout_proc4_evt_loop terminating...
5069
+ [2024-11-07 14:34:40,023][03851] Runner profile tree view:
5070
+ main_loop: 135.9546
5071
+ [2024-11-07 14:34:40,027][04086] Stopping Batcher_0...
5072
+ [2024-11-07 14:34:40,028][04086] Loop batcher_evt_loop terminating...
5073
+ [2024-11-07 14:34:40,027][03851] Collected {}, FPS: 0.0
5074
+ [2024-11-07 14:34:40,029][04102] Stopping RolloutWorker_w2...
5075
+ [2024-11-07 14:34:40,029][04102] Loop rollout_proc2_evt_loop terminating...
5076
+ [2024-11-07 14:34:40,030][04105] Stopping RolloutWorker_w5...
5077
+ [2024-11-07 14:34:40,031][04105] Loop rollout_proc5_evt_loop terminating...
5078
+ [2024-11-07 14:34:40,034][04101] Stopping RolloutWorker_w1...
5079
+ [2024-11-07 14:34:40,040][04101] Loop rollout_proc1_evt_loop terminating...
5080
+ [2024-11-07 14:34:40,042][04103] Stopping RolloutWorker_w3...
5081
+ [2024-11-07 14:34:40,046][04103] Loop rollout_proc3_evt_loop terminating...
5082
+ [2024-11-07 14:35:11,133][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
5083
+ [2024-11-07 14:35:11,135][04584] Rollout worker 0 uses device cpu
5084
+ [2024-11-07 14:35:11,136][04584] Rollout worker 1 uses device cpu
5085
+ [2024-11-07 14:35:11,138][04584] Rollout worker 2 uses device cpu
5086
+ [2024-11-07 14:35:11,139][04584] Rollout worker 3 uses device cpu
5087
+ [2024-11-07 14:35:11,141][04584] Rollout worker 4 uses device cpu
5088
+ [2024-11-07 14:35:11,142][04584] Rollout worker 5 uses device cpu
5089
+ [2024-11-07 14:35:11,144][04584] Rollout worker 6 uses device cpu
5090
+ [2024-11-07 14:35:11,146][04584] Rollout worker 7 uses device cpu
5091
+ [2024-11-07 14:35:11,205][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5092
+ [2024-11-07 14:35:11,206][04584] InferenceWorker_p0-w0: min num requests: 2
5093
+ [2024-11-07 14:35:11,239][04584] Starting all processes...
5094
+ [2024-11-07 14:35:11,241][04584] Starting process learner_proc0
5095
+ [2024-11-07 14:35:11,376][04584] Starting all processes...
5096
+ [2024-11-07 14:35:11,383][04584] Starting process inference_proc0-0
5097
+ [2024-11-07 14:35:11,384][04584] Starting process rollout_proc0
5098
+ [2024-11-07 14:35:11,385][04584] Starting process rollout_proc1
5099
+ [2024-11-07 14:35:11,385][04584] Starting process rollout_proc2
5100
+ [2024-11-07 14:35:11,386][04584] Starting process rollout_proc3
5101
+ [2024-11-07 14:35:11,387][04584] Starting process rollout_proc4
5102
+ [2024-11-07 14:35:11,388][04584] Starting process rollout_proc5
5103
+ [2024-11-07 14:35:11,390][04584] Starting process rollout_proc6
5104
+ [2024-11-07 14:35:11,390][04584] Starting process rollout_proc7
5105
+ [2024-11-07 14:35:16,833][04708] Worker 6 uses CPU cores [6]
5106
+ [2024-11-07 14:35:16,839][04706] Worker 4 uses CPU cores [4]
5107
+ [2024-11-07 14:35:17,166][04709] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
5108
+ [2024-11-07 14:35:17,188][04705] Worker 2 uses CPU cores [2]
5109
+ [2024-11-07 14:35:17,642][04702] Worker 0 uses CPU cores [0]
5110
+ [2024-11-07 14:35:17,899][04707] Worker 5 uses CPU cores [5]
5111
+ [2024-11-07 14:35:17,907][04701] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5112
+ [2024-11-07 14:35:17,907][04701] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
5113
+ [2024-11-07 14:35:17,932][04703] Worker 1 uses CPU cores [1]
5114
+ [2024-11-07 14:35:17,951][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5115
+ [2024-11-07 14:35:17,952][04688] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
5116
+ [2024-11-07 14:35:17,978][04688] Num visible devices: 1
5117
+ [2024-11-07 14:35:17,978][04701] Num visible devices: 1
5118
+ [2024-11-07 14:35:17,993][04688] Starting seed is not provided
5119
+ [2024-11-07 14:35:17,993][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5120
+ [2024-11-07 14:35:17,994][04688] Initializing actor-critic model on device cuda:0
5121
+ [2024-11-07 14:35:17,994][04688] RunningMeanStd input shape: (3, 72, 128)
5122
+ [2024-11-07 14:35:17,995][04688] RunningMeanStd input shape: (1,)
5123
+ [2024-11-07 14:35:18,006][04688] ConvEncoder: input_channels=3
5124
+ [2024-11-07 14:35:18,132][04688] Conv encoder output size: 512
5125
+ [2024-11-07 14:35:18,133][04688] Policy head output size: 512
5126
+ [2024-11-07 14:35:18,148][04688] Created Actor Critic model with architecture:
5127
+ [2024-11-07 14:35:18,148][04688] ActorCriticSharedWeights(
5128
+ (obs_normalizer): ObservationNormalizer(
5129
+ (running_mean_std): RunningMeanStdDictInPlace(
5130
+ (running_mean_std): ModuleDict(
5131
+ (obs): RunningMeanStdInPlace()
5132
+ )
5133
+ )
5134
+ )
5135
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
5136
+ (encoder): VizdoomEncoder(
5137
+ (basic_encoder): ConvEncoder(
5138
+ (enc): RecursiveScriptModule(
5139
+ original_name=ConvEncoderImpl
5140
+ (conv_head): RecursiveScriptModule(
5141
+ original_name=Sequential
5142
+ (0): RecursiveScriptModule(original_name=Conv2d)
5143
+ (1): RecursiveScriptModule(original_name=ELU)
5144
+ (2): RecursiveScriptModule(original_name=Conv2d)
5145
+ (3): RecursiveScriptModule(original_name=ELU)
5146
+ (4): RecursiveScriptModule(original_name=Conv2d)
5147
+ (5): RecursiveScriptModule(original_name=ELU)
5148
+ )
5149
+ (mlp_layers): RecursiveScriptModule(
5150
+ original_name=Sequential
5151
+ (0): RecursiveScriptModule(original_name=Linear)
5152
+ (1): RecursiveScriptModule(original_name=ELU)
5153
+ )
5154
+ )
5155
+ )
5156
+ )
5157
+ (core): ModelCoreRNN(
5158
+ (core): GRU(512, 512)
5159
+ )
5160
+ (decoder): MlpDecoder(
5161
+ (mlp): Identity()
5162
+ )
5163
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
5164
+ (action_parameterization): ActionParameterizationDefault(
5165
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
5166
+ )
5167
+ )
5168
+ [2024-11-07 14:35:18,249][04704] Worker 3 uses CPU cores [3]
5169
+ [2024-11-07 14:35:18,791][04688] Using optimizer <class 'torch.optim.adam.Adam'>
5170
+ [2024-11-07 14:35:19,775][04688] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth...
5171
+ [2024-11-07 14:35:19,840][04688] Loading model from checkpoint
5172
+ [2024-11-07 14:35:19,842][04688] Loaded experiment state at self.train_step=986, self.env_steps=4038656
5173
+ [2024-11-07 14:35:19,842][04688] Initialized policy 0 weights for model version 986
5174
+ [2024-11-07 14:35:19,848][04688] LearnerWorker_p0 finished initialization!
5175
+ [2024-11-07 14:35:19,849][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0])
5176
+ [2024-11-07 14:35:20,005][04701] RunningMeanStd input shape: (3, 72, 128)
5177
+ [2024-11-07 14:35:20,006][04701] RunningMeanStd input shape: (1,)
5178
+ [2024-11-07 14:35:20,019][04701] ConvEncoder: input_channels=3
5179
+ [2024-11-07 14:35:20,124][04701] Conv encoder output size: 512
5180
+ [2024-11-07 14:35:20,124][04701] Policy head output size: 512
5181
+ [2024-11-07 14:35:20,165][04584] Inference worker 0-0 is ready!
5182
+ [2024-11-07 14:35:20,167][04584] All inference workers are ready! Signal rollout workers to start!
5183
+ [2024-11-07 14:35:20,253][04705] Doom resolution: 160x120, resize resolution: (128, 72)
5184
+ [2024-11-07 14:35:20,255][04703] Doom resolution: 160x120, resize resolution: (128, 72)
5185
+ [2024-11-07 14:35:20,257][04706] Doom resolution: 160x120, resize resolution: (128, 72)
5186
+ [2024-11-07 14:35:20,259][04707] Doom resolution: 160x120, resize resolution: (128, 72)
5187
+ [2024-11-07 14:35:20,265][04704] Doom resolution: 160x120, resize resolution: (128, 72)
5188
+ [2024-11-07 14:35:20,283][04708] Doom resolution: 160x120, resize resolution: (128, 72)
5189
+ [2024-11-07 14:35:20,317][04709] Doom resolution: 160x120, resize resolution: (128, 72)
5190
+ [2024-11-07 14:35:20,332][04702] Doom resolution: 160x120, resize resolution: (128, 72)
5191
+ [2024-11-07 14:35:20,647][04705] Decorrelating experience for 0 frames...
5192
+ [2024-11-07 14:35:20,650][04703] Decorrelating experience for 0 frames...
5193
+ [2024-11-07 14:35:20,663][04707] Decorrelating experience for 0 frames...
5194
+ [2024-11-07 14:35:20,681][04708] Decorrelating experience for 0 frames...
5195
+ [2024-11-07 14:35:20,693][04709] Decorrelating experience for 0 frames...
5196
+ [2024-11-07 14:35:20,977][04704] Decorrelating experience for 0 frames...
5197
+ [2024-11-07 14:35:21,024][04708] Decorrelating experience for 32 frames...
5198
+ [2024-11-07 14:35:21,043][04709] Decorrelating experience for 32 frames...
5199
+ [2024-11-07 14:35:21,063][04705] Decorrelating experience for 32 frames...
5200
+ [2024-11-07 14:35:21,064][04707] Decorrelating experience for 32 frames...
5201
+ [2024-11-07 14:35:21,066][04703] Decorrelating experience for 32 frames...
5202
+ [2024-11-07 14:35:21,373][04702] Decorrelating experience for 0 frames...
5203
+ [2024-11-07 14:35:21,407][04704] Decorrelating experience for 32 frames...
5204
+ [2024-11-07 14:35:21,456][04705] Decorrelating experience for 64 frames...
5205
+ [2024-11-07 14:35:21,457][04708] Decorrelating experience for 64 frames...
5206
+ [2024-11-07 14:35:21,525][04703] Decorrelating experience for 64 frames...
5207
+ [2024-11-07 14:35:21,694][04702] Decorrelating experience for 32 frames...
5208
+ [2024-11-07 14:35:21,817][04704] Decorrelating experience for 64 frames...
5209
+ [2024-11-07 14:35:21,843][04705] Decorrelating experience for 96 frames...
5210
+ [2024-11-07 14:35:21,844][04708] Decorrelating experience for 96 frames...
5211
+ [2024-11-07 14:35:22,066][04703] Decorrelating experience for 96 frames...
5212
+ [2024-11-07 14:35:22,068][04706] Decorrelating experience for 0 frames...
5213
+ [2024-11-07 14:35:22,108][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4038656. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
5214
+ [2024-11-07 14:35:22,184][04704] Decorrelating experience for 96 frames...
5215
+ [2024-11-07 14:35:22,274][04702] Decorrelating experience for 64 frames...
5216
+ [2024-11-07 14:35:22,403][04706] Decorrelating experience for 32 frames...
5217
+ [2024-11-07 14:35:22,687][04702] Decorrelating experience for 96 frames...
5218
+ [2024-11-07 14:35:22,831][04707] Decorrelating experience for 64 frames...
5219
+ [2024-11-07 14:35:23,152][04706] Decorrelating experience for 64 frames...
5220
+ [2024-11-07 14:35:23,300][04707] Decorrelating experience for 96 frames...
5221
+ [2024-11-07 14:35:23,313][04709] Decorrelating experience for 64 frames...
5222
+ [2024-11-07 14:35:23,775][04706] Decorrelating experience for 96 frames...
5223
+ [2024-11-07 14:35:24,023][04709] Decorrelating experience for 96 frames...
5224
+ [2024-11-07 14:35:24,784][04688] Signal inference workers to stop experience collection...
5225
+ [2024-11-07 14:35:24,831][04701] InferenceWorker_p0-w0: stopping experience collection
5226
+ [2024-11-07 14:35:27,108][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4038656. Throughput: 0: 546.0. Samples: 2730. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
5227
+ [2024-11-07 14:35:27,109][04584] Avg episode reward: [(0, '2.632')]
5228
+ [2024-11-07 14:35:29,009][04688] Signal inference workers to resume experience collection...
5229
+ [2024-11-07 14:35:29,010][04701] InferenceWorker_p0-w0: resuming experience collection
5230
+ [2024-11-07 14:35:31,197][04584] Heartbeat connected on Batcher_0
5231
+ [2024-11-07 14:35:31,200][04584] Heartbeat connected on LearnerWorker_p0
5232
+ [2024-11-07 14:35:31,212][04584] Heartbeat connected on RolloutWorker_w0
5233
+ [2024-11-07 14:35:31,216][04584] Heartbeat connected on RolloutWorker_w1
5234
+ [2024-11-07 14:35:31,223][04584] Heartbeat connected on InferenceWorker_p0-w0
5235
+ [2024-11-07 14:35:31,227][04584] Heartbeat connected on RolloutWorker_w2
5236
+ [2024-11-07 14:35:31,235][04584] Heartbeat connected on RolloutWorker_w3
5237
+ [2024-11-07 14:35:31,238][04584] Heartbeat connected on RolloutWorker_w6
5238
+ [2024-11-07 14:35:31,244][04584] Heartbeat connected on RolloutWorker_w4
5239
+ [2024-11-07 14:35:31,250][04584] Heartbeat connected on RolloutWorker_w5
5240
+ [2024-11-07 14:35:31,255][04584] Heartbeat connected on RolloutWorker_w7
5241
+ [2024-11-07 14:35:32,108][04584] Fps is (10 sec: 2867.1, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4067328. Throughput: 0: 299.6. Samples: 2996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
5242
+ [2024-11-07 14:35:32,111][04584] Avg episode reward: [(0, '3.943')]
5243
+ [2024-11-07 14:35:35,402][04701] Updated weights for policy 0, policy_version 996 (0.0025)
5244
+ [2024-11-07 14:35:37,108][04584] Fps is (10 sec: 5324.7, 60 sec: 3549.8, 300 sec: 3549.8). Total num frames: 4091904. Throughput: 0: 700.9. Samples: 10514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
5245
+ [2024-11-07 14:35:37,112][04584] Avg episode reward: [(0, '4.473')]
5246
+ [2024-11-07 14:35:40,471][04701] Updated weights for policy 0, policy_version 1006 (0.0023)
5247
+ [2024-11-07 14:35:42,108][04584] Fps is (10 sec: 6553.5, 60 sec: 4710.3, 300 sec: 4710.3). Total num frames: 4132864. Throughput: 0: 1134.7. Samples: 22694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5248
+ [2024-11-07 14:35:42,112][04584] Avg episode reward: [(0, '4.427')]
5249
+ [2024-11-07 14:35:45,559][04701] Updated weights for policy 0, policy_version 1016 (0.0033)
5250
+ [2024-11-07 14:35:47,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5406.7, 300 sec: 5406.7). Total num frames: 4173824. Throughput: 0: 1148.1. Samples: 28702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5251
+ [2024-11-07 14:35:47,110][04584] Avg episode reward: [(0, '4.231')]
5252
+ [2024-11-07 14:35:50,462][04701] Updated weights for policy 0, policy_version 1026 (0.0025)
5253
+ [2024-11-07 14:35:52,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5870.9, 300 sec: 5870.9). Total num frames: 4214784. Throughput: 0: 1364.1. Samples: 40924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
5254
+ [2024-11-07 14:35:52,111][04584] Avg episode reward: [(0, '4.551')]
5255
+ [2024-11-07 14:35:55,421][04701] Updated weights for policy 0, policy_version 1036 (0.0029)
5256
+ [2024-11-07 14:35:57,108][04584] Fps is (10 sec: 8192.0, 60 sec: 6202.5, 300 sec: 6202.5). Total num frames: 4255744. Throughput: 0: 1530.4. Samples: 53564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5257
+ [2024-11-07 14:35:57,109][04584] Avg episode reward: [(0, '4.498')]
5258
+ [2024-11-07 14:36:00,664][04701] Updated weights for policy 0, policy_version 1046 (0.0037)
5259
+ [2024-11-07 14:36:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6348.7, 300 sec: 6348.7). Total num frames: 4292608. Throughput: 0: 1490.7. Samples: 59628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5260
+ [2024-11-07 14:36:02,118][04584] Avg episode reward: [(0, '4.486')]
5261
+ [2024-11-07 14:36:07,108][04584] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6280.5). Total num frames: 4321280. Throughput: 0: 1538.8. Samples: 69244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
5262
+ [2024-11-07 14:36:07,110][04584] Avg episode reward: [(0, '4.349')]
5263
+ [2024-11-07 14:36:07,113][04701] Updated weights for policy 0, policy_version 1056 (0.0030)
5264
+ [2024-11-07 14:36:12,109][04584] Fps is (10 sec: 4915.4, 60 sec: 6062.0, 300 sec: 6062.0). Total num frames: 4341760. Throughput: 0: 1618.7. Samples: 75570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
5265
+ [2024-11-07 14:36:12,111][04584] Avg episode reward: [(0, '4.336')]
5266
+ [2024-11-07 14:36:15,164][04701] Updated weights for policy 0, policy_version 1066 (0.0029)
5267
+ [2024-11-07 14:36:17,109][04584] Fps is (10 sec: 5734.2, 60 sec: 6181.2, 300 sec: 6181.2). Total num frames: 4378624. Throughput: 0: 1729.5. Samples: 80826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5268
+ [2024-11-07 14:36:17,114][04584] Avg episode reward: [(0, '4.405')]
5269
+ [2024-11-07 14:36:20,267][04701] Updated weights for policy 0, policy_version 1076 (0.0026)
5270
+ [2024-11-07 14:36:22,108][04584] Fps is (10 sec: 7782.7, 60 sec: 6348.8, 300 sec: 6348.8). Total num frames: 4419584. Throughput: 0: 1830.0. Samples: 92864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5271
+ [2024-11-07 14:36:22,111][04584] Avg episode reward: [(0, '4.417')]
5272
+ [2024-11-07 14:36:25,223][04701] Updated weights for policy 0, policy_version 1086 (0.0024)
5273
+ [2024-11-07 14:36:27,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7031.5, 300 sec: 6490.6). Total num frames: 4460544. Throughput: 0: 1831.8. Samples: 105126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5274
+ [2024-11-07 14:36:27,110][04584] Avg episode reward: [(0, '4.564')]
5275
+ [2024-11-07 14:36:30,159][04701] Updated weights for policy 0, policy_version 1096 (0.0032)
5276
+ [2024-11-07 14:36:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 6612.1). Total num frames: 4501504. Throughput: 0: 1834.5. Samples: 111254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5277
+ [2024-11-07 14:36:32,110][04584] Avg episode reward: [(0, '4.676')]
5278
+ [2024-11-07 14:36:35,225][04701] Updated weights for policy 0, policy_version 1106 (0.0022)
5279
+ [2024-11-07 14:36:37,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 6717.4). Total num frames: 4542464. Throughput: 0: 1839.1. Samples: 123682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5280
+ [2024-11-07 14:36:37,110][04584] Avg episode reward: [(0, '4.525')]
5281
+ [2024-11-07 14:36:40,134][04701] Updated weights for policy 0, policy_version 1116 (0.0024)
5282
+ [2024-11-07 14:36:43,724][04584] Fps is (10 sec: 6699.8, 60 sec: 7246.0, 300 sec: 6624.6). Total num frames: 4579328. Throughput: 0: 1768.0. Samples: 135982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5283
+ [2024-11-07 14:36:43,725][04584] Avg episode reward: [(0, '4.556')]
5284
+ [2024-11-07 14:36:47,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7236.2, 300 sec: 6698.1). Total num frames: 4608000. Throughput: 0: 1733.3. Samples: 137624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5285
+ [2024-11-07 14:36:47,113][04584] Avg episode reward: [(0, '4.348')]
5286
+ [2024-11-07 14:36:47,253][04701] Updated weights for policy 0, policy_version 1126 (0.0024)
5287
+ [2024-11-07 14:36:52,108][04584] Fps is (10 sec: 8305.3, 60 sec: 7236.3, 300 sec: 6781.2). Total num frames: 4648960. Throughput: 0: 1783.3. Samples: 149492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
5288
+ [2024-11-07 14:36:52,109][04584] Avg episode reward: [(0, '4.319')]
5289
+ [2024-11-07 14:36:52,633][04701] Updated weights for policy 0, policy_version 1136 (0.0028)
5290
+ [2024-11-07 14:36:57,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 6812.3). Total num frames: 4685824. Throughput: 0: 1909.7. Samples: 161508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5291
+ [2024-11-07 14:36:57,110][04584] Avg episode reward: [(0, '4.504')]
5292
+ [2024-11-07 14:36:57,814][04701] Updated weights for policy 0, policy_version 1146 (0.0030)
5293
+ [2024-11-07 14:37:02,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7168.1, 300 sec: 6840.3). Total num frames: 4722688. Throughput: 0: 1923.7. Samples: 167394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5294
+ [2024-11-07 14:37:02,110][04584] Avg episode reward: [(0, '4.470')]
5295
+ [2024-11-07 14:37:03,385][04701] Updated weights for policy 0, policy_version 1156 (0.0031)
5296
+ [2024-11-07 14:37:07,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 6904.7). Total num frames: 4763648. Throughput: 0: 1904.9. Samples: 178584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5297
+ [2024-11-07 14:37:07,111][04584] Avg episode reward: [(0, '4.431')]
5298
+ [2024-11-07 14:37:07,126][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth...
5299
+ [2024-11-07 14:37:07,332][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth
5300
+ [2024-11-07 14:37:08,663][04701] Updated weights for policy 0, policy_version 1166 (0.0024)
5301
+ [2024-11-07 14:37:12,108][04584] Fps is (10 sec: 7372.8, 60 sec: 7577.6, 300 sec: 6888.7). Total num frames: 4796416. Throughput: 0: 1873.3. Samples: 189424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5302
+ [2024-11-07 14:37:12,110][04584] Avg episode reward: [(0, '4.392')]
5303
+ [2024-11-07 14:37:14,273][04701] Updated weights for policy 0, policy_version 1176 (0.0024)
5304
+ [2024-11-07 14:37:18,138][04584] Fps is (10 sec: 5941.6, 60 sec: 7382.6, 300 sec: 6813.1). Total num frames: 4829184. Throughput: 0: 1821.9. Samples: 195116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5305
+ [2024-11-07 14:37:18,142][04584] Avg episode reward: [(0, '4.604')]
5306
+ [2024-11-07 14:37:21,528][04701] Updated weights for policy 0, policy_version 1186 (0.0027)
5307
+ [2024-11-07 14:37:22,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7372.8, 300 sec: 6860.8). Total num frames: 4861952. Throughput: 0: 1756.1. Samples: 202708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5308
+ [2024-11-07 14:37:22,110][04584] Avg episode reward: [(0, '4.486')]
5309
+ [2024-11-07 14:37:26,690][04701] Updated weights for policy 0, policy_version 1196 (0.0022)
5310
+ [2024-11-07 14:37:27,108][04584] Fps is (10 sec: 7763.0, 60 sec: 7304.5, 300 sec: 6881.3). Total num frames: 4898816. Throughput: 0: 1816.0. Samples: 214768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5311
+ [2024-11-07 14:37:27,110][04584] Avg episode reward: [(0, '4.418')]
5312
+ [2024-11-07 14:37:31,942][04701] Updated weights for policy 0, policy_version 1206 (0.0030)
5313
+ [2024-11-07 14:37:32,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6931.7). Total num frames: 4939776. Throughput: 0: 1847.4. Samples: 220758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
5314
+ [2024-11-07 14:37:32,110][04584] Avg episode reward: [(0, '4.316')]
5315
+ [2024-11-07 14:37:36,986][04701] Updated weights for policy 0, policy_version 1216 (0.0022)
5316
+ [2024-11-07 14:37:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7304.6, 300 sec: 6978.4). Total num frames: 4980736. Throughput: 0: 1848.0. Samples: 232650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5317
+ [2024-11-07 14:37:37,109][04584] Avg episode reward: [(0, '4.397')]
5318
+ [2024-11-07 14:37:41,959][04701] Updated weights for policy 0, policy_version 1226 (0.0024)
5319
+ [2024-11-07 14:37:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7576.7, 300 sec: 7021.7). Total num frames: 5021696. Throughput: 0: 1853.3. Samples: 244906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5320
+ [2024-11-07 14:37:42,111][04584] Avg episode reward: [(0, '4.489')]
5321
+ [2024-11-07 14:37:46,940][04701] Updated weights for policy 0, policy_version 1236 (0.0028)
5322
+ [2024-11-07 14:37:47,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7577.6, 300 sec: 7062.1). Total num frames: 5062656. Throughput: 0: 1859.2. Samples: 251056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5323
+ [2024-11-07 14:37:47,110][04584] Avg episode reward: [(0, '4.449')]
5324
+ [2024-11-07 14:37:52,554][04584] Fps is (10 sec: 6274.2, 60 sec: 7250.6, 300 sec: 6969.8). Total num frames: 5087232. Throughput: 0: 1865.9. Samples: 263382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5325
+ [2024-11-07 14:37:52,559][04584] Avg episode reward: [(0, '4.337')]
5326
+ [2024-11-07 14:37:54,172][04701] Updated weights for policy 0, policy_version 1246 (0.0027)
5327
+ [2024-11-07 14:37:57,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7304.5, 300 sec: 7002.8). Total num frames: 5124096. Throughput: 0: 1812.0. Samples: 270964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5328
+ [2024-11-07 14:37:57,110][04584] Avg episode reward: [(0, '4.357')]
5329
+ [2024-11-07 14:37:59,230][04701] Updated weights for policy 0, policy_version 1256 (0.0026)
5330
+ [2024-11-07 14:38:02,108][04584] Fps is (10 sec: 8145.9, 60 sec: 7372.8, 300 sec: 7040.0). Total num frames: 5165056. Throughput: 0: 1866.2. Samples: 277174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5331
+ [2024-11-07 14:38:02,109][04584] Avg episode reward: [(0, '4.437')]
5332
+ [2024-11-07 14:38:04,744][04701] Updated weights for policy 0, policy_version 1266 (0.0023)
5333
+ [2024-11-07 14:38:07,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 7050.1). Total num frames: 5201920. Throughput: 0: 1901.1. Samples: 288256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5334
+ [2024-11-07 14:38:07,110][04584] Avg episode reward: [(0, '4.331')]
5335
+ [2024-11-07 14:38:09,768][04701] Updated weights for policy 0, policy_version 1276 (0.0026)
5336
+ [2024-11-07 14:38:12,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7441.1, 300 sec: 7083.7). Total num frames: 5242880. Throughput: 0: 1908.4. Samples: 300644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5337
+ [2024-11-07 14:38:12,110][04584] Avg episode reward: [(0, '4.525')]
5338
+ [2024-11-07 14:38:14,720][04701] Updated weights for policy 0, policy_version 1286 (0.0019)
5339
+ [2024-11-07 14:38:17,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7710.0, 300 sec: 7115.3). Total num frames: 5283840. Throughput: 0: 1914.2. Samples: 306896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5340
+ [2024-11-07 14:38:17,110][04584] Avg episode reward: [(0, '4.575')]
5341
+ [2024-11-07 14:38:19,757][04701] Updated weights for policy 0, policy_version 1296 (0.0029)
5342
+ [2024-11-07 14:38:22,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7714.2, 300 sec: 7145.2). Total num frames: 5324800. Throughput: 0: 1920.4. Samples: 319070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5343
+ [2024-11-07 14:38:22,112][04584] Avg episode reward: [(0, '4.339')]
5344
+ [2024-11-07 14:38:24,651][04701] Updated weights for policy 0, policy_version 1306 (0.0030)
5345
+ [2024-11-07 14:38:27,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7509.3, 300 sec: 7085.0). Total num frames: 5349376. Throughput: 0: 1862.0. Samples: 328694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5346
+ [2024-11-07 14:38:27,110][04584] Avg episode reward: [(0, '4.349')]
5347
+ [2024-11-07 14:38:31,931][04701] Updated weights for policy 0, policy_version 1316 (0.0027)
5348
+ [2024-11-07 14:38:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7509.4, 300 sec: 7114.1). Total num frames: 5390336. Throughput: 0: 1825.5. Samples: 333202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5349
+ [2024-11-07 14:38:32,110][04584] Avg episode reward: [(0, '4.554')]
5350
+ [2024-11-07 14:38:36,771][04701] Updated weights for policy 0, policy_version 1326 (0.0026)
5351
+ [2024-11-07 14:38:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7509.3, 300 sec: 7141.7). Total num frames: 5431296. Throughput: 0: 1844.8. Samples: 345574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5352
+ [2024-11-07 14:38:37,111][04584] Avg episode reward: [(0, '4.685')]
5353
+ [2024-11-07 14:38:41,807][04701] Updated weights for policy 0, policy_version 1336 (0.0021)
5354
+ [2024-11-07 14:38:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7509.4, 300 sec: 7168.0). Total num frames: 5472256. Throughput: 0: 1933.9. Samples: 357990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5355
+ [2024-11-07 14:38:42,111][04584] Avg episode reward: [(0, '4.322')]
5356
+ [2024-11-07 14:38:46,713][04701] Updated weights for policy 0, policy_version 1346 (0.0026)
5357
+ [2024-11-07 14:38:47,109][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 7193.0). Total num frames: 5513216. Throughput: 0: 1934.4. Samples: 364222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5358
+ [2024-11-07 14:38:47,112][04584] Avg episode reward: [(0, '4.330')]
5359
+ [2024-11-07 14:38:51,629][04701] Updated weights for policy 0, policy_version 1356 (0.0022)
5360
+ [2024-11-07 14:38:52,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7840.7, 300 sec: 7216.8). Total num frames: 5554176. Throughput: 0: 1963.5. Samples: 376612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5361
+ [2024-11-07 14:38:52,110][04584] Avg episode reward: [(0, '4.316')]
5362
+ [2024-11-07 14:38:56,630][04701] Updated weights for policy 0, policy_version 1366 (0.0025)
5363
+ [2024-11-07 14:38:57,109][04584] Fps is (10 sec: 8192.0, 60 sec: 7850.7, 300 sec: 7239.4). Total num frames: 5595136. Throughput: 0: 1961.9. Samples: 388932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5364
+ [2024-11-07 14:38:57,111][04584] Avg episode reward: [(0, '4.435')]
5365
+ [2024-11-07 14:39:02,108][04584] Fps is (10 sec: 6553.8, 60 sec: 7577.6, 300 sec: 7186.6). Total num frames: 5619712. Throughput: 0: 1958.1. Samples: 395012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5366
+ [2024-11-07 14:39:02,110][04584] Avg episode reward: [(0, '4.515')]
5367
+ [2024-11-07 14:39:04,078][04701] Updated weights for policy 0, policy_version 1376 (0.0027)
5368
+ [2024-11-07 14:39:07,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7577.6, 300 sec: 7190.7). Total num frames: 5656576. Throughput: 0: 1847.9. Samples: 402224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5369
+ [2024-11-07 14:39:07,110][04584] Avg episode reward: [(0, '4.473')]
5370
+ [2024-11-07 14:39:07,120][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth...
5371
+ [2024-11-07 14:39:07,214][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth
5372
+ [2024-11-07 14:39:09,137][04701] Updated weights for policy 0, policy_version 1386 (0.0028)
5373
+ [2024-11-07 14:39:12,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7577.6, 300 sec: 7212.5). Total num frames: 5697536. Throughput: 0: 1907.0. Samples: 414508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5374
+ [2024-11-07 14:39:12,111][04584] Avg episode reward: [(0, '4.401')]
5375
+ [2024-11-07 14:39:14,203][04701] Updated weights for policy 0, policy_version 1396 (0.0024)
5376
+ [2024-11-07 14:39:17,110][04584] Fps is (10 sec: 8600.6, 60 sec: 7645.7, 300 sec: 7250.7). Total num frames: 5742592. Throughput: 0: 1942.3. Samples: 420606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5377
+ [2024-11-07 14:39:17,112][04584] Avg episode reward: [(0, '4.251')]
5378
+ [2024-11-07 14:39:19,131][04701] Updated weights for policy 0, policy_version 1406 (0.0027)
5379
+ [2024-11-07 14:39:22,108][04584] Fps is (10 sec: 8601.8, 60 sec: 7645.9, 300 sec: 7270.4). Total num frames: 5783552. Throughput: 0: 1941.7. Samples: 432950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5380
+ [2024-11-07 14:39:22,110][04584] Avg episode reward: [(0, '4.408')]
5381
+ [2024-11-07 14:39:24,033][04701] Updated weights for policy 0, policy_version 1416 (0.0026)
5382
+ [2024-11-07 14:39:27,108][04584] Fps is (10 sec: 8193.0, 60 sec: 7918.9, 300 sec: 7289.2). Total num frames: 5824512. Throughput: 0: 1942.5. Samples: 445400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5383
+ [2024-11-07 14:39:27,112][04584] Avg episode reward: [(0, '4.368')]
5384
+ [2024-11-07 14:39:29,061][04701] Updated weights for policy 0, policy_version 1426 (0.0028)
5385
+ [2024-11-07 14:39:32,109][04584] Fps is (10 sec: 8191.0, 60 sec: 7918.8, 300 sec: 7307.2). Total num frames: 5865472. Throughput: 0: 1942.4. Samples: 451632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5386
+ [2024-11-07 14:39:32,112][04584] Avg episode reward: [(0, '4.311')]
5387
+ [2024-11-07 14:39:36,164][04701] Updated weights for policy 0, policy_version 1436 (0.0029)
5388
+ [2024-11-07 14:39:37,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7577.6, 300 sec: 7244.3). Total num frames: 5885952. Throughput: 0: 1871.7. Samples: 460840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5389
+ [2024-11-07 14:39:37,110][04584] Avg episode reward: [(0, '4.291')]
5390
+ [2024-11-07 14:39:41,153][04701] Updated weights for policy 0, policy_version 1446 (0.0026)
5391
+ [2024-11-07 14:39:42,108][04584] Fps is (10 sec: 6144.7, 60 sec: 7577.7, 300 sec: 7262.5). Total num frames: 5926912. Throughput: 0: 1838.3. Samples: 471654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5392
+ [2024-11-07 14:39:42,110][04584] Avg episode reward: [(0, '4.399')]
5393
+ [2024-11-07 14:39:46,287][04701] Updated weights for policy 0, policy_version 1456 (0.0024)
5394
+ [2024-11-07 14:39:47,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7577.6, 300 sec: 7280.0). Total num frames: 5967872. Throughput: 0: 1836.7. Samples: 477666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5395
+ [2024-11-07 14:39:47,111][04584] Avg episode reward: [(0, '4.391')]
5396
+ [2024-11-07 14:39:51,647][04701] Updated weights for policy 0, policy_version 1466 (0.0024)
5397
+ [2024-11-07 14:39:52,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7577.6, 300 sec: 7296.9). Total num frames: 6008832. Throughput: 0: 1936.8. Samples: 489382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5398
+ [2024-11-07 14:39:52,110][04584] Avg episode reward: [(0, '4.306')]
5399
+ [2024-11-07 14:39:56,638][04701] Updated weights for policy 0, policy_version 1476 (0.0027)
5400
+ [2024-11-07 14:39:57,109][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.3, 300 sec: 7298.3). Total num frames: 6045696. Throughput: 0: 1930.7. Samples: 501392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5401
+ [2024-11-07 14:39:57,111][04584] Avg episode reward: [(0, '4.586')]
5402
+ [2024-11-07 14:40:01,928][04701] Updated weights for policy 0, policy_version 1486 (0.0022)
5403
+ [2024-11-07 14:40:02,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7782.4, 300 sec: 7314.3). Total num frames: 6086656. Throughput: 0: 1933.2. Samples: 507596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5404
+ [2024-11-07 14:40:02,111][04584] Avg episode reward: [(0, '4.157')]
5405
+ [2024-11-07 14:40:07,045][04701] Updated weights for policy 0, policy_version 1496 (0.0028)
5406
+ [2024-11-07 14:40:07,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7850.7, 300 sec: 7329.7). Total num frames: 6127616. Throughput: 0: 1921.9. Samples: 519434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5407
+ [2024-11-07 14:40:07,110][04584] Avg episode reward: [(0, '4.650')]
5408
+ [2024-11-07 14:40:12,108][04584] Fps is (10 sec: 5734.3, 60 sec: 7441.1, 300 sec: 7259.8). Total num frames: 6144000. Throughput: 0: 1798.1. Samples: 526316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5409
+ [2024-11-07 14:40:12,115][04584] Avg episode reward: [(0, '4.644')]
5410
+ [2024-11-07 14:40:15,781][04701] Updated weights for policy 0, policy_version 1506 (0.0037)
5411
+ [2024-11-07 14:40:17,108][04584] Fps is (10 sec: 4915.2, 60 sec: 7236.4, 300 sec: 7247.8). Total num frames: 6176768. Throughput: 0: 1762.2. Samples: 530928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5412
+ [2024-11-07 14:40:17,111][04584] Avg episode reward: [(0, '4.390')]
5413
+ [2024-11-07 14:40:21,601][04701] Updated weights for policy 0, policy_version 1516 (0.0034)
5414
+ [2024-11-07 14:40:22,108][04584] Fps is (10 sec: 6553.7, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6209536. Throughput: 0: 1769.1. Samples: 540450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5415
+ [2024-11-07 14:40:22,110][04584] Avg episode reward: [(0, '4.347')]
5416
+ [2024-11-07 14:40:26,748][04701] Updated weights for policy 0, policy_version 1526 (0.0027)
5417
+ [2024-11-07 14:40:27,109][04584] Fps is (10 sec: 7372.5, 60 sec: 7099.7, 300 sec: 7400.6). Total num frames: 6250496. Throughput: 0: 1796.7. Samples: 552504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5418
+ [2024-11-07 14:40:27,110][04584] Avg episode reward: [(0, '4.378')]
5419
+ [2024-11-07 14:40:31,804][04701] Updated weights for policy 0, policy_version 1536 (0.0030)
5420
+ [2024-11-07 14:40:32,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7099.9, 300 sec: 7456.1). Total num frames: 6291456. Throughput: 0: 1797.6. Samples: 558558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5421
+ [2024-11-07 14:40:32,111][04584] Avg episode reward: [(0, '4.402')]
5422
+ [2024-11-07 14:40:36,996][04701] Updated weights for policy 0, policy_version 1546 (0.0025)
5423
+ [2024-11-07 14:40:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7441.1, 300 sec: 7456.1). Total num frames: 6332416. Throughput: 0: 1804.4. Samples: 570582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5424
+ [2024-11-07 14:40:37,110][04584] Avg episode reward: [(0, '4.223')]
5425
+ [2024-11-07 14:40:42,112][04584] Fps is (10 sec: 7779.5, 60 sec: 7372.4, 300 sec: 7442.1). Total num frames: 6369280. Throughput: 0: 1800.2. Samples: 582408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5426
+ [2024-11-07 14:40:42,115][04584] Avg episode reward: [(0, '4.626')]
5427
+ [2024-11-07 14:40:42,246][04701] Updated weights for policy 0, policy_version 1556 (0.0021)
5428
+ [2024-11-07 14:40:47,109][04584] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6389760. Throughput: 0: 1715.1. Samples: 584778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5429
+ [2024-11-07 14:40:47,111][04584] Avg episode reward: [(0, '4.667')]
5430
+ [2024-11-07 14:40:50,289][04701] Updated weights for policy 0, policy_version 1566 (0.0036)
5431
+ [2024-11-07 14:40:52,108][04584] Fps is (10 sec: 5736.4, 60 sec: 6963.2, 300 sec: 7358.9). Total num frames: 6426624. Throughput: 0: 1668.8. Samples: 594530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
5432
+ [2024-11-07 14:40:52,111][04584] Avg episode reward: [(0, '4.452')]
5433
+ [2024-11-07 14:40:55,426][04701] Updated weights for policy 0, policy_version 1576 (0.0025)
5434
+ [2024-11-07 14:40:57,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6467584. Throughput: 0: 1779.1. Samples: 606376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5435
+ [2024-11-07 14:40:57,110][04584] Avg episode reward: [(0, '4.567')]
5436
+ [2024-11-07 14:41:00,789][04701] Updated weights for policy 0, policy_version 1586 (0.0028)
5437
+ [2024-11-07 14:41:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6963.1, 300 sec: 7400.5). Total num frames: 6504448. Throughput: 0: 1799.3. Samples: 611900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
5438
+ [2024-11-07 14:41:02,112][04584] Avg episode reward: [(0, '4.513')]
5439
+ [2024-11-07 14:41:06,194][04701] Updated weights for policy 0, policy_version 1596 (0.0034)
5440
+ [2024-11-07 14:41:07,108][04584] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 7456.1). Total num frames: 6541312. Throughput: 0: 1839.9. Samples: 623244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5441
+ [2024-11-07 14:41:07,110][04584] Avg episode reward: [(0, '4.449')]
5442
+ [2024-11-07 14:41:07,219][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth...
5443
+ [2024-11-07 14:41:07,348][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth
5444
+ [2024-11-07 14:41:11,911][04701] Updated weights for policy 0, policy_version 1606 (0.0033)
5445
+ [2024-11-07 14:41:12,108][04584] Fps is (10 sec: 7373.5, 60 sec: 7236.3, 300 sec: 7456.1). Total num frames: 6578176. Throughput: 0: 1821.6. Samples: 634474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5446
+ [2024-11-07 14:41:12,110][04584] Avg episode reward: [(0, '4.328')]
5447
+ [2024-11-07 14:41:18,681][04584] Fps is (10 sec: 6370.7, 60 sec: 7117.9, 300 sec: 7402.7). Total num frames: 6615040. Throughput: 0: 1743.5. Samples: 639758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
5448
+ [2024-11-07 14:41:18,683][04584] Avg episode reward: [(0, '4.432')]
+ [2024-11-07 14:41:18,837][04701] Updated weights for policy 0, policy_version 1616 (0.0024)
+ [2024-11-07 14:41:22,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6639616. Throughput: 0: 1727.3. Samples: 648310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:41:22,112][04584] Avg episode reward: [(0, '4.391')]
+ [2024-11-07 14:41:24,288][04701] Updated weights for policy 0, policy_version 1626 (0.0036)
+ [2024-11-07 14:41:27,108][04584] Fps is (10 sec: 7777.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6680576. Throughput: 0: 1725.2. Samples: 660036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:41:27,110][04584] Avg episode reward: [(0, '4.320')]
+ [2024-11-07 14:41:29,172][04701] Updated weights for policy 0, policy_version 1636 (0.0028)
+ [2024-11-07 14:41:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6721536. Throughput: 0: 1813.1. Samples: 666366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-11-07 14:41:32,110][04584] Avg episode reward: [(0, '4.490')]
+ [2024-11-07 14:41:34,208][04701] Updated weights for policy 0, policy_version 1646 (0.0023)
+ [2024-11-07 14:41:37,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7441.3). Total num frames: 6762496. Throughput: 0: 1863.7. Samples: 678394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-11-07 14:41:37,109][04584] Avg episode reward: [(0, '4.537')]
+ [2024-11-07 14:41:39,513][04701] Updated weights for policy 0, policy_version 1656 (0.0026)
+ [2024-11-07 14:41:42,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7168.5, 300 sec: 7428.3). Total num frames: 6799360. Throughput: 0: 1856.8. Samples: 689932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:41:42,110][04584] Avg episode reward: [(0, '4.371')]
+ [2024-11-07 14:41:44,930][04701] Updated weights for policy 0, policy_version 1666 (0.0028)
+ [2024-11-07 14:41:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.4, 300 sec: 7428.3). Total num frames: 6840320. Throughput: 0: 1863.2. Samples: 695744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:41:47,110][04584] Avg episode reward: [(0, '4.330')]
+ [2024-11-07 14:41:50,403][04701] Updated weights for policy 0, policy_version 1676 (0.0030)
+ [2024-11-07 14:41:53,381][04584] Fps is (10 sec: 6177.0, 60 sec: 7219.7, 300 sec: 7368.8). Total num frames: 6868992. Throughput: 0: 1810.9. Samples: 707038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:41:53,382][04584] Avg episode reward: [(0, '4.407')]
+ [2024-11-07 14:41:57,108][04584] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7372.8). Total num frames: 6897664. Throughput: 0: 1764.1. Samples: 713860. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:41:57,110][04584] Avg episode reward: [(0, '4.690')]
+ [2024-11-07 14:41:58,056][04701] Updated weights for policy 0, policy_version 1686 (0.0034)
+ [2024-11-07 14:42:02,109][04584] Fps is (10 sec: 7509.1, 60 sec: 7168.1, 300 sec: 7358.9). Total num frames: 6934528. Throughput: 0: 1843.5. Samples: 719816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:02,110][04584] Avg episode reward: [(0, '4.442')]
+ [2024-11-07 14:42:03,920][04701] Updated weights for policy 0, policy_version 1696 (0.0041)
+ [2024-11-07 14:42:07,108][04584] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6967296. Throughput: 0: 1819.1. Samples: 730168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:07,111][04584] Avg episode reward: [(0, '4.615')]
+ [2024-11-07 14:42:09,344][04701] Updated weights for policy 0, policy_version 1706 (0.0034)
+ [2024-11-07 14:42:12,108][04584] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 7398.6). Total num frames: 7004160. Throughput: 0: 1801.8. Samples: 741116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:12,111][04584] Avg episode reward: [(0, '4.542')]
+ [2024-11-07 14:42:15,345][04701] Updated weights for policy 0, policy_version 1716 (0.0029)
+ [2024-11-07 14:42:17,109][04584] Fps is (10 sec: 7372.2, 60 sec: 7290.8, 300 sec: 7386.7). Total num frames: 7041024. Throughput: 0: 1774.6. Samples: 746224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:17,130][04584] Avg episode reward: [(0, '4.596')]
+ [2024-11-07 14:42:20,616][04701] Updated weights for policy 0, policy_version 1726 (0.0025)
+ [2024-11-07 14:42:22,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 7386.7). Total num frames: 7077888. Throughput: 0: 1764.3. Samples: 757786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:42:22,110][04584] Avg episode reward: [(0, '4.395')]
+ [2024-11-07 14:42:27,773][04584] Fps is (10 sec: 6145.8, 60 sec: 7022.0, 300 sec: 7328.5). Total num frames: 7106560. Throughput: 0: 1616.7. Samples: 763760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:27,774][04584] Avg episode reward: [(0, '4.528')]
+ [2024-11-07 14:42:27,814][04701] Updated weights for policy 0, policy_version 1736 (0.0034)
+ [2024-11-07 14:42:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7031.4, 300 sec: 7331.1). Total num frames: 7143424. Throughput: 0: 1680.9. Samples: 771384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:32,113][04584] Avg episode reward: [(0, '4.745')]
+ [2024-11-07 14:42:33,100][04701] Updated weights for policy 0, policy_version 1746 (0.0023)
+ [2024-11-07 14:42:37,108][04584] Fps is (10 sec: 8336.2, 60 sec: 7031.4, 300 sec: 7331.2). Total num frames: 7184384. Throughput: 0: 1741.0. Samples: 783168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:42:37,110][04584] Avg episode reward: [(0, '4.300')]
+ [2024-11-07 14:42:38,140][04701] Updated weights for policy 0, policy_version 1756 (0.0034)
+ [2024-11-07 14:42:42,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7303.4). Total num frames: 7217152. Throughput: 0: 1784.7. Samples: 794170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-11-07 14:42:42,110][04584] Avg episode reward: [(0, '4.610')]
+ [2024-11-07 14:42:43,824][04701] Updated weights for policy 0, policy_version 1766 (0.0026)
+ [2024-11-07 14:42:47,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7370.1). Total num frames: 7258112. Throughput: 0: 1789.5. Samples: 800344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+ [2024-11-07 14:42:47,110][04584] Avg episode reward: [(0, '4.393')]
+ [2024-11-07 14:42:49,006][04701] Updated weights for policy 0, policy_version 1776 (0.0030)
+ [2024-11-07 14:42:52,109][04584] Fps is (10 sec: 7781.6, 60 sec: 7253.5, 300 sec: 7358.9). Total num frames: 7294976. Throughput: 0: 1817.0. Samples: 811934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:42:52,113][04584] Avg episode reward: [(0, '4.440')]
+ [2024-11-07 14:42:54,207][04701] Updated weights for policy 0, policy_version 1786 (0.0029)
+ [2024-11-07 14:42:57,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7304.5, 300 sec: 7358.9). Total num frames: 7335936. Throughput: 0: 1841.5. Samples: 823984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:42:57,110][04584] Avg episode reward: [(0, '4.300')]
+ [2024-11-07 14:42:59,392][04701] Updated weights for policy 0, policy_version 1796 (0.0028)
+ [2024-11-07 14:43:02,199][04584] Fps is (10 sec: 6495.4, 60 sec: 7089.1, 300 sec: 7315.0). Total num frames: 7360512. Throughput: 0: 1856.0. Samples: 829910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:43:02,202][04584] Avg episode reward: [(0, '4.381')]
+ [2024-11-07 14:43:06,958][04701] Updated weights for policy 0, policy_version 1806 (0.0026)
+ [2024-11-07 14:43:07,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7303.4). Total num frames: 7397376. Throughput: 0: 1761.0. Samples: 837030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:43:07,110][04584] Avg episode reward: [(0, '4.652')]
+ [2024-11-07 14:43:07,127][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001806_7397376.pth...
+ [2024-11-07 14:43:07,318][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth
+ [2024-11-07 14:43:12,082][04701] Updated weights for policy 0, policy_version 1816 (0.0029)
+ [2024-11-07 14:43:12,108][04584] Fps is (10 sec: 7853.5, 60 sec: 7236.3, 300 sec: 7303.4). Total num frames: 7438336. Throughput: 0: 1917.2. Samples: 848760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:12,110][04584] Avg episode reward: [(0, '4.544')]
+ [2024-11-07 14:43:17,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7236.4, 300 sec: 7289.5). Total num frames: 7475200. Throughput: 0: 1853.6. Samples: 854794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-11-07 14:43:17,111][04584] Avg episode reward: [(0, '4.361')]
+ [2024-11-07 14:43:17,153][04701] Updated weights for policy 0, policy_version 1826 (0.0025)
+ [2024-11-07 14:43:22,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7345.0). Total num frames: 7516160. Throughput: 0: 1864.3. Samples: 867062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:43:22,112][04584] Avg episode reward: [(0, '4.248')]
+ [2024-11-07 14:43:22,238][04701] Updated weights for policy 0, policy_version 1836 (0.0026)
+ [2024-11-07 14:43:27,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7593.4, 300 sec: 7345.0). Total num frames: 7557120. Throughput: 0: 1885.3. Samples: 879010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:27,111][04584] Avg episode reward: [(0, '4.410')]
+ [2024-11-07 14:43:27,492][04701] Updated weights for policy 0, policy_version 1846 (0.0023)
+ [2024-11-07 14:43:32,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7509.3, 300 sec: 7331.1). Total num frames: 7593984. Throughput: 0: 1875.4. Samples: 884738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:43:32,110][04584] Avg episode reward: [(0, '4.527')]
+ [2024-11-07 14:43:32,694][04701] Updated weights for policy 0, policy_version 1856 (0.0022)
+ [2024-11-07 14:43:37,108][04584] Fps is (10 sec: 6144.2, 60 sec: 7236.3, 300 sec: 7275.6). Total num frames: 7618560. Throughput: 0: 1849.5. Samples: 895158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:37,110][04584] Avg episode reward: [(0, '4.513')]
+ [2024-11-07 14:43:39,970][04701] Updated weights for policy 0, policy_version 1866 (0.0035)
+ [2024-11-07 14:43:42,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7372.8, 300 sec: 7275.6). Total num frames: 7659520. Throughput: 0: 1785.7. Samples: 904342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:42,110][04584] Avg episode reward: [(0, '4.196')]
+ [2024-11-07 14:43:45,251][04701] Updated weights for policy 0, policy_version 1876 (0.0029)
+ [2024-11-07 14:43:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7261.7). Total num frames: 7696384. Throughput: 0: 1783.9. Samples: 910026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:47,110][04584] Avg episode reward: [(0, '4.252')]
+ [2024-11-07 14:43:50,305][04701] Updated weights for policy 0, policy_version 1886 (0.0024)
+ [2024-11-07 14:43:52,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7372.9, 300 sec: 7261.7). Total num frames: 7737344. Throughput: 0: 1890.9. Samples: 922118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:52,110][04584] Avg episode reward: [(0, '4.414')]
+ [2024-11-07 14:43:55,379][04701] Updated weights for policy 0, policy_version 1896 (0.0020)
+ [2024-11-07 14:43:57,109][04584] Fps is (10 sec: 8191.5, 60 sec: 7372.7, 300 sec: 7317.2). Total num frames: 7778304. Throughput: 0: 1901.1. Samples: 934312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:43:57,111][04584] Avg episode reward: [(0, '4.448')]
+ [2024-11-07 14:44:00,489][04701] Updated weights for policy 0, policy_version 1906 (0.0023)
+ [2024-11-07 14:44:02,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7589.0, 300 sec: 7317.3). Total num frames: 7815168. Throughput: 0: 1898.8. Samples: 940240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-11-07 14:44:02,111][04584] Avg episode reward: [(0, '4.370')]
+ [2024-11-07 14:44:05,857][04701] Updated weights for policy 0, policy_version 1916 (0.0025)
+ [2024-11-07 14:44:07,108][04584] Fps is (10 sec: 7782.9, 60 sec: 7645.9, 300 sec: 7317.3). Total num frames: 7856128. Throughput: 0: 1880.7. Samples: 951692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+ [2024-11-07 14:44:07,110][04584] Avg episode reward: [(0, '4.466')]
+ [2024-11-07 14:44:12,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7304.6, 300 sec: 7234.0). Total num frames: 7876608. Throughput: 0: 1785.3. Samples: 959350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+ [2024-11-07 14:44:12,110][04584] Avg episode reward: [(0, '4.323')]
+ [2024-11-07 14:44:13,147][04701] Updated weights for policy 0, policy_version 1926 (0.0023)
+ [2024-11-07 14:44:17,108][04584] Fps is (10 sec: 6143.9, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7917568. Throughput: 0: 1789.9. Samples: 965282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-11-07 14:44:17,111][04584] Avg episode reward: [(0, '4.247')]
+ [2024-11-07 14:44:18,201][04701] Updated weights for policy 0, policy_version 1936 (0.0026)
+ [2024-11-07 14:44:22,109][04584] Fps is (10 sec: 8191.6, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7958528. Throughput: 0: 1830.9. Samples: 977550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-11-07 14:44:22,114][04584] Avg episode reward: [(0, '4.299')]
+ [2024-11-07 14:44:23,245][04701] Updated weights for policy 0, policy_version 1946 (0.0026)
+ [2024-11-07 14:44:27,109][04584] Fps is (10 sec: 8601.1, 60 sec: 7441.0, 300 sec: 7247.8). Total num frames: 8003584. Throughput: 0: 1905.0. Samples: 990068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+ [2024-11-07 14:44:27,111][04584] Avg episode reward: [(0, '4.466')]
+ [2024-11-07 14:44:27,617][04688] Stopping Batcher_0...
+ [2024-11-07 14:44:27,618][04688] Loop batcher_evt_loop terminating...
+ [2024-11-07 14:44:27,620][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+ [2024-11-07 14:44:27,619][04584] Component Batcher_0 stopped!
+ [2024-11-07 14:44:27,676][04701] Weights refcount: 2 0
+ [2024-11-07 14:44:27,678][04701] Stopping InferenceWorker_p0-w0...
+ [2024-11-07 14:44:27,679][04701] Loop inference_proc0-0_evt_loop terminating...
+ [2024-11-07 14:44:27,678][04584] Component InferenceWorker_p0-w0 stopped!
+ [2024-11-07 14:44:27,709][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth
+ [2024-11-07 14:44:27,722][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+ [2024-11-07 14:44:27,750][04584] Component RolloutWorker_w0 stopped!
+ [2024-11-07 14:44:27,751][04702] Stopping RolloutWorker_w0...
+ [2024-11-07 14:44:27,752][04702] Loop rollout_proc0_evt_loop terminating...
+ [2024-11-07 14:44:27,756][04703] Stopping RolloutWorker_w1...
+ [2024-11-07 14:44:27,756][04584] Component RolloutWorker_w1 stopped!
+ [2024-11-07 14:44:27,759][04703] Loop rollout_proc1_evt_loop terminating...
+ [2024-11-07 14:44:27,765][04707] Stopping RolloutWorker_w5...
+ [2024-11-07 14:44:27,768][04707] Loop rollout_proc5_evt_loop terminating...
+ [2024-11-07 14:44:27,772][04706] Stopping RolloutWorker_w4...
+ [2024-11-07 14:44:27,774][04706] Loop rollout_proc4_evt_loop terminating...
+ [2024-11-07 14:44:27,765][04584] Component RolloutWorker_w5 stopped!
+ [2024-11-07 14:44:27,796][04584] Component RolloutWorker_w4 stopped!
+ [2024-11-07 14:44:27,812][04708] Stopping RolloutWorker_w6...
+ [2024-11-07 14:44:27,814][04708] Loop rollout_proc6_evt_loop terminating...
+ [2024-11-07 14:44:27,812][04584] Component RolloutWorker_w6 stopped!
+ [2024-11-07 14:44:27,845][04688] Stopping LearnerWorker_p0...
+ [2024-11-07 14:44:27,846][04688] Loop learner_proc0_evt_loop terminating...
+ [2024-11-07 14:44:27,846][04584] Component LearnerWorker_p0 stopped!
+ [2024-11-07 14:44:27,888][04709] Stopping RolloutWorker_w7...
+ [2024-11-07 14:44:27,889][04584] Component RolloutWorker_w7 stopped!
+ [2024-11-07 14:44:27,908][04709] Loop rollout_proc7_evt_loop terminating...
+ [2024-11-07 14:44:27,913][04704] Stopping RolloutWorker_w3...
+ [2024-11-07 14:44:27,913][04584] Component RolloutWorker_w3 stopped!
+ [2024-11-07 14:44:27,916][04704] Loop rollout_proc3_evt_loop terminating...
+ [2024-11-07 14:44:28,016][04705] Stopping RolloutWorker_w2...
+ [2024-11-07 14:44:28,016][04584] Component RolloutWorker_w2 stopped!
+ [2024-11-07 14:44:28,019][04705] Loop rollout_proc2_evt_loop terminating...
+ [2024-11-07 14:44:28,019][04584] Waiting for process learner_proc0 to stop...
+ [2024-11-07 14:44:29,498][04584] Waiting for process inference_proc0-0 to join...
+ [2024-11-07 14:44:29,500][04584] Waiting for process rollout_proc0 to join...
+ [2024-11-07 14:44:29,501][04584] Waiting for process rollout_proc1 to join...
+ [2024-11-07 14:44:29,504][04584] Waiting for process rollout_proc2 to join...
+ [2024-11-07 14:44:29,653][04584] Waiting for process rollout_proc3 to join...
+ [2024-11-07 14:44:29,655][04584] Waiting for process rollout_proc4 to join...
+ [2024-11-07 14:44:29,656][04584] Waiting for process rollout_proc5 to join...
+ [2024-11-07 14:44:29,657][04584] Waiting for process rollout_proc6 to join...
+ [2024-11-07 14:44:29,660][04584] Waiting for process rollout_proc7 to join...
+ [2024-11-07 14:44:29,661][04584] Batcher 0 profile tree view:
+ batching: 27.4377, releasing_batches: 0.0459
+ [2024-11-07 14:44:29,663][04584] InferenceWorker_p0-w0 profile tree view:
+ wait_policy: 0.0000
+ wait_policy_total: 6.5812
+ update_model: 7.6567
+ weight_update: 0.0027
+ one_step: 0.0070
+ handle_policy_step: 510.3365
+ deserialize: 13.7673, stack: 2.4265, obs_to_device_normalize: 150.3995, forward: 227.4887, send_messages: 31.2471
+ prepare_outputs: 68.6568
+ to_cpu: 53.1126
+ [2024-11-07 14:44:29,665][04584] Learner 0 profile tree view:
+ misc: 0.0053, prepare_batch: 30.8301
+ train: 106.7453
+ epoch_init: 0.0079, minibatch_init: 0.0131, losses_postprocess: 0.8354, kl_divergence: 0.9434, after_optimizer: 4.0108
+ calculate_losses: 31.3550
+ losses_init: 0.0058, forward_head: 1.9126, bptt_initial: 21.8044, tail: 1.0207, advantages_returns: 0.3056, losses: 3.3075
+ bptt: 2.6948
+ bptt_forward_core: 2.5765
+ update: 68.9694
+ clip: 1.2501
+ [2024-11-07 14:44:29,666][04584] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.2375, enqueue_policy_requests: 11.9428, env_step: 149.5260, overhead: 10.7908, complete_rollouts: 0.6476
+ save_policy_outputs: 16.0346
+ split_output_tensors: 5.4611
+ [2024-11-07 14:44:29,670][04584] RolloutWorker_w7 profile tree view:
+ wait_for_trajectories: 0.2022, enqueue_policy_requests: 12.7636, env_step: 249.0214, overhead: 10.6673, complete_rollouts: 0.3962
+ save_policy_outputs: 18.4068
+ split_output_tensors: 7.8369
+ [2024-11-07 14:44:29,672][04584] Loop Runner_EvtLoop terminating...
+ [2024-11-07 14:44:29,675][04584] Runner profile tree view:
+ main_loop: 558.4363
+ [2024-11-07 14:44:29,676][04584] Collected {0: 8007680}, FPS: 7107.4
+ [2024-11-07 14:44:30,065][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
+ [2024-11-07 14:44:30,066][04584] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-11-07 14:44:30,067][04584] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-11-07 14:44:30,068][04584] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-11-07 14:44:30,071][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-11-07 14:44:30,072][04584] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-11-07 14:44:30,074][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+ [2024-11-07 14:44:30,075][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-11-07 14:44:30,077][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+ [2024-11-07 14:44:30,079][04584] Adding new argument 'hf_repository'=None that is not in the saved config file!
+ [2024-11-07 14:44:30,081][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-11-07 14:44:30,082][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-11-07 14:44:30,084][04584] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-11-07 14:44:30,085][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-11-07 14:44:30,088][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-11-07 14:44:30,176][04584] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 14:44:30,189][04584] RunningMeanStd input shape: (3, 72, 128)
+ [2024-11-07 14:44:30,193][04584] RunningMeanStd input shape: (1,)
+ [2024-11-07 14:44:30,218][04584] ConvEncoder: input_channels=3
+ [2024-11-07 14:44:30,407][04584] Conv encoder output size: 512
+ [2024-11-07 14:44:30,408][04584] Policy head output size: 512
+ [2024-11-07 14:44:31,399][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+ [2024-11-07 14:44:32,337][04584] Num frames 100...
+ [2024-11-07 14:44:32,546][04584] Num frames 200...
+ [2024-11-07 14:44:32,757][04584] Num frames 300...
+ [2024-11-07 14:44:32,967][04584] Num frames 400...
+ [2024-11-07 14:44:33,113][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+ [2024-11-07 14:44:33,114][04584] Avg episode reward: 5.480, avg true_objective: 4.480
+ [2024-11-07 14:44:33,216][04584] Num frames 500...
+ [2024-11-07 14:44:33,402][04584] Num frames 600...
+ [2024-11-07 14:44:33,611][04584] Num frames 700...
+ [2024-11-07 14:44:33,809][04584] Num frames 800...
+ [2024-11-07 14:44:34,017][04584] Num frames 900...
+ [2024-11-07 14:44:34,144][04584] Avg episode rewards: #0: 6.140, true rewards: #0: 4.640
+ [2024-11-07 14:44:34,146][04584] Avg episode reward: 6.140, avg true_objective: 4.640
+ [2024-11-07 14:44:34,289][04584] Num frames 1000...
+ [2024-11-07 14:44:34,485][04584] Num frames 1100...
+ [2024-11-07 14:44:34,693][04584] Num frames 1200...
+ [2024-11-07 14:44:34,909][04584] Avg episode rewards: #0: 5.600, true rewards: #0: 4.267
+ [2024-11-07 14:44:34,910][04584] Avg episode reward: 5.600, avg true_objective: 4.267
+ [2024-11-07 14:44:34,955][04584] Num frames 1300...
+ [2024-11-07 14:44:35,146][04584] Num frames 1400...
+ [2024-11-07 14:44:35,332][04584] Num frames 1500...
+ [2024-11-07 14:44:35,527][04584] Num frames 1600...
+ [2024-11-07 14:44:35,702][04584] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+ [2024-11-07 14:44:35,706][04584] Avg episode reward: 5.160, avg true_objective: 4.160
+ [2024-11-07 14:44:35,791][04584] Num frames 1700...
+ [2024-11-07 14:44:35,959][04584] Num frames 1800...
+ [2024-11-07 14:44:36,136][04584] Num frames 1900...
+ [2024-11-07 14:44:36,333][04584] Num frames 2000...
+ [2024-11-07 14:44:36,475][04584] Avg episode rewards: #0: 4.896, true rewards: #0: 4.096
+ [2024-11-07 14:44:36,477][04584] Avg episode reward: 4.896, avg true_objective: 4.096
+ [2024-11-07 14:44:36,590][04584] Num frames 2100...
+ [2024-11-07 14:44:36,787][04584] Num frames 2200...
+ [2024-11-07 14:44:36,980][04584] Num frames 2300...
+ [2024-11-07 14:44:37,181][04584] Num frames 2400...
+ [2024-11-07 14:44:37,439][04584] Avg episode rewards: #0: 4.993, true rewards: #0: 4.160
+ [2024-11-07 14:44:37,440][04584] Avg episode reward: 4.993, avg true_objective: 4.160
+ [2024-11-07 14:44:37,450][04584] Num frames 2500...
+ [2024-11-07 14:44:37,658][04584] Num frames 2600...
+ [2024-11-07 14:44:37,847][04584] Num frames 2700...
+ [2024-11-07 14:44:38,030][04584] Num frames 2800...
+ [2024-11-07 14:44:38,229][04584] Avg episode rewards: #0: 4.829, true rewards: #0: 4.114
+ [2024-11-07 14:44:38,232][04584] Avg episode reward: 4.829, avg true_objective: 4.114
+ [2024-11-07 14:44:38,291][04584] Num frames 2900...
+ [2024-11-07 14:44:38,471][04584] Num frames 3000...
+ [2024-11-07 14:44:38,681][04584] Num frames 3100...
+ [2024-11-07 14:44:38,872][04584] Num frames 3200...
+ [2024-11-07 14:44:39,064][04584] Avg episode rewards: #0: 4.705, true rewards: #0: 4.080
+ [2024-11-07 14:44:39,066][04584] Avg episode reward: 4.705, avg true_objective: 4.080
+ [2024-11-07 14:44:39,137][04584] Num frames 3300...
+ [2024-11-07 14:44:39,304][04584] Num frames 3400...
+ [2024-11-07 14:44:39,485][04584] Num frames 3500...
+ [2024-11-07 14:44:39,725][04584] Num frames 3600...
+ [2024-11-07 14:44:39,856][04584] Avg episode rewards: #0: 4.609, true rewards: #0: 4.053
+ [2024-11-07 14:44:39,861][04584] Avg episode reward: 4.609, avg true_objective: 4.053
+ [2024-11-07 14:44:39,965][04584] Num frames 3700...
+ [2024-11-07 14:44:40,133][04584] Num frames 3800...
+ [2024-11-07 14:44:40,308][04584] Num frames 3900...
+ [2024-11-07 14:44:40,493][04584] Num frames 4000...
+ [2024-11-07 14:44:40,602][04584] Avg episode rewards: #0: 4.532, true rewards: #0: 4.032
+ [2024-11-07 14:44:40,606][04584] Avg episode reward: 4.532, avg true_objective: 4.032
+ [2024-11-07 14:44:51,260][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
+ [2024-11-07 14:44:53,903][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
+ [2024-11-07 14:44:53,904][04584] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-11-07 14:44:53,906][04584] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-11-07 14:44:53,908][04584] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-11-07 14:44:53,909][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-11-07 14:44:53,910][04584] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-11-07 14:44:53,913][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+ [2024-11-07 14:44:53,915][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-11-07 14:44:53,917][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+ [2024-11-07 14:44:53,920][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+ [2024-11-07 14:44:53,922][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-11-07 14:44:53,923][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-11-07 14:44:53,926][04584] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-11-07 14:44:53,928][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-11-07 14:44:53,931][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-11-07 14:44:53,958][04584] RunningMeanStd input shape: (3, 72, 128)
+ [2024-11-07 14:44:53,960][04584] RunningMeanStd input shape: (1,)
+ [2024-11-07 14:44:53,973][04584] ConvEncoder: input_channels=3
+ [2024-11-07 14:44:54,021][04584] Conv encoder output size: 512
+ [2024-11-07 14:44:54,023][04584] Policy head output size: 512
+ [2024-11-07 14:44:54,046][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+ [2024-11-07 14:44:54,568][04584] Num frames 100...
+ [2024-11-07 14:44:54,745][04584] Num frames 200...
+ [2024-11-07 14:44:54,907][04584] Num frames 300...
+ [2024-11-07 14:44:55,060][04584] Num frames 400...
+ [2024-11-07 14:44:55,186][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+ [2024-11-07 14:44:55,187][04584] Avg episode reward: 5.480, avg true_objective: 4.480
+ [2024-11-07 14:44:55,276][04584] Num frames 500...
+ [2024-11-07 14:44:55,445][04584] Num frames 600...
+ [2024-11-07 14:44:55,620][04584] Num frames 700...
+ [2024-11-07 14:44:55,759][04584] Num frames 800...
+ [2024-11-07 14:44:55,862][04584] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+ [2024-11-07 14:44:55,863][04584] Avg episode reward: 4.660, avg true_objective: 4.160
+ [2024-11-07 14:44:55,968][04584] Num frames 900...
+ [2024-11-07 14:44:56,121][04584] Num frames 1000...
+ [2024-11-07 14:44:56,271][04584] Num frames 1100...
+ [2024-11-07 14:44:56,427][04584] Num frames 1200...
+ [2024-11-07 14:44:56,511][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
+ [2024-11-07 14:44:56,512][04584] Avg episode reward: 4.387, avg true_objective: 4.053
+ [2024-11-07 14:44:56,656][04584] Num frames 1300...
+ [2024-11-07 14:44:56,815][04584] Num frames 1400...
+ [2024-11-07 14:44:56,966][04584] Num frames 1500...
+ [2024-11-07 14:44:57,111][04584] Num frames 1600...
+ [2024-11-07 14:44:57,215][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+ [2024-11-07 14:44:57,215][04584] Avg episode reward: 4.580, avg true_objective: 4.080
+ [2024-11-07 14:44:57,321][04584] Num frames 1700...
+ [2024-11-07 14:44:57,471][04584] Num frames 1800...
+ [2024-11-07 14:44:57,627][04584] Num frames 1900...
+ [2024-11-07 14:44:57,776][04584] Num frames 2000...
+ [2024-11-07 14:44:57,856][04584] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032
+ [2024-11-07 14:44:57,857][04584] Avg episode reward: 4.432, avg true_objective: 4.032
+ [2024-11-07 14:44:57,981][04584] Num frames 2100...
+ [2024-11-07 14:44:58,132][04584] Num frames 2200...
+ [2024-11-07 14:44:58,301][04584] Num frames 2300...
+ [2024-11-07 14:44:58,455][04584] Num frames 2400...
+ [2024-11-07 14:44:58,615][04584] Avg episode rewards: #0: 4.607, true rewards: #0: 4.107
+ [2024-11-07 14:44:58,616][04584] Avg episode reward: 4.607, avg true_objective: 4.107
+ [2024-11-07 14:44:58,681][04584] Num frames 2500...
+ [2024-11-07 14:44:58,837][04584] Num frames 2600...
+ [2024-11-07 14:44:58,996][04584] Num frames 2700...
+ [2024-11-07 14:44:59,149][04584] Num frames 2800...
+ [2024-11-07 14:44:59,330][04584] Avg episode rewards: #0: 4.686, true rewards: #0: 4.114
+ [2024-11-07 14:44:59,331][04584] Avg episode reward: 4.686, avg true_objective: 4.114
+ [2024-11-07 14:44:59,363][04584] Num frames 2900...
+ [2024-11-07 14:44:59,538][04584] Num frames 3000...
+ [2024-11-07 14:44:59,703][04584] Num frames 3100...
+ [2024-11-07 14:44:59,853][04584] Num frames 3200...
+ [2024-11-07 14:45:00,003][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+ [2024-11-07 14:45:00,005][04584] Avg episode reward: 4.580, avg true_objective: 4.080
+ [2024-11-07 14:45:00,066][04584] Num frames 3300...
+ [2024-11-07 14:45:00,231][04584] Num frames 3400...
+ [2024-11-07 14:45:00,389][04584] Num frames 3500...
+ [2024-11-07 14:45:00,565][04584] Num frames 3600...
+ [2024-11-07 14:45:00,743][04584] Num frames 3700...
+ [2024-11-07 14:45:00,972][04584] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196
+ [2024-11-07 14:45:00,974][04584] Avg episode reward: 4.862, avg true_objective: 4.196
+ [2024-11-07 14:45:01,026][04584] Num frames 3800...
+ [2024-11-07 14:45:01,224][04584] Num frames 3900...
+ [2024-11-07 14:45:01,414][04584] Num frames 4000...
+ [2024-11-07 14:45:01,597][04584] Num frames 4100...
+ [2024-11-07 14:45:01,777][04584] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160
+ [2024-11-07 14:45:01,778][04584] Avg episode reward: 4.760, avg true_objective: 4.160
+ [2024-11-07 14:45:10,932][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!