alidenewade committed on
Commit d2bd787 · verified · 1 Parent(s): 0ee30ff

Upload folder using huggingface_hub

.summary/0/events.out.tfevents.1730986539.ali ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1e11b0059b918b90d03360ce897dbcfbe4919841fc122b58ade530701936f1f
+ size 3137
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
  type: doom_health_gathering_supreme
  metrics:
  - type: mean_reward
- value: 3.71 +/- 0.63
+ value: 3.81 +/- 0.46
  name: mean_reward
  verified: false
  ---
checkpoint_p0/checkpoint_000003910_16015360.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bfde67df8265d9dcdfab6c6c171b5c5aeb413e650926559ad51aa292116196f0
+ size 34929669
config.json CHANGED
@@ -16,7 +16,7 @@
  "policy_workers_per_policy": 1,
  "max_policy_lag": 1000,
  "num_workers": 10,
- "num_envs_per_worker": 6,
+ "num_envs_per_worker": 4,
  "batch_size": 1024,
  "num_batches_per_epoch": 1,
  "num_epochs": 1,
@@ -43,7 +43,7 @@
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "max_grad_norm": 4.0,
- "learning_rate": 0.0001,
+ "learning_rate": 0.0003,
  "lr_schedule": "constant",
  "lr_schedule_kl_threshold": 0.008,
  "lr_adaptive_min": 1e-06,
@@ -65,7 +65,7 @@
  "summaries_use_frameskip": true,
  "heartbeat_interval": 20,
  "heartbeat_reporting_interval": 600,
- "train_for_env_steps": 16000000,
+ "train_for_env_steps": 4000000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cc8d65182a01240d78094bcad1ac054bcbd3b602de2485fbaf2f3c34247352b4
- size 5783761
+ oid sha256:37f9723db5bf8d58e69570838f86321d5b61c9daef0e0eed219cccfba4dcdbc0
+ size 5588709
sf_log.txt CHANGED
@@ -8013,3 +8013,814 @@ main_loop: 1467.0711
  [2024-11-07 15:28:22,426][04584] Avg episode rewards: #0: 3.912, true rewards: #0: 3.712
  [2024-11-07 15:28:22,431][04584] Avg episode reward: 3.912, avg true_objective: 3.712
  [2024-11-07 15:28:31,547][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
+ [2024-11-07 15:28:36,824][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme
+ [2024-11-07 15:35:39,627][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
+ [2024-11-07 15:35:39,629][04584] Overriding arg 'num_envs_per_worker' with value 4 passed from command line
+ [2024-11-07 15:35:39,631][04584] Overriding arg 'learning_rate' with value 0.0003 passed from command line
+ [2024-11-07 15:35:39,633][04584] Overriding arg 'train_for_env_steps' with value 4000000 passed from command line
+ [2024-11-07 15:35:39,766][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists!
+ [2024-11-07 15:35:39,767][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment...
+ [2024-11-07 15:35:39,768][04584] Weights and Biases integration disabled
+ [2024-11-07 15:35:39,773][04584] Environment var CUDA_VISIBLE_DEVICES is 0
+
+ [2024-11-07 15:35:53,100][04584] Starting experiment with the following configuration:
+ help=False
+ algo=APPO
+ env=doom_health_gathering_supreme
+ experiment=default_experiment
+ train_dir=/root/hfRL/ml/LunarLander-v2/train_dir
+ restart_behavior=resume
+ device=gpu
+ seed=None
+ num_policies=1
+ async_rl=True
+ serial_mode=False
+ batched_sampling=False
+ num_batches_to_accumulate=2
+ worker_num_splits=2
+ policy_workers_per_policy=1
+ max_policy_lag=1000
+ num_workers=10
+ num_envs_per_worker=4
+ batch_size=1024
+ num_batches_per_epoch=1
+ num_epochs=1
+ rollout=32
+ recurrence=32
+ shuffle_minibatches=False
+ gamma=0.99
+ reward_scale=1.0
+ reward_clip=1000.0
+ value_bootstrap=False
+ normalize_returns=True
+ exploration_loss_coeff=0.001
+ value_loss_coeff=0.5
+ kl_loss_coeff=0.0
+ exploration_loss=symmetric_kl
+ gae_lambda=0.95
+ ppo_clip_ratio=0.1
+ ppo_clip_value=0.2
+ with_vtrace=False
+ vtrace_rho=1.0
+ vtrace_c=1.0
+ optimizer=adam
+ adam_eps=1e-06
+ adam_beta1=0.9
+ adam_beta2=0.999
+ max_grad_norm=4.0
+ learning_rate=0.0003
+ lr_schedule=constant
+ lr_schedule_kl_threshold=0.008
+ lr_adaptive_min=1e-06
+ lr_adaptive_max=0.01
+ obs_subtract_mean=0.0
+ obs_scale=255.0
+ normalize_input=True
+ normalize_input_keys=None
+ decorrelate_experience_max_seconds=0
+ decorrelate_envs_on_one_worker=True
+ actor_worker_gpus=[]
+ set_workers_cpu_affinity=True
+ force_envs_single_thread=False
+ default_niceness=0
+ log_to_file=True
+ experiment_summaries_interval=10
+ flush_summaries_interval=30
+ stats_avg=100
+ summaries_use_frameskip=True
+ heartbeat_interval=20
+ heartbeat_reporting_interval=600
+ train_for_env_steps=4000000
+ train_for_seconds=10000000000
+ save_every_sec=120
+ keep_checkpoints=2
+ load_checkpoint_kind=latest
+ save_milestones_sec=-1
+ save_best_every_sec=5
+ save_best_metric=reward
+ save_best_after=100000
+ benchmark=False
+ encoder_mlp_layers=[512, 512]
+ encoder_conv_architecture=convnet_simple
+ encoder_conv_mlp_layers=[512]
+ use_rnn=True
+ rnn_size=512
+ rnn_type=gru
+ rnn_num_layers=1
+ decoder_mlp_layers=[]
+ nonlinearity=elu
+ policy_initialization=orthogonal
+ policy_init_gain=1.0
+ actor_critic_share_weights=True
+ adaptive_stddev=True
+ continuous_tanh_scale=0.0
+ initial_stddev=1.0
+ use_env_info_cache=False
+ env_gpu_actions=False
+ env_gpu_observations=True
+ env_frameskip=4
+ env_framestack=1
+ pixel_format=CHW
+ use_record_episode_statistics=False
+ with_wandb=False
+ wandb_user=None
+ wandb_project=sample_factory
+ wandb_group=None
+ wandb_job_type=SF
+ wandb_tags=[]
+ with_pbt=False
+ pbt_mix_policies_in_one_env=True
+ pbt_period_env_steps=5000000
+ pbt_start_mutation=20000000
+ pbt_replace_fraction=0.3
+ pbt_mutation_rate=0.15
+ pbt_replace_reward_gap=0.1
+ pbt_replace_reward_gap_absolute=1e-06
+ pbt_optimize_gamma=False
+ pbt_target_objective=true_objective
+ pbt_perturb_min=1.1
+ pbt_perturb_max=1.5
+ num_agents=-1
+ num_humans=0
+ num_bots=-1
+ start_bot_difficulty=None
+ timelimit=None
+ res_w=128
+ res_h=72
+ wide_aspect_ratio=False
+ eval_env_frameskip=1
+ fps=35
+ command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
+ cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
+ git_hash=unknown
+ git_repo_name=not a git repository
+ [2024-11-07 15:35:53,102][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
+ [2024-11-07 15:35:53,105][04584] Rollout worker 0 uses device cpu
+ [2024-11-07 15:35:53,106][04584] Rollout worker 1 uses device cpu
+ [2024-11-07 15:35:53,108][04584] Rollout worker 2 uses device cpu
+ [2024-11-07 15:35:53,111][04584] Rollout worker 3 uses device cpu
+ [2024-11-07 15:35:53,113][04584] Rollout worker 4 uses device cpu
+ [2024-11-07 15:35:53,115][04584] Rollout worker 5 uses device cpu
+ [2024-11-07 15:35:53,117][04584] Rollout worker 6 uses device cpu
+ [2024-11-07 15:35:53,118][04584] Rollout worker 7 uses device cpu
+ [2024-11-07 15:35:53,121][04584] Rollout worker 8 uses device cpu
+ [2024-11-07 15:35:53,124][04584] Rollout worker 9 uses device cpu
+ [2024-11-07 15:35:53,218][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-11-07 15:35:53,219][04584] InferenceWorker_p0-w0: min num requests: 3
+ [2024-11-07 15:35:53,265][04584] Starting all processes...
+ [2024-11-07 15:35:53,267][04584] Starting process learner_proc0
+ [2024-11-07 15:35:53,321][04584] Starting all processes...
+ [2024-11-07 15:35:53,334][04584] Starting process inference_proc0-0
+ [2024-11-07 15:35:53,338][04584] Starting process rollout_proc0
+ [2024-11-07 15:35:53,338][04584] Starting process rollout_proc1
+ [2024-11-07 15:35:53,340][04584] Starting process rollout_proc2
+ [2024-11-07 15:35:53,341][04584] Starting process rollout_proc3
+ [2024-11-07 15:35:53,343][04584] Starting process rollout_proc4
+ [2024-11-07 15:35:53,344][04584] Starting process rollout_proc5
+ [2024-11-07 15:35:53,346][04584] Starting process rollout_proc6
+ [2024-11-07 15:35:53,348][04584] Starting process rollout_proc7
+ [2024-11-07 15:35:53,350][04584] Starting process rollout_proc8
+ [2024-11-07 15:35:53,351][04584] Starting process rollout_proc9
+ [2024-11-07 15:36:01,323][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-11-07 15:36:01,324][12380] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+ [2024-11-07 15:36:01,855][12406] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
+ [2024-11-07 15:36:01,933][12395] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-11-07 15:36:01,934][12395] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+ [2024-11-07 15:36:02,006][12380] Num visible devices: 1
+ [2024-11-07 15:36:02,044][12395] Num visible devices: 1
+ [2024-11-07 15:36:02,084][12398] Worker 2 uses CPU cores [2]
+ [2024-11-07 15:36:02,105][12380] Starting seed is not provided
+ [2024-11-07 15:36:02,106][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-11-07 15:36:02,106][12380] Initializing actor-critic model on device cuda:0
+ [2024-11-07 15:36:02,107][12380] RunningMeanStd input shape: (3, 72, 128)
+ [2024-11-07 15:36:02,125][12380] RunningMeanStd input shape: (1,)
+ [2024-11-07 15:36:02,413][12380] ConvEncoder: input_channels=3
+ [2024-11-07 15:36:02,565][12407] Worker 4 uses CPU cores [4]
+ [2024-11-07 15:36:02,968][12408] Worker 6 uses CPU cores [6]
+ [2024-11-07 15:36:03,069][12397] Worker 0 uses CPU cores [0]
+ [2024-11-07 15:36:03,316][12380] Conv encoder output size: 512
+ [2024-11-07 15:36:03,322][12380] Policy head output size: 512
+ [2024-11-07 15:36:03,383][12396] Worker 1 uses CPU cores [1]
+ [2024-11-07 15:36:03,390][12380] Created Actor Critic model with architecture:
+ [2024-11-07 15:36:03,390][12380] ActorCriticSharedWeights(
+ (obs_normalizer): ObservationNormalizer(
+ (running_mean_std): RunningMeanStdDictInPlace(
+ (running_mean_std): ModuleDict(
+ (obs): RunningMeanStdInPlace()
+ )
+ )
+ )
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+ (encoder): VizdoomEncoder(
+ (basic_encoder): ConvEncoder(
+ (enc): RecursiveScriptModule(
+ original_name=ConvEncoderImpl
+ (conv_head): RecursiveScriptModule(
+ original_name=Sequential
+ (0): RecursiveScriptModule(original_name=Conv2d)
+ (1): RecursiveScriptModule(original_name=ELU)
+ (2): RecursiveScriptModule(original_name=Conv2d)
+ (3): RecursiveScriptModule(original_name=ELU)
+ (4): RecursiveScriptModule(original_name=Conv2d)
+ (5): RecursiveScriptModule(original_name=ELU)
+ )
+ (mlp_layers): RecursiveScriptModule(
+ original_name=Sequential
+ (0): RecursiveScriptModule(original_name=Linear)
+ (1): RecursiveScriptModule(original_name=ELU)
+ )
+ )
+ )
+ )
+ (core): ModelCoreRNN(
+ (core): GRU(512, 512)
+ )
+ (decoder): MlpDecoder(
+ (mlp): Identity()
+ )
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+ (action_parameterization): ActionParameterizationDefault(
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+ )
+ )
+ [2024-11-07 15:36:03,484][12409] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
+ [2024-11-07 15:36:03,503][12399] Worker 3 uses CPU cores [3]
+ [2024-11-07 15:36:03,843][12410] Worker 5 uses CPU cores [5]
+ [2024-11-07 15:36:03,920][12411] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
+ [2024-11-07 15:36:04,363][12380] Using optimizer <class 'torch.optim.adam.Adam'>
+ [2024-11-07 15:36:07,491][12380] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
+ [2024-11-07 15:36:07,594][12380] Loading model from checkpoint
+ [2024-11-07 15:36:07,596][12380] Loaded experiment state at self.train_step=3908, self.env_steps=16007168
+ [2024-11-07 15:36:07,597][12380] Initialized policy 0 weights for model version 3908
+ [2024-11-07 15:36:07,606][12380] LearnerWorker_p0 finished initialization!
+ [2024-11-07 15:36:07,606][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+ [2024-11-07 15:36:08,060][12395] RunningMeanStd input shape: (3, 72, 128)
+ [2024-11-07 15:36:08,061][12395] RunningMeanStd input shape: (1,)
+ [2024-11-07 15:36:08,077][12395] ConvEncoder: input_channels=3
+ [2024-11-07 15:36:08,262][12395] Conv encoder output size: 512
+ [2024-11-07 15:36:08,262][12395] Policy head output size: 512
+ [2024-11-07 15:36:08,356][04584] Inference worker 0-0 is ready!
+ [2024-11-07 15:36:08,358][04584] All inference workers are ready! Signal rollout workers to start!
+ [2024-11-07 15:36:08,479][12399] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,521][12407] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,526][12411] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,532][12409] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,535][12398] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,536][12396] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,545][12397] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,546][12410] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,689][12406] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:08,695][12408] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2024-11-07 15:36:09,198][12407] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,203][12399] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,236][12398] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,720][12411] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,748][12397] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,774][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 16007168. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-11-07 15:36:09,841][12410] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:09,917][12407] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:09,948][12399] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:10,220][12396] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:10,305][12408] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:10,708][12406] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:10,813][12407] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:10,964][12399] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:11,144][12408] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:11,150][12396] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:11,225][12397] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:11,390][12411] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:11,571][12407] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:11,584][12399] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:13,567][04584] Heartbeat connected on LearnerWorker_p0
+ [2024-11-07 15:36:13,569][04584] Heartbeat connected on RolloutWorker_w3
+ [2024-11-07 15:36:13,575][04584] Heartbeat connected on Batcher_0
+ [2024-11-07 15:36:13,580][04584] Heartbeat connected on RolloutWorker_w4
+ [2024-11-07 15:36:13,730][12406] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:13,871][12410] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:13,880][12397] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:13,894][12408] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:14,131][12396] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:14,255][12411] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:14,386][12409] Decorrelating experience for 0 frames...
+ [2024-11-07 15:36:14,449][12408] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:14,489][12397] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:14,512][04584] Heartbeat connected on RolloutWorker_w6
+ [2024-11-07 15:36:14,595][04584] Heartbeat connected on RolloutWorker_w0
+ [2024-11-07 15:36:14,604][12396] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:14,678][04584] Heartbeat connected on RolloutWorker_w1
+ [2024-11-07 15:36:14,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-11-07 15:36:14,971][12398] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:15,065][12411] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:15,200][12409] Decorrelating experience for 32 frames...
+ [2024-11-07 15:36:15,266][04584] Heartbeat connected on RolloutWorker_w9
+ [2024-11-07 15:36:15,504][04584] Heartbeat connected on InferenceWorker_p0-w0
+ [2024-11-07 15:36:15,728][12410] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:16,300][12398] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:16,335][12409] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:17,274][12406] Decorrelating experience for 64 frames...
+ [2024-11-07 15:36:17,565][12410] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:17,901][04584] Heartbeat connected on RolloutWorker_w5
+ [2024-11-07 15:36:18,870][12398] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:19,062][12406] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:19,077][04584] Heartbeat connected on RolloutWorker_w2
+ [2024-11-07 15:36:19,382][04584] Heartbeat connected on RolloutWorker_w8
+ [2024-11-07 15:36:19,785][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 73.1. Samples: 732. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-11-07 15:36:19,789][04584] Avg episode reward: [(0, '1.797')]
+ [2024-11-07 15:36:20,780][12409] Decorrelating experience for 96 frames...
+ [2024-11-07 15:36:20,905][12380] Signal inference workers to stop experience collection...
+ [2024-11-07 15:36:20,925][12395] InferenceWorker_p0-w0: stopping experience collection
+ [2024-11-07 15:36:21,046][04584] Heartbeat connected on RolloutWorker_w7
+ [2024-11-07 15:36:24,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 184.4. Samples: 2766. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-11-07 15:36:24,775][04584] Avg episode reward: [(0, '2.131')]
+ [2024-11-07 15:36:29,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 138.3. Samples: 2766. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2024-11-07 15:36:29,776][04584] Avg episode reward: [(0, '2.131')]
+ [2024-11-07 15:36:31,391][12380] Signal inference workers to resume experience collection...
+ [2024-11-07 15:36:31,392][12395] InferenceWorker_p0-w0: resuming experience collection
+ [2024-11-07 15:36:31,396][12380] Stopping Batcher_0...
+ [2024-11-07 15:36:31,396][12380] Loop batcher_evt_loop terminating...
+ [2024-11-07 15:36:31,433][04584] Component Batcher_0 stopped!
+ [2024-11-07 15:36:31,892][12395] Weights refcount: 2 0
+ [2024-11-07 15:36:31,894][12395] Stopping InferenceWorker_p0-w0...
+ [2024-11-07 15:36:31,894][12395] Loop inference_proc0-0_evt_loop terminating...
+ [2024-11-07 15:36:31,894][04584] Component InferenceWorker_p0-w0 stopped!
+ [2024-11-07 15:36:32,099][12380] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth...
+ [2024-11-07 15:36:32,269][12411] Stopping RolloutWorker_w9...
+ [2024-11-07 15:36:32,270][12411] Loop rollout_proc9_evt_loop terminating...
+ [2024-11-07 15:36:32,269][04584] Component RolloutWorker_w9 stopped!
+ [2024-11-07 15:36:32,294][12408] Stopping RolloutWorker_w6...
+ [2024-11-07 15:36:32,295][12408] Loop rollout_proc6_evt_loop terminating...
+ [2024-11-07 15:36:32,296][12396] Stopping RolloutWorker_w1...
+ [2024-11-07 15:36:32,297][12396] Loop rollout_proc1_evt_loop terminating...
+ [2024-11-07 15:36:32,284][04584] Component RolloutWorker_w8 stopped!
+ [2024-11-07 15:36:32,468][12406] Stopping RolloutWorker_w8...
+ [2024-11-07 15:36:32,469][12406] Loop rollout_proc8_evt_loop terminating...
+ [2024-11-07 15:36:32,467][04584] Component RolloutWorker_w6 stopped!
+ [2024-11-07 15:36:32,470][04584] Component RolloutWorker_w1 stopped!
+ [2024-11-07 15:36:32,489][04584] Component RolloutWorker_w5 stopped!
+ [2024-11-07 15:36:32,490][12399] Stopping RolloutWorker_w3...
+ [2024-11-07 15:36:32,490][04584] Component RolloutWorker_w3 stopped!
+ [2024-11-07 15:36:32,492][12410] Stopping RolloutWorker_w5...
+ [2024-11-07 15:36:32,492][12399] Loop rollout_proc3_evt_loop terminating...
+ [2024-11-07 15:36:32,493][12410] Loop rollout_proc5_evt_loop terminating...
+ [2024-11-07 15:36:32,644][12409] Stopping RolloutWorker_w7...
+ [2024-11-07 15:36:32,645][12409] Loop rollout_proc7_evt_loop terminating...
+ [2024-11-07 15:36:32,645][04584] Component RolloutWorker_w7 stopped!
+ [2024-11-07 15:36:32,716][12397] Stopping RolloutWorker_w0...
+ [2024-11-07 15:36:32,717][04584] Component RolloutWorker_w0 stopped!
+ [2024-11-07 15:36:32,735][12397] Loop rollout_proc0_evt_loop terminating...
+ [2024-11-07 15:36:32,920][12398] Stopping RolloutWorker_w2...
+ [2024-11-07 15:36:32,920][12398] Loop rollout_proc2_evt_loop terminating...
+ [2024-11-07 15:36:32,920][04584] Component RolloutWorker_w2 stopped!
+ [2024-11-07 15:36:33,007][04584] Component RolloutWorker_w4 stopped!
+ [2024-11-07 15:36:33,009][12407] Stopping RolloutWorker_w4...
+ [2024-11-07 15:36:33,011][12407] Loop rollout_proc4_evt_loop terminating...
+ [2024-11-07 15:36:33,392][12380] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003865_15831040.pth
+ [2024-11-07 15:36:33,454][12380] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth...
+ [2024-11-07 15:36:34,291][04584] Component LearnerWorker_p0 stopped!
+ [2024-11-07 15:36:34,290][12380] Stopping LearnerWorker_p0...
+ [2024-11-07 15:36:34,300][12380] Loop learner_proc0_evt_loop terminating...
+ [2024-11-07 15:36:34,300][04584] Waiting for process learner_proc0 to stop...
+ [2024-11-07 15:36:36,214][04584] Waiting for process inference_proc0-0 to join...
+ [2024-11-07 15:36:36,215][04584] Waiting for process rollout_proc0 to join...
+ [2024-11-07 15:36:36,216][04584] Waiting for process rollout_proc1 to join...
+ [2024-11-07 15:36:36,218][04584] Waiting for process rollout_proc2 to join...
+ [2024-11-07 15:36:36,220][04584] Waiting for process rollout_proc3 to join...
+ [2024-11-07 15:36:36,224][04584] Waiting for process rollout_proc4 to join...
+ [2024-11-07 15:36:36,225][04584] Waiting for process rollout_proc5 to join...
+ [2024-11-07 15:36:36,228][04584] Waiting for process rollout_proc6 to join...
+ [2024-11-07 15:36:36,229][04584] Waiting for process rollout_proc7 to join...
+ [2024-11-07 15:36:36,231][04584] Waiting for process rollout_proc8 to join...
+ [2024-11-07 15:36:36,244][04584] Waiting for process rollout_proc9 to join...
+ [2024-11-07 15:36:36,246][04584] Batcher 0 profile tree view:
+ batching: 0.1186, releasing_batches: 0.0010
+ [2024-11-07 15:36:36,249][04584] InferenceWorker_p0-w0 profile tree view:
+ wait_policy: 0.0051
+ wait_policy_total: 3.3668
+ update_model: 0.4621
+ weight_update: 0.4034
+ one_step: 0.0533
+ handle_policy_step: 8.7830
+ deserialize: 0.1270, stack: 0.0172, obs_to_device_normalize: 3.1395, forward: 4.5298, send_messages: 0.1979
+ prepare_outputs: 0.5682
+ to_cpu: 0.4418
+ [2024-11-07 15:36:36,251][04584] Learner 0 profile tree view:
+ misc: 0.0000, prepare_batch: 3.5253
+ train: 9.8063
+ epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0012, kl_divergence: 0.1010, after_optimizer: 1.5531
+ calculate_losses: 1.1109
+ losses_init: 0.0000, forward_head: 0.5490, bptt_initial: 0.2427, tail: 0.0206, advantages_returns: 0.0022, losses: 0.2701
+ bptt: 0.0253
+ bptt_forward_core: 0.0251
+ update: 7.0378
+ clip: 0.2975
+ [2024-11-07 15:36:36,255][04584] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.1829, env_step: 1.6410, overhead: 0.1598, complete_rollouts: 0.0046
+ save_policy_outputs: 0.0904
+ split_output_tensors: 0.0334
+ [2024-11-07 15:36:36,260][04584] RolloutWorker_w9 profile tree view:
+ wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.3210, env_step: 2.2709, overhead: 0.0749, complete_rollouts: 0.0414
+ save_policy_outputs: 0.0948
+ split_output_tensors: 0.0223
+ [2024-11-07 15:36:36,263][04584] Loop Runner_EvtLoop terminating...
+ [2024-11-07 15:36:36,268][04584] Runner profile tree view:
+ main_loop: 43.0008
+ [2024-11-07 15:36:36,270][04584] Collected {0: 16015360}, FPS: 190.5
+ [2024-11-07 15:37:41,648][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
+ [2024-11-07 15:37:41,650][04584] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-11-07 15:37:41,652][04584] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-11-07 15:37:41,654][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-11-07 15:37:41,656][04584] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-11-07 15:37:41,657][04584] Adding new argument 'max_num_frames'=50000 that is not in the saved config file!
+ [2024-11-07 15:37:41,659][04584] Adding new argument 'max_num_episodes'=50 that is not in the saved config file!
+ [2024-11-07 15:37:41,661][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+ [2024-11-07 15:37:41,662][04584] Adding new argument 'hf_repository'=None that is not in the saved config file!
+ [2024-11-07 15:37:41,663][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-11-07 15:37:41,665][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-11-07 15:37:41,666][04584] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-11-07 15:37:41,668][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-11-07 15:37:41,671][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-11-07 15:37:41,960][04584] RunningMeanStd input shape: (3, 72, 128)
+ [2024-11-07 15:37:41,972][04584] RunningMeanStd input shape: (1,)
+ [2024-11-07 15:37:42,066][04584] ConvEncoder: input_channels=3
+ [2024-11-07 15:37:42,267][04584] Conv encoder output size: 512
+ [2024-11-07 15:37:42,271][04584] Policy head output size: 512
+ [2024-11-07 15:37:42,469][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth...
+ [2024-11-07 15:37:43,334][04584] Num frames 100...
+ [2024-11-07 15:37:43,616][04584] Num frames 200...
+ [2024-11-07 15:37:43,874][04584] Num frames 300...
+ [2024-11-07 15:37:44,131][04584] Num frames 400...
+ [2024-11-07 15:37:44,308][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+ [2024-11-07 15:37:44,314][04584] Avg episode reward: 5.480, avg true_objective: 4.480
+ [2024-11-07 15:37:44,458][04584] Num frames 500...
+ [2024-11-07 15:37:44,722][04584] Num frames 600...
+ [2024-11-07 15:37:44,966][04584] Num frames 700...
+ [2024-11-07 15:37:45,212][04584] Num frames 800...
+ [2024-11-07 15:37:45,399][04584] Num frames 900...
+ [2024-11-07 15:37:45,576][04584] Avg episode rewards: #0: 6.300, true rewards: #0: 4.800
+ [2024-11-07 15:37:45,579][04584] Avg episode reward: 6.300, avg true_objective: 4.800
+ [2024-11-07 15:37:45,668][04584] Num frames 1000...
+ [2024-11-07 15:37:45,866][04584] Num frames 1100...
+ [2024-11-07 15:37:46,070][04584] Num frames 1200...
+ [2024-11-07 15:37:46,219][04584] Avg episode rewards: #0: 5.493, true rewards: #0: 4.160
+ [2024-11-07 15:37:46,222][04584] Avg episode reward: 5.493, avg true_objective: 4.160
+ [2024-11-07 15:37:46,327][04584] Num frames 1300...
+ [2024-11-07 15:37:46,558][04584] Num frames 1400...
+ [2024-11-07 15:37:46,796][04584] Num frames 1500...
+ [2024-11-07 15:37:47,024][04584] Num frames 1600...
+ [2024-11-07 15:37:47,319][04584] Avg episode rewards: #0: 5.490, true rewards: #0: 4.240
+ [2024-11-07 15:37:47,320][04584] Avg episode reward: 5.490, avg true_objective: 4.240
8472
+ [2024-11-07 15:37:47,332][04584] Num frames 1700...
8473
+ [2024-11-07 15:37:47,581][04584] Num frames 1800...
8474
+ [2024-11-07 15:37:47,849][04584] Num frames 1900...
8475
+ [2024-11-07 15:37:48,088][04584] Num frames 2000...
8476
+ [2024-11-07 15:37:48,351][04584] Num frames 2100...
8477
+ [2024-11-07 15:37:48,524][04584] Avg episode rewards: #0: 5.488, true rewards: #0: 4.288
8478
+ [2024-11-07 15:37:48,526][04584] Avg episode reward: 5.488, avg true_objective: 4.288
8479
+ [2024-11-07 15:37:48,684][04584] Num frames 2200...
8480
+ [2024-11-07 15:37:48,922][04584] Num frames 2300...
8481
+ [2024-11-07 15:37:49,182][04584] Num frames 2400...
8482
+ [2024-11-07 15:37:49,236][04584] Avg episode rewards: #0: 5.000, true rewards: #0: 4.000
8483
+ [2024-11-07 15:37:49,239][04584] Avg episode reward: 5.000, avg true_objective: 4.000
8484
+ [2024-11-07 15:37:49,535][04584] Num frames 2500...
8485
+ [2024-11-07 15:37:49,918][04584] Num frames 2600...
8486
+ [2024-11-07 15:37:50,190][04584] Num frames 2700...
8487
+ [2024-11-07 15:37:50,445][04584] Avg episode rewards: #0: 4.834, true rewards: #0: 3.977
8488
+ [2024-11-07 15:37:50,448][04584] Avg episode reward: 4.834, avg true_objective: 3.977
8489
+ [2024-11-07 15:37:50,499][04584] Num frames 2800...
8490
+ [2024-11-07 15:37:50,761][04584] Num frames 2900...
8491
+ [2024-11-07 15:37:51,078][04584] Num frames 3000...
8492
+ [2024-11-07 15:37:51,427][04584] Num frames 3100...
8493
+ [2024-11-07 15:37:51,661][04584] Num frames 3200...
8494
+ [2024-11-07 15:37:51,786][04584] Avg episode rewards: #0: 4.915, true rewards: #0: 4.040
8495
+ [2024-11-07 15:37:51,792][04584] Avg episode reward: 4.915, avg true_objective: 4.040
8496
+ [2024-11-07 15:37:51,964][04584] Num frames 3300...
8497
+ [2024-11-07 15:37:52,359][04584] Num frames 3400...
8498
+ [2024-11-07 15:37:52,697][04584] Num frames 3500...
8499
+ [2024-11-07 15:37:52,980][04584] Avg episode rewards: #0: 4.777, true rewards: #0: 3.999
8500
+ [2024-11-07 15:37:52,985][04584] Avg episode reward: 4.777, avg true_objective: 3.999
8501
+ [2024-11-07 15:37:53,003][04584] Num frames 3600...
8502
+ [2024-11-07 15:37:53,277][04584] Num frames 3700...
8503
+ [2024-11-07 15:37:53,537][04584] Num frames 3800...
8504
+ [2024-11-07 15:37:55,495][04584] Num frames 3900...
8505
+ [2024-11-07 15:37:55,889][04584] Num frames 4000...
8506
+ [2024-11-07 15:37:56,113][04584] Avg episode rewards: #0: 4.847, true rewards: #0: 4.047
8507
+ [2024-11-07 15:37:56,114][04584] Avg episode reward: 4.847, avg true_objective: 4.047
8508
+ [2024-11-07 15:37:56,302][04584] Num frames 4100...
8509
+ [2024-11-07 15:37:56,492][04584] Num frames 4200...
8510
+ [2024-11-07 15:37:56,686][04584] Num frames 4300...
8511
+ [2024-11-07 15:37:56,874][04584] Num frames 4400...
8512
+ [2024-11-07 15:37:56,989][04584] Avg episode rewards: #0: 4.755, true rewards: #0: 4.028
8513
+ [2024-11-07 15:37:56,995][04584] Avg episode reward: 4.755, avg true_objective: 4.028
8514
+ [2024-11-07 15:37:57,143][04584] Num frames 4500...
8515
+ [2024-11-07 15:37:57,319][04584] Num frames 4600...
8516
+ [2024-11-07 15:37:57,495][04584] Num frames 4700...
8517
+ [2024-11-07 15:37:57,661][04584] Num frames 4800...
8518
+ [2024-11-07 15:37:57,745][04584] Avg episode rewards: #0: 4.679, true rewards: #0: 4.012
8519
+ [2024-11-07 15:37:57,749][04584] Avg episode reward: 4.679, avg true_objective: 4.012
8520
+ [2024-11-07 15:37:57,913][04584] Num frames 4900...
8521
+ [2024-11-07 15:37:58,074][04584] Num frames 5000...
8522
+ [2024-11-07 15:37:58,242][04584] Num frames 5100...
8523
+ [2024-11-07 15:37:58,405][04584] Num frames 5200...
8524
+ [2024-11-07 15:37:58,518][04584] Avg episode rewards: #0: 4.639, true rewards: #0: 4.024
8525
+ [2024-11-07 15:37:58,524][04584] Avg episode reward: 4.639, avg true_objective: 4.024
8526
+ [2024-11-07 15:37:58,671][04584] Num frames 5300...
8527
+ [2024-11-07 15:37:59,095][04584] Num frames 5400...
8528
+ [2024-11-07 15:37:59,551][04584] Num frames 5500...
8529
+ [2024-11-07 15:37:59,996][04584] Num frames 5600...
8530
+ [2024-11-07 15:38:00,123][04584] Avg episode rewards: #0: 4.582, true rewards: #0: 4.011
8531
+ [2024-11-07 15:38:00,127][04584] Avg episode reward: 4.582, avg true_objective: 4.011
8532
+ [2024-11-07 15:38:00,379][04584] Num frames 5700...
8533
+ [2024-11-07 15:38:00,650][04584] Num frames 5800...
8534
+ [2024-11-07 15:38:00,927][04584] Num frames 5900...
8535
+ [2024-11-07 15:38:01,238][04584] Avg episode rewards: #0: 4.533, true rewards: #0: 3.999
8536
+ [2024-11-07 15:38:01,240][04584] Avg episode reward: 4.533, avg true_objective: 3.999
8537
+ [2024-11-07 15:38:01,243][04584] Num frames 6000...
8538
+ [2024-11-07 15:38:01,558][04584] Num frames 6100...
8539
+ [2024-11-07 15:38:01,882][04584] Num frames 6200...
8540
+ [2024-11-07 15:38:02,265][04584] Num frames 6300...
8541
+ [2024-11-07 15:38:02,603][04584] Avg episode rewards: #0: 4.489, true rewards: #0: 3.989
8542
+ [2024-11-07 15:38:02,607][04584] Avg episode reward: 4.489, avg true_objective: 3.989
8543
+ [2024-11-07 15:38:02,683][04584] Num frames 6400...
8544
+ [2024-11-07 15:38:03,119][04584] Num frames 6500...
8545
+ [2024-11-07 15:38:03,370][04584] Num frames 6600...
8546
+ [2024-11-07 15:38:03,681][04584] Num frames 6700...
8547
+ [2024-11-07 15:38:03,940][04584] Avg episode rewards: #0: 4.451, true rewards: #0: 3.981
8548
+ [2024-11-07 15:38:03,946][04584] Avg episode reward: 4.451, avg true_objective: 3.981
8549
+ [2024-11-07 15:38:04,043][04584] Num frames 6800...
8550
+ [2024-11-07 15:38:04,300][04584] Num frames 6900...
8551
+ [2024-11-07 15:38:04,526][04584] Num frames 7000...
8552
+ [2024-11-07 15:38:04,767][04584] Num frames 7100...
8553
+ [2024-11-07 15:38:05,021][04584] Num frames 7200...
8554
+ [2024-11-07 15:38:05,132][04584] Avg episode rewards: #0: 4.508, true rewards: #0: 4.008
8555
+ [2024-11-07 15:38:05,136][04584] Avg episode reward: 4.508, avg true_objective: 4.008
8556
+ [2024-11-07 15:38:05,405][04584] Num frames 7300...
8557
+ [2024-11-07 15:38:05,721][04584] Num frames 7400...
8558
+ [2024-11-07 15:38:06,063][04584] Num frames 7500...
8559
+ [2024-11-07 15:38:06,318][04584] Num frames 7600...
8560
+ [2024-11-07 15:38:06,580][04584] Num frames 7700...
8561
+ [2024-11-07 15:38:06,845][04584] Num frames 7800...
8562
+ [2024-11-07 15:38:06,965][04584] Avg episode rewards: #0: 4.749, true rewards: #0: 4.117
8563
+ [2024-11-07 15:38:06,971][04584] Avg episode reward: 4.749, avg true_objective: 4.117
8564
+ [2024-11-07 15:38:07,186][04584] Num frames 7900...
8565
+ [2024-11-07 15:38:07,437][04584] Num frames 8000...
8566
+ [2024-11-07 15:38:07,692][04584] Num frames 8100...
8567
+ [2024-11-07 15:38:07,937][04584] Num frames 8200...
8568
+ [2024-11-07 15:38:08,025][04584] Avg episode rewards: #0: 4.704, true rewards: #0: 4.103
8569
+ [2024-11-07 15:38:08,029][04584] Avg episode reward: 4.704, avg true_objective: 4.103
8570
+ [2024-11-07 15:38:08,317][04584] Num frames 8300...
8571
+ [2024-11-07 15:38:08,566][04584] Num frames 8400...
8572
+ [2024-11-07 15:38:08,837][04584] Num frames 8500...
8573
+ [2024-11-07 15:38:09,107][04584] Num frames 8600...
8574
+ [2024-11-07 15:38:09,307][04584] Avg episode rewards: #0: 4.740, true rewards: #0: 4.121
8575
+ [2024-11-07 15:38:09,308][04584] Avg episode reward: 4.740, avg true_objective: 4.121
8576
+ [2024-11-07 15:38:09,453][04584] Num frames 8700...
8577
+ [2024-11-07 15:38:09,686][04584] Num frames 8800...
8578
+ [2024-11-07 15:38:09,939][04584] Num frames 8900...
8579
+ [2024-11-07 15:38:10,177][04584] Num frames 9000...
8580
+ [2024-11-07 15:38:10,338][04584] Avg episode rewards: #0: 4.700, true rewards: #0: 4.109
8581
+ [2024-11-07 15:38:10,342][04584] Avg episode reward: 4.700, avg true_objective: 4.109
8582
+ [2024-11-07 15:38:10,508][04584] Num frames 9100...
8583
+ [2024-11-07 15:38:10,747][04584] Num frames 9200...
8584
+ [2024-11-07 15:38:11,002][04584] Num frames 9300...
8585
+ [2024-11-07 15:38:11,253][04584] Num frames 9400...
8586
+ [2024-11-07 15:38:11,371][04584] Avg episode rewards: #0: 4.662, true rewards: #0: 4.097
8587
+ [2024-11-07 15:38:11,373][04584] Avg episode reward: 4.662, avg true_objective: 4.097
8588
+ [2024-11-07 15:38:11,558][04584] Num frames 9500...
8589
+ [2024-11-07 15:38:11,797][04584] Num frames 9600...
8590
+ [2024-11-07 15:38:12,042][04584] Num frames 9700...
8591
+ [2024-11-07 15:38:12,321][04584] Num frames 9800...
8592
+ [2024-11-07 15:38:12,400][04584] Avg episode rewards: #0: 4.628, true rewards: #0: 4.086
8593
+ [2024-11-07 15:38:12,405][04584] Avg episode reward: 4.628, avg true_objective: 4.086
8594
+ [2024-11-07 15:38:12,627][04584] Num frames 9900...
8595
+ [2024-11-07 15:38:12,861][04584] Num frames 10000...
8596
+ [2024-11-07 15:38:13,121][04584] Num frames 10100...
8597
+ [2024-11-07 15:38:13,439][04584] Avg episode rewards: #0: 4.596, true rewards: #0: 4.076
8598
+ [2024-11-07 15:38:13,441][04584] Avg episode reward: 4.596, avg true_objective: 4.076
8599
+ [2024-11-07 15:38:13,474][04584] Num frames 10200...
8600
+ [2024-11-07 15:38:13,711][04584] Num frames 10300...
8601
+ [2024-11-07 15:38:13,959][04584] Num frames 10400...
8602
+ [2024-11-07 15:38:14,240][04584] Num frames 10500...
8603
+ [2024-11-07 15:38:14,512][04584] Avg episode rewards: #0: 4.567, true rewards: #0: 4.067
8604
+ [2024-11-07 15:38:14,518][04584] Avg episode reward: 4.567, avg true_objective: 4.067
8605
+ [2024-11-07 15:38:14,597][04584] Num frames 10600...
8606
+ [2024-11-07 15:38:14,856][04584] Num frames 10700...
8607
+ [2024-11-07 15:38:15,109][04584] Num frames 10800...
8608
+ [2024-11-07 15:38:15,380][04584] Num frames 10900...
8609
+ [2024-11-07 15:38:15,591][04584] Avg episode rewards: #0: 4.540, true rewards: #0: 4.059
8610
+ [2024-11-07 15:38:15,597][04584] Avg episode reward: 4.540, avg true_objective: 4.059
8611
+ [2024-11-07 15:38:15,718][04584] Num frames 11000...
8612
+ [2024-11-07 15:38:15,972][04584] Num frames 11100...
8613
+ [2024-11-07 15:38:16,222][04584] Num frames 11200...
8614
+ [2024-11-07 15:38:16,517][04584] Num frames 11300...
8615
+ [2024-11-07 15:38:16,679][04584] Avg episode rewards: #0: 4.515, true rewards: #0: 4.051
8616
+ [2024-11-07 15:38:16,682][04584] Avg episode reward: 4.515, avg true_objective: 4.051
8617
+ [2024-11-07 15:38:16,832][04584] Num frames 11400...
8618
+ [2024-11-07 15:38:17,074][04584] Num frames 11500...
8619
+ [2024-11-07 15:38:17,327][04584] Num frames 11600...
8620
+ [2024-11-07 15:38:17,592][04584] Num frames 11700...
8621
+ [2024-11-07 15:38:17,715][04584] Avg episode rewards: #0: 4.492, true rewards: #0: 4.044
8622
+ [2024-11-07 15:38:17,716][04584] Avg episode reward: 4.492, avg true_objective: 4.044
8623
+ [2024-11-07 15:38:17,918][04584] Num frames 11800...
8624
+ [2024-11-07 15:38:18,163][04584] Num frames 11900...
8625
+ [2024-11-07 15:38:18,415][04584] Num frames 12000...
8626
+ [2024-11-07 15:38:18,667][04584] Num frames 12100...
8627
+ [2024-11-07 15:38:18,899][04584] Avg episode rewards: #0: 4.525, true rewards: #0: 4.058
8628
+ [2024-11-07 15:38:18,901][04584] Avg episode reward: 4.525, avg true_objective: 4.058
8629
+ [2024-11-07 15:38:18,974][04584] Num frames 12200...
8630
+ [2024-11-07 15:38:19,220][04584] Num frames 12300...
8631
+ [2024-11-07 15:38:19,443][04584] Num frames 12400...
8632
+ [2024-11-07 15:38:19,702][04584] Num frames 12500...
8633
+ [2024-11-07 15:38:19,901][04584] Avg episode rewards: #0: 4.503, true rewards: #0: 4.051
8634
+ [2024-11-07 15:38:19,907][04584] Avg episode reward: 4.503, avg true_objective: 4.051
8635
+ [2024-11-07 15:38:20,027][04584] Num frames 12600...
8636
+ [2024-11-07 15:38:20,278][04584] Num frames 12700...
8637
+ [2024-11-07 15:38:20,526][04584] Num frames 12800...
8638
+ [2024-11-07 15:38:20,752][04584] Num frames 12900...
8639
+ [2024-11-07 15:38:20,839][04584] Avg episode rewards: #0: 4.503, true rewards: #0: 4.035
8640
+ [2024-11-07 15:38:20,846][04584] Avg episode reward: 4.503, avg true_objective: 4.035
8641
+ [2024-11-07 15:38:21,054][04584] Num frames 13000...
8642
+ [2024-11-07 15:38:21,288][04584] Num frames 13100...
8643
+ [2024-11-07 15:38:21,597][04584] Num frames 13200...
8644
+ [2024-11-07 15:38:21,876][04584] Avg episode rewards: #0: 4.483, true rewards: #0: 4.029
8645
+ [2024-11-07 15:38:21,880][04584] Avg episode reward: 4.483, avg true_objective: 4.029
8646
+ [2024-11-07 15:38:21,916][04584] Num frames 13300...
8647
+ [2024-11-07 15:38:22,192][04584] Num frames 13400...
8648
+ [2024-11-07 15:38:22,459][04584] Num frames 13500...
8649
+ [2024-11-07 15:38:22,719][04584] Num frames 13600...
8650
+ [2024-11-07 15:38:22,781][04584] Avg episode rewards: #0: 4.442, true rewards: #0: 4.001
8651
+ [2024-11-07 15:38:22,788][04584] Avg episode reward: 4.442, avg true_objective: 4.001
8652
+ [2024-11-07 15:38:23,111][04584] Num frames 13700...
8653
+ [2024-11-07 15:38:23,346][04584] Num frames 13800...
8654
+ [2024-11-07 15:38:23,604][04584] Num frames 13900...
8655
+ [2024-11-07 15:38:23,879][04584] Num frames 14000...
8656
+ [2024-11-07 15:38:23,997][04584] Avg episode rewards: #0: 4.434, true rewards: #0: 4.005
8657
+ [2024-11-07 15:38:24,001][04584] Avg episode reward: 4.434, avg true_objective: 4.005
8658
+ [2024-11-07 15:38:24,227][04584] Num frames 14100...
8659
+ [2024-11-07 15:38:24,495][04584] Num frames 14200...
8660
+ [2024-11-07 15:38:24,742][04584] Num frames 14300...
8661
+ [2024-11-07 15:38:25,027][04584] Num frames 14400...
8662
+ [2024-11-07 15:38:25,091][04584] Avg episode rewards: #0: 4.417, true rewards: #0: 4.001
8663
+ [2024-11-07 15:38:25,094][04584] Avg episode reward: 4.417, avg true_objective: 4.001
8664
+ [2024-11-07 15:38:25,343][04584] Num frames 14500...
8665
+ [2024-11-07 15:38:25,587][04584] Num frames 14600...
8666
+ [2024-11-07 15:38:25,844][04584] Num frames 14700...
8667
+ [2024-11-07 15:38:26,130][04584] Avg episode rewards: #0: 4.402, true rewards: #0: 3.996
8668
+ [2024-11-07 15:38:26,132][04584] Avg episode reward: 4.402, avg true_objective: 3.996
8669
+ [2024-11-07 15:38:26,174][04584] Num frames 14800...
8670
+ [2024-11-07 15:38:26,405][04584] Num frames 14900...
8671
+ [2024-11-07 15:38:26,648][04584] Num frames 15000...
8672
+ [2024-11-07 15:38:26,901][04584] Num frames 15100...
8673
+ [2024-11-07 15:38:27,138][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 3.992
8674
+ [2024-11-07 15:38:27,141][04584] Avg episode reward: 4.387, avg true_objective: 3.992
8675
+ [2024-11-07 15:38:27,236][04584] Num frames 15200...
8676
+ [2024-11-07 15:38:27,478][04584] Num frames 15300...
8677
+ [2024-11-07 15:38:29,403][04584] Num frames 15400...
8678
+ [2024-11-07 15:38:29,697][04584] Num frames 15500...
8679
+ [2024-11-07 15:38:29,972][04584] Num frames 15600...
8680
+ [2024-11-07 15:38:30,085][04584] Avg episode rewards: #0: 4.415, true rewards: #0: 4.005
8681
+ [2024-11-07 15:38:30,089][04584] Avg episode reward: 4.415, avg true_objective: 4.005
8682
+ [2024-11-07 15:38:30,347][04584] Num frames 15700...
8683
+ [2024-11-07 15:38:30,657][04584] Num frames 15800...
8684
+ [2024-11-07 15:38:30,952][04584] Num frames 15900...
8685
+ [2024-11-07 15:38:31,268][04584] Num frames 16000...
8686
+ [2024-11-07 15:38:31,337][04584] Avg episode rewards: #0: 4.401, true rewards: #0: 4.000
8687
+ [2024-11-07 15:38:31,342][04584] Avg episode reward: 4.401, avg true_objective: 4.000
8688
+ [2024-11-07 15:38:31,684][04584] Num frames 16100...
8689
+ [2024-11-07 15:38:32,012][04584] Num frames 16200...
8690
+ [2024-11-07 15:38:32,319][04584] Num frames 16300...
8691
+ [2024-11-07 15:38:32,645][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 3.997
8692
+ [2024-11-07 15:38:32,646][04584] Avg episode reward: 4.387, avg true_objective: 3.997
8693
+ [2024-11-07 15:38:32,697][04584] Num frames 16400...
8694
+ [2024-11-07 15:38:33,003][04584] Num frames 16500...
8695
+ [2024-11-07 15:38:33,307][04584] Num frames 16600...
8696
+ [2024-11-07 15:38:33,484][04584] Avg episode rewards: #0: 4.343, true rewards: #0: 3.962
8697
+ [2024-11-07 15:38:33,488][04584] Avg episode reward: 4.343, avg true_objective: 3.962
8698
+ [2024-11-07 15:38:33,668][04584] Num frames 16700...
8699
+ [2024-11-07 15:38:34,047][04584] Num frames 16800...
8700
+ [2024-11-07 15:38:34,523][04584] Num frames 16900...
8701
+ [2024-11-07 15:38:34,994][04584] Num frames 17000...
8702
+ [2024-11-07 15:38:35,794][04584] Avg episode rewards: #0: 4.370, true rewards: #0: 3.974
8703
+ [2024-11-07 15:38:35,798][04584] Avg episode reward: 4.370, avg true_objective: 3.974
8704
+ [2024-11-07 15:38:35,859][04584] Num frames 17100...
8705
+ [2024-11-07 15:38:36,262][04584] Num frames 17200...
8706
+ [2024-11-07 15:38:36,655][04584] Num frames 17300...
8707
+ [2024-11-07 15:38:37,022][04584] Num frames 17400...
8708
+ [2024-11-07 15:38:37,312][04584] Avg episode rewards: #0: 4.358, true rewards: #0: 3.971
8709
+ [2024-11-07 15:38:37,315][04584] Avg episode reward: 4.358, avg true_objective: 3.971
8710
+ [2024-11-07 15:38:37,421][04584] Num frames 17500...
8711
+ [2024-11-07 15:38:37,774][04584] Num frames 17600...
8712
+ [2024-11-07 15:38:38,184][04584] Num frames 17700...
8713
+ [2024-11-07 15:38:38,526][04584] Num frames 17800...
8714
+ [2024-11-07 15:38:38,811][04584] Avg episode rewards: #0: 4.346, true rewards: #0: 3.968
8715
+ [2024-11-07 15:38:38,813][04584] Avg episode reward: 4.346, avg true_objective: 3.968
8716
+ [2024-11-07 15:38:39,007][04584] Num frames 17900...
8717
+ [2024-11-07 15:38:39,335][04584] Num frames 18000...
8718
+ [2024-11-07 15:38:39,650][04584] Num frames 18100...
8719
+ [2024-11-07 15:38:39,961][04584] Num frames 18200...
8720
+ [2024-11-07 15:38:40,263][04584] Avg episode rewards: #0: 4.342, true rewards: #0: 3.973
8721
+ [2024-11-07 15:38:40,266][04584] Avg episode reward: 4.342, avg true_objective: 3.973
8722
+ [2024-11-07 15:38:40,341][04584] Num frames 18300...
8723
+ [2024-11-07 15:38:40,634][04584] Num frames 18400...
8724
+ [2024-11-07 15:38:40,919][04584] Num frames 18500...
8725
+ [2024-11-07 15:38:41,232][04584] Num frames 18600...
8726
+ [2024-11-07 15:38:41,471][04584] Avg episode rewards: #0: 4.331, true rewards: #0: 3.970
8727
+ [2024-11-07 15:38:41,474][04584] Avg episode reward: 4.331, avg true_objective: 3.970
8728
+ [2024-11-07 15:38:41,617][04584] Num frames 18700...
8729
+ [2024-11-07 15:38:42,015][04584] Num frames 18800...
8730
+ [2024-11-07 15:38:42,322][04584] Num frames 18900...
8731
+ [2024-11-07 15:38:42,631][04584] Num frames 19000...
8732
+ [2024-11-07 15:38:42,886][04584] Num frames 19100...
8733
+ [2024-11-07 15:38:42,995][04584] Avg episode rewards: #0: 4.355, true rewards: #0: 3.980
8734
+ [2024-11-07 15:38:42,997][04584] Avg episode reward: 4.355, avg true_objective: 3.980
8735
+ [2024-11-07 15:38:43,228][04584] Num frames 19200...
8736
+ [2024-11-07 15:38:43,472][04584] Num frames 19300...
8737
+ [2024-11-07 15:38:43,715][04584] Num frames 19400...
8738
+ [2024-11-07 15:38:44,003][04584] Avg episode rewards: #0: 4.345, true rewards: #0: 3.978
8739
+ [2024-11-07 15:38:44,006][04584] Avg episode reward: 4.345, avg true_objective: 3.978
8740
+ [2024-11-07 15:38:44,051][04584] Num frames 19500...
8741
+ [2024-11-07 15:38:44,300][04584] Num frames 19600...
8742
+ [2024-11-07 15:38:44,563][04584] Num frames 19700...
8743
+ [2024-11-07 15:38:44,799][04584] Num frames 19800...
8744
+ [2024-11-07 15:38:45,032][04584] Avg episode rewards: #0: 4.335, true rewards: #0: 3.975
8745
+ [2024-11-07 15:38:45,034][04584] Avg episode reward: 4.335, avg true_objective: 3.975
8746
+ [2024-11-07 15:39:53,318][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
8747
+ [2024-11-07 15:40:57,409][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
8748
+ [2024-11-07 15:40:57,411][04584] Overriding arg 'num_workers' with value 4 passed from command line
8749
+ [2024-11-07 15:40:57,412][04584] Adding new argument 'no_render'=True that is not in the saved config file!
8750
+ [2024-11-07 15:40:57,414][04584] Adding new argument 'save_video'=True that is not in the saved config file!
8751
+ [2024-11-07 15:40:57,416][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
8752
+ [2024-11-07 15:40:57,417][04584] Adding new argument 'video_name'=None that is not in the saved config file!
8753
+ [2024-11-07 15:40:57,418][04584] Adding new argument 'max_num_frames'=50000 that is not in the saved config file!
8754
+ [2024-11-07 15:40:57,421][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
8755
+ [2024-11-07 15:40:57,423][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file!
8756
+ [2024-11-07 15:40:57,424][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
8757
+ [2024-11-07 15:40:57,425][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
8758
+ [2024-11-07 15:40:57,427][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
8759
+ [2024-11-07 15:40:57,431][04584] Adding new argument 'train_script'=None that is not in the saved config file!
8760
+ [2024-11-07 15:40:57,433][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
8761
+ [2024-11-07 15:40:57,436][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
8762
+ [2024-11-07 15:40:57,474][04584] RunningMeanStd input shape: (3, 72, 128)
8763
+ [2024-11-07 15:40:57,476][04584] RunningMeanStd input shape: (1,)
8764
+ [2024-11-07 15:40:57,512][04584] ConvEncoder: input_channels=3
8765
+ [2024-11-07 15:40:57,565][04584] Conv encoder output size: 512
8766
+ [2024-11-07 15:40:57,567][04584] Policy head output size: 512
8767
+ [2024-11-07 15:40:57,598][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth...
8768
+ [2024-11-07 15:40:58,332][04584] Num frames 100...
8769
+ [2024-11-07 15:40:58,540][04584] Num frames 200...
8770
+ [2024-11-07 15:40:58,710][04584] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560
8771
+ [2024-11-07 15:40:58,711][04584] Avg episode reward: 2.560, avg true_objective: 2.560
8772
+ [2024-11-07 15:40:58,803][04584] Num frames 300...
8773
+ [2024-11-07 15:40:58,973][04584] Num frames 400...
8774
+ [2024-11-07 15:40:59,162][04584] Num frames 500...
8775
+ [2024-11-07 15:40:59,354][04584] Num frames 600...
8776
+ [2024-11-07 15:40:59,547][04584] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360
8777
+ [2024-11-07 15:40:59,550][04584] Avg episode reward: 3.860, avg true_objective: 3.360
8778
+ [2024-11-07 15:40:59,619][04584] Num frames 700...
8779
+ [2024-11-07 15:40:59,833][04584] Num frames 800...
8780
+ [2024-11-07 15:41:00,011][04584] Num frames 900...
8781
+ [2024-11-07 15:41:00,183][04584] Num frames 1000...
8782
+ [2024-11-07 15:41:00,331][04584] Avg episode rewards: #0: 3.853, true rewards: #0: 3.520
8783
+ [2024-11-07 15:41:00,333][04584] Avg episode reward: 3.853, avg true_objective: 3.520
8784
+ [2024-11-07 15:41:00,444][04584] Num frames 1100...
8785
+ [2024-11-07 15:41:00,779][04584] Num frames 1200...
8786
+ [2024-11-07 15:41:01,066][04584] Num frames 1300...
8787
+ [2024-11-07 15:41:01,488][04584] Num frames 1400...
8788
+ [2024-11-07 15:41:01,736][04584] Avg episode rewards: #0: 3.850, true rewards: #0: 3.600
8789
+ [2024-11-07 15:41:01,738][04584] Avg episode reward: 3.850, avg true_objective: 3.600
8790
+ [2024-11-07 15:41:01,904][04584] Num frames 1500...
8791
+ [2024-11-07 15:41:02,150][04584] Num frames 1600...
8792
+ [2024-11-07 15:41:02,426][04584] Num frames 1700...
8793
+ [2024-11-07 15:41:02,629][04584] Num frames 1800...
8794
+ [2024-11-07 15:41:02,733][04584] Avg episode rewards: #0: 3.848, true rewards: #0: 3.648
8795
+ [2024-11-07 15:41:02,736][04584] Avg episode reward: 3.848, avg true_objective: 3.648
8796
+ [2024-11-07 15:41:03,184][04584] Num frames 1900...
8797
+ [2024-11-07 15:41:03,620][04584] Num frames 2000...
8798
+ [2024-11-07 15:41:03,844][04584] Num frames 2100...
8799
+ [2024-11-07 15:41:04,080][04584] Num frames 2200...
8800
+ [2024-11-07 15:41:04,173][04584] Avg episode rewards: #0: 3.847, true rewards: #0: 3.680
8801
+ [2024-11-07 15:41:04,174][04584] Avg episode reward: 3.847, avg true_objective: 3.680
8802
+ [2024-11-07 15:41:04,417][04584] Num frames 2300...
8803
+ [2024-11-07 15:41:04,822][04584] Num frames 2400...
8804
+ [2024-11-07 15:41:05,110][04584] Num frames 2500...
8805
+ [2024-11-07 15:41:05,409][04584] Num frames 2600...
8806
+ [2024-11-07 15:41:05,670][04584] Avg episode rewards: #0: 4.080, true rewards: #0: 3.794
8807
+ [2024-11-07 15:41:05,674][04584] Avg episode reward: 4.080, avg true_objective: 3.794
8808
+ [2024-11-07 15:41:05,824][04584] Num frames 2700...
8809
+ [2024-11-07 15:41:06,090][04584] Num frames 2800...
8810
+ [2024-11-07 15:41:06,377][04584] Num frames 2900...
8811
+ [2024-11-07 15:41:06,676][04584] Num frames 3000...
8812
+ [2024-11-07 15:41:06,844][04584] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800
8813
+ [2024-11-07 15:41:06,846][04584] Avg episode reward: 4.050, avg true_objective: 3.800
8814
+ [2024-11-07 15:41:07,023][04584] Num frames 3100...
8815
+ [2024-11-07 15:41:07,265][04584] Num frames 3200...
8816
+ [2024-11-07 15:41:07,535][04584] Num frames 3300...
8817
+ [2024-11-07 15:41:07,812][04584] Num frames 3400...
8818
+ [2024-11-07 15:41:07,932][04584] Avg episode rewards: #0: 4.027, true rewards: #0: 3.804
8819
+ [2024-11-07 15:41:07,935][04584] Avg episode reward: 4.027, avg true_objective: 3.804
8820
+ [2024-11-07 15:41:08,163][04584] Num frames 3500...
8821
+ [2024-11-07 15:41:08,509][04584] Num frames 3600...
8822
+ [2024-11-07 15:41:08,879][04584] Num frames 3700...
8823
+ [2024-11-07 15:41:09,205][04584] Num frames 3800...
8824
+ [2024-11-07 15:41:09,303][04584] Avg episode rewards: #0: 4.008, true rewards: #0: 3.808
8825
+ [2024-11-07 15:41:09,305][04584] Avg episode reward: 4.008, avg true_objective: 3.808
8826
+ [2024-11-07 15:41:24,061][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!