[2024-11-07 12:11:25,065][118435] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 12:11:25,068][118435] Rollout worker 0 uses device cpu
[2024-11-07 12:11:25,069][118435] Rollout worker 1 uses device cpu
[2024-11-07 12:11:25,070][118435] Rollout worker 2 uses device cpu
[2024-11-07 12:11:25,071][118435] Rollout worker 3 uses device cpu
[2024-11-07 12:11:25,072][118435] Rollout worker 4 uses device cpu
[2024-11-07 12:11:25,073][118435] Rollout worker 5 uses device cpu
[2024-11-07 12:11:25,073][118435] Rollout worker 6 uses device cpu
[2024-11-07 12:11:25,074][118435] Rollout worker 7 uses device cpu
[2024-11-07 12:11:25,221][118435] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:11:25,222][118435] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 12:11:25,257][118435] Starting all processes...
[2024-11-07 12:11:25,258][118435] Starting process learner_proc0
[2024-11-07 12:11:25,372][118435] Starting all processes...
[2024-11-07 12:11:25,465][118435] Starting process inference_proc0-0
[2024-11-07 12:11:25,466][118435] Starting process rollout_proc0
[2024-11-07 12:11:25,467][118435] Starting process rollout_proc1
[2024-11-07 12:11:25,467][118435] Starting process rollout_proc2
[2024-11-07 12:11:25,470][118435] Starting process rollout_proc3
[2024-11-07 12:11:25,475][118435] Starting process rollout_proc4
[2024-11-07 12:11:25,475][118435] Starting process rollout_proc5
[2024-11-07 12:11:25,476][118435] Starting process rollout_proc6
[2024-11-07 12:11:25,477][118435] Starting process rollout_proc7
[2024-11-07 12:11:32,755][118900] Worker 3 uses CPU cores [3]
[2024-11-07 12:11:32,895][118881] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:11:32,896][118881] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 12:11:33,210][118881] Num visible devices: 1
[2024-11-07 12:11:33,257][118881] Starting seed is not provided
[2024-11-07 12:11:33,257][118881] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:11:33,258][118881] Initializing actor-critic model on device cuda:0
[2024-11-07 12:11:33,258][118881] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 12:11:33,260][118881] RunningMeanStd input shape: (1,)
[2024-11-07 12:11:33,371][118881] ConvEncoder: input_channels=3
[2024-11-07 12:11:33,723][118881] Conv encoder output size: 512
[2024-11-07 12:11:33,724][118881] Policy head output size: 512
[2024-11-07 12:11:33,957][118904] Worker 6 uses CPU cores [6]
[2024-11-07 12:11:34,139][118901] Worker 2 uses CPU cores [2]
[2024-11-07 12:11:34,178][118899] Worker 1 uses CPU cores [1]
[2024-11-07 12:11:34,283][118903] Worker 5 uses CPU cores [5]
[2024-11-07 12:11:34,454][118897] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:11:34,455][118897] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 12:11:34,489][118897] Num visible devices: 1
[2024-11-07 12:11:34,503][118902] Worker 4 uses CPU cores [4]
[2024-11-07 12:11:34,563][118911] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 12:11:34,612][118898] Worker 0 uses CPU cores [0]
[2024-11-07 12:11:34,638][118881] Created Actor Critic model with architecture:
[2024-11-07 12:11:34,638][118881] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 12:11:36,780][118881] Using optimizer
[2024-11-07 12:11:43,557][118435] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 118435], exiting...
[2024-11-07 12:11:43,559][118899] Stopping RolloutWorker_w1...
[2024-11-07 12:11:43,559][118898] Stopping RolloutWorker_w0...
[2024-11-07 12:11:43,560][118911] Stopping RolloutWorker_w7...
[2024-11-07 12:11:43,560][118898] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 12:11:43,560][118899] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 12:11:43,560][118903] Stopping RolloutWorker_w5...
[2024-11-07 12:11:43,560][118897] Stopping InferenceWorker_p0-w0...
[2024-11-07 12:11:43,561][118903] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 12:11:43,561][118897] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 12:11:43,561][118901] Stopping RolloutWorker_w2...
[2024-11-07 12:11:43,561][118900] Stopping RolloutWorker_w3...
[2024-11-07 12:11:43,561][118901] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 12:11:43,562][118900] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 12:11:43,560][118435] Runner profile tree view:
main_loop: 18.3035
[2024-11-07 12:11:43,569][118904] Stopping RolloutWorker_w6...
[2024-11-07 12:11:43,569][118902] Stopping RolloutWorker_w4...
[2024-11-07 12:11:43,569][118904] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 12:11:43,570][118902] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 12:11:43,569][118435] Collected {}, FPS: 0.0
[2024-11-07 12:11:43,572][118911] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 12:11:43,604][118881] Stopping Batcher_0...
[2024-11-07 12:11:43,604][118881] Loop batcher_evt_loop terminating...
[2024-11-07 12:11:43,805][118881] No checkpoints found
[2024-11-07 12:11:43,805][118881] Did not load from checkpoint, starting from scratch!
[2024-11-07 12:11:43,829][118881] Initialized policy 0 weights for model version 0
[2024-11-07 12:11:43,887][118881] LearnerWorker_p0 finished initialization!
[2024-11-07 12:11:43,890][118881] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-11-07 12:11:44,125][118881] Stopping LearnerWorker_p0...
[2024-11-07 12:11:44,126][118881] Loop learner_proc0_evt_loop terminating...
[2024-11-07 12:16:30,455][118435] Environment doom_basic already registered, overwriting...
[2024-11-07 12:16:30,458][118435] Environment doom_two_colors_easy already registered, overwriting...
[2024-11-07 12:16:30,459][118435] Environment doom_two_colors_hard already registered, overwriting...
[2024-11-07 12:16:30,461][118435] Environment doom_dm already registered, overwriting...
[2024-11-07 12:16:30,462][118435] Environment doom_dwango5 already registered, overwriting...
[2024-11-07 12:16:30,465][118435] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-11-07 12:16:30,467][118435] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-11-07 12:16:30,468][118435] Environment doom_my_way_home already registered, overwriting...
[2024-11-07 12:16:30,471][118435] Environment doom_deadly_corridor already registered, overwriting...
[2024-11-07 12:16:30,473][118435] Environment doom_defend_the_center already registered, overwriting...
[2024-11-07 12:16:30,474][118435] Environment doom_defend_the_line already registered, overwriting...
[2024-11-07 12:16:30,476][118435] Environment doom_health_gathering already registered, overwriting...
[2024-11-07 12:16:30,477][118435] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-11-07 12:16:30,478][118435] Environment doom_battle already registered, overwriting...
[2024-11-07 12:16:30,481][118435] Environment doom_battle2 already registered, overwriting...
[2024-11-07 12:16:30,482][118435] Environment doom_duel_bots already registered, overwriting...
[2024-11-07 12:16:30,485][118435] Environment doom_deathmatch_bots already registered, overwriting...
[2024-11-07 12:16:30,486][118435] Environment doom_duel already registered, overwriting...
[2024-11-07 12:16:30,488][118435] Environment doom_deathmatch_full already registered, overwriting...
[2024-11-07 12:16:30,489][118435] Environment doom_benchmark already registered, overwriting...
[2024-11-07 12:16:30,490][118435] register_encoder_factory:
[2024-11-07 12:16:30,506][118435] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 12:16:30,519][118435] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists!
[2024-11-07 12:16:30,521][118435] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment...
[2024-11-07 12:16:30,523][118435] Weights and Biases integration disabled
[2024-11-07 12:16:30,527][118435] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-11-07 12:16:35,987][118435] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/root/hfRL/ml/LunarLander-v2/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-11-07 12:16:35,989][118435] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
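The configuration block above is written verbatim to config.json in the experiment directory. A minimal sketch for inspecting it offline, assuming only that the file is a flat JSON dictionary mirroring the key=value dump above (the path is taken from the log; nothing here is a documented Sample Factory API):

    import json

    # Path taken from the "Saving configuration to ..." log entry above.
    CONFIG_PATH = "/root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json"

    with open(CONFIG_PATH) as f:
        cfg = json.load(f)  # assumed: flat dict of the key=value pairs listed above

    # Print the core APPO hyperparameters recorded for this run.
    for key in ("algo", "env", "num_workers", "num_envs_per_worker", "batch_size",
                "rollout", "recurrence", "learning_rate", "gamma", "gae_lambda",
                "train_for_env_steps"):
        print(key, "=", cfg.get(key))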
[2024-11-07 12:16:35,990][118435] Rollout worker 0 uses device cpu
[2024-11-07 12:16:35,991][118435] Rollout worker 1 uses device cpu
[2024-11-07 12:16:35,993][118435] Rollout worker 2 uses device cpu
[2024-11-07 12:16:35,994][118435] Rollout worker 3 uses device cpu
[2024-11-07 12:16:35,995][118435] Rollout worker 4 uses device cpu
[2024-11-07 12:16:35,996][118435] Rollout worker 5 uses device cpu
[2024-11-07 12:16:35,997][118435] Rollout worker 6 uses device cpu
[2024-11-07 12:16:35,999][118435] Rollout worker 7 uses device cpu
[2024-11-07 12:16:36,166][118435] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:16:36,168][118435] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 12:16:36,203][118435] Starting all processes...
[2024-11-07 12:16:36,204][118435] Starting process learner_proc0
[2024-11-07 12:16:36,252][118435] Starting all processes...
[2024-11-07 12:16:36,257][118435] Starting process inference_proc0-0
[2024-11-07 12:16:36,258][118435] Starting process rollout_proc0
[2024-11-07 12:16:36,259][118435] Starting process rollout_proc1
[2024-11-07 12:16:36,259][118435] Starting process rollout_proc2
[2024-11-07 12:16:36,260][118435] Starting process rollout_proc3
[2024-11-07 12:16:36,264][118435] Starting process rollout_proc4
[2024-11-07 12:16:36,265][118435] Starting process rollout_proc5
[2024-11-07 12:16:36,266][118435] Starting process rollout_proc6
[2024-11-07 12:16:36,266][118435] Starting process rollout_proc7
[2024-11-07 12:16:42,024][121082] Worker 6 uses CPU cores [6]
[2024-11-07 12:16:42,134][121076] Worker 0 uses CPU cores [0]
[2024-11-07 12:16:42,334][121080] Worker 5 uses CPU cores [5]
[2024-11-07 12:16:42,498][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:16:42,498][121062] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 12:16:42,545][121075] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:16:42,546][121075] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 12:16:42,646][121075] Num visible devices: 1
[2024-11-07 12:16:42,647][121062] Num visible devices: 1
[2024-11-07 12:16:42,665][121077] Worker 1 uses CPU cores [1]
[2024-11-07 12:16:42,675][121062] Starting seed is not provided
[2024-11-07 12:16:42,675][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:16:42,675][121062] Initializing actor-critic model on device cuda:0
[2024-11-07 12:16:42,676][121062] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 12:16:42,679][121062] RunningMeanStd input shape: (1,)
[2024-11-07 12:16:42,703][121062] ConvEncoder: input_channels=3
[2024-11-07 12:16:42,715][121081] Worker 3 uses CPU cores [3]
[2024-11-07 12:16:43,044][121078] Worker 2 uses CPU cores [2]
[2024-11-07 12:16:43,051][121079] Worker 4 uses CPU cores [4]
[2024-11-07 12:16:43,067][121083] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 12:16:43,335][121062] Conv encoder output size: 512
[2024-11-07 12:16:43,335][121062] Policy head output size: 512
[2024-11-07 12:16:43,380][121062] Created Actor Critic model with architecture:
[2024-11-07 12:16:43,380][121062] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 12:16:44,388][121062] Using optimizer
[2024-11-07 12:16:47,810][121062] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-11-07 12:16:47,858][121062] Loading model from checkpoint
[2024-11-07 12:16:47,860][121062] Loaded experiment state at self.train_step=0, self.env_steps=0
[2024-11-07 12:16:47,861][121062] Initialized policy 0 weights for model version 0
[2024-11-07 12:16:47,868][121062] LearnerWorker_p0 finished initialization!
[2024-11-07 12:16:47,868][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:16:48,092][121075] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 12:16:48,094][121075] RunningMeanStd input shape: (1,)
[2024-11-07 12:16:48,106][121075] ConvEncoder: input_channels=3
[2024-11-07 12:16:48,225][121075] Conv encoder output size: 512
[2024-11-07 12:16:48,225][121075] Policy head output size: 512
[2024-11-07 12:16:48,281][118435] Inference worker 0-0 is ready!
[2024-11-07 12:16:48,282][118435] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 12:16:48,365][121081] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,371][121080] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,371][121079] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,376][121077] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,378][121076] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,381][121082] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,388][121083] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:48,396][121078] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:16:50,528][118435] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:16:54,744][121081] Decorrelating experience for 0 frames...
[2024-11-07 12:16:54,744][121079] Decorrelating experience for 0 frames...
[2024-11-07 12:16:55,091][121081] Decorrelating experience for 32 frames...
[2024-11-07 12:16:55,528][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:16:56,054][118435] Heartbeat connected on Batcher_0
[2024-11-07 12:16:56,058][118435] Heartbeat connected on LearnerWorker_p0
[2024-11-07 12:16:56,198][118435] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 12:17:00,528][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:17:06,999][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:17:09,503][118435] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 118435], exiting...
[2024-11-07 12:17:09,506][118435] Runner profile tree view:
main_loop: 33.3031
[2024-11-07 12:17:09,506][121062] Stopping Batcher_0...
[2024-11-07 12:17:09,509][121062] Loop batcher_evt_loop terminating...
[2024-11-07 12:17:09,508][118435] Collected {0: 0}, FPS: 0.0
[2024-11-07 12:17:09,510][121062] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-11-07 12:17:09,569][121075] Weights refcount: 2 0
[2024-11-07 12:17:09,572][121075] Stopping InferenceWorker_p0-w0...
[2024-11-07 12:17:09,572][121075] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 12:17:09,626][121062] Stopping LearnerWorker_p0...
[2024-11-07 12:17:09,627][121062] Loop learner_proc0_evt_loop terminating...
[2024-11-07 12:17:09,637][121083] Decorrelating experience for 0 frames...
[2024-11-07 12:17:10,364][121082] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1
[2024-11-07 12:17:16,520][121079] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1
[2024-11-07 12:17:16,979][121081] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1
[2024-11-07 12:17:17,874][121083] Decorrelating experience for 32 frames...
[2024-11-07 12:17:18,292][121083] Decorrelating experience for 64 frames...
[2024-11-07 12:17:18,658][121083] Decorrelating experience for 96 frames...
[2024-11-07 12:17:19,206][121083] Stopping RolloutWorker_w7...
[2024-11-07 12:17:19,207][121083] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 12:17:26,095][121082] Decorrelating experience for 0 frames...
[2024-11-07 12:17:26,430][121079] Decorrelating experience for 32 frames...
[2024-11-07 12:17:26,739][121081] Decorrelating experience for 64 frames...
[2024-11-07 12:17:26,811][121079] Decorrelating experience for 64 frames...
[2024-11-07 12:17:27,157][121081] Decorrelating experience for 96 frames...
[2024-11-07 12:17:27,213][121079] Decorrelating experience for 96 frames...
[2024-11-07 12:17:27,249][121081] Stopping RolloutWorker_w3...
[2024-11-07 12:17:27,249][121081] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 12:17:27,328][121079] Stopping RolloutWorker_w4...
[2024-11-07 12:17:27,328][121079] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 12:17:34,653][121082] Decorrelating experience for 32 frames...
[2024-11-07 12:17:35,009][121082] Decorrelating experience for 64 frames...
[2024-11-07 12:17:35,369][121082] Decorrelating experience for 96 frames...
[2024-11-07 12:17:35,594][121082] Stopping RolloutWorker_w6...
[2024-11-07 12:17:35,595][121082] Loop rollout_proc6_evt_loop terminating...
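The run above was interrupted before collecting any frames, but it did save checkpoint_000000000_0.pth, which the next run loads ("Loaded experiment state at self.train_step=0, self.env_steps=0"). A minimal sketch for inspecting such a checkpoint offline; the 'train_step', 'env_steps', and 'model' keys are assumptions inferred from those log fields, not a documented Sample Factory contract:

    import torch

    # Path taken from the "Saving ..." log entry above.
    CKPT = ("/root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/"
            "checkpoint_p0/checkpoint_000000000_0.pth")

    state = torch.load(CKPT, map_location="cpu")
    print("train_step:", state.get("train_step"))  # assumed key
    print("env_steps:", state.get("env_steps"))    # assumed key
    model = state.get("model", {})                 # assumed: actor-critic state_dict
    print("model tensors:", len(model))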
[2024-11-07 12:17:50,263][122819] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 12:17:50,264][122819] Rollout worker 0 uses device cpu
[2024-11-07 12:17:50,266][122819] Rollout worker 1 uses device cpu
[2024-11-07 12:17:50,266][122819] Rollout worker 2 uses device cpu
[2024-11-07 12:17:50,267][122819] Rollout worker 3 uses device cpu
[2024-11-07 12:17:50,268][122819] Rollout worker 4 uses device cpu
[2024-11-07 12:17:50,268][122819] Rollout worker 5 uses device cpu
[2024-11-07 12:17:50,269][122819] Rollout worker 6 uses device cpu
[2024-11-07 12:17:50,270][122819] Rollout worker 7 uses device cpu
[2024-11-07 12:17:50,323][122819] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:17:50,324][122819] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 12:17:50,355][122819] Starting all processes...
[2024-11-07 12:17:50,356][122819] Starting process learner_proc0
[2024-11-07 12:17:50,483][122819] Starting all processes...
[2024-11-07 12:17:50,821][122819] Starting process inference_proc0-0
[2024-11-07 12:17:50,822][122819] Starting process rollout_proc0
[2024-11-07 12:17:50,822][122819] Starting process rollout_proc1
[2024-11-07 12:17:50,823][122819] Starting process rollout_proc2
[2024-11-07 12:17:50,823][122819] Starting process rollout_proc3
[2024-11-07 12:17:50,824][122819] Starting process rollout_proc4
[2024-11-07 12:17:50,824][122819] Starting process rollout_proc5
[2024-11-07 12:17:50,828][122819] Starting process rollout_proc6
[2024-11-07 12:17:50,829][122819] Starting process rollout_proc7
[2024-11-07 12:17:55,453][122943] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:17:55,454][122943] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 12:17:55,543][122944] Worker 1 uses CPU cores [1]
[2024-11-07 12:17:55,604][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:17:55,604][122929] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 12:17:55,716][122943] Num visible devices: 1
[2024-11-07 12:17:55,735][122929] Num visible devices: 1
[2024-11-07 12:17:55,780][122929] Starting seed is not provided
[2024-11-07 12:17:55,780][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:17:55,781][122929] Initializing actor-critic model on device cuda:0
[2024-11-07 12:17:55,781][122929] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 12:17:55,784][122929] RunningMeanStd input shape: (1,)
[2024-11-07 12:17:55,829][122929] ConvEncoder: input_channels=3
[2024-11-07 12:17:55,890][122947] Worker 4 uses CPU cores [4]
[2024-11-07 12:17:55,970][122945] Worker 2 uses CPU cores [2]
[2024-11-07 12:17:55,995][122942] Worker 0 uses CPU cores [0]
[2024-11-07 12:17:56,101][122948] Worker 5 uses CPU cores [5]
[2024-11-07 12:17:56,137][122929] Conv encoder output size: 512
[2024-11-07 12:17:56,137][122929] Policy head output size: 512
[2024-11-07 12:17:56,170][122946] Worker 3 uses CPU cores [3]
[2024-11-07 12:17:56,174][122929] Created Actor Critic model with architecture:
[2024-11-07 12:17:56,175][122929] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 12:17:56,275][122956] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 12:17:56,391][122949] Worker 6 uses CPU cores [6]
[2024-11-07 12:17:56,939][122929] Using optimizer
[2024-11-07 12:17:59,196][122929] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-11-07 12:17:59,227][122929] Loading model from checkpoint
[2024-11-07 12:17:59,228][122929] Loaded experiment state at self.train_step=0, self.env_steps=0
[2024-11-07 12:17:59,229][122929] Initialized policy 0 weights for model version 0
[2024-11-07 12:17:59,236][122929] LearnerWorker_p0 finished initialization!
[2024-11-07 12:17:59,239][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 12:17:59,395][122943] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 12:17:59,396][122943] RunningMeanStd input shape: (1,)
[2024-11-07 12:17:59,408][122943] ConvEncoder: input_channels=3
[2024-11-07 12:17:59,539][122943] Conv encoder output size: 512
[2024-11-07 12:17:59,539][122943] Policy head output size: 512
[2024-11-07 12:17:59,585][122819] Inference worker 0-0 is ready!
[2024-11-07 12:17:59,586][122819] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 12:17:59,725][122942] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,728][122947] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,738][122949] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,742][122946] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,750][122944] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,779][122945] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,787][122956] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:17:59,820][122948] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 12:18:00,172][122947] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,200][122942] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,223][122944] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,223][122949] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,228][122956] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,277][122948] Decorrelating experience for 0 frames...
[2024-11-07 12:18:00,565][122947] Decorrelating experience for 32 frames...
[2024-11-07 12:18:00,584][122949] Decorrelating experience for 32 frames...
[2024-11-07 12:18:00,590][122956] Decorrelating experience for 32 frames...
[2024-11-07 12:18:00,628][122944] Decorrelating experience for 32 frames...
[2024-11-07 12:18:01,122][122942] Decorrelating experience for 32 frames...
[2024-11-07 12:18:01,147][122945] Decorrelating experience for 0 frames...
[2024-11-07 12:18:01,268][122947] Decorrelating experience for 64 frames...
[2024-11-07 12:18:01,299][122948] Decorrelating experience for 32 frames...
[2024-11-07 12:18:01,303][122956] Decorrelating experience for 64 frames...
[2024-11-07 12:18:01,352][122944] Decorrelating experience for 64 frames...
[2024-11-07 12:18:01,608][122946] Decorrelating experience for 0 frames...
[2024-11-07 12:18:01,625][122945] Decorrelating experience for 32 frames...
[2024-11-07 12:18:01,944][122947] Decorrelating experience for 96 frames...
[2024-11-07 12:18:01,995][122956] Decorrelating experience for 96 frames...
[2024-11-07 12:18:01,995][122944] Decorrelating experience for 96 frames...
[2024-11-07 12:18:02,015][122942] Decorrelating experience for 64 frames...
[2024-11-07 12:18:02,102][122948] Decorrelating experience for 64 frames...
[2024-11-07 12:18:02,162][122946] Decorrelating experience for 32 frames...
[2024-11-07 12:18:02,232][122945] Decorrelating experience for 64 frames...
[2024-11-07 12:18:03,024][122949] Decorrelating experience for 64 frames...
[2024-11-07 12:18:03,097][122948] Decorrelating experience for 96 frames...
[2024-11-07 12:18:03,204][122946] Decorrelating experience for 64 frames...
[2024-11-07 12:18:03,217][122942] Decorrelating experience for 96 frames...
[2024-11-07 12:18:03,454][122949] Decorrelating experience for 96 frames...
[2024-11-07 12:18:03,715][122946] Decorrelating experience for 96 frames...
[2024-11-07 12:18:03,740][122945] Decorrelating experience for 96 frames...
[2024-11-07 12:18:04,155][122819] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:09,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:10,315][122819] Heartbeat connected on Batcher_0
[2024-11-07 12:18:10,318][122819] Heartbeat connected on LearnerWorker_p0
[2024-11-07 12:18:10,329][122819] Heartbeat connected on RolloutWorker_w0
[2024-11-07 12:18:10,333][122819] Heartbeat connected on RolloutWorker_w1
[2024-11-07 12:18:10,336][122819] Heartbeat connected on RolloutWorker_w2
[2024-11-07 12:18:10,340][122819] Heartbeat connected on RolloutWorker_w3
[2024-11-07 12:18:10,344][122819] Heartbeat connected on RolloutWorker_w4
[2024-11-07 12:18:10,348][122819] Heartbeat connected on RolloutWorker_w5
[2024-11-07 12:18:10,351][122819] Heartbeat connected on RolloutWorker_w6
[2024-11-07 12:18:10,355][122819] Heartbeat connected on RolloutWorker_w7
[2024-11-07 12:18:11,719][122819] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 12:18:15,019][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 9.0. Samples: 98. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:15,021][122819] Avg episode reward: [(0, '1.371')]
[2024-11-07 12:18:15,670][122929] Signal inference workers to stop experience collection...
[2024-11-07 12:18:15,689][122943] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 12:18:19,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 155.2. Samples: 2328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:19,158][122819] Avg episode reward: [(0, '2.097')]
[2024-11-07 12:18:24,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 116.4. Samples: 2328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:24,156][122819] Avg episode reward: [(0, '2.097')]
[2024-11-07 12:18:29,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 93.1. Samples: 2328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 12:18:29,157][122819] Avg episode reward: [(0, '2.097')]
[2024-11-07 12:18:29,624][122929] Signal inference workers to resume experience collection...
[2024-11-07 12:18:29,624][122943] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 12:18:34,155][122819] Fps is (10 sec: 3686.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 260.8. Samples: 7824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 12:18:34,157][122819] Avg episode reward: [(0, '3.975')]
[2024-11-07 12:18:34,234][122943] Updated weights for policy 0, policy_version 10 (0.0033)
[2024-11-07 12:18:39,156][122819] Fps is (10 sec: 7782.4, 60 sec: 2223.6, 300 sec: 2223.6). Total num frames: 77824. Throughput: 0: 567.8. Samples: 19874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:18:39,159][122819] Avg episode reward: [(0, '4.551')]
[2024-11-07 12:18:39,407][122943] Updated weights for policy 0, policy_version 20 (0.0031)
[2024-11-07 12:18:44,155][122819] Fps is (10 sec: 7372.6, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 625.4. Samples: 25018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:18:44,158][122819] Avg episode reward: [(0, '4.375')]
[2024-11-07 12:18:44,171][122929] Saving new best policy, reward=4.375!
[2024-11-07 12:18:45,758][122943] Updated weights for policy 0, policy_version 30 (0.0032)
[2024-11-07 12:18:49,155][122819] Fps is (10 sec: 5324.9, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 752.9. Samples: 33882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:18:49,157][122819] Avg episode reward: [(0, '4.381')]
[2024-11-07 12:18:49,310][122929] Saving new best policy, reward=4.381!
[2024-11-07 12:18:54,155][122819] Fps is (10 sec: 4915.3, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 896.5. Samples: 40344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 12:18:54,158][122819] Avg episode reward: [(0, '4.426')]
[2024-11-07 12:18:54,169][122929] Saving new best policy, reward=4.426!
[2024-11-07 12:18:54,743][122943] Updated weights for policy 0, policy_version 40 (0.0042)
[2024-11-07 12:18:59,155][122819] Fps is (10 sec: 5734.4, 60 sec: 3425.8, 300 sec: 3425.8). Total num frames: 188416. Throughput: 0: 1008.9. Samples: 44626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:18:59,156][122819] Avg episode reward: [(0, '4.479')]
[2024-11-07 12:18:59,158][122929] Saving new best policy, reward=4.479!
[2024-11-07 12:19:01,057][122943] Updated weights for policy 0, policy_version 50 (0.0042)
[2024-11-07 12:19:04,155][122819] Fps is (10 sec: 6144.1, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 221184. Throughput: 0: 1147.8. Samples: 53978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:04,158][122819] Avg episode reward: [(0, '4.507')]
[2024-11-07 12:19:04,306][122929] Saving new best policy, reward=4.507!
[2024-11-07 12:19:07,031][122943] Updated weights for policy 0, policy_version 60 (0.0024)
[2024-11-07 12:19:09,155][122819] Fps is (10 sec: 7372.8, 60 sec: 4369.1, 300 sec: 4033.0). Total num frames: 262144. Throughput: 0: 1398.8. Samples: 65272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:19:09,157][122819] Avg episode reward: [(0, '4.468')]
[2024-11-07 12:19:12,838][122943] Updated weights for policy 0, policy_version 70 (0.0029)
[2024-11-07 12:19:14,156][122819] Fps is (10 sec: 6962.9, 60 sec: 4917.7, 300 sec: 4154.5). Total num frames: 290816. Throughput: 0: 1510.1. Samples: 70284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:19:14,161][122819] Avg episode reward: [(0, '4.555')]
[2024-11-07 12:19:14,197][122929] Saving new best policy, reward=4.555!
[2024-11-07 12:19:19,162][122819] Fps is (10 sec: 6139.5, 60 sec: 5392.4, 300 sec: 4314.1). Total num frames: 323584. Throughput: 0: 1582.6. Samples: 79050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:19,167][122819] Avg episode reward: [(0, '4.434')]
[2024-11-07 12:19:19,851][122943] Updated weights for policy 0, policy_version 80 (0.0039)
[2024-11-07 12:19:24,157][122819] Fps is (10 sec: 5324.3, 60 sec: 5734.3, 300 sec: 4300.7). Total num frames: 344064. Throughput: 0: 1477.7. Samples: 86374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:19:24,159][122819] Avg episode reward: [(0, '4.234')]
[2024-11-07 12:19:27,482][122943] Updated weights for policy 0, policy_version 90 (0.0041)
[2024-11-07 12:19:29,155][122819] Fps is (10 sec: 5328.6, 60 sec: 6280.5, 300 sec: 4433.3). Total num frames: 376832. Throughput: 0: 1474.2. Samples: 91356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 12:19:29,159][122819] Avg episode reward: [(0, '4.254')]
[2024-11-07 12:19:33,232][122943] Updated weights for policy 0, policy_version 100 (0.0032)
[2024-11-07 12:19:34,155][122819] Fps is (10 sec: 6964.1, 60 sec: 6280.5, 300 sec: 4596.6). Total num frames: 413696. Throughput: 0: 1512.4. Samples: 101942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:34,160][122819] Avg episode reward: [(0, '4.592')]
[2024-11-07 12:19:34,170][122929] Saving new best policy, reward=4.592!
[2024-11-07 12:19:39,050][122943] Updated weights for policy 0, policy_version 110 (0.0031)
[2024-11-07 12:19:39,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6212.3, 300 sec: 4742.8). Total num frames: 450560. Throughput: 0: 1604.2. Samples: 112532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:39,157][122819] Avg episode reward: [(0, '4.377')]
[2024-11-07 12:19:44,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6280.6, 300 sec: 4874.3). Total num frames: 487424. Throughput: 0: 1638.9. Samples: 118378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:44,157][122819] Avg episode reward: [(0, '4.348')]
[2024-11-07 12:19:44,180][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth...
[2024-11-07 12:19:44,635][122943] Updated weights for policy 0, policy_version 120 (0.0033)
[2024-11-07 12:19:49,156][122819] Fps is (10 sec: 7371.8, 60 sec: 6553.5, 300 sec: 4993.2). Total num frames: 524288. Throughput: 0: 1660.6. Samples: 128706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:19:49,161][122819] Avg episode reward: [(0, '4.624')]
[2024-11-07 12:19:49,165][122929] Saving new best policy, reward=4.624!
[2024-11-07 12:19:50,445][122943] Updated weights for policy 0, policy_version 130 (0.0034)
[2024-11-07 12:19:54,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 5064.2). Total num frames: 557056. Throughput: 0: 1648.7. Samples: 139464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:19:54,158][122819] Avg episode reward: [(0, '4.426')]
[2024-11-07 12:19:57,840][122943] Updated weights for policy 0, policy_version 140 (0.0025)
[2024-11-07 12:19:59,155][122819] Fps is (10 sec: 5735.1, 60 sec: 6553.6, 300 sec: 5057.7). Total num frames: 581632. Throughput: 0: 1598.8. Samples: 142230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:19:59,157][122819] Avg episode reward: [(0, '4.516')]
[2024-11-07 12:20:04,058][122943] Updated weights for policy 0, policy_version 150 (0.0033)
[2024-11-07 12:20:04,155][122819] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 5120.0). Total num frames: 614400. Throughput: 0: 1620.6. Samples: 151964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:20:04,156][122819] Avg episode reward: [(0, '4.474')]
[2024-11-07 12:20:09,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 5210.1). Total num frames: 651264. Throughput: 0: 1697.2. Samples: 162746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:20:09,157][122819] Avg episode reward: [(0, '4.298')]
[2024-11-07 12:20:09,614][122943] Updated weights for policy 0, policy_version 160 (0.0028)
[2024-11-07 12:20:14,155][122819] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 5293.3). Total num frames: 688128. Throughput: 0: 1718.5. Samples: 168688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:14,158][122819] Avg episode reward: [(0, '4.335')]
[2024-11-07 12:20:15,086][122943] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-11-07 12:20:19,155][122819] Fps is (10 sec: 6143.9, 60 sec: 6486.1, 300 sec: 5279.3). Total num frames: 712704. Throughput: 0: 1665.2. Samples: 176876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:20:19,157][122819] Avg episode reward: [(0, '4.277')]
[2024-11-07 12:20:22,352][122943] Updated weights for policy 0, policy_version 180 (0.0037)
[2024-11-07 12:20:24,156][122819] Fps is (10 sec: 6143.6, 60 sec: 6758.5, 300 sec: 5354.0). Total num frames: 749568. Throughput: 0: 1663.3. Samples: 187384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:24,158][122819] Avg episode reward: [(0, '4.517')]
[2024-11-07 12:20:28,302][122943] Updated weights for policy 0, policy_version 190 (0.0031)
[2024-11-07 12:20:30,637][122819] Fps is (10 sec: 6064.3, 60 sec: 6595.4, 300 sec: 5340.8). Total num frames: 782336. Throughput: 0: 1593.3. Samples: 192438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:30,640][122819] Avg episode reward: [(0, '4.553')]
[2024-11-07 12:20:34,155][122819] Fps is (10 sec: 5325.2, 60 sec: 6485.3, 300 sec: 5352.1). Total num frames: 802816. Throughput: 0: 1576.1. Samples: 199630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:34,157][122819] Avg episode reward: [(0, '4.420')]
[2024-11-07 12:20:36,012][122943] Updated weights for policy 0, policy_version 200 (0.0034)
[2024-11-07 12:20:39,155][122819] Fps is (10 sec: 7213.3, 60 sec: 6553.6, 300 sec: 5443.7). Total num frames: 843776. Throughput: 0: 1581.4. Samples: 210628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:39,156][122819] Avg episode reward: [(0, '4.603')]
[2024-11-07 12:20:41,201][122943] Updated weights for policy 0, policy_version 210 (0.0021)
[2024-11-07 12:20:44,155][122819] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 5504.0). Total num frames: 880640. Throughput: 0: 1651.3. Samples: 216538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:20:44,157][122819] Avg episode reward: [(0, '4.202')]
[2024-11-07 12:20:46,573][122943] Updated weights for policy 0, policy_version 220 (0.0028)
[2024-11-07 12:20:49,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 5560.6). Total num frames: 917504. Throughput: 0: 1685.6. Samples: 227816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:20:49,160][122819] Avg episode reward: [(0, '4.434')]
[2024-11-07 12:20:51,948][122943] Updated weights for policy 0, policy_version 230 (0.0027)
[2024-11-07 12:20:54,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 5613.9). Total num frames: 954368. Throughput: 0: 1697.3. Samples: 239124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:20:54,157][122819] Avg episode reward: [(0, '4.418')]
[2024-11-07 12:20:57,740][122943] Updated weights for policy 0, policy_version 240 (0.0040)
[2024-11-07 12:20:59,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 5664.2). Total num frames: 991232. Throughput: 0: 1682.4. Samples: 244396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:20:59,162][122819] Avg episode reward: [(0, '4.365')]
[2024-11-07 12:21:04,507][122819] Fps is (10 sec: 5935.2, 60 sec: 6651.1, 300 sec: 5632.4). Total num frames: 1015808. Throughput: 0: 1695.8. Samples: 253782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:04,509][122819] Avg episode reward: [(0, '4.325')]
[2024-11-07 12:21:05,592][122943] Updated weights for policy 0, policy_version 250 (0.0034)
[2024-11-07 12:21:09,155][122819] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 5623.7). Total num frames: 1040384. Throughput: 0: 1629.6. Samples: 260714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 12:21:09,165][122819] Avg episode reward: [(0, '4.297')]
[2024-11-07 12:21:12,581][122943] Updated weights for policy 0, policy_version 260 (0.0046)
[2024-11-07 12:21:14,156][122819] Fps is (10 sec: 5943.0, 60 sec: 6417.0, 300 sec: 5648.1). Total num frames: 1073152. Throughput: 0: 1674.0. Samples: 265290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 12:21:14,158][122819] Avg episode reward: [(0, '4.538')]
[2024-11-07 12:21:18,394][122943] Updated weights for policy 0, policy_version 270 (0.0029)
[2024-11-07 12:21:19,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 5692.4). Total num frames: 1110016. Throughput: 0: 1693.4. Samples: 275834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:19,157][122819] Avg episode reward: [(0, '4.423')]
[2024-11-07 12:21:24,116][122943] Updated weights for policy 0, policy_version 280 (0.0046)
[2024-11-07 12:21:24,155][122819] Fps is (10 sec: 7373.4, 60 sec: 6621.9, 300 sec: 5734.4). Total num frames: 1146880. Throughput: 0: 1682.9. Samples: 286360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:24,156][122819] Avg episode reward: [(0, '4.495')]
[2024-11-07 12:21:29,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6789.6, 300 sec: 5754.4). Total num frames: 1179648. Throughput: 0: 1668.0. Samples: 291598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:21:29,157][122819] Avg episode reward: [(0, '4.812')]
[2024-11-07 12:21:29,161][122929] Saving new best policy, reward=4.812!
[2024-11-07 12:21:29,921][122943] Updated weights for policy 0, policy_version 290 (0.0049)
[2024-11-07 12:21:34,155][122819] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 5773.4). Total num frames: 1212416. Throughput: 0: 1644.4. Samples: 301814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:34,157][122819] Avg episode reward: [(0, '4.543')]
[2024-11-07 12:21:36,087][122943] Updated weights for policy 0, policy_version 300 (0.0029)
[2024-11-07 12:21:39,155][122819] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5810.6). Total num frames: 1249280. Throughput: 0: 1628.2. Samples: 312392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:39,157][122819] Avg episode reward: [(0, '4.347')]
[2024-11-07 12:21:41,723][122943] Updated weights for policy 0, policy_version 310 (0.0064)
[2024-11-07 12:21:44,155][122819] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 5846.1). Total num frames: 1286144. Throughput: 0: 1634.3. Samples: 317938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:21:44,157][122819] Avg episode reward: [(0, '4.511')]
[2024-11-07 12:21:44,170][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth...
[2024-11-07 12:21:44,269][122929] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth
[2024-11-07 12:21:47,402][122943] Updated weights for policy 0, policy_version 320 (0.0031)
[2024-11-07 12:21:49,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 5880.0). Total num frames: 1323008. Throughput: 0: 1685.0. Samples: 329016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 12:21:49,157][122819] Avg episode reward: [(0, '4.405')]
[2024-11-07 12:21:53,010][122943] Updated weights for policy 0, policy_version 330 (0.0025)
[2024-11-07 12:21:54,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 5894.7). Total num frames: 1355776. Throughput: 0: 1755.7. Samples: 339720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:21:54,158][122819] Avg episode reward: [(0, '4.536')]
[2024-11-07 12:21:58,782][122943] Updated weights for policy 0, policy_version 340 (0.0044)
[2024-11-07 12:21:59,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 5926.1). Total num frames: 1392640. Throughput: 0: 1772.8. Samples: 345064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:21:59,158][122819] Avg episode reward: [(0, '4.511')]
[2024-11-07 12:22:04,156][122819] Fps is (10 sec: 6552.9, 60 sec: 6798.1, 300 sec: 5922.1). Total num frames: 1421312. Throughput: 0: 1739.9. Samples: 354130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:04,159][122819] Avg episode reward: [(0, '4.302')]
[2024-11-07 12:22:05,663][122943] Updated weights for policy 0, policy_version 350 (0.0031)
[2024-11-07 12:22:10,540][122819] Fps is (10 sec: 5036.8, 60 sec: 6672.6, 300 sec: 5885.0). Total num frames: 1449984. Throughput: 0: 1665.0. Samples: 363592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:22:10,542][122819] Avg episode reward: [(0, '4.394')]
[2024-11-07 12:22:13,964][122943] Updated weights for policy 0, policy_version 360 (0.0034)
[2024-11-07 12:22:14,156][122819] Fps is (10 sec: 5325.2, 60 sec: 6690.2, 300 sec: 5898.2). Total num frames: 1474560. Throughput: 0: 1644.6. Samples: 365604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:14,157][122819] Avg episode reward: [(0, '4.341')]
[2024-11-07 12:22:19,155][122819] Fps is (10 sec: 7131.8, 60 sec: 6690.2, 300 sec: 5927.2). Total num frames: 1511424. Throughput: 0: 1649.2. Samples: 376026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:19,158][122819] Avg episode reward: [(0, '4.387')]
[2024-11-07 12:22:19,429][122943] Updated weights for policy 0, policy_version 370 (0.0025)
[2024-11-07 12:22:24,156][122819] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 5954.9). Total num frames: 1548288. Throughput: 0: 1672.0. Samples: 387634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:24,157][122819] Avg episode reward: [(0, '4.164')]
[2024-11-07 12:22:24,864][122943] Updated weights for policy 0, policy_version 380 (0.0028)
[2024-11-07 12:22:29,156][122819] Fps is (10 sec: 7372.4, 60 sec: 6758.3, 300 sec: 5981.7). Total num frames: 1585152. Throughput: 0: 1672.2. Samples: 393190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:29,160][122819] Avg episode reward: [(0, '4.529')]
[2024-11-07 12:22:30,336][122943] Updated weights for policy 0, policy_version 390 (0.0025)
[2024-11-07 12:22:34,155][122819] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 6007.5). Total num frames: 1622016. Throughput: 0: 1665.7. Samples: 403970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 12:22:34,157][122819] Avg episode reward: [(0, '4.375')]
[2024-11-07 12:22:36,419][122943] Updated weights for policy 0, policy_version 400 (0.0031)
[2024-11-07 12:22:39,155][122819] Fps is (10 sec: 6963.6, 60 sec: 6758.4, 300 sec: 6017.4). Total num frames: 1654784. Throughput: 0: 1656.8. Samples: 414276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:22:39,157][122819] Avg episode reward: [(0, '4.374')]
[2024-11-07 12:22:42,402][122943] Updated weights for policy 0, policy_version 410 (0.0031)
[2024-11-07 12:22:44,446][122819] Fps is (10 sec: 5970.1, 60 sec: 6589.9, 300 sec: 6006.1). Total num frames: 1683456. Throughput: 0: 1639.3. Samples: 419310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:44,448][122819] Avg episode reward: [(0, '4.303')]
[2024-11-07 12:22:49,155][122819] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6007.5). Total num frames: 1712128. Throughput: 0: 1616.8. Samples: 426886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:22:49,157][122819] Avg episode reward: [(0, '4.298')]
[2024-11-07 12:22:49,921][122943] Updated weights for policy 0, policy_version 420 (0.0029)
[2024-11-07 12:22:54,155][122819] Fps is (10 sec: 6750.0, 60 sec: 6553.6, 300 sec: 6031.0). Total num frames: 1748992. Throughput: 0: 1696.2. Samples: 437572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:22:54,157][122819] Avg episode reward: [(0, '4.409')]
[2024-11-07 12:22:55,504][122943] Updated weights for policy 0, policy_version 430 (0.0040)
[2024-11-07 12:22:59,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6053.7). Total num frames: 1785856. Throughput: 0: 1718.7. Samples: 442944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:22:59,157][122819] Avg episode reward: [(0, '4.417')]
[2024-11-07 12:23:00,954][122943] Updated weights for policy 0, policy_version 440 (0.0028)
[2024-11-07 12:23:04,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6164.8). Total num frames: 1818624. Throughput: 0: 1725.8. Samples: 453688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:23:04,157][122819] Avg episode reward: [(0, '4.456')]
[2024-11-07 12:23:06,963][122943] Updated weights for policy 0, policy_version 450 (0.0028)
[2024-11-07 12:23:09,155][122819] Fps is (10 sec: 7373.1, 60 sec: 6988.0, 300 sec: 6322.2). Total num frames: 1859584. Throughput: 0: 1710.8. Samples: 464618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 12:23:09,159][122819] Avg episode reward: [(0, '4.311')]
[2024-11-07 12:23:12,643][122943] Updated weights for policy 0, policy_version 460 (0.0027)
[2024-11-07 12:23:14,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6414.8). Total num frames: 1892352. Throughput: 0: 1705.7. Samples: 469946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 12:23:14,158][122819] Avg episode reward: [(0, '4.664')]
[2024-11-07 12:23:19,055][122819] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 122819], exiting...
[2024-11-07 12:23:19,060][122929] Stopping Batcher_0...
[2024-11-07 12:23:19,061][122929] Loop batcher_evt_loop terminating...
[2024-11-07 12:23:19,060][122819] Runner profile tree view:
main_loop: 328.7046
[2024-11-07 12:23:19,063][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth...
[2024-11-07 12:23:19,062][122819] Collected {0: 1916928}, FPS: 5831.8
[2024-11-07 12:23:19,095][122943] Weights refcount: 2 0
[2024-11-07 12:23:19,099][122943] Stopping InferenceWorker_p0-w0...
[2024-11-07 12:23:19,100][122943] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 12:23:19,237][122929] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth
[2024-11-07 12:23:19,246][122929] Stopping LearnerWorker_p0...
[2024-11-07 12:23:19,246][122929] Loop learner_proc0_evt_loop terminating...
[2024-11-07 12:23:19,533][122942] Stopping RolloutWorker_w0...
[2024-11-07 12:23:19,536][122942] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 12:23:19,569][122947] Stopping RolloutWorker_w4...
[2024-11-07 12:23:19,570][122947] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 12:23:19,566][122948] Stopping RolloutWorker_w5...
[2024-11-07 12:23:19,572][122948] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 12:23:19,599][122946] Stopping RolloutWorker_w3...
[2024-11-07 12:23:19,601][122946] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 12:23:19,626][122944] Stopping RolloutWorker_w1...
[2024-11-07 12:23:19,626][122944] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 12:23:19,702][122949] Stopping RolloutWorker_w6...
[2024-11-07 12:23:19,704][122949] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 12:23:19,735][122956] Stopping RolloutWorker_w7...
[2024-11-07 12:23:19,737][122956] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 12:23:19,826][122945] Stopping RolloutWorker_w2...
[2024-11-07 12:23:19,827][122945] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 12:26:37,256][125367] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 12:26:37,258][125367] Rollout worker 0 uses device cpu [2024-11-07 12:26:37,261][125367] Rollout worker 1 uses device cpu [2024-11-07 12:26:37,262][125367] Rollout worker 2 uses device cpu [2024-11-07 12:26:37,263][125367] Rollout worker 3 uses device cpu [2024-11-07 12:26:37,264][125367] Rollout worker 4 uses device cpu [2024-11-07 12:26:37,265][125367] Rollout worker 5 uses device cpu [2024-11-07 12:26:37,265][125367] Rollout worker 6 uses device cpu [2024-11-07 12:26:37,266][125367] Rollout worker 7 uses device cpu [2024-11-07 12:26:37,339][125367] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:26:37,341][125367] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:26:37,388][125367] Starting all processes... [2024-11-07 12:26:37,390][125367] Starting process learner_proc0 [2024-11-07 12:26:37,515][125367] Starting all processes... [2024-11-07 12:26:37,560][125367] Starting process inference_proc0-0 [2024-11-07 12:26:37,561][125367] Starting process rollout_proc0 [2024-11-07 12:26:37,562][125367] Starting process rollout_proc1 [2024-11-07 12:26:37,562][125367] Starting process rollout_proc2 [2024-11-07 12:26:37,568][125367] Starting process rollout_proc3 [2024-11-07 12:26:37,573][125367] Starting process rollout_proc4 [2024-11-07 12:26:37,581][125367] Starting process rollout_proc5 [2024-11-07 12:26:37,586][125367] Starting process rollout_proc6 [2024-11-07 12:26:37,588][125367] Starting process rollout_proc7 [2024-11-07 12:26:46,473][125885] Worker 5 uses CPU cores [5] [2024-11-07 12:26:46,763][125890] Worker 3 uses CPU cores [3] [2024-11-07 12:26:46,971][125892] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:26:47,013][125882] Worker 0 uses CPU cores [0] [2024-11-07 12:26:47,264][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:26:47,264][125868] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:26:47,428][125881] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:26:47,428][125881] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:26:47,449][125868] Num visible devices: 1 [2024-11-07 12:26:47,454][125881] Num visible devices: 1 [2024-11-07 12:26:47,487][125868] Starting seed is not provided [2024-11-07 12:26:47,488][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:26:47,488][125868] Initializing actor-critic model on device cuda:0 [2024-11-07 12:26:47,489][125868] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:26:47,493][125868] RunningMeanStd input shape: (1,) [2024-11-07 12:26:47,525][125868] ConvEncoder: input_channels=3 [2024-11-07 12:26:47,606][125887] Worker 4 uses CPU cores [4] [2024-11-07 12:26:47,620][125891] Worker 6 uses CPU cores [6] [2024-11-07 12:26:47,657][125884] Worker 2 uses CPU cores [2] [2024-11-07 12:26:47,713][125883] Worker 1 uses CPU cores [1] [2024-11-07 12:26:47,781][125868] Conv encoder output size: 512 [2024-11-07 12:26:47,782][125868] Policy head output size: 512 [2024-11-07 12:26:47,826][125868] Created Actor Critic model with architecture: [2024-11-07 12:26:47,826][125868] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): 
RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:26:49,316][125868] Using optimizer [2024-11-07 12:26:55,921][125868] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth... [2024-11-07 12:26:56,020][125868] Loading model from checkpoint [2024-11-07 12:26:56,022][125868] Loaded experiment state at self.train_step=468, self.env_steps=1916928 [2024-11-07 12:26:56,023][125868] Initialized policy 0 weights for model version 468 [2024-11-07 12:26:56,029][125868] LearnerWorker_p0 finished initialization! [2024-11-07 12:26:56,030][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:26:56,062][125367] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1916928. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:26:56,304][125881] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:26:56,305][125881] RunningMeanStd input shape: (1,) [2024-11-07 12:26:56,324][125881] ConvEncoder: input_channels=3 [2024-11-07 12:26:56,453][125881] Conv encoder output size: 512 [2024-11-07 12:26:56,453][125881] Policy head output size: 512 [2024-11-07 12:26:56,506][125367] Inference worker 0-0 is ready! [2024-11-07 12:26:56,507][125367] All inference workers are ready! Signal rollout workers to start! [2024-11-07 12:26:56,586][125890] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,588][125887] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,594][125884] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,609][125891] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,617][125883] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,624][125882] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,651][125885] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:56,658][125892] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:26:57,329][125367] Heartbeat connected on Batcher_0 [2024-11-07 12:26:57,332][125367] Heartbeat connected on LearnerWorker_p0 [2024-11-07 12:26:57,370][125367] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 12:26:59,179][125884] Decorrelating experience for 0 frames... [2024-11-07 12:26:59,179][125892] Decorrelating experience for 0 frames... [2024-11-07 12:26:59,179][125891] Decorrelating experience for 0 frames... [2024-11-07 12:26:59,179][125890] Decorrelating experience for 0 frames... [2024-11-07 12:26:59,650][125891] Decorrelating experience for 32 frames... 
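The restart above resumes cleanly because each checkpoint filename encodes both counters the learner restores ("Loaded experiment state at self.train_step=468, self.env_steps=1916928" from checkpoint_000000468_1916928.pth). A small sketch of locating the newest checkpoint and recovering those counters; the naming scheme is inferred from this log rather than from Sample Factory's source:

    import re
    from pathlib import Path

    def latest_checkpoint(ckpt_dir):
        # Names look like checkpoint_000000468_1916928.pth:
        # zero-padded train_step, then cumulative env_steps.
        pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
        best = None
        for path in Path(ckpt_dir).glob("checkpoint_*.pth"):
            m = pattern.search(path.name)
            if m and (best is None or int(m.group(1)) > best[0]):
                best = (int(m.group(1)), int(m.group(2)), path)
        return best  # (train_step, env_steps, path) or None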
[2024-11-07 12:26:59,654][125892] Decorrelating experience for 32 frames... [2024-11-07 12:26:59,656][125890] Decorrelating experience for 32 frames... [2024-11-07 12:26:59,674][125887] Decorrelating experience for 0 frames... [2024-11-07 12:26:59,679][125885] Decorrelating experience for 0 frames... [2024-11-07 12:27:00,122][125887] Decorrelating experience for 32 frames... [2024-11-07 12:27:00,169][125882] Decorrelating experience for 0 frames... [2024-11-07 12:27:00,172][125885] Decorrelating experience for 32 frames... [2024-11-07 12:27:00,399][125890] Decorrelating experience for 64 frames... [2024-11-07 12:27:00,739][125882] Decorrelating experience for 32 frames... [2024-11-07 12:27:00,752][125884] Decorrelating experience for 32 frames... [2024-11-07 12:27:00,767][125892] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,033][125887] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,046][125891] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:27:01,180][125885] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,483][125883] Decorrelating experience for 0 frames... [2024-11-07 12:27:01,578][125890] Decorrelating experience for 96 frames... [2024-11-07 12:27:01,608][125892] Decorrelating experience for 96 frames... [2024-11-07 12:27:01,639][125887] Decorrelating experience for 96 frames... [2024-11-07 12:27:01,724][125367] Heartbeat connected on RolloutWorker_w3 [2024-11-07 12:27:01,812][125367] Heartbeat connected on RolloutWorker_w7 [2024-11-07 12:27:01,819][125884] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,822][125367] Heartbeat connected on RolloutWorker_w4 [2024-11-07 12:27:01,937][125882] Decorrelating experience for 64 frames... [2024-11-07 12:27:01,946][125891] Decorrelating experience for 96 frames... [2024-11-07 12:27:02,111][125367] Heartbeat connected on RolloutWorker_w6 [2024-11-07 12:27:02,205][125883] Decorrelating experience for 32 frames... [2024-11-07 12:27:02,425][125885] Decorrelating experience for 96 frames... [2024-11-07 12:27:02,612][125367] Heartbeat connected on RolloutWorker_w5 [2024-11-07 12:27:02,723][125882] Decorrelating experience for 96 frames... [2024-11-07 12:27:02,806][125367] Heartbeat connected on RolloutWorker_w0 [2024-11-07 12:27:02,967][125884] Decorrelating experience for 96 frames... [2024-11-07 12:27:02,972][125883] Decorrelating experience for 64 frames... [2024-11-07 12:27:03,060][125367] Heartbeat connected on RolloutWorker_w2 [2024-11-07 12:27:03,582][125883] Decorrelating experience for 96 frames... [2024-11-07 12:27:03,729][125367] Heartbeat connected on RolloutWorker_w1 [2024-11-07 12:27:05,475][125868] Signal inference workers to stop experience collection... [2024-11-07 12:27:05,493][125881] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 12:27:06,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 287.2. Samples: 2872. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:27:06,064][125367] Avg episode reward: [(0, '2.347')] [2024-11-07 12:27:11,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 191.5. Samples: 2872. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:27:11,063][125367] Avg episode reward: [(0, '2.347')] [2024-11-07 12:27:15,761][125868] Signal inference workers to resume experience collection... [2024-11-07 12:27:15,762][125881] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 12:27:16,062][125367] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 1921024. Throughput: 0: 143.6. Samples: 2872. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-11-07 12:27:16,065][125367] Avg episode reward: [(0, '2.347')] [2024-11-07 12:27:21,062][125367] Fps is (10 sec: 3276.6, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 1949696. Throughput: 0: 348.9. Samples: 8722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-11-07 12:27:21,065][125367] Avg episode reward: [(0, '4.010')] [2024-11-07 12:27:22,265][125881] Updated weights for policy 0, policy_version 478 (0.0035) [2024-11-07 12:27:26,061][125367] Fps is (10 sec: 5734.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 1978368. Throughput: 0: 437.4. Samples: 13122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-11-07 12:27:26,064][125367] Avg episode reward: [(0, '4.390')] [2024-11-07 12:27:28,815][125881] Updated weights for policy 0, policy_version 488 (0.0035) [2024-11-07 12:27:31,062][125367] Fps is (10 sec: 6144.3, 60 sec: 2691.7, 300 sec: 2691.7). Total num frames: 2011136. Throughput: 0: 635.8. Samples: 22252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 12:27:31,064][125367] Avg episode reward: [(0, '4.244')] [2024-11-07 12:27:35,979][125881] Updated weights for policy 0, policy_version 498 (0.0044) [2024-11-07 12:27:36,062][125367] Fps is (10 sec: 6143.8, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 2039808. Throughput: 0: 767.1. Samples: 30686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:27:36,065][125367] Avg episode reward: [(0, '4.582')] [2024-11-07 12:27:41,062][125367] Fps is (10 sec: 5324.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2064384. Throughput: 0: 776.3. Samples: 34932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:27:41,066][125367] Avg episode reward: [(0, '4.401')] [2024-11-07 12:27:44,344][125881] Updated weights for policy 0, policy_version 508 (0.0059) [2024-11-07 12:27:46,062][125367] Fps is (10 sec: 4505.7, 60 sec: 3358.7, 300 sec: 3358.7). Total num frames: 2084864. Throughput: 0: 927.3. Samples: 41728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:27:46,064][125367] Avg episode reward: [(0, '4.441')] [2024-11-07 12:27:51,064][125367] Fps is (10 sec: 3276.0, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 2097152. Throughput: 0: 951.9. Samples: 45712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-07 12:27:51,069][125367] Avg episode reward: [(0, '4.615')] [2024-11-07 12:27:56,062][125367] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 2117632. Throughput: 0: 1015.4. Samples: 48564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-07 12:27:56,069][125367] Avg episode reward: [(0, '4.553')] [2024-11-07 12:27:56,853][125881] Updated weights for policy 0, policy_version 518 (0.0057) [2024-11-07 12:28:01,062][125367] Fps is (10 sec: 3687.3, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 2134016. Throughput: 0: 1142.8. Samples: 54296. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-07 12:28:01,064][125367] Avg episode reward: [(0, '4.420')] [2024-11-07 12:28:04,738][125367] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 125367], exiting... [2024-11-07 12:28:04,741][125868] Stopping Batcher_0... [2024-11-07 12:28:04,741][125367] Runner profile tree view: main_loop: 87.3535 [2024-11-07 12:28:04,744][125367] Collected {0: 2146304}, FPS: 2625.8 [2024-11-07 12:28:04,742][125868] Loop batcher_evt_loop terminating... [2024-11-07 12:28:04,801][125881] Weights refcount: 2 0 [2024-11-07 12:28:04,812][125881] Stopping InferenceWorker_p0-w0... [2024-11-07 12:28:04,812][125881] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:28:04,826][125868] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth... [2024-11-07 12:28:05,126][125868] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth [2024-11-07 12:28:05,129][125868] Stopping LearnerWorker_p0... [2024-11-07 12:28:05,132][125868] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:28:05,856][125884] Stopping RolloutWorker_w2... [2024-11-07 12:28:05,858][125884] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:28:05,873][125887] Stopping RolloutWorker_w4... [2024-11-07 12:28:05,874][125887] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:28:05,913][125885] Stopping RolloutWorker_w5... [2024-11-07 12:28:05,914][125885] Loop rollout_proc5_evt_loop terminating... [2024-11-07 12:28:05,915][125883] Stopping RolloutWorker_w1... [2024-11-07 12:28:05,916][125883] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:28:05,992][125890] Stopping RolloutWorker_w3... [2024-11-07 12:28:05,992][125890] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:28:06,002][125891] Stopping RolloutWorker_w6... [2024-11-07 12:28:06,009][125891] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:28:06,026][125892] Stopping RolloutWorker_w7... [2024-11-07 12:28:06,026][125892] Loop rollout_proc7_evt_loop terminating... [2024-11-07 12:28:06,164][125882] Stopping RolloutWorker_w0... [2024-11-07 12:28:06,165][125882] Loop rollout_proc0_evt_loop terminating... [2024-11-07 12:32:34,540][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... [2024-11-07 12:32:34,542][129156] Rollout worker 0 uses device cpu [2024-11-07 12:32:34,543][129156] Rollout worker 1 uses device cpu [2024-11-07 12:32:34,544][129156] Rollout worker 2 uses device cpu [2024-11-07 12:32:34,546][129156] Rollout worker 3 uses device cpu [2024-11-07 12:32:34,547][129156] Rollout worker 4 uses device cpu [2024-11-07 12:32:34,548][129156] Rollout worker 5 uses device cpu [2024-11-07 12:32:34,551][129156] Rollout worker 6 uses device cpu [2024-11-07 12:32:34,552][129156] Rollout worker 7 uses device cpu [2024-11-07 12:32:34,629][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:32:34,630][129156] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:32:34,670][129156] Starting all processes... [2024-11-07 12:32:34,671][129156] Starting process learner_proc0 [2024-11-07 12:32:34,759][129156] Starting all processes... 
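Before this next restart gets going, it is worth decoding the exit summary a few entries up: "Collected {0: 2146304}" is the cumulative env-step counter, while the reported FPS covers only the session that just ended. Checking with the numbers from the log:

    # Session FPS = frames collected this session / main-loop wall time.
    collected_total = 2_146_304   # cumulative, from "Collected {0: 2146304}"
    resumed_from = 1_916_928      # env_steps restored from checkpoint 468
    main_loop_seconds = 87.3535   # "Runner profile tree view: main_loop"
    fps = (collected_total - resumed_from) / main_loop_seconds
    print(f"{fps:.1f}")           # -> 2625.8, matching "FPS: 2625.8"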
[2024-11-07 12:32:34,888][129156] Starting process inference_proc0-0 [2024-11-07 12:32:34,889][129156] Starting process rollout_proc0 [2024-11-07 12:32:34,890][129156] Starting process rollout_proc1 [2024-11-07 12:32:34,892][129156] Starting process rollout_proc2 [2024-11-07 12:32:34,903][129156] Starting process rollout_proc3 [2024-11-07 12:32:34,905][129156] Starting process rollout_proc4 [2024-11-07 12:32:34,906][129156] Starting process rollout_proc5 [2024-11-07 12:32:34,907][129156] Starting process rollout_proc6 [2024-11-07 12:32:34,914][129156] Starting process rollout_proc7 [2024-11-07 12:32:42,998][129261] Worker 5 uses CPU cores [5] [2024-11-07 12:32:43,007][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:32:43,007][129242] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:32:43,292][129242] Num visible devices: 1 [2024-11-07 12:32:43,328][129242] Starting seed is not provided [2024-11-07 12:32:43,329][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:32:43,329][129242] Initializing actor-critic model on device cuda:0 [2024-11-07 12:32:43,330][129242] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:32:43,331][129242] RunningMeanStd input shape: (1,) [2024-11-07 12:32:43,368][129242] ConvEncoder: input_channels=3 [2024-11-07 12:32:43,403][129260] Worker 4 uses CPU cores [4] [2024-11-07 12:32:43,740][129242] Conv encoder output size: 512 [2024-11-07 12:32:43,742][129242] Policy head output size: 512 [2024-11-07 12:32:43,792][129242] Created Actor Critic model with architecture: [2024-11-07 12:32:43,793][129242] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:32:43,838][129255] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:32:43,838][129255] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:32:43,878][129258] Worker 2 uses CPU cores [2] [2024-11-07 12:32:43,890][129255] Num visible devices: 1 [2024-11-07 12:32:44,070][129257] Worker 1 uses CPU cores [1] [2024-11-07 12:32:44,178][129256] Worker 0 uses CPU cores [0] [2024-11-07 12:32:44,278][129263] Worker 6 uses CPU cores [6] [2024-11-07 12:32:44,298][129259] Worker 3 uses CPU cores [3] [2024-11-07 12:32:44,564][129262] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:32:45,125][129242] 
Using optimizer [2024-11-07 12:32:49,395][129242] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth... [2024-11-07 12:32:49,492][129242] Loading model from checkpoint [2024-11-07 12:32:49,494][129242] Loaded experiment state at self.train_step=525, self.env_steps=2150400 [2024-11-07 12:32:49,495][129242] Initialized policy 0 weights for model version 525 [2024-11-07 12:32:49,502][129242] LearnerWorker_p0 finished initialization! [2024-11-07 12:32:49,503][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:32:49,707][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2150400. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:32:49,753][129255] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:32:49,754][129255] RunningMeanStd input shape: (1,) [2024-11-07 12:32:49,770][129255] ConvEncoder: input_channels=3 [2024-11-07 12:32:49,905][129255] Conv encoder output size: 512 [2024-11-07 12:32:49,906][129255] Policy head output size: 512 [2024-11-07 12:32:49,958][129156] Inference worker 0-0 is ready! [2024-11-07 12:32:49,959][129156] All inference workers are ready! Signal rollout workers to start! [2024-11-07 12:32:50,049][129260] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,071][129257] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,078][129256] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,084][129259] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,086][129263] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,091][129258] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,117][129261] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,151][129262] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:32:50,703][129260] Decorrelating experience for 0 frames... [2024-11-07 12:32:50,702][129259] Decorrelating experience for 0 frames... [2024-11-07 12:32:50,710][129258] Decorrelating experience for 0 frames... [2024-11-07 12:32:50,741][129263] Decorrelating experience for 0 frames... [2024-11-07 12:32:50,741][129257] Decorrelating experience for 0 frames... [2024-11-07 12:32:50,767][129256] Decorrelating experience for 0 frames... [2024-11-07 12:32:51,143][129260] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,154][129259] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,192][129257] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,231][129263] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,242][129261] Decorrelating experience for 0 frames... [2024-11-07 12:32:51,886][129256] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,942][129258] Decorrelating experience for 32 frames... [2024-11-07 12:32:51,945][129261] Decorrelating experience for 32 frames... [2024-11-07 12:32:52,175][129257] Decorrelating experience for 64 frames... [2024-11-07 12:32:52,182][129259] Decorrelating experience for 64 frames... [2024-11-07 12:32:52,265][129260] Decorrelating experience for 64 frames... 
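The "Decorrelating experience for 0/32/64/96 frames..." bursts map neatly onto the config shown later (num_envs_per_worker=4, rollout=32, decorrelate_envs_on_one_worker=True): env j on each worker appears to be warmed up for j x 32 random-action frames so the four envs on one worker start out of phase. A rough illustration of the idea, not Sample Factory's actual code, assuming gymnasium-style environments:

    def decorrelate_worker_envs(envs, rollout=32):
        # Warm up env j for j * rollout frames (0/32/64/96 for four envs) so
        # episodes on one worker do not advance in lockstep.
        for j, env in enumerate(envs):
            env.reset()
            for _ in range(j * rollout):
                _, _, terminated, truncated, _ = env.step(env.action_space.sample())
                if terminated or truncated:
                    env.reset()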
[2024-11-07 12:32:54,620][129156] Heartbeat connected on Batcher_0 [2024-11-07 12:32:54,632][129156] Heartbeat connected on LearnerWorker_p0 [2024-11-07 12:32:54,678][129156] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 12:32:54,710][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:32:55,085][129262] Decorrelating experience for 0 frames... [2024-11-07 12:32:55,146][129259] Decorrelating experience for 96 frames... [2024-11-07 12:32:55,158][129260] Decorrelating experience for 96 frames... [2024-11-07 12:32:55,164][129258] Decorrelating experience for 64 frames... [2024-11-07 12:32:55,171][129256] Decorrelating experience for 64 frames... [2024-11-07 12:32:55,297][129156] Heartbeat connected on RolloutWorker_w4 [2024-11-07 12:32:55,303][129261] Decorrelating experience for 64 frames... [2024-11-07 12:32:55,319][129263] Decorrelating experience for 64 frames... [2024-11-07 12:32:55,337][129156] Heartbeat connected on RolloutWorker_w3 [2024-11-07 12:32:55,799][129258] Decorrelating experience for 96 frames... [2024-11-07 12:32:55,896][129156] Heartbeat connected on RolloutWorker_w2 [2024-11-07 12:32:55,985][129257] Decorrelating experience for 96 frames... [2024-11-07 12:32:56,004][129262] Decorrelating experience for 32 frames... [2024-11-07 12:32:56,147][129156] Heartbeat connected on RolloutWorker_w1 [2024-11-07 12:32:56,171][129256] Decorrelating experience for 96 frames... [2024-11-07 12:32:56,220][129261] Decorrelating experience for 96 frames... [2024-11-07 12:32:56,258][129263] Decorrelating experience for 96 frames... [2024-11-07 12:32:56,279][129156] Heartbeat connected on RolloutWorker_w0 [2024-11-07 12:32:56,300][129156] Heartbeat connected on RolloutWorker_w5 [2024-11-07 12:32:56,342][129156] Heartbeat connected on RolloutWorker_w6 [2024-11-07 12:32:56,557][129262] Decorrelating experience for 64 frames... [2024-11-07 12:32:57,231][129262] Decorrelating experience for 96 frames... [2024-11-07 12:32:57,381][129156] Heartbeat connected on RolloutWorker_w7 [2024-11-07 12:32:59,711][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 148.3. Samples: 1484. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:32:59,713][129156] Avg episode reward: [(0, '1.831')] [2024-11-07 12:32:59,879][129242] Signal inference workers to stop experience collection... [2024-11-07 12:32:59,913][129255] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 12:33:04,707][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 148.9. Samples: 2234. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:33:04,709][129156] Avg episode reward: [(0, '1.979')] [2024-11-07 12:33:08,573][129242] Signal inference workers to resume experience collection... [2024-11-07 12:33:08,574][129255] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 12:33:09,707][129156] Fps is (10 sec: 819.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 2158592. Throughput: 0: 111.7. Samples: 2234. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-11-07 12:33:09,712][129156] Avg episode reward: [(0, '2.843')] [2024-11-07 12:33:14,707][129156] Fps is (10 sec: 3686.5, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 2187264. Throughput: 0: 367.0. Samples: 9174. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:33:14,709][129156] Avg episode reward: [(0, '3.927')] [2024-11-07 12:33:15,426][129255] Updated weights for policy 0, policy_version 535 (0.0041) [2024-11-07 12:33:19,707][129156] Fps is (10 sec: 4915.1, 60 sec: 1911.4, 300 sec: 1911.4). Total num frames: 2207744. Throughput: 0: 433.9. Samples: 13018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:33:19,710][129156] Avg episode reward: [(0, '4.468')] [2024-11-07 12:33:24,035][129255] Updated weights for policy 0, policy_version 545 (0.0073) [2024-11-07 12:33:24,710][129156] Fps is (10 sec: 4504.2, 60 sec: 2340.4, 300 sec: 2340.4). Total num frames: 2232320. Throughput: 0: 565.1. Samples: 19780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:33:24,714][129156] Avg episode reward: [(0, '4.618')] [2024-11-07 12:33:29,707][129156] Fps is (10 sec: 4096.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 2248704. Throughput: 0: 630.4. Samples: 25214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:33:29,708][129156] Avg episode reward: [(0, '4.466')] [2024-11-07 12:33:34,306][129255] Updated weights for policy 0, policy_version 555 (0.0049) [2024-11-07 12:33:34,707][129156] Fps is (10 sec: 4097.1, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 2273280. Throughput: 0: 625.4. Samples: 28144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:33:34,712][129156] Avg episode reward: [(0, '4.361')] [2024-11-07 12:33:39,707][129156] Fps is (10 sec: 5324.8, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 2301952. Throughput: 0: 809.0. Samples: 36402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:33:39,711][129156] Avg episode reward: [(0, '4.402')] [2024-11-07 12:33:41,514][129255] Updated weights for policy 0, policy_version 565 (0.0030) [2024-11-07 12:33:44,707][129156] Fps is (10 sec: 5734.5, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2330624. Throughput: 0: 965.4. Samples: 44922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:33:44,709][129156] Avg episode reward: [(0, '4.465')] [2024-11-07 12:33:48,703][129255] Updated weights for policy 0, policy_version 575 (0.0026) [2024-11-07 12:33:49,707][129156] Fps is (10 sec: 5734.2, 60 sec: 3481.6, 300 sec: 3481.6). Total num frames: 2359296. Throughput: 0: 1048.7. Samples: 49424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 12:33:49,713][129156] Avg episode reward: [(0, '4.284')] [2024-11-07 12:33:54,707][129156] Fps is (10 sec: 5734.4, 60 sec: 3959.7, 300 sec: 3654.9). Total num frames: 2387968. Throughput: 0: 1234.8. Samples: 57800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-11-07 12:33:54,709][129156] Avg episode reward: [(0, '4.330')] [2024-11-07 12:33:56,229][129255] Updated weights for policy 0, policy_version 585 (0.0046) [2024-11-07 12:33:59,707][129156] Fps is (10 sec: 5734.5, 60 sec: 4437.7, 300 sec: 3803.4). Total num frames: 2416640. Throughput: 0: 1272.5. Samples: 66438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 12:33:59,709][129156] Avg episode reward: [(0, '4.526')] [2024-11-07 12:34:04,707][129156] Fps is (10 sec: 4505.7, 60 sec: 4710.4, 300 sec: 3768.3). Total num frames: 2433024. Throughput: 0: 1224.1. Samples: 68102. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 12:34:04,709][129156] Avg episode reward: [(0, '4.584')] [2024-11-07 12:34:05,375][129255] Updated weights for policy 0, policy_version 595 (0.0031) [2024-11-07 12:34:09,712][129156] Fps is (10 sec: 3684.7, 60 sec: 4914.8, 300 sec: 3788.6). Total num frames: 2453504. Throughput: 0: 1234.4. Samples: 75332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:34:09,718][129156] Avg episode reward: [(0, '4.446')] [2024-11-07 12:34:14,707][129156] Fps is (10 sec: 4095.9, 60 sec: 4778.6, 300 sec: 3806.9). Total num frames: 2473984. Throughput: 0: 1235.3. Samples: 80802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:34:14,718][129156] Avg episode reward: [(0, '4.383')] [2024-11-07 12:34:15,898][129255] Updated weights for policy 0, policy_version 605 (0.0067) [2024-11-07 12:34:19,708][129156] Fps is (10 sec: 3687.6, 60 sec: 4710.3, 300 sec: 3777.4). Total num frames: 2490368. Throughput: 0: 1233.0. Samples: 83632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 12:34:19,711][129156] Avg episode reward: [(0, '4.357')] [2024-11-07 12:34:20,565][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... [2024-11-07 12:34:20,586][129242] Stopping Batcher_0... [2024-11-07 12:34:20,586][129242] Loop batcher_evt_loop terminating... [2024-11-07 12:34:20,570][129156] Runner profile tree view: main_loop: 105.9005 [2024-11-07 12:34:20,590][129156] Collected {0: 2490368}, FPS: 3210.3 [2024-11-07 12:34:20,692][129255] Weights refcount: 2 0 [2024-11-07 12:34:20,731][129255] Stopping InferenceWorker_p0-w0... [2024-11-07 12:34:20,732][129255] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:34:20,733][129242] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:34:20,954][129259] Stopping RolloutWorker_w3... [2024-11-07 12:34:20,955][129259] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:34:21,021][129260] Stopping RolloutWorker_w4... [2024-11-07 12:34:21,021][129260] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:34:21,091][129263] Stopping RolloutWorker_w6... [2024-11-07 12:34:21,092][129263] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:34:21,253][129242] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth [2024-11-07 12:34:21,242][129256] Stopping RolloutWorker_w0... [2024-11-07 12:34:21,259][129256] Loop rollout_proc0_evt_loop terminating... [2024-11-07 12:34:21,281][129242] Stopping LearnerWorker_p0... [2024-11-07 12:34:21,282][129242] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:34:21,299][129257] Stopping RolloutWorker_w1... [2024-11-07 12:34:21,324][129257] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:34:21,363][129258] Stopping RolloutWorker_w2... [2024-11-07 12:34:21,364][129258] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:34:21,385][129262] Stopping RolloutWorker_w7... [2024-11-07 12:34:21,386][129262] Loop rollout_proc7_evt_loop terminating... [2024-11-07 12:34:21,515][129261] Stopping RolloutWorker_w5... [2024-11-07 12:34:21,523][129261] Loop rollout_proc5_evt_loop terminating... [2024-11-07 12:36:09,909][129156] Environment doom_basic already registered, overwriting... [2024-11-07 12:36:09,911][129156] Environment doom_two_colors_easy already registered, overwriting... 
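The save/remove pair in the shutdown above (saving checkpoint 609, then removing checkpoint 468) is the keep_checkpoints=2 setting from the config doing its job: only the two newest checkpoints survive each save. A sketch of that rotation, under the same filename assumption as before (the zero-padded train_step makes plain lexical sorting correct):

    from pathlib import Path

    def prune_checkpoints(ckpt_dir, keep=2):
        # e.g. after saving checkpoint_000000609_2494464.pth, delete everything
        # but the `keep` newest: the log then shows checkpoint 468 being removed.
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for old in ckpts[:-keep]:
            old.unlink()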
[2024-11-07 12:36:09,912][129156] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 12:36:09,914][129156] Environment doom_dm already registered, overwriting... [2024-11-07 12:36:09,915][129156] Environment doom_dwango5 already registered, overwriting... [2024-11-07 12:36:09,916][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 12:36:09,917][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 12:36:09,919][129156] Environment doom_my_way_home already registered, overwriting... [2024-11-07 12:36:09,920][129156] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 12:36:09,921][129156] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 12:36:09,922][129156] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 12:36:09,922][129156] Environment doom_health_gathering already registered, overwriting... [2024-11-07 12:36:09,928][129156] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 12:36:09,930][129156] Environment doom_battle already registered, overwriting... [2024-11-07 12:36:09,931][129156] Environment doom_battle2 already registered, overwriting... [2024-11-07 12:36:09,933][129156] Environment doom_duel_bots already registered, overwriting... [2024-11-07 12:36:09,934][129156] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 12:36:09,936][129156] Environment doom_duel already registered, overwriting... [2024-11-07 12:36:09,938][129156] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 12:36:09,941][129156] Environment doom_benchmark already registered, overwriting... [2024-11-07 12:36:09,943][129156] register_encoder_factory: [2024-11-07 12:36:10,059][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 12:36:10,066][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! [2024-11-07 12:36:10,067][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... 
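The block of "Environment ... already registered, overwriting..." messages is not an error: process 129156 has launched several experiments in a row, and each launch re-registers the same environment names into a process-wide registry. Illustratively (a toy registry, not the library's actual implementation):

    import logging

    _env_registry = {}

    def register_env(name, make_env_fn):
        # Re-registering a name is permitted but logged, which is exactly the
        # chatter produced when one process restarts training repeatedly.
        if name in _env_registry:
            logging.warning("Environment %s already registered, overwriting...", name)
        _env_registry[name] = make_env_fn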
[2024-11-07 12:36:10,069][129156] Weights and Biases integration disabled [2024-11-07 12:36:10,073][129156] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 12:36:15,029][129156] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/root/hfRL/ml/LunarLander-v2/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-11-07 12:36:15,031][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
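One cross-check worth doing on the configuration dump above: a single round of rollouts supplies exactly one training batch, because num_workers x num_envs_per_worker x rollout = 8 x 4 x 32 = 1024 = batch_size. In Python, with values copied verbatim from the config:

    num_workers = 8
    num_envs_per_worker = 4
    rollout = 32
    batch_size = 1024

    steps_per_round = num_workers * num_envs_per_worker * rollout
    assert steps_per_round == batch_size  # one full sampling round == one batch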
[2024-11-07 12:36:15,035][129156] Rollout worker 0 uses device cpu [2024-11-07 12:36:15,038][129156] Rollout worker 1 uses device cpu [2024-11-07 12:36:15,039][129156] Rollout worker 2 uses device cpu [2024-11-07 12:36:15,039][129156] Rollout worker 3 uses device cpu [2024-11-07 12:36:15,040][129156] Rollout worker 4 uses device cpu [2024-11-07 12:36:15,041][129156] Rollout worker 5 uses device cpu [2024-11-07 12:36:15,044][129156] Rollout worker 6 uses device cpu [2024-11-07 12:36:15,045][129156] Rollout worker 7 uses device cpu [2024-11-07 12:36:15,119][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:15,121][129156] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:36:15,157][129156] Starting all processes... [2024-11-07 12:36:15,158][129156] Starting process learner_proc0 [2024-11-07 12:36:15,197][129156] Starting all processes... [2024-11-07 12:36:15,202][129156] Starting process inference_proc0-0 [2024-11-07 12:36:15,202][129156] Starting process rollout_proc0 [2024-11-07 12:36:15,204][129156] Starting process rollout_proc1 [2024-11-07 12:36:15,207][129156] Starting process rollout_proc2 [2024-11-07 12:36:15,208][129156] Starting process rollout_proc3 [2024-11-07 12:36:15,208][129156] Starting process rollout_proc4 [2024-11-07 12:36:15,209][129156] Starting process rollout_proc5 [2024-11-07 12:36:15,215][129156] Starting process rollout_proc6 [2024-11-07 12:36:15,226][129156] Starting process rollout_proc7 [2024-11-07 12:36:20,758][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... [2024-11-07 12:36:20,767][129156] Runner profile tree view: main_loop: 5.6098 [2024-11-07 12:36:20,769][129156] Collected {}, FPS: 0.0 [2024-11-07 12:36:21,570][130420] Worker 5 uses CPU cores [5] [2024-11-07 12:36:21,772][130420] Stopping RolloutWorker_w5... [2024-11-07 12:36:21,773][130420] Loop rollout_proc5_evt_loop terminating... [2024-11-07 12:36:21,981][130417] Worker 3 uses CPU cores [3] [2024-11-07 12:36:22,131][130417] Stopping RolloutWorker_w3... [2024-11-07 12:36:22,132][130417] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:36:22,630][130415] Worker 1 uses CPU cores [1] [2024-11-07 12:36:22,821][130415] Stopping RolloutWorker_w1... [2024-11-07 12:36:22,822][130415] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:36:23,284][130419] Worker 6 uses CPU cores [6] [2024-11-07 12:36:23,402][130419] Stopping RolloutWorker_w6... [2024-11-07 12:36:23,423][130419] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:36:23,485][130422] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:36:23,529][130422] Stopping RolloutWorker_w7... [2024-11-07 12:36:23,529][130422] Loop rollout_proc7_evt_loop terminating... [2024-11-07 12:36:23,762][130414] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:23,764][130414] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:36:23,870][130416] Worker 2 uses CPU cores [2] [2024-11-07 12:36:23,919][130418] Worker 4 uses CPU cores [4] [2024-11-07 12:36:23,929][130416] Stopping RolloutWorker_w2... [2024-11-07 12:36:23,929][130416] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:36:24,001][130418] Stopping RolloutWorker_w4... [2024-11-07 12:36:24,001][130418] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:36:24,328][130414] Num visible devices: 1 [2024-11-07 12:36:24,377][130414] Stopping InferenceWorker_p0-w0... 
[2024-11-07 12:36:24,378][130414] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:36:24,462][130413] Worker 0 uses CPU cores [0] [2024-11-07 12:36:24,641][130413] Stopping RolloutWorker_w0... [2024-11-07 12:36:24,642][130413] Loop rollout_proc0_evt_loop terminating... [2024-11-07 12:36:24,841][130400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:24,841][130400] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:36:24,938][130400] Num visible devices: 1 [2024-11-07 12:36:25,011][130400] Starting seed is not provided [2024-11-07 12:36:25,011][130400] Stopping Batcher_0... [2024-11-07 12:36:25,012][130400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:25,012][130400] Loop batcher_evt_loop terminating... [2024-11-07 12:36:25,013][130400] Initializing actor-critic model on device cuda:0 [2024-11-07 12:36:25,014][130400] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:36:25,016][130400] RunningMeanStd input shape: (1,) [2024-11-07 12:36:25,101][130400] ConvEncoder: input_channels=3 [2024-11-07 12:36:25,691][130400] Conv encoder output size: 512 [2024-11-07 12:36:25,692][130400] Policy head output size: 512 [2024-11-07 12:36:25,718][130400] Created Actor Critic model with architecture: [2024-11-07 12:36:25,719][130400] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:36:26,916][130400] Using optimizer [2024-11-07 12:36:28,167][130400] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:36:28,230][130400] Loading model from checkpoint [2024-11-07 12:36:28,231][130400] Loaded experiment state at self.train_step=609, self.env_steps=2494464 [2024-11-07 12:36:28,232][130400] Initialized policy 0 weights for model version 609 [2024-11-07 12:36:28,238][130400] LearnerWorker_p0 finished initialization! [2024-11-07 12:36:28,238][130400] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:36:28,304][130400] Stopping LearnerWorker_p0... [2024-11-07 12:36:28,304][130400] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:36:49,466][129156] Environment doom_basic already registered, overwriting... 
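Note the ordering in that short-lived run: the interrupt landed at 12:36:20 while workers were still spawning, yet the learner finished initializing, reloaded checkpoint 609, wrote it back out, and only then stopped. A simplified sketch of that save-before-exit discipline (hypothetical helper names, not the actual event-loop code):

    import torch
    import torch.nn as nn

    def run_event_loop(model):
        pass  # stand-in for the learner's real event loop; interrupts land here

    def learner_main(model, ckpt_path):
        # Whatever happens inside the loop, a checkpoint is written before the
        # learner terminates, so an interrupted startup still leaves valid state.
        try:
            run_event_loop(model)
        finally:
            torch.save(model.state_dict(), ckpt_path)

    learner_main(nn.Linear(4, 2), "/tmp/checkpoint_demo.pth")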
[2024-11-07 12:36:49,469][129156] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 12:36:49,469][129156] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 12:36:49,471][129156] Environment doom_dm already registered, overwriting... [2024-11-07 12:36:49,472][129156] Environment doom_dwango5 already registered, overwriting... [2024-11-07 12:36:49,473][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 12:36:49,475][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 12:36:49,477][129156] Environment doom_my_way_home already registered, overwriting... [2024-11-07 12:36:49,480][129156] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 12:36:49,482][129156] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 12:36:49,484][129156] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 12:36:49,485][129156] Environment doom_health_gathering already registered, overwriting... [2024-11-07 12:36:49,486][129156] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 12:36:49,488][129156] Environment doom_battle already registered, overwriting... [2024-11-07 12:36:49,489][129156] Environment doom_battle2 already registered, overwriting... [2024-11-07 12:36:49,491][129156] Environment doom_duel_bots already registered, overwriting... [2024-11-07 12:36:49,494][129156] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 12:36:49,495][129156] Environment doom_duel already registered, overwriting... [2024-11-07 12:36:49,497][129156] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 12:36:49,498][129156] Environment doom_benchmark already registered, overwriting... [2024-11-07 12:36:49,499][129156] register_encoder_factory: [2024-11-07 12:36:49,515][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 12:36:49,521][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! [2024-11-07 12:36:49,522][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... 
[2024-11-07 12:36:49,523][129156] Weights and Biases integration disabled [2024-11-07 12:36:49,527][129156] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 12:36:54,270][129156] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/root/hfRL/ml/LunarLander-v2/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-11-07 12:36:54,272][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
[2024-11-07 12:36:54,276][129156] Rollout worker 0 uses device cpu [2024-11-07 12:36:54,278][129156] Rollout worker 1 uses device cpu [2024-11-07 12:36:54,279][129156] Rollout worker 2 uses device cpu [2024-11-07 12:36:54,280][129156] Rollout worker 3 uses device cpu [2024-11-07 12:36:54,282][129156] Rollout worker 4 uses device cpu [2024-11-07 12:36:54,282][129156] Rollout worker 5 uses device cpu [2024-11-07 12:36:54,283][129156] Rollout worker 6 uses device cpu [2024-11-07 12:36:54,285][129156] Rollout worker 7 uses device cpu [2024-11-07 12:36:54,346][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:54,348][129156] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:36:54,383][129156] Starting all processes... [2024-11-07 12:36:54,386][129156] Starting process learner_proc0 [2024-11-07 12:36:54,434][129156] Starting all processes... [2024-11-07 12:36:54,440][129156] Starting process inference_proc0-0 [2024-11-07 12:36:54,441][129156] Starting process rollout_proc0 [2024-11-07 12:36:54,447][129156] Starting process rollout_proc1 [2024-11-07 12:36:54,447][129156] Starting process rollout_proc2 [2024-11-07 12:36:54,447][129156] Starting process rollout_proc3 [2024-11-07 12:36:54,447][129156] Starting process rollout_proc4 [2024-11-07 12:36:54,448][129156] Starting process rollout_proc5 [2024-11-07 12:36:54,449][129156] Starting process rollout_proc6 [2024-11-07 12:36:54,450][129156] Starting process rollout_proc7 [2024-11-07 12:36:59,619][130707] Worker 6 uses CPU cores [6] [2024-11-07 12:36:59,676][130699] Worker 1 uses CPU cores [1] [2024-11-07 12:36:59,687][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... [2024-11-07 12:36:59,688][129156] Runner profile tree view: main_loop: 5.3055 [2024-11-07 12:36:59,689][129156] Collected {}, FPS: 0.0 [2024-11-07 12:36:59,702][130681] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:36:59,703][130681] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:36:59,769][130699] Stopping RolloutWorker_w1... [2024-11-07 12:36:59,770][130699] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:36:59,819][130707] Stopping RolloutWorker_w6... [2024-11-07 12:36:59,821][130707] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:36:59,828][130703] Worker 2 uses CPU cores [2] [2024-11-07 12:36:59,932][130703] Stopping RolloutWorker_w2... [2024-11-07 12:36:59,932][130703] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:36:59,967][130681] Num visible devices: 1 [2024-11-07 12:37:00,021][130681] Starting seed is not provided [2024-11-07 12:37:00,021][130681] Stopping Batcher_0... [2024-11-07 12:37:00,022][130681] Loop batcher_evt_loop terminating... [2024-11-07 12:37:00,022][130681] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:00,024][130681] Initializing actor-critic model on device cuda:0 [2024-11-07 12:37:00,025][130681] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:37:00,030][130681] RunningMeanStd input shape: (1,) [2024-11-07 12:37:00,138][130681] ConvEncoder: input_channels=3 [2024-11-07 12:37:00,190][130697] Worker 0 uses CPU cores [0] [2024-11-07 12:37:00,228][130701] Worker 4 uses CPU cores [4] [2024-11-07 12:37:00,268][130697] Stopping RolloutWorker_w0... [2024-11-07 12:37:00,268][130697] Loop rollout_proc0_evt_loop terminating... 
[2024-11-07 12:37:00,300][130700] Worker 3 uses CPU cores [3] [2024-11-07 12:37:00,325][130701] Stopping RolloutWorker_w4... [2024-11-07 12:37:00,329][130701] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:37:00,356][130700] Stopping RolloutWorker_w3... [2024-11-07 12:37:00,357][130700] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:37:00,511][130705] Worker 5 uses CPU cores [5] [2024-11-07 12:37:00,589][130705] Stopping RolloutWorker_w5... [2024-11-07 12:37:00,589][130705] Loop rollout_proc5_evt_loop terminating... [2024-11-07 12:37:00,637][130681] Conv encoder output size: 512 [2024-11-07 12:37:00,638][130681] Policy head output size: 512 [2024-11-07 12:37:00,720][130681] Created Actor Critic model with architecture: [2024-11-07 12:37:00,720][130681] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:37:00,922][130698] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:00,922][130698] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:37:00,971][130698] Num visible devices: 1 [2024-11-07 12:37:01,009][130698] Stopping InferenceWorker_p0-w0... [2024-11-07 12:37:01,010][130698] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:37:01,410][130708] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:37:01,434][130708] Stopping RolloutWorker_w7... [2024-11-07 12:37:01,435][130708] Loop rollout_proc7_evt_loop terminating... [2024-11-07 12:37:02,039][130681] Using optimizer [2024-11-07 12:37:03,034][130681] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:37:03,073][130681] Loading model from checkpoint [2024-11-07 12:37:03,075][130681] Loaded experiment state at self.train_step=609, self.env_steps=2494464 [2024-11-07 12:37:03,075][130681] Initialized policy 0 weights for model version 609 [2024-11-07 12:37:03,081][130681] LearnerWorker_p0 finished initialization! [2024-11-07 12:37:03,082][130681] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:37:03,140][130681] Stopping LearnerWorker_p0... [2024-11-07 12:37:03,140][130681] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:37:09,549][129156] Environment doom_basic already registered, overwriting... 
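The architecture dump is reprinted on every restart because the learner always rebuilds the model before loading weights. Read top to bottom it is: observation normalization, three Conv2d+ELU blocks, a Linear+ELU down to 512 features, a single-layer GRU(512, 512) core, and two heads (a 1-unit critic and 5 action logits). A compact PyTorch equivalent; the conv kernel/stride values are an assumption (the printout omits them; these follow the convnet_simple defaults named in the config), while everything else mirrors the dump:

    import torch
    import torch.nn as nn

    class SharedWeightsActorCritic(nn.Module):
        def __init__(self, num_actions=5, rnn_size=512):
            super().__init__()
            self.conv_head = nn.Sequential(        # assumed convnet_simple shapes
                nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
            )
            with torch.no_grad():                  # 3x72x128 obs, as logged
                n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp = nn.Sequential(nn.Linear(n, 512), nn.ELU())  # "output size: 512"
            self.core = nn.GRU(512, rnn_size)                      # GRU(512, 512)
            self.critic_linear = nn.Linear(rnn_size, 1)
            self.distribution_linear = nn.Linear(rnn_size, num_actions)

        def forward(self, obs, h=None):
            x = self.mlp(self.conv_head(obs).flatten(1))
            x, h = self.core(x.unsqueeze(0), h)    # (seq=1, batch, features)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), h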
[2024-11-07 12:37:09,551][129156] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 12:37:09,553][129156] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 12:37:09,556][129156] Environment doom_dm already registered, overwriting... [2024-11-07 12:37:09,557][129156] Environment doom_dwango5 already registered, overwriting... [2024-11-07 12:37:09,559][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 12:37:09,560][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 12:37:09,561][129156] Environment doom_my_way_home already registered, overwriting... [2024-11-07 12:37:09,562][129156] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 12:37:09,563][129156] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 12:37:09,564][129156] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 12:37:09,566][129156] Environment doom_health_gathering already registered, overwriting... [2024-11-07 12:37:09,568][129156] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 12:37:09,568][129156] Environment doom_battle already registered, overwriting... [2024-11-07 12:37:09,570][129156] Environment doom_battle2 already registered, overwriting... [2024-11-07 12:37:09,572][129156] Environment doom_duel_bots already registered, overwriting... [2024-11-07 12:37:09,573][129156] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 12:37:09,575][129156] Environment doom_duel already registered, overwriting... [2024-11-07 12:37:09,577][129156] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 12:37:09,581][129156] Environment doom_benchmark already registered, overwriting... [2024-11-07 12:37:09,582][129156] register_encoder_factory: [2024-11-07 12:37:09,599][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 12:37:09,607][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! [2024-11-07 12:37:09,609][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... 
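The wall of "Environment ... already registered, overwriting..." messages appears every time the runner restarts inside the same process: the Doom environment factories are re-registered under names that already exist in a global registry, which warns and replaces the old entry. A toy illustration of that registry pattern, assumed rather than taken from Sample Factory's source:

```python
# Toy sketch of a name -> factory registry that warns and overwrites on
# re-registration, matching the "already registered, overwriting..." lines.
import logging

logging.basicConfig(level=logging.INFO, format="[%(process)d] %(message)s")
log = logging.getLogger(__name__)

_ENV_REGISTRY: dict[str, object] = {}

def register_env(name: str, make_env_func) -> None:
    if name in _ENV_REGISTRY:
        log.warning("Environment %s already registered, overwriting...", name)
    _ENV_REGISTRY[name] = make_env_func

register_env("doom_health_gathering_supreme", object)
register_env("doom_health_gathering_supreme", object)  # restart -> warning
```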
[2024-11-07 12:37:09,610][129156] Weights and Biases integration disabled [2024-11-07 12:37:09,614][129156] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 12:37:11,933][129156] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/root/hfRL/ml/LunarLander-v2/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-11-07 12:37:11,934][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
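Each restart reloads and re-saves the same config.json shown in the "Starting experiment with the following configuration" entry. A small sketch for inspecting it offline, assuming the file is a flat JSON object of those flags:

```python
# Sketch: print a few key hyperparameters from the saved experiment config.
# Assumes config.json is a flat JSON mapping of the flags logged above.
import json
from pathlib import Path

cfg = json.loads(Path(
    "/root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json"
).read_text())

for key in ("algo", "env", "num_workers", "num_envs_per_worker", "batch_size",
            "rollout", "learning_rate", "gamma", "train_for_env_steps"):
    print(f"{key} = {cfg.get(key)}")
```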
[2024-11-07 12:37:11,937][129156] Rollout worker 0 uses device cpu [2024-11-07 12:37:11,938][129156] Rollout worker 1 uses device cpu [2024-11-07 12:37:11,939][129156] Rollout worker 2 uses device cpu [2024-11-07 12:37:11,941][129156] Rollout worker 3 uses device cpu [2024-11-07 12:37:11,942][129156] Rollout worker 4 uses device cpu [2024-11-07 12:37:11,943][129156] Rollout worker 5 uses device cpu [2024-11-07 12:37:11,945][129156] Rollout worker 6 uses device cpu [2024-11-07 12:37:11,946][129156] Rollout worker 7 uses device cpu [2024-11-07 12:37:12,028][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:12,031][129156] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:37:12,068][129156] Starting all processes... [2024-11-07 12:37:12,069][129156] Starting process learner_proc0 [2024-11-07 12:37:12,118][129156] Starting all processes... [2024-11-07 12:37:12,123][129156] Starting process inference_proc0-0 [2024-11-07 12:37:12,124][129156] Starting process rollout_proc0 [2024-11-07 12:37:12,124][129156] Starting process rollout_proc1 [2024-11-07 12:37:12,125][129156] Starting process rollout_proc2 [2024-11-07 12:37:12,126][129156] Starting process rollout_proc3 [2024-11-07 12:37:12,133][129156] Starting process rollout_proc4 [2024-11-07 12:37:12,137][129156] Starting process rollout_proc5 [2024-11-07 12:37:12,138][129156] Starting process rollout_proc6 [2024-11-07 12:37:12,139][129156] Starting process rollout_proc7 [2024-11-07 12:37:16,596][130924] Worker 1 uses CPU cores [1] [2024-11-07 12:37:16,645][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:16,645][130909] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:37:16,761][130909] Num visible devices: 1 [2024-11-07 12:37:16,826][130909] Starting seed is not provided [2024-11-07 12:37:16,827][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:16,827][130909] Initializing actor-critic model on device cuda:0 [2024-11-07 12:37:16,827][130909] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:37:16,828][130909] RunningMeanStd input shape: (1,) [2024-11-07 12:37:16,855][130909] ConvEncoder: input_channels=3 [2024-11-07 12:37:17,022][130926] Worker 3 uses CPU cores [3] [2024-11-07 12:37:17,032][130929] Worker 5 uses CPU cores [5] [2024-11-07 12:37:17,124][130909] Conv encoder output size: 512 [2024-11-07 12:37:17,125][130909] Policy head output size: 512 [2024-11-07 12:37:17,143][130909] Created Actor Critic model with architecture: [2024-11-07 12:37:17,144][130909] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 
512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:37:17,494][130923] Worker 0 uses CPU cores [0] [2024-11-07 12:37:17,505][130922] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:17,505][130922] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:37:17,506][130936] Worker 4 uses CPU cores [4] [2024-11-07 12:37:17,541][130922] Num visible devices: 1 [2024-11-07 12:37:17,576][130928] Worker 6 uses CPU cores [6] [2024-11-07 12:37:17,590][130927] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:37:17,637][130925] Worker 2 uses CPU cores [2] [2024-11-07 12:37:17,840][130909] Using optimizer [2024-11-07 12:37:18,775][130909] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-07 12:37:18,816][130909] Loading model from checkpoint [2024-11-07 12:37:18,819][130909] Loaded experiment state at self.train_step=609, self.env_steps=2494464 [2024-11-07 12:37:18,819][130909] Initialized policy 0 weights for model version 609 [2024-11-07 12:37:18,826][130909] LearnerWorker_p0 finished initialization! [2024-11-07 12:37:18,828][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:37:19,029][130922] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:37:19,030][130922] RunningMeanStd input shape: (1,) [2024-11-07 12:37:19,046][130922] ConvEncoder: input_channels=3 [2024-11-07 12:37:19,171][130922] Conv encoder output size: 512 [2024-11-07 12:37:19,172][130922] Policy head output size: 512 [2024-11-07 12:37:19,216][129156] Inference worker 0-0 is ready! [2024-11-07 12:37:19,217][129156] All inference workers are ready! Signal rollout workers to start! [2024-11-07 12:37:19,275][130929] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,278][130926] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,285][130936] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,288][130924] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,301][130928] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,303][130923] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,330][130927] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,348][130925] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:37:19,615][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2494464. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:37:21,706][130923] Decorrelating experience for 0 frames... [2024-11-07 12:37:21,708][130929] Decorrelating experience for 0 frames... [2024-11-07 12:37:21,709][130928] Decorrelating experience for 0 frames... [2024-11-07 12:37:21,712][130925] Decorrelating experience for 0 frames... [2024-11-07 12:37:21,718][130927] Decorrelating experience for 0 frames... [2024-11-07 12:37:21,721][130926] Decorrelating experience for 0 frames... [2024-11-07 12:37:22,045][130923] Decorrelating experience for 32 frames... [2024-11-07 12:37:22,061][130927] Decorrelating experience for 32 frames... 
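The ActorCriticSharedWeights dump printed above is a shared-weights actor-critic: a three-block Conv2d+ELU encoder over (3, 72, 128) observations, a 512-unit MLP, a GRU(512, 512) core, a scalar value head, and a 5-way logits head (presumably the environment's five discrete actions). A hedged PyTorch sketch of an equivalent module; the conv channel and kernel sizes are assumptions, since the log only shows the layer types and the 512-d outputs:

```python
# Hedged sketch of the logged architecture: conv encoder -> 512-d MLP ->
# GRU(512, 512) -> value head (1) and action logits (5). Conv shapes assumed.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).numel()
        self.mlp = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)          # "ModelCoreRNN"
        self.critic_linear = nn.Linear(hidden, 1)   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```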
[2024-11-07 12:37:22,062][130929] Decorrelating experience for 32 frames... [2024-11-07 12:37:22,121][130936] Decorrelating experience for 0 frames... [2024-11-07 12:37:22,140][130928] Decorrelating experience for 32 frames... [2024-11-07 12:37:22,477][130936] Decorrelating experience for 32 frames... [2024-11-07 12:37:22,491][130925] Decorrelating experience for 32 frames... [2024-11-07 12:37:22,515][130924] Decorrelating experience for 0 frames... [2024-11-07 12:37:22,659][130928] Decorrelating experience for 64 frames... [2024-11-07 12:37:22,663][130929] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,009][130927] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,040][130924] Decorrelating experience for 32 frames... [2024-11-07 12:37:23,063][130926] Decorrelating experience for 32 frames... [2024-11-07 12:37:23,153][130936] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,247][130923] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,290][130929] Decorrelating experience for 96 frames... [2024-11-07 12:37:23,343][130928] Decorrelating experience for 96 frames... [2024-11-07 12:37:23,594][130925] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,652][130926] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,698][130924] Decorrelating experience for 64 frames... [2024-11-07 12:37:23,709][130936] Decorrelating experience for 96 frames... [2024-11-07 12:37:23,949][130927] Decorrelating experience for 96 frames... [2024-11-07 12:37:24,191][130925] Decorrelating experience for 96 frames... [2024-11-07 12:37:24,236][130926] Decorrelating experience for 96 frames... [2024-11-07 12:37:24,271][130923] Decorrelating experience for 96 frames... [2024-11-07 12:37:24,478][130924] Decorrelating experience for 96 frames... [2024-11-07 12:37:24,615][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2494464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:37:28,321][130909] Signal inference workers to stop experience collection... [2024-11-07 12:37:28,331][130922] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 12:37:29,614][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2494464. Throughput: 0: 211.8. Samples: 2118. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:37:29,616][129156] Avg episode reward: [(0, '1.631')] [2024-11-07 12:37:30,568][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... [2024-11-07 12:37:30,571][130909] Stopping Batcher_0... [2024-11-07 12:37:30,571][130909] Loop batcher_evt_loop terminating... [2024-11-07 12:37:30,570][129156] Runner profile tree view: main_loop: 18.5024 [2024-11-07 12:37:30,574][129156] Collected {0: 2494464}, FPS: 0.0 [2024-11-07 12:37:30,588][130922] Weights refcount: 2 0 [2024-11-07 12:37:30,591][130922] Stopping InferenceWorker_p0-w0... [2024-11-07 12:37:30,592][130922] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:37:30,719][130923] Stopping RolloutWorker_w0... [2024-11-07 12:37:30,720][130923] Loop rollout_proc0_evt_loop terminating... [2024-11-07 12:37:30,740][130936] Stopping RolloutWorker_w4... [2024-11-07 12:37:30,741][130936] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:37:30,745][130929] Stopping RolloutWorker_w5... [2024-11-07 12:37:30,746][130929] Loop rollout_proc5_evt_loop terminating... 
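The "Decorrelating experience for 0/32/64/96 frames..." entries above warm each environment replica up by a different number of frames before collection starts, so episodes across workers do not stay aligned in time. A toy sketch of the idea, assuming a Gymnasium-style env API and an offset schedule in multiples of the config's rollout=32 (the actual schedule is not shown in the log):

```python
# Toy sketch of experience decorrelation: advance each env replica by a
# different number of random-action frames before real collection begins,
# mirroring the staggered 0/32/64/96 log lines. Schedule is an assumption.
def decorrelate(env, replica_idx: int, rollout: int = 32) -> None:
    frames = (replica_idx % 4) * rollout
    print(f"Decorrelating experience for {frames} frames...")
    env.reset()
    for _ in range(frames):
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()
```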
[2024-11-07 12:37:30,747][130925] Stopping RolloutWorker_w2... [2024-11-07 12:37:30,747][130925] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:37:30,778][130926] Stopping RolloutWorker_w3... [2024-11-07 12:37:30,779][130926] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:37:30,782][130928] Stopping RolloutWorker_w6... [2024-11-07 12:37:30,783][130928] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:37:30,814][130924] Stopping RolloutWorker_w1... [2024-11-07 12:37:30,816][130924] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:37:30,851][130927] Stopping RolloutWorker_w7... [2024-11-07 12:37:30,852][130927] Loop rollout_proc7_evt_loop terminating... [2024-11-07 12:37:35,714][130909] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth... [2024-11-07 12:37:35,783][130909] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth [2024-11-07 12:37:35,785][130909] Stopping LearnerWorker_p0... [2024-11-07 12:37:35,786][130909] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:40:27,910][129156] Environment doom_basic already registered, overwriting... [2024-11-07 12:40:27,912][129156] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 12:40:27,913][129156] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 12:40:27,914][129156] Environment doom_dm already registered, overwriting... [2024-11-07 12:40:27,916][129156] Environment doom_dwango5 already registered, overwriting... [2024-11-07 12:40:27,917][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 12:40:27,919][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 12:40:27,921][129156] Environment doom_my_way_home already registered, overwriting... [2024-11-07 12:40:27,922][129156] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 12:40:27,923][129156] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 12:40:27,924][129156] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 12:40:27,926][129156] Environment doom_health_gathering already registered, overwriting... [2024-11-07 12:40:27,927][129156] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 12:40:27,928][129156] Environment doom_battle already registered, overwriting... [2024-11-07 12:40:27,929][129156] Environment doom_battle2 already registered, overwriting... [2024-11-07 12:40:27,930][129156] Environment doom_duel_bots already registered, overwriting... [2024-11-07 12:40:27,931][129156] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 12:40:27,932][129156] Environment doom_duel already registered, overwriting... [2024-11-07 12:40:27,933][129156] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 12:40:27,935][129156] Environment doom_benchmark already registered, overwriting... [2024-11-07 12:40:27,937][129156] register_encoder_factory: [2024-11-07 12:40:27,953][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 12:40:27,960][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! 
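The checkpoint filenames encode both learner counters: checkpoint_000000611_2502656.pth is train_step 611 at 2,502,656 env frames, matching the "Loaded experiment state at self.train_step=..., self.env_steps=..." lines. On resume, load_checkpoint_kind=latest picks the newest file, and with keep_checkpoints=2 the oldest is deleted after each save (the "Removing ... checkpoint_000000525_2150400.pth" entry above). A sketch of that housekeeping, with hypothetical helper names:

```python
# Sketch of checkpoint housekeeping: filenames encode train_step and
# env_steps; pick the latest on resume, prune down to keep_checkpoints=2.
import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def latest_checkpoint(ckpt_dir: Path) -> Path | None:
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"),
                   key=lambda p: int(CKPT_RE.search(p.name).group(1)))
    return ckpts[-1] if ckpts else None

def prune_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"),
                   key=lambda p: int(CKPT_RE.search(p.name).group(1)))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        old.unlink()
```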
[2024-11-07 12:40:27,962][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... [2024-11-07 12:40:27,963][129156] Weights and Biases integration disabled [2024-11-07 12:40:27,966][129156] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 12:40:31,814][129156] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/root/hfRL/ml/LunarLander-v2/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-11-07 12:40:31,816][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
[2024-11-07 12:40:31,818][129156] Rollout worker 0 uses device cpu [2024-11-07 12:40:31,819][129156] Rollout worker 1 uses device cpu [2024-11-07 12:40:31,820][129156] Rollout worker 2 uses device cpu [2024-11-07 12:40:31,821][129156] Rollout worker 3 uses device cpu [2024-11-07 12:40:31,822][129156] Rollout worker 4 uses device cpu [2024-11-07 12:40:31,822][129156] Rollout worker 5 uses device cpu [2024-11-07 12:40:31,823][129156] Rollout worker 6 uses device cpu [2024-11-07 12:40:31,824][129156] Rollout worker 7 uses device cpu [2024-11-07 12:40:31,875][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:40:31,876][129156] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 12:40:31,909][129156] Starting all processes... [2024-11-07 12:40:31,910][129156] Starting process learner_proc0 [2024-11-07 12:40:31,959][129156] Starting all processes... [2024-11-07 12:40:31,964][129156] Starting process inference_proc0-0 [2024-11-07 12:40:31,965][129156] Starting process rollout_proc0 [2024-11-07 12:40:31,965][129156] Starting process rollout_proc1 [2024-11-07 12:40:31,966][129156] Starting process rollout_proc2 [2024-11-07 12:40:31,966][129156] Starting process rollout_proc3 [2024-11-07 12:40:31,967][129156] Starting process rollout_proc4 [2024-11-07 12:40:31,967][129156] Starting process rollout_proc5 [2024-11-07 12:40:31,968][129156] Starting process rollout_proc6 [2024-11-07 12:40:31,970][129156] Starting process rollout_proc7 [2024-11-07 12:40:36,276][132047] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:40:36,276][132047] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 12:40:36,304][132058] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 12:40:36,305][132049] Worker 1 uses CPU cores [1] [2024-11-07 12:40:36,347][132048] Worker 0 uses CPU cores [0] [2024-11-07 12:40:36,369][132047] Num visible devices: 1 [2024-11-07 12:40:36,375][132053] Worker 5 uses CPU cores [5] [2024-11-07 12:40:36,574][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:40:36,574][132031] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 12:40:36,575][132051] Worker 2 uses CPU cores [2] [2024-11-07 12:40:36,597][132031] Num visible devices: 1 [2024-11-07 12:40:36,605][132052] Worker 4 uses CPU cores [4] [2024-11-07 12:40:36,606][132055] Worker 6 uses CPU cores [6] [2024-11-07 12:40:36,613][132031] Starting seed is not provided [2024-11-07 12:40:36,614][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:40:36,614][132031] Initializing actor-critic model on device cuda:0 [2024-11-07 12:40:36,614][132031] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:40:36,615][132031] RunningMeanStd input shape: (1,) [2024-11-07 12:40:36,638][132031] ConvEncoder: input_channels=3 [2024-11-07 12:40:36,746][132031] Conv encoder output size: 512 [2024-11-07 12:40:36,746][132031] Policy head output size: 512 [2024-11-07 12:40:36,759][132031] Created Actor Critic model with architecture: [2024-11-07 12:40:36,759][132031] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): 
RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 12:40:36,784][132050] Worker 3 uses CPU cores [3] [2024-11-07 12:40:37,258][132031] Using optimizer [2024-11-07 12:40:38,132][132031] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth... [2024-11-07 12:40:38,170][132031] Loading model from checkpoint [2024-11-07 12:40:38,172][132031] Loaded experiment state at self.train_step=611, self.env_steps=2502656 [2024-11-07 12:40:38,172][132031] Initialized policy 0 weights for model version 611 [2024-11-07 12:40:38,178][132031] LearnerWorker_p0 finished initialization! [2024-11-07 12:40:38,178][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 12:40:38,330][132047] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 12:40:38,331][132047] RunningMeanStd input shape: (1,) [2024-11-07 12:40:38,342][132047] ConvEncoder: input_channels=3 [2024-11-07 12:40:38,444][132047] Conv encoder output size: 512 [2024-11-07 12:40:38,444][132047] Policy head output size: 512 [2024-11-07 12:40:38,489][129156] Inference worker 0-0 is ready! [2024-11-07 12:40:38,490][129156] All inference workers are ready! Signal rollout workers to start! [2024-11-07 12:40:38,554][132050] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,560][132048] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,562][132053] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,564][132051] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,567][132055] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,567][132049] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,600][132052] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:38,613][132058] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 12:40:39,173][132051] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,179][132048] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,179][132053] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,187][132050] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,188][132058] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,516][132051] Decorrelating experience for 32 frames... [2024-11-07 12:40:39,521][132050] Decorrelating experience for 32 frames... [2024-11-07 12:40:39,521][132049] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,562][132058] Decorrelating experience for 32 frames... [2024-11-07 12:40:39,596][132052] Decorrelating experience for 0 frames... [2024-11-07 12:40:39,652][132048] Decorrelating experience for 32 frames... 
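Each rollout worker logs "Doom resolution: 160x120, resize resolution: (128, 72)": native frames are downscaled and laid out channels-first (pixel_format=CHW in the config), which is exactly the (3, 72, 128) shape the observation normalizer reports. A sketch of that preprocessing step, assuming an OpenCV resize backend:

```python
# Sketch of the observation preprocessing implied by the log: 160x120 RGB
# Doom frames resized to (128, 72) and converted HWC -> CHW.
import cv2
import numpy as np

def preprocess(frame_hwc: np.ndarray) -> np.ndarray:
    resized = cv2.resize(frame_hwc, (128, 72), interpolation=cv2.INTER_AREA)
    return resized.transpose(2, 0, 1)  # HWC -> CHW

frame = np.zeros((120, 160, 3), dtype=np.uint8)  # Doom resolution: 160x120
print(preprocess(frame).shape)  # (3, 72, 128)
```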
[2024-11-07 12:40:39,897][132053] Decorrelating experience for 32 frames... [2024-11-07 12:40:39,949][132050] Decorrelating experience for 64 frames... [2024-11-07 12:40:40,005][132051] Decorrelating experience for 64 frames... [2024-11-07 12:40:40,033][132058] Decorrelating experience for 64 frames... [2024-11-07 12:40:40,071][132049] Decorrelating experience for 32 frames... [2024-11-07 12:40:40,297][132052] Decorrelating experience for 32 frames... [2024-11-07 12:40:40,419][132050] Decorrelating experience for 96 frames... [2024-11-07 12:40:40,420][132051] Decorrelating experience for 96 frames... [2024-11-07 12:40:40,436][132048] Decorrelating experience for 64 frames... [2024-11-07 12:40:40,540][132058] Decorrelating experience for 96 frames... [2024-11-07 12:40:40,637][132049] Decorrelating experience for 64 frames... [2024-11-07 12:40:40,855][132052] Decorrelating experience for 64 frames... [2024-11-07 12:40:41,058][132055] Decorrelating experience for 0 frames... [2024-11-07 12:40:41,163][132053] Decorrelating experience for 64 frames... [2024-11-07 12:40:41,270][132048] Decorrelating experience for 96 frames... [2024-11-07 12:40:41,699][132052] Decorrelating experience for 96 frames... [2024-11-07 12:40:41,819][132049] Decorrelating experience for 96 frames... [2024-11-07 12:40:41,992][132055] Decorrelating experience for 32 frames... [2024-11-07 12:40:42,383][132053] Decorrelating experience for 96 frames... [2024-11-07 12:40:42,749][132055] Decorrelating experience for 64 frames... [2024-11-07 12:40:42,967][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2502656. Throughput: 0: nan. Samples: 104. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 12:40:42,972][129156] Avg episode reward: [(0, '1.194')] [2024-11-07 12:40:43,443][132055] Decorrelating experience for 96 frames... [2024-11-07 12:40:44,214][132031] Signal inference workers to stop experience collection... [2024-11-07 12:40:44,231][132047] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 12:40:46,259][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... [2024-11-07 12:40:46,260][132031] Stopping Batcher_0... [2024-11-07 12:40:46,261][132031] Loop batcher_evt_loop terminating... [2024-11-07 12:40:46,261][129156] Runner profile tree view: main_loop: 14.3520 [2024-11-07 12:40:46,262][129156] Collected {0: 2502656}, FPS: 0.0 [2024-11-07 12:40:46,277][132047] Weights refcount: 2 0 [2024-11-07 12:40:46,280][132047] Stopping InferenceWorker_p0-w0... [2024-11-07 12:40:46,280][132047] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 12:40:46,344][132052] Stopping RolloutWorker_w4... [2024-11-07 12:40:46,344][132052] Loop rollout_proc4_evt_loop terminating... [2024-11-07 12:40:46,411][132048] Stopping RolloutWorker_w0... [2024-11-07 12:40:46,411][132048] Loop rollout_proc0_evt_loop terminating... [2024-11-07 12:40:46,419][132055] Stopping RolloutWorker_w6... [2024-11-07 12:40:46,420][132055] Loop rollout_proc6_evt_loop terminating... [2024-11-07 12:40:46,445][132053] Stopping RolloutWorker_w5... [2024-11-07 12:40:46,446][132053] Loop rollout_proc5_evt_loop terminating... [2024-11-07 12:40:46,461][132050] Stopping RolloutWorker_w3... [2024-11-07 12:40:46,462][132050] Loop rollout_proc3_evt_loop terminating... [2024-11-07 12:40:46,495][132058] Stopping RolloutWorker_w7... [2024-11-07 12:40:46,497][132058] Loop rollout_proc7_evt_loop terminating... 
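The periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entries report frame throughput over trailing windows; before enough samples exist the windows print nan, which is why every fresh run starts with "Fps is (10 sec: nan, ...)". A sketch of windowed FPS from (timestamp, total_frames) samples; the exact bookkeeping is an assumption:

```python
# Sketch of the throughput report: frames/second averaged over trailing
# windows of recent (timestamp, total_frames) samples.
import time
from collections import deque

samples: deque[tuple[float, int]] = deque(maxlen=600)

def record(total_frames: int) -> None:
    samples.append((time.time(), total_frames))

def fps(window_sec: float) -> float:
    if len(samples) < 2:
        return float("nan")  # not enough data yet, as in fresh runs above
    now, frames_now = samples[-1]
    past = [(t, f) for t, f in samples if now - t <= window_sec]
    t0, f0 = past[0]
    return (frames_now - f0) / max(now - t0, 1e-9)

record(2502656)
time.sleep(0.1)
record(2502656 + 4096)
print(f"Fps is (10 sec: {fps(10):.1f}, 60 sec: {fps(60):.1f}, "
      f"300 sec: {fps(300):.1f})")
```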
[2024-11-07 12:40:46,499][132051] Stopping RolloutWorker_w2... [2024-11-07 12:40:46,500][132051] Loop rollout_proc2_evt_loop terminating... [2024-11-07 12:40:46,560][132049] Stopping RolloutWorker_w1... [2024-11-07 12:40:46,561][132049] Loop rollout_proc1_evt_loop terminating... [2024-11-07 12:40:49,852][132031] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth... [2024-11-07 12:40:49,919][132031] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth [2024-11-07 12:40:49,924][132031] Stopping LearnerWorker_p0... [2024-11-07 12:40:49,925][132031] Loop learner_proc0_evt_loop terminating... [2024-11-07 12:42:00,143][129156] Environment doom_basic already registered, overwriting... [2024-11-07 12:42:00,146][129156] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 12:42:00,147][129156] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 12:42:00,150][129156] Environment doom_dm already registered, overwriting... [2024-11-07 12:42:00,151][129156] Environment doom_dwango5 already registered, overwriting... [2024-11-07 12:42:00,152][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 12:42:00,153][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 12:42:00,155][129156] Environment doom_my_way_home already registered, overwriting... [2024-11-07 12:42:00,156][129156] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 12:42:00,158][129156] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 12:42:00,160][129156] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 12:42:00,161][129156] Environment doom_health_gathering already registered, overwriting... [2024-11-07 12:42:00,164][129156] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 12:42:00,166][129156] Environment doom_battle already registered, overwriting... [2024-11-07 12:42:00,167][129156] Environment doom_battle2 already registered, overwriting... [2024-11-07 12:42:00,169][129156] Environment doom_duel_bots already registered, overwriting... [2024-11-07 12:42:00,170][129156] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 12:42:00,172][129156] Environment doom_duel already registered, overwriting... [2024-11-07 12:42:00,176][129156] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 12:42:00,177][129156] Environment doom_benchmark already registered, overwriting... [2024-11-07 12:42:00,178][129156] register_encoder_factory: [2024-11-07 13:15:32,891][07338] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
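Across these restarts the learner reports, e.g., "Loaded experiment state at self.train_step=613, self.env_steps=2510848": the checkpoint carries the training counters alongside the weights, so the frame total resumes where the previous run stopped (2494464, then 2502656, then 2510848 above). A hedged sketch of such a round-trip; the field names are assumptions, not Sample Factory's exact checkpoint format:

```python
# Hedged sketch of a learner checkpoint round-trip that carries the
# training counters with the weights (field names assumed).
import torch

def save_checkpoint(path, model, optimizer, train_step, env_steps):
    torch.save({"train_step": train_step,   # e.g. 613
                "env_steps": env_steps,     # e.g. 2510848
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(path, model, optimizer):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["train_step"], state["env_steps"]
```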
[2024-11-07 13:15:32,915][07338] Rollout worker 0 uses device cpu [2024-11-07 13:15:32,917][07338] Rollout worker 1 uses device cpu [2024-11-07 13:15:32,919][07338] Rollout worker 2 uses device cpu [2024-11-07 13:15:32,920][07338] Rollout worker 3 uses device cpu [2024-11-07 13:15:32,922][07338] Rollout worker 4 uses device cpu [2024-11-07 13:15:32,924][07338] Rollout worker 5 uses device cpu [2024-11-07 13:15:32,926][07338] Rollout worker 6 uses device cpu [2024-11-07 13:15:32,928][07338] Rollout worker 7 uses device cpu [2024-11-07 13:15:33,385][07338] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:15:33,386][07338] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 13:15:33,425][07338] Starting all processes... [2024-11-07 13:15:33,428][07338] Starting process learner_proc0 [2024-11-07 13:15:33,598][07338] Starting all processes... [2024-11-07 13:15:33,669][07338] Starting process inference_proc0-0 [2024-11-07 13:15:33,670][07338] Starting process rollout_proc0 [2024-11-07 13:15:33,671][07338] Starting process rollout_proc1 [2024-11-07 13:15:33,672][07338] Starting process rollout_proc2 [2024-11-07 13:15:33,672][07338] Starting process rollout_proc3 [2024-11-07 13:15:33,673][07338] Starting process rollout_proc4 [2024-11-07 13:15:33,673][07338] Starting process rollout_proc5 [2024-11-07 13:15:33,679][07338] Starting process rollout_proc6 [2024-11-07 13:15:33,680][07338] Starting process rollout_proc7 [2024-11-07 13:15:43,393][07455] Worker 4 uses CPU cores [4] [2024-11-07 13:15:43,672][07452] Worker 1 uses CPU cores [1] [2024-11-07 13:15:44,020][07457] Worker 6 uses CPU cores [6] [2024-11-07 13:15:44,077][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:15:44,078][07437] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 13:15:44,428][07451] Worker 0 uses CPU cores [0] [2024-11-07 13:15:44,461][07453] Worker 2 uses CPU cores [2] [2024-11-07 13:15:44,493][07437] Num visible devices: 1 [2024-11-07 13:15:44,495][07454] Worker 3 uses CPU cores [3] [2024-11-07 13:15:44,501][07456] Worker 5 uses CPU cores [5] [2024-11-07 13:15:44,510][07437] Starting seed is not provided [2024-11-07 13:15:44,510][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:15:44,510][07437] Initializing actor-critic model on device cuda:0 [2024-11-07 13:15:44,512][07437] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 13:15:44,539][07437] RunningMeanStd input shape: (1,) [2024-11-07 13:15:44,583][07437] ConvEncoder: input_channels=3 [2024-11-07 13:15:44,637][07450] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:15:44,637][07450] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 13:15:44,794][07450] Num visible devices: 1 [2024-11-07 13:15:44,972][07458] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 13:15:46,129][07437] Conv encoder output size: 512 [2024-11-07 13:15:46,130][07437] Policy head output size: 512 [2024-11-07 13:15:46,689][07437] Created Actor Critic model with architecture: [2024-11-07 13:15:46,690][07437] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl 
(conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 13:15:49,286][07437] Using optimizer [2024-11-07 13:15:53,386][07338] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 13:15:53,392][07338] Heartbeat connected on RolloutWorker_w0 [2024-11-07 13:15:53,396][07338] Heartbeat connected on RolloutWorker_w1 [2024-11-07 13:15:53,399][07338] Heartbeat connected on RolloutWorker_w2 [2024-11-07 13:15:53,405][07338] Heartbeat connected on RolloutWorker_w3 [2024-11-07 13:15:53,410][07338] Heartbeat connected on RolloutWorker_w4 [2024-11-07 13:15:53,417][07338] Heartbeat connected on RolloutWorker_w5 [2024-11-07 13:15:53,420][07338] Heartbeat connected on RolloutWorker_w6 [2024-11-07 13:15:53,425][07338] Heartbeat connected on RolloutWorker_w7 [2024-11-07 13:15:54,978][07338] Heartbeat connected on Batcher_0 [2024-11-07 13:15:58,478][07437] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth... [2024-11-07 13:15:59,119][07437] Loading model from checkpoint [2024-11-07 13:15:59,121][07437] Loaded experiment state at self.train_step=613, self.env_steps=2510848 [2024-11-07 13:15:59,183][07437] Initialized policy 0 weights for model version 613 [2024-11-07 13:15:59,194][07437] LearnerWorker_p0 finished initialization! [2024-11-07 13:15:59,194][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:15:59,195][07338] Heartbeat connected on LearnerWorker_p0 [2024-11-07 13:15:59,685][07450] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 13:15:59,689][07450] RunningMeanStd input shape: (1,) [2024-11-07 13:15:59,750][07450] ConvEncoder: input_channels=3 [2024-11-07 13:16:00,208][07450] Conv encoder output size: 512 [2024-11-07 13:16:00,209][07450] Policy head output size: 512 [2024-11-07 13:16:00,283][07338] Inference worker 0-0 is ready! [2024-11-07 13:16:00,285][07338] All inference workers are ready! Signal rollout workers to start! [2024-11-07 13:16:00,894][07456] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:00,900][07454] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:00,941][07452] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:00,964][07453] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:00,985][07451] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:00,993][07457] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:01,056][07455] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:01,412][07458] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:16:02,891][07338] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2510848. 
Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:05,134][07456] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,135][07453] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,134][07454] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,135][07455] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,136][07452] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,144][07457] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,134][07458] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,674][07451] Decorrelating experience for 0 frames... [2024-11-07 13:16:05,848][07453] Decorrelating experience for 32 frames... [2024-11-07 13:16:05,848][07455] Decorrelating experience for 32 frames... [2024-11-07 13:16:05,876][07452] Decorrelating experience for 32 frames... [2024-11-07 13:16:05,887][07456] Decorrelating experience for 32 frames... [2024-11-07 13:16:05,897][07454] Decorrelating experience for 32 frames... [2024-11-07 13:16:06,245][07457] Decorrelating experience for 32 frames... [2024-11-07 13:16:06,261][07451] Decorrelating experience for 32 frames... [2024-11-07 13:16:06,619][07453] Decorrelating experience for 64 frames... [2024-11-07 13:16:06,624][07455] Decorrelating experience for 64 frames... [2024-11-07 13:16:06,670][07452] Decorrelating experience for 64 frames... [2024-11-07 13:16:06,834][07456] Decorrelating experience for 64 frames... [2024-11-07 13:16:07,019][07457] Decorrelating experience for 64 frames... [2024-11-07 13:16:07,115][07454] Decorrelating experience for 64 frames... [2024-11-07 13:16:07,267][07455] Decorrelating experience for 96 frames... [2024-11-07 13:16:07,786][07458] Decorrelating experience for 32 frames... [2024-11-07 13:16:07,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:08,164][07453] Decorrelating experience for 96 frames... [2024-11-07 13:16:08,292][07454] Decorrelating experience for 96 frames... [2024-11-07 13:16:08,644][07457] Decorrelating experience for 96 frames... [2024-11-07 13:16:08,976][07452] Decorrelating experience for 96 frames... [2024-11-07 13:16:08,978][07458] Decorrelating experience for 64 frames... [2024-11-07 13:16:09,105][07451] Decorrelating experience for 64 frames... [2024-11-07 13:16:09,656][07458] Decorrelating experience for 96 frames... [2024-11-07 13:16:09,688][07456] Decorrelating experience for 96 frames... [2024-11-07 13:16:09,713][07451] Decorrelating experience for 96 frames... [2024-11-07 13:16:12,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:17,891][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 5.5. Samples: 82. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:17,893][07338] Avg episode reward: [(0, '0.590')] [2024-11-07 13:16:18,994][07437] Signal inference workers to stop experience collection... [2024-11-07 13:16:19,022][07450] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 13:16:22,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 105.1. Samples: 2102. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:22,894][07338] Avg episode reward: [(0, '1.991')] [2024-11-07 13:16:27,891][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 84.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:27,894][07338] Avg episode reward: [(0, '1.991')] [2024-11-07 13:16:32,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 70.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:32,894][07338] Avg episode reward: [(0, '1.991')] [2024-11-07 13:16:37,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 60.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:16:37,893][07338] Avg episode reward: [(0, '1.991')] [2024-11-07 13:16:42,510][07437] Signal inference workers to resume experience collection... [2024-11-07 13:16:42,512][07450] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 13:16:42,892][07338] Fps is (10 sec: 409.6, 60 sec: 102.4, 300 sec: 102.4). Total num frames: 2514944. Throughput: 0: 52.5. Samples: 2102. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-11-07 13:16:42,895][07338] Avg episode reward: [(0, '1.991')] [2024-11-07 13:16:47,891][07338] Fps is (10 sec: 3276.9, 60 sec: 728.2, 300 sec: 728.2). Total num frames: 2543616. Throughput: 0: 113.3. Samples: 5098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 13:16:47,894][07338] Avg episode reward: [(0, '3.847')] [2024-11-07 13:16:48,812][07450] Updated weights for policy 0, policy_version 623 (0.0047) [2024-11-07 13:16:52,892][07338] Fps is (10 sec: 6144.1, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 2576384. Throughput: 0: 341.2. Samples: 15356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 13:16:52,970][07338] Avg episode reward: [(0, '4.473')] [2024-11-07 13:16:55,293][07450] Updated weights for policy 0, policy_version 633 (0.0034) [2024-11-07 13:16:57,891][07338] Fps is (10 sec: 6553.6, 60 sec: 1787.3, 300 sec: 1787.3). Total num frames: 2609152. Throughput: 0: 539.4. Samples: 24272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 13:16:57,895][07338] Avg episode reward: [(0, '4.368')] [2024-11-07 13:17:01,853][07450] Updated weights for policy 0, policy_version 643 (0.0027) [2024-11-07 13:17:03,537][07338] Fps is (10 sec: 5771.7, 60 sec: 2093.8, 300 sec: 2093.8). Total num frames: 2637824. Throughput: 0: 638.7. Samples: 29236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 13:17:03,538][07338] Avg episode reward: [(0, '4.472')] [2024-11-07 13:17:03,725][07338] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 7338], exiting... [2024-11-07 13:17:03,727][07338] Runner profile tree view: main_loop: 90.3018 [2024-11-07 13:17:03,728][07338] Collected {0: 2637824}, FPS: 1406.1 [2024-11-07 13:17:03,760][07437] Stopping Batcher_0... [2024-11-07 13:17:03,762][07437] Loop batcher_evt_loop terminating... [2024-11-07 13:17:03,824][07437] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth... [2024-11-07 13:17:04,574][07458] Stopping RolloutWorker_w7... [2024-11-07 13:17:04,575][07458] Loop rollout_proc7_evt_loop terminating... [2024-11-07 13:17:04,588][07457] Stopping RolloutWorker_w6... [2024-11-07 13:17:04,588][07455] Stopping RolloutWorker_w4... 
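Once collection resumes, the inference worker logs lines like "Updated weights for policy 0, policy_version 633 (0.0034)": the learner publishes new weights under a monotonically increasing policy_version, the inference worker swaps them in when it sees a newer version, and the number in parentheses is plausibly the swap time in seconds (max_policy_lag=1000 in the config bounds how stale sampled experience may get). A toy sketch of that version-checked refresh; the shared-store mechanics are assumptions:

```python
# Toy sketch of version-checked weight refresh between learner and
# inference worker, matching "Updated weights for policy 0, policy_version N".
import time

class PolicyStore:
    def __init__(self):
        self.version, self.weights = 0, {}

    def publish(self, weights: dict) -> None:  # learner side
        self.weights, self.version = dict(weights), self.version + 1

class InferenceWorker:
    def __init__(self, store: PolicyStore):
        self.store, self.local_version, self.weights = store, -1, {}

    def maybe_update_weights(self) -> None:
        if self.store.version > self.local_version:
            t0 = time.time()
            self.weights = dict(self.store.weights)
            self.local_version = self.store.version
            print(f"Updated weights for policy 0, "
                  f"policy_version {self.local_version} ({time.time() - t0:.4f})")

store = PolicyStore()
worker = InferenceWorker(store)
store.publish({"core.weight": [0.0]})
worker.maybe_update_weights()
```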
[2024-11-07 13:17:04,589][07455] Loop rollout_proc4_evt_loop terminating... [2024-11-07 13:17:04,589][07457] Loop rollout_proc6_evt_loop terminating... [2024-11-07 13:17:04,589][07452] Stopping RolloutWorker_w1... [2024-11-07 13:17:04,590][07456] Stopping RolloutWorker_w5... [2024-11-07 13:17:04,590][07452] Loop rollout_proc1_evt_loop terminating... [2024-11-07 13:17:04,590][07456] Loop rollout_proc5_evt_loop terminating... [2024-11-07 13:17:04,590][07451] Stopping RolloutWorker_w0... [2024-11-07 13:17:04,591][07451] Loop rollout_proc0_evt_loop terminating... [2024-11-07 13:17:04,600][07453] Stopping RolloutWorker_w2... [2024-11-07 13:17:04,600][07454] Stopping RolloutWorker_w3... [2024-11-07 13:17:04,601][07454] Loop rollout_proc3_evt_loop terminating... [2024-11-07 13:17:04,601][07453] Loop rollout_proc2_evt_loop terminating... [2024-11-07 13:17:04,602][07437] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth [2024-11-07 13:17:04,603][07437] Stopping LearnerWorker_p0... [2024-11-07 13:17:04,604][07437] Loop learner_proc0_evt_loop terminating... [2024-11-07 13:17:04,773][07450] Weights refcount: 2 0 [2024-11-07 13:17:04,899][07450] Stopping InferenceWorker_p0-w0... [2024-11-07 13:17:04,900][07450] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 13:18:34,922][08210] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... [2024-11-07 13:18:34,924][08210] Rollout worker 0 uses device cpu [2024-11-07 13:18:34,925][08210] Rollout worker 1 uses device cpu [2024-11-07 13:18:34,925][08210] Rollout worker 2 uses device cpu [2024-11-07 13:18:34,927][08210] Rollout worker 3 uses device cpu [2024-11-07 13:18:34,929][08210] Rollout worker 4 uses device cpu [2024-11-07 13:18:34,929][08210] Rollout worker 5 uses device cpu [2024-11-07 13:18:34,931][08210] Rollout worker 6 uses device cpu [2024-11-07 13:18:34,933][08210] Rollout worker 7 uses device cpu [2024-11-07 13:18:35,019][08210] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:18:35,021][08210] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 13:18:35,053][08210] Starting all processes... [2024-11-07 13:18:35,054][08210] Starting process learner_proc0 [2024-11-07 13:18:35,234][08210] Starting all processes... 
[2024-11-07 13:18:35,283][08210] Starting process inference_proc0-0 [2024-11-07 13:18:35,284][08210] Starting process rollout_proc0 [2024-11-07 13:18:35,284][08210] Starting process rollout_proc1 [2024-11-07 13:18:35,285][08210] Starting process rollout_proc2 [2024-11-07 13:18:35,285][08210] Starting process rollout_proc3 [2024-11-07 13:18:35,289][08210] Starting process rollout_proc4 [2024-11-07 13:18:35,294][08210] Starting process rollout_proc5 [2024-11-07 13:18:35,295][08210] Starting process rollout_proc6 [2024-11-07 13:18:35,300][08210] Starting process rollout_proc7 [2024-11-07 13:18:40,907][08464] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:18:40,907][08464] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 13:18:40,908][08467] Worker 2 uses CPU cores [2] [2024-11-07 13:18:40,909][08465] Worker 0 uses CPU cores [0] [2024-11-07 13:18:40,912][08471] Worker 5 uses CPU cores [5] [2024-11-07 13:18:41,023][08464] Num visible devices: 1 [2024-11-07 13:18:41,259][08472] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 13:18:41,407][08468] Worker 3 uses CPU cores [3] [2024-11-07 13:18:41,417][08466] Worker 1 uses CPU cores [1] [2024-11-07 13:18:41,471][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:18:41,471][08451] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 13:18:41,480][08470] Worker 6 uses CPU cores [6] [2024-11-07 13:18:41,496][08451] Num visible devices: 1 [2024-11-07 13:18:41,508][08451] Starting seed is not provided [2024-11-07 13:18:41,509][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:18:41,509][08451] Initializing actor-critic model on device cuda:0 [2024-11-07 13:18:41,510][08451] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 13:18:41,511][08451] RunningMeanStd input shape: (1,) [2024-11-07 13:18:41,529][08451] ConvEncoder: input_channels=3 [2024-11-07 13:18:41,562][08469] Worker 4 uses CPU cores [4] [2024-11-07 13:18:43,335][08451] Conv encoder output size: 512 [2024-11-07 13:18:43,335][08451] Policy head output size: 512 [2024-11-07 13:18:43,699][08451] Created Actor Critic model with architecture: [2024-11-07 13:18:43,699][08451] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 13:18:45,204][08451] Using optimizer [2024-11-07 
13:18:53,245][08451] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth... [2024-11-07 13:18:53,333][08451] Loading model from checkpoint [2024-11-07 13:18:53,336][08451] Loaded experiment state at self.train_step=645, self.env_steps=2641920 [2024-11-07 13:18:53,337][08451] Initialized policy 0 weights for model version 645 [2024-11-07 13:18:53,346][08451] LearnerWorker_p0 finished initialization! [2024-11-07 13:18:53,347][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 13:18:53,607][08464] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 13:18:53,609][08464] RunningMeanStd input shape: (1,) [2024-11-07 13:18:53,624][08464] ConvEncoder: input_channels=3 [2024-11-07 13:18:53,742][08464] Conv encoder output size: 512 [2024-11-07 13:18:53,742][08464] Policy head output size: 512 [2024-11-07 13:18:53,788][08210] Inference worker 0-0 is ready! [2024-11-07 13:18:53,790][08210] All inference workers are ready! Signal rollout workers to start! [2024-11-07 13:18:53,863][08469] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,871][08465] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,870][08467] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,873][08468] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,887][08466] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,893][08470] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,939][08471] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:53,942][08472] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 13:18:54,932][08210] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2641920. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 13:18:55,011][08210] Heartbeat connected on Batcher_0 [2024-11-07 13:18:55,015][08210] Heartbeat connected on LearnerWorker_p0 [2024-11-07 13:18:55,070][08210] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 13:18:56,575][08468] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,576][08471] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,576][08470] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,581][08467] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,605][08469] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,950][08468] Decorrelating experience for 32 frames... [2024-11-07 13:18:56,958][08472] Decorrelating experience for 0 frames... [2024-11-07 13:18:56,975][08467] Decorrelating experience for 32 frames... [2024-11-07 13:18:56,977][08466] Decorrelating experience for 0 frames... [2024-11-07 13:18:57,011][08465] Decorrelating experience for 0 frames... [2024-11-07 13:18:57,366][08469] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,369][08471] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,373][08472] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,440][08465] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,500][08467] Decorrelating experience for 64 frames... [2024-11-07 13:18:57,526][08470] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,791][08466] Decorrelating experience for 32 frames... [2024-11-07 13:18:57,879][08472] Decorrelating experience for 64 frames... 
[2024-11-07 13:18:56,575][08468] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,576][08471] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,576][08470] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,581][08467] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,605][08469] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,950][08468] Decorrelating experience for 32 frames...
[2024-11-07 13:18:56,958][08472] Decorrelating experience for 0 frames...
[2024-11-07 13:18:56,975][08467] Decorrelating experience for 32 frames...
[2024-11-07 13:18:56,977][08466] Decorrelating experience for 0 frames...
[2024-11-07 13:18:57,011][08465] Decorrelating experience for 0 frames...
[2024-11-07 13:18:57,366][08469] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,369][08471] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,373][08472] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,440][08465] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,500][08467] Decorrelating experience for 64 frames...
[2024-11-07 13:18:57,526][08470] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,791][08466] Decorrelating experience for 32 frames...
[2024-11-07 13:18:57,879][08472] Decorrelating experience for 64 frames...
[2024-11-07 13:18:57,883][08471] Decorrelating experience for 64 frames...
[2024-11-07 13:18:57,938][08467] Decorrelating experience for 96 frames...
[2024-11-07 13:18:58,010][08210] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:18:58,163][08469] Decorrelating experience for 64 frames...
[2024-11-07 13:18:58,383][08470] Decorrelating experience for 64 frames...
[2024-11-07 13:18:58,385][08471] Decorrelating experience for 96 frames...
[2024-11-07 13:18:58,416][08466] Decorrelating experience for 64 frames...
[2024-11-07 13:18:58,472][08472] Decorrelating experience for 96 frames...
[2024-11-07 13:18:58,517][08210] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:18:58,602][08210] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:18:58,641][08468] Decorrelating experience for 64 frames...
[2024-11-07 13:18:58,688][08469] Decorrelating experience for 96 frames...
[2024-11-07 13:18:58,759][08210] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:18:58,861][08470] Decorrelating experience for 96 frames...
[2024-11-07 13:18:58,922][08210] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:18:59,109][08466] Decorrelating experience for 96 frames...
[2024-11-07 13:18:59,113][08468] Decorrelating experience for 96 frames...
[2024-11-07 13:18:59,114][08465] Decorrelating experience for 64 frames...
[2024-11-07 13:18:59,233][08210] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:18:59,237][08210] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:18:59,731][08465] Decorrelating experience for 96 frames...
[2024-11-07 13:18:59,816][08210] Heartbeat connected on RolloutWorker_w0
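The "Decorrelating experience for N frames" lines above show each rollout worker warming its environments up by a different multiple of 32 frames (0, 32, 64, 96) before collection proper begins, so the workers' episodes do not stay phase-aligned and the learner's batches are not dominated by near-identical frames. A toy illustration of the idea, not Sample Factory's actual implementation; the gymnasium-style step/reset API is an assumption:

    def decorrelate(env, worker_index, frames_per_stage=32, num_stages=4):
        # Advance this worker's env by a worker-dependent number of random steps.
        env.reset()
        for _ in range((worker_index % num_stages) * frames_per_stage):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()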
[2024-11-07 13:18:59,931][08210] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2641920. Throughput: 0: 138.0. Samples: 690. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:18:59,933][08210] Avg episode reward: [(0, '0.640')]
[2024-11-07 13:19:01,468][08451] Signal inference workers to stop experience collection...
[2024-11-07 13:19:01,481][08464] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:19:04,931][08210] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2641920. Throughput: 0: 286.0. Samples: 2860. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:19:04,933][08210] Avg episode reward: [(0, '1.968')]
[2024-11-07 13:19:09,603][08210] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 8210], exiting...
[2024-11-07 13:19:09,606][08451] Stopping Batcher_0...
[2024-11-07 13:19:09,607][08451] Loop batcher_evt_loop terminating...
[2024-11-07 13:19:09,606][08210] Runner profile tree view:
main_loop: 34.5527
[2024-11-07 13:19:09,611][08210] Collected {0: 2641920}, FPS: 0.0
[2024-11-07 13:19:09,650][08464] Weights refcount: 2 0
[2024-11-07 13:19:09,653][08464] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:19:09,653][08464] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:19:09,747][08466] Stopping RolloutWorker_w1...
[2024-11-07 13:19:09,747][08466] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:19:09,749][08471] Stopping RolloutWorker_w5...
[2024-11-07 13:19:09,750][08471] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:19:09,800][08469] Stopping RolloutWorker_w4...
[2024-11-07 13:19:09,801][08469] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:19:09,837][08470] Stopping RolloutWorker_w6...
[2024-11-07 13:19:09,842][08470] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:19:09,862][08472] Stopping RolloutWorker_w7...
[2024-11-07 13:19:09,862][08472] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:19:09,869][08467] Stopping RolloutWorker_w2...
[2024-11-07 13:19:09,870][08467] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:19:09,900][08468] Stopping RolloutWorker_w3...
[2024-11-07 13:19:09,901][08468] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:19:09,962][08465] Stopping RolloutWorker_w0...
[2024-11-07 13:19:09,963][08465] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 13:19:13,550][08451] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth...
[2024-11-07 13:19:13,648][08451] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth
[2024-11-07 13:19:13,653][08451] Stopping LearnerWorker_p0...
[2024-11-07 13:19:13,655][08451] Loop learner_proc0_evt_loop terminating...
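Each shutdown pairs a "Saving checkpoint_..." with a "Removing checkpoint_..." of the oldest file: here checkpoint_000000647_2650112.pth comes in and checkpoint_000000613_2510848.pth goes out, a keep-the-most-recent rotation. A sketch of that policy under the same naming scheme (the keep count here is an assumption):

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir, keep=2):
        # Lexicographic order equals numeric order because train_step is zero-padded.
        checkpoints = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for stale in checkpoints[:-keep]:
            stale.unlink()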
[2024-11-07 13:23:18,807][09379] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 13:23:18,808][09379] Rollout worker 0 uses device cpu
[2024-11-07 13:23:18,809][09379] Rollout worker 1 uses device cpu
[2024-11-07 13:23:18,811][09379] Rollout worker 2 uses device cpu
[2024-11-07 13:23:18,812][09379] Rollout worker 3 uses device cpu
[2024-11-07 13:23:18,813][09379] Rollout worker 4 uses device cpu
[2024-11-07 13:23:18,814][09379] Rollout worker 5 uses device cpu
[2024-11-07 13:23:18,815][09379] Rollout worker 6 uses device cpu
[2024-11-07 13:23:18,816][09379] Rollout worker 7 uses device cpu
[2024-11-07 13:23:18,871][09379] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:23:18,872][09379] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 13:23:18,907][09379] Starting all processes...
[2024-11-07 13:23:18,908][09379] Starting process learner_proc0
[2024-11-07 13:23:19,009][09379] Starting all processes...
[2024-11-07 13:23:19,017][09379] Starting process inference_proc0-0
[2024-11-07 13:23:19,018][09379] Starting process rollout_proc0
[2024-11-07 13:23:19,019][09379] Starting process rollout_proc1
[2024-11-07 13:23:19,021][09379] Starting process rollout_proc2
[2024-11-07 13:23:19,022][09379] Starting process rollout_proc3
[2024-11-07 13:23:19,024][09379] Starting process rollout_proc4
[2024-11-07 13:23:19,027][09379] Starting process rollout_proc5
[2024-11-07 13:23:19,027][09379] Starting process rollout_proc6
[2024-11-07 13:23:19,028][09379] Starting process rollout_proc7
[2024-11-07 13:23:25,787][09667] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:23:25,788][09667] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 13:23:25,968][09689] Worker 2 uses CPU cores [2]
[2024-11-07 13:23:26,036][09692] Worker 5 uses CPU cores [5]
[2024-11-07 13:23:26,074][09667] Num visible devices: 1
[2024-11-07 13:23:26,179][09667] Starting seed is not provided
[2024-11-07 13:23:26,179][09667] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:23:26,183][09667] Initializing actor-critic model on device cuda:0
[2024-11-07 13:23:26,184][09667] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:23:26,185][09667] RunningMeanStd input shape: (1,)
[2024-11-07 13:23:26,220][09667] ConvEncoder: input_channels=3
[2024-11-07 13:23:26,424][09691] Worker 4 uses CPU cores [4]
[2024-11-07 13:23:26,608][09667] Conv encoder output size: 512
[2024-11-07 13:23:26,609][09667] Policy head output size: 512
[2024-11-07 13:23:26,646][09667] Created Actor Critic model with architecture:
[2024-11-07 13:23:26,646][09667] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 13:23:27,048][09688] Worker 1 uses CPU cores [1]
[2024-11-07 13:23:27,431][09680] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:23:27,432][09680] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 13:23:27,480][09680] Num visible devices: 1
[2024-11-07 13:23:27,534][09690] Worker 3 uses CPU cores [3]
[2024-11-07 13:23:27,633][09694] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 13:23:27,678][09693] Worker 6 uses CPU cores [6]
[2024-11-07 13:23:27,704][09667] Using optimizer
[2024-11-07 13:23:27,848][09687] Worker 0 uses CPU cores [0]
[2024-11-07 13:23:28,869][09667] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth...
[2024-11-07 13:23:28,945][09667] Loading model from checkpoint
[2024-11-07 13:23:28,947][09667] Loaded experiment state at self.train_step=647, self.env_steps=2650112
[2024-11-07 13:23:28,948][09667] Initialized policy 0 weights for model version 647
[2024-11-07 13:23:28,957][09667] LearnerWorker_p0 finished initialization!
[2024-11-07 13:23:28,957][09667] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:23:29,118][09680] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:23:29,119][09680] RunningMeanStd input shape: (1,)
[2024-11-07 13:23:29,131][09680] ConvEncoder: input_channels=3
[2024-11-07 13:23:29,243][09680] Conv encoder output size: 512
[2024-11-07 13:23:29,243][09680] Policy head output size: 512
[2024-11-07 13:23:29,288][09379] Inference worker 0-0 is ready!
[2024-11-07 13:23:29,290][09379] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 13:23:29,358][09691] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,363][09689] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,370][09688] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,377][09690] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,384][09692] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,387][09693] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,417][09687] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:23:29,419][09694] Doom resolution: 160x120, resize resolution: (128, 72)
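A sanity check worth recording here: at every checkpoint in this log, env_steps is exactly train_step * 4096, which is consistent with a fixed 4096-frame training batch per learner step:

    # env_steps == train_step * 4096 for every checkpoint seen in this log
    for train_step, env_steps in [(645, 2641920), (647, 2650112),
                                  (877, 3592192), (978, 4005888)]:
        assert env_steps == train_step * 4096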
[2024-11-07 13:23:29,902][09693] Decorrelating experience for 0 frames...
[2024-11-07 13:23:29,906][09688] Decorrelating experience for 0 frames...
[2024-11-07 13:23:29,907][09689] Decorrelating experience for 0 frames...
[2024-11-07 13:23:29,920][09691] Decorrelating experience for 0 frames...
[2024-11-07 13:23:30,219][09689] Decorrelating experience for 32 frames...
[2024-11-07 13:23:30,219][09692] Decorrelating experience for 0 frames...
[2024-11-07 13:23:30,268][09691] Decorrelating experience for 32 frames...
[2024-11-07 13:23:30,272][09690] Decorrelating experience for 0 frames...
[2024-11-07 13:23:30,558][09692] Decorrelating experience for 32 frames...
[2024-11-07 13:23:30,627][09690] Decorrelating experience for 32 frames...
[2024-11-07 13:23:30,668][09694] Decorrelating experience for 0 frames...
[2024-11-07 13:23:30,700][09693] Decorrelating experience for 32 frames...
[2024-11-07 13:23:30,735][09691] Decorrelating experience for 64 frames...
[2024-11-07 13:23:31,049][09687] Decorrelating experience for 0 frames...
[2024-11-07 13:23:31,224][09689] Decorrelating experience for 64 frames...
[2024-11-07 13:23:31,231][09692] Decorrelating experience for 64 frames...
[2024-11-07 13:23:31,256][09690] Decorrelating experience for 64 frames...
[2024-11-07 13:23:31,270][09688] Decorrelating experience for 32 frames...
[2024-11-07 13:23:31,347][09691] Decorrelating experience for 96 frames...
[2024-11-07 13:23:31,639][09694] Decorrelating experience for 32 frames...
[2024-11-07 13:23:31,693][09689] Decorrelating experience for 96 frames...
[2024-11-07 13:23:31,729][09690] Decorrelating experience for 96 frames...
[2024-11-07 13:23:31,788][09692] Decorrelating experience for 96 frames...
[2024-11-07 13:23:31,844][09693] Decorrelating experience for 64 frames...
[2024-11-07 13:23:31,889][09687] Decorrelating experience for 32 frames...
[2024-11-07 13:23:32,168][09694] Decorrelating experience for 64 frames...
[2024-11-07 13:23:32,323][09693] Decorrelating experience for 96 frames...
[2024-11-07 13:23:32,363][09688] Decorrelating experience for 64 frames...
[2024-11-07 13:23:32,842][09687] Decorrelating experience for 64 frames...
[2024-11-07 13:23:33,039][09694] Decorrelating experience for 96 frames...
[2024-11-07 13:23:33,344][09688] Decorrelating experience for 96 frames...
[2024-11-07 13:23:33,526][09687] Decorrelating experience for 96 frames...
[2024-11-07 13:23:33,808][09379] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2650112. Throughput: 0: nan. Samples: 108. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:23:33,810][09379] Avg episode reward: [(0, '1.508')]
[2024-11-07 13:23:34,954][09667] Signal inference workers to stop experience collection...
[2024-11-07 13:23:34,966][09680] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:23:38,807][09379] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2650112. Throughput: 0: 491.2. Samples: 2564. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:23:38,809][09379] Avg episode reward: [(0, '2.011')]
[2024-11-07 13:23:39,687][09379] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:23:39,692][09379] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:23:39,693][09379] Heartbeat connected on RolloutWorker_w0
[2024-11-07 13:23:39,695][09379] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:23:39,696][09379] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:23:39,699][09379] Heartbeat connected on Batcher_0
[2024-11-07 13:23:39,701][09379] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 13:23:39,704][09379] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:23:39,711][09379] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:23:39,713][09379] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:23:43,239][09667] Signal inference workers to resume experience collection...
[2024-11-07 13:23:43,240][09680] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 13:23:43,722][09379] Heartbeat connected on LearnerWorker_p0
[2024-11-07 13:23:43,807][09379] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2658304. Throughput: 0: 245.6. Samples: 2564. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-07 13:23:43,812][09379] Avg episode reward: [(0, '2.815')]
[2024-11-07 13:23:48,807][09379] Fps is (10 sec: 3686.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 2686976. Throughput: 0: 516.8. Samples: 7860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 13:23:48,810][09379] Avg episode reward: [(0, '3.854')]
[2024-11-07 13:23:49,227][09680] Updated weights for policy 0, policy_version 657 (0.0145)
[2024-11-07 13:23:53,808][09379] Fps is (10 sec: 5734.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2715648. Throughput: 0: 818.3. Samples: 16474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:23:53,810][09379] Avg episode reward: [(0, '4.550')]
[2024-11-07 13:23:56,301][09680] Updated weights for policy 0, policy_version 667 (0.0025)
[2024-11-07 13:23:58,807][09379] Fps is (10 sec: 5734.5, 60 sec: 3768.4, 300 sec: 3768.4). Total num frames: 2744320. Throughput: 0: 838.2. Samples: 21062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:23:58,809][09379] Avg episode reward: [(0, '4.253')]
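The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report throughput over three trailing windows of the (timestamp, total frames) history, which is why the first report of a run shows nan (only one sample yet) and the 60/300-second figures trail the 10-second one early on. A minimal sketch of that style of windowed estimate; the class is a hypothetical stand-in, not the runner's actual code:

    import time
    from collections import deque

    class FpsTracker:
        def __init__(self):
            self.history = deque()  # (timestamp, total_frames), oldest first

        def record(self, total_frames):
            self.history.append((time.time(), total_frames))

        def fps(self, window_sec):
            now, frames = self.history[-1]
            # Oldest sample still inside the window (fall back to the very first).
            t0, f0 = next(((t, f) for t, f in self.history if now - t <= window_sec),
                          self.history[0])
            return (frames - f0) / (now - t0) if now > t0 else float("nan")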
[2024-11-07 13:24:03,808][09379] Fps is (10 sec: 5324.7, 60 sec: 3959.5, 300 sec: 3959.5). Total num frames: 2768896. Throughput: 0: 961.1. Samples: 28942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 13:24:03,811][09379] Avg episode reward: [(0, '4.408')]
[2024-11-07 13:24:03,980][09680] Updated weights for policy 0, policy_version 677 (0.0052)
[2024-11-07 13:24:08,807][09379] Fps is (10 sec: 5324.7, 60 sec: 4213.0, 300 sec: 4213.0). Total num frames: 2797568. Throughput: 0: 1070.4. Samples: 37572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:24:08,809][09379] Avg episode reward: [(0, '4.407')]
[2024-11-07 13:24:13,168][09680] Updated weights for policy 0, policy_version 687 (0.0044)
[2024-11-07 13:24:13,808][09379] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 4095.9). Total num frames: 2813952. Throughput: 0: 1009.0. Samples: 40470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:24:13,826][09379] Avg episode reward: [(0, '4.455')]
[2024-11-07 13:24:18,813][09379] Fps is (10 sec: 4503.2, 60 sec: 4277.6, 300 sec: 4277.6). Total num frames: 2842624. Throughput: 0: 1037.6. Samples: 46806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:24:18,814][09379] Avg episode reward: [(0, '4.384')]
[2024-11-07 13:24:20,307][09680] Updated weights for policy 0, policy_version 697 (0.0037)
[2024-11-07 13:24:23,808][09379] Fps is (10 sec: 6144.6, 60 sec: 4505.6, 300 sec: 4505.6). Total num frames: 2875392. Throughput: 0: 1211.8. Samples: 57094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:24:23,809][09379] Avg episode reward: [(0, '4.468')]
[2024-11-07 13:24:26,144][09680] Updated weights for policy 0, policy_version 707 (0.0054)
[2024-11-07 13:24:28,808][09379] Fps is (10 sec: 6966.5, 60 sec: 4766.2, 300 sec: 4766.2). Total num frames: 2912256. Throughput: 0: 1328.8. Samples: 62362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:24:28,811][09379] Avg episode reward: [(0, '4.492')]
[2024-11-07 13:24:32,975][09680] Updated weights for policy 0, policy_version 717 (0.0042)
[2024-11-07 13:24:33,808][09379] Fps is (10 sec: 6144.0, 60 sec: 4778.7, 300 sec: 4778.7). Total num frames: 2936832. Throughput: 0: 1416.7. Samples: 71610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:24:33,810][09379] Avg episode reward: [(0, '4.530')]
[2024-11-07 13:24:38,807][09379] Fps is (10 sec: 5734.7, 60 sec: 5324.8, 300 sec: 4915.2). Total num frames: 2969600. Throughput: 0: 1418.6. Samples: 80310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 13:24:38,809][09379] Avg episode reward: [(0, '4.484')]
[2024-11-07 13:24:40,028][09680] Updated weights for policy 0, policy_version 727 (0.0044)
[2024-11-07 13:24:43,808][09379] Fps is (10 sec: 6143.8, 60 sec: 5666.1, 300 sec: 4973.7). Total num frames: 2998272. Throughput: 0: 1412.3. Samples: 84614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:24:43,810][09379] Avg episode reward: [(0, '4.236')]
[2024-11-07 13:24:47,449][09680] Updated weights for policy 0, policy_version 737 (0.0033)
[2024-11-07 13:24:48,807][09379] Fps is (10 sec: 5734.5, 60 sec: 5666.1, 300 sec: 5024.4). Total num frames: 3026944. Throughput: 0: 1418.0. Samples: 92750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:24:48,809][09379] Avg episode reward: [(0, '4.404')]
[2024-11-07 13:24:53,808][09379] Fps is (10 sec: 5324.9, 60 sec: 5597.9, 300 sec: 5017.6). Total num frames: 3051520. Throughput: 0: 1420.4. Samples: 101492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:24:53,809][09379] Avg episode reward: [(0, '4.505')]
[2024-11-07 13:24:55,139][09680] Updated weights for policy 0, policy_version 747 (0.0029)
[2024-11-07 13:24:58,810][09379] Fps is (10 sec: 4504.3, 60 sec: 5461.1, 300 sec: 4963.2). Total num frames: 3072000. Throughput: 0: 1418.6. Samples: 104310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:24:58,816][09379] Avg episode reward: [(0, '4.553')]
[2024-11-07 13:25:03,808][09379] Fps is (10 sec: 4096.0, 60 sec: 5393.1, 300 sec: 4915.2). Total num frames: 3092480. Throughput: 0: 1403.0. Samples: 109932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:03,810][09379] Avg episode reward: [(0, '4.461')]
[2024-11-07 13:25:05,380][09680] Updated weights for policy 0, policy_version 757 (0.0053)
[2024-11-07 13:25:08,808][09379] Fps is (10 sec: 4096.9, 60 sec: 5256.5, 300 sec: 4872.1). Total num frames: 3112960. Throughput: 0: 1321.0. Samples: 116540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:08,812][09379] Avg episode reward: [(0, '4.396')]
[2024-11-07 13:25:13,828][09379] Fps is (10 sec: 4496.5, 60 sec: 5391.3, 300 sec: 4873.2). Total num frames: 3137536. Throughput: 0: 1286.8. Samples: 120296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:13,830][09379] Avg episode reward: [(0, '4.375')]
[2024-11-07 13:25:13,866][09667] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth...
[2024-11-07 13:25:14,472][09667] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth
[2024-11-07 13:25:14,866][09680] Updated weights for policy 0, policy_version 767 (0.0033)
[2024-11-07 13:25:18,903][09379] Fps is (10 sec: 4868.8, 60 sec: 5316.8, 300 sec: 4871.8). Total num frames: 3162112. Throughput: 0: 1233.0. Samples: 127214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:18,908][09379] Avg episode reward: [(0, '4.390')]
[2024-11-07 13:25:22,340][09680] Updated weights for policy 0, policy_version 777 (0.0048)
[2024-11-07 13:25:23,808][09379] Fps is (10 sec: 5335.7, 60 sec: 5256.5, 300 sec: 4915.2). Total num frames: 3190784. Throughput: 0: 1221.7. Samples: 135288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:25:23,810][09379] Avg episode reward: [(0, '4.556')]
[2024-11-07 13:25:28,808][09379] Fps is (10 sec: 5789.6, 60 sec: 5120.0, 300 sec: 4950.8). Total num frames: 3219456. Throughput: 0: 1224.6. Samples: 139722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:25:28,810][09379] Avg episode reward: [(0, '4.370')]
[2024-11-07 13:25:29,662][09680] Updated weights for policy 0, policy_version 787 (0.0035)
[2024-11-07 13:25:33,808][09379] Fps is (10 sec: 5324.9, 60 sec: 5120.0, 300 sec: 4949.3). Total num frames: 3244032. Throughput: 0: 1216.3. Samples: 147484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:33,809][09379] Avg episode reward: [(0, '4.519')]
[2024-11-07 13:25:38,682][09680] Updated weights for policy 0, policy_version 797 (0.0058)
[2024-11-07 13:25:38,807][09379] Fps is (10 sec: 4505.8, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 3264512. Throughput: 0: 1161.2. Samples: 153746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:38,809][09379] Avg episode reward: [(0, '4.491')]
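"Policy #0 lag" measures how many weight versions behind the learner each collected sample was: the inference worker stamps samples with the policy_version it used (the "Updated weights for policy 0, policy_version N" lines), and the learner compares that stamp against its current version. That reading fits the log: lag is -1.0 before any weights circulate, and sits at min 0 / max 2 in steady state. A hypothetical illustration with made-up stamps:

    learner_version = 757
    sample_versions = [757, 756, 755, 757, 756]         # stamped at collection time
    lags = [learner_version - v for v in sample_versions]
    print(min(lags), sum(lags) / len(lags), max(lags))  # 0 0.8 2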
[2024-11-07 13:25:43,807][09379] Fps is (10 sec: 5324.9, 60 sec: 4983.5, 300 sec: 4978.2). Total num frames: 3297280. Throughput: 0: 1204.4. Samples: 158504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:43,811][09379] Avg episode reward: [(0, '4.404')]
[2024-11-07 13:25:44,859][09680] Updated weights for policy 0, policy_version 807 (0.0036)
[2024-11-07 13:25:48,807][09379] Fps is (10 sec: 6553.6, 60 sec: 5051.7, 300 sec: 5036.6). Total num frames: 3330048. Throughput: 0: 1305.8. Samples: 168692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:48,810][09379] Avg episode reward: [(0, '4.359')]
[2024-11-07 13:25:50,715][09680] Updated weights for policy 0, policy_version 817 (0.0028)
[2024-11-07 13:25:53,809][09379] Fps is (10 sec: 6143.3, 60 sec: 5119.9, 300 sec: 5061.4). Total num frames: 3358720. Throughput: 0: 1362.7. Samples: 177862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:25:53,812][09379] Avg episode reward: [(0, '4.424')]
[2024-11-07 13:25:57,499][09680] Updated weights for policy 0, policy_version 827 (0.0033)
[2024-11-07 13:25:58,808][09379] Fps is (10 sec: 6553.2, 60 sec: 5393.3, 300 sec: 5141.2). Total num frames: 3395584. Throughput: 0: 1392.9. Samples: 182950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:25:58,811][09379] Avg episode reward: [(0, '4.532')]
[2024-11-07 13:26:03,807][09379] Fps is (10 sec: 6144.7, 60 sec: 5461.4, 300 sec: 5133.7). Total num frames: 3420160. Throughput: 0: 1444.6. Samples: 192082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:26:03,810][09379] Avg episode reward: [(0, '4.664')]
[2024-11-07 13:26:04,512][09680] Updated weights for policy 0, policy_version 837 (0.0026)
[2024-11-07 13:26:08,807][09379] Fps is (10 sec: 5325.1, 60 sec: 5597.9, 300 sec: 5153.0). Total num frames: 3448832. Throughput: 0: 1446.3. Samples: 200370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 13:26:08,809][09379] Avg episode reward: [(0, '4.565')]
[2024-11-07 13:26:11,306][09680] Updated weights for policy 0, policy_version 847 (0.0035)
[2024-11-07 13:26:13,807][09379] Fps is (10 sec: 6144.0, 60 sec: 5736.4, 300 sec: 5196.8). Total num frames: 3481600. Throughput: 0: 1459.3. Samples: 205392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 13:26:13,809][09379] Avg episode reward: [(0, '4.378')]
[2024-11-07 13:26:18,016][09680] Updated weights for policy 0, policy_version 857 (0.0042)
[2024-11-07 13:26:18,810][09379] Fps is (10 sec: 6143.8, 60 sec: 5811.9, 300 sec: 5213.1). Total num frames: 3510272. Throughput: 0: 1491.9. Samples: 214618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 13:26:18,815][09379] Avg episode reward: [(0, '4.483')]
[2024-11-07 13:26:23,812][09379] Fps is (10 sec: 6550.9, 60 sec: 5938.8, 300 sec: 5276.5). Total num frames: 3547136. Throughput: 0: 1572.7. Samples: 224524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:26:23,816][09379] Avg episode reward: [(0, '4.256')]
[2024-11-07 13:26:25,085][09680] Updated weights for policy 0, policy_version 867 (0.0039)
[2024-11-07 13:26:28,807][09379] Fps is (10 sec: 5734.6, 60 sec: 5802.7, 300 sec: 5242.9). Total num frames: 3567616. Throughput: 0: 1534.2. Samples: 227542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:26:28,812][09379] Avg episode reward: [(0, '4.291')]
[2024-11-07 13:26:32,256][09379] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 9379], exiting...
[2024-11-07 13:26:32,269][09379] Runner profile tree view:
main_loop: 193.3628
[2024-11-07 13:26:32,284][09667] Stopping Batcher_0...
[2024-11-07 13:26:32,285][09667] Loop batcher_evt_loop terminating...
[2024-11-07 13:26:32,272][09379] Collected {0: 3588096}, FPS: 4850.9
[2024-11-07 13:26:32,366][09680] Weights refcount: 2 0
[2024-11-07 13:26:32,371][09680] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:26:32,372][09680] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:26:32,392][09667] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth...
[2024-11-07 13:26:32,520][09667] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth
[2024-11-07 13:26:32,521][09690] Stopping RolloutWorker_w3...
[2024-11-07 13:26:32,521][09690] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:26:32,525][09667] Stopping LearnerWorker_p0...
[2024-11-07 13:26:32,526][09667] Loop learner_proc0_evt_loop terminating...
[2024-11-07 13:26:32,530][09693] Stopping RolloutWorker_w6...
[2024-11-07 13:26:32,531][09693] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:26:32,534][09692] Stopping RolloutWorker_w5...
[2024-11-07 13:26:32,535][09692] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:26:32,578][09688] Stopping RolloutWorker_w1...
[2024-11-07 13:26:32,580][09688] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:26:32,586][09691] Stopping RolloutWorker_w4...
[2024-11-07 13:26:32,587][09691] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:26:32,714][09694] Stopping RolloutWorker_w7...
[2024-11-07 13:26:32,724][09694] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:26:32,772][09689] Stopping RolloutWorker_w2...
[2024-11-07 13:26:32,773][09689] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:26:32,823][09687] Stopping RolloutWorker_w0...
[2024-11-07 13:26:32,884][09687] Loop rollout_proc0_evt_loop terminating...
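The final "Collected" line agrees with the profile just above it: this run advanced from 2,650,112 to 3,588,096 frames over a 193.3628 s main loop, and the quotient reproduces the reported rate:

    frames = 3588096 - 2650112   # frames collected during this run
    print(frames / 193.3628)     # ~4850.9, matching "FPS: 4850.9"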
[2024-11-07 13:28:07,838][10732] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 13:28:07,840][10732] Rollout worker 0 uses device cpu
[2024-11-07 13:28:07,842][10732] Rollout worker 1 uses device cpu
[2024-11-07 13:28:07,842][10732] Rollout worker 2 uses device cpu
[2024-11-07 13:28:07,845][10732] Rollout worker 3 uses device cpu
[2024-11-07 13:28:07,848][10732] Rollout worker 4 uses device cpu
[2024-11-07 13:28:07,852][10732] Rollout worker 5 uses device cpu
[2024-11-07 13:28:07,853][10732] Rollout worker 6 uses device cpu
[2024-11-07 13:28:07,854][10732] Rollout worker 7 uses device cpu
[2024-11-07 13:28:07,916][10732] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:28:07,918][10732] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 13:28:07,963][10732] Starting all processes...
[2024-11-07 13:28:07,964][10732] Starting process learner_proc0
[2024-11-07 13:28:08,065][10732] Starting all processes...
[2024-11-07 13:28:08,074][10732] Starting process inference_proc0-0
[2024-11-07 13:28:08,074][10732] Starting process rollout_proc0
[2024-11-07 13:28:08,076][10732] Starting process rollout_proc1
[2024-11-07 13:28:08,077][10732] Starting process rollout_proc2
[2024-11-07 13:28:08,080][10732] Starting process rollout_proc3
[2024-11-07 13:28:08,183][10732] Starting process rollout_proc4
[2024-11-07 13:28:08,186][10732] Starting process rollout_proc5
[2024-11-07 13:28:08,187][10732] Starting process rollout_proc6
[2024-11-07 13:28:08,194][10732] Starting process rollout_proc7
[2024-11-07 13:28:13,465][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:28:13,465][11017] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 13:28:13,666][11017] Num visible devices: 1
[2024-11-07 13:28:13,735][11017] Starting seed is not provided
[2024-11-07 13:28:13,736][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:28:13,736][11017] Initializing actor-critic model on device cuda:0
[2024-11-07 13:28:13,737][11017] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:28:13,739][11017] RunningMeanStd input shape: (1,)
[2024-11-07 13:28:13,944][11017] ConvEncoder: input_channels=3
[2024-11-07 13:28:14,059][11033] Worker 1 uses CPU cores [1]
[2024-11-07 13:28:14,140][11030] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:28:14,140][11030] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 13:28:14,169][11030] Num visible devices: 1
[2024-11-07 13:28:14,427][11017] Conv encoder output size: 512
[2024-11-07 13:28:14,433][11017] Policy head output size: 512
[2024-11-07 13:28:14,484][11017] Created Actor Critic model with architecture:
[2024-11-07 13:28:14,484][11017] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 13:28:14,689][11035] Worker 5 uses CPU cores [5]
[2024-11-07 13:28:14,789][11034] Worker 3 uses CPU cores [3]
[2024-11-07 13:28:14,913][11037] Worker 6 uses CPU cores [6]
[2024-11-07 13:28:14,986][11031] Worker 0 uses CPU cores [0]
[2024-11-07 13:28:15,019][11032] Worker 2 uses CPU cores [2]
[2024-11-07 13:28:15,049][11036] Worker 4 uses CPU cores [4]
[2024-11-07 13:28:15,097][11038] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 13:28:15,542][11017] Using optimizer
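Both the learner and the inference process log "Set environment var CUDA_VISIBLE_DEVICES". Setting that variable before CUDA initializes is the standard way to pin a child process to one physical GPU; inside the process the chosen device then appears as cuda:0, which is exactly what "Num visible devices: 1" and "Initializing actor-critic model on device cuda:0" reflect. A minimal sketch:

    import os

    def pin_to_gpu(gpu_index):
        # Must run in the child process before torch initializes CUDA.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
        # From here on, the selected physical GPU is visible as cuda:0.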
[2024-11-07 13:28:17,478][11017] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth...
[2024-11-07 13:28:17,542][11017] Loading model from checkpoint
[2024-11-07 13:28:17,546][11017] Loaded experiment state at self.train_step=877, self.env_steps=3592192
[2024-11-07 13:28:17,546][11017] Initialized policy 0 weights for model version 877
[2024-11-07 13:28:17,552][11017] LearnerWorker_p0 finished initialization!
[2024-11-07 13:28:17,552][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:28:17,765][11030] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:28:17,766][11030] RunningMeanStd input shape: (1,)
[2024-11-07 13:28:17,790][11030] ConvEncoder: input_channels=3
[2024-11-07 13:28:17,962][11030] Conv encoder output size: 512
[2024-11-07 13:28:17,963][11030] Policy head output size: 512
[2024-11-07 13:28:18,029][10732] Inference worker 0-0 is ready!
[2024-11-07 13:28:18,030][10732] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 13:28:18,116][11032] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,120][11034] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,141][11031] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,148][11033] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,149][11037] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,169][11036] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,241][11035] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:28:18,242][11038] Doom resolution: 160x120, resize resolution: (128, 72)
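The "Doom resolution: 160x120, resize resolution: (128, 72)" lines tie back to the RunningMeanStd input shape (3, 72, 128): frames render at 160x120 (width x height), are resized to 128x72, and are then moved to channels-first CHW order. A sketch of that shape bookkeeping; the OpenCV-based resize is an assumption, used only to make the layout concrete:

    import cv2
    import numpy as np

    def preprocess(frame_hwc: np.ndarray) -> np.ndarray:
        assert frame_hwc.shape == (120, 160, 3)     # HWC render at 160x120
        resized = cv2.resize(frame_hwc, (128, 72))  # cv2 takes (width, height)
        return resized.transpose(2, 0, 1)           # HWC -> CHW, i.e. (3, 72, 128)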
[2024-11-07 13:28:18,777][11032] Decorrelating experience for 0 frames...
[2024-11-07 13:28:18,778][11034] Decorrelating experience for 0 frames...
[2024-11-07 13:28:18,779][11036] Decorrelating experience for 0 frames...
[2024-11-07 13:28:18,782][11037] Decorrelating experience for 0 frames...
[2024-11-07 13:28:18,835][11033] Decorrelating experience for 0 frames...
[2024-11-07 13:28:19,234][11034] Decorrelating experience for 32 frames...
[2024-11-07 13:28:19,251][11032] Decorrelating experience for 32 frames...
[2024-11-07 13:28:19,285][11036] Decorrelating experience for 32 frames...
[2024-11-07 13:28:19,362][11035] Decorrelating experience for 0 frames...
[2024-11-07 13:28:19,452][11031] Decorrelating experience for 0 frames...
[2024-11-07 13:28:19,497][11033] Decorrelating experience for 32 frames...
[2024-11-07 13:28:19,812][11038] Decorrelating experience for 0 frames...
[2024-11-07 13:28:19,849][11035] Decorrelating experience for 32 frames...
[2024-11-07 13:28:19,965][11036] Decorrelating experience for 64 frames...
[2024-11-07 13:28:20,085][11033] Decorrelating experience for 64 frames...
[2024-11-07 13:28:20,335][11034] Decorrelating experience for 64 frames...
[2024-11-07 13:28:20,397][11032] Decorrelating experience for 64 frames...
[2024-11-07 13:28:20,421][11038] Decorrelating experience for 32 frames...
[2024-11-07 13:28:20,513][11036] Decorrelating experience for 96 frames...
[2024-11-07 13:28:20,810][11035] Decorrelating experience for 64 frames...
[2024-11-07 13:28:20,917][11034] Decorrelating experience for 96 frames...
[2024-11-07 13:28:20,964][11032] Decorrelating experience for 96 frames...
[2024-11-07 13:28:20,997][11037] Decorrelating experience for 32 frames...
[2024-11-07 13:28:21,136][11038] Decorrelating experience for 64 frames...
[2024-11-07 13:28:21,137][11033] Decorrelating experience for 96 frames...
[2024-11-07 13:28:21,444][11031] Decorrelating experience for 32 frames...
[2024-11-07 13:28:21,567][11037] Decorrelating experience for 64 frames...
[2024-11-07 13:28:21,570][11035] Decorrelating experience for 96 frames...
[2024-11-07 13:28:21,912][10732] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 3592192. Throughput: 0: nan. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:28:22,075][11031] Decorrelating experience for 64 frames...
[2024-11-07 13:28:22,074][11038] Decorrelating experience for 96 frames...
[2024-11-07 13:28:22,658][11031] Decorrelating experience for 96 frames...
[2024-11-07 13:28:22,698][11037] Decorrelating experience for 96 frames...
[2024-11-07 13:28:23,828][11017] Signal inference workers to stop experience collection...
[2024-11-07 13:28:23,842][11030] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:28:26,912][10732] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3592192. Throughput: 0: 499.2. Samples: 2516. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:28:26,915][10732] Avg episode reward: [(0, '2.447')]
[2024-11-07 13:28:27,907][10732] Heartbeat connected on Batcher_0
[2024-11-07 13:28:27,916][10732] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 13:28:27,925][10732] Heartbeat connected on RolloutWorker_w0
[2024-11-07 13:28:27,932][10732] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:28:27,937][10732] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:28:27,949][10732] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:28:27,951][10732] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:28:27,953][10732] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:28:27,957][10732] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:28:27,962][10732] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:28:28,660][11017] Signal inference workers to resume experience collection...
[2024-11-07 13:28:28,661][11030] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 13:28:29,107][10732] Heartbeat connected on LearnerWorker_p0
[2024-11-07 13:28:31,912][10732] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 3616768. Throughput: 0: 463.4. Samples: 4654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 13:28:31,914][10732] Avg episode reward: [(0, '3.662')]
[2024-11-07 13:28:35,934][11030] Updated weights for policy 0, policy_version 887 (0.0180)
[2024-11-07 13:28:36,967][10732] Fps is (10 sec: 4073.5, 60 sec: 2720.7, 300 sec: 2720.7). Total num frames: 3633152. Throughput: 0: 732.2. Samples: 11044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 13:28:36,971][10732] Avg episode reward: [(0, '4.103')]
[2024-11-07 13:28:41,912][10732] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 3653632. Throughput: 0: 672.3. Samples: 13466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:28:41,921][10732] Avg episode reward: [(0, '4.330')]
[2024-11-07 13:28:46,604][11030] Updated weights for policy 0, policy_version 897 (0.0061)
[2024-11-07 13:28:46,912][10732] Fps is (10 sec: 4118.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3674112. Throughput: 0: 772.6. Samples: 19336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:28:46,914][10732] Avg episode reward: [(0, '4.329')]
[2024-11-07 13:28:51,912][10732] Fps is (10 sec: 4915.4, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 3702784. Throughput: 0: 914.3. Samples: 27448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:28:51,914][10732] Avg episode reward: [(0, '4.308')]
[2024-11-07 13:28:53,575][11030] Updated weights for policy 0, policy_version 907 (0.0034)
[2024-11-07 13:28:56,912][10732] Fps is (10 sec: 6144.1, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3735552. Throughput: 0: 919.2. Samples: 32192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:28:56,914][10732] Avg episode reward: [(0, '4.552')]
[2024-11-07 13:28:59,822][11030] Updated weights for policy 0, policy_version 917 (0.0037)
[2024-11-07 13:29:01,912][10732] Fps is (10 sec: 6143.8, 60 sec: 4300.8, 300 sec: 4300.8). Total num frames: 3764224. Throughput: 0: 1046.4. Samples: 41876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 13:29:01,919][10732] Avg episode reward: [(0, '4.401')]
[2024-11-07 13:29:06,286][11030] Updated weights for policy 0, policy_version 927 (0.0031)
[2024-11-07 13:29:06,912][10732] Fps is (10 sec: 6144.1, 60 sec: 4551.1, 300 sec: 4551.1). Total num frames: 3796992. Throughput: 0: 1145.6. Samples: 51570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:29:06,913][10732] Avg episode reward: [(0, '4.400')]
[2024-11-07 13:29:11,912][10732] Fps is (10 sec: 6144.2, 60 sec: 4669.5, 300 sec: 4669.5). Total num frames: 3825664. Throughput: 0: 1200.0. Samples: 56514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 13:29:11,915][10732] Avg episode reward: [(0, '4.256')]
[2024-11-07 13:29:13,526][11030] Updated weights for policy 0, policy_version 937 (0.0024)
[2024-11-07 13:29:16,912][10732] Fps is (10 sec: 6144.0, 60 sec: 4840.8, 300 sec: 4840.8). Total num frames: 3858432. Throughput: 0: 1328.4. Samples: 64430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:29:16,913][10732] Avg episode reward: [(0, '4.483')]
[2024-11-07 13:29:20,208][11030] Updated weights for policy 0, policy_version 947 (0.0058)
[2024-11-07 13:29:21,912][10732] Fps is (10 sec: 6553.5, 60 sec: 4983.5, 300 sec: 4983.5). Total num frames: 3891200. Throughput: 0: 1397.2. Samples: 73840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 13:29:21,914][10732] Avg episode reward: [(0, '4.370')]
[2024-11-07 13:29:25,891][11030] Updated weights for policy 0, policy_version 957 (0.0021)
[2024-11-07 13:29:26,912][10732] Fps is (10 sec: 6553.6, 60 sec: 5529.6, 300 sec: 5104.3). Total num frames: 3923968. Throughput: 0: 1463.5. Samples: 79324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 13:29:26,913][10732] Avg episode reward: [(0, '4.497')]
[2024-11-07 13:29:31,912][10732] Fps is (10 sec: 6553.7, 60 sec: 5666.2, 300 sec: 5207.8). Total num frames: 3956736. Throughput: 0: 1552.5. Samples: 89200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 13:29:31,914][10732] Avg episode reward: [(0, '4.394')]
[2024-11-07 13:29:32,158][11030] Updated weights for policy 0, policy_version 967 (0.0034)
[2024-11-07 13:29:36,913][10732] Fps is (10 sec: 6552.9, 60 sec: 5944.6, 300 sec: 5297.4). Total num frames: 3989504. Throughput: 0: 1599.2. Samples: 99414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 13:29:36,918][10732] Avg episode reward: [(0, '4.530')]
[2024-11-07 13:29:38,554][11030] Updated weights for policy 0, policy_version 977 (0.0027)
[2024-11-07 13:29:39,066][11017] Stopping Batcher_0...
[2024-11-07 13:29:39,067][11017] Loop batcher_evt_loop terminating...
[2024-11-07 13:29:39,068][10732] Component Batcher_0 stopped!
[2024-11-07 13:29:39,072][11017] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 13:29:39,102][11030] Weights refcount: 2 0
[2024-11-07 13:29:39,104][11030] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:29:39,105][11030] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:29:39,105][10732] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 13:29:39,175][11034] Stopping RolloutWorker_w3...
[2024-11-07 13:29:39,176][11034] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:29:39,175][10732] Component RolloutWorker_w3 stopped!
[2024-11-07 13:29:39,187][11031] Stopping RolloutWorker_w0...
[2024-11-07 13:29:39,188][11031] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 13:29:39,187][10732] Component RolloutWorker_w0 stopped!
[2024-11-07 13:29:39,194][11036] Stopping RolloutWorker_w4...
[2024-11-07 13:29:39,196][11036] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:29:39,195][10732] Component RolloutWorker_w4 stopped!
[2024-11-07 13:29:39,209][10732] Component RolloutWorker_w1 stopped!
[2024-11-07 13:29:39,209][11033] Stopping RolloutWorker_w1...
[2024-11-07 13:29:39,213][11033] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:29:39,218][11017] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth
[2024-11-07 13:29:39,235][11017] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 13:29:39,256][11038] Stopping RolloutWorker_w7...
[2024-11-07 13:29:39,255][10732] Component RolloutWorker_w7 stopped!
[2024-11-07 13:29:39,263][11038] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:29:39,269][10732] Component RolloutWorker_w6 stopped!
[2024-11-07 13:29:39,268][11037] Stopping RolloutWorker_w6...
[2024-11-07 13:29:39,281][11037] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:29:39,390][11032] Stopping RolloutWorker_w2...
[2024-11-07 13:29:39,392][11032] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:29:39,390][10732] Component RolloutWorker_w2 stopped!
[2024-11-07 13:29:39,486][11035] Stopping RolloutWorker_w5...
[2024-11-07 13:29:39,485][10732] Component RolloutWorker_w5 stopped!
[2024-11-07 13:29:39,489][11035] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:29:39,645][10732] Component LearnerWorker_p0 stopped!
[2024-11-07 13:29:39,651][10732] Waiting for process learner_proc0 to stop...
[2024-11-07 13:29:39,645][11017] Stopping LearnerWorker_p0...
[2024-11-07 13:29:39,654][11017] Loop learner_proc0_evt_loop terminating...
[2024-11-07 13:29:41,382][10732] Waiting for process inference_proc0-0 to join...
[2024-11-07 13:29:41,383][10732] Waiting for process rollout_proc0 to join...
[2024-11-07 13:29:41,384][10732] Waiting for process rollout_proc1 to join...
[2024-11-07 13:29:41,386][10732] Waiting for process rollout_proc2 to join...
[2024-11-07 13:29:41,387][10732] Waiting for process rollout_proc3 to join...
[2024-11-07 13:29:41,388][10732] Waiting for process rollout_proc4 to join...
[2024-11-07 13:29:41,389][10732] Waiting for process rollout_proc5 to join...
[2024-11-07 13:29:41,391][10732] Waiting for process rollout_proc6 to join...
[2024-11-07 13:29:41,392][10732] Waiting for process rollout_proc7 to join...
[2024-11-07 13:29:41,393][10732] Batcher 0 profile tree view:
batching: 4.1772, releasing_batches: 0.0071
[2024-11-07 13:29:41,394][10732] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 3.3427
update_model: 1.2367
  weight_update: 0.0027
one_step: 0.0111
  handle_policy_step: 68.2333
    deserialize: 1.8008, stack: 0.3402, obs_to_device_normalize: 19.5773, forward: 29.6476, send_messages: 4.4855
    prepare_outputs: 10.0586
      to_cpu: 6.9987
[2024-11-07 13:29:41,396][10732] Learner 0 profile tree view:
misc: 0.0005, prepare_batch: 3.8501
train: 16.4720
  epoch_init: 0.0008, minibatch_init: 0.0049, losses_postprocess: 0.1092, kl_divergence: 0.1493, after_optimizer: 0.7284
  calculate_losses: 4.1094
    losses_init: 0.0007, forward_head: 0.6633, bptt_initial: 2.3525, tail: 0.1590, advantages_returns: 0.0443, losses: 0.4608
    bptt: 0.3909
      bptt_forward_core: 0.3798
  update: 11.2589
    clip: 0.2380
[2024-11-07 13:29:41,396][10732] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0249, enqueue_policy_requests: 1.6630, env_step: 18.9067, overhead: 1.4135, complete_rollouts: 0.0593
save_policy_outputs: 2.1104
  split_output_tensors: 0.7211
[2024-11-07 13:29:41,397][10732] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0229, enqueue_policy_requests: 1.5401, env_step: 31.2942, overhead: 1.4459, complete_rollouts: 0.0523
save_policy_outputs: 2.0803
  split_output_tensors: 0.6932
[2024-11-07 13:29:41,401][10732] Loop Runner_EvtLoop terminating...
[2024-11-07 13:29:41,403][10732] Runner profile tree view:
main_loop: 93.4405
[2024-11-07 13:29:41,404][10732] Collected {0: 4005888}, FPS: 4427.4
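The InferenceWorker profile above rewards a second look: of the 68.2333 s spent in handle_policy_step, the forward pass and observation normalization dominate. Re-keying the logged numbers makes the split explicit:

    parts = {
        "deserialize": 1.8008, "stack": 0.3402,
        "obs_to_device_normalize": 19.5773, "forward": 29.6476,
        "send_messages": 4.4855, "prepare_outputs": 10.0586,
    }
    total = 68.2333  # handle_policy_step, from the profile above
    for name, t in sorted(parts.items(), key=lambda kv: -kv[1]):
        print(f"{name:>24}: {t:8.4f} s ({t / total:6.1%})")
    print(f"{'(unaccounted)':>24}: {total - sum(parts.values()):8.4f} s")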
[2024-11-07 13:39:17,255][11922] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 13:39:17,258][11922] Rollout worker 0 uses device cpu
[2024-11-07 13:39:17,259][11922] Rollout worker 1 uses device cpu
[2024-11-07 13:39:17,261][11922] Rollout worker 2 uses device cpu
[2024-11-07 13:39:17,263][11922] Rollout worker 3 uses device cpu
[2024-11-07 13:39:17,265][11922] Rollout worker 4 uses device cpu
[2024-11-07 13:39:17,266][11922] Rollout worker 5 uses device cpu
[2024-11-07 13:39:17,268][11922] Rollout worker 6 uses device cpu
[2024-11-07 13:39:17,269][11922] Rollout worker 7 uses device cpu
[2024-11-07 13:39:17,330][11922] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:39:17,332][11922] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 13:39:17,371][11922] Starting all processes...
[2024-11-07 13:39:17,373][11922] Starting process learner_proc0
[2024-11-07 13:39:17,483][11922] Starting all processes...
[2024-11-07 13:39:17,490][11922] Starting process inference_proc0-0
[2024-11-07 13:39:17,561][11922] Starting process rollout_proc0
[2024-11-07 13:39:17,570][11922] Starting process rollout_proc4
[2024-11-07 13:39:17,564][11922] Starting process rollout_proc2
[2024-11-07 13:39:17,569][11922] Starting process rollout_proc3
[2024-11-07 13:39:17,571][11922] Starting process rollout_proc5
[2024-11-07 13:39:17,563][11922] Starting process rollout_proc1
[2024-11-07 13:39:17,571][11922] Starting process rollout_proc6
[2024-11-07 13:39:17,578][11922] Starting process rollout_proc7
[2024-11-07 13:39:23,020][13795] Worker 2 uses CPU cores [2]
[2024-11-07 13:39:23,164][13792] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:39:23,165][13792] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 13:39:23,266][13798] Worker 1 uses CPU cores [1]
[2024-11-07 13:39:23,338][13792] Num visible devices: 1
[2024-11-07 13:39:23,429][13793] Worker 0 uses CPU cores [0]
[2024-11-07 13:39:23,606][13779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:39:23,607][13779] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 13:39:23,643][13779] Num visible devices: 1
[2024-11-07 13:39:23,662][13779] Starting seed is not provided
[2024-11-07 13:39:23,662][13779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:39:23,663][13779] Initializing actor-critic model on device cuda:0
[2024-11-07 13:39:23,663][13779] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:39:23,664][13779] RunningMeanStd input shape: (1,)
[2024-11-07 13:39:23,687][13779] ConvEncoder: input_channels=3
[2024-11-07 13:39:23,929][13794] Worker 4 uses CPU cores [4]
[2024-11-07 13:39:24,026][13779] Conv encoder output size: 512
[2024-11-07 13:39:24,027][13779] Policy head output size: 512
[2024-11-07 13:39:24,029][13797] Worker 5 uses CPU cores [5]
[2024-11-07 13:39:24,044][13779] Created Actor Critic model with architecture:
[2024-11-07 13:39:24,045][13779] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 13:39:24,052][13806] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 13:39:24,184][13796] Worker 3 uses CPU cores [3]
[2024-11-07 13:39:24,380][13805] Worker 6 uses CPU cores [6]
[2024-11-07 13:39:24,922][13779] Using optimizer
[2024-11-07 13:39:26,175][13779] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 13:39:26,246][13779] Loading model from checkpoint
[2024-11-07 13:39:26,249][13779] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2024-11-07 13:39:26,249][13779] Initialized policy 0 weights for model version 978
[2024-11-07 13:39:26,260][13779] LearnerWorker_p0 finished initialization!
[2024-11-07 13:39:26,261][13779] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:39:26,511][13792] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:39:26,512][13792] RunningMeanStd input shape: (1,)
[2024-11-07 13:39:26,539][13792] ConvEncoder: input_channels=3
[2024-11-07 13:39:26,678][13792] Conv encoder output size: 512
[2024-11-07 13:39:26,678][13792] Policy head output size: 512
[2024-11-07 13:39:26,736][11922] Inference worker 0-0 is ready!
[2024-11-07 13:39:26,738][11922] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 13:39:26,827][13795] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,837][13796] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,845][13794] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,898][13797] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,909][13798] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,923][13805] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:26,975][13806] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:27,065][13793] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:39:27,367][11922] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:39:27,764][13795] Decorrelating experience for 0 frames...
[2024-11-07 13:39:27,768][13796] Decorrelating experience for 0 frames...
[2024-11-07 13:39:27,990][13798] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,044][13797] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,045][13793] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,337][13795] Decorrelating experience for 32 frames...
[2024-11-07 13:39:28,360][13806] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,566][13798] Decorrelating experience for 32 frames...
[2024-11-07 13:39:28,636][13805] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,639][13793] Decorrelating experience for 32 frames...
[2024-11-07 13:39:28,863][13797] Decorrelating experience for 32 frames...
[2024-11-07 13:39:28,870][13794] Decorrelating experience for 0 frames...
[2024-11-07 13:39:28,905][13795] Decorrelating experience for 64 frames...
[2024-11-07 13:39:29,069][13796] Decorrelating experience for 32 frames...
[2024-11-07 13:39:29,786][13795] Decorrelating experience for 96 frames...
[2024-11-07 13:39:29,787][13794] Decorrelating experience for 32 frames...
[2024-11-07 13:39:29,824][13805] Decorrelating experience for 32 frames...
[2024-11-07 13:39:29,869][13806] Decorrelating experience for 32 frames...
[2024-11-07 13:39:29,878][13798] Decorrelating experience for 64 frames...
[2024-11-07 13:39:30,063][13796] Decorrelating experience for 64 frames...
[2024-11-07 13:39:30,112][13797] Decorrelating experience for 64 frames...
[2024-11-07 13:39:30,492][13794] Decorrelating experience for 64 frames...
[2024-11-07 13:39:30,540][13806] Decorrelating experience for 64 frames...
[2024-11-07 13:39:30,598][13798] Decorrelating experience for 96 frames...
[2024-11-07 13:39:30,629][13796] Decorrelating experience for 96 frames...
[2024-11-07 13:39:30,669][13797] Decorrelating experience for 96 frames...
[2024-11-07 13:39:30,869][13793] Decorrelating experience for 64 frames...
[2024-11-07 13:39:31,079][13794] Decorrelating experience for 96 frames...
[2024-11-07 13:39:31,145][13806] Decorrelating experience for 96 frames...
[2024-11-07 13:39:31,182][13805] Decorrelating experience for 64 frames...
[2024-11-07 13:39:31,415][13793] Decorrelating experience for 96 frames...
[2024-11-07 13:39:31,709][13805] Decorrelating experience for 96 frames...
[2024-11-07 13:39:32,367][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:39:32,368][11922] Avg episode reward: [(0, '1.136')]
[2024-11-07 13:39:33,228][13779] Signal inference workers to stop experience collection...
[2024-11-07 13:39:33,240][13792] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:39:38,026][11922] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 13:39:38,028][11922] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:39:38,030][11922] Heartbeat connected on Batcher_0
[2024-11-07 13:39:38,032][11922] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:39:38,034][11922] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:39:38,036][11922] Heartbeat connected on RolloutWorker_w0
[2024-11-07 13:39:38,039][11922] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:39:38,040][11922] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:39:38,046][11922] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:39:38,048][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 243.6. Samples: 2602. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:39:38,051][11922] Avg episode reward: [(0, '2.131')]
[2024-11-07 13:39:38,063][11922] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:39:42,367][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 173.5. Samples: 2602. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:39:42,369][11922] Avg episode reward: [(0, '2.131')]
[2024-11-07 13:39:43,444][13779] Signal inference workers to resume experience collection...
[2024-11-07 13:39:43,445][13779] Stopping Batcher_0...
[2024-11-07 13:39:43,445][13792] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 13:39:43,446][13779] Loop batcher_evt_loop terminating...
[2024-11-07 13:39:43,453][11922] Component Batcher_0 stopped!
[2024-11-07 13:39:43,517][13792] Weights refcount: 2 0
[2024-11-07 13:39:43,545][13792] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:39:43,546][13792] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:39:43,549][11922] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 13:39:44,148][11922] Component RolloutWorker_w2 stopped!
[2024-11-07 13:39:44,150][13795] Stopping RolloutWorker_w2...
[2024-11-07 13:39:44,151][13795] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:39:44,236][13794] Stopping RolloutWorker_w4...
[2024-11-07 13:39:44,236][13794] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:39:44,235][11922] Component RolloutWorker_w4 stopped!
[2024-11-07 13:39:44,241][13793] Stopping RolloutWorker_w0...
[2024-11-07 13:39:44,242][13793] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 13:39:44,242][11922] Component RolloutWorker_w0 stopped!
[2024-11-07 13:39:44,288][13806] Stopping RolloutWorker_w7...
[2024-11-07 13:39:44,288][11922] Component RolloutWorker_w7 stopped!
[2024-11-07 13:39:44,295][13806] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:39:44,331][13779] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2024-11-07 13:39:44,336][11922] Heartbeat connected on LearnerWorker_p0
[2024-11-07 13:39:44,381][11922] Component RolloutWorker_w6 stopped!
[2024-11-07 13:39:44,398][13797] Stopping RolloutWorker_w5...
[2024-11-07 13:39:44,399][13797] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:39:44,400][11922] Component RolloutWorker_w5 stopped!
[2024-11-07 13:39:44,380][13805] Stopping RolloutWorker_w6...
[2024-11-07 13:39:44,412][13805] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:39:44,561][13796] Stopping RolloutWorker_w3...
[2024-11-07 13:39:44,561][13796] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:39:44,561][11922] Component RolloutWorker_w3 stopped!
[2024-11-07 13:39:44,604][13798] Stopping RolloutWorker_w1...
[2024-11-07 13:39:44,609][13798] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:39:44,605][11922] Component RolloutWorker_w1 stopped!
[2024-11-07 13:39:44,765][13779] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth
[2024-11-07 13:39:44,816][13779] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2024-11-07 13:39:45,460][13779] Stopping LearnerWorker_p0...
[2024-11-07 13:39:45,460][13779] Loop learner_proc0_evt_loop terminating...
[2024-11-07 13:39:45,461][11922] Component LearnerWorker_p0 stopped!
[2024-11-07 13:39:45,466][11922] Waiting for process learner_proc0 to stop...
[2024-11-07 13:39:47,174][11922] Waiting for process inference_proc0-0 to join...
[2024-11-07 13:39:47,176][11922] Waiting for process rollout_proc0 to join...
[2024-11-07 13:39:47,178][11922] Waiting for process rollout_proc1 to join...
[2024-11-07 13:39:47,180][11922] Waiting for process rollout_proc2 to join...
[2024-11-07 13:39:47,182][11922] Waiting for process rollout_proc3 to join...
[2024-11-07 13:39:47,184][11922] Waiting for process rollout_proc4 to join...
[2024-11-07 13:39:47,186][11922] Waiting for process rollout_proc5 to join...
[2024-11-07 13:39:47,189][11922] Waiting for process rollout_proc6 to join...
[2024-11-07 13:39:47,195][11922] Waiting for process rollout_proc7 to join...
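The Saving/Removing pair above is checkpoint rotation: with keep_checkpoints=2 in the config, each new checkpoint_<train_step>_<env_steps>.pth evicts the oldest one (here checkpoint_000000877_3592192.pth). A sketch of that policy, with torch.save standing in for the real serialization and the dict layout an assumption:

    import glob
    import os
    import torch

    def save_and_rotate(state, checkpoint_dir, train_step, env_steps, keep=2):
        # file naming mirrors the log: checkpoint_000000980_4014080.pth
        name = f"checkpoint_{train_step:09d}_{env_steps}.pth"
        torch.save(state, os.path.join(checkpoint_dir, name))
        # zero-padded train_step makes lexicographic order == chronological order
        checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
        for stale in checkpoints[:-keep]:  # keep_checkpoints=2 in the config above
            os.remove(stale)               # e.g. "Removing .../checkpoint_000000877_3592192.pth"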
[2024-11-07 13:39:47,199][11922] Batcher 0 profile tree view:
batching: 0.0833, releasing_batches: 0.0013
[2024-11-07 13:39:47,201][11922] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0149
wait_policy: 0.0000
  wait_policy_total: 3.0989
one_step: 0.0127
  handle_policy_step: 3.2174
    deserialize: 0.0602, stack: 0.0094, obs_to_device_normalize: 0.7060, forward: 1.9784, send_messages: 0.1164
    prepare_outputs: 0.2735
      to_cpu: 0.2112
[2024-11-07 13:39:47,204][11922] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.8149
train: 10.0389
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0008, kl_divergence: 0.0238, after_optimizer: 3.8432
  calculate_losses: 0.8068
    losses_init: 0.0000, forward_head: 0.5561, bptt_initial: 0.1521, tail: 0.0237, advantages_returns: 0.0014, losses: 0.0411
    bptt: 0.0316
      bptt_forward_core: 0.0313
  update: 5.3565
    clip: 0.0752
[2024-11-07 13:39:47,207][11922] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0580, env_step: 0.4822, overhead: 0.0318, complete_rollouts: 0.0011
save_policy_outputs: 0.0730
  split_output_tensors: 0.0210
[2024-11-07 13:39:47,210][11922] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.0571, env_step: 0.8575, overhead: 0.0384, complete_rollouts: 0.0041
save_policy_outputs: 0.0486
  split_output_tensors: 0.0154
[2024-11-07 13:39:47,215][11922] Loop Runner_EvtLoop terminating...
[2024-11-07 13:39:47,218][11922] Runner profile tree view:
main_loop: 29.8465
[2024-11-07 13:39:47,222][11922] Collected {0: 4014080}, FPS: 274.5
[2024-11-07 13:46:19,573][11922] Environment doom_basic already registered, overwriting...
[2024-11-07 13:46:19,579][11922] Environment doom_two_colors_easy already registered, overwriting...
[2024-11-07 13:46:19,582][11922] Environment doom_two_colors_hard already registered, overwriting...
[2024-11-07 13:46:19,583][11922] Environment doom_dm already registered, overwriting...
[2024-11-07 13:46:19,586][11922] Environment doom_dwango5 already registered, overwriting...
[2024-11-07 13:46:19,589][11922] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-11-07 13:46:19,593][11922] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-11-07 13:46:19,595][11922] Environment doom_my_way_home already registered, overwriting...
[2024-11-07 13:46:19,597][11922] Environment doom_deadly_corridor already registered, overwriting...
[2024-11-07 13:46:19,599][11922] Environment doom_defend_the_center already registered, overwriting...
[2024-11-07 13:46:19,601][11922] Environment doom_defend_the_line already registered, overwriting...
[2024-11-07 13:46:19,603][11922] Environment doom_health_gathering already registered, overwriting...
[2024-11-07 13:46:19,605][11922] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-11-07 13:46:19,607][11922] Environment doom_battle already registered, overwriting...
[2024-11-07 13:46:19,608][11922] Environment doom_battle2 already registered, overwriting...
[2024-11-07 13:46:19,611][11922] Environment doom_duel_bots already registered, overwriting...
[2024-11-07 13:46:19,613][11922] Environment doom_deathmatch_bots already registered, overwriting...
[2024-11-07 13:46:19,616][11922] Environment doom_duel already registered, overwriting...
[2024-11-07 13:46:19,620][11922] Environment doom_deathmatch_full already registered, overwriting...
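The profile tree views printed at shutdown aggregate wall-clock seconds per named scope, nested by call structure (e.g. to_cpu inside prepare_outputs inside handle_policy_step). A toy profiler in the same spirit, an assumed structure rather than the actual timing class:

    import time
    from collections import defaultdict
    from contextlib import contextmanager

    class TreeProfiler:
        """Accumulates nested named timers, reported like a 'profile tree view'."""
        def __init__(self):
            self.totals = defaultdict(float)
            self.stack = []

        @contextmanager
        def timer(self, name):
            self.stack.append(name)
            key = "/".join(self.stack)  # full path identifies the tree node
            start = time.perf_counter()
            try:
                yield
            finally:
                self.totals[key] += time.perf_counter() - start
                self.stack.pop()

        def report(self):
            for key in sorted(self.totals):
                depth = key.count("/")
                print("  " * depth + f"{key.split('/')[-1]}: {self.totals[key]:.4f}")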
[2024-11-07 13:46:19,622][11922] Environment doom_benchmark already registered, overwriting...
[2024-11-07 13:46:19,624][11922] register_encoder_factory:
[2024-11-07 13:46:19,794][11922] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 13:46:19,803][11922] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists!
[2024-11-07 13:46:19,805][11922] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment...
[2024-11-07 13:46:19,806][11922] Weights and Biases integration disabled
[2024-11-07 13:46:19,812][11922] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-11-07 13:46:25,974][11922] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/root/hfRL/ml/LunarLander-v2/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
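With restart_behavior=resume, the run reloads the saved config.json and applies command-line arguments on top of it, which is what the "Overriding arg ..." / "Adding new argument ..." messages later in this log show. A sketch of that merge (the path and override dict are just examples):

    import json

    def load_config_with_overrides(config_path, cli_overrides):
        # mirrors "Loading existing experiment configuration ..." plus the
        # override/add messages printed when an eval run reuses a train config
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_overrides.items():
            if key in cfg:
                print(f"Overriding arg '{key}' with value {value} passed from command line")
            else:
                print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
            cfg[key] = value
        return cfg

    # cfg = load_config_with_overrides(".../default_experiment/config.json", {"num_workers": 1})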
[2024-11-07 13:46:25,976][11922] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 13:46:25,978][11922] Rollout worker 0 uses device cpu
[2024-11-07 13:46:25,979][11922] Rollout worker 1 uses device cpu
[2024-11-07 13:46:25,981][11922] Rollout worker 2 uses device cpu
[2024-11-07 13:46:25,983][11922] Rollout worker 3 uses device cpu
[2024-11-07 13:46:25,984][11922] Rollout worker 4 uses device cpu
[2024-11-07 13:46:25,985][11922] Rollout worker 5 uses device cpu
[2024-11-07 13:46:25,986][11922] Rollout worker 6 uses device cpu
[2024-11-07 13:46:25,988][11922] Rollout worker 7 uses device cpu
[2024-11-07 13:46:26,047][11922] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:46:26,048][11922] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 13:46:26,084][11922] Starting all processes...
[2024-11-07 13:46:26,085][11922] Starting process learner_proc0
[2024-11-07 13:46:26,128][11922] Starting all processes...
[2024-11-07 13:46:26,134][11922] Starting process inference_proc0-0
[2024-11-07 13:46:26,135][11922] Starting process rollout_proc0
[2024-11-07 13:46:26,136][11922] Starting process rollout_proc1
[2024-11-07 13:46:26,137][11922] Starting process rollout_proc2
[2024-11-07 13:46:26,138][11922] Starting process rollout_proc3
[2024-11-07 13:46:26,142][11922] Starting process rollout_proc4
[2024-11-07 13:46:26,142][11922] Starting process rollout_proc5
[2024-11-07 13:46:26,148][11922] Starting process rollout_proc6
[2024-11-07 13:46:26,150][11922] Starting process rollout_proc7
[2024-11-07 13:46:32,861][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:46:32,862][15894] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 13:46:33,101][15894] Num visible devices: 1
[2024-11-07 13:46:33,152][15894] Starting seed is not provided
[2024-11-07 13:46:33,152][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:46:33,153][15894] Initializing actor-critic model on device cuda:0
[2024-11-07 13:46:33,153][15894] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:46:33,156][15894] RunningMeanStd input shape: (1,)
[2024-11-07 13:46:33,188][15894] ConvEncoder: input_channels=3
[2024-11-07 13:46:33,934][15894] Conv encoder output size: 512
[2024-11-07 13:46:33,935][15894] Policy head output size: 512
[2024-11-07 13:46:33,989][15894] Created Actor Critic model with architecture:
[2024-11-07 13:46:34,012][15894] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
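The repr above shows the module structure but omits layer hyperparameters. A rough PyTorch equivalent of the printed architecture, with assumed convnet_simple-style kernel/stride/channel choices (those exact values are not printed in the log):

    import torch
    from torch import nn

    class SharedWeightsActorCritic(nn.Module):
        """Sketch of the ActorCriticSharedWeights structure printed above."""
        def __init__(self, num_actions=5, rnn_size=512):
            super().__init__()
            # conv_head: three Conv2d+ELU pairs; channels/kernels/strides are assumptions
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():  # infer flattened conv size for a 3x72x128 observation
                n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp = nn.Sequential(nn.Linear(n, 512), nn.ELU())  # "Conv encoder output size: 512"
            self.core = nn.GRU(512, rnn_size)                      # "(core): GRU(512, 512)"
            self.critic_linear = nn.Linear(rnn_size, 1)
            self.distribution_linear = nn.Linear(rnn_size, num_actions)  # 5 discrete actions

        def forward(self, obs, rnn_state=None):
            x = self.mlp(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # one-step GRU rollout
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state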
[2024-11-07 13:46:34,270][15914] Worker 6 uses CPU cores [6]
[2024-11-07 13:46:34,717][15915] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 13:46:34,760][15912] Worker 4 uses CPU cores [4]
[2024-11-07 13:46:35,118][15911] Worker 3 uses CPU cores [3]
[2024-11-07 13:46:35,310][15910] Worker 2 uses CPU cores [2]
[2024-11-07 13:46:35,910][15913] Worker 5 uses CPU cores [5]
[2024-11-07 13:46:36,133][15894] Using optimizer
[2024-11-07 13:46:36,385][15907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:46:36,386][15907] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 13:46:36,392][15908] Worker 0 uses CPU cores [0]
[2024-11-07 13:46:36,413][15907] Num visible devices: 1
[2024-11-07 13:46:36,420][15909] Worker 1 uses CPU cores [1]
[2024-11-07 13:46:37,928][15894] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2024-11-07 13:46:38,045][15894] Loading model from checkpoint
[2024-11-07 13:46:38,049][15894] Loaded experiment state at self.train_step=980, self.env_steps=4014080
[2024-11-07 13:46:38,050][15894] Initialized policy 0 weights for model version 980
[2024-11-07 13:46:38,059][15894] LearnerWorker_p0 finished initialization!
[2024-11-07 13:46:38,059][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:46:38,256][15907] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:46:38,257][15907] RunningMeanStd input shape: (1,)
[2024-11-07 13:46:38,272][15907] ConvEncoder: input_channels=3
[2024-11-07 13:46:38,433][15907] Conv encoder output size: 512
[2024-11-07 13:46:38,434][15907] Policy head output size: 512
[2024-11-07 13:46:38,489][11922] Inference worker 0-0 is ready!
[2024-11-07 13:46:38,491][11922] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 13:46:38,572][15912] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,588][15910] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,592][15909] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,605][15908] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,607][15913] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,615][15911] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,671][15914] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:38,691][15915] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:46:39,151][15910] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,153][15911] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,154][15912] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,214][15908] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,218][15913] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,320][15914] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,597][15912] Decorrelating experience for 32 frames...
[2024-11-07 13:46:39,601][15911] Decorrelating experience for 32 frames...
[2024-11-07 13:46:39,804][15909] Decorrelating experience for 0 frames...
[2024-11-07 13:46:39,813][11922] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:46:39,812][15910] Decorrelating experience for 32 frames...
[2024-11-07 13:46:39,815][15908] Decorrelating experience for 32 frames...
[2024-11-07 13:46:39,818][15913] Decorrelating experience for 32 frames...
[2024-11-07 13:46:40,179][15914] Decorrelating experience for 32 frames...
[2024-11-07 13:46:40,448][15912] Decorrelating experience for 64 frames...
[2024-11-07 13:46:40,454][15909] Decorrelating experience for 32 frames...
[2024-11-07 13:46:40,491][15910] Decorrelating experience for 64 frames...
[2024-11-07 13:46:40,532][15908] Decorrelating experience for 64 frames...
[2024-11-07 13:46:40,548][15913] Decorrelating experience for 64 frames...
[2024-11-07 13:46:40,806][15915] Decorrelating experience for 0 frames...
[2024-11-07 13:46:41,100][15912] Decorrelating experience for 96 frames...
[2024-11-07 13:46:41,140][15910] Decorrelating experience for 96 frames...
[2024-11-07 13:46:41,259][15909] Decorrelating experience for 64 frames...
[2024-11-07 13:46:41,267][15914] Decorrelating experience for 64 frames...
[2024-11-07 13:46:41,381][15913] Decorrelating experience for 96 frames...
[2024-11-07 13:46:41,381][15911] Decorrelating experience for 64 frames...
[2024-11-07 13:46:41,744][15908] Decorrelating experience for 96 frames...
[2024-11-07 13:46:41,760][15915] Decorrelating experience for 32 frames...
[2024-11-07 13:46:41,947][15914] Decorrelating experience for 96 frames...
[2024-11-07 13:46:42,091][15909] Decorrelating experience for 96 frames...
[2024-11-07 13:46:42,535][15911] Decorrelating experience for 96 frames...
[2024-11-07 13:46:42,731][15915] Decorrelating experience for 64 frames...
[2024-11-07 13:46:43,672][15915] Decorrelating experience for 96 frames...
[2024-11-07 13:46:44,816][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 383.4. Samples: 1918. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:46:44,824][11922] Avg episode reward: [(0, '1.815')]
[2024-11-07 13:46:45,370][15894] Signal inference workers to stop experience collection...
[2024-11-07 13:46:45,393][15907] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:46:46,038][11922] Heartbeat connected on Batcher_0
[2024-11-07 13:46:46,047][11922] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 13:46:46,054][11922] Heartbeat connected on RolloutWorker_w0
[2024-11-07 13:46:46,061][11922] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:46:46,068][11922] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:46:46,072][11922] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:46:46,076][11922] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:46:46,077][11922] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:46:46,080][11922] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:46:46,084][11922] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:46:49,814][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 280.0. Samples: 2800. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
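The staggered "Decorrelating experience for 0/32/64/96 frames..." entries show each env warming up for a different number of steps so rollout workers do not start collecting in lockstep. A gymnasium-style sketch of the idea (the 32-frame stage size mirrors the log, but the exact scheme is an assumption):

    def decorrelate(env, env_index, frames_per_stage=32):
        """Env i idles for roughly i * frames_per_stage random steps before real collection."""
        env.reset()
        for stage in range(env_index + 1):
            print(f"Decorrelating experience for {stage * frames_per_stage} frames...")
            for _ in range(frames_per_stage):
                _, _, terminated, truncated, _ = env.step(env.action_space.sample())
                if terminated or truncated:
                    env.reset()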
[2024-11-07 13:46:49,815][11922] Avg episode reward: [(0, '2.198')]
[2024-11-07 13:46:53,479][15894] Signal inference workers to resume experience collection...
[2024-11-07 13:46:53,479][15907] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 13:46:53,483][15894] Stopping Batcher_0...
[2024-11-07 13:46:53,483][15894] Loop batcher_evt_loop terminating...
[2024-11-07 13:46:53,491][11922] Component Batcher_0 stopped!
[2024-11-07 13:46:53,503][15907] Weights refcount: 2 0
[2024-11-07 13:46:53,506][15907] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:46:53,507][15907] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:46:53,507][11922] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 13:46:53,609][15911] Stopping RolloutWorker_w3...
[2024-11-07 13:46:53,610][15911] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:46:53,609][11922] Component RolloutWorker_w3 stopped!
[2024-11-07 13:46:53,681][11922] Component RolloutWorker_w1 stopped!
[2024-11-07 13:46:53,685][15909] Stopping RolloutWorker_w1...
[2024-11-07 13:46:53,686][15909] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:46:53,700][11922] Component RolloutWorker_w5 stopped!
[2024-11-07 13:46:53,699][15913] Stopping RolloutWorker_w5...
[2024-11-07 13:46:53,705][15913] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:46:53,765][15912] Stopping RolloutWorker_w4...
[2024-11-07 13:46:53,773][11922] Component RolloutWorker_w4 stopped!
[2024-11-07 13:46:53,776][15912] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:46:54,091][15910] Stopping RolloutWorker_w2...
[2024-11-07 13:46:54,091][15910] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:46:54,096][11922] Component RolloutWorker_w2 stopped!
[2024-11-07 13:46:54,158][15915] Stopping RolloutWorker_w7...
[2024-11-07 13:46:54,160][11922] Component RolloutWorker_w7 stopped!
[2024-11-07 13:46:54,176][15915] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:46:54,256][15914] Stopping RolloutWorker_w6...
[2024-11-07 13:46:54,245][11922] Component RolloutWorker_w6 stopped!
[2024-11-07 13:46:54,270][15914] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:46:54,464][11922] Component RolloutWorker_w0 stopped!
[2024-11-07 13:46:54,467][15908] Stopping RolloutWorker_w0...
[2024-11-07 13:46:54,469][15908] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 13:46:54,999][15894] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth...
[2024-11-07 13:46:55,359][15894] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2024-11-07 13:46:55,376][15894] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth...
[2024-11-07 13:46:55,375][11922] Heartbeat connected on LearnerWorker_p0
[2024-11-07 13:46:55,741][15894] Stopping LearnerWorker_p0...
[2024-11-07 13:46:55,742][15894] Loop learner_proc0_evt_loop terminating...
[2024-11-07 13:46:55,743][11922] Component LearnerWorker_p0 stopped!
[2024-11-07 13:46:55,751][11922] Waiting for process learner_proc0 to stop...
[2024-11-07 13:46:57,104][11922] Waiting for process inference_proc0-0 to join...
[2024-11-07 13:46:57,106][11922] Waiting for process rollout_proc0 to join...
[2024-11-07 13:46:57,108][11922] Waiting for process rollout_proc1 to join...
[2024-11-07 13:46:57,109][11922] Waiting for process rollout_proc2 to join...
[2024-11-07 13:46:57,111][11922] Waiting for process rollout_proc3 to join...
[2024-11-07 13:46:57,113][11922] Waiting for process rollout_proc4 to join...
[2024-11-07 13:46:57,114][11922] Waiting for process rollout_proc5 to join...
[2024-11-07 13:46:57,116][11922] Waiting for process rollout_proc6 to join...
[2024-11-07 13:46:57,117][11922] Waiting for process rollout_proc7 to join...
[2024-11-07 13:46:57,124][11922] Batcher 0 profile tree view:
batching: 0.0647, releasing_batches: 0.0037
[2024-11-07 13:46:57,127][11922] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0205
wait_policy: 0.0001
  wait_policy_total: 2.6671
one_step: 0.0122
  handle_policy_step: 4.0411
    deserialize: 0.0722, stack: 0.0210, obs_to_device_normalize: 0.9497, forward: 2.4204, send_messages: 0.1187
    prepare_outputs: 0.3446
      to_cpu: 0.2395
[2024-11-07 13:46:57,128][11922] Learner 0 profile tree view:
misc: 0.0001, prepare_batch: 2.5911
train: 8.0508
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0011, kl_divergence: 0.0721, after_optimizer: 0.4508
  calculate_losses: 1.1483
    losses_init: 0.0000, forward_head: 0.5330, bptt_initial: 0.3452, tail: 0.0913, advantages_returns: 0.0017, losses: 0.1113
    bptt: 0.0650
      bptt_forward_core: 0.0649
  update: 6.3767
    clip: 0.1802
[2024-11-07 13:46:57,130][11922] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0646, env_step: 0.7793, overhead: 0.0623, complete_rollouts: 0.0010
save_policy_outputs: 0.0912
  split_output_tensors: 0.0297
[2024-11-07 13:46:57,131][11922] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0009, enqueue_policy_requests: 0.0323, env_step: 0.6692, overhead: 0.0294, complete_rollouts: 0.0012
save_policy_outputs: 0.0352
  split_output_tensors: 0.0111
[2024-11-07 13:46:57,134][11922] Loop Runner_EvtLoop terminating...
[2024-11-07 13:46:57,136][11922] Runner profile tree view:
main_loop: 31.0520
[2024-11-07 13:46:57,138][11922] Collected {0: 4022272}, FPS: 263.8
[2024-11-07 13:56:29,353][01021] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 13:56:29,375][01021] Rollout worker 0 uses device cpu
[2024-11-07 13:56:29,376][01021] Rollout worker 1 uses device cpu
[2024-11-07 13:56:29,378][01021] Rollout worker 2 uses device cpu
[2024-11-07 13:56:29,379][01021] Rollout worker 3 uses device cpu
[2024-11-07 13:56:29,380][01021] Rollout worker 4 uses device cpu
[2024-11-07 13:56:29,381][01021] Rollout worker 5 uses device cpu
[2024-11-07 13:56:29,382][01021] Rollout worker 6 uses device cpu
[2024-11-07 13:56:29,382][01021] Rollout worker 7 uses device cpu
[2024-11-07 13:56:29,753][01021] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:56:29,755][01021] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 13:56:29,791][01021] Starting all processes...
[2024-11-07 13:56:29,792][01021] Starting process learner_proc0
[2024-11-07 13:56:29,946][01021] Starting all processes...
[2024-11-07 13:56:29,999][01021] Starting process inference_proc0-0
[2024-11-07 13:56:30,001][01021] Starting process rollout_proc0
[2024-11-07 13:56:30,001][01021] Starting process rollout_proc1
[2024-11-07 13:56:30,005][01021] Starting process rollout_proc2
[2024-11-07 13:56:30,005][01021] Starting process rollout_proc3
[2024-11-07 13:56:30,006][01021] Starting process rollout_proc4
[2024-11-07 13:56:30,011][01021] Starting process rollout_proc5
[2024-11-07 13:56:30,013][01021] Starting process rollout_proc6
[2024-11-07 13:56:30,017][01021] Starting process rollout_proc7
[2024-11-07 13:56:36,174][01326] Worker 2 uses CPU cores [2]
[2024-11-07 13:56:36,430][01327] Worker 3 uses CPU cores [3]
[2024-11-07 13:56:36,676][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:56:36,677][01310] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 13:56:36,769][01328] Worker 4 uses CPU cores [4]
[2024-11-07 13:56:36,883][01310] Num visible devices: 1
[2024-11-07 13:56:36,934][01310] Starting seed is not provided
[2024-11-07 13:56:36,935][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:56:36,935][01310] Initializing actor-critic model on device cuda:0
[2024-11-07 13:56:36,935][01310] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:56:36,950][01310] RunningMeanStd input shape: (1,)
[2024-11-07 13:56:36,985][01310] ConvEncoder: input_channels=3
[2024-11-07 13:56:37,147][01331] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 13:56:37,496][01324] Worker 0 uses CPU cores [0]
[2024-11-07 13:56:37,671][01323] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:56:37,671][01323] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 13:56:37,733][01323] Num visible devices: 1
[2024-11-07 13:56:37,916][01325] Worker 1 uses CPU cores [1]
[2024-11-07 13:56:37,960][01330] Worker 6 uses CPU cores [6]
[2024-11-07 13:56:38,125][01329] Worker 5 uses CPU cores [5]
[2024-11-07 13:56:38,455][01310] Conv encoder output size: 512
[2024-11-07 13:56:38,456][01310] Policy head output size: 512
[2024-11-07 13:56:38,662][01310] Created Actor Critic model with architecture:
[2024-11-07 13:56:38,662][01310] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 13:56:41,671][01310] Using optimizer
[2024-11-07 13:56:49,591][01310] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth...
[2024-11-07 13:56:49,745][01021] Heartbeat connected on Batcher_0
[2024-11-07 13:56:49,754][01021] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 13:56:49,761][01021] Heartbeat connected on RolloutWorker_w0
[2024-11-07 13:56:49,764][01021] Heartbeat connected on RolloutWorker_w1
[2024-11-07 13:56:49,769][01021] Heartbeat connected on RolloutWorker_w2
[2024-11-07 13:56:49,773][01021] Heartbeat connected on RolloutWorker_w3
[2024-11-07 13:56:49,778][01021] Heartbeat connected on RolloutWorker_w4
[2024-11-07 13:56:49,782][01021] Heartbeat connected on RolloutWorker_w5
[2024-11-07 13:56:49,787][01021] Heartbeat connected on RolloutWorker_w6
[2024-11-07 13:56:49,790][01021] Heartbeat connected on RolloutWorker_w7
[2024-11-07 13:56:49,951][01310] Loading model from checkpoint
[2024-11-07 13:56:49,954][01310] Loaded experiment state at self.train_step=982, self.env_steps=4022272
[2024-11-07 13:56:49,955][01310] Initialized policy 0 weights for model version 982
[2024-11-07 13:56:49,964][01310] LearnerWorker_p0 finished initialization!
[2024-11-07 13:56:49,965][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 13:56:49,966][01021] Heartbeat connected on LearnerWorker_p0
[2024-11-07 13:56:50,158][01323] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 13:56:50,159][01323] RunningMeanStd input shape: (1,)
[2024-11-07 13:56:50,174][01323] ConvEncoder: input_channels=3
[2024-11-07 13:56:50,310][01323] Conv encoder output size: 512
[2024-11-07 13:56:50,310][01323] Policy head output size: 512
[2024-11-07 13:56:50,362][01021] Inference worker 0-0 is ready!
[2024-11-07 13:56:50,363][01021] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 13:56:50,460][01328] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,476][01324] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,479][01327] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,484][01325] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,491][01326] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,500][01329] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,563][01330] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:50,596][01331] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 13:56:53,662][01328] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,662][01330] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,662][01324] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,662][01325] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,668][01327] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,979][01326] Decorrelating experience for 0 frames...
[2024-11-07 13:56:53,996][01331] Decorrelating experience for 0 frames...
[2024-11-07 13:56:54,023][01329] Decorrelating experience for 0 frames...
[2024-11-07 13:56:54,091][01324] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,109][01325] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,131][01021] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4022272. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
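The RunningMeanStd normalizers instantiated above (shape (3, 72, 128) for observations and (1,) for returns, per normalize_input=True and normalize_returns=True) keep running statistics and whiten their inputs. A minimal non-in-place version using the standard parallel mean/variance update:

    import numpy as np

    class RunningMeanStd:
        """Sketch of a running mean/std normalizer (not the in-place variant in the log)."""
        def __init__(self, shape, eps=1e-5):
            self.mean = np.zeros(shape, dtype=np.float64)
            self.var = np.ones(shape, dtype=np.float64)
            self.count = eps

        def update(self, batch):  # batch shape: (N, *shape)
            batch_mean = batch.mean(axis=0)
            batch_var = batch.var(axis=0)
            batch_count = batch.shape[0]
            delta = batch_mean - self.mean
            total = self.count + batch_count
            # Chan et al. parallel update of mean and variance
            self.mean = self.mean + delta * batch_count / total
            m_a = self.var * self.count
            m_b = batch_var * batch_count
            self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
            self.count = total

        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var + 1e-8)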
[2024-11-07 13:56:54,431][01331] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,452][01328] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,542][01330] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,805][01329] Decorrelating experience for 32 frames...
[2024-11-07 13:56:54,843][01324] Decorrelating experience for 64 frames...
[2024-11-07 13:56:54,872][01325] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,034][01331] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,041][01328] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,044][01327] Decorrelating experience for 32 frames...
[2024-11-07 13:56:55,174][01330] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,333][01324] Decorrelating experience for 96 frames...
[2024-11-07 13:56:55,463][01326] Decorrelating experience for 32 frames...
[2024-11-07 13:56:55,464][01328] Decorrelating experience for 96 frames...
[2024-11-07 13:56:55,520][01331] Decorrelating experience for 96 frames...
[2024-11-07 13:56:55,555][01327] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,867][01329] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,898][01330] Decorrelating experience for 96 frames...
[2024-11-07 13:56:55,935][01326] Decorrelating experience for 64 frames...
[2024-11-07 13:56:55,947][01327] Decorrelating experience for 96 frames...
[2024-11-07 13:56:56,231][01329] Decorrelating experience for 96 frames...
[2024-11-07 13:56:56,233][01325] Decorrelating experience for 96 frames...
[2024-11-07 13:56:56,289][01326] Decorrelating experience for 96 frames...
[2024-11-07 13:56:59,130][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:57:04,131][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 8.2. Samples: 82. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:57:04,134][01021] Avg episode reward: [(0, '1.040')]
[2024-11-07 13:57:04,869][01310] Signal inference workers to stop experience collection...
[2024-11-07 13:57:04,898][01323] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 13:57:09,130][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 161.6. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:57:09,132][01021] Avg episode reward: [(0, '1.992')]
[2024-11-07 13:57:14,905][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 116.7. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:57:14,908][01021] Avg episode reward: [(0, '1.992')]
[2024-11-07 13:57:19,131][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 97.0. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 13:57:19,134][01021] Avg episode reward: [(0, '1.992')]
[2024-11-07 13:57:19,956][01310] Signal inference workers to resume experience collection...
[2024-11-07 13:57:19,957][01323] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 13:57:19,958][01310] Stopping Batcher_0...
[2024-11-07 13:57:19,958][01310] Loop batcher_evt_loop terminating...
[2024-11-07 13:57:19,969][01021] Component Batcher_0 stopped!
[2024-11-07 13:57:19,983][01323] Weights refcount: 2 0
[2024-11-07 13:57:19,997][01323] Stopping InferenceWorker_p0-w0...
[2024-11-07 13:57:19,997][01323] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 13:57:19,997][01021] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 13:57:20,142][01328] Stopping RolloutWorker_w4...
[2024-11-07 13:57:20,142][01021] Component RolloutWorker_w4 stopped!
[2024-11-07 13:57:20,145][01327] Stopping RolloutWorker_w3...
[2024-11-07 13:57:20,146][01327] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 13:57:20,149][01328] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 13:57:20,157][01325] Stopping RolloutWorker_w1...
[2024-11-07 13:57:20,158][01325] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 13:57:20,145][01021] Component RolloutWorker_w3 stopped!
[2024-11-07 13:57:20,159][01021] Component RolloutWorker_w1 stopped!
[2024-11-07 13:57:20,203][01021] Component RolloutWorker_w2 stopped!
[2024-11-07 13:57:20,203][01326] Stopping RolloutWorker_w2...
[2024-11-07 13:57:20,205][01326] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 13:57:20,207][01324] Stopping RolloutWorker_w0...
[2024-11-07 13:57:20,207][01324] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 13:57:20,207][01021] Component RolloutWorker_w0 stopped!
[2024-11-07 13:57:20,215][01329] Stopping RolloutWorker_w5...
[2024-11-07 13:57:20,215][01329] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 13:57:20,214][01021] Component RolloutWorker_w6 stopped!
[2024-11-07 13:57:20,217][01021] Component RolloutWorker_w5 stopped!
[2024-11-07 13:57:20,219][01330] Stopping RolloutWorker_w6...
[2024-11-07 13:57:20,220][01330] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 13:57:20,230][01021] Component RolloutWorker_w7 stopped!
[2024-11-07 13:57:20,232][01331] Stopping RolloutWorker_w7...
[2024-11-07 13:57:20,232][01331] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 13:57:21,122][01310] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth...
[2024-11-07 13:57:21,556][01310] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth
[2024-11-07 13:57:21,559][01310] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth...
[2024-11-07 13:57:21,776][01310] Stopping LearnerWorker_p0...
[2024-11-07 13:57:21,777][01310] Loop learner_proc0_evt_loop terminating...
[2024-11-07 13:57:21,777][01021] Component LearnerWorker_p0 stopped!
[2024-11-07 13:57:21,787][01021] Waiting for process learner_proc0 to stop...
[2024-11-07 13:57:23,265][01021] Waiting for process inference_proc0-0 to join...
[2024-11-07 13:57:23,267][01021] Waiting for process rollout_proc0 to join...
[2024-11-07 13:57:23,268][01021] Waiting for process rollout_proc1 to join...
[2024-11-07 13:57:23,269][01021] Waiting for process rollout_proc2 to join...
[2024-11-07 13:57:23,271][01021] Waiting for process rollout_proc3 to join...
[2024-11-07 13:57:23,274][01021] Waiting for process rollout_proc4 to join...
[2024-11-07 13:57:23,276][01021] Waiting for process rollout_proc5 to join...
[2024-11-07 13:57:23,278][01021] Waiting for process rollout_proc6 to join...
[2024-11-07 13:57:23,279][01021] Waiting for process rollout_proc7 to join...
[2024-11-07 13:57:23,281][01021] Batcher 0 profile tree view:
batching: 0.2353, releasing_batches: 0.0013
[2024-11-07 13:57:23,283][01021] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0166
wait_policy: 0.0001
  wait_policy_total: 4.9637
one_step: 0.0133
  handle_policy_step: 9.3400
    deserialize: 0.0446, stack: 0.0056, obs_to_device_normalize: 1.9853, forward: 6.9578, send_messages: 0.0880
    prepare_outputs: 0.2132
      to_cpu: 0.1588
[2024-11-07 13:57:23,284][01021] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 5.2481
train: 11.8101
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0014, kl_divergence: 0.0295, after_optimizer: 0.4496
  calculate_losses: 2.7341
    losses_init: 0.0000, forward_head: 0.5470, bptt_initial: 1.6040, tail: 0.0654, advantages_returns: 0.0018, losses: 0.2425
    bptt: 0.2468
      bptt_forward_core: 0.2465
  update: 8.5930
    clip: 0.9877
[2024-11-07 13:57:23,287][01021] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.0547, env_step: 0.3752, overhead: 0.0338, complete_rollouts: 0.0017
save_policy_outputs: 0.0544
  split_output_tensors: 0.0182
[2024-11-07 13:57:23,291][01021] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0555, env_step: 0.7234, overhead: 0.0269, complete_rollouts: 0.0012
save_policy_outputs: 0.0563
  split_output_tensors: 0.0150
[2024-11-07 13:57:23,299][01021] Loop Runner_EvtLoop terminating...
[2024-11-07 13:57:23,303][01021] Runner profile tree view:
main_loop: 53.5122
[2024-11-07 13:57:23,305][01021] Collected {0: 4030464}, FPS: 153.1
[2024-11-07 14:00:00,289][01021] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 14:00:00,291][01021] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 14:00:00,292][01021] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 14:00:00,293][01021] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 14:00:00,294][01021] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:00:00,295][01021] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 14:00:00,295][01021] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:00:00,296][01021] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 14:00:00,297][01021] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-07 14:00:00,298][01021] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-07 14:00:00,299][01021] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 14:00:00,299][01021] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 14:00:00,300][01021] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 14:00:00,303][01021] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 14:00:00,305][01021] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-07 14:00:00,346][01021] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:00:00,350][01021] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:00:00,353][01021] RunningMeanStd input shape: (1,)
[2024-11-07 14:00:00,390][01021] ConvEncoder: input_channels=3
[2024-11-07 14:00:00,786][01021] Conv encoder output size: 512
[2024-11-07 14:00:00,788][01021] Policy head output size: 512
[2024-11-07 14:00:01,587][01021] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth...
[2024-11-07 14:00:04,356][01021] Num frames 100...
[2024-11-07 14:00:04,562][01021] Num frames 200...
[2024-11-07 14:00:04,746][01021] Num frames 300...
[2024-11-07 14:00:04,936][01021] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2024-11-07 14:00:04,939][01021] Avg episode reward: 3.840, avg true_objective: 3.840
[2024-11-07 14:00:04,988][01021] Num frames 400...
[2024-11-07 14:00:05,172][01021] Num frames 500...
[2024-11-07 14:00:05,341][01021] Num frames 600...
[2024-11-07 14:00:05,508][01021] Num frames 700...
[2024-11-07 14:00:05,673][01021] Num frames 800...
[2024-11-07 14:00:05,789][01021] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2024-11-07 14:00:05,793][01021] Avg episode reward: 4.660, avg true_objective: 4.160
[2024-11-07 14:00:05,918][01021] Num frames 900...
[2024-11-07 14:00:06,069][01021] Num frames 1000...
[2024-11-07 14:00:06,227][01021] Num frames 1100...
[2024-11-07 14:00:06,391][01021] Num frames 1200...
[2024-11-07 14:00:06,479][01021] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2024-11-07 14:00:06,483][01021] Avg episode reward: 4.387, avg true_objective: 4.053
[2024-11-07 14:00:06,692][01021] Num frames 1300...
[2024-11-07 14:00:06,916][01021] Num frames 1400...
[2024-11-07 14:00:07,079][01021] Num frames 1500...
[2024-11-07 14:00:07,271][01021] Num frames 1600...
[2024-11-07 14:00:07,323][01021] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
[2024-11-07 14:00:07,325][01021] Avg episode reward: 4.250, avg true_objective: 4.000
[2024-11-07 14:00:07,479][01021] Num frames 1700...
[2024-11-07 14:00:07,634][01021] Num frames 1800...
[2024-11-07 14:00:07,786][01021] Num frames 1900...
[2024-11-07 14:00:08,004][01021] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
[2024-11-07 14:00:08,008][01021] Avg episode reward: 4.168, avg true_objective: 3.968
[2024-11-07 14:00:08,059][01021] Num frames 2000...
[2024-11-07 14:00:08,245][01021] Num frames 2100...
[2024-11-07 14:00:08,406][01021] Num frames 2200...
[2024-11-07 14:00:08,566][01021] Num frames 2300...
[2024-11-07 14:00:08,737][01021] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947
[2024-11-07 14:00:08,742][01021] Avg episode reward: 4.113, avg true_objective: 3.947
[2024-11-07 14:00:08,812][01021] Num frames 2400...
[2024-11-07 14:00:08,961][01021] Num frames 2500...
[2024-11-07 14:00:09,102][01021] Num frames 2600...
[2024-11-07 14:00:09,253][01021] Num frames 2700...
[2024-11-07 14:00:09,408][01021] Num frames 2800...
[2024-11-07 14:00:09,493][01021] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023
[2024-11-07 14:00:09,496][01021] Avg episode reward: 4.309, avg true_objective: 4.023
[2024-11-07 14:00:09,649][01021] Num frames 2900...
[2024-11-07 14:00:09,796][01021] Num frames 3000...
[2024-11-07 14:00:09,947][01021] Num frames 3100...
[2024-11-07 14:00:10,098][01021] Num frames 3200...
[2024-11-07 14:00:10,151][01021] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
[2024-11-07 14:00:10,153][01021] Avg episode reward: 4.250, avg true_objective: 4.000
[2024-11-07 14:00:10,317][01021] Num frames 3300...
[2024-11-07 14:00:10,471][01021] Num frames 3400...
[2024-11-07 14:00:10,628][01021] Num frames 3500...
[2024-11-07 14:00:10,807][01021] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982
[2024-11-07 14:00:10,808][01021] Avg episode reward: 4.204, avg true_objective: 3.982
[2024-11-07 14:00:10,841][01021] Num frames 3600...
[2024-11-07 14:00:11,029][01021] Num frames 3700...
[2024-11-07 14:00:11,179][01021] Num frames 3800...
[2024-11-07 14:00:11,327][01021] Num frames 3900...
[2024-11-07 14:00:11,471][01021] Num frames 4000...
[2024-11-07 14:00:11,524][01021] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000
[2024-11-07 14:00:11,526][01021] Avg episode reward: 4.300, avg true_objective: 4.000
[2024-11-07 14:00:25,803][01021] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
[2024-11-07 14:19:59,772][01364] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
[2024-11-07 14:19:59,796][01364] Rollout worker 0 uses device cpu
[2024-11-07 14:19:59,799][01364] Rollout worker 1 uses device cpu
[2024-11-07 14:19:59,801][01364] Rollout worker 2 uses device cpu
[2024-11-07 14:19:59,804][01364] Rollout worker 3 uses device cpu
[2024-11-07 14:19:59,806][01364] Rollout worker 4 uses device cpu
[2024-11-07 14:19:59,811][01364] Rollout worker 5 uses device cpu
[2024-11-07 14:19:59,815][01364] Rollout worker 6 uses device cpu
[2024-11-07 14:19:59,819][01364] Rollout worker 7 uses device cpu
[2024-11-07 14:20:00,277][01364] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 14:20:00,278][01364] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 14:20:00,326][01364] Starting all processes...
[2024-11-07 14:20:00,328][01364] Starting process learner_proc0
[2024-11-07 14:20:00,463][01364] Starting all processes...
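The "Num frames ..." / "Avg episode rewards ..." block above is the evaluation run: play max_num_episodes=10 episodes with the loaded policy, log a running average after each episode, and finally dump the collected frames to replay.mp4. A sketch of such a loop under a gymnasium-style API (env and policy are placeholders, not the actual enjoy script):

    def evaluate(env, policy, max_num_episodes=10, log_every=100):
        """Runs episodes, printing progress in the style of the log above."""
        episode_rewards = []
        num_frames = 0
        for _ in range(max_num_episodes):
            obs, _ = env.reset()
            done, episode_reward = False, 0.0
            while not done:
                obs, reward, terminated, truncated, _ = env.step(policy(obs))
                episode_reward += reward
                done = terminated or truncated
                num_frames += 1
                if num_frames % log_every == 0:
                    print(f"Num frames {num_frames}...")
            episode_rewards.append(episode_reward)
            avg = sum(episode_rewards) / len(episode_rewards)
            print(f"Avg episode rewards: #0: {avg:.3f}")
        return episode_rewards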
[2024-11-07 14:20:00,522][01364] Starting process inference_proc0-0
[2024-11-07 14:20:00,524][01364] Starting process rollout_proc0
[2024-11-07 14:20:00,525][01364] Starting process rollout_proc1
[2024-11-07 14:20:00,531][01364] Starting process rollout_proc2
[2024-11-07 14:20:00,532][01364] Starting process rollout_proc3
[2024-11-07 14:20:00,533][01364] Starting process rollout_proc4
[2024-11-07 14:20:00,534][01364] Starting process rollout_proc5
[2024-11-07 14:20:00,534][01364] Starting process rollout_proc6
[2024-11-07 14:20:00,539][01364] Starting process rollout_proc7
[2024-11-07 14:20:09,203][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 14:20:09,203][01593] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 14:20:09,347][01608] Worker 2 uses CPU cores [2]
[2024-11-07 14:20:09,365][01617] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]
[2024-11-07 14:20:09,426][01615] Worker 6 uses CPU cores [6]
[2024-11-07 14:20:09,451][01609] Worker 1 uses CPU cores [1]
[2024-11-07 14:20:09,547][01593] Num visible devices: 1
[2024-11-07 14:20:09,599][01593] Starting seed is not provided
[2024-11-07 14:20:09,600][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 14:20:09,601][01593] Initializing actor-critic model on device cuda:0
[2024-11-07 14:20:09,610][01593] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:20:09,636][01593] RunningMeanStd input shape: (1,)
[2024-11-07 14:20:09,721][01593] ConvEncoder: input_channels=3
[2024-11-07 14:20:09,749][01614] Worker 5 uses CPU cores [5]
[2024-11-07 14:20:10,207][01610] Worker 3 uses CPU cores [3]
[2024-11-07 14:20:10,222][01612] Worker 4 uses CPU cores [4]
[2024-11-07 14:20:10,250][01606] Worker 0 uses CPU cores [0]
[2024-11-07 14:20:10,328][01607] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 14:20:10,328][01607] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 14:20:10,365][01607] Num visible devices: 1
[2024-11-07 14:20:11,011][01593] Conv encoder output size: 512
[2024-11-07 14:20:11,012][01593] Policy head output size: 512
[2024-11-07 14:20:11,256][01593] Created Actor Critic model with architecture:
[2024-11-07 14:20:11,256][01593] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-07 14:20:15,178][01593] Using optimizer
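"Using optimizer" pairs with the config values optimizer=adam, learning_rate=0.0001, adam_beta1=0.9, adam_beta2=0.999, adam_eps=1e-06. In PyTorch terms that would be constructed roughly as follows (an illustration, not the actual factory code):

    import torch

    def make_optimizer(model):
        # hyperparameters taken verbatim from the configuration dump above
        return torch.optim.Adam(
            model.parameters(),
            lr=1e-4,
            betas=(0.9, 0.999),
            eps=1e-6,
        )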
[2024-11-07 14:20:20,270][01364] Heartbeat connected on Batcher_0
[2024-11-07 14:20:20,280][01364] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 14:20:20,287][01364] Heartbeat connected on RolloutWorker_w0
[2024-11-07 14:20:20,293][01364] Heartbeat connected on RolloutWorker_w1
[2024-11-07 14:20:20,305][01364] Heartbeat connected on RolloutWorker_w3
[2024-11-07 14:20:20,308][01364] Heartbeat connected on RolloutWorker_w2
[2024-11-07 14:20:20,312][01364] Heartbeat connected on RolloutWorker_w4
[2024-11-07 14:20:20,318][01364] Heartbeat connected on RolloutWorker_w5
[2024-11-07 14:20:20,327][01364] Heartbeat connected on RolloutWorker_w6
[2024-11-07 14:20:20,328][01364] Heartbeat connected on RolloutWorker_w7
[2024-11-07 14:20:22,849][01593] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth...
[2024-11-07 14:20:23,407][01593] Loading model from checkpoint
[2024-11-07 14:20:23,412][01593] Loaded experiment state at self.train_step=984, self.env_steps=4030464
[2024-11-07 14:20:23,413][01593] Initialized policy 0 weights for model version 984
[2024-11-07 14:20:23,423][01593] LearnerWorker_p0 finished initialization!
[2024-11-07 14:20:23,423][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 14:20:23,425][01364] Heartbeat connected on LearnerWorker_p0
[2024-11-07 14:20:23,701][01607] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:20:23,702][01607] RunningMeanStd input shape: (1,)
[2024-11-07 14:20:23,723][01607] ConvEncoder: input_channels=3
[2024-11-07 14:20:23,896][01607] Conv encoder output size: 512
[2024-11-07 14:20:23,897][01607] Policy head output size: 512
[2024-11-07 14:20:23,967][01364] Inference worker 0-0 is ready!
[2024-11-07 14:20:23,969][01364] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 14:20:24,129][01612] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,149][01608] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,151][01610] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,184][01614] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,211][01609] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,261][01617] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,269][01615] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,270][01606] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:20:24,824][01364] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4030464. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:27,103][01617] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,104][01608] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,104][01610] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,104][01609] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,103][01614] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,569][01612] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,573][01615] Decorrelating experience for 0 frames...
[2024-11-07 14:20:27,611][01617] Decorrelating experience for 32 frames...
[2024-11-07 14:20:27,652][01609] Decorrelating experience for 32 frames...
[2024-11-07 14:20:27,843][01614] Decorrelating experience for 32 frames...
[2024-11-07 14:20:28,161][01608] Decorrelating experience for 32 frames...
[2024-11-07 14:20:28,192][01615] Decorrelating experience for 32 frames...
[2024-11-07 14:20:28,215][01612] Decorrelating experience for 32 frames...
[2024-11-07 14:20:28,243][01606] Decorrelating experience for 0 frames...
[2024-11-07 14:20:28,384][01610] Decorrelating experience for 32 frames...
[2024-11-07 14:20:28,438][01617] Decorrelating experience for 64 frames...
[2024-11-07 14:20:28,446][01609] Decorrelating experience for 64 frames...
[2024-11-07 14:20:28,616][01614] Decorrelating experience for 64 frames...
[2024-11-07 14:20:28,992][01608] Decorrelating experience for 64 frames...
[2024-11-07 14:20:29,095][01612] Decorrelating experience for 64 frames...
[2024-11-07 14:20:29,309][01606] Decorrelating experience for 32 frames...
[2024-11-07 14:20:29,709][01610] Decorrelating experience for 64 frames...
[2024-11-07 14:20:29,735][01609] Decorrelating experience for 96 frames...
[2024-11-07 14:20:29,821][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:29,941][01614] Decorrelating experience for 96 frames...
[2024-11-07 14:20:29,954][01608] Decorrelating experience for 96 frames...
[2024-11-07 14:20:29,975][01615] Decorrelating experience for 64 frames...
[2024-11-07 14:20:30,490][01612] Decorrelating experience for 96 frames...
[2024-11-07 14:20:30,705][01617] Decorrelating experience for 96 frames...
[2024-11-07 14:20:31,022][01610] Decorrelating experience for 96 frames...
[2024-11-07 14:20:31,053][01606] Decorrelating experience for 64 frames...
[2024-11-07 14:20:31,060][01615] Decorrelating experience for 96 frames...
[2024-11-07 14:20:31,473][01606] Decorrelating experience for 96 frames...
[2024-11-07 14:20:34,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:38,893][01593] Signal inference workers to stop experience collection...
[2024-11-07 14:20:38,915][01607] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 14:20:39,820][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 151.9. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:39,822][01364] Avg episode reward: [(0, '1.962')]
[2024-11-07 14:20:44,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 113.9. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:44,823][01364] Avg episode reward: [(0, '1.962')]
[2024-11-07 14:20:49,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 91.1. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 14:20:49,821][01364] Avg episode reward: [(0, '1.962')]
[2024-11-07 14:20:51,836][01593] Signal inference workers to resume experience collection...
[2024-11-07 14:20:51,842][01593] Stopping Batcher_0...
[2024-11-07 14:20:51,842][01593] Loop batcher_evt_loop terminating...
[2024-11-07 14:20:51,882][01364] Component Batcher_0 stopped!
[2024-11-07 14:20:51,951][01614] Stopping RolloutWorker_w5...
[2024-11-07 14:20:51,976][01614] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 14:20:51,977][01608] Stopping RolloutWorker_w2...
[2024-11-07 14:20:51,977][01608] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 14:20:51,978][01609] Stopping RolloutWorker_w1... [2024-11-07 14:20:51,979][01609] Loop rollout_proc1_evt_loop terminating... [2024-11-07 14:20:51,952][01364] Component RolloutWorker_w5 stopped! [2024-11-07 14:20:52,051][01364] Component RolloutWorker_w2 stopped! [2024-11-07 14:20:52,053][01364] Component RolloutWorker_w1 stopped! [2024-11-07 14:20:52,062][01606] Stopping RolloutWorker_w0... [2024-11-07 14:20:52,063][01606] Loop rollout_proc0_evt_loop terminating... [2024-11-07 14:20:52,067][01364] Component RolloutWorker_w0 stopped! [2024-11-07 14:20:52,133][01364] Component RolloutWorker_w6 stopped! [2024-11-07 14:20:52,135][01615] Stopping RolloutWorker_w6... [2024-11-07 14:20:52,136][01615] Loop rollout_proc6_evt_loop terminating... [2024-11-07 14:20:52,214][01607] Weights refcount: 2 0 [2024-11-07 14:20:52,217][01607] Stopping InferenceWorker_p0-w0... [2024-11-07 14:20:52,217][01607] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 14:20:52,217][01364] Component InferenceWorker_p0-w0 stopped! [2024-11-07 14:20:52,251][01610] Stopping RolloutWorker_w3... [2024-11-07 14:20:52,252][01610] Loop rollout_proc3_evt_loop terminating... [2024-11-07 14:20:52,245][01364] Component RolloutWorker_w3 stopped! [2024-11-07 14:20:52,289][01617] Stopping RolloutWorker_w7... [2024-11-07 14:20:52,288][01364] Component RolloutWorker_w7 stopped! [2024-11-07 14:20:52,290][01617] Loop rollout_proc7_evt_loop terminating... [2024-11-07 14:20:52,398][01612] Stopping RolloutWorker_w4... [2024-11-07 14:20:52,399][01612] Loop rollout_proc4_evt_loop terminating... [2024-11-07 14:20:52,398][01364] Component RolloutWorker_w4 stopped! [2024-11-07 14:20:53,361][01593] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:20:53,602][01593] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth [2024-11-07 14:20:53,608][01593] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:20:53,773][01364] Component LearnerWorker_p0 stopped! [2024-11-07 14:20:53,776][01364] Waiting for process learner_proc0 to stop... [2024-11-07 14:20:53,778][01593] Stopping LearnerWorker_p0... [2024-11-07 14:20:53,779][01593] Loop learner_proc0_evt_loop terminating... [2024-11-07 14:20:55,232][01364] Waiting for process inference_proc0-0 to join... [2024-11-07 14:20:55,234][01364] Waiting for process rollout_proc0 to join... [2024-11-07 14:20:55,237][01364] Waiting for process rollout_proc1 to join... [2024-11-07 14:20:55,238][01364] Waiting for process rollout_proc2 to join... [2024-11-07 14:20:55,240][01364] Waiting for process rollout_proc3 to join... [2024-11-07 14:20:55,242][01364] Waiting for process rollout_proc4 to join... [2024-11-07 14:20:55,244][01364] Waiting for process rollout_proc5 to join... [2024-11-07 14:20:55,246][01364] Waiting for process rollout_proc6 to join... [2024-11-07 14:20:55,248][01364] Waiting for process rollout_proc7 to join... 
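The save-then-remove pair above (checkpoint_000000986_4038656.pth written, checkpoint_000000982_4022272.pth deleted) is a rolling-checkpoint policy: only the newest few checkpoints stay on disk, and the filename encodes the train step (986) and the env-step count (4038656). A sketch of the same rotation pattern (illustrative, not Sample Factory's code; SF controls the retention count via its keep_checkpoints setting):

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
        """Delete all but the newest `keep` checkpoints; the zero-padded
        checkpoint_{train_step}_{env_steps}.pth names sort in version order."""
        ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for stale in ckpts[:-keep]:
            stale.unlink()
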
[2024-11-07 14:20:55,251][01364] Batcher 0 profile tree view: batching: 0.1728, releasing_batches: 0.0057 [2024-11-07 14:20:55,254][01364] InferenceWorker_p0-w0 profile tree view: update_model: 0.0126 wait_policy: 0.0001 wait_policy_total: 5.8211 one_step: 0.0248 handle_policy_step: 8.8489 deserialize: 0.0802, stack: 0.0286, obs_to_device_normalize: 1.8886, forward: 6.2329, send_messages: 0.1620 prepare_outputs: 0.4037 to_cpu: 0.3303 [2024-11-07 14:20:55,256][01364] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 5.2266 train: 10.5341 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0017, kl_divergence: 0.0164, after_optimizer: 0.4198 calculate_losses: 2.3312 losses_init: 0.0000, forward_head: 0.6032, bptt_initial: 1.2839, tail: 0.0609, advantages_returns: 0.0013, losses: 0.2480 bptt: 0.1068 bptt_forward_core: 0.1065 update: 7.7635 clip: 0.7057 [2024-11-07 14:20:55,257][01364] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0795, env_step: 0.4782, overhead: 0.0679, complete_rollouts: 0.0116 save_policy_outputs: 0.0594 split_output_tensors: 0.0209 [2024-11-07 14:20:55,259][01364] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0926, env_step: 0.8226, overhead: 0.0360, complete_rollouts: 0.0009 save_policy_outputs: 0.0530 split_output_tensors: 0.0154 [2024-11-07 14:20:55,262][01364] Loop Runner_EvtLoop terminating... [2024-11-07 14:20:55,264][01364] Runner profile tree view: main_loop: 54.9388 [2024-11-07 14:20:55,268][01364] Collected {0: 4038656}, FPS: 149.1 [2024-11-07 14:20:56,973][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:20:56,975][01364] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-07 14:20:56,977][01364] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 14:20:56,980][01364] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 14:20:56,982][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:20:56,985][01364] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 14:20:56,992][01364] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:20:56,995][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 14:20:56,999][01364] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-07 14:20:57,001][01364] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-07 14:20:57,004][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 14:20:57,006][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 14:20:57,008][01364] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 14:20:57,010][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
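The run of "Adding new argument ..." overrides above is what an evaluation-only (enjoy) invocation logs when it layers CLI flags over the saved config.json. In the Hugging Face Deep RL course this evaluation is launched roughly as below; parse_sf_args/parse_full_cfg and sample_factory.enjoy.enjoy are Sample Factory's own entry points, the Doom envs are assumed to be registered as in the course notebook, and the flag values simply mirror the overrides in the log:

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy

    argv = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
    ]
    parser, _ = parse_sf_args(argv=argv, evaluation=True)
    cfg = parse_full_cfg(parser, argv)
    status = enjoy(cfg)  # rolls out episodes and records replay.mp4

The later invocations further down add --push_to_hub and --hf_repository=alidenewade/rl_course_vizdoom_health_gathering_supreme (plus --max_num_frames=100000), which is what produces the "model has been pushed" messages after each replay video is saved.
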
[2024-11-07 14:20:57,013][01364] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 14:20:57,069][01364] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:20:57,073][01364] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:20:57,077][01364] RunningMeanStd input shape: (1,) [2024-11-07 14:20:57,119][01364] ConvEncoder: input_channels=3 [2024-11-07 14:20:57,361][01364] Conv encoder output size: 512 [2024-11-07 14:20:57,363][01364] Policy head output size: 512 [2024-11-07 14:20:58,389][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:20:59,532][01364] Num frames 100... [2024-11-07 14:20:59,794][01364] Num frames 200... [2024-11-07 14:21:00,049][01364] Num frames 300... [2024-11-07 14:21:00,361][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:21:00,366][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:21:00,413][01364] Num frames 400... [2024-11-07 14:21:00,720][01364] Num frames 500... [2024-11-07 14:21:01,029][01364] Num frames 600... [2024-11-07 14:21:01,338][01364] Num frames 700... [2024-11-07 14:21:01,669][01364] Num frames 800... [2024-11-07 14:21:01,723][01364] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2024-11-07 14:21:01,726][01364] Avg episode reward: 4.500, avg true_objective: 4.000 [2024-11-07 14:21:02,061][01364] Num frames 900... [2024-11-07 14:21:02,385][01364] Num frames 1000... [2024-11-07 14:21:02,682][01364] Num frames 1100... [2024-11-07 14:21:02,997][01364] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 [2024-11-07 14:21:02,998][01364] Avg episode reward: 4.280, avg true_objective: 3.947 [2024-11-07 14:21:03,057][01364] Num frames 1200... [2024-11-07 14:21:03,486][01364] Num frames 1300... [2024-11-07 14:21:03,816][01364] Num frames 1400... [2024-11-07 14:21:04,086][01364] Num frames 1500... [2024-11-07 14:21:04,305][01364] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 [2024-11-07 14:21:04,310][01364] Avg episode reward: 4.170, avg true_objective: 3.920 [2024-11-07 14:21:04,411][01364] Num frames 1600... [2024-11-07 14:21:04,677][01364] Num frames 1700... [2024-11-07 14:21:04,958][01364] Num frames 1800... [2024-11-07 14:21:05,230][01364] Num frames 1900... [2024-11-07 14:21:05,420][01364] Avg episode rewards: #0: 4.104, true rewards: #0: 3.904 [2024-11-07 14:21:05,422][01364] Avg episode reward: 4.104, avg true_objective: 3.904 [2024-11-07 14:21:05,555][01364] Num frames 2000... [2024-11-07 14:21:05,791][01364] Num frames 2100... [2024-11-07 14:21:06,071][01364] Num frames 2200... [2024-11-07 14:21:06,353][01364] Num frames 2300... [2024-11-07 14:21:06,653][01364] Num frames 2400... [2024-11-07 14:21:06,706][01364] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 [2024-11-07 14:21:06,707][01364] Avg episode reward: 4.333, avg true_objective: 4.000 [2024-11-07 14:21:06,949][01364] Num frames 2500... [2024-11-07 14:21:07,174][01364] Num frames 2600... [2024-11-07 14:21:07,368][01364] Num frames 2700... [2024-11-07 14:21:07,468][01364] Avg episode rewards: #0: 4.170, true rewards: #0: 3.884 [2024-11-07 14:21:07,470][01364] Avg episode reward: 4.170, avg true_objective: 3.884 [2024-11-07 14:21:07,648][01364] Num frames 2800... [2024-11-07 14:21:08,033][01364] Num frames 2900... [2024-11-07 14:21:08,231][01364] Num frames 3000... [2024-11-07 14:21:08,439][01364] Num frames 3100... 
[2024-11-07 14:21:08,507][01364] Avg episode rewards: #0: 4.129, true rewards: #0: 3.879 [2024-11-07 14:21:08,510][01364] Avg episode reward: 4.129, avg true_objective: 3.879 [2024-11-07 14:21:08,729][01364] Num frames 3200... [2024-11-07 14:21:08,921][01364] Num frames 3300... [2024-11-07 14:21:09,156][01364] Num frames 3400... [2024-11-07 14:21:09,370][01364] Num frames 3500... [2024-11-07 14:21:09,590][01364] Num frames 3600... [2024-11-07 14:21:09,769][01364] Avg episode rewards: #0: 4.497, true rewards: #0: 4.052 [2024-11-07 14:21:09,771][01364] Avg episode reward: 4.497, avg true_objective: 4.052 [2024-11-07 14:21:09,918][01364] Num frames 3700... [2024-11-07 14:21:10,199][01364] Num frames 3800... [2024-11-07 14:21:10,507][01364] Num frames 3900... [2024-11-07 14:21:10,841][01364] Num frames 4000... [2024-11-07 14:21:11,242][01364] Avg episode rewards: #0: 4.595, true rewards: #0: 4.095 [2024-11-07 14:21:11,244][01364] Avg episode reward: 4.595, avg true_objective: 4.095 [2024-11-07 14:21:30,385][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 14:21:30,983][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:21:30,986][01364] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-07 14:21:30,987][01364] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 14:21:30,989][01364] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 14:21:30,991][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:21:30,992][01364] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 14:21:30,996][01364] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-07 14:21:31,003][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 14:21:31,008][01364] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-07 14:21:31,015][01364] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-07 14:21:31,016][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 14:21:31,019][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 14:21:31,022][01364] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 14:21:31,024][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-07 14:21:31,027][01364] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 14:21:31,140][01364] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:21:31,144][01364] RunningMeanStd input shape: (1,) [2024-11-07 14:21:31,185][01364] ConvEncoder: input_channels=3 [2024-11-07 14:21:31,424][01364] Conv encoder output size: 512 [2024-11-07 14:21:31,427][01364] Policy head output size: 512 [2024-11-07 14:21:31,517][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:21:32,709][01364] Num frames 100... [2024-11-07 14:21:33,141][01364] Num frames 200... [2024-11-07 14:21:33,553][01364] Num frames 300... 
[2024-11-07 14:21:33,980][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:21:33,982][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:21:34,090][01364] Num frames 400... [2024-11-07 14:21:34,592][01364] Num frames 500... [2024-11-07 14:21:34,983][01364] Num frames 600... [2024-11-07 14:21:35,204][01364] Num frames 700... [2024-11-07 14:21:35,568][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:21:35,569][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:21:35,685][01364] Num frames 800... [2024-11-07 14:21:35,875][01364] Num frames 900... [2024-11-07 14:21:36,122][01364] Num frames 1000... [2024-11-07 14:21:36,392][01364] Num frames 1100... [2024-11-07 14:21:36,594][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:21:36,600][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:21:36,708][01364] Num frames 1200... [2024-11-07 14:21:36,914][01364] Num frames 1300... [2024-11-07 14:21:37,102][01364] Num frames 1400... [2024-11-07 14:21:37,285][01364] Num frames 1500... [2024-11-07 14:21:37,408][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:21:37,410][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:21:37,556][01364] Num frames 1600... [2024-11-07 14:21:37,767][01364] Num frames 1700... [2024-11-07 14:21:37,971][01364] Num frames 1800... [2024-11-07 14:21:38,265][01364] Num frames 1900... [2024-11-07 14:21:38,537][01364] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 [2024-11-07 14:21:38,539][01364] Avg episode reward: 4.168, avg true_objective: 3.968 [2024-11-07 14:21:38,587][01364] Num frames 2000... [2024-11-07 14:21:38,827][01364] Num frames 2100... [2024-11-07 14:21:39,012][01364] Num frames 2200... [2024-11-07 14:21:39,203][01364] Num frames 2300... [2024-11-07 14:21:39,384][01364] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947 [2024-11-07 14:21:39,386][01364] Avg episode reward: 4.113, avg true_objective: 3.947 [2024-11-07 14:21:39,445][01364] Num frames 2400... [2024-11-07 14:21:39,646][01364] Num frames 2500... [2024-11-07 14:21:39,981][01364] Num frames 2600... [2024-11-07 14:21:40,214][01364] Num frames 2700... [2024-11-07 14:21:40,359][01364] Avg episode rewards: #0: 4.074, true rewards: #0: 3.931 [2024-11-07 14:21:40,364][01364] Avg episode reward: 4.074, avg true_objective: 3.931 [2024-11-07 14:21:40,462][01364] Num frames 2800... [2024-11-07 14:21:40,724][01364] Num frames 2900... [2024-11-07 14:21:40,965][01364] Num frames 3000... [2024-11-07 14:21:41,149][01364] Num frames 3100... [2024-11-07 14:21:41,273][01364] Avg episode rewards: #0: 4.045, true rewards: #0: 3.920 [2024-11-07 14:21:41,276][01364] Avg episode reward: 4.045, avg true_objective: 3.920 [2024-11-07 14:21:41,466][01364] Num frames 3200... [2024-11-07 14:21:41,748][01364] Num frames 3300... [2024-11-07 14:21:42,245][01364] Num frames 3400... [2024-11-07 14:21:42,632][01364] Num frames 3500... [2024-11-07 14:21:43,045][01364] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982 [2024-11-07 14:21:43,049][01364] Avg episode reward: 4.204, avg true_objective: 3.982 [2024-11-07 14:21:43,134][01364] Num frames 3600... [2024-11-07 14:21:43,343][01364] Num frames 3700... [2024-11-07 14:21:43,558][01364] Num frames 3800... [2024-11-07 14:21:43,775][01364] Num frames 3900... 
[2024-11-07 14:21:43,963][01364] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 [2024-11-07 14:21:43,965][01364] Avg episode reward: 4.168, avg true_objective: 3.968 [2024-11-07 14:21:56,432][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 14:22:10,194][01364] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme [2024-11-07 14:22:17,362][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:22:17,364][01364] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-07 14:22:17,367][01364] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 14:22:17,370][01364] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 14:22:17,372][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:22:17,375][01364] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 14:22:17,377][01364] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-07 14:22:17,382][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 14:22:17,384][01364] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-07 14:22:17,385][01364] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-07 14:22:17,388][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 14:22:17,389][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 14:22:17,392][01364] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 14:22:17,396][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-07 14:22:17,398][01364] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 14:22:17,445][01364] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:22:17,450][01364] RunningMeanStd input shape: (1,) [2024-11-07 14:22:17,485][01364] ConvEncoder: input_channels=3 [2024-11-07 14:22:17,580][01364] Conv encoder output size: 512 [2024-11-07 14:22:17,582][01364] Policy head output size: 512 [2024-11-07 14:22:17,619][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:22:18,441][01364] Num frames 100... [2024-11-07 14:22:18,875][01364] Num frames 200... [2024-11-07 14:22:19,235][01364] Num frames 300... [2024-11-07 14:22:19,612][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:22:19,614][01364] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:22:19,694][01364] Num frames 400... [2024-11-07 14:22:20,083][01364] Num frames 500... [2024-11-07 14:22:20,493][01364] Num frames 600... [2024-11-07 14:22:20,960][01364] Num frames 700... [2024-11-07 14:22:21,365][01364] Num frames 800... [2024-11-07 14:22:21,518][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-11-07 14:22:21,519][01364] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-11-07 14:22:21,722][01364] Num frames 900... [2024-11-07 14:22:22,052][01364] Num frames 1000... 
[2024-11-07 14:22:22,332][01364] Num frames 1100... [2024-11-07 14:22:22,612][01364] Num frames 1200... [2024-11-07 14:22:22,926][01364] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267 [2024-11-07 14:22:22,927][01364] Avg episode reward: 4.933, avg true_objective: 4.267 [2024-11-07 14:22:23,001][01364] Num frames 1300... [2024-11-07 14:22:23,320][01364] Num frames 1400... [2024-11-07 14:22:23,606][01364] Num frames 1500... [2024-11-07 14:22:23,880][01364] Num frames 1600... [2024-11-07 14:22:24,135][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-11-07 14:22:24,136][01364] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-11-07 14:22:24,254][01364] Num frames 1700... [2024-11-07 14:22:24,530][01364] Num frames 1800... [2024-11-07 14:22:24,780][01364] Num frames 1900... [2024-11-07 14:22:25,066][01364] Num frames 2000... [2024-11-07 14:22:25,268][01364] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 [2024-11-07 14:22:25,274][01364] Avg episode reward: 4.496, avg true_objective: 4.096 [2024-11-07 14:22:25,413][01364] Num frames 2100... [2024-11-07 14:22:25,642][01364] Num frames 2200... [2024-11-07 14:22:25,905][01364] Num frames 2300... [2024-11-07 14:22:26,208][01364] Num frames 2400... [2024-11-07 14:22:26,349][01364] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2024-11-07 14:22:26,350][01364] Avg episode reward: 4.387, avg true_objective: 4.053 [2024-11-07 14:22:26,528][01364] Num frames 2500... [2024-11-07 14:22:26,795][01364] Num frames 2600... [2024-11-07 14:22:27,084][01364] Num frames 2700... [2024-11-07 14:22:27,366][01364] Num frames 2800... [2024-11-07 14:22:29,207][01364] Num frames 2900... [2024-11-07 14:22:29,387][01364] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206 [2024-11-07 14:22:29,392][01364] Avg episode reward: 4.777, avg true_objective: 4.206 [2024-11-07 14:22:29,605][01364] Num frames 3000... [2024-11-07 14:22:29,954][01364] Num frames 3100... [2024-11-07 14:22:30,304][01364] Num frames 3200... [2024-11-07 14:22:30,773][01364] Num frames 3300... [2024-11-07 14:22:30,997][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-11-07 14:22:30,999][01364] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-11-07 14:22:31,199][01364] Num frames 3400... [2024-11-07 14:22:31,513][01364] Num frames 3500... [2024-11-07 14:22:31,765][01364] Num frames 3600... [2024-11-07 14:22:32,016][01364] Num frames 3700... [2024-11-07 14:22:32,260][01364] Avg episode rewards: #0: 4.751, true rewards: #0: 4.196 [2024-11-07 14:22:32,266][01364] Avg episode reward: 4.751, avg true_objective: 4.196 [2024-11-07 14:22:32,344][01364] Num frames 3800... [2024-11-07 14:22:32,570][01364] Num frames 3900... [2024-11-07 14:22:32,817][01364] Num frames 4000... [2024-11-07 14:22:33,072][01364] Num frames 4100... [2024-11-07 14:22:33,260][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-11-07 14:22:33,261][01364] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-11-07 14:22:46,334][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 14:22:51,976][01364] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme [2024-11-07 14:26:53,545][01364] Environment doom_basic already registered, overwriting... [2024-11-07 14:26:53,548][01364] Environment doom_two_colors_easy already registered, overwriting... 
[2024-11-07 14:26:53,550][01364] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 14:26:53,552][01364] Environment doom_dm already registered, overwriting... [2024-11-07 14:26:53,553][01364] Environment doom_dwango5 already registered, overwriting... [2024-11-07 14:26:53,555][01364] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 14:26:53,556][01364] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 14:26:53,558][01364] Environment doom_my_way_home already registered, overwriting... [2024-11-07 14:26:53,559][01364] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 14:26:53,561][01364] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 14:26:53,564][01364] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 14:26:53,565][01364] Environment doom_health_gathering already registered, overwriting... [2024-11-07 14:26:53,565][01364] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 14:26:53,568][01364] Environment doom_battle already registered, overwriting... [2024-11-07 14:26:53,569][01364] Environment doom_battle2 already registered, overwriting... [2024-11-07 14:26:53,570][01364] Environment doom_duel_bots already registered, overwriting... [2024-11-07 14:26:53,572][01364] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 14:26:53,574][01364] Environment doom_duel already registered, overwriting... [2024-11-07 14:26:53,575][01364] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 14:26:53,577][01364] Environment doom_benchmark already registered, overwriting... [2024-11-07 14:26:53,579][01364] register_encoder_factory: [2024-11-07 14:26:53,595][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:26:53,597][01364] Overriding arg 'env' with value 'LunarLander-v2' passed from command line [2024-11-07 14:26:53,603][01364] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! [2024-11-07 14:26:53,604][01364] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... [2024-11-07 14:26:53,607][01364] Weights and Biases integration disabled [2024-11-07 14:26:53,612][01364] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 14:32:23,905][03851] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... [2024-11-07 14:32:23,907][03851] Rollout worker 0 uses device cpu [2024-11-07 14:32:23,908][03851] Rollout worker 1 uses device cpu [2024-11-07 14:32:23,909][03851] Rollout worker 2 uses device cpu [2024-11-07 14:32:23,911][03851] Rollout worker 3 uses device cpu [2024-11-07 14:32:23,913][03851] Rollout worker 4 uses device cpu [2024-11-07 14:32:23,914][03851] Rollout worker 5 uses device cpu [2024-11-07 14:32:23,916][03851] Rollout worker 6 uses device cpu [2024-11-07 14:32:23,918][03851] Rollout worker 7 uses device cpu [2024-11-07 14:32:23,982][03851] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:32:23,983][03851] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 14:32:24,069][03851] Starting all processes... [2024-11-07 14:32:24,070][03851] Starting process learner_proc0 [2024-11-07 14:32:24,119][03851] Starting all processes... 
[2024-11-07 14:32:24,126][03851] Starting process inference_proc0-0 [2024-11-07 14:32:24,126][03851] Starting process rollout_proc0 [2024-11-07 14:32:24,127][03851] Starting process rollout_proc1 [2024-11-07 14:32:24,128][03851] Starting process rollout_proc2 [2024-11-07 14:32:24,130][03851] Starting process rollout_proc3 [2024-11-07 14:32:24,131][03851] Starting process rollout_proc4 [2024-11-07 14:32:24,132][03851] Starting process rollout_proc5 [2024-11-07 14:32:24,133][03851] Starting process rollout_proc6 [2024-11-07 14:32:24,134][03851] Starting process rollout_proc7 [2024-11-07 14:32:28,240][04103] Worker 3 uses CPU cores [3] [2024-11-07 14:32:28,599][04099] Worker 0 uses CPU cores [0] [2024-11-07 14:32:28,800][04102] Worker 2 uses CPU cores [2] [2024-11-07 14:32:28,820][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:32:28,820][04086] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 14:32:28,984][04086] Num visible devices: 1 [2024-11-07 14:32:29,023][04086] Starting seed is not provided [2024-11-07 14:32:29,023][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:32:29,023][04086] Initializing actor-critic model on device cuda:0 [2024-11-07 14:32:29,024][04086] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:32:29,030][04086] RunningMeanStd input shape: (1,) [2024-11-07 14:32:29,031][04101] Worker 1 uses CPU cores [1] [2024-11-07 14:32:29,065][04086] ConvEncoder: input_channels=3 [2024-11-07 14:32:29,179][04105] Worker 5 uses CPU cores [5] [2024-11-07 14:32:29,418][04086] Conv encoder output size: 512 [2024-11-07 14:32:29,418][04086] Policy head output size: 512 [2024-11-07 14:32:29,488][04086] Created Actor Critic model with architecture: [2024-11-07 14:32:29,488][04086] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=4, bias=True) ) ) [2024-11-07 14:32:29,721][04100] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:32:29,722][04100] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 14:32:29,759][04100] Num visible devices: 1 [2024-11-07 14:32:29,806][04106] Worker 6 uses CPU cores [6] [2024-11-07 14:32:29,878][04107] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 14:32:29,920][04104] Worker 4 uses CPU cores [4] [2024-11-07 14:32:30,421][04086] Using optimizer [2024-11-07 
14:32:31,436][04086] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:32:31,484][04086] Loading model from checkpoint [2024-11-07 14:32:31,485][04086] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=() Traceback (most recent call last): File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init init_model_data = self.learner.init() File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 245, in init self.load_from_checkpoint(self.policy_id) File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 307, in load_from_checkpoint self._load_state(checkpoint_dict, load_progress=load_progress) File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 291, in _load_state self.actor_critic.load_state_dict(checkpoint_dict["model"]) File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict raise RuntimeError( RuntimeError: Error(s) in loading state_dict for ActorCriticSharedWeights: size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]). size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]). [2024-11-07 14:32:31,488][04086] Unhandled exception Error(s) in loading state_dict for ActorCriticSharedWeights: size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]). size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]). in evt loop learner_proc0_evt_loop [2024-11-07 14:32:43,973][03851] Heartbeat connected on Batcher_0 [2024-11-07 14:32:43,982][03851] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 14:32:43,989][03851] Heartbeat connected on RolloutWorker_w0 [2024-11-07 14:32:43,992][03851] Heartbeat connected on RolloutWorker_w1 [2024-11-07 14:32:43,997][03851] Heartbeat connected on RolloutWorker_w2 [2024-11-07 14:32:44,000][03851] Heartbeat connected on RolloutWorker_w3 [2024-11-07 14:32:44,004][03851] Heartbeat connected on RolloutWorker_w4 [2024-11-07 14:32:44,007][03851] Heartbeat connected on RolloutWorker_w5 [2024-11-07 14:32:44,067][03851] Heartbeat connected on RolloutWorker_w6 [2024-11-07 14:32:44,068][03851] Heartbeat connected on RolloutWorker_w7 [2024-11-07 14:34:40,019][03851] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 3851], exiting... [2024-11-07 14:34:40,023][04100] Stopping InferenceWorker_p0-w0... [2024-11-07 14:34:40,023][04106] Stopping RolloutWorker_w6... [2024-11-07 14:34:40,023][04099] Stopping RolloutWorker_w0... 
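The traceback above is a config/checkpoint mismatch, not a corrupt file: the saved policy was trained on doom_health_gathering_supreme, whose action head is Linear(512 -> 5), but because 'env' was overridden to 'LunarLander-v2' (4 discrete actions) the freshly built model got a Linear(512 -> 4) head, so load_state_dict rejects the (5, 512) weight. A quick way to check which action space a checkpoint was trained for (the "model" key is taken from the traceback, the path from the log):

    import torch

    ckpt_path = ("/root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/"
                 "checkpoint_p0/checkpoint_000000986_4038656.pth")
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    for name, tensor in checkpoint["model"].items():
        if "distribution_linear" in name:
            # Prints (5, 512) and (5,) here: a 5-action Doom policy head,
            # incompatible with LunarLander-v2's 4 discrete actions.
            print(name, tuple(tensor.shape))
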
[2024-11-07 14:34:40,023][04106] Loop rollout_proc6_evt_loop terminating... [2024-11-07 14:34:40,023][04100] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 14:34:40,023][04099] Loop rollout_proc0_evt_loop terminating... [2024-11-07 14:34:40,022][04107] Stopping RolloutWorker_w7... [2024-11-07 14:34:40,024][04107] Loop rollout_proc7_evt_loop terminating... [2024-11-07 14:34:40,024][04104] Stopping RolloutWorker_w4... [2024-11-07 14:34:40,026][04104] Loop rollout_proc4_evt_loop terminating... [2024-11-07 14:34:40,023][03851] Runner profile tree view: main_loop: 135.9546 [2024-11-07 14:34:40,027][04086] Stopping Batcher_0... [2024-11-07 14:34:40,028][04086] Loop batcher_evt_loop terminating... [2024-11-07 14:34:40,027][03851] Collected {}, FPS: 0.0 [2024-11-07 14:34:40,029][04102] Stopping RolloutWorker_w2... [2024-11-07 14:34:40,029][04102] Loop rollout_proc2_evt_loop terminating... [2024-11-07 14:34:40,030][04105] Stopping RolloutWorker_w5... [2024-11-07 14:34:40,031][04105] Loop rollout_proc5_evt_loop terminating... [2024-11-07 14:34:40,034][04101] Stopping RolloutWorker_w1... [2024-11-07 14:34:40,040][04101] Loop rollout_proc1_evt_loop terminating... [2024-11-07 14:34:40,042][04103] Stopping RolloutWorker_w3... [2024-11-07 14:34:40,046][04103] Loop rollout_proc3_evt_loop terminating... [2024-11-07 14:35:11,133][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... [2024-11-07 14:35:11,135][04584] Rollout worker 0 uses device cpu [2024-11-07 14:35:11,136][04584] Rollout worker 1 uses device cpu [2024-11-07 14:35:11,138][04584] Rollout worker 2 uses device cpu [2024-11-07 14:35:11,139][04584] Rollout worker 3 uses device cpu [2024-11-07 14:35:11,141][04584] Rollout worker 4 uses device cpu [2024-11-07 14:35:11,142][04584] Rollout worker 5 uses device cpu [2024-11-07 14:35:11,144][04584] Rollout worker 6 uses device cpu [2024-11-07 14:35:11,146][04584] Rollout worker 7 uses device cpu [2024-11-07 14:35:11,205][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:35:11,206][04584] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 14:35:11,239][04584] Starting all processes... [2024-11-07 14:35:11,241][04584] Starting process learner_proc0 [2024-11-07 14:35:11,376][04584] Starting all processes... 
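The relaunch above (process 04584) drops the stray 'env' override, so the model is rebuilt with its original 5-action head (visible in the architecture printed below) and the same checkpoint then loads cleanly, resuming from train_step=986 / env_steps=4038656. A sketch of such a resume, again assuming the Doom envs are registered as in the course notebook (the env-step target is illustrative):

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.train import run_rl

    argv = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",                # matches the 8 rollout workers in the log
        "--train_for_env_steps=8000000",  # illustrative target past the 4M frames already collected
    ]
    parser, _ = parse_sf_args(argv=argv)
    cfg = parse_full_cfg(parser, argv)
    status = run_rl(cfg)  # finds the latest checkpoint in train_dir and resumes
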
[2024-11-07 14:35:11,383][04584] Starting process inference_proc0-0 [2024-11-07 14:35:11,384][04584] Starting process rollout_proc0 [2024-11-07 14:35:11,385][04584] Starting process rollout_proc1 [2024-11-07 14:35:11,385][04584] Starting process rollout_proc2 [2024-11-07 14:35:11,386][04584] Starting process rollout_proc3 [2024-11-07 14:35:11,387][04584] Starting process rollout_proc4 [2024-11-07 14:35:11,388][04584] Starting process rollout_proc5 [2024-11-07 14:35:11,390][04584] Starting process rollout_proc6 [2024-11-07 14:35:11,390][04584] Starting process rollout_proc7 [2024-11-07 14:35:16,833][04708] Worker 6 uses CPU cores [6] [2024-11-07 14:35:16,839][04706] Worker 4 uses CPU cores [4] [2024-11-07 14:35:17,166][04709] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 14:35:17,188][04705] Worker 2 uses CPU cores [2] [2024-11-07 14:35:17,642][04702] Worker 0 uses CPU cores [0] [2024-11-07 14:35:17,899][04707] Worker 5 uses CPU cores [5] [2024-11-07 14:35:17,907][04701] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:35:17,907][04701] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 14:35:17,932][04703] Worker 1 uses CPU cores [1] [2024-11-07 14:35:17,951][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:35:17,952][04688] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 14:35:17,978][04688] Num visible devices: 1 [2024-11-07 14:35:17,978][04701] Num visible devices: 1 [2024-11-07 14:35:17,993][04688] Starting seed is not provided [2024-11-07 14:35:17,993][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:35:17,994][04688] Initializing actor-critic model on device cuda:0 [2024-11-07 14:35:17,994][04688] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:35:17,995][04688] RunningMeanStd input shape: (1,) [2024-11-07 14:35:18,006][04688] ConvEncoder: input_channels=3 [2024-11-07 14:35:18,132][04688] Conv encoder output size: 512 [2024-11-07 14:35:18,133][04688] Policy head output size: 512 [2024-11-07 14:35:18,148][04688] Created Actor Critic model with architecture: [2024-11-07 14:35:18,148][04688] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 14:35:18,249][04704] Worker 3 uses CPU cores [3] [2024-11-07 14:35:18,791][04688] Using optimizer [2024-11-07 
14:35:19,775][04688] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... [2024-11-07 14:35:19,840][04688] Loading model from checkpoint [2024-11-07 14:35:19,842][04688] Loaded experiment state at self.train_step=986, self.env_steps=4038656 [2024-11-07 14:35:19,842][04688] Initialized policy 0 weights for model version 986 [2024-11-07 14:35:19,848][04688] LearnerWorker_p0 finished initialization! [2024-11-07 14:35:19,849][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:35:20,005][04701] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:35:20,006][04701] RunningMeanStd input shape: (1,) [2024-11-07 14:35:20,019][04701] ConvEncoder: input_channels=3 [2024-11-07 14:35:20,124][04701] Conv encoder output size: 512 [2024-11-07 14:35:20,124][04701] Policy head output size: 512 [2024-11-07 14:35:20,165][04584] Inference worker 0-0 is ready! [2024-11-07 14:35:20,167][04584] All inference workers are ready! Signal rollout workers to start! [2024-11-07 14:35:20,253][04705] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,255][04703] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,257][04706] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,259][04707] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,265][04704] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,283][04708] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,317][04709] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,332][04702] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:35:20,647][04705] Decorrelating experience for 0 frames... [2024-11-07 14:35:20,650][04703] Decorrelating experience for 0 frames... [2024-11-07 14:35:20,663][04707] Decorrelating experience for 0 frames... [2024-11-07 14:35:20,681][04708] Decorrelating experience for 0 frames... [2024-11-07 14:35:20,693][04709] Decorrelating experience for 0 frames... [2024-11-07 14:35:20,977][04704] Decorrelating experience for 0 frames... [2024-11-07 14:35:21,024][04708] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,043][04709] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,063][04705] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,064][04707] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,066][04703] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,373][04702] Decorrelating experience for 0 frames... [2024-11-07 14:35:21,407][04704] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,456][04705] Decorrelating experience for 64 frames... [2024-11-07 14:35:21,457][04708] Decorrelating experience for 64 frames... [2024-11-07 14:35:21,525][04703] Decorrelating experience for 64 frames... [2024-11-07 14:35:21,694][04702] Decorrelating experience for 32 frames... [2024-11-07 14:35:21,817][04704] Decorrelating experience for 64 frames... [2024-11-07 14:35:21,843][04705] Decorrelating experience for 96 frames... [2024-11-07 14:35:21,844][04708] Decorrelating experience for 96 frames... [2024-11-07 14:35:22,066][04703] Decorrelating experience for 96 frames... [2024-11-07 14:35:22,068][04706] Decorrelating experience for 0 frames... [2024-11-07 14:35:22,108][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4038656. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:35:22,184][04704] Decorrelating experience for 96 frames... [2024-11-07 14:35:22,274][04702] Decorrelating experience for 64 frames... [2024-11-07 14:35:22,403][04706] Decorrelating experience for 32 frames... [2024-11-07 14:35:22,687][04702] Decorrelating experience for 96 frames... [2024-11-07 14:35:22,831][04707] Decorrelating experience for 64 frames... [2024-11-07 14:35:23,152][04706] Decorrelating experience for 64 frames... [2024-11-07 14:35:23,300][04707] Decorrelating experience for 96 frames... [2024-11-07 14:35:23,313][04709] Decorrelating experience for 64 frames... [2024-11-07 14:35:23,775][04706] Decorrelating experience for 96 frames... [2024-11-07 14:35:24,023][04709] Decorrelating experience for 96 frames... [2024-11-07 14:35:24,784][04688] Signal inference workers to stop experience collection... [2024-11-07 14:35:24,831][04701] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 14:35:27,108][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4038656. Throughput: 0: 546.0. Samples: 2730. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:35:27,109][04584] Avg episode reward: [(0, '2.632')] [2024-11-07 14:35:29,009][04688] Signal inference workers to resume experience collection... [2024-11-07 14:35:29,010][04701] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 14:35:31,197][04584] Heartbeat connected on Batcher_0 [2024-11-07 14:35:31,200][04584] Heartbeat connected on LearnerWorker_p0 [2024-11-07 14:35:31,212][04584] Heartbeat connected on RolloutWorker_w0 [2024-11-07 14:35:31,216][04584] Heartbeat connected on RolloutWorker_w1 [2024-11-07 14:35:31,223][04584] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 14:35:31,227][04584] Heartbeat connected on RolloutWorker_w2 [2024-11-07 14:35:31,235][04584] Heartbeat connected on RolloutWorker_w3 [2024-11-07 14:35:31,238][04584] Heartbeat connected on RolloutWorker_w6 [2024-11-07 14:35:31,244][04584] Heartbeat connected on RolloutWorker_w4 [2024-11-07 14:35:31,250][04584] Heartbeat connected on RolloutWorker_w5 [2024-11-07 14:35:31,255][04584] Heartbeat connected on RolloutWorker_w7 [2024-11-07 14:35:32,108][04584] Fps is (10 sec: 2867.1, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4067328. Throughput: 0: 299.6. Samples: 2996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-07 14:35:32,111][04584] Avg episode reward: [(0, '3.943')] [2024-11-07 14:35:35,402][04701] Updated weights for policy 0, policy_version 996 (0.0025) [2024-11-07 14:35:37,108][04584] Fps is (10 sec: 5324.7, 60 sec: 3549.8, 300 sec: 3549.8). Total num frames: 4091904. Throughput: 0: 700.9. Samples: 10514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-07 14:35:37,112][04584] Avg episode reward: [(0, '4.473')] [2024-11-07 14:35:40,471][04701] Updated weights for policy 0, policy_version 1006 (0.0023) [2024-11-07 14:35:42,108][04584] Fps is (10 sec: 6553.5, 60 sec: 4710.3, 300 sec: 4710.3). Total num frames: 4132864. Throughput: 0: 1134.7. Samples: 22694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:35:42,112][04584] Avg episode reward: [(0, '4.427')] [2024-11-07 14:35:45,559][04701] Updated weights for policy 0, policy_version 1016 (0.0033) [2024-11-07 14:35:47,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5406.7, 300 sec: 5406.7). Total num frames: 4173824. Throughput: 0: 1148.1. Samples: 28702. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:35:47,110][04584] Avg episode reward: [(0, '4.231')] [2024-11-07 14:35:50,462][04701] Updated weights for policy 0, policy_version 1026 (0.0025) [2024-11-07 14:35:52,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5870.9, 300 sec: 5870.9). Total num frames: 4214784. Throughput: 0: 1364.1. Samples: 40924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-07 14:35:52,111][04584] Avg episode reward: [(0, '4.551')] [2024-11-07 14:35:55,421][04701] Updated weights for policy 0, policy_version 1036 (0.0029) [2024-11-07 14:35:57,108][04584] Fps is (10 sec: 8192.0, 60 sec: 6202.5, 300 sec: 6202.5). Total num frames: 4255744. Throughput: 0: 1530.4. Samples: 53564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 14:35:57,109][04584] Avg episode reward: [(0, '4.498')] [2024-11-07 14:36:00,664][04701] Updated weights for policy 0, policy_version 1046 (0.0037) [2024-11-07 14:36:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6348.7, 300 sec: 6348.7). Total num frames: 4292608. Throughput: 0: 1490.7. Samples: 59628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 14:36:02,118][04584] Avg episode reward: [(0, '4.486')] [2024-11-07 14:36:07,108][04584] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6280.5). Total num frames: 4321280. Throughput: 0: 1538.8. Samples: 69244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-07 14:36:07,110][04584] Avg episode reward: [(0, '4.349')] [2024-11-07 14:36:07,113][04701] Updated weights for policy 0, policy_version 1056 (0.0030) [2024-11-07 14:36:12,109][04584] Fps is (10 sec: 4915.4, 60 sec: 6062.0, 300 sec: 6062.0). Total num frames: 4341760. Throughput: 0: 1618.7. Samples: 75570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-07 14:36:12,111][04584] Avg episode reward: [(0, '4.336')] [2024-11-07 14:36:15,164][04701] Updated weights for policy 0, policy_version 1066 (0.0029) [2024-11-07 14:36:17,109][04584] Fps is (10 sec: 5734.2, 60 sec: 6181.2, 300 sec: 6181.2). Total num frames: 4378624. Throughput: 0: 1729.5. Samples: 80826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 14:36:17,114][04584] Avg episode reward: [(0, '4.405')] [2024-11-07 14:36:20,267][04701] Updated weights for policy 0, policy_version 1076 (0.0026) [2024-11-07 14:36:22,108][04584] Fps is (10 sec: 7782.7, 60 sec: 6348.8, 300 sec: 6348.8). Total num frames: 4419584. Throughput: 0: 1830.0. Samples: 92864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 14:36:22,111][04584] Avg episode reward: [(0, '4.417')] [2024-11-07 14:36:25,223][04701] Updated weights for policy 0, policy_version 1086 (0.0024) [2024-11-07 14:36:27,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7031.5, 300 sec: 6490.6). Total num frames: 4460544. Throughput: 0: 1831.8. Samples: 105126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 14:36:27,110][04584] Avg episode reward: [(0, '4.564')] [2024-11-07 14:36:30,159][04701] Updated weights for policy 0, policy_version 1096 (0.0032) [2024-11-07 14:36:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 6612.1). Total num frames: 4501504. Throughput: 0: 1834.5. Samples: 111254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:36:32,110][04584] Avg episode reward: [(0, '4.676')] [2024-11-07 14:36:35,225][04701] Updated weights for policy 0, policy_version 1106 (0.0022) [2024-11-07 14:36:37,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 6717.4). Total num frames: 4542464. Throughput: 0: 1839.1. Samples: 123682. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-07 14:36:37,110][04584] Avg episode reward: [(0, '4.525')] [2024-11-07 14:36:40,134][04701] Updated weights for policy 0, policy_version 1116 (0.0024) [2024-11-07 14:36:43,724][04584] Fps is (10 sec: 6699.8, 60 sec: 7246.0, 300 sec: 6624.6). Total num frames: 4579328. Throughput: 0: 1768.0. Samples: 135982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 14:36:43,725][04584] Avg episode reward: [(0, '4.556')] [2024-11-07 14:36:47,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7236.2, 300 sec: 6698.1). Total num frames: 4608000. Throughput: 0: 1733.3. Samples: 137624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-07 14:36:47,113][04584] Avg episode reward: [(0, '4.348')] [2024-11-07 14:36:47,253][04701] Updated weights for policy 0, policy_version 1126 (0.0024) [2024-11-07 14:36:52,108][04584] Fps is (10 sec: 8305.3, 60 sec: 7236.3, 300 sec: 6781.2). Total num frames: 4648960. Throughput: 0: 1783.3. Samples: 149492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-07 14:36:52,109][04584] Avg episode reward: [(0, '4.319')] [2024-11-07 14:36:52,633][04701] Updated weights for policy 0, policy_version 1136 (0.0028) [2024-11-07 14:36:57,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 6812.3). Total num frames: 4685824. Throughput: 0: 1909.7. Samples: 161508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:36:57,110][04584] Avg episode reward: [(0, '4.504')] [2024-11-07 14:36:57,814][04701] Updated weights for policy 0, policy_version 1146 (0.0030) [2024-11-07 14:37:02,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7168.1, 300 sec: 6840.3). Total num frames: 4722688. Throughput: 0: 1923.7. Samples: 167394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:37:02,110][04584] Avg episode reward: [(0, '4.470')] [2024-11-07 14:37:03,385][04701] Updated weights for policy 0, policy_version 1156 (0.0031) [2024-11-07 14:37:07,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 6904.7). Total num frames: 4763648. Throughput: 0: 1904.9. Samples: 178584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:37:07,111][04584] Avg episode reward: [(0, '4.431')] [2024-11-07 14:37:07,126][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth... [2024-11-07 14:37:07,332][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth [2024-11-07 14:37:08,663][04701] Updated weights for policy 0, policy_version 1166 (0.0024) [2024-11-07 14:37:12,108][04584] Fps is (10 sec: 7372.8, 60 sec: 7577.6, 300 sec: 6888.7). Total num frames: 4796416. Throughput: 0: 1873.3. Samples: 189424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:37:12,110][04584] Avg episode reward: [(0, '4.392')] [2024-11-07 14:37:14,273][04701] Updated weights for policy 0, policy_version 1176 (0.0024) [2024-11-07 14:37:18,138][04584] Fps is (10 sec: 5941.6, 60 sec: 7382.6, 300 sec: 6813.1). Total num frames: 4829184. Throughput: 0: 1821.9. Samples: 195116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-07 14:37:18,142][04584] Avg episode reward: [(0, '4.604')] [2024-11-07 14:37:21,528][04701] Updated weights for policy 0, policy_version 1186 (0.0027) [2024-11-07 14:37:22,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7372.8, 300 sec: 6860.8). Total num frames: 4861952. Throughput: 0: 1756.1. Samples: 202708. 
[2024-11-07 14:37:22,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7372.8, 300 sec: 6860.8). Total num frames: 4861952. Throughput: 0: 1756.1. Samples: 202708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:37:22,110][04584] Avg episode reward: [(0, '4.486')]
[2024-11-07 14:37:26,690][04701] Updated weights for policy 0, policy_version 1196 (0.0022)
[2024-11-07 14:37:27,108][04584] Fps is (10 sec: 7763.0, 60 sec: 7304.5, 300 sec: 6881.3). Total num frames: 4898816. Throughput: 0: 1816.0. Samples: 214768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:37:27,110][04584] Avg episode reward: [(0, '4.418')]
[2024-11-07 14:37:31,942][04701] Updated weights for policy 0, policy_version 1206 (0.0030)
[2024-11-07 14:37:32,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6931.7). Total num frames: 4939776. Throughput: 0: 1847.4. Samples: 220758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 14:37:32,110][04584] Avg episode reward: [(0, '4.316')]
[2024-11-07 14:37:36,986][04701] Updated weights for policy 0, policy_version 1216 (0.0022)
[2024-11-07 14:37:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7304.6, 300 sec: 6978.4). Total num frames: 4980736. Throughput: 0: 1848.0. Samples: 232650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:37:37,109][04584] Avg episode reward: [(0, '4.397')]
[2024-11-07 14:37:41,959][04701] Updated weights for policy 0, policy_version 1226 (0.0024)
[2024-11-07 14:37:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7576.7, 300 sec: 7021.7). Total num frames: 5021696. Throughput: 0: 1853.3. Samples: 244906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:37:42,111][04584] Avg episode reward: [(0, '4.489')]
[2024-11-07 14:37:46,940][04701] Updated weights for policy 0, policy_version 1236 (0.0028)
[2024-11-07 14:37:47,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7577.6, 300 sec: 7062.1). Total num frames: 5062656. Throughput: 0: 1859.2. Samples: 251056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:37:47,110][04584] Avg episode reward: [(0, '4.449')]
[2024-11-07 14:37:52,554][04584] Fps is (10 sec: 6274.2, 60 sec: 7250.6, 300 sec: 6969.8). Total num frames: 5087232. Throughput: 0: 1865.9. Samples: 263382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:37:52,559][04584] Avg episode reward: [(0, '4.337')]
[2024-11-07 14:37:54,172][04701] Updated weights for policy 0, policy_version 1246 (0.0027)
[2024-11-07 14:37:57,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7304.5, 300 sec: 7002.8). Total num frames: 5124096. Throughput: 0: 1812.0. Samples: 270964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:37:57,110][04584] Avg episode reward: [(0, '4.357')]
[2024-11-07 14:37:59,230][04701] Updated weights for policy 0, policy_version 1256 (0.0026)
[2024-11-07 14:38:02,108][04584] Fps is (10 sec: 8145.9, 60 sec: 7372.8, 300 sec: 7040.0). Total num frames: 5165056. Throughput: 0: 1866.2. Samples: 277174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:38:02,109][04584] Avg episode reward: [(0, '4.437')]
[2024-11-07 14:38:04,744][04701] Updated weights for policy 0, policy_version 1266 (0.0023)
[2024-11-07 14:38:07,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 7050.1). Total num frames: 5201920. Throughput: 0: 1901.1. Samples: 288256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:38:07,110][04584] Avg episode reward: [(0, '4.331')]
[2024-11-07 14:38:09,768][04701] Updated weights for policy 0, policy_version 1276 (0.0026)
[2024-11-07 14:38:12,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7441.1, 300 sec: 7083.7). Total num frames: 5242880. Throughput: 0: 1908.4. Samples: 300644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:38:12,110][04584] Avg episode reward: [(0, '4.525')]
[2024-11-07 14:38:14,720][04701] Updated weights for policy 0, policy_version 1286 (0.0019)
[2024-11-07 14:38:17,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7710.0, 300 sec: 7115.3). Total num frames: 5283840. Throughput: 0: 1914.2. Samples: 306896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:38:17,110][04584] Avg episode reward: [(0, '4.575')]
[2024-11-07 14:38:19,757][04701] Updated weights for policy 0, policy_version 1296 (0.0029)
[2024-11-07 14:38:22,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7714.2, 300 sec: 7145.2). Total num frames: 5324800. Throughput: 0: 1920.4. Samples: 319070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:38:22,112][04584] Avg episode reward: [(0, '4.339')]
[2024-11-07 14:38:24,651][04701] Updated weights for policy 0, policy_version 1306 (0.0030)
[2024-11-07 14:38:27,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7509.3, 300 sec: 7085.0). Total num frames: 5349376. Throughput: 0: 1862.0. Samples: 328694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:38:27,110][04584] Avg episode reward: [(0, '4.349')]
[2024-11-07 14:38:31,931][04701] Updated weights for policy 0, policy_version 1316 (0.0027)
[2024-11-07 14:38:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7509.4, 300 sec: 7114.1). Total num frames: 5390336. Throughput: 0: 1825.5. Samples: 333202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:38:32,110][04584] Avg episode reward: [(0, '4.554')]
[2024-11-07 14:38:36,771][04701] Updated weights for policy 0, policy_version 1326 (0.0026)
[2024-11-07 14:38:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7509.3, 300 sec: 7141.7). Total num frames: 5431296. Throughput: 0: 1844.8. Samples: 345574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:38:37,111][04584] Avg episode reward: [(0, '4.685')]
[2024-11-07 14:38:41,807][04701] Updated weights for policy 0, policy_version 1336 (0.0021)
[2024-11-07 14:38:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7509.4, 300 sec: 7168.0). Total num frames: 5472256. Throughput: 0: 1933.9. Samples: 357990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:38:42,111][04584] Avg episode reward: [(0, '4.322')]
[2024-11-07 14:38:46,713][04701] Updated weights for policy 0, policy_version 1346 (0.0026)
[2024-11-07 14:38:47,109][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 7193.0). Total num frames: 5513216. Throughput: 0: 1934.4. Samples: 364222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:38:47,112][04584] Avg episode reward: [(0, '4.330')]
[2024-11-07 14:38:51,629][04701] Updated weights for policy 0, policy_version 1356 (0.0022)
[2024-11-07 14:38:52,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7840.7, 300 sec: 7216.8). Total num frames: 5554176. Throughput: 0: 1963.5. Samples: 376612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:38:52,110][04584] Avg episode reward: [(0, '4.316')]
[2024-11-07 14:38:56,630][04701] Updated weights for policy 0, policy_version 1366 (0.0025)
[2024-11-07 14:38:57,109][04584] Fps is (10 sec: 8192.0, 60 sec: 7850.7, 300 sec: 7239.4). Total num frames: 5595136. Throughput: 0: 1961.9. Samples: 388932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:38:57,111][04584] Avg episode reward: [(0, '4.435')]
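The "Policy #0 lag" triple summarizes how stale the training data is: for each trajectory in a batch, the gap between the learner's current policy_version and the version that collected it (bounded by max_policy_lag=1000 in the configuration dumped near the end of this log). A toy reconstruction with hypothetical version numbers:

    def policy_lag_stats(current_version, trajectory_versions):
        # one policy version per trajectory in the training batch
        lags = [current_version - v for v in trajectory_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # a batch collected under versions 1304..1306 while the learner is at 1306:
    print(policy_lag_stats(1306, [1306, 1305, 1306, 1304]))  # (0, 0.75, 2)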
[2024-11-07 14:39:02,108][04584] Fps is (10 sec: 6553.8, 60 sec: 7577.6, 300 sec: 7186.6). Total num frames: 5619712. Throughput: 0: 1958.1. Samples: 395012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:39:02,110][04584] Avg episode reward: [(0, '4.515')]
[2024-11-07 14:39:04,078][04701] Updated weights for policy 0, policy_version 1376 (0.0027)
[2024-11-07 14:39:07,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7577.6, 300 sec: 7190.7). Total num frames: 5656576. Throughput: 0: 1847.9. Samples: 402224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:07,110][04584] Avg episode reward: [(0, '4.473')]
[2024-11-07 14:39:07,120][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth...
[2024-11-07 14:39:07,214][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth
[2024-11-07 14:39:09,137][04701] Updated weights for policy 0, policy_version 1386 (0.0028)
[2024-11-07 14:39:12,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7577.6, 300 sec: 7212.5). Total num frames: 5697536. Throughput: 0: 1907.0. Samples: 414508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:12,111][04584] Avg episode reward: [(0, '4.401')]
[2024-11-07 14:39:14,203][04701] Updated weights for policy 0, policy_version 1396 (0.0024)
[2024-11-07 14:39:17,110][04584] Fps is (10 sec: 8600.6, 60 sec: 7645.7, 300 sec: 7250.7). Total num frames: 5742592. Throughput: 0: 1942.3. Samples: 420606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:17,112][04584] Avg episode reward: [(0, '4.251')]
[2024-11-07 14:39:19,131][04701] Updated weights for policy 0, policy_version 1406 (0.0027)
[2024-11-07 14:39:22,108][04584] Fps is (10 sec: 8601.8, 60 sec: 7645.9, 300 sec: 7270.4). Total num frames: 5783552. Throughput: 0: 1941.7. Samples: 432950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:22,110][04584] Avg episode reward: [(0, '4.408')]
[2024-11-07 14:39:24,033][04701] Updated weights for policy 0, policy_version 1416 (0.0026)
[2024-11-07 14:39:27,108][04584] Fps is (10 sec: 8193.0, 60 sec: 7918.9, 300 sec: 7289.2). Total num frames: 5824512. Throughput: 0: 1942.5. Samples: 445400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:39:27,112][04584] Avg episode reward: [(0, '4.368')]
[2024-11-07 14:39:29,061][04701] Updated weights for policy 0, policy_version 1426 (0.0028)
[2024-11-07 14:39:32,109][04584] Fps is (10 sec: 8191.0, 60 sec: 7918.8, 300 sec: 7307.2). Total num frames: 5865472. Throughput: 0: 1942.4. Samples: 451632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:39:32,112][04584] Avg episode reward: [(0, '4.311')]
[2024-11-07 14:39:36,164][04701] Updated weights for policy 0, policy_version 1436 (0.0029)
[2024-11-07 14:39:37,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7577.6, 300 sec: 7244.3). Total num frames: 5885952. Throughput: 0: 1871.7. Samples: 460840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:37,110][04584] Avg episode reward: [(0, '4.291')]
[2024-11-07 14:39:41,153][04701] Updated weights for policy 0, policy_version 1446 (0.0026)
[2024-11-07 14:39:42,108][04584] Fps is (10 sec: 6144.7, 60 sec: 7577.7, 300 sec: 7262.5). Total num frames: 5926912. Throughput: 0: 1838.3. Samples: 471654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:39:42,110][04584] Avg episode reward: [(0, '4.399')]
[2024-11-07 14:39:46,287][04701] Updated weights for policy 0, policy_version 1456 (0.0024)
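The Saving/Removing pairs above implement rolling checkpoint retention: with keep_checkpoints=2 (see the configuration later in the log), each new save is followed by deleting the oldest surplus checkpoint. A hypothetical helper with the same file-naming scheme (nine-digit zero-padded policy version, then env steps), not the library's own code:

    from pathlib import Path
    import torch

    def save_with_rotation(ckpt_dir, policy_version, env_steps, state, keep_checkpoints=2):
        ckpt_dir = Path(ckpt_dir)
        path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        torch.save(state, path)  # logged as "Saving .../checkpoint_...pth..."
        # zero-padding makes lexicographic order chronological
        for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_checkpoints]:
            old.unlink()         # logged as "Removing .../checkpoint_...pth"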
[2024-11-07 14:39:47,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7577.6, 300 sec: 7280.0). Total num frames: 5967872. Throughput: 0: 1836.7. Samples: 477666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:39:47,111][04584] Avg episode reward: [(0, '4.391')]
[2024-11-07 14:39:51,647][04701] Updated weights for policy 0, policy_version 1466 (0.0024)
[2024-11-07 14:39:52,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7577.6, 300 sec: 7296.9). Total num frames: 6008832. Throughput: 0: 1936.8. Samples: 489382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:39:52,110][04584] Avg episode reward: [(0, '4.306')]
[2024-11-07 14:39:56,638][04701] Updated weights for policy 0, policy_version 1476 (0.0027)
[2024-11-07 14:39:57,109][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.3, 300 sec: 7298.3). Total num frames: 6045696. Throughput: 0: 1930.7. Samples: 501392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:39:57,111][04584] Avg episode reward: [(0, '4.586')]
[2024-11-07 14:40:01,928][04701] Updated weights for policy 0, policy_version 1486 (0.0022)
[2024-11-07 14:40:02,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7782.4, 300 sec: 7314.3). Total num frames: 6086656. Throughput: 0: 1933.2. Samples: 507596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:02,111][04584] Avg episode reward: [(0, '4.157')]
[2024-11-07 14:40:07,045][04701] Updated weights for policy 0, policy_version 1496 (0.0028)
[2024-11-07 14:40:07,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7850.7, 300 sec: 7329.7). Total num frames: 6127616. Throughput: 0: 1921.9. Samples: 519434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:07,110][04584] Avg episode reward: [(0, '4.650')]
[2024-11-07 14:40:12,108][04584] Fps is (10 sec: 5734.3, 60 sec: 7441.1, 300 sec: 7259.8). Total num frames: 6144000. Throughput: 0: 1798.1. Samples: 526316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:40:12,115][04584] Avg episode reward: [(0, '4.644')]
[2024-11-07 14:40:15,781][04701] Updated weights for policy 0, policy_version 1506 (0.0037)
[2024-11-07 14:40:17,108][04584] Fps is (10 sec: 4915.2, 60 sec: 7236.4, 300 sec: 7247.8). Total num frames: 6176768. Throughput: 0: 1762.2. Samples: 530928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:40:17,111][04584] Avg episode reward: [(0, '4.390')]
[2024-11-07 14:40:21,601][04701] Updated weights for policy 0, policy_version 1516 (0.0034)
[2024-11-07 14:40:22,108][04584] Fps is (10 sec: 6553.7, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6209536. Throughput: 0: 1769.1. Samples: 540450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:22,110][04584] Avg episode reward: [(0, '4.347')]
[2024-11-07 14:40:26,748][04701] Updated weights for policy 0, policy_version 1526 (0.0027)
[2024-11-07 14:40:27,109][04584] Fps is (10 sec: 7372.5, 60 sec: 7099.7, 300 sec: 7400.6). Total num frames: 6250496. Throughput: 0: 1796.7. Samples: 552504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:40:27,110][04584] Avg episode reward: [(0, '4.378')]
[2024-11-07 14:40:31,804][04701] Updated weights for policy 0, policy_version 1536 (0.0030)
[2024-11-07 14:40:32,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7099.9, 300 sec: 7456.1). Total num frames: 6291456. Throughput: 0: 1797.6. Samples: 558558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:40:32,111][04584] Avg episode reward: [(0, '4.402')]
[2024-11-07 14:40:36,996][04701] Updated weights for policy 0, policy_version 1546 (0.0025)
[2024-11-07 14:40:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7441.1, 300 sec: 7456.1). Total num frames: 6332416. Throughput: 0: 1804.4. Samples: 570582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:40:37,110][04584] Avg episode reward: [(0, '4.223')]
[2024-11-07 14:40:42,112][04584] Fps is (10 sec: 7779.5, 60 sec: 7372.4, 300 sec: 7442.1). Total num frames: 6369280. Throughput: 0: 1800.2. Samples: 582408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:42,115][04584] Avg episode reward: [(0, '4.626')]
[2024-11-07 14:40:42,246][04701] Updated weights for policy 0, policy_version 1556 (0.0021)
[2024-11-07 14:40:47,109][04584] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6389760. Throughput: 0: 1715.1. Samples: 584778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:47,111][04584] Avg episode reward: [(0, '4.667')]
[2024-11-07 14:40:50,289][04701] Updated weights for policy 0, policy_version 1566 (0.0036)
[2024-11-07 14:40:52,108][04584] Fps is (10 sec: 5736.4, 60 sec: 6963.2, 300 sec: 7358.9). Total num frames: 6426624. Throughput: 0: 1668.8. Samples: 594530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:40:52,111][04584] Avg episode reward: [(0, '4.452')]
[2024-11-07 14:40:55,426][04701] Updated weights for policy 0, policy_version 1576 (0.0025)
[2024-11-07 14:40:57,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6467584. Throughput: 0: 1779.1. Samples: 606376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:40:57,110][04584] Avg episode reward: [(0, '4.567')]
[2024-11-07 14:41:00,789][04701] Updated weights for policy 0, policy_version 1586 (0.0028)
[2024-11-07 14:41:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6963.1, 300 sec: 7400.5). Total num frames: 6504448. Throughput: 0: 1799.3. Samples: 611900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:41:02,112][04584] Avg episode reward: [(0, '4.513')]
[2024-11-07 14:41:06,194][04701] Updated weights for policy 0, policy_version 1596 (0.0034)
[2024-11-07 14:41:07,108][04584] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 7456.1). Total num frames: 6541312. Throughput: 0: 1839.9. Samples: 623244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:07,110][04584] Avg episode reward: [(0, '4.449')]
[2024-11-07 14:41:07,219][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth...
[2024-11-07 14:41:07,348][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth
[2024-11-07 14:41:11,911][04701] Updated weights for policy 0, policy_version 1606 (0.0033)
[2024-11-07 14:41:12,108][04584] Fps is (10 sec: 7373.5, 60 sec: 7236.3, 300 sec: 7456.1). Total num frames: 6578176. Throughput: 0: 1821.6. Samples: 634474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:12,110][04584] Avg episode reward: [(0, '4.328')]
[2024-11-07 14:41:18,681][04584] Fps is (10 sec: 6370.7, 60 sec: 7117.9, 300 sec: 7402.7). Total num frames: 6615040. Throughput: 0: 1743.5. Samples: 639758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:18,683][04584] Avg episode reward: [(0, '4.432')]
[2024-11-07 14:41:18,837][04701] Updated weights for policy 0, policy_version 1616 (0.0024)
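Each "Updated weights" line is the inference worker picking up fresh parameters from the learner; the parenthesized figure is presumably the time the copy took in seconds (an assumption, the log does not label it). A sketch of that polling pattern, with param_server as a hypothetical shared structure:

    import time

    def maybe_refresh_weights(model, param_server, last_version):
        # param_server: hypothetical shared dict {"policy_version": int, "weights": state_dict}
        version = param_server["policy_version"]
        if version > last_version:
            start = time.monotonic()
            model.load_state_dict(param_server["weights"])
            took = time.monotonic() - start  # assumed meaning of the "(0.00xx)" suffix
            print(f"Updated weights for policy 0, policy_version {version} ({took:.4f})")
            return version
        return last_version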
[2024-11-07 14:41:22,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6639616. Throughput: 0: 1727.3. Samples: 648310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:22,112][04584] Avg episode reward: [(0, '4.391')]
[2024-11-07 14:41:24,288][04701] Updated weights for policy 0, policy_version 1626 (0.0036)
[2024-11-07 14:41:27,108][04584] Fps is (10 sec: 7777.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6680576. Throughput: 0: 1725.2. Samples: 660036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:27,110][04584] Avg episode reward: [(0, '4.320')]
[2024-11-07 14:41:29,172][04701] Updated weights for policy 0, policy_version 1636 (0.0028)
[2024-11-07 14:41:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6721536. Throughput: 0: 1813.1. Samples: 666366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:41:32,110][04584] Avg episode reward: [(0, '4.490')]
[2024-11-07 14:41:34,208][04701] Updated weights for policy 0, policy_version 1646 (0.0023)
[2024-11-07 14:41:37,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7441.3). Total num frames: 6762496. Throughput: 0: 1863.7. Samples: 678394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 14:41:37,109][04584] Avg episode reward: [(0, '4.537')]
[2024-11-07 14:41:39,513][04701] Updated weights for policy 0, policy_version 1656 (0.0026)
[2024-11-07 14:41:42,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7168.5, 300 sec: 7428.3). Total num frames: 6799360. Throughput: 0: 1856.8. Samples: 689932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:41:42,110][04584] Avg episode reward: [(0, '4.371')]
[2024-11-07 14:41:44,930][04701] Updated weights for policy 0, policy_version 1666 (0.0028)
[2024-11-07 14:41:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.4, 300 sec: 7428.3). Total num frames: 6840320. Throughput: 0: 1863.2. Samples: 695744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:41:47,110][04584] Avg episode reward: [(0, '4.330')]
[2024-11-07 14:41:50,403][04701] Updated weights for policy 0, policy_version 1676 (0.0030)
[2024-11-07 14:41:53,381][04584] Fps is (10 sec: 6177.0, 60 sec: 7219.7, 300 sec: 7368.8). Total num frames: 6868992. Throughput: 0: 1810.9. Samples: 707038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:41:53,382][04584] Avg episode reward: [(0, '4.407')]
[2024-11-07 14:41:57,108][04584] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7372.8). Total num frames: 6897664. Throughput: 0: 1764.1. Samples: 713860. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:41:57,110][04584] Avg episode reward: [(0, '4.690')]
[2024-11-07 14:41:58,056][04701] Updated weights for policy 0, policy_version 1686 (0.0034)
[2024-11-07 14:42:02,109][04584] Fps is (10 sec: 7509.1, 60 sec: 7168.1, 300 sec: 7358.9). Total num frames: 6934528. Throughput: 0: 1843.5. Samples: 719816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:02,110][04584] Avg episode reward: [(0, '4.442')]
[2024-11-07 14:42:03,920][04701] Updated weights for policy 0, policy_version 1696 (0.0041)
[2024-11-07 14:42:07,108][04584] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6967296. Throughput: 0: 1819.1. Samples: 730168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:07,111][04584] Avg episode reward: [(0, '4.615')]
[2024-11-07 14:42:09,344][04701] Updated weights for policy 0, policy_version 1706 (0.0034)
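Throughput and FPS track different units: throughput appears to count policy samples (transitions) per second, while FPS counts environment frames, and with env_frameskip=4 each sample advances roughly four frames. Checking that reading against the 14:41:32 tick above:

    throughput = 1813.1    # reported samples per second
    env_frameskip = 4      # from the config dump near the end of this log
    print(throughput * env_frameskip)  # 7252.4, in line with the reported 60-sec FPS of 7168.0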
[2024-11-07 14:42:12,108][04584] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 7398.6). Total num frames: 7004160. Throughput: 0: 1801.8. Samples: 741116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:12,111][04584] Avg episode reward: [(0, '4.542')]
[2024-11-07 14:42:15,345][04701] Updated weights for policy 0, policy_version 1716 (0.0029)
[2024-11-07 14:42:17,109][04584] Fps is (10 sec: 7372.2, 60 sec: 7290.8, 300 sec: 7386.7). Total num frames: 7041024. Throughput: 0: 1774.6. Samples: 746224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:17,130][04584] Avg episode reward: [(0, '4.596')]
[2024-11-07 14:42:20,616][04701] Updated weights for policy 0, policy_version 1726 (0.0025)
[2024-11-07 14:42:22,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 7386.7). Total num frames: 7077888. Throughput: 0: 1764.3. Samples: 757786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:42:22,110][04584] Avg episode reward: [(0, '4.395')]
[2024-11-07 14:42:27,773][04584] Fps is (10 sec: 6145.8, 60 sec: 7022.0, 300 sec: 7328.5). Total num frames: 7106560. Throughput: 0: 1616.7. Samples: 763760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:27,774][04584] Avg episode reward: [(0, '4.528')]
[2024-11-07 14:42:27,814][04701] Updated weights for policy 0, policy_version 1736 (0.0034)
[2024-11-07 14:42:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7031.4, 300 sec: 7331.1). Total num frames: 7143424. Throughput: 0: 1680.9. Samples: 771384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:32,113][04584] Avg episode reward: [(0, '4.745')]
[2024-11-07 14:42:33,100][04701] Updated weights for policy 0, policy_version 1746 (0.0023)
[2024-11-07 14:42:37,108][04584] Fps is (10 sec: 8336.2, 60 sec: 7031.4, 300 sec: 7331.2). Total num frames: 7184384. Throughput: 0: 1741.0. Samples: 783168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:42:37,110][04584] Avg episode reward: [(0, '4.300')]
[2024-11-07 14:42:38,140][04701] Updated weights for policy 0, policy_version 1756 (0.0034)
[2024-11-07 14:42:42,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7303.4). Total num frames: 7217152. Throughput: 0: 1784.7. Samples: 794170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 14:42:42,110][04584] Avg episode reward: [(0, '4.610')]
[2024-11-07 14:42:43,824][04701] Updated weights for policy 0, policy_version 1766 (0.0026)
[2024-11-07 14:42:47,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7370.1). Total num frames: 7258112. Throughput: 0: 1789.5. Samples: 800344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 14:42:47,110][04584] Avg episode reward: [(0, '4.393')]
[2024-11-07 14:42:49,006][04701] Updated weights for policy 0, policy_version 1776 (0.0030)
[2024-11-07 14:42:52,109][04584] Fps is (10 sec: 7781.6, 60 sec: 7253.5, 300 sec: 7358.9). Total num frames: 7294976. Throughput: 0: 1817.0. Samples: 811934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:42:52,113][04584] Avg episode reward: [(0, '4.440')]
[2024-11-07 14:42:54,207][04701] Updated weights for policy 0, policy_version 1786 (0.0029)
[2024-11-07 14:42:57,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7304.5, 300 sec: 7358.9). Total num frames: 7335936. Throughput: 0: 1841.5. Samples: 823984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:42:57,110][04584] Avg episode reward: [(0, '4.300')]
[2024-11-07 14:42:59,392][04701] Updated weights for policy 0, policy_version 1796 (0.0028)
[2024-11-07 14:43:02,199][04584] Fps is (10 sec: 6495.4, 60 sec: 7089.1, 300 sec: 7315.0). Total num frames: 7360512. Throughput: 0: 1856.0. Samples: 829910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:43:02,202][04584] Avg episode reward: [(0, '4.381')]
[2024-11-07 14:43:06,958][04701] Updated weights for policy 0, policy_version 1806 (0.0026)
[2024-11-07 14:43:07,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7303.4). Total num frames: 7397376. Throughput: 0: 1761.0. Samples: 837030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:43:07,110][04584] Avg episode reward: [(0, '4.652')]
[2024-11-07 14:43:07,127][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001806_7397376.pth...
[2024-11-07 14:43:07,318][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth
[2024-11-07 14:43:12,082][04701] Updated weights for policy 0, policy_version 1816 (0.0029)
[2024-11-07 14:43:12,108][04584] Fps is (10 sec: 7853.5, 60 sec: 7236.3, 300 sec: 7303.4). Total num frames: 7438336. Throughput: 0: 1917.2. Samples: 848760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:12,110][04584] Avg episode reward: [(0, '4.544')]
[2024-11-07 14:43:17,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7236.4, 300 sec: 7289.5). Total num frames: 7475200. Throughput: 0: 1853.6. Samples: 854794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:43:17,111][04584] Avg episode reward: [(0, '4.361')]
[2024-11-07 14:43:17,153][04701] Updated weights for policy 0, policy_version 1826 (0.0025)
[2024-11-07 14:43:22,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7345.0). Total num frames: 7516160. Throughput: 0: 1864.3. Samples: 867062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:43:22,112][04584] Avg episode reward: [(0, '4.248')]
[2024-11-07 14:43:22,238][04701] Updated weights for policy 0, policy_version 1836 (0.0026)
[2024-11-07 14:43:27,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7593.4, 300 sec: 7345.0). Total num frames: 7557120. Throughput: 0: 1885.3. Samples: 879010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:27,111][04584] Avg episode reward: [(0, '4.410')]
[2024-11-07 14:43:27,492][04701] Updated weights for policy 0, policy_version 1846 (0.0023)
[2024-11-07 14:43:32,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7509.3, 300 sec: 7331.1). Total num frames: 7593984. Throughput: 0: 1875.4. Samples: 884738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:43:32,110][04584] Avg episode reward: [(0, '4.527')]
[2024-11-07 14:43:32,694][04701] Updated weights for policy 0, policy_version 1856 (0.0022)
[2024-11-07 14:43:37,108][04584] Fps is (10 sec: 6144.2, 60 sec: 7236.3, 300 sec: 7275.6). Total num frames: 7618560. Throughput: 0: 1849.5. Samples: 895158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:37,110][04584] Avg episode reward: [(0, '4.513')]
[2024-11-07 14:43:39,970][04701] Updated weights for policy 0, policy_version 1866 (0.0035)
[2024-11-07 14:43:42,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7372.8, 300 sec: 7275.6). Total num frames: 7659520. Throughput: 0: 1785.7. Samples: 904342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:42,110][04584] Avg episode reward: [(0, '4.196')]
[2024-11-07 14:43:45,251][04701] Updated weights for policy 0, policy_version 1876 (0.0029)
[2024-11-07 14:43:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7261.7). Total num frames: 7696384. Throughput: 0: 1783.9. Samples: 910026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:47,110][04584] Avg episode reward: [(0, '4.252')]
[2024-11-07 14:43:50,305][04701] Updated weights for policy 0, policy_version 1886 (0.0024)
[2024-11-07 14:43:52,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7372.9, 300 sec: 7261.7). Total num frames: 7737344. Throughput: 0: 1890.9. Samples: 922118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:52,110][04584] Avg episode reward: [(0, '4.414')]
[2024-11-07 14:43:55,379][04701] Updated weights for policy 0, policy_version 1896 (0.0020)
[2024-11-07 14:43:57,109][04584] Fps is (10 sec: 8191.5, 60 sec: 7372.7, 300 sec: 7317.2). Total num frames: 7778304. Throughput: 0: 1901.1. Samples: 934312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:43:57,111][04584] Avg episode reward: [(0, '4.448')]
[2024-11-07 14:44:00,489][04701] Updated weights for policy 0, policy_version 1906 (0.0023)
[2024-11-07 14:44:02,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7589.0, 300 sec: 7317.3). Total num frames: 7815168. Throughput: 0: 1898.8. Samples: 940240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 14:44:02,111][04584] Avg episode reward: [(0, '4.370')]
[2024-11-07 14:44:05,857][04701] Updated weights for policy 0, policy_version 1916 (0.0025)
[2024-11-07 14:44:07,108][04584] Fps is (10 sec: 7782.9, 60 sec: 7645.9, 300 sec: 7317.3). Total num frames: 7856128. Throughput: 0: 1880.7. Samples: 951692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 14:44:07,110][04584] Avg episode reward: [(0, '4.466')]
[2024-11-07 14:44:12,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7304.6, 300 sec: 7234.0). Total num frames: 7876608. Throughput: 0: 1785.3. Samples: 959350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 14:44:12,110][04584] Avg episode reward: [(0, '4.323')]
[2024-11-07 14:44:13,147][04701] Updated weights for policy 0, policy_version 1926 (0.0023)
[2024-11-07 14:44:17,108][04584] Fps is (10 sec: 6143.9, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7917568. Throughput: 0: 1789.9. Samples: 965282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 14:44:17,111][04584] Avg episode reward: [(0, '4.247')]
[2024-11-07 14:44:18,201][04701] Updated weights for policy 0, policy_version 1936 (0.0026)
[2024-11-07 14:44:22,109][04584] Fps is (10 sec: 8191.6, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7958528. Throughput: 0: 1830.9. Samples: 977550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 14:44:22,114][04584] Avg episode reward: [(0, '4.299')]
[2024-11-07 14:44:23,245][04701] Updated weights for policy 0, policy_version 1946 (0.0026)
[2024-11-07 14:44:27,109][04584] Fps is (10 sec: 8601.1, 60 sec: 7441.0, 300 sec: 7247.8). Total num frames: 8003584. Throughput: 0: 1905.0. Samples: 990068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-11-07 14:44:27,111][04584] Avg episode reward: [(0, '4.466')]
[2024-11-07 14:44:27,617][04688] Stopping Batcher_0...
[2024-11-07 14:44:27,618][04688] Loop batcher_evt_loop terminating...
[2024-11-07 14:44:27,620][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-11-07 14:44:27,619][04584] Component Batcher_0 stopped!
[2024-11-07 14:44:27,676][04701] Weights refcount: 2 0
[2024-11-07 14:44:27,678][04701] Stopping InferenceWorker_p0-w0...
[2024-11-07 14:44:27,679][04701] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 14:44:27,678][04584] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 14:44:27,709][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth
[2024-11-07 14:44:27,722][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-11-07 14:44:27,750][04584] Component RolloutWorker_w0 stopped!
[2024-11-07 14:44:27,751][04702] Stopping RolloutWorker_w0...
[2024-11-07 14:44:27,752][04702] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 14:44:27,756][04703] Stopping RolloutWorker_w1...
[2024-11-07 14:44:27,756][04584] Component RolloutWorker_w1 stopped!
[2024-11-07 14:44:27,759][04703] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 14:44:27,765][04707] Stopping RolloutWorker_w5...
[2024-11-07 14:44:27,768][04707] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 14:44:27,772][04706] Stopping RolloutWorker_w4...
[2024-11-07 14:44:27,774][04706] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 14:44:27,765][04584] Component RolloutWorker_w5 stopped!
[2024-11-07 14:44:27,796][04584] Component RolloutWorker_w4 stopped!
[2024-11-07 14:44:27,812][04708] Stopping RolloutWorker_w6...
[2024-11-07 14:44:27,814][04708] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 14:44:27,812][04584] Component RolloutWorker_w6 stopped!
[2024-11-07 14:44:27,845][04688] Stopping LearnerWorker_p0...
[2024-11-07 14:44:27,846][04688] Loop learner_proc0_evt_loop terminating...
[2024-11-07 14:44:27,846][04584] Component LearnerWorker_p0 stopped!
[2024-11-07 14:44:27,888][04709] Stopping RolloutWorker_w7...
[2024-11-07 14:44:27,889][04584] Component RolloutWorker_w7 stopped!
[2024-11-07 14:44:27,908][04709] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 14:44:27,913][04704] Stopping RolloutWorker_w3...
[2024-11-07 14:44:27,913][04584] Component RolloutWorker_w3 stopped!
[2024-11-07 14:44:27,916][04704] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 14:44:28,016][04705] Stopping RolloutWorker_w2...
[2024-11-07 14:44:28,016][04584] Component RolloutWorker_w2 stopped!
[2024-11-07 14:44:28,019][04705] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 14:44:28,019][04584] Waiting for process learner_proc0 to stop...
[2024-11-07 14:44:29,498][04584] Waiting for process inference_proc0-0 to join...
[2024-11-07 14:44:29,500][04584] Waiting for process rollout_proc0 to join...
[2024-11-07 14:44:29,501][04584] Waiting for process rollout_proc1 to join...
[2024-11-07 14:44:29,504][04584] Waiting for process rollout_proc2 to join...
[2024-11-07 14:44:29,653][04584] Waiting for process rollout_proc3 to join...
[2024-11-07 14:44:29,655][04584] Waiting for process rollout_proc4 to join...
[2024-11-07 14:44:29,656][04584] Waiting for process rollout_proc5 to join...
[2024-11-07 14:44:29,657][04584] Waiting for process rollout_proc6 to join...
[2024-11-07 14:44:29,660][04584] Waiting for process rollout_proc7 to join...
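The shutdown above follows a fixed pattern: each component's event loop is told to stop (the learner writing a final checkpoint on the way out), after which the runner joins every worker process. A minimal sketch of that pattern, not the actual Sample Factory runner:

    def shutdown(components, processes, log):
        # components: batcher, inference worker, rollout workers, learner
        for c in components:
            c.stop()  # produces the "Stopping <name>... / Loop ... terminating..." lines
        for p in processes:
            log.info("Waiting for process %s to join...", p.name)
            p.join()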
[2024-11-07 14:44:29,661][04584] Batcher 0 profile tree view:
batching: 27.4377, releasing_batches: 0.0459
[2024-11-07 14:44:29,663][04584] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 6.5812
update_model: 7.6567
  weight_update: 0.0027
one_step: 0.0070
  handle_policy_step: 510.3365
    deserialize: 13.7673, stack: 2.4265, obs_to_device_normalize: 150.3995, forward: 227.4887, send_messages: 31.2471
    prepare_outputs: 68.6568
      to_cpu: 53.1126
[2024-11-07 14:44:29,665][04584] Learner 0 profile tree view:
misc: 0.0053, prepare_batch: 30.8301
train: 106.7453
  epoch_init: 0.0079, minibatch_init: 0.0131, losses_postprocess: 0.8354, kl_divergence: 0.9434, after_optimizer: 4.0108
  calculate_losses: 31.3550
    losses_init: 0.0058, forward_head: 1.9126, bptt_initial: 21.8044, tail: 1.0207, advantages_returns: 0.3056, losses: 3.3075
    bptt: 2.6948
      bptt_forward_core: 2.5765
  update: 68.9694
    clip: 1.2501
[2024-11-07 14:44:29,666][04584] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2375, enqueue_policy_requests: 11.9428, env_step: 149.5260, overhead: 10.7908, complete_rollouts: 0.6476
save_policy_outputs: 16.0346
  split_output_tensors: 5.4611
[2024-11-07 14:44:29,670][04584] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2022, enqueue_policy_requests: 12.7636, env_step: 249.0214, overhead: 10.6673, complete_rollouts: 0.3962
save_policy_outputs: 18.4068
  split_output_tensors: 7.8369
[2024-11-07 14:44:29,672][04584] Loop Runner_EvtLoop terminating...
[2024-11-07 14:44:29,675][04584] Runner profile tree view:
main_loop: 558.4363
[2024-11-07 14:44:29,676][04584] Collected {0: 8007680}, FPS: 7107.4
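The profile tree views above are cumulative wall-clock timings nested by call structure (to_cpu inside prepare_outputs inside handle_policy_step, and so on). A toy single-threaded profiler that accumulates the same kind of tree:

    import time
    from collections import defaultdict
    from contextlib import contextmanager

    timings = defaultdict(float)
    _stack = []

    @contextmanager
    def timeit(name):
        _stack.append(name)
        key = "/".join(_stack)  # nesting encoded in the key, e.g. "train/update/clip"
        start = time.monotonic()
        try:
            yield
        finally:
            timings[key] += time.monotonic() - start
            _stack.pop()

Printing the accumulated keys grouped by prefix reproduces a tree like the ones above.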
[2024-11-07 14:44:30,065][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 14:44:30,066][04584] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 14:44:30,067][04584] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 14:44:30,068][04584] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 14:44:30,071][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:44:30,072][04584] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 14:44:30,074][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:44:30,075][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 14:44:30,077][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-07 14:44:30,079][04584] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-07 14:44:30,081][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 14:44:30,082][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 14:44:30,084][04584] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 14:44:30,085][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 14:44:30,088][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-07 14:44:30,176][04584] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 14:44:30,189][04584] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:44:30,193][04584] RunningMeanStd input shape: (1,)
[2024-11-07 14:44:30,218][04584] ConvEncoder: input_channels=3
[2024-11-07 14:44:30,407][04584] Conv encoder output size: 512
[2024-11-07 14:44:30,408][04584] Policy head output size: 512
[2024-11-07 14:44:31,399][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-11-07 14:44:32,337][04584] Num frames 100...
[2024-11-07 14:44:32,546][04584] Num frames 200...
[2024-11-07 14:44:32,757][04584] Num frames 300...
[2024-11-07 14:44:32,967][04584] Num frames 400...
[2024-11-07 14:44:33,113][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-11-07 14:44:33,114][04584] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-11-07 14:44:33,216][04584] Num frames 500...
[2024-11-07 14:44:33,402][04584] Num frames 600...
[2024-11-07 14:44:33,611][04584] Num frames 700...
[2024-11-07 14:44:33,809][04584] Num frames 800...
[2024-11-07 14:44:34,017][04584] Num frames 900...
[2024-11-07 14:44:34,144][04584] Avg episode rewards: #0: 6.140, true rewards: #0: 4.640
[2024-11-07 14:44:34,146][04584] Avg episode reward: 6.140, avg true_objective: 4.640
[2024-11-07 14:44:34,289][04584] Num frames 1000...
[2024-11-07 14:44:34,485][04584] Num frames 1100...
[2024-11-07 14:44:34,693][04584] Num frames 1200...
[2024-11-07 14:44:34,909][04584] Avg episode rewards: #0: 5.600, true rewards: #0: 4.267
[2024-11-07 14:44:34,910][04584] Avg episode reward: 5.600, avg true_objective: 4.267
[2024-11-07 14:44:34,955][04584] Num frames 1300...
[2024-11-07 14:44:35,146][04584] Num frames 1400...
[2024-11-07 14:44:35,332][04584] Num frames 1500...
[2024-11-07 14:44:35,527][04584] Num frames 1600...
[2024-11-07 14:44:35,702][04584] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
[2024-11-07 14:44:35,706][04584] Avg episode reward: 5.160, avg true_objective: 4.160
[2024-11-07 14:44:35,791][04584] Num frames 1700...
[2024-11-07 14:44:35,959][04584] Num frames 1800...
[2024-11-07 14:44:36,136][04584] Num frames 1900...
[2024-11-07 14:44:36,333][04584] Num frames 2000...
[2024-11-07 14:44:36,475][04584] Avg episode rewards: #0: 4.896, true rewards: #0: 4.096
[2024-11-07 14:44:36,477][04584] Avg episode reward: 4.896, avg true_objective: 4.096
[2024-11-07 14:44:36,590][04584] Num frames 2100...
[2024-11-07 14:44:36,787][04584] Num frames 2200...
[2024-11-07 14:44:36,980][04584] Num frames 2300...
[2024-11-07 14:44:37,181][04584] Num frames 2400...
[2024-11-07 14:44:37,439][04584] Avg episode rewards: #0: 4.993, true rewards: #0: 4.160
[2024-11-07 14:44:37,440][04584] Avg episode reward: 4.993, avg true_objective: 4.160
[2024-11-07 14:44:37,450][04584] Num frames 2500...
[2024-11-07 14:44:37,658][04584] Num frames 2600...
[2024-11-07 14:44:37,847][04584] Num frames 2700...
[2024-11-07 14:44:38,030][04584] Num frames 2800...
[2024-11-07 14:44:38,229][04584] Avg episode rewards: #0: 4.829, true rewards: #0: 4.114
[2024-11-07 14:44:38,232][04584] Avg episode reward: 4.829, avg true_objective: 4.114
[2024-11-07 14:44:38,291][04584] Num frames 2900...
[2024-11-07 14:44:38,471][04584] Num frames 3000...
[2024-11-07 14:44:38,681][04584] Num frames 3100...
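The "Avg episode rewards" lines in the evaluation output above are running means over the episodes finished so far, for both the shaped reward and the true objective. The bookkeeping is a plain incremental average:

    episode_stats = []

    def on_episode_end(reward, true_objective):
        episode_stats.append((reward, true_objective))
        n = len(episode_stats)
        avg_r = sum(r for r, _ in episode_stats) / n
        avg_t = sum(t for _, t in episode_stats) / n
        print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")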
[2024-11-07 14:44:38,872][04584] Num frames 3200...
[2024-11-07 14:44:39,064][04584] Avg episode rewards: #0: 4.705, true rewards: #0: 4.080
[2024-11-07 14:44:39,066][04584] Avg episode reward: 4.705, avg true_objective: 4.080
[2024-11-07 14:44:39,137][04584] Num frames 3300...
[2024-11-07 14:44:39,304][04584] Num frames 3400...
[2024-11-07 14:44:39,485][04584] Num frames 3500...
[2024-11-07 14:44:39,725][04584] Num frames 3600...
[2024-11-07 14:44:39,856][04584] Avg episode rewards: #0: 4.609, true rewards: #0: 4.053
[2024-11-07 14:44:39,861][04584] Avg episode reward: 4.609, avg true_objective: 4.053
[2024-11-07 14:44:39,965][04584] Num frames 3700...
[2024-11-07 14:44:40,133][04584] Num frames 3800...
[2024-11-07 14:44:40,308][04584] Num frames 3900...
[2024-11-07 14:44:40,493][04584] Num frames 4000...
[2024-11-07 14:44:40,602][04584] Avg episode rewards: #0: 4.532, true rewards: #0: 4.032
[2024-11-07 14:44:40,606][04584] Avg episode reward: 4.532, avg true_objective: 4.032
[2024-11-07 14:44:51,260][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
[2024-11-07 14:44:53,903][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 14:44:53,904][04584] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 14:44:53,906][04584] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 14:44:53,908][04584] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 14:44:53,909][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:44:53,910][04584] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 14:44:53,913][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-07 14:44:53,915][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 14:44:53,917][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-07 14:44:53,920][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-07 14:44:53,922][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 14:44:53,923][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 14:44:53,926][04584] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 14:44:53,928][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 14:44:53,931][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
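The block just above shows how the evaluation run resumes from the saved config.json: command-line values override saved ones, and flags the saved file predates are added to it. A sketch of that merge as a hypothetical helper, not the actual loader:

    import json

    def load_with_overrides(config_path, cli_args, log):
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_args.items():
            if key not in cfg:
                log.info(f"Adding new argument '{key}'={value} that is not in the saved config file!")
            elif cfg[key] != value:
                log.info(f"Overriding arg '{key}' with value {value} passed from command line")
            cfg[key] = value
        return cfg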
[2024-11-07 14:44:53,958][04584] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:44:53,960][04584] RunningMeanStd input shape: (1,)
[2024-11-07 14:44:53,973][04584] ConvEncoder: input_channels=3
[2024-11-07 14:44:54,021][04584] Conv encoder output size: 512
[2024-11-07 14:44:54,023][04584] Policy head output size: 512
[2024-11-07 14:44:54,046][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-11-07 14:44:54,568][04584] Num frames 100...
[2024-11-07 14:44:54,745][04584] Num frames 200...
[2024-11-07 14:44:54,907][04584] Num frames 300...
[2024-11-07 14:44:55,060][04584] Num frames 400...
[2024-11-07 14:44:55,186][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-11-07 14:44:55,187][04584] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-11-07 14:44:55,276][04584] Num frames 500...
[2024-11-07 14:44:55,445][04584] Num frames 600...
[2024-11-07 14:44:55,620][04584] Num frames 700...
[2024-11-07 14:44:55,759][04584] Num frames 800...
[2024-11-07 14:44:55,862][04584] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2024-11-07 14:44:55,863][04584] Avg episode reward: 4.660, avg true_objective: 4.160
[2024-11-07 14:44:55,968][04584] Num frames 900...
[2024-11-07 14:44:56,121][04584] Num frames 1000...
[2024-11-07 14:44:56,271][04584] Num frames 1100...
[2024-11-07 14:44:56,427][04584] Num frames 1200...
[2024-11-07 14:44:56,511][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2024-11-07 14:44:56,512][04584] Avg episode reward: 4.387, avg true_objective: 4.053
[2024-11-07 14:44:56,656][04584] Num frames 1300...
[2024-11-07 14:44:56,815][04584] Num frames 1400...
[2024-11-07 14:44:56,966][04584] Num frames 1500...
[2024-11-07 14:44:57,111][04584] Num frames 1600...
[2024-11-07 14:44:57,215][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
[2024-11-07 14:44:57,215][04584] Avg episode reward: 4.580, avg true_objective: 4.080
[2024-11-07 14:44:57,321][04584] Num frames 1700...
[2024-11-07 14:44:57,471][04584] Num frames 1800...
[2024-11-07 14:44:57,627][04584] Num frames 1900...
[2024-11-07 14:44:57,776][04584] Num frames 2000...
[2024-11-07 14:44:57,856][04584] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032
[2024-11-07 14:44:57,857][04584] Avg episode reward: 4.432, avg true_objective: 4.032
[2024-11-07 14:44:57,981][04584] Num frames 2100...
[2024-11-07 14:44:58,132][04584] Num frames 2200...
[2024-11-07 14:44:58,301][04584] Num frames 2300...
[2024-11-07 14:44:58,455][04584] Num frames 2400...
[2024-11-07 14:44:58,615][04584] Avg episode rewards: #0: 4.607, true rewards: #0: 4.107
[2024-11-07 14:44:58,616][04584] Avg episode reward: 4.607, avg true_objective: 4.107
[2024-11-07 14:44:58,681][04584] Num frames 2500...
[2024-11-07 14:44:58,837][04584] Num frames 2600...
[2024-11-07 14:44:58,996][04584] Num frames 2700...
[2024-11-07 14:44:59,149][04584] Num frames 2800...
[2024-11-07 14:44:59,330][04584] Avg episode rewards: #0: 4.686, true rewards: #0: 4.114
[2024-11-07 14:44:59,331][04584] Avg episode reward: 4.686, avg true_objective: 4.114
[2024-11-07 14:44:59,363][04584] Num frames 2900...
[2024-11-07 14:44:59,538][04584] Num frames 3000...
[2024-11-07 14:44:59,703][04584] Num frames 3100...
[2024-11-07 14:44:59,853][04584] Num frames 3200...
[2024-11-07 14:45:00,003][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
[2024-11-07 14:45:00,005][04584] Avg episode reward: 4.580, avg true_objective: 4.080
[2024-11-07 14:45:00,066][04584] Num frames 3300...
[2024-11-07 14:45:00,231][04584] Num frames 3400...
[2024-11-07 14:45:00,389][04584] Num frames 3500...
[2024-11-07 14:45:00,565][04584] Num frames 3600...
[2024-11-07 14:45:00,743][04584] Num frames 3700...
[2024-11-07 14:45:00,972][04584] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196
[2024-11-07 14:45:00,974][04584] Avg episode reward: 4.862, avg true_objective: 4.196
[2024-11-07 14:45:01,026][04584] Num frames 3800...
[2024-11-07 14:45:01,224][04584] Num frames 3900...
[2024-11-07 14:45:01,414][04584] Num frames 4000...
[2024-11-07 14:45:01,597][04584] Num frames 4100...
[2024-11-07 14:45:01,777][04584] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160
[2024-11-07 14:45:01,778][04584] Avg episode reward: 4.760, avg true_objective: 4.160
[2024-11-07 14:45:10,932][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
[2024-11-07 14:45:22,820][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme
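Pushing to the Hub amounts to uploading the experiment directory (config, checkpoints, replay.mp4) to the repository named by hf_repository. Sample Factory wraps this step; a rough stand-alone equivalent with huggingface_hub, assuming you are already logged in, would be:

    from huggingface_hub import HfApi

    HfApi().upload_folder(
        repo_id="alidenewade/rl_course_vizdoom_health_gathering_supreme",
        folder_path="/root/hfRL/ml/LunarLander-v2/train_dir/default_experiment",
        repo_type="model",
    )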
[2024-11-07 14:52:22,743][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 14:52:22,744][04584] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 14:52:22,746][04584] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 14:52:22,747][04584] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 14:52:22,749][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 14:52:22,750][04584] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 14:52:22,753][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-07 14:52:22,755][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 14:52:22,756][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-07 14:52:22,757][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-07 14:52:22,758][04584] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 14:52:22,761][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 14:52:22,762][04584] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 14:52:22,764][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 14:52:22,765][04584] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-07 14:52:22,805][04584] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 14:52:22,807][04584] RunningMeanStd input shape: (1,)
[2024-11-07 14:52:22,823][04584] ConvEncoder: input_channels=3
[2024-11-07 14:52:22,886][04584] Conv encoder output size: 512
[2024-11-07 14:52:22,887][04584] Policy head output size: 512
[2024-11-07 14:52:22,925][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-11-07 14:52:23,495][04584] Num frames 100...
[2024-11-07 14:52:23,750][04584] Num frames 200...
[2024-11-07 14:52:23,933][04584] Num frames 300...
[2024-11-07 14:52:24,160][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2024-11-07 14:52:24,166][04584] Avg episode reward: 3.840, avg true_objective: 3.840
[2024-11-07 14:52:24,207][04584] Num frames 400...
[2024-11-07 14:52:24,395][04584] Num frames 500...
[2024-11-07 14:52:24,568][04584] Num frames 600...
[2024-11-07 14:52:24,735][04584] Num frames 700...
[2024-11-07 14:52:24,905][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2024-11-07 14:52:24,908][04584] Avg episode reward: 3.840, avg true_objective: 3.840
[2024-11-07 14:52:24,982][04584] Num frames 800...
[2024-11-07 14:52:25,141][04584] Num frames 900...
[2024-11-07 14:52:25,293][04584] Num frames 1000...
[2024-11-07 14:52:25,495][04584] Num frames 1100...
[2024-11-07 14:52:25,703][04584] Num frames 1200...
[2024-11-07 14:52:25,787][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2024-11-07 14:52:25,789][04584] Avg episode reward: 4.387, avg true_objective: 4.053
[2024-11-07 14:52:25,973][04584] Num frames 1300...
[2024-11-07 14:52:26,162][04584] Num frames 1400...
[2024-11-07 14:52:26,327][04584] Avg episode rewards: #0: 3.925, true rewards: #0: 3.675
[2024-11-07 14:52:26,328][04584] Avg episode reward: 3.925, avg true_objective: 3.675
[2024-11-07 14:52:26,390][04584] Num frames 1500...
[2024-11-07 14:52:26,547][04584] Num frames 1600...
[2024-11-07 14:52:26,703][04584] Num frames 1700...
[2024-11-07 14:52:26,865][04584] Num frames 1800...
[2024-11-07 14:52:27,004][04584] Avg episode rewards: #0: 3.908, true rewards: #0: 3.708
[2024-11-07 14:52:27,009][04584] Avg episode reward: 3.908, avg true_objective: 3.708
[2024-11-07 14:52:27,106][04584] Num frames 1900...
[2024-11-07 14:52:27,292][04584] Num frames 2000...
[2024-11-07 14:52:27,460][04584] Num frames 2100...
[2024-11-07 14:52:27,618][04584] Num frames 2200...
[2024-11-07 14:52:27,770][04584] Num frames 2300...
[2024-11-07 14:52:27,829][04584] Avg episode rewards: #0: 4.170, true rewards: #0: 3.837
[2024-11-07 14:52:27,830][04584] Avg episode reward: 4.170, avg true_objective: 3.837
[2024-11-07 14:52:28,026][04584] Num frames 2400...
[2024-11-07 14:52:28,195][04584] Num frames 2500...
[2024-11-07 14:52:28,412][04584] Num frames 2600...
[2024-11-07 14:52:28,639][04584] Num frames 2700...
[2024-11-07 14:52:28,757][04584] Avg episode rewards: #0: 4.169, true rewards: #0: 3.883
[2024-11-07 14:52:28,759][04584] Avg episode reward: 4.169, avg true_objective: 3.883
[2024-11-07 14:52:28,999][04584] Num frames 2800...
[2024-11-07 14:52:29,214][04584] Num frames 2900...
[2024-11-07 14:52:29,416][04584] Num frames 3000...
[2024-11-07 14:52:29,700][04584] Num frames 3100...
[2024-11-07 14:52:30,033][04584] Avg episode rewards: #0: 4.498, true rewards: #0: 3.997
[2024-11-07 14:52:30,037][04584] Avg episode reward: 4.498, avg true_objective: 3.997
[2024-11-07 14:52:30,057][04584] Num frames 3200...
[2024-11-07 14:52:30,268][04584] Num frames 3300...
[2024-11-07 14:52:30,520][04584] Num frames 3400...
[2024-11-07 14:52:30,893][04584] Num frames 3500...
[2024-11-07 14:52:31,203][04584] Avg episode rewards: #0: 4.424, true rewards: #0: 3.980
[2024-11-07 14:52:31,209][04584] Avg episode reward: 4.424, avg true_objective: 3.980
[2024-11-07 14:52:31,268][04584] Num frames 3600...
[2024-11-07 14:52:31,510][04584] Num frames 3700...
[2024-11-07 14:52:31,731][04584] Num frames 3800...
[2024-11-07 14:52:31,955][04584] Num frames 3900...
[2024-11-07 14:52:32,276][04584] Avg episode rewards: #0: 4.498, true rewards: #0: 3.998
[2024-11-07 14:52:32,278][04584] Avg episode reward: 4.498, avg true_objective: 3.998
[2024-11-07 14:52:32,283][04584] Num frames 4000...
[2024-11-07 14:52:40,832][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
[2024-11-07 14:52:50,207][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme
[2024-11-07 14:55:34,046][04584] Environment doom_basic already registered, overwriting...
[2024-11-07 14:55:34,050][04584] Environment doom_two_colors_easy already registered, overwriting...
[2024-11-07 14:55:34,053][04584] Environment doom_two_colors_hard already registered, overwriting...
[2024-11-07 14:55:34,054][04584] Environment doom_dm already registered, overwriting...
[2024-11-07 14:55:34,055][04584] Environment doom_dwango5 already registered, overwriting...
[2024-11-07 14:55:34,057][04584] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-11-07 14:55:34,058][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-11-07 14:55:34,059][04584] Environment doom_my_way_home already registered, overwriting...
[2024-11-07 14:55:34,060][04584] Environment doom_deadly_corridor already registered, overwriting...
[2024-11-07 14:55:34,063][04584] Environment doom_defend_the_center already registered, overwriting...
[2024-11-07 14:55:34,065][04584] Environment doom_defend_the_line already registered, overwriting...
[2024-11-07 14:55:34,067][04584] Environment doom_health_gathering already registered, overwriting...
[2024-11-07 14:55:34,069][04584] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-11-07 14:55:34,070][04584] Environment doom_battle already registered, overwriting...
[2024-11-07 14:55:34,072][04584] Environment doom_battle2 already registered, overwriting...
[2024-11-07 14:55:34,073][04584] Environment doom_duel_bots already registered, overwriting...
[2024-11-07 14:55:34,075][04584] Environment doom_deathmatch_bots already registered, overwriting...
[2024-11-07 14:55:34,075][04584] Environment doom_duel already registered, overwriting...
[2024-11-07 14:55:34,078][04584] Environment doom_deathmatch_full already registered, overwriting...
[2024-11-07 14:55:34,079][04584] Environment doom_benchmark already registered, overwriting...
[2024-11-07 14:55:34,081][04584] register_encoder_factory:
[2024-11-07 14:55:34,099][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json
[2024-11-07 14:55:34,107][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists!
[2024-11-07 14:55:34,109][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment...
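The "already registered, overwriting" warnings imply a registry that replaces entries rather than raising, which is what makes re-running registration inside the same process harmless. In miniature:

    registry = {}

    def register_env(name, make_env_fn, log):
        if name in registry:
            log.warning(f"Environment {name} already registered, overwriting...")
        registry[name] = make_env_fn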
[2024-11-07 14:55:34,110][04584] Weights and Biases integration disabled
[2024-11-07 14:55:34,116][04584] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-11-07 14:55:36,623][04584] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/root/hfRL/ml/LunarLander-v2/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-11-07 14:55:36,625][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
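A few quantities implied by the configuration above, as plain arithmetic on the printed values:

    num_workers, num_envs_per_worker = 8, 4
    batch_size, rollout = 1024, 32

    total_envs = num_workers * num_envs_per_worker  # 32 parallel environments
    rollouts_per_batch = batch_size // rollout      # 32 trajectory segments per 1024-transition batch
    print(total_envs, rollouts_per_batch)           # 32 32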
[2024-11-07 14:55:36,628][04584] Rollout worker 0 uses device cpu [2024-11-07 14:55:36,629][04584] Rollout worker 1 uses device cpu [2024-11-07 14:55:36,631][04584] Rollout worker 2 uses device cpu [2024-11-07 14:55:36,633][04584] Rollout worker 3 uses device cpu [2024-11-07 14:55:36,635][04584] Rollout worker 4 uses device cpu [2024-11-07 14:55:36,637][04584] Rollout worker 5 uses device cpu [2024-11-07 14:55:36,639][04584] Rollout worker 6 uses device cpu [2024-11-07 14:55:36,641][04584] Rollout worker 7 uses device cpu [2024-11-07 14:55:36,708][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:55:36,710][04584] InferenceWorker_p0-w0: min num requests: 2 [2024-11-07 14:55:36,746][04584] Starting all processes... [2024-11-07 14:55:36,748][04584] Starting process learner_proc0 [2024-11-07 14:55:36,796][04584] Starting all processes... [2024-11-07 14:55:36,802][04584] Starting process inference_proc0-0 [2024-11-07 14:55:36,803][04584] Starting process rollout_proc0 [2024-11-07 14:55:36,803][04584] Starting process rollout_proc1 [2024-11-07 14:55:36,803][04584] Starting process rollout_proc2 [2024-11-07 14:55:36,804][04584] Starting process rollout_proc3 [2024-11-07 14:55:36,805][04584] Starting process rollout_proc4 [2024-11-07 14:55:36,806][04584] Starting process rollout_proc5 [2024-11-07 14:55:36,807][04584] Starting process rollout_proc6 [2024-11-07 14:55:36,808][04584] Starting process rollout_proc7 [2024-11-07 14:55:43,371][07866] Worker 0 uses CPU cores [0] [2024-11-07 14:55:44,070][07873] Worker 1 uses CPU cores [1] [2024-11-07 14:55:44,330][07871] Worker 3 uses CPU cores [3] [2024-11-07 14:55:44,346][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:55:44,347][07852] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 14:55:44,350][07865] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:55:44,351][07865] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 14:55:44,383][07852] Num visible devices: 1 [2024-11-07 14:55:44,418][07865] Num visible devices: 1 [2024-11-07 14:55:44,418][07852] Starting seed is not provided [2024-11-07 14:55:44,418][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:55:44,418][07852] Initializing actor-critic model on device cuda:0 [2024-11-07 14:55:44,419][07852] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:55:44,422][07852] RunningMeanStd input shape: (1,) [2024-11-07 14:55:44,488][07852] ConvEncoder: input_channels=3 [2024-11-07 14:55:44,600][07874] Worker 5 uses CPU cores [5] [2024-11-07 14:55:44,619][07885] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 14:55:44,664][07884] Worker 6 uses CPU cores [6] [2024-11-07 14:55:44,740][07870] Worker 2 uses CPU cores [2] [2024-11-07 14:55:44,747][07852] Conv encoder output size: 512 [2024-11-07 14:55:44,747][07852] Policy head output size: 512 [2024-11-07 14:55:44,767][07852] Created Actor Critic model with architecture: [2024-11-07 14:55:44,768][07852] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential 
(0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 14:55:45,005][07852] Using optimizer [2024-11-07 14:55:45,120][07872] Worker 4 uses CPU cores [4] [2024-11-07 14:55:46,269][07852] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-11-07 14:55:46,317][07852] Loading model from checkpoint [2024-11-07 14:55:46,320][07852] Loaded experiment state at self.train_step=1955, self.env_steps=8007680 [2024-11-07 14:55:46,320][07852] Initialized policy 0 weights for model version 1955 [2024-11-07 14:55:46,327][07852] LearnerWorker_p0 finished initialization! [2024-11-07 14:55:46,327][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 14:55:46,561][07865] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:55:46,563][07865] RunningMeanStd input shape: (1,) [2024-11-07 14:55:46,580][07865] ConvEncoder: input_channels=3 [2024-11-07 14:55:46,746][07865] Conv encoder output size: 512 [2024-11-07 14:55:46,747][07865] Policy head output size: 512 [2024-11-07 14:55:46,824][04584] Inference worker 0-0 is ready! [2024-11-07 14:55:46,826][04584] All inference workers are ready! Signal rollout workers to start! [2024-11-07 14:55:47,036][07866] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,038][07872] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,069][07874] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,090][07871] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,104][07873] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,104][07870] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,247][07885] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:47,261][07884] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 14:55:48,282][07871] Decorrelating experience for 0 frames... [2024-11-07 14:55:48,298][07872] Decorrelating experience for 0 frames... [2024-11-07 14:55:48,386][07885] Decorrelating experience for 0 frames... [2024-11-07 14:55:48,436][07866] Decorrelating experience for 0 frames... [2024-11-07 14:55:48,449][07884] Decorrelating experience for 0 frames... [2024-11-07 14:55:48,975][07871] Decorrelating experience for 32 frames... [2024-11-07 14:55:48,987][07870] Decorrelating experience for 0 frames... [2024-11-07 14:55:49,060][07872] Decorrelating experience for 32 frames... [2024-11-07 14:55:49,080][07885] Decorrelating experience for 32 frames... [2024-11-07 14:55:49,117][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8007680. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:55:49,212][07884] Decorrelating experience for 32 frames... [2024-11-07 14:55:49,240][07866] Decorrelating experience for 32 frames... [2024-11-07 14:55:49,253][07874] Decorrelating experience for 0 frames... [2024-11-07 14:55:49,700][07870] Decorrelating experience for 32 frames... [2024-11-07 14:55:49,826][07872] Decorrelating experience for 64 frames... [2024-11-07 14:55:49,939][07871] Decorrelating experience for 64 frames... [2024-11-07 14:55:50,064][07885] Decorrelating experience for 64 frames... [2024-11-07 14:55:50,088][07874] Decorrelating experience for 32 frames... [2024-11-07 14:55:50,292][07870] Decorrelating experience for 64 frames... [2024-11-07 14:55:50,415][07873] Decorrelating experience for 0 frames... [2024-11-07 14:55:50,424][07866] Decorrelating experience for 64 frames... [2024-11-07 14:55:50,533][07872] Decorrelating experience for 96 frames... [2024-11-07 14:55:50,673][07884] Decorrelating experience for 64 frames... [2024-11-07 14:55:50,733][07871] Decorrelating experience for 96 frames... [2024-11-07 14:55:50,841][07885] Decorrelating experience for 96 frames... [2024-11-07 14:55:50,955][07870] Decorrelating experience for 96 frames... [2024-11-07 14:55:50,987][07873] Decorrelating experience for 32 frames... [2024-11-07 14:55:51,133][07866] Decorrelating experience for 96 frames... [2024-11-07 14:55:51,244][07874] Decorrelating experience for 64 frames... [2024-11-07 14:55:51,659][07873] Decorrelating experience for 64 frames... [2024-11-07 14:55:51,785][07874] Decorrelating experience for 96 frames... [2024-11-07 14:55:51,817][07884] Decorrelating experience for 96 frames... [2024-11-07 14:55:52,517][07873] Decorrelating experience for 96 frames... [2024-11-07 14:55:54,118][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 231.1. Samples: 1156. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:55:54,125][07852] Signal inference workers to stop experience collection... [2024-11-07 14:55:54,125][04584] Avg episode reward: [(0, '2.241')] [2024-11-07 14:55:54,133][07865] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 14:55:56,697][04584] Heartbeat connected on Batcher_0 [2024-11-07 14:55:56,708][04584] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 14:55:56,717][04584] Heartbeat connected on RolloutWorker_w0 [2024-11-07 14:55:56,721][04584] Heartbeat connected on RolloutWorker_w1 [2024-11-07 14:55:56,726][04584] Heartbeat connected on RolloutWorker_w2 [2024-11-07 14:55:56,738][04584] Heartbeat connected on RolloutWorker_w5 [2024-11-07 14:55:56,742][04584] Heartbeat connected on RolloutWorker_w3 [2024-11-07 14:55:56,744][04584] Heartbeat connected on RolloutWorker_w6 [2024-11-07 14:55:56,746][04584] Heartbeat connected on RolloutWorker_w4 [2024-11-07 14:55:56,749][04584] Heartbeat connected on RolloutWorker_w7 [2024-11-07 14:55:59,116][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 272.4. Samples: 2724. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:55:59,117][04584] Avg episode reward: [(0, '2.386')] [2024-11-07 14:56:04,116][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 181.6. Samples: 2724. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 14:56:04,121][04584] Avg episode reward: [(0, '2.386')] [2024-11-07 14:56:06,231][07852] Signal inference workers to resume experience collection... [2024-11-07 14:56:06,233][07865] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 14:56:06,236][07852] Stopping Batcher_0... [2024-11-07 14:56:06,237][07852] Loop batcher_evt_loop terminating... [2024-11-07 14:56:06,247][04584] Component Batcher_0 stopped! [2024-11-07 14:56:06,361][04584] Component RolloutWorker_w0 stopped! [2024-11-07 14:56:06,362][07866] Stopping RolloutWorker_w0... [2024-11-07 14:56:06,365][07866] Loop rollout_proc0_evt_loop terminating... [2024-11-07 14:56:06,366][07874] Stopping RolloutWorker_w5... [2024-11-07 14:56:06,367][07874] Loop rollout_proc5_evt_loop terminating... [2024-11-07 14:56:06,366][04584] Component RolloutWorker_w5 stopped! [2024-11-07 14:56:06,377][07865] Weights refcount: 2 0 [2024-11-07 14:56:06,422][07865] Stopping InferenceWorker_p0-w0... [2024-11-07 14:56:06,423][07865] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 14:56:06,422][04584] Component InferenceWorker_p0-w0 stopped! [2024-11-07 14:56:06,461][07884] Stopping RolloutWorker_w6... [2024-11-07 14:56:06,462][07884] Loop rollout_proc6_evt_loop terminating... [2024-11-07 14:56:06,461][04584] Component RolloutWorker_w6 stopped! [2024-11-07 14:56:06,467][07871] Stopping RolloutWorker_w3... [2024-11-07 14:56:06,468][07871] Loop rollout_proc3_evt_loop terminating... [2024-11-07 14:56:06,468][04584] Component RolloutWorker_w3 stopped! [2024-11-07 14:56:06,479][07872] Stopping RolloutWorker_w4... [2024-11-07 14:56:06,479][07872] Loop rollout_proc4_evt_loop terminating... [2024-11-07 14:56:06,479][04584] Component RolloutWorker_w4 stopped! [2024-11-07 14:56:06,516][07885] Stopping RolloutWorker_w7... [2024-11-07 14:56:06,515][04584] Component RolloutWorker_w7 stopped! [2024-11-07 14:56:06,517][07885] Loop rollout_proc7_evt_loop terminating... [2024-11-07 14:56:06,608][07870] Stopping RolloutWorker_w2... [2024-11-07 14:56:06,609][07870] Loop rollout_proc2_evt_loop terminating... [2024-11-07 14:56:06,610][04584] Component RolloutWorker_w2 stopped! [2024-11-07 14:56:06,634][07873] Stopping RolloutWorker_w1... [2024-11-07 14:56:06,635][04584] Component RolloutWorker_w1 stopped! [2024-11-07 14:56:06,640][07873] Loop rollout_proc1_evt_loop terminating... [2024-11-07 14:56:07,253][07852] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... [2024-11-07 14:56:07,252][04584] Heartbeat connected on LearnerWorker_p0 [2024-11-07 14:56:07,807][07852] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001806_7397376.pth [2024-11-07 14:56:07,821][07852] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... [2024-11-07 14:56:08,058][07852] Stopping LearnerWorker_p0... [2024-11-07 14:56:08,058][07852] Loop learner_proc0_evt_loop terminating... [2024-11-07 14:56:08,073][04584] Component LearnerWorker_p0 stopped! [2024-11-07 14:56:08,075][04584] Waiting for process learner_proc0 to stop... [2024-11-07 14:56:09,968][04584] Waiting for process inference_proc0-0 to join... [2024-11-07 14:56:09,970][04584] Waiting for process rollout_proc0 to join... [2024-11-07 14:56:09,971][04584] Waiting for process rollout_proc1 to join... [2024-11-07 14:56:09,973][04584] Waiting for process rollout_proc2 to join... 
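The Saving/Removing pair above is the keep_checkpoints=2 policy at work: each new checkpoint evicts the oldest, and the filename encodes both the training step and the env-step count (e.g. checkpoint_000001957_8015872.pth). A rough sketch of that rotation; the filename scheme is copied from the log, while the saved dictionary keys are assumptions:

import glob
import os
import torch

def save_checkpoint(checkpoint_dir, train_step, env_steps, state_dict, keep_checkpoints=2):
    # Name encodes the zero-padded train step and the env-step count,
    # matching checkpoint_000001957_8015872.pth above.
    name = f"checkpoint_{train_step:09d}_{env_steps}.pth"
    torch.save({"train_step": train_step, "env_steps": env_steps, "model": state_dict},
               os.path.join(checkpoint_dir, name))
    # Zero-padding makes lexicographic order chronological, so the oldest
    # checkpoints are simply the first entries beyond the retention limit.
    checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_checkpoints]:
        os.remove(old)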
[2024-11-07 14:56:09,975][04584] Waiting for process rollout_proc3 to join... [2024-11-07 14:56:09,978][04584] Waiting for process rollout_proc4 to join... [2024-11-07 14:56:09,980][04584] Waiting for process rollout_proc5 to join... [2024-11-07 14:56:09,982][04584] Waiting for process rollout_proc6 to join... [2024-11-07 14:56:09,985][04584] Waiting for process rollout_proc7 to join... [2024-11-07 14:56:09,988][04584] Batcher 0 profile tree view: batching: 0.0548, releasing_batches: 0.0018 [2024-11-07 14:56:09,990][04584] InferenceWorker_p0-w0 profile tree view: update_model: 0.0160 wait_policy: 0.0005 wait_policy_total: 3.7567 one_step: 0.0108 handle_policy_step: 3.3550 deserialize: 0.0684, stack: 0.0110, obs_to_device_normalize: 0.7746, forward: 2.0215, send_messages: 0.1367 prepare_outputs: 0.2665 to_cpu: 0.1882 [2024-11-07 14:56:09,992][04584] Learner 0 profile tree view: misc: 0.0001, prepare_batch: 2.3351 train: 11.7240 epoch_init: 0.0001, minibatch_init: 0.0000, losses_postprocess: 0.0015, kl_divergence: 0.4367, after_optimizer: 0.9795 calculate_losses: 2.5855 losses_init: 0.0000, forward_head: 0.4665, bptt_initial: 1.1746, tail: 0.3188, advantages_returns: 0.0022, losses: 0.4083 bptt: 0.2146 bptt_forward_core: 0.2144 update: 7.7171 clip: 0.6435 [2024-11-07 14:56:09,993][04584] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0039, enqueue_policy_requests: 0.0693, env_step: 0.7017, overhead: 0.0387, complete_rollouts: 0.0014 save_policy_outputs: 0.0719 split_output_tensors: 0.0275 [2024-11-07 14:56:09,996][04584] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0552, env_step: 1.1240, overhead: 0.0403, complete_rollouts: 0.0010 save_policy_outputs: 0.0683 split_output_tensors: 0.0209 [2024-11-07 14:56:10,000][04584] Loop Runner_EvtLoop terminating... [2024-11-07 14:56:10,003][04584] Runner profile tree view: main_loop: 33.2571 [2024-11-07 14:56:10,006][04584] Collected {0: 8015872}, FPS: 246.3 [2024-11-07 14:56:10,173][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:56:10,175][04584] Overriding arg 'num_workers' with value 4 passed from command line [2024-11-07 14:56:10,177][04584] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 14:56:10,179][04584] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 14:56:10,182][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:56:10,184][04584] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 14:56:10,185][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:56:10,187][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 14:56:10,189][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-07 14:56:10,191][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-07 14:56:10,193][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 14:56:10,194][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 14:56:10,195][04584] Adding new argument 'train_script'=None that is not in the saved config file! 
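The profile tree views above are nested wall-clock timers: each indented entry accumulates inside its parent (calculate_losses and update both sit under train, for example). A small sketch of one way to build such a tree with context managers; this is illustrative, not Sample Factory's actual timing implementation:

import time
from collections import defaultdict
from contextlib import contextmanager

class TimingTree:
    def __init__(self):
        self.totals = defaultdict(float)  # dotted path -> accumulated seconds
        self._stack = []                  # currently open timer names

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        path = ".".join(self._stack)
        start = time.monotonic()
        try:
            yield
        finally:
            self.totals[path] += time.monotonic() - start
            self._stack.pop()

timings = TimingTree()
with timings.timeit("train"):
    with timings.timeit("calculate_losses"):
        time.sleep(0.01)  # stand-in for forward/backward work
    with timings.timeit("update"):
        time.sleep(0.02)  # stand-in for the optimizer step
print(dict(timings.totals))  # {'train': ..., 'train.calculate_losses': ..., 'train.update': ...}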
[2024-11-07 14:56:10,197][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-07 14:56:10,202][04584] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 14:56:10,254][04584] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:56:10,256][04584] RunningMeanStd input shape: (1,) [2024-11-07 14:56:10,293][04584] ConvEncoder: input_channels=3 [2024-11-07 14:56:10,355][04584] Conv encoder output size: 512 [2024-11-07 14:56:10,357][04584] Policy head output size: 512 [2024-11-07 14:56:10,406][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... [2024-11-07 14:56:13,878][04584] Num frames 100... [2024-11-07 14:56:14,106][04584] Num frames 200... [2024-11-07 14:56:14,341][04584] Num frames 300... [2024-11-07 14:56:14,572][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:14,574][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:14,609][04584] Num frames 400... [2024-11-07 14:56:14,867][04584] Num frames 500... [2024-11-07 14:56:15,349][04584] Num frames 600... [2024-11-07 14:56:16,008][04584] Num frames 700... [2024-11-07 14:56:16,669][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:16,683][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:16,990][04584] Num frames 800... [2024-11-07 14:56:17,996][04584] Num frames 900... [2024-11-07 14:56:18,675][04584] Num frames 1000... [2024-11-07 14:56:19,397][04584] Num frames 1100... [2024-11-07 14:56:19,814][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:19,818][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:20,211][04584] Num frames 1200... [2024-11-07 14:56:20,758][04584] Num frames 1300... [2024-11-07 14:56:21,379][04584] Num frames 1400... [2024-11-07 14:56:21,844][04584] Num frames 1500... [2024-11-07 14:56:22,003][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:22,005][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:22,207][04584] Num frames 1600... [2024-11-07 14:56:22,423][04584] Num frames 1700... [2024-11-07 14:56:22,654][04584] Num frames 1800... [2024-11-07 14:56:22,864][04584] Num frames 1900... [2024-11-07 14:56:22,961][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:22,962][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:23,201][04584] Num frames 2000... [2024-11-07 14:56:23,415][04584] Num frames 2100... [2024-11-07 14:56:23,717][04584] Num frames 2200... [2024-11-07 14:56:23,940][04584] Num frames 2300... [2024-11-07 14:56:24,003][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:24,005][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:24,222][04584] Num frames 2400... [2024-11-07 14:56:24,428][04584] Num frames 2500... [2024-11-07 14:56:24,677][04584] Num frames 2600... [2024-11-07 14:56:24,929][04584] Num frames 2700... [2024-11-07 14:56:25,159][04584] Num frames 2800... [2024-11-07 14:56:25,249][04584] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 [2024-11-07 14:56:25,250][04584] Avg episode reward: 4.309, avg true_objective: 4.023 [2024-11-07 14:56:25,442][04584] Num frames 2900... [2024-11-07 14:56:25,633][04584] Num frames 3000... [2024-11-07 14:56:25,835][04584] Num frames 3100... 
[2024-11-07 14:56:26,055][04584] Num frames 3200... [2024-11-07 14:56:26,248][04584] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 [2024-11-07 14:56:26,251][04584] Avg episode reward: 4.455, avg true_objective: 4.080 [2024-11-07 14:56:26,359][04584] Num frames 3300... [2024-11-07 14:56:26,561][04584] Num frames 3400... [2024-11-07 14:56:26,732][04584] Num frames 3500... [2024-11-07 14:56:26,889][04584] Num frames 3600... [2024-11-07 14:56:27,027][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2024-11-07 14:56:27,030][04584] Avg episode reward: 4.387, avg true_objective: 4.053 [2024-11-07 14:56:27,125][04584] Num frames 3700... [2024-11-07 14:56:27,303][04584] Num frames 3800... [2024-11-07 14:56:27,472][04584] Num frames 3900... [2024-11-07 14:56:27,634][04584] Num frames 4000... [2024-11-07 14:56:27,752][04584] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 [2024-11-07 14:56:27,754][04584] Avg episode reward: 4.332, avg true_objective: 4.032 [2024-11-07 14:56:40,409][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 14:56:40,960][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 14:56:40,962][04584] Overriding arg 'num_workers' with value 4 passed from command line [2024-11-07 14:56:40,964][04584] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 14:56:40,966][04584] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 14:56:40,973][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 14:56:40,976][04584] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 14:56:40,979][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-07 14:56:40,988][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 14:56:40,990][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-07 14:56:40,993][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-07 14:56:40,995][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 14:56:40,996][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 14:56:41,005][04584] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 14:56:41,007][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-07 14:56:41,009][04584] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 14:56:41,091][04584] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 14:56:41,095][04584] RunningMeanStd input shape: (1,) [2024-11-07 14:56:41,124][04584] ConvEncoder: input_channels=3 [2024-11-07 14:56:41,187][04584] Conv encoder output size: 512 [2024-11-07 14:56:41,188][04584] Policy head output size: 512 [2024-11-07 14:56:41,211][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... [2024-11-07 14:56:41,745][04584] Num frames 100... [2024-11-07 14:56:41,957][04584] Num frames 200... [2024-11-07 14:56:42,160][04584] Num frames 300... 
[2024-11-07 14:56:42,410][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 14:56:42,414][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 14:56:42,475][04584] Num frames 400... [2024-11-07 14:56:42,737][04584] Num frames 500... [2024-11-07 14:56:43,040][04584] Num frames 600... [2024-11-07 14:56:43,312][04584] Num frames 700... [2024-11-07 14:56:43,592][04584] Num frames 800... [2024-11-07 14:56:43,713][04584] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-11-07 14:56:43,714][04584] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-11-07 14:56:43,872][04584] Num frames 900... [2024-11-07 14:56:44,191][04584] Num frames 1000... [2024-11-07 14:56:44,433][04584] Num frames 1100... [2024-11-07 14:56:44,628][04584] Num frames 1200... [2024-11-07 14:56:44,766][04584] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 [2024-11-07 14:56:44,772][04584] Avg episode reward: 4.827, avg true_objective: 4.160 [2024-11-07 14:56:44,879][04584] Num frames 1300... [2024-11-07 14:56:45,064][04584] Num frames 1400... [2024-11-07 14:56:45,236][04584] Num frames 1500... [2024-11-07 14:56:45,421][04584] Num frames 1600... [2024-11-07 14:56:47,592][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 [2024-11-07 14:56:47,594][04584] Avg episode reward: 4.580, avg true_objective: 4.080 [2024-11-07 14:56:47,724][04584] Num frames 1700... [2024-11-07 14:56:47,905][04584] Num frames 1800... [2024-11-07 14:56:48,109][04584] Num frames 1900... [2024-11-07 14:56:48,295][04584] Num frames 2000... [2024-11-07 14:56:48,425][04584] Avg episode rewards: #0: 4.680, true rewards: #0: 4.080 [2024-11-07 14:56:48,428][04584] Avg episode reward: 4.680, avg true_objective: 4.080 [2024-11-07 14:56:48,562][04584] Num frames 2100... [2024-11-07 14:56:48,738][04584] Num frames 2200... [2024-11-07 14:56:48,913][04584] Num frames 2300... [2024-11-07 14:56:49,095][04584] Num frames 2400... [2024-11-07 14:56:49,191][04584] Avg episode rewards: #0: 4.540, true rewards: #0: 4.040 [2024-11-07 14:56:49,193][04584] Avg episode reward: 4.540, avg true_objective: 4.040 [2024-11-07 14:56:49,333][04584] Num frames 2500... [2024-11-07 14:56:49,501][04584] Num frames 2600... [2024-11-07 14:56:49,658][04584] Num frames 2700... [2024-11-07 14:56:49,848][04584] Num frames 2800... [2024-11-07 14:56:49,918][04584] Avg episode rewards: #0: 4.440, true rewards: #0: 4.011 [2024-11-07 14:56:49,921][04584] Avg episode reward: 4.440, avg true_objective: 4.011 [2024-11-07 14:56:50,094][04584] Num frames 2900... [2024-11-07 14:56:50,431][04584] Num frames 3000... [2024-11-07 14:56:50,666][04584] Num frames 3100... [2024-11-07 14:56:50,917][04584] Avg episode rewards: #0: 4.365, true rewards: #0: 3.990 [2024-11-07 14:56:50,920][04584] Avg episode reward: 4.365, avg true_objective: 3.990 [2024-11-07 14:56:50,945][04584] Num frames 3200... [2024-11-07 14:56:51,105][04584] Num frames 3300... [2024-11-07 14:56:51,283][04584] Num frames 3400... [2024-11-07 14:56:51,493][04584] Num frames 3500... [2024-11-07 14:56:51,666][04584] Num frames 3600... [2024-11-07 14:56:51,793][04584] Avg episode rewards: #0: 4.489, true rewards: #0: 4.044 [2024-11-07 14:56:51,795][04584] Avg episode reward: 4.489, avg true_objective: 4.044 [2024-11-07 14:56:51,923][04584] Num frames 3700... [2024-11-07 14:56:52,126][04584] Num frames 3800... [2024-11-07 14:56:52,324][04584] Num frames 3900... [2024-11-07 14:56:52,514][04584] Num frames 4000... 
[2024-11-07 14:56:52,732][04584] Avg episode rewards: #0: 4.588, true rewards: #0: 4.088 [2024-11-07 14:56:52,736][04584] Avg episode reward: 4.588, avg true_objective: 4.088 [2024-11-07 14:57:02,444][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 14:57:10,868][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme [2024-11-07 14:59:42,796][04584] Environment doom_basic already registered, overwriting... [2024-11-07 14:59:42,798][04584] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 14:59:42,800][04584] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 14:59:42,802][04584] Environment doom_dm already registered, overwriting... [2024-11-07 14:59:42,803][04584] Environment doom_dwango5 already registered, overwriting... [2024-11-07 14:59:42,804][04584] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 14:59:42,805][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 14:59:42,806][04584] Environment doom_my_way_home already registered, overwriting... [2024-11-07 14:59:42,808][04584] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 14:59:42,809][04584] Environment doom_defend_the_center already registered, overwriting... [2024-11-07 14:59:42,813][04584] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 14:59:42,814][04584] Environment doom_health_gathering already registered, overwriting... [2024-11-07 14:59:42,815][04584] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 14:59:42,817][04584] Environment doom_battle already registered, overwriting... [2024-11-07 14:59:42,820][04584] Environment doom_battle2 already registered, overwriting... [2024-11-07 14:59:42,822][04584] Environment doom_duel_bots already registered, overwriting... [2024-11-07 14:59:42,825][04584] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 14:59:42,828][04584] Environment doom_duel already registered, overwriting... [2024-11-07 14:59:42,829][04584] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 14:59:42,831][04584] Environment doom_benchmark already registered, overwriting... [2024-11-07 14:59:42,833][04584] register_encoder_factory: [2024-11-07 15:01:10,944][04584] Environment doom_basic already registered, overwriting... [2024-11-07 15:01:10,947][04584] Environment doom_two_colors_easy already registered, overwriting... [2024-11-07 15:01:10,949][04584] Environment doom_two_colors_hard already registered, overwriting... [2024-11-07 15:01:10,950][04584] Environment doom_dm already registered, overwriting... [2024-11-07 15:01:10,951][04584] Environment doom_dwango5 already registered, overwriting... [2024-11-07 15:01:10,953][04584] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-11-07 15:01:10,954][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-11-07 15:01:10,956][04584] Environment doom_my_way_home already registered, overwriting... [2024-11-07 15:01:10,958][04584] Environment doom_deadly_corridor already registered, overwriting... [2024-11-07 15:01:10,960][04584] Environment doom_defend_the_center already registered, overwriting... 
[2024-11-07 15:01:10,962][04584] Environment doom_defend_the_line already registered, overwriting... [2024-11-07 15:01:10,963][04584] Environment doom_health_gathering already registered, overwriting... [2024-11-07 15:01:10,965][04584] Environment doom_health_gathering_supreme already registered, overwriting... [2024-11-07 15:01:10,967][04584] Environment doom_battle already registered, overwriting... [2024-11-07 15:01:10,969][04584] Environment doom_battle2 already registered, overwriting... [2024-11-07 15:01:10,971][04584] Environment doom_duel_bots already registered, overwriting... [2024-11-07 15:01:10,974][04584] Environment doom_deathmatch_bots already registered, overwriting... [2024-11-07 15:01:10,975][04584] Environment doom_duel already registered, overwriting... [2024-11-07 15:01:10,976][04584] Environment doom_deathmatch_full already registered, overwriting... [2024-11-07 15:01:10,979][04584] Environment doom_benchmark already registered, overwriting... [2024-11-07 15:01:10,983][04584] register_encoder_factory: [2024-11-07 15:01:11,005][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 15:01:11,008][04584] Overriding arg 'num_workers' with value 10 passed from command line [2024-11-07 15:01:11,010][04584] Overriding arg 'num_envs_per_worker' with value 6 passed from command line [2024-11-07 15:01:11,011][04584] Overriding arg 'train_for_env_steps' with value 16000000 passed from command line [2024-11-07 15:01:11,021][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! [2024-11-07 15:01:11,022][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... [2024-11-07 15:01:11,024][04584] Weights and Biases integration disabled [2024-11-07 15:01:11,027][04584] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-11-07 15:01:16,848][04584] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/root/hfRL/ml/LunarLander-v2/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=10 num_envs_per_worker=6 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=16000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 
save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-11-07 15:01:16,849][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... [2024-11-07 15:01:16,851][04584] Rollout worker 0 uses device cpu [2024-11-07 15:01:16,852][04584] Rollout worker 1 uses device cpu [2024-11-07 15:01:16,854][04584] Rollout worker 2 uses device cpu [2024-11-07 15:01:16,855][04584] Rollout worker 3 uses device cpu [2024-11-07 15:01:16,857][04584] Rollout worker 4 uses device cpu [2024-11-07 15:01:16,859][04584] Rollout worker 5 uses device cpu [2024-11-07 15:01:16,862][04584] Rollout worker 6 uses device cpu [2024-11-07 15:01:16,863][04584] Rollout worker 7 uses device cpu [2024-11-07 15:01:16,866][04584] Rollout worker 8 uses device cpu [2024-11-07 15:01:16,868][04584] Rollout worker 9 uses device cpu [2024-11-07 15:01:17,011][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 15:01:17,012][04584] InferenceWorker_p0-w0: min num requests: 3 [2024-11-07 15:01:17,055][04584] Starting all processes... [2024-11-07 15:01:17,056][04584] Starting process learner_proc0 [2024-11-07 15:01:17,097][04584] Starting all processes... 
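set_workers_cpu_affinity=True is why each rollout worker below reports a single CPU core, while workers 7-9 fall back to the shared set [0..6] once distinct cores run out. A Linux-only sketch of pinning spawned workers; os.sched_setaffinity is a real POSIX call, but the fallback rule here is inferred from the log rather than taken from Sample Factory's code:

import multiprocessing as mp
import os

NUM_CORES = 7  # cores [0..6], as reported by the workers below

def rollout_worker(worker_idx):
    if worker_idx < NUM_CORES:
        cores = {worker_idx}           # e.g. "Worker 3 uses CPU cores [3]"
    else:
        cores = set(range(NUM_CORES))  # e.g. "Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6]"
    os.sched_setaffinity(0, cores)     # pin the current process (Linux only)
    print(f"Worker {worker_idx} uses CPU cores {sorted(cores)}")

if __name__ == "__main__":
    # The learner and inference processes additionally inherit
    # CUDA_VISIBLE_DEVICES=0, restricting them to GPU [0] as logged.
    procs = [mp.Process(target=rollout_worker, args=(i,)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()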
[2024-11-07 15:01:17,104][04584] Starting process inference_proc0-0 [2024-11-07 15:01:17,106][04584] Starting process rollout_proc0 [2024-11-07 15:01:17,106][04584] Starting process rollout_proc1 [2024-11-07 15:01:17,106][04584] Starting process rollout_proc2 [2024-11-07 15:01:17,107][04584] Starting process rollout_proc3 [2024-11-07 15:01:17,109][04584] Starting process rollout_proc4 [2024-11-07 15:01:17,109][04584] Starting process rollout_proc5 [2024-11-07 15:01:17,112][04584] Starting process rollout_proc6 [2024-11-07 15:01:17,113][04584] Starting process rollout_proc7 [2024-11-07 15:01:17,114][04584] Starting process rollout_proc8 [2024-11-07 15:01:17,125][04584] Starting process rollout_proc9 [2024-11-07 15:01:25,913][09025] Worker 0 uses CPU cores [0] [2024-11-07 15:01:25,954][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 15:01:25,954][09009] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-11-07 15:01:26,005][09009] Num visible devices: 1 [2024-11-07 15:01:26,051][09009] Starting seed is not provided [2024-11-07 15:01:26,051][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 15:01:26,051][09009] Initializing actor-critic model on device cuda:0 [2024-11-07 15:01:26,052][09009] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 15:01:26,053][09009] RunningMeanStd input shape: (1,) [2024-11-07 15:01:26,083][09009] ConvEncoder: input_channels=3 [2024-11-07 15:01:26,334][09028] Worker 5 uses CPU cores [5] [2024-11-07 15:01:26,676][09009] Conv encoder output size: 512 [2024-11-07 15:01:26,677][09009] Policy head output size: 512 [2024-11-07 15:01:26,705][09009] Created Actor Critic model with architecture: [2024-11-07 15:01:26,706][09009] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-11-07 15:01:26,934][09037] Worker 6 uses CPU cores [6] [2024-11-07 15:01:27,144][09029] Worker 1 uses CPU cores [1] [2024-11-07 15:01:27,256][09024] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 15:01:27,257][09024] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-11-07 15:01:27,288][09038] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 15:01:27,300][09024] Num visible devices: 1 [2024-11-07 15:01:27,315][09009] Using optimizer [2024-11-07 15:01:27,485][09026] Worker 3 uses CPU cores [3]
[2024-11-07 15:01:27,594][09027] Worker 2 uses CPU cores [2] [2024-11-07 15:01:27,608][09040] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 15:01:27,695][09039] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6] [2024-11-07 15:01:27,770][09030] Worker 4 uses CPU cores [4] [2024-11-07 15:01:28,594][09009] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... [2024-11-07 15:01:28,657][09009] Loading model from checkpoint [2024-11-07 15:01:28,659][09009] Loaded experiment state at self.train_step=1957, self.env_steps=8015872 [2024-11-07 15:01:28,659][09009] Initialized policy 0 weights for model version 1957 [2024-11-07 15:01:28,667][09009] LearnerWorker_p0 finished initialization! [2024-11-07 15:01:28,667][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-11-07 15:01:28,872][09024] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 15:01:28,873][09024] RunningMeanStd input shape: (1,) [2024-11-07 15:01:28,885][09024] ConvEncoder: input_channels=3 [2024-11-07 15:01:28,989][09024] Conv encoder output size: 512 [2024-11-07 15:01:28,990][09024] Policy head output size: 512 [2024-11-07 15:01:29,034][04584] Inference worker 0-0 is ready! [2024-11-07 15:01:29,035][04584] All inference workers are ready! Signal rollout workers to start! [2024-11-07 15:01:29,114][09030] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,123][09028] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,124][09029] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,146][09037] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,171][09027] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,179][09026] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,195][09038] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,196][09040] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,201][09039] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,234][09025] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-07 15:01:29,686][09030] Decorrelating experience for 0 frames... [2024-11-07 15:01:29,772][09028] Decorrelating experience for 0 frames... [2024-11-07 15:01:29,822][09026] Decorrelating experience for 0 frames... [2024-11-07 15:01:29,854][09029] Decorrelating experience for 0 frames... [2024-11-07 15:01:29,856][09039] Decorrelating experience for 0 frames... [2024-11-07 15:01:29,892][09025] Decorrelating experience for 0 frames... [2024-11-07 15:01:30,044][09037] Decorrelating experience for 0 frames... [2024-11-07 15:01:30,158][09028] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,194][09029] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,250][09025] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,266][09026] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,378][09030] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,568][09029] Decorrelating experience for 64 frames... [2024-11-07 15:01:30,574][09038] Decorrelating experience for 0 frames... [2024-11-07 15:01:30,755][09037] Decorrelating experience for 32 frames... [2024-11-07 15:01:30,762][09025] Decorrelating experience for 64 frames... [2024-11-07 15:01:30,781][09028] Decorrelating experience for 64 frames...
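The resumed learner prints the same ActorCriticSharedWeights module as before and restores it at train_step=1957. As a rough PyTorch sketch of the printed structure (confirmed above: three Conv2d+ELU stages, a Linear+ELU projecting to 512, a GRU(512, 512) core, an Identity decoder, and 1- and 5-dimensional value/action heads; the convolution channel and kernel sizes are assumptions, since the TorchScript dump hides parameters):

import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    # Layer names mirror the printed module; channel/kernel sizes are guesses.
    def __init__(self, num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # LazyLinear infers the flattened conv output size on first use.
        self.mlp_layers = nn.Sequential(nn.Flatten(), nn.LazyLinear(hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)
        self.critic_linear = nn.Linear(hidden, 1)                   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)   # action logits

    def forward(self, obs, rnn_state=None):
        # obs: (batch, 3, 72, 128) pixels; for simplicity the batch is fed
        # to the GRU as a single time step (the real learner runs sequences).
        x = self.mlp_layers(self.conv_head(obs))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
logits, value, state = model(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])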
[2024-11-07 15:01:30,817][09040] Decorrelating experience for 0 frames... [2024-11-07 15:01:31,015][09038] Decorrelating experience for 32 frames... [2024-11-07 15:01:31,028][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8015872. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 15:01:31,099][09029] Decorrelating experience for 96 frames... [2024-11-07 15:01:31,165][09037] Decorrelating experience for 64 frames... [2024-11-07 15:01:31,187][09039] Decorrelating experience for 32 frames... [2024-11-07 15:01:31,232][09027] Decorrelating experience for 0 frames... [2024-11-07 15:01:31,277][09040] Decorrelating experience for 32 frames... [2024-11-07 15:01:31,623][09025] Decorrelating experience for 96 frames... [2024-11-07 15:01:31,645][09030] Decorrelating experience for 64 frames... [2024-11-07 15:01:31,670][09027] Decorrelating experience for 32 frames... [2024-11-07 15:01:31,724][09038] Decorrelating experience for 64 frames... [2024-11-07 15:01:31,779][09039] Decorrelating experience for 64 frames... [2024-11-07 15:01:31,820][09037] Decorrelating experience for 96 frames... [2024-11-07 15:01:32,124][09028] Decorrelating experience for 96 frames... [2024-11-07 15:01:32,158][09040] Decorrelating experience for 64 frames... [2024-11-07 15:01:32,236][09025] Decorrelating experience for 128 frames... [2024-11-07 15:01:32,377][09037] Decorrelating experience for 128 frames... [2024-11-07 15:01:32,378][09039] Decorrelating experience for 96 frames... [2024-11-07 15:01:32,382][09027] Decorrelating experience for 64 frames... [2024-11-07 15:01:32,595][09029] Decorrelating experience for 128 frames... [2024-11-07 15:01:32,641][09038] Decorrelating experience for 96 frames... [2024-11-07 15:01:32,642][09028] Decorrelating experience for 128 frames... [2024-11-07 15:01:32,801][09025] Decorrelating experience for 160 frames... [2024-11-07 15:01:32,914][09037] Decorrelating experience for 160 frames... [2024-11-07 15:01:33,079][09039] Decorrelating experience for 128 frames... [2024-11-07 15:01:33,111][09029] Decorrelating experience for 160 frames... [2024-11-07 15:01:33,115][09040] Decorrelating experience for 96 frames... [2024-11-07 15:01:33,257][09030] Decorrelating experience for 96 frames... [2024-11-07 15:01:33,332][09028] Decorrelating experience for 160 frames... [2024-11-07 15:01:33,679][09027] Decorrelating experience for 96 frames... [2024-11-07 15:01:33,725][09038] Decorrelating experience for 128 frames... [2024-11-07 15:01:34,031][09040] Decorrelating experience for 128 frames... [2024-11-07 15:01:34,034][09026] Decorrelating experience for 64 frames... [2024-11-07 15:01:34,045][09039] Decorrelating experience for 160 frames... [2024-11-07 15:01:34,590][09027] Decorrelating experience for 128 frames... [2024-11-07 15:01:34,691][09038] Decorrelating experience for 160 frames... [2024-11-07 15:01:34,869][09030] Decorrelating experience for 128 frames... [2024-11-07 15:01:35,419][09026] Decorrelating experience for 96 frames... [2024-11-07 15:01:35,430][09027] Decorrelating experience for 160 frames... [2024-11-07 15:01:35,434][09040] Decorrelating experience for 160 frames... [2024-11-07 15:01:36,027][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 15:01:36,611][09026] Decorrelating experience for 128 frames... [2024-11-07 15:01:36,611][09030] Decorrelating experience for 160 frames... 
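The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" records report frame throughput over three trailing windows, which is why the very first report shows nan (no elapsed time yet). A sketch of such a meter, fed with (timestamp, total frame count) pairs like the ones in this log; the exact windowing rules of the real reporter are not shown in the log, so this is an approximation:

import time
from collections import deque

class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_num_frames) pairs

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.history.append((now, total_frames))
        # Keep only as much history as the largest window needs.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self, window):
        now, frames = self.history[-1]
        past = [(t, f) for t, f in self.history if now - t <= window]
        t0, f0 = past[0]
        return float("nan") if now == t0 else (frames - f0) / (now - t0)

meter = FpsMeter()
meter.record(8015872, now=0.0)   # first report: nan, nothing has elapsed
meter.record(8032256, now=20.0)
meter.record(8048640, now=25.0)
print(round(meter.fps(10), 1))   # 3276.8: 16384 frames over the last 5 seconds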
[2024-11-07 15:01:37,001][04584] Heartbeat connected on Batcher_0 [2024-11-07 15:01:37,006][04584] Heartbeat connected on LearnerWorker_p0 [2024-11-07 15:01:37,020][04584] Heartbeat connected on RolloutWorker_w0 [2024-11-07 15:01:37,022][04584] Heartbeat connected on RolloutWorker_w1 [2024-11-07 15:01:37,030][04584] Heartbeat connected on RolloutWorker_w2 [2024-11-07 15:01:37,037][04584] Heartbeat connected on RolloutWorker_w4 [2024-11-07 15:01:37,040][04584] Heartbeat connected on RolloutWorker_w5 [2024-11-07 15:01:37,047][04584] Heartbeat connected on RolloutWorker_w7 [2024-11-07 15:01:37,050][04584] Heartbeat connected on RolloutWorker_w8 [2024-11-07 15:01:37,054][04584] Heartbeat connected on RolloutWorker_w6 [2024-11-07 15:01:37,056][04584] Heartbeat connected on RolloutWorker_w9 [2024-11-07 15:01:37,083][04584] Heartbeat connected on InferenceWorker_p0-w0 [2024-11-07 15:01:37,535][09026] Decorrelating experience for 160 frames... [2024-11-07 15:01:37,748][04584] Heartbeat connected on RolloutWorker_w3 [2024-11-07 15:01:39,624][09009] Signal inference workers to stop experience collection... [2024-11-07 15:01:39,636][09024] InferenceWorker_p0-w0: stopping experience collection [2024-11-07 15:01:41,028][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 266.1. Samples: 2661. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 15:01:41,030][04584] Avg episode reward: [(0, '1.997')] [2024-11-07 15:01:46,027][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 322.8. Samples: 4842. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-07 15:01:46,029][04584] Avg episode reward: [(0, '1.997')] [2024-11-07 15:01:49,711][09009] Signal inference workers to resume experience collection... [2024-11-07 15:01:49,726][09024] InferenceWorker_p0-w0: resuming experience collection [2024-11-07 15:01:51,029][04584] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8032256. Throughput: 0: 248.1. Samples: 4962. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-11-07 15:01:51,034][04584] Avg episode reward: [(0, '2.057')] [2024-11-07 15:01:56,244][04584] Fps is (10 sec: 3207.3, 60 sec: 1299.5, 300 sec: 1299.5). Total num frames: 8048640. Throughput: 0: 315.9. Samples: 7965. Policy #0 lag: (min: 0.0, avg: 1.5, max: 5.0) [2024-11-07 15:01:56,253][04584] Avg episode reward: [(0, '3.230')] [2024-11-07 15:01:58,030][09024] Updated weights for policy 0, policy_version 1967 (0.0072) [2024-11-07 15:02:01,029][04584] Fps is (10 sec: 3276.4, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 8065024. Throughput: 0: 400.0. Samples: 12000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:02:01,049][04584] Avg episode reward: [(0, '4.097')] [2024-11-07 15:02:06,035][04584] Fps is (10 sec: 2928.5, 60 sec: 1755.1, 300 sec: 1755.1). Total num frames: 8077312. Throughput: 0: 478.1. Samples: 16737. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-11-07 15:02:06,038][04584] Avg episode reward: [(0, '4.522')] [2024-11-07 15:02:09,709][09024] Updated weights for policy 0, policy_version 1977 (0.0075) [2024-11-07 15:02:11,028][04584] Fps is (10 sec: 3686.6, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 8101888. Throughput: 0: 483.9. Samples: 19356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:02:11,034][04584] Avg episode reward: [(0, '4.531')] [2024-11-07 15:02:16,028][04584] Fps is (10 sec: 5328.5, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 8130560. 
Throughput: 0: 634.1. Samples: 28533. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:02:16,037][04584] Avg episode reward: [(0, '4.269')] [2024-11-07 15:02:16,968][09024] Updated weights for policy 0, policy_version 1987 (0.0049) [2024-11-07 15:02:21,028][04584] Fps is (10 sec: 5325.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 8155136. Throughput: 0: 791.9. Samples: 35634. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-11-07 15:02:21,040][04584] Avg episode reward: [(0, '4.001')] [2024-11-07 15:02:24,670][09024] Updated weights for policy 0, policy_version 1997 (0.0064) [2024-11-07 15:02:26,032][04584] Fps is (10 sec: 5731.8, 60 sec: 3127.6, 300 sec: 3127.6). Total num frames: 8187904. Throughput: 0: 829.0. Samples: 39969. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:02:26,037][04584] Avg episode reward: [(0, '4.182')] [2024-11-07 15:02:31,028][04584] Fps is (10 sec: 4505.7, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 8200192. Throughput: 0: 954.2. Samples: 47781. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-11-07 15:02:31,029][04584] Avg episode reward: [(0, '4.225')] [2024-11-07 15:02:34,783][09024] Updated weights for policy 0, policy_version 2007 (0.0084) [2024-11-07 15:02:36,032][04584] Fps is (10 sec: 3686.7, 60 sec: 3481.4, 300 sec: 3213.6). Total num frames: 8224768. Throughput: 0: 1066.8. Samples: 52974. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:02:36,045][04584] Avg episode reward: [(0, '4.560')] [2024-11-07 15:02:41,030][04584] Fps is (10 sec: 5733.4, 60 sec: 4027.6, 300 sec: 3452.2). Total num frames: 8257536. Throughput: 0: 1122.3. Samples: 58227. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:02:41,181][04584] Avg episode reward: [(0, '4.499')] [2024-11-07 15:02:42,375][09024] Updated weights for policy 0, policy_version 2017 (0.0076) [2024-11-07 15:02:46,028][04584] Fps is (10 sec: 5736.4, 60 sec: 4437.3, 300 sec: 3549.8). Total num frames: 8282112. Throughput: 0: 1179.7. Samples: 65085. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:02:46,033][04584] Avg episode reward: [(0, '4.458')] [2024-11-07 15:02:48,966][09024] Updated weights for policy 0, policy_version 2027 (0.0057) [2024-11-07 15:02:51,044][04584] Fps is (10 sec: 5317.1, 60 sec: 4640.9, 300 sec: 3685.6). Total num frames: 8310784. Throughput: 0: 1271.3. Samples: 73959. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:02:51,050][04584] Avg episode reward: [(0, '4.488')] [2024-11-07 15:02:55,463][09024] Updated weights for policy 0, policy_version 2037 (0.0045) [2024-11-07 15:02:56,027][04584] Fps is (10 sec: 6553.9, 60 sec: 5001.5, 300 sec: 3903.3). Total num frames: 8347648. Throughput: 0: 1327.8. Samples: 79104. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:02:56,029][04584] Avg episode reward: [(0, '4.301')] [2024-11-07 15:03:01,028][04584] Fps is (10 sec: 6974.4, 60 sec: 5256.6, 300 sec: 4050.5). Total num frames: 8380416. Throughput: 0: 1384.7. Samples: 90843. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) [2024-11-07 15:03:01,035][04584] Avg episode reward: [(0, '4.534')] [2024-11-07 15:03:01,342][09024] Updated weights for policy 0, policy_version 2047 (0.0066) [2024-11-07 15:03:06,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5257.2, 300 sec: 3966.7). Total num frames: 8392704. Throughput: 0: 1321.1. Samples: 95085. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-11-07 15:03:06,035][04584] Avg episode reward: [(0, '4.484')] [2024-11-07 15:03:11,030][04584] Fps is (10 sec: 4095.1, 60 sec: 5324.6, 300 sec: 4054.9). Total num frames: 8421376. Throughput: 0: 1306.7. Samples: 98769. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:03:11,033][04584] Avg episode reward: [(0, '4.455')] [2024-11-07 15:03:11,076][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth... [2024-11-07 15:03:12,209][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth [2024-11-07 15:03:12,448][09024] Updated weights for policy 0, policy_version 2057 (0.0053) [2024-11-07 15:03:16,028][04584] Fps is (10 sec: 5324.7, 60 sec: 5256.5, 300 sec: 4096.0). Total num frames: 8445952. Throughput: 0: 1310.5. Samples: 106755. Policy #0 lag: (min: 0.0, avg: 1.9, max: 3.0) [2024-11-07 15:03:16,033][04584] Avg episode reward: [(0, '4.398')] [2024-11-07 15:03:18,413][09024] Updated weights for policy 0, policy_version 2067 (0.0046) [2024-11-07 15:03:21,028][04584] Fps is (10 sec: 6145.6, 60 sec: 5461.4, 300 sec: 4244.9). Total num frames: 8482816. Throughput: 0: 1428.1. Samples: 117234. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:03:21,032][04584] Avg episode reward: [(0, '4.426')] [2024-11-07 15:03:23,406][09024] Updated weights for policy 0, policy_version 2077 (0.0053) [2024-11-07 15:03:26,029][04584] Fps is (10 sec: 8191.4, 60 sec: 5666.5, 300 sec: 4452.1). Total num frames: 8527872. Throughput: 0: 1449.0. Samples: 123429. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:03:26,044][04584] Avg episode reward: [(0, '4.548')] [2024-11-07 15:03:28,644][09024] Updated weights for policy 0, policy_version 2087 (0.0052) [2024-11-07 15:03:31,028][04584] Fps is (10 sec: 8601.3, 60 sec: 6144.0, 300 sec: 4608.0). Total num frames: 8568832. Throughput: 0: 1560.3. Samples: 135300. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:03:31,038][04584] Avg episode reward: [(0, '4.514')] [2024-11-07 15:03:33,631][09024] Updated weights for policy 0, policy_version 2097 (0.0036) [2024-11-07 15:03:36,147][04584] Fps is (10 sec: 6882.2, 60 sec: 6200.4, 300 sec: 4648.6). Total num frames: 8597504. Throughput: 0: 1612.3. Samples: 146679. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:03:36,153][04584] Avg episode reward: [(0, '4.475')] [2024-11-07 15:03:41,028][04584] Fps is (10 sec: 4096.1, 60 sec: 5871.1, 300 sec: 4568.6). Total num frames: 8609792. Throughput: 0: 1535.9. Samples: 148221. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:03:41,049][04584] Avg episode reward: [(0, '4.297')] [2024-11-07 15:03:43,494][09024] Updated weights for policy 0, policy_version 2107 (0.0056) [2024-11-07 15:03:46,039][04584] Fps is (10 sec: 4554.6, 60 sec: 6006.4, 300 sec: 4641.8). Total num frames: 8642560. Throughput: 0: 1440.9. Samples: 155700. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:03:46,062][04584] Avg episode reward: [(0, '4.360')] [2024-11-07 15:03:50,512][09024] Updated weights for policy 0, policy_version 2117 (0.0062) [2024-11-07 15:03:51,029][04584] Fps is (10 sec: 6552.9, 60 sec: 6077.3, 300 sec: 4710.4). Total num frames: 8675328. Throughput: 0: 1533.7. Samples: 164103. 
[2024-11-07 15:03:51,029][04584] Fps is (10 sec: 6552.9, 60 sec: 6077.3, 300 sec: 4710.4). Total num frames: 8675328. Throughput: 0: 1533.7. Samples: 164103. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:03:51,034][04584] Avg episode reward: [(0, '4.391')]
[2024-11-07 15:03:56,028][04584] Fps is (10 sec: 6560.9, 60 sec: 6007.5, 300 sec: 4774.0). Total num frames: 8708096. Throughput: 0: 1600.7. Samples: 170796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-07 15:03:56,034][04584] Avg episode reward: [(0, '4.322')]
[2024-11-07 15:03:56,064][09024] Updated weights for policy 0, policy_version 2127 (0.0079)
[2024-11-07 15:04:01,028][04584] Fps is (10 sec: 6554.4, 60 sec: 6007.5, 300 sec: 4833.3). Total num frames: 8740864. Throughput: 0: 1627.9. Samples: 180012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:04:01,029][04584] Avg episode reward: [(0, '4.624')]
[2024-11-07 15:04:03,142][09024] Updated weights for policy 0, policy_version 2137 (0.0080)
[2024-11-07 15:04:06,031][04584] Fps is (10 sec: 6551.5, 60 sec: 6348.5, 300 sec: 4888.7). Total num frames: 8773632. Throughput: 0: 1591.7. Samples: 188865. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:04:06,033][04584] Avg episode reward: [(0, '4.471')]
[2024-11-07 15:04:09,383][09024] Updated weights for policy 0, policy_version 2147 (0.0045)
[2024-11-07 15:04:11,030][04584] Fps is (10 sec: 6142.3, 60 sec: 6348.8, 300 sec: 4915.1). Total num frames: 8802304. Throughput: 0: 1567.7. Samples: 193977. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2024-11-07 15:04:11,034][04584] Avg episode reward: [(0, '4.431')]
[2024-11-07 15:04:16,028][04584] Fps is (10 sec: 5326.3, 60 sec: 6348.8, 300 sec: 4915.2). Total num frames: 8826880. Throughput: 0: 1439.4. Samples: 200073. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:04:16,031][04584] Avg episode reward: [(0, '4.391')]
[2024-11-07 15:04:17,708][09024] Updated weights for policy 0, policy_version 2157 (0.0058)
[2024-11-07 15:04:21,028][04584] Fps is (10 sec: 6145.7, 60 sec: 6348.8, 300 sec: 4987.5). Total num frames: 8863744. Throughput: 0: 1450.2. Samples: 211764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:04:21,030][04584] Avg episode reward: [(0, '4.511')]
[2024-11-07 15:04:22,029][09024] Updated weights for policy 0, policy_version 2167 (0.0046)
[2024-11-07 15:04:26,028][04584] Fps is (10 sec: 7373.2, 60 sec: 6212.4, 300 sec: 5055.6). Total num frames: 8900608. Throughput: 0: 1557.3. Samples: 218301. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:04:26,031][04584] Avg episode reward: [(0, '4.462')]
[2024-11-07 15:04:28,346][09024] Updated weights for policy 0, policy_version 2177 (0.0073)
[2024-11-07 15:04:31,028][04584] Fps is (10 sec: 7372.6, 60 sec: 6144.0, 300 sec: 5120.0). Total num frames: 8937472. Throughput: 0: 1611.6. Samples: 228204. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:04:31,029][04584] Avg episode reward: [(0, '4.386')]
[2024-11-07 15:04:33,371][09024] Updated weights for policy 0, policy_version 2187 (0.0044)
[2024-11-07 15:04:36,034][04584] Fps is (10 sec: 8186.5, 60 sec: 6429.1, 300 sec: 5225.0). Total num frames: 8982528. Throughput: 0: 1702.5. Samples: 240723. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:04:36,038][04584] Avg episode reward: [(0, '4.766')]
[2024-11-07 15:04:37,745][09024] Updated weights for policy 0, policy_version 2197 (0.0031)
[2024-11-07 15:04:41,028][04584] Fps is (10 sec: 9011.4, 60 sec: 6963.2, 300 sec: 5324.8). Total num frames: 9027584. Throughput: 0: 1710.0. Samples: 247746. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:04:41,030][04584] Avg episode reward: [(0, '4.323')]
[2024-11-07 15:04:42,365][09024] Updated weights for policy 0, policy_version 2207 (0.0034)
[2024-11-07 15:04:47,914][04584] Fps is (10 sec: 7240.3, 60 sec: 6884.5, 300 sec: 5346.6). Total num frames: 9068544. Throughput: 0: 1731.7. Samples: 261207. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2024-11-07 15:04:47,916][04584] Avg episode reward: [(0, '4.443')]
[2024-11-07 15:04:48,941][09024] Updated weights for policy 0, policy_version 2217 (0.0039)
[2024-11-07 15:04:51,029][04584] Fps is (10 sec: 6143.1, 60 sec: 6894.9, 300 sec: 5365.7). Total num frames: 9089024. Throughput: 0: 1786.5. Samples: 269256. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:04:51,038][04584] Avg episode reward: [(0, '4.315')]
[2024-11-07 15:04:55,889][09024] Updated weights for policy 0, policy_version 2227 (0.0049)
[2024-11-07 15:04:56,028][04584] Fps is (10 sec: 6563.0, 60 sec: 6894.9, 300 sec: 5394.7). Total num frames: 9121792. Throughput: 0: 1776.3. Samples: 273906. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:04:56,030][04584] Avg episode reward: [(0, '4.585')]
[2024-11-07 15:05:01,028][04584] Fps is (10 sec: 6145.0, 60 sec: 6826.7, 300 sec: 5402.8). Total num frames: 9150464. Throughput: 0: 1833.8. Samples: 282591. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:05:01,030][04584] Avg episode reward: [(0, '4.626')]
[2024-11-07 15:05:02,542][09024] Updated weights for policy 0, policy_version 2237 (0.0078)
[2024-11-07 15:05:06,028][04584] Fps is (10 sec: 7372.5, 60 sec: 7031.8, 300 sec: 5486.7). Total num frames: 9195520. Throughput: 0: 1828.4. Samples: 294042. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:05:06,031][04584] Avg episode reward: [(0, '4.594')]
[2024-11-07 15:05:06,896][09024] Updated weights for policy 0, policy_version 2247 (0.0041)
[2024-11-07 15:05:11,028][04584] Fps is (10 sec: 8601.0, 60 sec: 7236.5, 300 sec: 5548.2). Total num frames: 9236480. Throughput: 0: 1833.2. Samples: 300798. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2024-11-07 15:05:11,030][04584] Avg episode reward: [(0, '4.237')]
[2024-11-07 15:05:11,039][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth...
[2024-11-07 15:05:11,180][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth
[2024-11-07 15:05:11,798][09024] Updated weights for policy 0, policy_version 2257 (0.0037)
[2024-11-07 15:05:16,028][04584] Fps is (10 sec: 6963.7, 60 sec: 7304.6, 300 sec: 5552.4). Total num frames: 9265152. Throughput: 0: 1843.3. Samples: 311151. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:05:16,029][04584] Avg episode reward: [(0, '4.511')]
[2024-11-07 15:05:18,230][09024] Updated weights for policy 0, policy_version 2267 (0.0036)
[2024-11-07 15:05:22,243][04584] Fps is (10 sec: 5478.3, 60 sec: 7092.5, 300 sec: 5544.8). Total num frames: 9297920. Throughput: 0: 1763.7. Samples: 322221. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:05:22,263][04584] Avg episode reward: [(0, '4.497')]
[2024-11-07 15:05:25,579][09024] Updated weights for policy 0, policy_version 2277 (0.0035)
[2024-11-07 15:05:26,028][04584] Fps is (10 sec: 6553.6, 60 sec: 7168.0, 300 sec: 5595.0). Total num frames: 9330688. Throughput: 0: 1691.6. Samples: 323868. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:05:26,029][04584] Avg episode reward: [(0, '4.590')]
[2024-11-07 15:05:30,294][09024] Updated weights for policy 0, policy_version 2287 (0.0041)
[2024-11-07 15:05:31,028][04584] Fps is (10 sec: 8393.2, 60 sec: 7236.3, 300 sec: 5649.1). Total num frames: 9371648. Throughput: 0: 1751.2. Samples: 336708. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:05:31,030][04584] Avg episode reward: [(0, '4.708')]
[2024-11-07 15:05:34,912][09024] Updated weights for policy 0, policy_version 2297 (0.0042)
[2024-11-07 15:05:36,027][04584] Fps is (10 sec: 9011.3, 60 sec: 7305.4, 300 sec: 5734.4). Total num frames: 9420800. Throughput: 0: 1796.1. Samples: 350079. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:05:36,029][04584] Avg episode reward: [(0, '4.454')]
[2024-11-07 15:05:39,037][09024] Updated weights for policy 0, policy_version 2307 (0.0063)
[2024-11-07 15:05:41,029][04584] Fps is (10 sec: 8600.8, 60 sec: 7167.9, 300 sec: 5767.1). Total num frames: 9457664. Throughput: 0: 1853.6. Samples: 357318. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2024-11-07 15:05:41,031][04584] Avg episode reward: [(0, '4.661')]
[2024-11-07 15:05:45,685][09024] Updated weights for policy 0, policy_version 2317 (0.0065)
[2024-11-07 15:05:46,028][04584] Fps is (10 sec: 7372.7, 60 sec: 7330.3, 300 sec: 5798.7). Total num frames: 9494528. Throughput: 0: 1890.9. Samples: 367683. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:05:46,030][04584] Avg episode reward: [(0, '4.410')]
[2024-11-07 15:05:50,912][09024] Updated weights for policy 0, policy_version 2327 (0.0035)
[2024-11-07 15:05:51,028][04584] Fps is (10 sec: 7373.4, 60 sec: 7373.0, 300 sec: 5828.9). Total num frames: 9531392. Throughput: 0: 1887.5. Samples: 378978. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:05:51,031][04584] Avg episode reward: [(0, '4.357')]
[2024-11-07 15:05:56,548][04584] Fps is (10 sec: 5450.5, 60 sec: 7106.3, 300 sec: 5784.9). Total num frames: 9551872. Throughput: 0: 1821.6. Samples: 383718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:05:56,552][04584] Avg episode reward: [(0, '4.499')]
[2024-11-07 15:05:59,046][09024] Updated weights for policy 0, policy_version 2337 (0.0071)
[2024-11-07 15:06:01,028][04584] Fps is (10 sec: 5324.8, 60 sec: 7236.3, 300 sec: 5810.3). Total num frames: 9584640. Throughput: 0: 1762.5. Samples: 390465. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:06:01,030][04584] Avg episode reward: [(0, '4.315')]
[2024-11-07 15:06:04,144][09024] Updated weights for policy 0, policy_version 2347 (0.0033)
[2024-11-07 15:06:06,029][04584] Fps is (10 sec: 7777.0, 60 sec: 7168.0, 300 sec: 5853.5). Total num frames: 9625600. Throughput: 0: 1835.6. Samples: 402594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:06:06,031][04584] Avg episode reward: [(0, '4.349')]
[2024-11-07 15:06:09,038][09024] Updated weights for policy 0, policy_version 2357 (0.0025)
[2024-11-07 15:06:11,028][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.1, 300 sec: 5895.3). Total num frames: 9666560. Throughput: 0: 1899.2. Samples: 409332. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:06:11,033][04584] Avg episode reward: [(0, '4.490')]
[2024-11-07 15:06:14,224][09024] Updated weights for policy 0, policy_version 2367 (0.0037)
[2024-11-07 15:06:16,028][04584] Fps is (10 sec: 8192.1, 60 sec: 7372.7, 300 sec: 5935.6). Total num frames: 9707520. Throughput: 0: 1871.7. Samples: 420936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:06:16,030][04584] Avg episode reward: [(0, '4.367')]
[2024-11-07 15:06:19,411][09024] Updated weights for policy 0, policy_version 2377 (0.0041)
[2024-11-07 15:06:21,029][04584] Fps is (10 sec: 8190.8, 60 sec: 7664.4, 300 sec: 5974.5). Total num frames: 9748480. Throughput: 0: 1848.9. Samples: 433281. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:06:21,032][04584] Avg episode reward: [(0, '4.296')]
[2024-11-07 15:06:24,504][09024] Updated weights for policy 0, policy_version 2387 (0.0053)
[2024-11-07 15:06:26,027][04584] Fps is (10 sec: 7783.2, 60 sec: 7577.6, 300 sec: 5998.2). Total num frames: 9785344. Throughput: 0: 1812.0. Samples: 438855. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:06:26,031][04584] Avg episode reward: [(0, '4.495')]
[2024-11-07 15:06:31,028][04584] Fps is (10 sec: 5735.2, 60 sec: 7236.3, 300 sec: 6067.6). Total num frames: 9805824. Throughput: 0: 1818.3. Samples: 449505. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-11-07 15:06:31,030][04584] Avg episode reward: [(0, '4.282')]
[2024-11-07 15:06:33,393][09024] Updated weights for policy 0, policy_version 2397 (0.0054)
[2024-11-07 15:06:36,028][04584] Fps is (10 sec: 4505.4, 60 sec: 6826.6, 300 sec: 6150.9). Total num frames: 9830400. Throughput: 0: 1667.3. Samples: 454008. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:06:36,033][04584] Avg episode reward: [(0, '4.324')]
[2024-11-07 15:06:41,028][04584] Fps is (10 sec: 4505.6, 60 sec: 6553.7, 300 sec: 6220.4). Total num frames: 9850880. Throughput: 0: 1647.4. Samples: 456993. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:06:41,030][04584] Avg episode reward: [(0, '4.430')]
[2024-11-07 15:06:42,724][09024] Updated weights for policy 0, policy_version 2407 (0.0111)
[2024-11-07 15:06:46,037][04584] Fps is (10 sec: 4092.3, 60 sec: 6279.5, 300 sec: 6234.1). Total num frames: 9871360. Throughput: 0: 1631.1. Samples: 463878. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-11-07 15:06:46,049][04584] Avg episode reward: [(0, '4.396')]
[2024-11-07 15:06:50,217][09024] Updated weights for policy 0, policy_version 2417 (0.0070)
[2024-11-07 15:06:51,028][04584] Fps is (10 sec: 5324.8, 60 sec: 6212.3, 300 sec: 6294.4). Total num frames: 9904128. Throughput: 0: 1541.4. Samples: 471957. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:06:51,033][04584] Avg episode reward: [(0, '4.471')]
[2024-11-07 15:06:56,028][04584] Fps is (10 sec: 6559.7, 60 sec: 6473.3, 300 sec: 6345.4). Total num frames: 9936896. Throughput: 0: 1495.2. Samples: 476616. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:06:56,030][04584] Avg episode reward: [(0, '4.333')]
[2024-11-07 15:06:56,741][09024] Updated weights for policy 0, policy_version 2427 (0.0055)
[2024-11-07 15:07:01,028][04584] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6401.0). Total num frames: 9965568. Throughput: 0: 1452.2. Samples: 486282. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:07:01,030][04584] Avg episode reward: [(0, '4.301')]
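The recurring "Policy #0 lag" triples summarize how many policy versions behind the learner the weights that collected recent rollouts were; a lag of 0 means a rollout used the newest weights. A hedged sketch of how such a min/avg/max summary can be computed, where policy_lag_stats and the example version numbers are illustrative assumptions rather than the framework's code:

    def policy_lag_stats(learner_version, rollout_versions):
        # Lag of a rollout = learner's newest version minus the version that collected it.
        lags = [learner_version - v for v in rollout_versions]
        return {"min": float(min(lags)),
                "avg": round(sum(lags) / len(lags), 1),
                "max": float(max(lags))}

    # e.g. learner at version 2437, rollouts collected with versions 2434, 2436, 2437:
    # policy_lag_stats(2437, [2434, 2436, 2437]) -> {'min': 0.0, 'avg': 1.3, 'max': 3.0}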
[2024-11-07 15:07:06,028][04584] Fps is (10 sec: 4096.0, 60 sec: 5871.0, 300 sec: 6359.2). Total num frames: 9977856. Throughput: 0: 1286.4. Samples: 491169. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:07:06,029][04584] Avg episode reward: [(0, '4.232')]
[2024-11-07 15:07:06,836][09024] Updated weights for policy 0, policy_version 2437 (0.0050)
[2024-11-07 15:07:11,028][04584] Fps is (10 sec: 2867.1, 60 sec: 5461.3, 300 sec: 6317.6). Total num frames: 9994240. Throughput: 0: 1219.8. Samples: 493749. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:07:11,049][04584] Avg episode reward: [(0, '4.415')]
[2024-11-07 15:07:11,110][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002440_9994240.pth...
[2024-11-07 15:07:12,466][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth
[2024-11-07 15:07:16,031][04584] Fps is (10 sec: 3685.2, 60 sec: 5119.8, 300 sec: 6303.6). Total num frames: 10014720. Throughput: 0: 1105.2. Samples: 499242. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:07:16,033][04584] Avg episode reward: [(0, '4.528')]
[2024-11-07 15:07:17,973][09024] Updated weights for policy 0, policy_version 2447 (0.0103)
[2024-11-07 15:07:21,028][04584] Fps is (10 sec: 4096.0, 60 sec: 4778.8, 300 sec: 6262.1). Total num frames: 10035200. Throughput: 0: 1144.8. Samples: 505524. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:07:21,031][04584] Avg episode reward: [(0, '4.479')]
[2024-11-07 15:07:26,028][04584] Fps is (10 sec: 4097.4, 60 sec: 4505.6, 300 sec: 6289.8). Total num frames: 10055680. Throughput: 0: 1155.7. Samples: 509001. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:07:26,030][04584] Avg episode reward: [(0, '4.388')]
[2024-11-07 15:07:26,954][09024] Updated weights for policy 0, policy_version 2457 (0.0073)
[2024-11-07 15:07:31,029][04584] Fps is (10 sec: 4914.9, 60 sec: 4642.1, 300 sec: 6303.7). Total num frames: 10084352. Throughput: 0: 1154.4. Samples: 515817. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
[2024-11-07 15:07:31,031][04584] Avg episode reward: [(0, '4.512')]
[2024-11-07 15:07:34,484][09024] Updated weights for policy 0, policy_version 2467 (0.0077)
[2024-11-07 15:07:36,039][04584] Fps is (10 sec: 5318.9, 60 sec: 4641.3, 300 sec: 6275.7). Total num frames: 10108928. Throughput: 0: 1161.7. Samples: 524247. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:07:36,041][04584] Avg episode reward: [(0, '4.336')]
[2024-11-07 15:07:41,029][04584] Fps is (10 sec: 4095.9, 60 sec: 4573.8, 300 sec: 6248.1). Total num frames: 10125312. Throughput: 0: 1109.8. Samples: 526557. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:07:41,033][04584] Avg episode reward: [(0, '4.377')]
[2024-11-07 15:07:44,723][09024] Updated weights for policy 0, policy_version 2477 (0.0055)
[2024-11-07 15:07:46,028][04584] Fps is (10 sec: 4510.3, 60 sec: 4711.1, 300 sec: 6248.5). Total num frames: 10153984. Throughput: 0: 1039.3. Samples: 533052. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:07:46,030][04584] Avg episode reward: [(0, '4.577')]
[2024-11-07 15:07:51,028][04584] Fps is (10 sec: 5325.4, 60 sec: 4573.9, 300 sec: 6206.5). Total num frames: 10178560. Throughput: 0: 1120.3. Samples: 541584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:07:51,030][04584] Avg episode reward: [(0, '4.321')]
[2024-11-07 15:07:52,827][09024] Updated weights for policy 0, policy_version 2487 (0.0129)
[2024-11-07 15:07:56,028][04584] Fps is (10 sec: 4915.5, 60 sec: 4437.3, 300 sec: 6178.7). Total num frames: 10203136. Throughput: 0: 1124.5. Samples: 544350. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:07:56,030][04584] Avg episode reward: [(0, '4.500')]
[2024-11-07 15:07:59,814][09024] Updated weights for policy 0, policy_version 2497 (0.0061)
[2024-11-07 15:08:01,029][04584] Fps is (10 sec: 5324.1, 60 sec: 4437.2, 300 sec: 6234.2). Total num frames: 10231808. Throughput: 0: 1192.4. Samples: 552897. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:08:01,033][04584] Avg episode reward: [(0, '4.706')]
[2024-11-07 15:08:06,028][04584] Fps is (10 sec: 6144.0, 60 sec: 4778.7, 300 sec: 6248.2). Total num frames: 10264576. Throughput: 0: 1246.2. Samples: 561603. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:08:06,029][04584] Avg episode reward: [(0, '4.515')]
[2024-11-07 15:08:06,822][09024] Updated weights for policy 0, policy_version 2507 (0.0059)
[2024-11-07 15:08:11,035][04584] Fps is (10 sec: 6140.4, 60 sec: 4982.9, 300 sec: 6261.9). Total num frames: 10293248. Throughput: 0: 1278.5. Samples: 566541. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:08:11,038][04584] Avg episode reward: [(0, '4.344')]
[2024-11-07 15:08:15,470][09024] Updated weights for policy 0, policy_version 2517 (0.0058)
[2024-11-07 15:08:16,028][04584] Fps is (10 sec: 4915.1, 60 sec: 4983.7, 300 sec: 6206.5). Total num frames: 10313728. Throughput: 0: 1262.9. Samples: 572646. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:08:16,040][04584] Avg episode reward: [(0, '4.524')]
[2024-11-07 15:08:21,027][04584] Fps is (10 sec: 4918.9, 60 sec: 5120.0, 300 sec: 6151.0). Total num frames: 10342400. Throughput: 0: 1276.5. Samples: 581676. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:08:21,030][04584] Avg episode reward: [(0, '4.594')]
[2024-11-07 15:08:22,084][09024] Updated weights for policy 0, policy_version 2527 (0.0055)
[2024-11-07 15:08:26,028][04584] Fps is (10 sec: 6143.8, 60 sec: 5324.8, 300 sec: 6123.2). Total num frames: 10375168. Throughput: 0: 1339.8. Samples: 586845. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:08:26,030][04584] Avg episode reward: [(0, '4.716')]
[2024-11-07 15:08:28,099][09024] Updated weights for policy 0, policy_version 2537 (0.0069)
[2024-11-07 15:08:31,028][04584] Fps is (10 sec: 6553.3, 60 sec: 5393.1, 300 sec: 6139.5). Total num frames: 10407936. Throughput: 0: 1419.3. Samples: 596922. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:08:31,031][04584] Avg episode reward: [(0, '4.377')]
[2024-11-07 15:08:34,340][09024] Updated weights for policy 0, policy_version 2547 (0.0045)
[2024-11-07 15:08:36,028][04584] Fps is (10 sec: 6553.8, 60 sec: 5530.6, 300 sec: 6206.5). Total num frames: 10440704. Throughput: 0: 1443.5. Samples: 606540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:08:36,030][04584] Avg episode reward: [(0, '4.421')]
[2024-11-07 15:08:40,592][09024] Updated weights for policy 0, policy_version 2557 (0.0049)
[2024-11-07 15:08:41,028][04584] Fps is (10 sec: 6553.6, 60 sec: 5802.8, 300 sec: 6206.7). Total num frames: 10473472. Throughput: 0: 1493.3. Samples: 611547. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:08:41,034][04584] Avg episode reward: [(0, '4.527')]
[2024-11-07 15:08:48,061][04584] Fps is (10 sec: 5446.4, 60 sec: 5678.6, 300 sec: 6164.0). Total num frames: 10506240. Throughput: 0: 1458.3. Samples: 621483. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:08:48,062][04584] Avg episode reward: [(0, '4.465')]
[2024-11-07 15:08:48,629][09024] Updated weights for policy 0, policy_version 2567 (0.0053)
[2024-11-07 15:08:51,028][04584] Fps is (10 sec: 5324.6, 60 sec: 5802.6, 300 sec: 6164.8). Total num frames: 10526720. Throughput: 0: 1481.7. Samples: 628281. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:08:51,037][04584] Avg episode reward: [(0, '4.470')]
[2024-11-07 15:08:54,930][09024] Updated weights for policy 0, policy_version 2577 (0.0040)
[2024-11-07 15:08:56,028][04584] Fps is (10 sec: 6683.4, 60 sec: 5939.2, 300 sec: 6164.8). Total num frames: 10559488. Throughput: 0: 1483.7. Samples: 633297. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
[2024-11-07 15:08:56,035][04584] Avg episode reward: [(0, '4.549')]
[2024-11-07 15:09:00,969][09024] Updated weights for policy 0, policy_version 2587 (0.0046)
[2024-11-07 15:09:01,028][04584] Fps is (10 sec: 6963.3, 60 sec: 6075.8, 300 sec: 6178.8). Total num frames: 10596352. Throughput: 0: 1563.5. Samples: 643005. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:01,030][04584] Avg episode reward: [(0, '4.366')]
[2024-11-07 15:09:06,030][04584] Fps is (10 sec: 6552.4, 60 sec: 6007.3, 300 sec: 6178.7). Total num frames: 10625024. Throughput: 0: 1577.3. Samples: 652659. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:06,049][04584] Avg episode reward: [(0, '4.493')]
[2024-11-07 15:09:08,985][09024] Updated weights for policy 0, policy_version 2597 (0.0064)
[2024-11-07 15:09:11,030][04584] Fps is (10 sec: 4914.3, 60 sec: 5871.5, 300 sec: 6164.8). Total num frames: 10645504. Throughput: 0: 1533.3. Samples: 655848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:11,036][04584] Avg episode reward: [(0, '4.463')]
[2024-11-07 15:09:11,070][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002599_10645504.pth...
[2024-11-07 15:09:11,734][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth
[2024-11-07 15:09:16,028][04584] Fps is (10 sec: 4096.8, 60 sec: 5870.9, 300 sec: 6109.3). Total num frames: 10665984. Throughput: 0: 1445.5. Samples: 661971. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:16,032][04584] Avg episode reward: [(0, '4.449')]
[2024-11-07 15:09:18,879][09024] Updated weights for policy 0, policy_version 2607 (0.0099)
[2024-11-07 15:09:22,383][04584] Fps is (10 sec: 3247.1, 60 sec: 5541.0, 300 sec: 6012.2). Total num frames: 10682368. Throughput: 0: 1330.0. Samples: 668193. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:09:22,396][04584] Avg episode reward: [(0, '4.305')]
[2024-11-07 15:09:26,028][04584] Fps is (10 sec: 3276.8, 60 sec: 5393.1, 300 sec: 5970.4). Total num frames: 10698752. Throughput: 0: 1279.7. Samples: 669135. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:26,030][04584] Avg episode reward: [(0, '4.290')]
[2024-11-07 15:09:29,818][09024] Updated weights for policy 0, policy_version 2617 (0.0067)
[2024-11-07 15:09:31,028][04584] Fps is (10 sec: 4738.2, 60 sec: 5256.6, 300 sec: 5901.2). Total num frames: 10723328. Throughput: 0: 1272.1. Samples: 676140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:09:31,030][04584] Avg episode reward: [(0, '4.222')]
[2024-11-07 15:09:36,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5051.7, 300 sec: 5817.7). Total num frames: 10743808. Throughput: 0: 1216.6. Samples: 683025. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:09:36,029][04584] Avg episode reward: [(0, '4.278')]
[2024-11-07 15:09:39,151][09024] Updated weights for policy 0, policy_version 2627 (0.0095)
[2024-11-07 15:09:41,028][04584] Fps is (10 sec: 4505.4, 60 sec: 4915.2, 300 sec: 5799.3). Total num frames: 10768384. Throughput: 0: 1181.2. Samples: 686451. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2024-11-07 15:09:41,032][04584] Avg episode reward: [(0, '4.383')]
[2024-11-07 15:09:46,028][04584] Fps is (10 sec: 4505.5, 60 sec: 4875.6, 300 sec: 5762.2). Total num frames: 10788864. Throughput: 0: 1113.2. Samples: 693099. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:09:46,043][04584] Avg episode reward: [(0, '4.335')]
[2024-11-07 15:09:49,553][09024] Updated weights for policy 0, policy_version 2637 (0.0102)
[2024-11-07 15:09:51,028][04584] Fps is (10 sec: 3686.5, 60 sec: 4642.2, 300 sec: 5706.6). Total num frames: 10805248. Throughput: 0: 1009.0. Samples: 698064. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:09:51,030][04584] Avg episode reward: [(0, '4.319')]
[2024-11-07 15:09:56,739][04584] Fps is (10 sec: 2676.9, 60 sec: 4250.4, 300 sec: 5637.5). Total num frames: 10817536. Throughput: 0: 974.1. Samples: 700371. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-11-07 15:09:56,745][04584] Avg episode reward: [(0, '4.441')]
[2024-11-07 15:10:01,028][04584] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 5553.9). Total num frames: 10833920. Throughput: 0: 930.9. Samples: 703860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:10:01,033][04584] Avg episode reward: [(0, '4.393')]
[2024-11-07 15:10:02,874][09024] Updated weights for policy 0, policy_version 2647 (0.0087)
[2024-11-07 15:10:06,066][04584] Fps is (10 sec: 3513.6, 60 sec: 3752.5, 300 sec: 5469.9). Total num frames: 10850304. Throughput: 0: 949.9. Samples: 709686. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:10:06,070][04584] Avg episode reward: [(0, '4.337')]
[2024-11-07 15:10:11,031][04584] Fps is (10 sec: 3685.4, 60 sec: 3754.6, 300 sec: 5442.8). Total num frames: 10870784. Throughput: 0: 963.9. Samples: 712512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:10:11,047][04584] Avg episode reward: [(0, '4.329')]
[2024-11-07 15:10:13,800][09024] Updated weights for policy 0, policy_version 2657 (0.0062)
[2024-11-07 15:10:16,028][04584] Fps is (10 sec: 4111.2, 60 sec: 3754.7, 300 sec: 5423.5). Total num frames: 10891264. Throughput: 0: 943.7. Samples: 718608. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:10:16,060][04584] Avg episode reward: [(0, '4.453')]
[2024-11-07 15:10:21,029][04584] Fps is (10 sec: 4096.4, 60 sec: 3911.2, 300 sec: 5359.5). Total num frames: 10911744. Throughput: 0: 910.6. Samples: 724002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 4.0)
[2024-11-07 15:10:21,035][04584] Avg episode reward: [(0, '4.501')]
[2024-11-07 15:10:24,256][09024] Updated weights for policy 0, policy_version 2667 (0.0070)
[2024-11-07 15:10:26,028][04584] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 5276.2). Total num frames: 10928128. Throughput: 0: 899.3. Samples: 726918. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:10:26,030][04584] Avg episode reward: [(0, '4.423')]
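Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports frame throughput averaged over three trailing windows, which is why the 10-second figure swings sharply while the 300-second figure drifts slowly. A deque-based sliding-window sketch of that bookkeeping, assuming (timestamp, cumulative frame count) samples; this illustrates the idea only and is not the framework's own implementation:

    import time
    from collections import deque

    class FpsWindows:
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (timestamp, cumulative frame count) pairs

        def record(self, total_frames):
            now = time.time()
            self.samples.append((now, total_frames))
            # Drop samples older than the widest window.
            while now - self.samples[0][0] > max(self.windows):
                self.samples.popleft()

        def fps(self):
            now, newest = self.samples[-1]
            rates = {}
            for w in self.windows:
                # Oldest retained sample that still falls inside the w-second window.
                t0, f0 = next(s for s in self.samples if now - s[0] <= w)
                rates[w] = (newest - f0) / (now - t0) if now > t0 else 0.0
            return rates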
[2024-11-07 15:10:31,028][04584] Fps is (10 sec: 2458.0, 60 sec: 3549.9, 300 sec: 5137.4). Total num frames: 10936320. Throughput: 0: 820.8. Samples: 730035. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:10:31,030][04584] Avg episode reward: [(0, '4.404')]
[2024-11-07 15:10:36,028][04584] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 5067.9). Total num frames: 10952704. Throughput: 0: 815.3. Samples: 734751. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-11-07 15:10:36,034][04584] Avg episode reward: [(0, '4.371')]
[2024-11-07 15:10:39,724][09024] Updated weights for policy 0, policy_version 2677 (0.0092)
[2024-11-07 15:10:41,028][04584] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 4998.5). Total num frames: 10969088. Throughput: 0: 831.2. Samples: 737184. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:10:41,030][04584] Avg episode reward: [(0, '4.353')]
[2024-11-07 15:10:46,030][04584] Fps is (10 sec: 3685.4, 60 sec: 3344.9, 300 sec: 4942.9). Total num frames: 10989568. Throughput: 0: 869.6. Samples: 742995. Policy #0 lag: (min: 0.0, avg: 1.0, max: 4.0)
[2024-11-07 15:10:46,045][04584] Avg episode reward: [(0, '4.321')]
[2024-11-07 15:10:49,813][09024] Updated weights for policy 0, policy_version 2687 (0.0083)
[2024-11-07 15:10:51,028][04584] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 4951.7). Total num frames: 11010048. Throughput: 0: 878.9. Samples: 749202. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:10:51,037][04584] Avg episode reward: [(0, '4.182')]
[2024-11-07 15:10:56,028][04584] Fps is (10 sec: 4097.0, 60 sec: 3592.4, 300 sec: 4901.3). Total num frames: 11030528. Throughput: 0: 881.2. Samples: 752163. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-11-07 15:10:56,038][04584] Avg episode reward: [(0, '4.313')]
[2024-11-07 15:10:59,310][09024] Updated weights for policy 0, policy_version 2697 (0.0070)
[2024-11-07 15:11:01,028][04584] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 4831.9). Total num frames: 11051008. Throughput: 0: 893.9. Samples: 758835. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:11:01,030][04584] Avg episode reward: [(0, '4.453')]
[2024-11-07 15:11:06,028][04584] Fps is (10 sec: 2867.3, 60 sec: 3483.8, 300 sec: 4720.8). Total num frames: 11059200. Throughput: 0: 858.9. Samples: 762651. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:11:06,050][04584] Avg episode reward: [(0, '4.518')]
[2024-11-07 15:11:11,039][04584] Fps is (10 sec: 2864.2, 60 sec: 3481.1, 300 sec: 4651.2). Total num frames: 11079680. Throughput: 0: 850.9. Samples: 765216. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-11-07 15:11:11,054][04584] Avg episode reward: [(0, '4.563')]
[2024-11-07 15:11:11,654][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002706_11083776.pth...
[2024-11-07 15:11:12,410][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002440_9994240.pth
[2024-11-07 15:11:12,676][09024] Updated weights for policy 0, policy_version 2707 (0.0102)
[2024-11-07 15:11:16,028][04584] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 4595.9). Total num frames: 11104256. Throughput: 0: 909.9. Samples: 770982. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
[2024-11-07 15:11:16,035][04584] Avg episode reward: [(0, '4.511')]
[2024-11-07 15:11:20,515][09024] Updated weights for policy 0, policy_version 2717 (0.0108)
[2024-11-07 15:11:21,029][04584] Fps is (10 sec: 4919.9, 60 sec: 3618.2, 300 sec: 4554.2). Total num frames: 11128832. Throughput: 0: 974.7. Samples: 778614. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:11:21,032][04584] Avg episode reward: [(0, '4.433')]
[2024-11-07 15:11:26,028][04584] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 4554.2). Total num frames: 11149312. Throughput: 0: 994.3. Samples: 781929. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:11:26,032][04584] Avg episode reward: [(0, '4.463')]
[2024-11-07 15:11:30,644][09024] Updated weights for policy 0, policy_version 2727 (0.0103)
[2024-11-07 15:11:31,036][04584] Fps is (10 sec: 4092.9, 60 sec: 3890.6, 300 sec: 4540.2). Total num frames: 11169792. Throughput: 0: 1013.8. Samples: 788622. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:11:31,051][04584] Avg episode reward: [(0, '4.541')]
[2024-11-07 15:11:36,028][04584] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4526.4). Total num frames: 11186176. Throughput: 0: 990.2. Samples: 793761. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2024-11-07 15:11:36,095][04584] Avg episode reward: [(0, '4.513')]
[2024-11-07 15:11:41,029][04584] Fps is (10 sec: 2869.4, 60 sec: 3822.9, 300 sec: 4498.8). Total num frames: 11198464. Throughput: 0: 979.4. Samples: 796236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:11:41,032][04584] Avg episode reward: [(0, '4.443')]
[2024-11-07 15:11:44,286][09024] Updated weights for policy 0, policy_version 2737 (0.0075)
[2024-11-07 15:11:46,028][04584] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 4443.1). Total num frames: 11214848. Throughput: 0: 908.7. Samples: 799725. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-11-07 15:11:46,036][04584] Avg episode reward: [(0, '4.426')]
[2024-11-07 15:11:51,028][04584] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 4415.3). Total num frames: 11239424. Throughput: 0: 965.5. Samples: 806097. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:11:51,030][04584] Avg episode reward: [(0, '4.502')]
[2024-11-07 15:11:53,698][09024] Updated weights for policy 0, policy_version 2747 (0.0120)
[2024-11-07 15:11:56,031][04584] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 4373.7). Total num frames: 11255808. Throughput: 0: 989.4. Samples: 809733. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:11:56,035][04584] Avg episode reward: [(0, '4.506')]
[2024-11-07 15:12:01,028][04584] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 4387.6). Total num frames: 11272192. Throughput: 0: 975.5. Samples: 814878. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:01,061][04584] Avg episode reward: [(0, '4.458')]
[2024-11-07 15:12:05,292][09024] Updated weights for policy 0, policy_version 2757 (0.0095)
[2024-11-07 15:12:06,028][04584] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 4401.5). Total num frames: 11292672. Throughput: 0: 924.8. Samples: 820227. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:06,030][04584] Avg episode reward: [(0, '4.342')]
[2024-11-07 15:12:11,028][04584] Fps is (10 sec: 5324.8, 60 sec: 4096.7, 300 sec: 4443.2). Total num frames: 11325440. Throughput: 0: 946.0. Samples: 824499. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:11,035][04584] Avg episode reward: [(0, '4.363')]
[2024-11-07 15:12:11,709][09024] Updated weights for policy 0, policy_version 2767 (0.0038)
[2024-11-07 15:12:16,028][04584] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 4429.2). Total num frames: 11341824. Throughput: 0: 948.7. Samples: 831306. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:16,036][04584] Avg episode reward: [(0, '4.397')]
[2024-11-07 15:12:21,030][04584] Fps is (10 sec: 4505.0, 60 sec: 4027.7, 300 sec: 4457.0). Total num frames: 11370496. Throughput: 0: 1013.9. Samples: 839388. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:21,033][04584] Avg episode reward: [(0, '4.267')]
[2024-11-07 15:12:21,441][09024] Updated weights for policy 0, policy_version 2777 (0.0056)
[2024-11-07 15:12:26,029][04584] Fps is (10 sec: 5733.9, 60 sec: 4164.2, 300 sec: 4457.0). Total num frames: 11399168. Throughput: 0: 1048.7. Samples: 843426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-11-07 15:12:26,040][04584] Avg episode reward: [(0, '4.437')]
[2024-11-07 15:12:27,849][09024] Updated weights for policy 0, policy_version 2787 (0.0065)
[2024-11-07 15:12:31,033][04584] Fps is (10 sec: 6141.7, 60 sec: 4369.3, 300 sec: 4484.9). Total num frames: 11431936. Throughput: 0: 1182.1. Samples: 852924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-11-07 15:12:31,045][04584] Avg episode reward: [(0, '4.275')]
[2024-11-07 15:12:34,138][09024] Updated weights for policy 0, policy_version 2797 (0.0074)
[2024-11-07 15:12:36,035][04584] Fps is (10 sec: 6958.7, 60 sec: 4709.8, 300 sec: 4554.1). Total num frames: 11468800. Throughput: 0: 1266.2. Samples: 863085. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:12:36,038][04584] Avg episode reward: [(0, '4.393')]
[2024-11-07 15:12:40,396][09024] Updated weights for policy 0, policy_version 2807 (0.0059)
[2024-11-07 15:12:41,029][04584] Fps is (10 sec: 6556.5, 60 sec: 4983.5, 300 sec: 4554.2). Total num frames: 11497472. Throughput: 0: 1290.9. Samples: 867822. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:41,032][04584] Avg episode reward: [(0, '4.360')]
[2024-11-07 15:12:46,028][04584] Fps is (10 sec: 6558.3, 60 sec: 5324.8, 300 sec: 4595.9). Total num frames: 11534336. Throughput: 0: 1392.6. Samples: 877545. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:46,030][04584] Avg episode reward: [(0, '4.605')]
[2024-11-07 15:12:48,617][09024] Updated weights for policy 0, policy_version 2817 (0.0046)
[2024-11-07 15:12:51,028][04584] Fps is (10 sec: 5325.1, 60 sec: 5188.3, 300 sec: 4568.1). Total num frames: 11550720. Throughput: 0: 1424.7. Samples: 884337. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:12:51,031][04584] Avg episode reward: [(0, '4.664')]
[2024-11-07 15:12:54,695][09024] Updated weights for policy 0, policy_version 2827 (0.0045)
[2024-11-07 15:12:56,028][04584] Fps is (10 sec: 5325.0, 60 sec: 5529.9, 300 sec: 4595.9). Total num frames: 11587584. Throughput: 0: 1441.0. Samples: 889341. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:12:56,033][04584] Avg episode reward: [(0, '4.448')]
[2024-11-07 15:13:00,537][09024] Updated weights for policy 0, policy_version 2837 (0.0054)
[2024-11-07 15:13:01,030][04584] Fps is (10 sec: 6961.8, 60 sec: 5802.5, 300 sec: 4595.8). Total num frames: 11620352. Throughput: 0: 1518.0. Samples: 899619. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:13:01,037][04584] Avg episode reward: [(0, '4.399')]
[2024-11-07 15:13:06,034][04584] Fps is (10 sec: 6139.8, 60 sec: 5938.5, 300 sec: 4595.9). Total num frames: 11649024. Throughput: 0: 1536.9. Samples: 908556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:13:06,038][04584] Avg episode reward: [(0, '4.576')]
[2024-11-07 15:13:07,824][09024] Updated weights for policy 0, policy_version 2847 (0.0046)
[2024-11-07 15:13:11,028][04584] Fps is (10 sec: 5735.9, 60 sec: 5871.0, 300 sec: 4623.6). Total num frames: 11677696. Throughput: 0: 1546.8. Samples: 913032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:13:11,030][04584] Avg episode reward: [(0, '4.395')]
[2024-11-07 15:13:11,245][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002852_11681792.pth...
[2024-11-07 15:13:11,431][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002599_10645504.pth
[2024-11-07 15:13:14,223][09024] Updated weights for policy 0, policy_version 2857 (0.0048)
[2024-11-07 15:13:16,028][04584] Fps is (10 sec: 6148.2, 60 sec: 6144.0, 300 sec: 4637.5). Total num frames: 11710464. Throughput: 0: 1545.3. Samples: 922455. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:13:16,030][04584] Avg episode reward: [(0, '4.332')]
[2024-11-07 15:13:20,452][09024] Updated weights for policy 0, policy_version 2867 (0.0048)
[2024-11-07 15:13:22,886][04584] Fps is (10 sec: 5526.7, 60 sec: 6025.9, 300 sec: 4608.5). Total num frames: 11743232. Throughput: 0: 1485.1. Samples: 932661. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:13:22,894][04584] Avg episode reward: [(0, '4.160')]
[2024-11-07 15:13:26,028][04584] Fps is (10 sec: 5734.4, 60 sec: 6144.1, 300 sec: 4609.7). Total num frames: 11767808. Throughput: 0: 1475.2. Samples: 934206. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:13:26,030][04584] Avg episode reward: [(0, '4.440')]
[2024-11-07 15:13:28,531][09024] Updated weights for policy 0, policy_version 2877 (0.0041)
[2024-11-07 15:13:31,027][04584] Fps is (10 sec: 7043.1, 60 sec: 6144.6, 300 sec: 4609.7). Total num frames: 11800576. Throughput: 0: 1484.0. Samples: 944325. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-11-07 15:13:31,030][04584] Avg episode reward: [(0, '4.327')]
[2024-11-07 15:13:35,141][09024] Updated weights for policy 0, policy_version 2887 (0.0033)
[2024-11-07 15:13:36,033][04584] Fps is (10 sec: 6140.8, 60 sec: 6007.7, 300 sec: 4595.8). Total num frames: 11829248. Throughput: 0: 1536.6. Samples: 953490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:13:36,034][04584] Avg episode reward: [(0, '4.494')]
[2024-11-07 15:13:41,035][04584] Fps is (10 sec: 6139.4, 60 sec: 6075.1, 300 sec: 4627.6). Total num frames: 11862016. Throughput: 0: 1525.2. Samples: 957987. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:13:41,037][04584] Avg episode reward: [(0, '4.347')]
[2024-11-07 15:13:41,220][09024] Updated weights for policy 0, policy_version 2897 (0.0051)
[2024-11-07 15:13:46,028][04584] Fps is (10 sec: 6966.9, 60 sec: 6075.8, 300 sec: 4651.4). Total num frames: 11898880. Throughput: 0: 1542.1. Samples: 969009. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2024-11-07 15:13:46,030][04584] Avg episode reward: [(0, '4.503')]
[2024-11-07 15:13:46,718][09024] Updated weights for policy 0, policy_version 2907 (0.0032)
[2024-11-07 15:13:51,028][04584] Fps is (10 sec: 7377.9, 60 sec: 6417.1, 300 sec: 4665.3). Total num frames: 11935744. Throughput: 0: 1589.0. Samples: 980049. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2024-11-07 15:13:51,031][04584] Avg episode reward: [(0, '4.431')]
[2024-11-07 15:13:52,617][09024] Updated weights for policy 0, policy_version 2917 (0.0039)
[2024-11-07 15:13:57,206][04584] Fps is (10 sec: 5863.0, 60 sec: 6159.6, 300 sec: 4619.1). Total num frames: 11964416. Throughput: 0: 1561.5. Samples: 985140. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2024-11-07 15:13:57,212][04584] Avg episode reward: [(0, '4.592')]
[2024-11-07 15:14:00,312][09024] Updated weights for policy 0, policy_version 2927 (0.0028)
[2024-11-07 15:14:01,028][04584] Fps is (10 sec: 5734.7, 60 sec: 6212.5, 300 sec: 4637.5). Total num frames: 11993088. Throughput: 0: 1551.5. Samples: 992271. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-11-07 15:14:01,029][04584] Avg episode reward: [(0, '4.379')]
[2024-11-07 15:14:06,028][04584] Fps is (10 sec: 6964.1, 60 sec: 6281.2, 300 sec: 4679.2). Total num frames: 12025856. Throughput: 0: 1607.6. Samples: 1002018. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:14:06,030][04584] Avg episode reward: [(0, '4.472')]
[2024-11-07 15:14:06,718][09024] Updated weights for policy 0, policy_version 2937 (0.0058)
[2024-11-07 15:14:11,028][04584] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 4734.7). Total num frames: 12062720. Throughput: 0: 1629.3. Samples: 1007526. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
[2024-11-07 15:14:11,033][04584] Avg episode reward: [(0, '4.317')]
[2024-11-07 15:14:11,886][09024] Updated weights for policy 0, policy_version 2947 (0.0051)
[2024-11-07 15:14:16,029][04584] Fps is (10 sec: 7372.4, 60 sec: 6485.2, 300 sec: 4826.3). Total num frames: 12099584. Throughput: 0: 1657.5. Samples: 1018914. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
[2024-11-07 15:14:16,031][04584] Avg episode reward: [(0, '4.249')]
[2024-11-07 15:14:17,578][09024] Updated weights for policy 0, policy_version 2957 (0.0039)
[2024-11-07 15:14:21,028][04584] Fps is (10 sec: 7372.8, 60 sec: 6763.0, 300 sec: 4873.5). Total num frames: 12136448. Throughput: 0: 1692.9. Samples: 1029663. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:14:21,029][04584] Avg episode reward: [(0, '4.486')]
[2024-11-07 15:14:23,230][09024] Updated weights for policy 0, policy_version 2967 (0.0057)
[2024-11-07 15:14:26,028][04584] Fps is (10 sec: 6963.8, 60 sec: 6690.1, 300 sec: 4901.3). Total num frames: 12169216. Throughput: 0: 1719.2. Samples: 1035336. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:14:26,034][04584] Avg episode reward: [(0, '4.601')]
[2024-11-07 15:14:28,687][09024] Updated weights for policy 0, policy_version 2977 (0.0045)
[2024-11-07 15:14:31,524][04584] Fps is (10 sec: 5463.0, 60 sec: 6499.8, 300 sec: 4906.9). Total num frames: 12193792. Throughput: 0: 1696.9. Samples: 1046211. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:14:31,526][04584] Avg episode reward: [(0, '4.651')]
[2024-11-07 15:14:36,028][04584] Fps is (10 sec: 6143.9, 60 sec: 6690.7, 300 sec: 4956.9). Total num frames: 12230656. Throughput: 0: 1634.1. Samples: 1053585. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:14:36,030][04584] Avg episode reward: [(0, '4.456')]
[2024-11-07 15:14:36,569][09024] Updated weights for policy 0, policy_version 2987 (0.0041)
[2024-11-07 15:14:41,028][04584] Fps is (10 sec: 6896.1, 60 sec: 6622.7, 300 sec: 4984.6). Total num frames: 12259328. Throughput: 0: 1685.2. Samples: 1058991. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:14:41,031][04584] Avg episode reward: [(0, '4.358')]
[2024-11-07 15:14:44,208][09024] Updated weights for policy 0, policy_version 2997 (0.0079)
[2024-11-07 15:14:46,028][04584] Fps is (10 sec: 4915.3, 60 sec: 6348.8, 300 sec: 4998.5). Total num frames: 12279808. Throughput: 0: 1637.1. Samples: 1065942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:14:46,036][04584] Avg episode reward: [(0, '4.494')]
[2024-11-07 15:14:51,037][04584] Fps is (10 sec: 5319.7, 60 sec: 6279.6, 300 sec: 5080.0). Total num frames: 12312576. Throughput: 0: 1610.5. Samples: 1074504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:14:51,040][04584] Avg episode reward: [(0, '4.573')]
[2024-11-07 15:14:51,137][09024] Updated weights for policy 0, policy_version 3007 (0.0068)
[2024-11-07 15:14:56,028][04584] Fps is (10 sec: 7372.9, 60 sec: 6615.2, 300 sec: 5151.2). Total num frames: 12353536. Throughput: 0: 1614.2. Samples: 1080165. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:14:56,030][04584] Avg episode reward: [(0, '4.318')]
[2024-11-07 15:14:56,610][09024] Updated weights for policy 0, policy_version 3017 (0.0036)
[2024-11-07 15:15:01,028][04584] Fps is (10 sec: 7379.8, 60 sec: 6553.6, 300 sec: 5207.4). Total num frames: 12386304. Throughput: 0: 1603.7. Samples: 1091079. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:15:01,033][04584] Avg episode reward: [(0, '4.576')]
[2024-11-07 15:15:02,672][09024] Updated weights for policy 0, policy_version 3027 (0.0045)
[2024-11-07 15:15:06,031][04584] Fps is (10 sec: 5323.2, 60 sec: 6348.5, 300 sec: 5206.8). Total num frames: 12406784. Throughput: 0: 1540.1. Samples: 1098972. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:15:06,033][04584] Avg episode reward: [(0, '4.521')]
[2024-11-07 15:15:10,389][09024] Updated weights for policy 0, policy_version 3037 (0.0042)
[2024-11-07 15:15:11,029][04584] Fps is (10 sec: 5324.3, 60 sec: 6280.4, 300 sec: 5248.4). Total num frames: 12439552. Throughput: 0: 1499.1. Samples: 1102797. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:15:11,031][04584] Avg episode reward: [(0, '4.354')]
[2024-11-07 15:15:11,268][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003038_12443648.pth...
[2024-11-07 15:15:11,509][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002706_11083776.pth
[2024-11-07 15:15:16,028][04584] Fps is (10 sec: 6555.2, 60 sec: 6212.3, 300 sec: 5290.1). Total num frames: 12472320. Throughput: 0: 1503.8. Samples: 1113135. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:15:16,032][04584] Avg episode reward: [(0, '4.369')]
[2024-11-07 15:15:16,987][09024] Updated weights for policy 0, policy_version 3047 (0.0049)
[2024-11-07 15:15:21,028][04584] Fps is (10 sec: 6554.3, 60 sec: 6144.0, 300 sec: 5345.6). Total num frames: 12505088. Throughput: 0: 1518.1. Samples: 1121901. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:15:21,030][04584] Avg episode reward: [(0, '4.181')]
[2024-11-07 15:15:24,057][09024] Updated weights for policy 0, policy_version 3057 (0.0030)
[2024-11-07 15:15:26,029][04584] Fps is (10 sec: 6143.3, 60 sec: 6075.6, 300 sec: 5415.0). Total num frames: 12533760. Throughput: 0: 1497.6. Samples: 1126386. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0)
[2024-11-07 15:15:26,031][04584] Avg episode reward: [(0, '4.241')]
[2024-11-07 15:15:29,604][09024] Updated weights for policy 0, policy_version 3067 (0.0041)
[2024-11-07 15:15:31,030][04584] Fps is (10 sec: 6551.9, 60 sec: 6332.7, 300 sec: 5484.4). Total num frames: 12570624. Throughput: 0: 1579.4. Samples: 1137018. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:15:31,040][04584] Avg episode reward: [(0, '4.564')]
[2024-11-07 15:15:35,284][09024] Updated weights for policy 0, policy_version 3077 (0.0032)
[2024-11-07 15:15:36,030][04584] Fps is (10 sec: 7372.0, 60 sec: 6280.3, 300 sec: 5553.8). Total num frames: 12607488. Throughput: 0: 1630.2. Samples: 1147851. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:15:36,032][04584] Avg episode reward: [(0, '4.505')]
[2024-11-07 15:15:41,028][04584] Fps is (10 sec: 5735.8, 60 sec: 6144.0, 300 sec: 5553.9). Total num frames: 12627968. Throughput: 0: 1625.3. Samples: 1153302. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:15:41,030][04584] Avg episode reward: [(0, '4.570')]
[2024-11-07 15:15:42,967][09024] Updated weights for policy 0, policy_version 3087 (0.0043)
[2024-11-07 15:15:46,028][04584] Fps is (10 sec: 5735.9, 60 sec: 6417.0, 300 sec: 5609.4). Total num frames: 12664832. Throughput: 0: 1542.5. Samples: 1160490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:15:46,030][04584] Avg episode reward: [(0, '4.759')]
[2024-11-07 15:15:49,866][09024] Updated weights for policy 0, policy_version 3097 (0.0048)
[2024-11-07 15:15:51,032][04584] Fps is (10 sec: 6141.4, 60 sec: 6281.1, 300 sec: 5623.2). Total num frames: 12689408. Throughput: 0: 1555.1. Samples: 1168953. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-11-07 15:15:51,035][04584] Avg episode reward: [(0, '4.603')]
[2024-11-07 15:15:56,030][04584] Fps is (10 sec: 4914.2, 60 sec: 6007.2, 300 sec: 5637.2). Total num frames: 12713984. Throughput: 0: 1541.6. Samples: 1172169. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:15:56,034][04584] Avg episode reward: [(0, '4.540')]
[2024-11-07 15:15:58,053][09024] Updated weights for policy 0, policy_version 3107 (0.0034)
[2024-11-07 15:16:01,028][04584] Fps is (10 sec: 4917.4, 60 sec: 5871.0, 300 sec: 5692.7). Total num frames: 12738560. Throughput: 0: 1491.2. Samples: 1180236. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:16:01,031][04584] Avg episode reward: [(0, '4.267')]
[2024-11-07 15:16:06,033][04584] Fps is (10 sec: 4504.1, 60 sec: 5870.7, 300 sec: 5692.8). Total num frames: 12759040. Throughput: 0: 1437.4. Samples: 1186593. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:16:06,038][04584] Avg episode reward: [(0, '4.296')]
[2024-11-07 15:16:07,478][09024] Updated weights for policy 0, policy_version 3117 (0.0062)
[2024-11-07 15:16:11,030][04584] Fps is (10 sec: 4095.1, 60 sec: 5666.0, 300 sec: 5678.8). Total num frames: 12779520. Throughput: 0: 1411.1. Samples: 1189884. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:16:11,032][04584] Avg episode reward: [(0, '4.350')]
[2024-11-07 15:16:16,027][04584] Fps is (10 sec: 3278.6, 60 sec: 5324.9, 300 sec: 5637.2). Total num frames: 12791808. Throughput: 0: 1267.9. Samples: 1194069. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:16:16,039][04584] Avg episode reward: [(0, '4.416')]
[2024-11-07 15:16:19,334][09024] Updated weights for policy 0, policy_version 3127 (0.0075)
[2024-11-07 15:16:21,028][04584] Fps is (10 sec: 3277.4, 60 sec: 5120.0, 300 sec: 5637.2). Total num frames: 12812288. Throughput: 0: 1162.9. Samples: 1200180. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:16:21,035][04584] Avg episode reward: [(0, '4.512')]
[2024-11-07 15:16:26,028][04584] Fps is (10 sec: 4505.5, 60 sec: 5051.9, 300 sec: 5651.3). Total num frames: 12836864. Throughput: 0: 1108.3. Samples: 1203174. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:16:26,031][04584] Avg episode reward: [(0, '4.481')]
[2024-11-07 15:16:29,455][09024] Updated weights for policy 0, policy_version 3137 (0.0076)
[2024-11-07 15:16:31,034][04584] Fps is (10 sec: 4093.6, 60 sec: 4710.1, 300 sec: 5651.0). Total num frames: 12853248. Throughput: 0: 1089.5. Samples: 1209522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:16:31,035][04584] Avg episode reward: [(0, '4.504')]
[2024-11-07 15:16:36,028][04584] Fps is (10 sec: 4096.1, 60 sec: 4505.8, 300 sec: 5692.8). Total num frames: 12877824. Throughput: 0: 1049.7. Samples: 1216185. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:16:36,038][04584] Avg episode reward: [(0, '4.367')]
[2024-11-07 15:16:37,390][09024] Updated weights for policy 0, policy_version 3147 (0.0072)
[2024-11-07 15:16:41,027][04584] Fps is (10 sec: 6147.7, 60 sec: 4778.7, 300 sec: 5762.2). Total num frames: 12914688. Throughput: 0: 1088.7. Samples: 1221156. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:16:41,029][04584] Avg episode reward: [(0, '4.351')]
[2024-11-07 15:16:42,957][09024] Updated weights for policy 0, policy_version 3157 (0.0072)
[2024-11-07 15:16:46,028][04584] Fps is (10 sec: 7372.8, 60 sec: 4778.7, 300 sec: 5803.8). Total num frames: 12951552. Throughput: 0: 1153.2. Samples: 1232130. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2024-11-07 15:16:46,029][04584] Avg episode reward: [(0, '4.437')]
[2024-11-07 15:16:50,809][09024] Updated weights for policy 0, policy_version 3167 (0.0060)
[2024-11-07 15:16:51,031][04584] Fps is (10 sec: 5732.6, 60 sec: 4710.5, 300 sec: 5817.7). Total num frames: 12972032. Throughput: 0: 1166.0. Samples: 1239060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-11-07 15:16:51,033][04584] Avg episode reward: [(0, '4.411')]
[2024-11-07 15:16:56,028][04584] Fps is (10 sec: 5734.1, 60 sec: 4915.3, 300 sec: 5887.1). Total num frames: 13008896. Throughput: 0: 1216.6. Samples: 1244631. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-11-07 15:16:56,034][04584] Avg episode reward: [(0, '4.402')]
[2024-11-07 15:16:56,200][09024] Updated weights for policy 0, policy_version 3177 (0.0044)
[2024-11-07 15:17:01,029][04584] Fps is (10 sec: 7783.8, 60 sec: 5188.2, 300 sec: 5956.5). Total num frames: 13049856. Throughput: 0: 1382.7. Samples: 1256292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-11-07 15:17:01,031][04584] Avg episode reward: [(0, '4.430')]
[2024-11-07 15:17:01,393][09024] Updated weights for policy 0, policy_version 3187 (0.0050)
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:17:06,029][04584] Avg episode reward: [(0, '4.488')] [2024-11-07 15:17:07,166][09024] Updated weights for policy 0, policy_version 3197 (0.0034) [2024-11-07 15:17:11,028][04584] Fps is (10 sec: 7373.7, 60 sec: 5734.6, 300 sec: 6039.9). Total num frames: 13123584. Throughput: 0: 1548.8. Samples: 1272870. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:17:11,029][04584] Avg episode reward: [(0, '4.295')] [2024-11-07 15:17:11,242][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003205_13127680.pth... [2024-11-07 15:17:11,341][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002852_11681792.pth [2024-11-07 15:17:12,310][09024] Updated weights for policy 0, policy_version 3207 (0.0045) [2024-11-07 15:17:16,028][04584] Fps is (10 sec: 7372.8, 60 sec: 6144.0, 300 sec: 6067.7). Total num frames: 13160448. Throughput: 0: 1663.0. Samples: 1284345. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-11-07 15:17:16,029][04584] Avg episode reward: [(0, '4.660')] [2024-11-07 15:17:17,924][09024] Updated weights for policy 0, policy_version 3217 (0.0044) [2024-11-07 15:17:21,028][04584] Fps is (10 sec: 6963.1, 60 sec: 6348.8, 300 sec: 6081.5). Total num frames: 13193216. Throughput: 0: 1750.2. Samples: 1294944. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:17:21,039][04584] Avg episode reward: [(0, '4.639')] [2024-11-07 15:17:26,029][04584] Fps is (10 sec: 5324.3, 60 sec: 6280.5, 300 sec: 6040.0). Total num frames: 13213696. Throughput: 0: 1676.6. Samples: 1296603. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:17:26,032][04584] Avg episode reward: [(0, '4.504')] [2024-11-07 15:17:26,557][09024] Updated weights for policy 0, policy_version 3227 (0.0042) [2024-11-07 15:17:31,028][04584] Fps is (10 sec: 5734.2, 60 sec: 6622.5, 300 sec: 6040.0). Total num frames: 13250560. Throughput: 0: 1658.7. Samples: 1306773. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:17:31,030][04584] Avg episode reward: [(0, '4.591')] [2024-11-07 15:17:31,592][09024] Updated weights for policy 0, policy_version 3237 (0.0043) [2024-11-07 15:17:36,028][04584] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6053.8). Total num frames: 13283328. Throughput: 0: 1743.0. Samples: 1317489. Policy #0 lag: (min: 0.0, avg: 1.1, max: 4.0) [2024-11-07 15:17:36,039][04584] Avg episode reward: [(0, '4.412')] [2024-11-07 15:17:40,291][09024] Updated weights for policy 0, policy_version 3247 (0.0078) [2024-11-07 15:17:41,030][04584] Fps is (10 sec: 4914.0, 60 sec: 6416.8, 300 sec: 5984.3). Total num frames: 13299712. Throughput: 0: 1683.3. Samples: 1320381. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:17:41,034][04584] Avg episode reward: [(0, '4.433')] [2024-11-07 15:17:46,035][04584] Fps is (10 sec: 3274.6, 60 sec: 6075.0, 300 sec: 5984.2). Total num frames: 13316096. Throughput: 0: 1530.9. Samples: 1325190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:17:46,038][04584] Avg episode reward: [(0, '4.402')] [2024-11-07 15:17:51,032][04584] Fps is (10 sec: 3685.8, 60 sec: 6075.6, 300 sec: 5928.7). Total num frames: 13336576. Throughput: 0: 1412.1. Samples: 1330785. 
[2024-11-07 15:17:51,040][04584] Avg episode reward: [(0, '4.321')] [2024-11-07 15:17:51,466][09024] Updated weights for policy 0, policy_version 3257 (0.0066) [2024-11-07 15:17:57,446][04584] Fps is (10 sec: 2871.7, 60 sec: 5535.4, 300 sec: 5831.4). Total num frames: 13348864. Throughput: 0: 1311.9. Samples: 1333767. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:17:57,457][04584] Avg episode reward: [(0, '4.297')] [2024-11-07 15:18:01,037][04584] Fps is (10 sec: 2865.7, 60 sec: 5255.8, 300 sec: 5817.6). Total num frames: 13365248. Throughput: 0: 1179.2. Samples: 1337421. Policy #0 lag: (min: 0.0, avg: 0.9, max: 4.0) [2024-11-07 15:18:01,041][04584] Avg episode reward: [(0, '4.410')] [2024-11-07 15:18:06,028][04584] Fps is (10 sec: 2863.6, 60 sec: 4778.6, 300 sec: 5748.3). Total num frames: 13373440. Throughput: 0: 1028.0. Samples: 1341204. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) [2024-11-07 15:18:06,035][04584] Avg episode reward: [(0, '4.401')] [2024-11-07 15:18:06,827][09024] Updated weights for policy 0, policy_version 3267 (0.0073) [2024-11-07 15:18:11,028][04584] Fps is (10 sec: 3279.8, 60 sec: 4573.8, 300 sec: 5720.5). Total num frames: 13398016. Throughput: 0: 1053.3. Samples: 1344000. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:18:11,035][04584] Avg episode reward: [(0, '4.495')] [2024-11-07 15:18:16,033][04584] Fps is (10 sec: 4503.3, 60 sec: 4300.4, 300 sec: 5714.8). Total num frames: 13418496. Throughput: 0: 978.6. Samples: 1350813. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:18:16,047][04584] Avg episode reward: [(0, '4.469')] [2024-11-07 15:18:16,227][09024] Updated weights for policy 0, policy_version 3277 (0.0046) [2024-11-07 15:18:21,028][04584] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 5665.0). Total num frames: 13438976. Throughput: 0: 873.2. Samples: 1356783. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:18:21,031][04584] Avg episode reward: [(0, '4.397')] [2024-11-07 15:18:25,955][09024] Updated weights for policy 0, policy_version 3287 (0.0095) [2024-11-07 15:18:26,028][04584] Fps is (10 sec: 4507.9, 60 sec: 4164.3, 300 sec: 5637.2). Total num frames: 13463552. Throughput: 0: 880.2. Samples: 1359987. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:18:26,029][04584] Avg episode reward: [(0, '4.511')] [2024-11-07 15:18:31,786][04584] Fps is (10 sec: 3045.8, 60 sec: 3640.4, 300 sec: 5553.6). Total num frames: 13471744. Throughput: 0: 882.3. Samples: 1365555. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:18:31,802][04584] Avg episode reward: [(0, '4.441')] [2024-11-07 15:18:36,028][04584] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 5512.4). Total num frames: 13488128. Throughput: 0: 844.5. Samples: 1368783. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:18:36,046][04584] Avg episode reward: [(0, '4.248')] [2024-11-07 15:18:40,005][09024] Updated weights for policy 0, policy_version 3297 (0.0126) [2024-11-07 15:18:41,028][04584] Fps is (10 sec: 3545.7, 60 sec: 3413.5, 300 sec: 5442.8). Total num frames: 13504512. Throughput: 0: 868.6. Samples: 1371624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:18:41,031][04584] Avg episode reward: [(0, '4.406')] [2024-11-07 15:18:46,028][04584] Fps is (10 sec: 4095.8, 60 sec: 3550.2, 300 sec: 5401.2). Total num frames: 13529088. Throughput: 0: 893.7. Samples: 1377630.
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:18:46,041][04584] Avg episode reward: [(0, '4.452')] [2024-11-07 15:18:49,850][09024] Updated weights for policy 0, policy_version 3307 (0.0041) [2024-11-07 15:18:51,028][04584] Fps is (10 sec: 4095.8, 60 sec: 3481.8, 300 sec: 5381.0). Total num frames: 13545472. Throughput: 0: 942.9. Samples: 1383636. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:18:51,037][04584] Avg episode reward: [(0, '4.478')] [2024-11-07 15:18:56,031][04584] Fps is (10 sec: 4504.3, 60 sec: 3845.3, 300 sec: 5359.4). Total num frames: 13574144. Throughput: 0: 959.2. Samples: 1387167. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:18:56,040][04584] Avg episode reward: [(0, '4.499')] [2024-11-07 15:18:57,652][09024] Updated weights for policy 0, policy_version 3317 (0.0054) [2024-11-07 15:19:01,028][04584] Fps is (10 sec: 5734.7, 60 sec: 3960.1, 300 sec: 5345.6). Total num frames: 13602816. Throughput: 0: 1009.3. Samples: 1396227. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:19:01,042][04584] Avg episode reward: [(0, '4.339')] [2024-11-07 15:19:06,270][04584] Fps is (10 sec: 4800.8, 60 sec: 4147.6, 300 sec: 5285.8). Total num frames: 13623296. Throughput: 0: 962.7. Samples: 1400337. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) [2024-11-07 15:19:06,275][04584] Avg episode reward: [(0, '4.364')] [2024-11-07 15:19:06,706][09024] Updated weights for policy 0, policy_version 3327 (0.0078) [2024-11-07 15:19:11,028][04584] Fps is (10 sec: 4915.3, 60 sec: 4232.6, 300 sec: 5262.3). Total num frames: 13651968. Throughput: 0: 1026.0. Samples: 1406157. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:19:11,031][04584] Avg episode reward: [(0, '4.456')] [2024-11-07 15:19:11,165][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003333_13651968.pth... [2024-11-07 15:19:11,634][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003038_12443648.pth [2024-11-07 15:19:13,368][09024] Updated weights for policy 0, policy_version 3337 (0.0061) [2024-11-07 15:19:16,029][04584] Fps is (10 sec: 6295.7, 60 sec: 4437.6, 300 sec: 5248.4). Total num frames: 13684736. Throughput: 0: 1129.8. Samples: 1415538. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:19:16,031][04584] Avg episode reward: [(0, '4.392')] [2024-11-07 15:19:19,544][09024] Updated weights for policy 0, policy_version 3347 (0.0048) [2024-11-07 15:19:21,028][04584] Fps is (10 sec: 6553.6, 60 sec: 4642.2, 300 sec: 5248.4). Total num frames: 13717504. Throughput: 0: 1260.9. Samples: 1425525. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:19:21,030][04584] Avg episode reward: [(0, '4.638')] [2024-11-07 15:19:25,100][09024] Updated weights for policy 0, policy_version 3357 (0.0035) [2024-11-07 15:19:26,027][04584] Fps is (10 sec: 7373.7, 60 sec: 4915.2, 300 sec: 5312.9). Total num frames: 13758464. Throughput: 0: 1307.5. Samples: 1430463. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:19:26,029][04584] Avg episode reward: [(0, '4.403')] [2024-11-07 15:19:30,049][09024] Updated weights for policy 0, policy_version 3367 (0.0042) [2024-11-07 15:19:31,028][04584] Fps is (10 sec: 8192.0, 60 sec: 5531.3, 300 sec: 5317.9). Total num frames: 13799424. Throughput: 0: 1453.4. Samples: 1443030. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:19:31,030][04584] Avg episode reward: [(0, '4.426')] [2024-11-07 15:19:35,383][09024] Updated weights for policy 0, policy_version 3377 (0.0046) [2024-11-07 15:19:36,028][04584] Fps is (10 sec: 7782.2, 60 sec: 5802.7, 300 sec: 5345.6). Total num frames: 13836288. Throughput: 0: 1583.3. Samples: 1454883. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:19:36,030][04584] Avg episode reward: [(0, '4.432')] [2024-11-07 15:19:41,028][04584] Fps is (10 sec: 5734.0, 60 sec: 5870.9, 300 sec: 5345.6). Total num frames: 13856768. Throughput: 0: 1627.3. Samples: 1460391. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:19:41,031][04584] Avg episode reward: [(0, '4.316')] [2024-11-07 15:19:43,198][09024] Updated weights for policy 0, policy_version 3387 (0.0046) [2024-11-07 15:19:46,028][04584] Fps is (10 sec: 5734.4, 60 sec: 6075.8, 300 sec: 5359.7). Total num frames: 13893632. Throughput: 0: 1574.5. Samples: 1467081. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2024-11-07 15:19:46,031][04584] Avg episode reward: [(0, '4.465')] [2024-11-07 15:19:48,791][09024] Updated weights for policy 0, policy_version 3397 (0.0040) [2024-11-07 15:19:51,029][04584] Fps is (10 sec: 7372.6, 60 sec: 6417.0, 300 sec: 5345.6). Total num frames: 13930496. Throughput: 0: 1741.6. Samples: 1478289. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:19:51,031][04584] Avg episode reward: [(0, '4.315')] [2024-11-07 15:19:56,032][04584] Fps is (10 sec: 5732.1, 60 sec: 6280.5, 300 sec: 5303.9). Total num frames: 13950976. Throughput: 0: 1703.6. Samples: 1482825. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-11-07 15:19:56,033][04584] Avg episode reward: [(0, '4.279')] [2024-11-07 15:19:56,173][09024] Updated weights for policy 0, policy_version 3407 (0.0061) [2024-11-07 15:20:01,028][04584] Fps is (10 sec: 5325.3, 60 sec: 6348.8, 300 sec: 5345.7). Total num frames: 13983744. Throughput: 0: 1658.4. Samples: 1490166. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:20:01,038][04584] Avg episode reward: [(0, '4.491')] [2024-11-07 15:20:03,670][09024] Updated weights for policy 0, policy_version 3417 (0.0068) [2024-11-07 15:20:06,028][04584] Fps is (10 sec: 5736.7, 60 sec: 6443.0, 300 sec: 5317.9). Total num frames: 14008320. Throughput: 0: 1613.7. Samples: 1498143. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:20:06,033][04584] Avg episode reward: [(0, '4.486')] [2024-11-07 15:20:10,251][09024] Updated weights for policy 0, policy_version 3427 (0.0055) [2024-11-07 15:20:11,031][04584] Fps is (10 sec: 5323.5, 60 sec: 6416.8, 300 sec: 5303.9). Total num frames: 14036992. Throughput: 0: 1615.9. Samples: 1503183. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:20:11,033][04584] Avg episode reward: [(0, '4.214')] [2024-11-07 15:20:16,028][04584] Fps is (10 sec: 4505.5, 60 sec: 6144.1, 300 sec: 5248.4). Total num frames: 14053376. Throughput: 0: 1494.3. Samples: 1510272. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0) [2024-11-07 15:20:16,029][04584] Avg episode reward: [(0, '4.210')] [2024-11-07 15:20:19,639][09024] Updated weights for policy 0, policy_version 3437 (0.0053) [2024-11-07 15:20:21,033][04584] Fps is (10 sec: 4915.4, 60 sec: 6143.8, 300 sec: 5262.3). Total num frames: 14086144. Throughput: 0: 1395.5. Samples: 1517682. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:20:21,037][04584] Avg episode reward: [(0, '4.434')] [2024-11-07 15:20:26,028][04584] Fps is (10 sec: 6143.8, 60 sec: 5939.1, 300 sec: 5234.6). Total num frames: 14114816. Throughput: 0: 1373.1. Samples: 1522182. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:20:26,032][04584] Avg episode reward: [(0, '4.349')] [2024-11-07 15:20:26,131][09024] Updated weights for policy 0, policy_version 3447 (0.0076) [2024-11-07 15:20:31,028][04584] Fps is (10 sec: 6144.9, 60 sec: 5802.6, 300 sec: 5220.7). Total num frames: 14147584. Throughput: 0: 1439.2. Samples: 1531848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:20:31,031][04584] Avg episode reward: [(0, '4.535')] [2024-11-07 15:20:32,703][09024] Updated weights for policy 0, policy_version 3457 (0.0041) [2024-11-07 15:20:36,028][04584] Fps is (10 sec: 6554.1, 60 sec: 5734.4, 300 sec: 5262.3). Total num frames: 14180352. Throughput: 0: 1398.4. Samples: 1541214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:20:36,030][04584] Avg episode reward: [(0, '4.496')] [2024-11-07 15:20:38,720][09024] Updated weights for policy 0, policy_version 3467 (0.0053) [2024-11-07 15:20:41,028][04584] Fps is (10 sec: 6553.8, 60 sec: 5939.2, 300 sec: 5248.4). Total num frames: 14213120. Throughput: 0: 1413.5. Samples: 1546428. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:20:41,042][04584] Avg episode reward: [(0, '4.401')] [2024-11-07 15:20:45,336][09024] Updated weights for policy 0, policy_version 3477 (0.0039) [2024-11-07 15:20:46,028][04584] Fps is (10 sec: 6143.7, 60 sec: 5802.6, 300 sec: 5262.4). Total num frames: 14241792. Throughput: 0: 1449.3. Samples: 1555386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-11-07 15:20:46,041][04584] Avg episode reward: [(0, '4.415')] [2024-11-07 15:20:51,028][04584] Fps is (10 sec: 4505.7, 60 sec: 5461.4, 300 sec: 5234.6). Total num frames: 14258176. Throughput: 0: 1400.9. Samples: 1561185. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:20:51,034][04584] Avg episode reward: [(0, '4.540')] [2024-11-07 15:20:54,462][09024] Updated weights for policy 0, policy_version 3487 (0.0060) [2024-11-07 15:20:56,031][04584] Fps is (10 sec: 4913.8, 60 sec: 5666.2, 300 sec: 5262.3). Total num frames: 14290944. Throughput: 0: 1396.8. Samples: 1566039. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) [2024-11-07 15:20:56,042][04584] Avg episode reward: [(0, '4.524')] [2024-11-07 15:21:01,028][04584] Fps is (10 sec: 5734.5, 60 sec: 5529.6, 300 sec: 5276.3). Total num frames: 14315520. Throughput: 0: 1413.1. Samples: 1573863. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:21:01,031][04584] Avg episode reward: [(0, '4.498')] [2024-11-07 15:21:02,337][09024] Updated weights for policy 0, policy_version 3497 (0.0062) [2024-11-07 15:21:06,028][04584] Fps is (10 sec: 4916.8, 60 sec: 5529.6, 300 sec: 5290.1). Total num frames: 14340096. Throughput: 0: 1409.3. Samples: 1581096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:21:06,031][04584] Avg episode reward: [(0, '4.470')] [2024-11-07 15:21:09,697][09024] Updated weights for policy 0, policy_version 3507 (0.0050) [2024-11-07 15:21:11,028][04584] Fps is (10 sec: 5324.8, 60 sec: 5529.8, 300 sec: 5345.6). Total num frames: 14368768. Throughput: 0: 1419.2. Samples: 1586046. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:21:11,039][04584] Avg episode reward: [(0, '4.391')] [2024-11-07 15:21:11,183][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003509_14372864.pth... [2024-11-07 15:21:11,472][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003205_13127680.pth [2024-11-07 15:21:16,028][04584] Fps is (10 sec: 5734.2, 60 sec: 5734.4, 300 sec: 5373.4). Total num frames: 14397440. Throughput: 0: 1388.4. Samples: 1594326. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:21:16,035][04584] Avg episode reward: [(0, '4.430')] [2024-11-07 15:21:16,959][09024] Updated weights for policy 0, policy_version 3517 (0.0056) [2024-11-07 15:21:21,028][04584] Fps is (10 sec: 5734.3, 60 sec: 5666.3, 300 sec: 5387.3). Total num frames: 14426112. Throughput: 0: 1373.9. Samples: 1603041. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-11-07 15:21:21,030][04584] Avg episode reward: [(0, '4.463')] [2024-11-07 15:21:26,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5461.4, 300 sec: 5387.4). Total num frames: 14442496. Throughput: 0: 1323.1. Samples: 1605966. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:21:26,031][04584] Avg episode reward: [(0, '4.369')] [2024-11-07 15:21:26,572][09024] Updated weights for policy 0, policy_version 3527 (0.0072) [2024-11-07 15:21:31,029][04584] Fps is (10 sec: 4505.0, 60 sec: 5393.0, 300 sec: 5401.1). Total num frames: 14471168. Throughput: 0: 1270.9. Samples: 1612578. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:21:31,031][04584] Avg episode reward: [(0, '4.422')] [2024-11-07 15:21:32,854][09024] Updated weights for policy 0, policy_version 3537 (0.0041) [2024-11-07 15:21:36,028][04584] Fps is (10 sec: 6553.3, 60 sec: 5461.3, 300 sec: 5401.1). Total num frames: 14508032. Throughput: 0: 1370.6. Samples: 1622862. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:21:36,030][04584] Avg episode reward: [(0, '4.480')] [2024-11-07 15:21:38,882][09024] Updated weights for policy 0, policy_version 3547 (0.0037) [2024-11-07 15:21:41,028][04584] Fps is (10 sec: 6964.0, 60 sec: 5461.3, 300 sec: 5387.3). Total num frames: 14540800. Throughput: 0: 1373.6. Samples: 1627848. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:21:41,031][04584] Avg episode reward: [(0, '4.407')] [2024-11-07 15:21:45,700][09024] Updated weights for policy 0, policy_version 3557 (0.0073) [2024-11-07 15:21:46,028][04584] Fps is (10 sec: 6144.4, 60 sec: 5461.3, 300 sec: 5415.1). Total num frames: 14569472. Throughput: 0: 1403.5. Samples: 1637022. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:21:46,032][04584] Avg episode reward: [(0, '4.427')] [2024-11-07 15:21:51,028][04584] Fps is (10 sec: 6553.7, 60 sec: 5802.7, 300 sec: 5415.1). Total num frames: 14606336. Throughput: 0: 1468.6. Samples: 1647183. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:21:51,030][04584] Avg episode reward: [(0, '4.294')] [2024-11-07 15:21:51,480][09024] Updated weights for policy 0, policy_version 3567 (0.0047) [2024-11-07 15:21:56,028][04584] Fps is (10 sec: 6963.1, 60 sec: 5802.9, 300 sec: 5387.3). Total num frames: 14639104. Throughput: 0: 1479.7. Samples: 1652631. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:21:56,037][04584] Avg episode reward: [(0, '4.431')] [2024-11-07 15:21:59,534][09024] Updated weights for policy 0, policy_version 3577 (0.0050) [2024-11-07 15:22:01,029][04584] Fps is (10 sec: 5324.4, 60 sec: 5734.3, 300 sec: 5331.7). Total num frames: 14659584. Throughput: 0: 1446.8. Samples: 1659432. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:22:01,033][04584] Avg episode reward: [(0, '4.272')] [2024-11-07 15:22:06,031][04584] Fps is (10 sec: 4913.5, 60 sec: 5802.3, 300 sec: 5303.9). Total num frames: 14688256. Throughput: 0: 1439.5. Samples: 1667823. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:22:06,033][04584] Avg episode reward: [(0, '4.296')] [2024-11-07 15:22:06,795][09024] Updated weights for policy 0, policy_version 3587 (0.0040) [2024-11-07 15:22:11,027][04584] Fps is (10 sec: 5735.0, 60 sec: 5802.7, 300 sec: 5276.2). Total num frames: 14716928. Throughput: 0: 1472.8. Samples: 1672242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-11-07 15:22:11,030][04584] Avg episode reward: [(0, '4.152')] [2024-11-07 15:22:12,703][09024] Updated weights for policy 0, policy_version 3597 (0.0034) [2024-11-07 15:22:16,029][04584] Fps is (10 sec: 6145.8, 60 sec: 5870.9, 300 sec: 5276.2). Total num frames: 14749696. Throughput: 0: 1566.5. Samples: 1683072. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:22:16,030][04584] Avg episode reward: [(0, '4.296')] [2024-11-07 15:22:20,096][09024] Updated weights for policy 0, policy_version 3607 (0.0047) [2024-11-07 15:22:21,028][04584] Fps is (10 sec: 5734.2, 60 sec: 5802.7, 300 sec: 5290.1). Total num frames: 14774272. Throughput: 0: 1506.8. Samples: 1690665. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-11-07 15:22:21,031][04584] Avg episode reward: [(0, '4.260')] [2024-11-07 15:22:26,028][04584] Fps is (10 sec: 4505.9, 60 sec: 5871.0, 300 sec: 5234.6). Total num frames: 14794752. Throughput: 0: 1458.4. Samples: 1693476. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:22:26,031][04584] Avg episode reward: [(0, '4.190')] [2024-11-07 15:22:29,189][09024] Updated weights for policy 0, policy_version 3617 (0.0069) [2024-11-07 15:22:31,030][04584] Fps is (10 sec: 4914.7, 60 sec: 5871.0, 300 sec: 5220.6). Total num frames: 14823424. Throughput: 0: 1418.5. Samples: 1700856. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-11-07 15:22:31,035][04584] Avg episode reward: [(0, '4.454')] [2024-11-07 15:22:36,031][04584] Fps is (10 sec: 4094.8, 60 sec: 5461.1, 300 sec: 5206.8). Total num frames: 14835712. Throughput: 0: 1289.9. Samples: 1705233. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:22:36,034][04584] Avg episode reward: [(0, '4.470')] [2024-11-07 15:22:39,738][09024] Updated weights for policy 0, policy_version 3627 (0.0034) [2024-11-07 15:22:41,030][04584] Fps is (10 sec: 3685.9, 60 sec: 5324.6, 300 sec: 5234.6). Total num frames: 14860288. Throughput: 0: 1252.8. Samples: 1709010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:22:41,035][04584] Avg episode reward: [(0, '4.389')] [2024-11-07 15:22:46,028][04584] Fps is (10 sec: 5326.5, 60 sec: 5324.8, 300 sec: 5262.4). Total num frames: 14888960. Throughput: 0: 1274.8. Samples: 1716798. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:22:46,038][04584] Avg episode reward: [(0, '4.541')] [2024-11-07 15:22:47,436][09024] Updated weights for policy 0, policy_version 3637 (0.0068) [2024-11-07 15:22:51,028][04584] Fps is (10 sec: 5326.1, 60 sec: 5120.0, 300 sec: 5329.6). Total num frames: 14913536. Throughput: 0: 1276.2. Samples: 1725249. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-11-07 15:22:51,030][04584] Avg episode reward: [(0, '4.592')] [2024-11-07 15:22:54,034][09024] Updated weights for policy 0, policy_version 3647 (0.0041) [2024-11-07 15:22:56,028][04584] Fps is (10 sec: 6144.0, 60 sec: 5188.3, 300 sec: 5373.6). Total num frames: 14950400. Throughput: 0: 1286.1. Samples: 1730118. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:22:56,029][04584] Avg episode reward: [(0, '4.450')] [2024-11-07 15:22:59,685][09024] Updated weights for policy 0, policy_version 3657 (0.0033) [2024-11-07 15:23:01,029][04584] Fps is (10 sec: 6962.5, 60 sec: 5393.1, 300 sec: 5456.7). Total num frames: 14983168. Throughput: 0: 1286.4. Samples: 1740960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:23:01,043][04584] Avg episode reward: [(0, '4.578')] [2024-11-07 15:23:07,933][04584] Fps is (10 sec: 5160.5, 60 sec: 5227.4, 300 sec: 5435.5). Total num frames: 15011840. Throughput: 0: 1254.4. Samples: 1749504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:23:07,937][04584] Avg episode reward: [(0, '4.497')] [2024-11-07 15:23:08,868][09024] Updated weights for policy 0, policy_version 3667 (0.0061) [2024-11-07 15:23:11,028][04584] Fps is (10 sec: 4915.6, 60 sec: 5256.5, 300 sec: 5470.7). Total num frames: 15032320. Throughput: 0: 1274.7. Samples: 1750839. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:23:11,030][04584] Avg episode reward: [(0, '4.241')] [2024-11-07 15:23:11,056][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003670_15032320.pth... [2024-11-07 15:23:11,488][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003333_13651968.pth [2024-11-07 15:23:15,771][09024] Updated weights for policy 0, policy_version 3677 (0.0062) [2024-11-07 15:23:16,028][04584] Fps is (10 sec: 6072.5, 60 sec: 5188.3, 300 sec: 5498.4). Total num frames: 15060992. Throughput: 0: 1307.6. Samples: 1759698. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:23:16,030][04584] Avg episode reward: [(0, '4.370')] [2024-11-07 15:23:21,028][04584] Fps is (10 sec: 5734.5, 60 sec: 5256.6, 300 sec: 5512.2). Total num frames: 15089664. Throughput: 0: 1416.4. Samples: 1768965. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:23:21,030][04584] Avg episode reward: [(0, '4.447')] [2024-11-07 15:23:22,754][09024] Updated weights for policy 0, policy_version 3687 (0.0042) [2024-11-07 15:23:26,028][04584] Fps is (10 sec: 6553.6, 60 sec: 5529.6, 300 sec: 5623.9). Total num frames: 15126528. Throughput: 0: 1440.3. Samples: 1773822. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:23:26,031][04584] Avg episode reward: [(0, '4.320')] [2024-11-07 15:23:27,486][09024] Updated weights for policy 0, policy_version 3697 (0.0048) [2024-11-07 15:23:31,028][04584] Fps is (10 sec: 7781.8, 60 sec: 5734.4, 300 sec: 5692.7). Total num frames: 15167488. Throughput: 0: 1530.0. Samples: 1785651. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:23:31,032][04584] Avg episode reward: [(0, '4.306')] [2024-11-07 15:23:32,802][09024] Updated weights for policy 0, policy_version 3707 (0.0041) [2024-11-07 15:23:36,033][04584] Fps is (10 sec: 7778.0, 60 sec: 6143.7, 300 sec: 5762.1). Total num frames: 15204352. Throughput: 0: 1604.4. Samples: 1797456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:23:36,036][04584] Avg episode reward: [(0, '4.600')] [2024-11-07 15:23:39,526][09024] Updated weights for policy 0, policy_version 3717 (0.0049) [2024-11-07 15:23:42,489][04584] Fps is (10 sec: 5361.2, 60 sec: 5998.2, 300 sec: 5733.8). Total num frames: 15228928. Throughput: 0: 1531.2. Samples: 1801260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:23:42,491][04584] Avg episode reward: [(0, '4.589')] [2024-11-07 15:23:46,027][04584] Fps is (10 sec: 4508.3, 60 sec: 6007.5, 300 sec: 5776.1). Total num frames: 15249408. Throughput: 0: 1467.6. Samples: 1806999. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-11-07 15:23:46,030][04584] Avg episode reward: [(0, '4.559')] [2024-11-07 15:23:47,911][09024] Updated weights for policy 0, policy_version 3727 (0.0051) [2024-11-07 15:23:51,028][04584] Fps is (10 sec: 6715.1, 60 sec: 6212.2, 300 sec: 5803.9). Total num frames: 15286272. Throughput: 0: 1578.8. Samples: 1817544. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:23:51,030][04584] Avg episode reward: [(0, '4.696')] [2024-11-07 15:23:53,536][09024] Updated weights for policy 0, policy_version 3737 (0.0043) [2024-11-07 15:23:56,028][04584] Fps is (10 sec: 7782.1, 60 sec: 6280.5, 300 sec: 5845.5). Total num frames: 15327232. Throughput: 0: 1607.7. Samples: 1823187. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:23:56,029][04584] Avg episode reward: [(0, '4.438')] [2024-11-07 15:23:58,746][09024] Updated weights for policy 0, policy_version 3747 (0.0047) [2024-11-07 15:24:01,028][04584] Fps is (10 sec: 7782.8, 60 sec: 6348.9, 300 sec: 5905.9). Total num frames: 15364096. Throughput: 0: 1671.8. Samples: 1834929. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:24:01,033][04584] Avg episode reward: [(0, '4.452')] [2024-11-07 15:24:04,806][09024] Updated weights for policy 0, policy_version 3757 (0.0063) [2024-11-07 15:24:06,030][04584] Fps is (10 sec: 6961.5, 60 sec: 6627.3, 300 sec: 5914.9). Total num frames: 15396864. Throughput: 0: 1695.1. Samples: 1845249. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:24:06,035][04584] Avg episode reward: [(0, '4.504')] [2024-11-07 15:24:10,314][09024] Updated weights for policy 0, policy_version 3767 (0.0044) [2024-11-07 15:24:11,028][04584] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 5928.8). Total num frames: 15433728. Throughput: 0: 1711.3. Samples: 1850832. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:24:11,031][04584] Avg episode reward: [(0, '4.455')] [2024-11-07 15:24:16,915][04584] Fps is (10 sec: 5644.4, 60 sec: 6525.3, 300 sec: 5883.3). Total num frames: 15458304. Throughput: 0: 1663.4. Samples: 1861977. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:24:16,920][04584] Avg episode reward: [(0, '4.409')] [2024-11-07 15:24:17,977][09024] Updated weights for policy 0, policy_version 3777 (0.0042) [2024-11-07 15:24:21,033][04584] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 5873.2). Total num frames: 15491072. Throughput: 0: 1585.9. Samples: 1868814. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) [2024-11-07 15:24:21,041][04584] Avg episode reward: [(0, '4.502')] [2024-11-07 15:24:23,334][09024] Updated weights for policy 0, policy_version 3787 (0.0046) [2024-11-07 15:24:26,028][04584] Fps is (10 sec: 7641.4, 60 sec: 6690.1, 300 sec: 5859.4). Total num frames: 15527936. Throughput: 0: 1676.7. Samples: 1874262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:24:26,032][04584] Avg episode reward: [(0, '4.557')] [2024-11-07 15:24:28,965][09024] Updated weights for policy 0, policy_version 3797 (0.0066) [2024-11-07 15:24:31,028][04584] Fps is (10 sec: 7782.5, 60 sec: 6690.2, 300 sec: 5873.2). Total num frames: 15568896. Throughput: 0: 1750.1. Samples: 1885752. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0) [2024-11-07 15:24:31,030][04584] Avg episode reward: [(0, '4.387')] [2024-11-07 15:24:33,986][09024] Updated weights for policy 0, policy_version 3807 (0.0035) [2024-11-07 15:24:36,028][04584] Fps is (10 sec: 7782.2, 60 sec: 6690.7, 300 sec: 5928.8). Total num frames: 15605760. Throughput: 0: 1776.5. Samples: 1897488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:24:36,031][04584] Avg episode reward: [(0, '4.476')] [2024-11-07 15:24:39,316][09024] Updated weights for policy 0, policy_version 3817 (0.0047) [2024-11-07 15:24:41,028][04584] Fps is (10 sec: 7372.9, 60 sec: 7067.0, 300 sec: 5928.8). Total num frames: 15642624. Throughput: 0: 1785.7. Samples: 1903545. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:24:41,030][04584] Avg episode reward: [(0, '4.429')] [2024-11-07 15:24:45,770][09024] Updated weights for policy 0, policy_version 3827 (0.0057) [2024-11-07 15:24:46,029][04584] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 5914.9). Total num frames: 15675392. Throughput: 0: 1739.0. Samples: 1913187. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:24:46,033][04584] Avg episode reward: [(0, '4.549')] [2024-11-07 15:24:51,363][04584] Fps is (10 sec: 5152.1, 60 sec: 6788.8, 300 sec: 5908.3). Total num frames: 15695872. Throughput: 0: 1610.0. Samples: 1918233. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:24:51,366][04584] Avg episode reward: [(0, '4.576')] [2024-11-07 15:24:55,001][09024] Updated weights for policy 0, policy_version 3837 (0.0038) [2024-11-07 15:24:56,028][04584] Fps is (10 sec: 4506.1, 60 sec: 6553.6, 300 sec: 5887.1). Total num frames: 15720448. Throughput: 0: 1615.8. Samples: 1923543. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-11-07 15:24:56,030][04584] Avg episode reward: [(0, '4.527')] [2024-11-07 15:25:00,829][09024] Updated weights for policy 0, policy_version 3847 (0.0064) [2024-11-07 15:25:01,028][04584] Fps is (10 sec: 6357.0, 60 sec: 6553.6, 300 sec: 5928.8). Total num frames: 15757312. Throughput: 0: 1617.7. Samples: 1933338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:25:01,030][04584] Avg episode reward: [(0, '4.338')] [2024-11-07 15:25:06,027][04584] Fps is (10 sec: 7373.0, 60 sec: 6622.2, 300 sec: 5956.6). Total num frames: 15794176. Throughput: 0: 1676.0. Samples: 1944231. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:25:06,032][04584] Avg episode reward: [(0, '4.445')] [2024-11-07 15:25:06,372][09024] Updated weights for policy 0, policy_version 3857 (0.0037) [2024-11-07 15:25:11,031][04584] Fps is (10 sec: 7370.4, 60 sec: 6621.5, 300 sec: 6025.9). Total num frames: 15831040. Throughput: 0: 1680.4. Samples: 1949886. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:25:11,033][04584] Avg episode reward: [(0, '4.314')] [2024-11-07 15:25:11,052][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003865_15831040.pth... [2024-11-07 15:25:11,277][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003509_14372864.pth [2024-11-07 15:25:11,863][09024] Updated weights for policy 0, policy_version 3867 (0.0044) [2024-11-07 15:25:16,028][04584] Fps is (10 sec: 7372.7, 60 sec: 6929.2, 300 sec: 6039.9). Total num frames: 15867904. Throughput: 0: 1678.3. Samples: 1961277. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-11-07 15:25:16,035][04584] Avg episode reward: [(0, '4.238')] [2024-11-07 15:25:17,024][09024] Updated weights for policy 0, policy_version 3877 (0.0041) [2024-11-07 15:25:21,030][04584] Fps is (10 sec: 7783.6, 60 sec: 6963.0, 300 sec: 6081.5). Total num frames: 15908864. Throughput: 0: 1676.8. Samples: 1972947. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:25:21,032][04584] Avg episode reward: [(0, '4.292')] [2024-11-07 15:25:22,877][09024] Updated weights for policy 0, policy_version 3887 (0.0059) [2024-11-07 15:25:26,028][04584] Fps is (10 sec: 5734.1, 60 sec: 6621.8, 300 sec: 6026.0). Total num frames: 15925248. Throughput: 0: 1657.8. Samples: 1978146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-11-07 15:25:26,030][04584] Avg episode reward: [(0, '4.499')] [2024-11-07 15:25:30,836][09024] Updated weights for policy 0, policy_version 3897 (0.0053) [2024-11-07 15:25:31,028][04584] Fps is (10 sec: 5325.6, 60 sec: 6553.6, 300 sec: 6039.9). Total num frames: 15962112. Throughput: 0: 1588.0. Samples: 1984647. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-11-07 15:25:31,030][04584] Avg episode reward: [(0, '4.349')] [2024-11-07 15:25:36,029][04584] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6053.7). Total num frames: 15998976. Throughput: 0: 1733.7. Samples: 1995669. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-11-07 15:25:36,035][04584] Avg episode reward: [(0, '4.612')] [2024-11-07 15:25:36,308][09024] Updated weights for policy 0, policy_version 3907 (0.0043) [2024-11-07 15:25:36,861][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2024-11-07 15:25:36,876][09009] Stopping Batcher_0... [2024-11-07 15:25:36,877][09009] Loop batcher_evt_loop terminating... [2024-11-07 15:25:36,874][04584] Component Batcher_0 stopped! [2024-11-07 15:25:36,971][09024] Weights refcount: 2 0 [2024-11-07 15:25:36,977][09024] Stopping InferenceWorker_p0-w0... [2024-11-07 15:25:36,978][09024] Loop inference_proc0-0_evt_loop terminating... [2024-11-07 15:25:36,979][04584] Component InferenceWorker_p0-w0 stopped! [2024-11-07 15:25:37,003][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003670_15032320.pth [2024-11-07 15:25:37,010][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2024-11-07 15:25:37,187][09009] Stopping LearnerWorker_p0... [2024-11-07 15:25:37,188][09009] Loop learner_proc0_evt_loop terminating... [2024-11-07 15:25:37,188][04584] Component LearnerWorker_p0 stopped! [2024-11-07 15:25:37,438][04584] Component RolloutWorker_w3 stopped! [2024-11-07 15:25:37,451][04584] Component RolloutWorker_w4 stopped! 
[2024-11-07 15:25:37,446][09026] Stopping RolloutWorker_w3... [2024-11-07 15:25:37,460][09026] Loop rollout_proc3_evt_loop terminating... [2024-11-07 15:25:37,452][09030] Stopping RolloutWorker_w4... [2024-11-07 15:25:37,467][09030] Loop rollout_proc4_evt_loop terminating... [2024-11-07 15:25:37,514][04584] Component RolloutWorker_w6 stopped! [2024-11-07 15:25:37,522][09037] Stopping RolloutWorker_w6... [2024-11-07 15:25:37,528][09037] Loop rollout_proc6_evt_loop terminating... [2024-11-07 15:25:37,552][04584] Component RolloutWorker_w0 stopped! [2024-11-07 15:25:37,553][09025] Stopping RolloutWorker_w0... [2024-11-07 15:25:37,556][09025] Loop rollout_proc0_evt_loop terminating... [2024-11-07 15:25:37,632][04584] Component RolloutWorker_w1 stopped! [2024-11-07 15:25:37,633][09029] Stopping RolloutWorker_w1... [2024-11-07 15:25:37,643][09029] Loop rollout_proc1_evt_loop terminating... [2024-11-07 15:25:37,666][04584] Component RolloutWorker_w9 stopped! [2024-11-07 15:25:37,664][09039] Stopping RolloutWorker_w9... [2024-11-07 15:25:37,702][09039] Loop rollout_proc9_evt_loop terminating... [2024-11-07 15:25:38,109][04584] Component RolloutWorker_w8 stopped! [2024-11-07 15:25:38,154][04584] Component RolloutWorker_w7 stopped! [2024-11-07 15:25:38,111][09040] Stopping RolloutWorker_w8... [2024-11-07 15:25:38,159][09040] Loop rollout_proc8_evt_loop terminating... [2024-11-07 15:25:38,156][09038] Stopping RolloutWorker_w7... [2024-11-07 15:25:38,168][04584] Component RolloutWorker_w5 stopped! [2024-11-07 15:25:38,172][09038] Loop rollout_proc7_evt_loop terminating... [2024-11-07 15:25:38,172][09028] Stopping RolloutWorker_w5... [2024-11-07 15:25:38,178][09028] Loop rollout_proc5_evt_loop terminating... [2024-11-07 15:25:38,397][04584] Component RolloutWorker_w2 stopped! [2024-11-07 15:25:38,405][09027] Stopping RolloutWorker_w2... [2024-11-07 15:25:38,402][04584] Waiting for process learner_proc0 to stop... [2024-11-07 15:25:38,410][09027] Loop rollout_proc2_evt_loop terminating... [2024-11-07 15:25:44,086][04584] Waiting for process inference_proc0-0 to join... [2024-11-07 15:25:44,088][04584] Waiting for process rollout_proc0 to join... [2024-11-07 15:25:44,090][04584] Waiting for process rollout_proc1 to join... [2024-11-07 15:25:44,092][04584] Waiting for process rollout_proc2 to join... [2024-11-07 15:25:44,094][04584] Waiting for process rollout_proc3 to join... [2024-11-07 15:25:44,096][04584] Waiting for process rollout_proc4 to join... [2024-11-07 15:25:44,098][04584] Waiting for process rollout_proc5 to join... [2024-11-07 15:25:44,099][04584] Waiting for process rollout_proc6 to join... [2024-11-07 15:25:44,102][04584] Waiting for process rollout_proc7 to join... [2024-11-07 15:25:44,106][04584] Waiting for process rollout_proc8 to join... [2024-11-07 15:25:44,108][04584] Waiting for process rollout_proc9 to join... 
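The teardown above is two-phase: every component's event loop is first signalled to stop ("Stopping RolloutWorker_w3... / Loop rollout_proc3_evt_loop terminating..."), and only afterwards does the runner block on the underlying worker processes ("Waiting for process ... to join..."). A condensed sketch of that ordering, assuming hypothetical component objects and standard multiprocessing.Process handles:

def shutdown(components, processes, join_timeout=10.0):
    """Signal all event loops first, then join the underlying processes."""
    for component in components:
        component.stop()  # emits 'Stopping <component>...' and terminates its event loop
    for process in processes:
        process.join(join_timeout)  # 'Waiting for process <name> to join...'
        if process.is_alive():
            process.terminate()  # fallback for a hung worker (assumption; not shown in the log)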
[2024-11-07 15:25:44,110][04584] Batcher 0 profile tree view:
batching: 177.8703, releasing_batches: 0.3180
[2024-11-07 15:25:44,112][04584] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 27.4860
update_model: 26.3783
  weight_update: 0.0043
one_step: 0.0063
  handle_policy_step: 1339.9856
    deserialize: 47.4091, stack: 6.2756, obs_to_device_normalize: 399.2838, forward: 569.5053, send_messages: 86.6617
    prepare_outputs: 187.4270
      to_cpu: 142.4986
[2024-11-07 15:25:44,113][04584] Learner 0 profile tree view:
misc: 0.0142, prepare_batch: 70.5279
train: 315.6715
  epoch_init: 0.0392, minibatch_init: 0.0518, losses_postprocess: 3.5406, kl_divergence: 4.0413, after_optimizer: 18.4153
  calculate_losses: 108.2507
    losses_init: 0.0171, forward_head: 8.9566, bptt_initial: 63.2059, tail: 4.5197, advantages_returns: 1.3485, losses: 15.0900
    bptt: 13.8609
      bptt_forward_core: 13.4036
  update: 178.6702
    clip: 5.0453
[2024-11-07 15:25:44,118][04584] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6293, enqueue_policy_requests: 37.4664, env_step: 596.6403, overhead: 39.0474, complete_rollouts: 2.4262
save_policy_outputs: 47.7074
  split_output_tensors: 15.5779
[2024-11-07 15:25:44,120][04584] RolloutWorker_w9 profile tree view:
wait_for_trajectories: 0.5428, enqueue_policy_requests: 33.2939, env_step: 802.9854, overhead: 35.3433, complete_rollouts: 1.0714
save_policy_outputs: 45.4609
  split_output_tensors: 15.8628
[2024-11-07 15:25:44,122][04584] Loop Runner_EvtLoop terminating... [2024-11-07 15:25:44,126][04584] Runner profile tree view:
main_loop: 1467.0711
[2024-11-07 15:25:44,131][04584] Collected {0: 16007168}, FPS: 5447.1 [2024-11-07 15:25:44,737][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 15:25:44,739][04584] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 15:25:44,739][04584] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 15:25:44,741][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 15:25:44,742][04584] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 15:25:44,744][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 15:25:44,748][04584] Adding new argument 'max_num_episodes'=20 that is not in the saved config file! [2024-11-07 15:25:44,750][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-07 15:25:44,752][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-07 15:25:44,754][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 15:25:44,755][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 15:25:44,758][04584] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 15:25:44,759][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file!
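The profile trees logged at shutdown attribute wall-clock time to nested stages (e.g. handle_policy_step -> forward, train -> calculate_losses -> bptt). A tiny sketch of how such a report can be accumulated with a context-manager stack; this mirrors the idea, not Sample Factory's actual Timing implementation:

import time
from contextlib import contextmanager

class TimingTree:
    def __init__(self):
        self.totals = {}  # "train/calculate_losses/bptt" -> seconds
        self._stack = []

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        key = "/".join(self._stack)
        start = time.time()
        try:
            yield
        finally:
            self.totals[key] = self.totals.get(key, 0.0) + time.time() - start
            self._stack.pop()

    def report(self):
        # lexicographic sort puts each parent right before its children
        for key in sorted(self.totals):
            name = key.rsplit("/", 1)[-1]
            print(f"{'  ' * key.count('/')}{name}: {self.totals[key]:.4f}")

# usage: with timing.timeit('train'): ... with timing.timeit('calculate_losses'): ...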
[2024-11-07 15:25:44,763][04584] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 15:25:45,048][04584] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 15:25:45,059][04584] RunningMeanStd input shape: (1,) [2024-11-07 15:25:45,267][04584] ConvEncoder: input_channels=3 [2024-11-07 15:25:45,537][04584] Conv encoder output size: 512 [2024-11-07 15:25:45,540][04584] Policy head output size: 512 [2024-11-07 15:25:45,688][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2024-11-07 15:25:46,915][04584] Num frames 100... [2024-11-07 15:25:47,218][04584] Num frames 200... [2024-11-07 15:25:47,432][04584] Num frames 300... [2024-11-07 15:25:47,678][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-11-07 15:25:47,680][04584] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-11-07 15:25:47,727][04584] Num frames 400... [2024-11-07 15:25:47,953][04584] Num frames 500... [2024-11-07 15:25:48,204][04584] Num frames 600... [2024-11-07 15:25:48,378][04584] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200 [2024-11-07 15:25:48,379][04584] Avg episode reward: 3.200, avg true_objective: 3.200 [2024-11-07 15:25:48,527][04584] Num frames 700... [2024-11-07 15:25:48,761][04584] Num frames 800... [2024-11-07 15:25:48,980][04584] Num frames 900... [2024-11-07 15:25:49,257][04584] Num frames 1000... [2024-11-07 15:25:49,367][04584] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413 [2024-11-07 15:25:49,369][04584] Avg episode reward: 3.413, avg true_objective: 3.413 [2024-11-07 15:25:49,550][04584] Num frames 1100... [2024-11-07 15:25:49,786][04584] Num frames 1200... [2024-11-07 15:25:50,076][04584] Num frames 1300... [2024-11-07 15:25:50,336][04584] Num frames 1400... [2024-11-07 15:25:50,412][04584] Avg episode rewards: #0: 3.520, true rewards: #0: 3.520 [2024-11-07 15:25:50,415][04584] Avg episode reward: 3.520, avg true_objective: 3.520 [2024-11-07 15:25:50,657][04584] Num frames 1500... [2024-11-07 15:25:50,950][04584] Num frames 1600... [2024-11-07 15:25:51,197][04584] Num frames 1700... [2024-11-07 15:25:51,432][04584] Num frames 1800... [2024-11-07 15:25:51,561][04584] Avg episode rewards: #0: 3.848, true rewards: #0: 3.648 [2024-11-07 15:25:51,562][04584] Avg episode reward: 3.848, avg true_objective: 3.648 [2024-11-07 15:25:51,762][04584] Num frames 1900... [2024-11-07 15:25:52,003][04584] Num frames 2000... [2024-11-07 15:25:52,256][04584] Num frames 2100... [2024-11-07 15:25:52,501][04584] Num frames 2200... [2024-11-07 15:25:52,725][04584] Avg episode rewards: #0: 4.120, true rewards: #0: 3.787 [2024-11-07 15:25:52,729][04584] Avg episode reward: 4.120, avg true_objective: 3.787 [2024-11-07 15:25:52,810][04584] Num frames 2300... [2024-11-07 15:25:53,060][04584] Num frames 2400... [2024-11-07 15:25:53,296][04584] Num frames 2500... [2024-11-07 15:25:53,533][04584] Num frames 2600... [2024-11-07 15:25:53,726][04584] Avg episode rewards: #0: 4.080, true rewards: #0: 3.794 [2024-11-07 15:25:53,728][04584] Avg episode reward: 4.080, avg true_objective: 3.794 [2024-11-07 15:25:53,866][04584] Num frames 2700... [2024-11-07 15:25:54,143][04584] Num frames 2800... [2024-11-07 15:25:54,389][04584] Num frames 2900... [2024-11-07 15:25:54,622][04584] Num frames 3000... 
[2024-11-07 15:25:54,786][04584] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800 [2024-11-07 15:25:54,789][04584] Avg episode reward: 4.050, avg true_objective: 3.800 [2024-11-07 15:25:54,940][04584] Num frames 3100... [2024-11-07 15:25:55,222][04584] Num frames 3200... [2024-11-07 15:25:55,492][04584] Num frames 3300... [2024-11-07 15:25:55,765][04584] Num frames 3400... [2024-11-07 15:25:56,060][04584] Avg episode rewards: #0: 4.209, true rewards: #0: 3.876 [2024-11-07 15:25:56,062][04584] Avg episode reward: 4.209, avg true_objective: 3.876 [2024-11-07 15:25:56,108][04584] Num frames 3500... [2024-11-07 15:25:56,386][04584] Num frames 3600... [2024-11-07 15:25:56,661][04584] Num frames 3700... [2024-11-07 15:25:56,965][04584] Num frames 3800... [2024-11-07 15:25:57,293][04584] Avg episode rewards: #0: 4.172, true rewards: #0: 3.872 [2024-11-07 15:25:57,295][04584] Avg episode reward: 4.172, avg true_objective: 3.872 [2024-11-07 15:25:57,407][04584] Num frames 3900... [2024-11-07 15:25:57,705][04584] Num frames 4000... [2024-11-07 15:25:57,968][04584] Num frames 4100... [2024-11-07 15:26:00,392][04584] Num frames 4200... [2024-11-07 15:26:00,634][04584] Avg episode rewards: #0: 4.142, true rewards: #0: 3.869 [2024-11-07 15:26:00,639][04584] Avg episode reward: 4.142, avg true_objective: 3.869 [2024-11-07 15:26:00,803][04584] Num frames 4300... [2024-11-07 15:26:01,112][04584] Num frames 4400... [2024-11-07 15:26:01,437][04584] Num frames 4500... [2024-11-07 15:26:01,718][04584] Num frames 4600... [2024-11-07 15:26:01,904][04584] Avg episode rewards: #0: 4.200, true rewards: #0: 3.867 [2024-11-07 15:26:01,905][04584] Avg episode reward: 4.200, avg true_objective: 3.867 [2024-11-07 15:26:02,104][04584] Num frames 4700... [2024-11-07 15:26:02,487][04584] Num frames 4800... [2024-11-07 15:26:02,890][04584] Num frames 4900... [2024-11-07 15:26:03,241][04584] Num frames 5000... [2024-11-07 15:26:03,366][04584] Avg episode rewards: #0: 4.172, true rewards: #0: 3.865 [2024-11-07 15:26:03,368][04584] Avg episode reward: 4.172, avg true_objective: 3.865 [2024-11-07 15:26:03,607][04584] Num frames 5100... [2024-11-07 15:26:03,918][04584] Num frames 5200... [2024-11-07 15:26:04,195][04584] Num frames 5300... [2024-11-07 15:26:04,502][04584] Num frames 5400... [2024-11-07 15:26:04,799][04584] Avg episode rewards: #0: 4.266, true rewards: #0: 3.909 [2024-11-07 15:26:04,803][04584] Avg episode reward: 4.266, avg true_objective: 3.909 [2024-11-07 15:26:04,915][04584] Num frames 5500... [2024-11-07 15:26:05,256][04584] Num frames 5600... [2024-11-07 15:26:05,613][04584] Num frames 5700... [2024-11-07 15:26:05,979][04584] Num frames 5800... [2024-11-07 15:26:06,209][04584] Avg episode rewards: #0: 4.237, true rewards: #0: 3.904 [2024-11-07 15:26:06,213][04584] Avg episode reward: 4.237, avg true_objective: 3.904 [2024-11-07 15:26:06,396][04584] Num frames 5900... [2024-11-07 15:26:06,733][04584] Num frames 6000... [2024-11-07 15:26:07,059][04584] Num frames 6100... [2024-11-07 15:26:07,383][04584] Num frames 6200... [2024-11-07 15:26:07,575][04584] Avg episode rewards: #0: 4.213, true rewards: #0: 3.900 [2024-11-07 15:26:07,579][04584] Avg episode reward: 4.213, avg true_objective: 3.900 [2024-11-07 15:26:07,773][04584] Num frames 6300... [2024-11-07 15:26:08,122][04584] Num frames 6400... 
[2024-11-07 15:26:08,510][04584] Avg episode rewards: #0: 4.115, true rewards: #0: 3.821 [2024-11-07 15:26:08,513][04584] Avg episode reward: 4.115, avg true_objective: 3.821 [2024-11-07 15:26:08,534][04584] Num frames 6500... [2024-11-07 15:26:08,840][04584] Num frames 6600... [2024-11-07 15:26:09,118][04584] Num frames 6700... [2024-11-07 15:26:09,426][04584] Num frames 6800... [2024-11-07 15:26:09,749][04584] Avg episode rewards: #0: 4.100, true rewards: #0: 3.822 [2024-11-07 15:26:09,751][04584] Avg episode reward: 4.100, avg true_objective: 3.822 [2024-11-07 15:26:09,827][04584] Num frames 6900... [2024-11-07 15:26:10,154][04584] Num frames 7000... [2024-11-07 15:26:10,468][04584] Num frames 7100... [2024-11-07 15:26:10,765][04584] Num frames 7200... [2024-11-07 15:26:11,015][04584] Avg episode rewards: #0: 4.086, true rewards: #0: 3.823 [2024-11-07 15:26:11,018][04584] Avg episode reward: 4.086, avg true_objective: 3.823 [2024-11-07 15:26:11,145][04584] Num frames 7300... [2024-11-07 15:26:11,448][04584] Num frames 7400... [2024-11-07 15:26:11,817][04584] Num frames 7500... [2024-11-07 15:26:12,169][04584] Num frames 7600... [2024-11-07 15:26:12,493][04584] Avg episode rewards: #0: 4.140, true rewards: #0: 3.840 [2024-11-07 15:26:12,498][04584] Avg episode reward: 4.140, avg true_objective: 3.840 [2024-11-07 15:26:39,781][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! [2024-11-07 15:26:41,145][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json [2024-11-07 15:26:41,146][04584] Overriding arg 'num_workers' with value 4 passed from command line [2024-11-07 15:26:41,148][04584] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-07 15:26:41,149][04584] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-07 15:26:41,152][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-07 15:26:41,154][04584] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-07 15:26:41,156][04584] Adding new argument 'max_num_frames'=150000 that is not in the saved config file! [2024-11-07 15:26:41,157][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-07 15:26:41,160][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-07 15:26:41,163][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-07 15:26:41,169][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-07 15:26:41,171][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-07 15:26:41,176][04584] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-07 15:26:41,179][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
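Both evaluation runs rebuild their arguments the same way: load the training config.json, let CLI values override saved ones ("Overriding arg 'num_workers' with value 4 passed from command line"), and add evaluation-only arguments that the saved file predates ("Adding new argument ..."). A minimal sketch of that merge, assuming the config is a flat JSON dict:

import json

def load_eval_config(config_path, cli_overrides, eval_defaults):
    """Merge a saved training config with CLI overrides and eval-only defaults."""
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():  # explicit CLI flags win
        print(f"Overriding arg {key!r} with value {value!r} passed from command line")
        cfg[key] = value
    for key, value in eval_defaults.items():  # fill gaps the old config predates
        if key not in cfg:
            print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
            cfg[key] = value
    return cfg

# e.g. load_eval_config('.../config.json', {'num_workers': 4},
#                       {'no_render': True, 'save_video': True, 'push_to_hub': True})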
[2024-11-07 15:26:41,182][04584] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-07 15:26:41,222][04584] RunningMeanStd input shape: (3, 72, 128) [2024-11-07 15:26:41,224][04584] RunningMeanStd input shape: (1,) [2024-11-07 15:26:41,260][04584] ConvEncoder: input_channels=3 [2024-11-07 15:26:41,321][04584] Conv encoder output size: 512 [2024-11-07 15:26:41,323][04584] Policy head output size: 512 [2024-11-07 15:26:41,356][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2024-11-07 15:26:42,080][04584] Num frames 100... [2024-11-07 15:26:42,204][04584] Avg episode rewards: #0: 1.280, true rewards: #0: 1.280 [2024-11-07 15:26:42,206][04584] Avg episode reward: 1.280, avg true_objective: 1.280 [2024-11-07 15:26:42,386][04584] Num frames 200... [2024-11-07 15:26:42,603][04584] Num frames 300... [2024-11-07 15:26:42,828][04584] Num frames 400... [2024-11-07 15:26:43,054][04584] Num frames 500... [2024-11-07 15:26:43,135][04584] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 [2024-11-07 15:26:43,137][04584] Avg episode reward: 2.560, avg true_objective: 2.560 [2024-11-07 15:26:43,338][04584] Num frames 600... [2024-11-07 15:26:43,556][04584] Num frames 700... [2024-11-07 15:26:43,790][04584] Num frames 800... [2024-11-07 15:26:44,021][04584] Num frames 900... [2024-11-07 15:26:44,275][04584] Avg episode rewards: #0: 3.973, true rewards: #0: 3.307 [2024-11-07 15:26:44,280][04584] Avg episode reward: 3.973, avg true_objective: 3.307 [2024-11-07 15:26:44,313][04584] Num frames 1000... [2024-11-07 15:26:44,514][04584] Num frames 1100... [2024-11-07 15:26:44,732][04584] Num frames 1200... [2024-11-07 15:26:44,965][04584] Num frames 1300... [2024-11-07 15:26:45,194][04584] Avg episode rewards: #0: 3.940, true rewards: #0: 3.440 [2024-11-07 15:26:45,199][04584] Avg episode reward: 3.940, avg true_objective: 3.440 [2024-11-07 15:26:45,259][04584] Num frames 1400... [2024-11-07 15:26:45,491][04584] Num frames 1500... [2024-11-07 15:26:45,708][04584] Num frames 1600... [2024-11-07 15:26:45,933][04584] Num frames 1700... [2024-11-07 15:26:46,133][04584] Avg episode rewards: #0: 3.920, true rewards: #0: 3.520 [2024-11-07 15:26:46,138][04584] Avg episode reward: 3.920, avg true_objective: 3.520 [2024-11-07 15:26:46,227][04584] Num frames 1800... [2024-11-07 15:26:46,458][04584] Num frames 1900... [2024-11-07 15:26:46,696][04584] Num frames 2000... [2024-11-07 15:26:46,930][04584] Num frames 2100... [2024-11-07 15:26:47,112][04584] Avg episode rewards: #0: 3.907, true rewards: #0: 3.573 [2024-11-07 15:26:47,115][04584] Avg episode reward: 3.907, avg true_objective: 3.573 [2024-11-07 15:26:47,281][04584] Num frames 2200... [2024-11-07 15:26:47,506][04584] Num frames 2300... [2024-11-07 15:26:47,747][04584] Num frames 2400... [2024-11-07 15:26:48,011][04584] Num frames 2500... [2024-11-07 15:26:48,134][04584] Avg episode rewards: #0: 3.897, true rewards: #0: 3.611 [2024-11-07 15:26:48,136][04584] Avg episode reward: 3.897, avg true_objective: 3.611 [2024-11-07 15:26:48,329][04584] Num frames 2600... [2024-11-07 15:26:48,555][04584] Num frames 2700... [2024-11-07 15:26:48,784][04584] Num frames 2800... [2024-11-07 15:26:48,992][04584] Num frames 2900... [2024-11-07 15:26:49,076][04584] Avg episode rewards: #0: 3.890, true rewards: #0: 3.640 [2024-11-07 15:26:49,079][04584] Avg episode reward: 3.890, avg true_objective: 3.640 [2024-11-07 15:26:49,278][04584] Num frames 3000... 
[2024-11-07 15:26:49,523][04584] Num frames 3100... [2024-11-07 15:26:49,752][04584] Num frames 3200... [2024-11-07 15:26:49,974][04584] Num frames 3300... [2024-11-07 15:26:50,159][04584] Avg episode rewards: #0: 4.067, true rewards: #0: 3.733 [2024-11-07 15:26:50,161][04584] Avg episode reward: 4.067, avg true_objective: 3.733 [2024-11-07 15:26:50,253][04584] Num frames 3400... [2024-11-07 15:26:50,471][04584] Num frames 3500... [2024-11-07 15:26:50,695][04584] Num frames 3600... [2024-11-07 15:26:50,913][04584] Num frames 3700... [2024-11-07 15:26:51,056][04584] Avg episode rewards: #0: 4.044, true rewards: #0: 3.744 [2024-11-07 15:26:51,059][04584] Avg episode reward: 4.044, avg true_objective: 3.744 [2024-11-07 15:27:00,168][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!
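The "Avg episode rewards" line printed after each evaluation episode is the running mean over all episodes finished so far: in the first evaluation above, episode rewards of 3.84, 2.56 and 3.84 yield the logged sequence 3.840 -> 3.200 -> 3.413. A sketch of that bookkeeping, simplified to a single reward stream (the log additionally tracks a separate true-objective average):

def report_running_avg(episode_rewards):
    """Print the running mean after each completed evaluation episode."""
    total = 0.0
    for i, reward in enumerate(episode_rewards, start=1):
        total += reward
        print(f"Avg episode rewards: #0: {total / i:.3f}")

# report_running_avg([3.84, 2.56, 3.84]) prints 3.840, 3.200, 3.413,
# matching the first three episodes of the earlier evaluation run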