diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -13793,3 +13793,23829 @@ main_loop: 717.3057 [2024-11-07 16:45:06,599][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.020 [2024-11-07 16:45:06,603][14395] Avg episode reward: 4.390, avg true_objective: 4.020 [2024-11-07 16:47:35,377][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:47:52,066][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:48:50,525][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:48:50,529][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:48:50,531][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:48:50,535][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:48:50,537][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:48:50,540][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:48:50,543][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:48:50,547][14395] Adding new argument 'max_num_episodes'=100000000000 that is not in the saved config file! +[2024-11-07 16:48:50,550][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:48:50,552][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:48:50,554][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:48:50,558][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:48:50,562][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:48:50,564][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:48:50,568][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:48:50,631][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:48:50,637][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:48:50,697][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:48:50,755][14395] Conv encoder output size: 512 +[2024-11-07 16:48:50,756][14395] Policy head output size: 512 +[2024-11-07 16:48:50,787][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:48:51,548][14395] Num frames 100... +[2024-11-07 16:48:51,786][14395] Num frames 200... +[2024-11-07 16:48:52,032][14395] Num frames 300... +[2024-11-07 16:48:52,295][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:48:52,298][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:48:52,340][14395] Num frames 400... +[2024-11-07 16:48:52,554][14395] Num frames 500... +[2024-11-07 16:48:52,760][14395] Num frames 600... +[2024-11-07 16:48:52,960][14395] Num frames 700... +[2024-11-07 16:48:53,182][14395] Num frames 800... +[2024-11-07 16:48:53,313][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:48:53,315][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:48:53,481][14395] Num frames 900... +[2024-11-07 16:48:53,688][14395] Num frames 1000... +[2024-11-07 16:48:53,910][14395] Num frames 1100... +[2024-11-07 16:48:54,110][14395] Num frames 1200... +[2024-11-07 16:48:54,203][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:48:54,207][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:48:54,393][14395] Num frames 1300... +[2024-11-07 16:48:54,590][14395] Num frames 1400... +[2024-11-07 16:48:54,793][14395] Num frames 1500... +[2024-11-07 16:48:54,989][14395] Num frames 1600... +[2024-11-07 16:48:55,225][14395] Avg episode rewards: #0: 4.740, true rewards: #0: 4.240 +[2024-11-07 16:48:55,229][14395] Avg episode reward: 4.740, avg true_objective: 4.240 +[2024-11-07 16:48:55,250][14395] Num frames 1700... +[2024-11-07 16:48:55,452][14395] Num frames 1800... +[2024-11-07 16:48:55,652][14395] Num frames 1900... +[2024-11-07 16:48:55,860][14395] Num frames 2000... +[2024-11-07 16:48:56,076][14395] Avg episode rewards: #0: 4.560, true rewards: #0: 4.160 +[2024-11-07 16:48:56,080][14395] Avg episode reward: 4.560, avg true_objective: 4.160 +[2024-11-07 16:48:56,136][14395] Num frames 2100... +[2024-11-07 16:48:56,326][14395] Num frames 2200... +[2024-11-07 16:48:56,524][14395] Num frames 2300... +[2024-11-07 16:48:56,734][14395] Num frames 2400... +[2024-11-07 16:48:56,917][14395] Avg episode rewards: #0: 4.440, true rewards: #0: 4.107 +[2024-11-07 16:48:56,922][14395] Avg episode reward: 4.440, avg true_objective: 4.107 +[2024-11-07 16:48:57,011][14395] Num frames 2500... +[2024-11-07 16:48:57,214][14395] Num frames 2600... +[2024-11-07 16:48:57,407][14395] Num frames 2700... +[2024-11-07 16:48:57,620][14395] Num frames 2800... +[2024-11-07 16:48:57,774][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.069 +[2024-11-07 16:48:57,778][14395] Avg episode reward: 4.354, avg true_objective: 4.069 +[2024-11-07 16:48:57,897][14395] Num frames 2900... +[2024-11-07 16:48:58,109][14395] Num frames 3000... +[2024-11-07 16:48:58,323][14395] Num frames 3100... +[2024-11-07 16:48:58,527][14395] Num frames 3200... +[2024-11-07 16:48:58,735][14395] Num frames 3300... +[2024-11-07 16:48:58,944][14395] Num frames 3400... +[2024-11-07 16:48:59,133][14395] Avg episode rewards: #0: 4.945, true rewards: #0: 4.320 +[2024-11-07 16:48:59,137][14395] Avg episode reward: 4.945, avg true_objective: 4.320 +[2024-11-07 16:49:23,526][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:49:23,528][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:49:23,529][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:49:23,530][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:49:23,533][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:49:23,534][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:49:23,537][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:49:23,539][14395] Adding new argument 'max_num_episodes'=100000000000 that is not in the saved config file! +[2024-11-07 16:49:23,540][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:49:23,541][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:49:23,543][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:49:23,547][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:49:23,548][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:49:23,551][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:49:23,552][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:49:23,621][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:49:23,624][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:49:23,660][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:49:23,709][14395] Conv encoder output size: 512 +[2024-11-07 16:49:23,711][14395] Policy head output size: 512 +[2024-11-07 16:49:23,790][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:49:24,636][14395] Num frames 100... +[2024-11-07 16:49:24,860][14395] Num frames 200... +[2024-11-07 16:49:25,076][14395] Num frames 300... +[2024-11-07 16:49:25,299][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:49:25,308][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:49:25,375][14395] Num frames 400... +[2024-11-07 16:49:25,573][14395] Num frames 500... +[2024-11-07 16:49:25,842][14395] Num frames 600... +[2024-11-07 16:49:26,042][14395] Num frames 700... +[2024-11-07 16:49:26,264][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:49:26,266][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:49:26,338][14395] Num frames 800... +[2024-11-07 16:49:26,550][14395] Num frames 900... +[2024-11-07 16:49:26,751][14395] Num frames 1000... +[2024-11-07 16:49:26,946][14395] Num frames 1100... +[2024-11-07 16:49:27,151][14395] Num frames 1200... +[2024-11-07 16:49:27,244][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:49:27,247][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:49:27,442][14395] Num frames 1300... +[2024-11-07 16:49:27,696][14395] Num frames 1400... +[2024-11-07 16:49:27,898][14395] Num frames 1500... +[2024-11-07 16:49:28,094][14395] Num frames 1600... +[2024-11-07 16:49:28,212][14395] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2024-11-07 16:49:28,216][14395] Avg episode reward: 4.580, avg true_objective: 4.080 +[2024-11-07 16:49:28,372][14395] Num frames 1700... +[2024-11-07 16:49:28,588][14395] Num frames 1800... +[2024-11-07 16:49:28,810][14395] Num frames 1900... +[2024-11-07 16:49:29,004][14395] Num frames 2000... +[2024-11-07 16:49:29,229][14395] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160 +[2024-11-07 16:49:29,231][14395] Avg episode reward: 4.760, avg true_objective: 4.160 +[2024-11-07 16:49:29,279][14395] Num frames 2100... +[2024-11-07 16:49:29,486][14395] Num frames 2200... +[2024-11-07 16:49:29,741][14395] Num frames 2300... +[2024-11-07 16:49:29,974][14395] Num frames 2400... +[2024-11-07 16:49:30,158][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.107 +[2024-11-07 16:49:30,161][14395] Avg episode reward: 4.607, avg true_objective: 4.107 +[2024-11-07 16:49:30,253][14395] Num frames 2500... +[2024-11-07 16:49:30,489][14395] Num frames 2600... +[2024-11-07 16:49:30,752][14395] Num frames 2700... +[2024-11-07 16:49:30,952][14395] Num frames 2800... +[2024-11-07 16:49:31,102][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 +[2024-11-07 16:49:31,104][14395] Avg episode reward: 4.497, avg true_objective: 4.069 +[2024-11-07 16:49:31,229][14395] Num frames 2900... +[2024-11-07 16:49:31,450][14395] Num frames 3000... +[2024-11-07 16:49:31,654][14395] Num frames 3100... +[2024-11-07 16:49:31,863][14395] Num frames 3200... +[2024-11-07 16:49:31,985][14395] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2024-11-07 16:49:31,986][14395] Avg episode reward: 4.415, avg true_objective: 4.040 +[2024-11-07 16:49:32,155][14395] Num frames 3300... +[2024-11-07 16:49:32,376][14395] Num frames 3400... +[2024-11-07 16:49:32,598][14395] Num frames 3500... +[2024-11-07 16:49:32,812][14395] Num frames 3600... +[2024-11-07 16:49:32,904][14395] Avg episode rewards: #0: 4.351, true rewards: #0: 4.018 +[2024-11-07 16:49:32,910][14395] Avg episode reward: 4.351, avg true_objective: 4.018 +[2024-11-07 16:49:33,134][14395] Num frames 3700... +[2024-11-07 16:49:33,332][14395] Num frames 3800... +[2024-11-07 16:49:33,544][14395] Num frames 3900... +[2024-11-07 16:49:33,750][14395] Num frames 4000... +[2024-11-07 16:49:33,803][14395] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2024-11-07 16:49:33,804][14395] Avg episode reward: 4.300, avg true_objective: 4.000 +[2024-11-07 16:49:34,010][14395] Num frames 4100... +[2024-11-07 16:49:34,225][14395] Num frames 4200... +[2024-11-07 16:49:34,432][14395] Num frames 4300... +[2024-11-07 16:49:34,654][14395] Avg episode rewards: #0: 4.258, true rewards: #0: 3.985 +[2024-11-07 16:49:34,658][14395] Avg episode reward: 4.258, avg true_objective: 3.985 +[2024-11-07 16:49:34,714][14395] Num frames 4400... +[2024-11-07 16:49:34,967][14395] Num frames 4500... +[2024-11-07 16:49:35,191][14395] Num frames 4600... +[2024-11-07 16:49:35,420][14395] Num frames 4700... +[2024-11-07 16:49:35,619][14395] Avg episode rewards: #0: 4.223, true rewards: #0: 3.973 +[2024-11-07 16:49:35,621][14395] Avg episode reward: 4.223, avg true_objective: 3.973 +[2024-11-07 16:49:35,687][14395] Num frames 4800... +[2024-11-07 16:49:35,890][14395] Num frames 4900... +[2024-11-07 16:49:36,083][14395] Num frames 5000... +[2024-11-07 16:49:36,288][14395] Num frames 5100... +[2024-11-07 16:49:36,445][14395] Avg episode rewards: #0: 4.194, true rewards: #0: 3.963 +[2024-11-07 16:49:36,449][14395] Avg episode reward: 4.194, avg true_objective: 3.963 +[2024-11-07 16:49:36,555][14395] Num frames 5200... +[2024-11-07 16:49:36,751][14395] Num frames 5300... +[2024-11-07 16:49:36,953][14395] Num frames 5400... +[2024-11-07 16:49:37,147][14395] Num frames 5500... +[2024-11-07 16:49:37,279][14395] Avg episode rewards: #0: 4.169, true rewards: #0: 3.954 +[2024-11-07 16:49:37,285][14395] Avg episode reward: 4.169, avg true_objective: 3.954 +[2024-11-07 16:49:37,426][14395] Num frames 5600... +[2024-11-07 16:49:37,623][14395] Num frames 5700... +[2024-11-07 16:49:37,808][14395] Num frames 5800... +[2024-11-07 16:49:38,005][14395] Num frames 5900... +[2024-11-07 16:49:38,097][14395] Avg episode rewards: #0: 4.147, true rewards: #0: 3.947 +[2024-11-07 16:49:38,100][14395] Avg episode reward: 4.147, avg true_objective: 3.947 +[2024-11-07 16:49:38,263][14395] Num frames 6000... +[2024-11-07 16:49:38,452][14395] Num frames 6100... +[2024-11-07 16:49:38,635][14395] Num frames 6200... +[2024-11-07 16:49:38,813][14395] Num frames 6300... +[2024-11-07 16:49:38,877][14395] Avg episode rewards: #0: 4.128, true rewards: #0: 3.940 +[2024-11-07 16:49:38,880][14395] Avg episode reward: 4.128, avg true_objective: 3.940 +[2024-11-07 16:49:39,075][14395] Num frames 6400... +[2024-11-07 16:49:39,266][14395] Num frames 6500... +[2024-11-07 16:49:39,447][14395] Num frames 6600... +[2024-11-07 16:49:39,675][14395] Avg episode rewards: #0: 4.111, true rewards: #0: 3.934 +[2024-11-07 16:49:39,677][14395] Avg episode reward: 4.111, avg true_objective: 3.934 +[2024-11-07 16:49:39,704][14395] Num frames 6700... +[2024-11-07 16:49:39,907][14395] Num frames 6800... +[2024-11-07 16:49:40,106][14395] Num frames 6900... +[2024-11-07 16:49:40,306][14395] Num frames 7000... +[2024-11-07 16:49:40,490][14395] Avg episode rewards: #0: 4.096, true rewards: #0: 3.929 +[2024-11-07 16:49:40,496][14395] Avg episode reward: 4.096, avg true_objective: 3.929 +[2024-11-07 16:49:40,568][14395] Num frames 7100... +[2024-11-07 16:49:40,756][14395] Num frames 7200... +[2024-11-07 16:49:40,947][14395] Num frames 7300... +[2024-11-07 16:49:41,129][14395] Num frames 7400... +[2024-11-07 16:49:41,440][14395] Num frames 7500... +[2024-11-07 16:49:41,601][14395] Avg episode rewards: #0: 4.238, true rewards: #0: 3.975 +[2024-11-07 16:49:41,603][14395] Avg episode reward: 4.238, avg true_objective: 3.975 +[2024-11-07 16:49:41,701][14395] Num frames 7600... +[2024-11-07 16:49:41,890][14395] Num frames 7700... +[2024-11-07 16:49:42,089][14395] Num frames 7800... +[2024-11-07 16:49:42,295][14395] Num frames 7900... +[2024-11-07 16:49:42,420][14395] Avg episode rewards: #0: 4.218, true rewards: #0: 3.968 +[2024-11-07 16:49:42,421][14395] Avg episode reward: 4.218, avg true_objective: 3.968 +[2024-11-07 16:49:42,554][14395] Num frames 8000... +[2024-11-07 16:49:42,762][14395] Num frames 8100... +[2024-11-07 16:49:42,951][14395] Num frames 8200... +[2024-11-07 16:49:43,144][14395] Num frames 8300... +[2024-11-07 16:49:43,355][14395] Avg episode rewards: #0: 4.278, true rewards: #0: 3.992 +[2024-11-07 16:49:43,358][14395] Avg episode reward: 4.278, avg true_objective: 3.992 +[2024-11-07 16:49:43,403][14395] Num frames 8400... +[2024-11-07 16:49:43,599][14395] Num frames 8500... +[2024-11-07 16:49:43,790][14395] Num frames 8600... +[2024-11-07 16:49:43,979][14395] Num frames 8700... +[2024-11-07 16:49:44,167][14395] Avg episode rewards: #0: 4.258, true rewards: #0: 3.985 +[2024-11-07 16:49:44,171][14395] Avg episode reward: 4.258, avg true_objective: 3.985 +[2024-11-07 16:49:44,251][14395] Num frames 8800... +[2024-11-07 16:49:44,432][14395] Num frames 8900... +[2024-11-07 16:49:44,618][14395] Num frames 9000... +[2024-11-07 16:49:44,799][14395] Num frames 9100... +[2024-11-07 16:49:44,985][14395] Num frames 9200... +[2024-11-07 16:49:45,071][14395] Avg episode rewards: #0: 4.311, true rewards: #0: 4.007 +[2024-11-07 16:49:45,074][14395] Avg episode reward: 4.311, avg true_objective: 4.007 +[2024-11-07 16:49:45,259][14395] Num frames 9300... +[2024-11-07 16:49:45,504][14395] Num frames 9400... +[2024-11-07 16:49:45,699][14395] Num frames 9500... +[2024-11-07 16:49:45,908][14395] Num frames 9600... +[2024-11-07 16:49:45,961][14395] Avg episode rewards: #0: 4.292, true rewards: #0: 4.000 +[2024-11-07 16:49:45,963][14395] Avg episode reward: 4.292, avg true_objective: 4.000 +[2024-11-07 16:49:46,158][14395] Num frames 9700... +[2024-11-07 16:49:46,344][14395] Num frames 9800... +[2024-11-07 16:49:46,531][14395] Num frames 9900... +[2024-11-07 16:49:46,745][14395] Avg episode rewards: #0: 4.274, true rewards: #0: 3.994 +[2024-11-07 16:49:46,750][14395] Avg episode reward: 4.274, avg true_objective: 3.994 +[2024-11-07 16:49:46,786][14395] Num frames 10000... +[2024-11-07 16:49:46,979][14395] Num frames 10100... +[2024-11-07 16:49:47,156][14395] Num frames 10200... +[2024-11-07 16:49:47,329][14395] Num frames 10300... +[2024-11-07 16:49:47,506][14395] Avg episode rewards: #0: 4.257, true rewards: #0: 3.988 +[2024-11-07 16:49:47,510][14395] Avg episode reward: 4.257, avg true_objective: 3.988 +[2024-11-07 16:49:47,594][14395] Num frames 10400... +[2024-11-07 16:49:47,774][14395] Num frames 10500... +[2024-11-07 16:49:47,952][14395] Num frames 10600... +[2024-11-07 16:49:48,133][14395] Num frames 10700... +[2024-11-07 16:49:48,332][14395] Avg episode rewards: #0: 4.290, true rewards: #0: 3.994 +[2024-11-07 16:49:48,337][14395] Avg episode reward: 4.290, avg true_objective: 3.994 +[2024-11-07 16:49:48,380][14395] Num frames 10800... +[2024-11-07 16:49:48,563][14395] Num frames 10900... +[2024-11-07 16:49:48,743][14395] Num frames 11000... +[2024-11-07 16:49:48,929][14395] Num frames 11100... +[2024-11-07 16:49:49,257][14395] Avg episode rewards: #0: 4.274, true rewards: #0: 3.989 +[2024-11-07 16:49:49,259][14395] Avg episode reward: 4.274, avg true_objective: 3.989 +[2024-11-07 16:49:49,339][14395] Num frames 11200... +[2024-11-07 16:49:49,526][14395] Num frames 11300... +[2024-11-07 16:49:49,719][14395] Num frames 11400... +[2024-11-07 16:49:49,901][14395] Num frames 11500... +[2024-11-07 16:49:50,084][14395] Num frames 11600... +[2024-11-07 16:49:50,270][14395] Num frames 11700... +[2024-11-07 16:49:50,467][14395] Avg episode rewards: #0: 4.440, true rewards: #0: 4.061 +[2024-11-07 16:49:50,472][14395] Avg episode reward: 4.440, avg true_objective: 4.061 +[2024-11-07 16:49:50,542][14395] Num frames 11800... +[2024-11-07 16:49:50,724][14395] Num frames 11900... +[2024-11-07 16:49:50,912][14395] Num frames 12000... +[2024-11-07 16:49:51,103][14395] Num frames 12100... +[2024-11-07 16:49:51,268][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.053 +[2024-11-07 16:49:51,271][14395] Avg episode reward: 4.420, avg true_objective: 4.053 +[2024-11-07 16:49:51,363][14395] Num frames 12200... +[2024-11-07 16:49:51,552][14395] Num frames 12300... +[2024-11-07 16:49:51,748][14395] Num frames 12400... +[2024-11-07 16:49:51,934][14395] Num frames 12500... +[2024-11-07 16:49:52,069][14395] Avg episode rewards: #0: 4.401, true rewards: #0: 4.046 +[2024-11-07 16:49:52,074][14395] Avg episode reward: 4.401, avg true_objective: 4.046 +[2024-11-07 16:49:52,194][14395] Num frames 12600... +[2024-11-07 16:49:52,376][14395] Num frames 12700... +[2024-11-07 16:49:52,559][14395] Num frames 12800... +[2024-11-07 16:49:52,762][14395] Num frames 12900... +[2024-11-07 16:49:52,876][14395] Avg episode rewards: #0: 4.384, true rewards: #0: 4.040 +[2024-11-07 16:49:52,877][14395] Avg episode reward: 4.384, avg true_objective: 4.040 +[2024-11-07 16:49:53,036][14395] Num frames 13000... +[2024-11-07 16:49:53,267][14395] Num frames 13100... +[2024-11-07 16:49:53,470][14395] Num frames 13200... +[2024-11-07 16:49:53,706][14395] Num frames 13300... +[2024-11-07 16:49:53,915][14395] Avg episode rewards: #0: 4.417, true rewards: #0: 4.053 +[2024-11-07 16:49:53,916][14395] Avg episode reward: 4.417, avg true_objective: 4.053 +[2024-11-07 16:49:53,979][14395] Num frames 13400... +[2024-11-07 16:49:54,194][14395] Num frames 13500... +[2024-11-07 16:49:54,427][14395] Num frames 13600... +[2024-11-07 16:49:54,678][14395] Num frames 13700... +[2024-11-07 16:49:56,906][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.047 +[2024-11-07 16:49:56,907][14395] Avg episode reward: 4.400, avg true_objective: 4.047 +[2024-11-07 16:49:56,987][14395] Num frames 13800... +[2024-11-07 16:49:57,191][14395] Num frames 13900... +[2024-11-07 16:49:57,383][14395] Num frames 14000... +[2024-11-07 16:49:57,560][14395] Num frames 14100... +[2024-11-07 16:49:57,724][14395] Avg episode rewards: #0: 4.384, true rewards: #0: 4.041 +[2024-11-07 16:49:57,725][14395] Avg episode reward: 4.384, avg true_objective: 4.041 +[2024-11-07 16:49:57,832][14395] Num frames 14200... +[2024-11-07 16:49:58,066][14395] Num frames 14300... +[2024-11-07 16:49:58,273][14395] Num frames 14400... +[2024-11-07 16:49:58,492][14395] Num frames 14500... +[2024-11-07 16:49:58,671][14395] Avg episode rewards: #0: 4.406, true rewards: #0: 4.044 +[2024-11-07 16:49:58,673][14395] Avg episode reward: 4.406, avg true_objective: 4.044 +[2024-11-07 16:49:58,754][14395] Num frames 14600... +[2024-11-07 16:49:58,957][14395] Num frames 14700... +[2024-11-07 16:49:59,150][14395] Num frames 14800... +[2024-11-07 16:49:59,346][14395] Num frames 14900... +[2024-11-07 16:49:59,541][14395] Num frames 15000... +[2024-11-07 16:49:59,615][14395] Avg episode rewards: #0: 4.435, true rewards: #0: 4.056 +[2024-11-07 16:49:59,619][14395] Avg episode reward: 4.435, avg true_objective: 4.056 +[2024-11-07 16:49:59,831][14395] Num frames 15100... +[2024-11-07 16:50:00,028][14395] Num frames 15200... +[2024-11-07 16:50:00,217][14395] Num frames 15300... +[2024-11-07 16:50:00,400][14395] Num frames 15400... +[2024-11-07 16:50:00,499][14395] Avg episode rewards: #0: 4.480, true rewards: #0: 4.059 +[2024-11-07 16:50:00,503][14395] Avg episode reward: 4.480, avg true_objective: 4.059 +[2024-11-07 16:50:00,679][14395] Num frames 15500... +[2024-11-07 16:50:00,881][14395] Num frames 15600... +[2024-11-07 16:50:01,073][14395] Num frames 15700... +[2024-11-07 16:50:01,278][14395] Num frames 15800... +[2024-11-07 16:50:01,354][14395] Avg episode rewards: #0: 4.464, true rewards: #0: 4.053 +[2024-11-07 16:50:01,355][14395] Avg episode reward: 4.464, avg true_objective: 4.053 +[2024-11-07 16:50:01,539][14395] Num frames 15900... +[2024-11-07 16:50:01,848][14395] Num frames 16000... +[2024-11-07 16:50:02,073][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:50:02,076][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:50:02,164][14395] Num frames 16100... +[2024-11-07 16:50:02,428][14395] Num frames 16200... +[2024-11-07 16:50:02,695][14395] Num frames 16300... +[2024-11-07 16:50:02,887][14395] Num frames 16400... +[2024-11-07 16:50:03,088][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.020 +[2024-11-07 16:50:03,091][14395] Avg episode reward: 4.434, avg true_objective: 4.020 +[2024-11-07 16:50:03,148][14395] Num frames 16500... +[2024-11-07 16:50:03,334][14395] Num frames 16600... +[2024-11-07 16:50:03,535][14395] Num frames 16700... +[2024-11-07 16:50:03,733][14395] Num frames 16800... +[2024-11-07 16:50:03,906][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.015 +[2024-11-07 16:50:03,912][14395] Avg episode reward: 4.420, avg true_objective: 4.015 +[2024-11-07 16:50:03,992][14395] Num frames 16900... +[2024-11-07 16:50:04,183][14395] Num frames 17000... +[2024-11-07 16:50:04,382][14395] Num frames 17100... +[2024-11-07 16:50:04,593][14395] Num frames 17200... +[2024-11-07 16:50:04,747][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.011 +[2024-11-07 16:50:04,751][14395] Avg episode reward: 4.407, avg true_objective: 4.011 +[2024-11-07 16:50:04,867][14395] Num frames 17300... +[2024-11-07 16:50:05,055][14395] Num frames 17400... +[2024-11-07 16:50:05,266][14395] Num frames 17500... +[2024-11-07 16:50:05,466][14395] Num frames 17600... +[2024-11-07 16:50:05,586][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.007 +[2024-11-07 16:50:05,589][14395] Avg episode reward: 4.394, avg true_objective: 4.007 +[2024-11-07 16:50:05,748][14395] Num frames 17700... +[2024-11-07 16:50:05,938][14395] Num frames 17800... +[2024-11-07 16:50:06,146][14395] Num frames 17900... +[2024-11-07 16:50:06,382][14395] Num frames 18000... +[2024-11-07 16:50:06,597][14395] Num frames 18100... +[2024-11-07 16:50:06,803][14395] Avg episode rewards: #0: 4.461, true rewards: #0: 4.039 +[2024-11-07 16:50:06,809][14395] Avg episode reward: 4.461, avg true_objective: 4.039 +[2024-11-07 16:50:06,875][14395] Num frames 18200... +[2024-11-07 16:50:07,076][14395] Num frames 18300... +[2024-11-07 16:50:07,256][14395] Num frames 18400... +[2024-11-07 16:50:07,437][14395] Num frames 18500... +[2024-11-07 16:50:07,605][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.035 +[2024-11-07 16:50:07,608][14395] Avg episode reward: 4.448, avg true_objective: 4.035 +[2024-11-07 16:50:07,702][14395] Num frames 18600... +[2024-11-07 16:50:07,884][14395] Num frames 18700... +[2024-11-07 16:50:08,063][14395] Num frames 18800... +[2024-11-07 16:50:08,245][14395] Num frames 18900... +[2024-11-07 16:50:08,382][14395] Avg episode rewards: #0: 4.435, true rewards: #0: 4.031 +[2024-11-07 16:50:08,385][14395] Avg episode reward: 4.435, avg true_objective: 4.031 +[2024-11-07 16:50:08,507][14395] Num frames 19000... +[2024-11-07 16:50:08,694][14395] Num frames 19100... +[2024-11-07 16:50:08,876][14395] Num frames 19200... +[2024-11-07 16:50:09,060][14395] Num frames 19300... +[2024-11-07 16:50:09,168][14395] Avg episode rewards: #0: 4.423, true rewards: #0: 4.027 +[2024-11-07 16:50:09,169][14395] Avg episode reward: 4.423, avg true_objective: 4.027 +[2024-11-07 16:50:09,320][14395] Num frames 19400... +[2024-11-07 16:50:09,502][14395] Num frames 19500... +[2024-11-07 16:50:09,692][14395] Num frames 19600... +[2024-11-07 16:50:09,869][14395] Num frames 19700... +[2024-11-07 16:50:09,947][14395] Avg episode rewards: #0: 4.411, true rewards: #0: 4.023 +[2024-11-07 16:50:09,949][14395] Avg episode reward: 4.411, avg true_objective: 4.023 +[2024-11-07 16:50:10,124][14395] Num frames 19800... +[2024-11-07 16:50:10,299][14395] Num frames 19900... +[2024-11-07 16:50:10,478][14395] Num frames 20000... +[2024-11-07 16:50:10,715][14395] Avg episode rewards: #0: 4.399, true rewards: #0: 4.019 +[2024-11-07 16:50:10,719][14395] Avg episode reward: 4.399, avg true_objective: 4.019 +[2024-11-07 16:50:10,747][14395] Num frames 20100... +[2024-11-07 16:50:10,965][14395] Num frames 20200... +[2024-11-07 16:50:11,149][14395] Num frames 20300... +[2024-11-07 16:50:11,329][14395] Num frames 20400... +[2024-11-07 16:50:11,529][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.016 +[2024-11-07 16:50:11,532][14395] Avg episode reward: 4.388, avg true_objective: 4.016 +[2024-11-07 16:50:11,579][14395] Num frames 20500... +[2024-11-07 16:50:11,820][14395] Num frames 20600... +[2024-11-07 16:50:12,007][14395] Num frames 20700... +[2024-11-07 16:50:12,189][14395] Num frames 20800... +[2024-11-07 16:50:12,366][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.012 +[2024-11-07 16:50:12,369][14395] Avg episode reward: 4.378, avg true_objective: 4.012 +[2024-11-07 16:50:12,453][14395] Num frames 20900... +[2024-11-07 16:50:12,647][14395] Num frames 21000... +[2024-11-07 16:50:12,873][14395] Num frames 21100... +[2024-11-07 16:50:13,079][14395] Num frames 21200... +[2024-11-07 16:50:13,321][14395] Num frames 21300... +[2024-11-07 16:50:13,405][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.021 +[2024-11-07 16:50:13,406][14395] Avg episode reward: 4.398, avg true_objective: 4.021 +[2024-11-07 16:50:13,607][14395] Num frames 21400... +[2024-11-07 16:50:13,842][14395] Num frames 21500... +[2024-11-07 16:50:14,067][14395] Num frames 21600... +[2024-11-07 16:50:14,333][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.018 +[2024-11-07 16:50:14,334][14395] Avg episode reward: 4.388, avg true_objective: 4.018 +[2024-11-07 16:50:14,350][14395] Num frames 21700... +[2024-11-07 16:50:14,566][14395] Num frames 21800... +[2024-11-07 16:50:14,779][14395] Num frames 21900... +[2024-11-07 16:50:14,961][14395] Num frames 22000... +[2024-11-07 16:50:15,160][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.015 +[2024-11-07 16:50:15,162][14395] Avg episode reward: 4.378, avg true_objective: 4.015 +[2024-11-07 16:50:15,203][14395] Num frames 22100... +[2024-11-07 16:50:15,383][14395] Num frames 22200... +[2024-11-07 16:50:15,563][14395] Num frames 22300... +[2024-11-07 16:50:15,761][14395] Num frames 22400... +[2024-11-07 16:50:16,021][14395] Avg episode rewards: #0: 4.392, true rewards: #0: 4.017 +[2024-11-07 16:50:16,024][14395] Avg episode reward: 4.392, avg true_objective: 4.017 +[2024-11-07 16:50:16,045][14395] Num frames 22500... +[2024-11-07 16:50:16,279][14395] Num frames 22600... +[2024-11-07 16:50:16,530][14395] Num frames 22700... +[2024-11-07 16:50:16,758][14395] Num frames 22800... +[2024-11-07 16:50:16,989][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.014 +[2024-11-07 16:50:16,992][14395] Avg episode reward: 4.382, avg true_objective: 4.014 +[2024-11-07 16:50:17,064][14395] Num frames 22900... +[2024-11-07 16:50:17,345][14395] Num frames 23000... +[2024-11-07 16:50:17,632][14395] Num frames 23100... +[2024-11-07 16:50:17,873][14395] Num frames 23200... +[2024-11-07 16:50:18,298][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.011 +[2024-11-07 16:50:18,301][14395] Avg episode reward: 4.373, avg true_objective: 4.011 +[2024-11-07 16:50:18,395][14395] Num frames 23300... +[2024-11-07 16:50:18,582][14395] Num frames 23400... +[2024-11-07 16:50:18,759][14395] Num frames 23500... +[2024-11-07 16:50:18,947][14395] Num frames 23600... +[2024-11-07 16:50:19,089][14395] Avg episode rewards: #0: 4.364, true rewards: #0: 4.008 +[2024-11-07 16:50:19,091][14395] Avg episode reward: 4.364, avg true_objective: 4.008 +[2024-11-07 16:50:19,233][14395] Num frames 23700... +[2024-11-07 16:50:19,551][14395] Num frames 23800... +[2024-11-07 16:50:19,792][14395] Num frames 23900... +[2024-11-07 16:50:19,985][14395] Num frames 24000... +[2024-11-07 16:50:20,105][14395] Avg episode rewards: #0: 4.355, true rewards: #0: 4.005 +[2024-11-07 16:50:20,110][14395] Avg episode reward: 4.355, avg true_objective: 4.005 +[2024-11-07 16:50:20,300][14395] Num frames 24100... +[2024-11-07 16:50:20,560][14395] Num frames 24200... +[2024-11-07 16:50:20,756][14395] Num frames 24300... +[2024-11-07 16:50:20,944][14395] Num frames 24400... +[2024-11-07 16:50:21,034][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 4.003 +[2024-11-07 16:50:21,035][14395] Avg episode reward: 4.347, avg true_objective: 4.003 +[2024-11-07 16:50:21,197][14395] Num frames 24500... +[2024-11-07 16:50:21,402][14395] Num frames 24600... +[2024-11-07 16:50:21,595][14395] Num frames 24700... +[2024-11-07 16:50:21,796][14395] Num frames 24800... +[2024-11-07 16:50:21,992][14395] Num frames 24900... +[2024-11-07 16:50:22,162][14395] Avg episode rewards: #0: 4.397, true rewards: #0: 4.026 +[2024-11-07 16:50:22,164][14395] Avg episode reward: 4.397, avg true_objective: 4.026 +[2024-11-07 16:50:22,267][14395] Num frames 25000... +[2024-11-07 16:50:22,574][14395] Num frames 25100... +[2024-11-07 16:50:22,807][14395] Num frames 25200... +[2024-11-07 16:50:23,060][14395] Num frames 25300... +[2024-11-07 16:50:23,231][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.023 +[2024-11-07 16:50:23,233][14395] Avg episode reward: 4.388, avg true_objective: 4.023 +[2024-11-07 16:50:23,389][14395] Num frames 25400... +[2024-11-07 16:50:23,644][14395] Num frames 25500... +[2024-11-07 16:50:23,871][14395] Num frames 25600... +[2024-11-07 16:50:24,398][14395] Num frames 25700... +[2024-11-07 16:50:24,541][14395] Avg episode rewards: #0: 4.379, true rewards: #0: 4.020 +[2024-11-07 16:50:24,543][14395] Avg episode reward: 4.379, avg true_objective: 4.020 +[2024-11-07 16:50:24,945][14395] Num frames 25800... +[2024-11-07 16:50:25,316][14395] Num frames 25900... +[2024-11-07 16:50:25,834][14395] Num frames 26000... +[2024-11-07 16:50:26,354][14395] Num frames 26100... +[2024-11-07 16:50:26,456][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 4.017 +[2024-11-07 16:50:26,458][14395] Avg episode reward: 4.371, avg true_objective: 4.017 +[2024-11-07 16:50:26,851][14395] Num frames 26200... +[2024-11-07 16:50:27,258][14395] Num frames 26300... +[2024-11-07 16:50:27,731][14395] Num frames 26400... +[2024-11-07 16:50:28,213][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 4.015 +[2024-11-07 16:50:28,215][14395] Avg episode reward: 4.363, avg true_objective: 4.015 +[2024-11-07 16:50:28,229][14395] Num frames 26500... +[2024-11-07 16:50:28,554][14395] Num frames 26600... +[2024-11-07 16:50:29,001][14395] Num frames 26700... +[2024-11-07 16:50:29,200][14395] Avg episode rewards: #0: 4.336, true rewards: #0: 3.993 +[2024-11-07 16:50:29,202][14395] Avg episode reward: 4.336, avg true_objective: 3.993 +[2024-11-07 16:50:31,294][14395] Num frames 26800... +[2024-11-07 16:50:31,551][14395] Num frames 26900... +[2024-11-07 16:50:31,873][14395] Num frames 27000... +[2024-11-07 16:50:32,161][14395] Num frames 27100... +[2024-11-07 16:50:32,332][14395] Avg episode rewards: #0: 4.329, true rewards: #0: 3.991 +[2024-11-07 16:50:32,334][14395] Avg episode reward: 4.329, avg true_objective: 3.991 +[2024-11-07 16:50:32,477][14395] Num frames 27200... +[2024-11-07 16:50:32,825][14395] Num frames 27300... +[2024-11-07 16:50:33,321][14395] Num frames 27400... +[2024-11-07 16:50:33,591][14395] Num frames 27500... +[2024-11-07 16:50:33,923][14395] Avg episode rewards: #0: 4.346, true rewards: #0: 3.998 +[2024-11-07 16:50:33,925][14395] Avg episode reward: 4.346, avg true_objective: 3.998 +[2024-11-07 16:50:33,958][14395] Num frames 27600... +[2024-11-07 16:50:34,165][14395] Num frames 27700... +[2024-11-07 16:50:34,390][14395] Num frames 27800... +[2024-11-07 16:50:34,585][14395] Num frames 27900... +[2024-11-07 16:50:34,805][14395] Avg episode rewards: #0: 4.338, true rewards: #0: 3.995 +[2024-11-07 16:50:34,806][14395] Avg episode reward: 4.338, avg true_objective: 3.995 +[2024-11-07 16:50:34,888][14395] Num frames 28000... +[2024-11-07 16:50:35,087][14395] Num frames 28100... +[2024-11-07 16:50:35,294][14395] Num frames 28200... +[2024-11-07 16:50:35,493][14395] Num frames 28300... +[2024-11-07 16:50:35,688][14395] Num frames 28400... +[2024-11-07 16:50:35,780][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.002 +[2024-11-07 16:50:35,785][14395] Avg episode reward: 4.354, avg true_objective: 4.002 +[2024-11-07 16:50:35,967][14395] Num frames 28500... +[2024-11-07 16:50:36,152][14395] Num frames 28600... +[2024-11-07 16:50:36,348][14395] Num frames 28700... +[2024-11-07 16:50:36,548][14395] Num frames 28800... +[2024-11-07 16:50:36,601][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 4.000 +[2024-11-07 16:50:36,604][14395] Avg episode reward: 4.347, avg true_objective: 4.000 +[2024-11-07 16:50:36,827][14395] Num frames 28900... +[2024-11-07 16:50:37,019][14395] Num frames 29000... +[2024-11-07 16:50:37,212][14395] Num frames 29100... +[2024-11-07 16:50:37,430][14395] Avg episode rewards: #0: 4.340, true rewards: #0: 3.998 +[2024-11-07 16:50:37,433][14395] Avg episode reward: 4.340, avg true_objective: 3.998 +[2024-11-07 16:50:37,484][14395] Num frames 29200... +[2024-11-07 16:50:37,676][14395] Num frames 29300... +[2024-11-07 16:50:37,869][14395] Num frames 29400... +[2024-11-07 16:50:38,059][14395] Num frames 29500... +[2024-11-07 16:50:38,246][14395] Avg episode rewards: #0: 4.334, true rewards: #0: 3.996 +[2024-11-07 16:50:38,249][14395] Avg episode reward: 4.334, avg true_objective: 3.996 +[2024-11-07 16:50:38,337][14395] Num frames 29600... +[2024-11-07 16:50:38,537][14395] Num frames 29700... +[2024-11-07 16:50:38,726][14395] Num frames 29800... +[2024-11-07 16:50:38,908][14395] Num frames 29900... +[2024-11-07 16:50:39,101][14395] Num frames 30000... +[2024-11-07 16:50:39,289][14395] Num frames 30100... +[2024-11-07 16:50:39,369][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 4.015 +[2024-11-07 16:50:39,372][14395] Avg episode reward: 4.375, avg true_objective: 4.015 +[2024-11-07 16:50:39,558][14395] Num frames 30200... +[2024-11-07 16:50:39,758][14395] Num frames 30300... +[2024-11-07 16:50:39,957][14395] Num frames 30400... +[2024-11-07 16:50:40,215][14395] Avg episode rewards: #0: 4.368, true rewards: #0: 4.013 +[2024-11-07 16:50:40,217][14395] Avg episode reward: 4.368, avg true_objective: 4.013 +[2024-11-07 16:50:40,229][14395] Num frames 30500... +[2024-11-07 16:50:40,426][14395] Num frames 30600... +[2024-11-07 16:50:40,622][14395] Num frames 30700... +[2024-11-07 16:50:40,842][14395] Num frames 30800... +[2024-11-07 16:50:40,989][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.006 +[2024-11-07 16:50:40,991][14395] Avg episode reward: 4.370, avg true_objective: 4.006 +[2024-11-07 16:50:41,099][14395] Num frames 30900... +[2024-11-07 16:50:41,286][14395] Num frames 31000... +[2024-11-07 16:50:41,505][14395] Num frames 31100... +[2024-11-07 16:50:41,704][14395] Num frames 31200... +[2024-11-07 16:50:41,893][14395] Num frames 31300... +[2024-11-07 16:50:42,146][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.025 +[2024-11-07 16:50:42,149][14395] Avg episode reward: 4.409, avg true_objective: 4.025 +[2024-11-07 16:50:42,185][14395] Num frames 31400... +[2024-11-07 16:50:42,369][14395] Num frames 31500... +[2024-11-07 16:50:42,569][14395] Num frames 31600... +[2024-11-07 16:50:42,766][14395] Num frames 31700... +[2024-11-07 16:50:42,958][14395] Avg episode rewards: #0: 4.402, true rewards: #0: 4.022 +[2024-11-07 16:50:42,960][14395] Avg episode reward: 4.402, avg true_objective: 4.022 +[2024-11-07 16:50:43,023][14395] Num frames 31800... +[2024-11-07 16:50:43,215][14395] Num frames 31900... +[2024-11-07 16:50:43,411][14395] Num frames 32000... +[2024-11-07 16:50:43,594][14395] Num frames 32100... +[2024-11-07 16:50:43,778][14395] Avg episode rewards: #0: 4.395, true rewards: #0: 4.020 +[2024-11-07 16:50:43,782][14395] Avg episode reward: 4.395, avg true_objective: 4.020 +[2024-11-07 16:50:43,882][14395] Num frames 32200... +[2024-11-07 16:50:44,095][14395] Num frames 32300... +[2024-11-07 16:50:44,296][14395] Num frames 32400... +[2024-11-07 16:50:44,486][14395] Num frames 32500... +[2024-11-07 16:50:44,630][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.018 +[2024-11-07 16:50:44,633][14395] Avg episode reward: 4.388, avg true_objective: 4.018 +[2024-11-07 16:50:44,747][14395] Num frames 32600... +[2024-11-07 16:50:44,940][14395] Num frames 32700... +[2024-11-07 16:50:45,161][14395] Num frames 32800... +[2024-11-07 16:50:45,407][14395] Num frames 32900... +[2024-11-07 16:50:45,526][14395] Avg episode rewards: #0: 4.381, true rewards: #0: 4.016 +[2024-11-07 16:50:45,530][14395] Avg episode reward: 4.381, avg true_objective: 4.016 +[2024-11-07 16:50:45,725][14395] Num frames 33000... +[2024-11-07 16:50:45,965][14395] Num frames 33100... +[2024-11-07 16:50:46,207][14395] Num frames 33200... +[2024-11-07 16:50:46,430][14395] Num frames 33300... +[2024-11-07 16:50:46,512][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 4.013 +[2024-11-07 16:50:46,513][14395] Avg episode reward: 4.375, avg true_objective: 4.013 +[2024-11-07 16:50:46,724][14395] Num frames 33400... +[2024-11-07 16:50:46,966][14395] Num frames 33500... +[2024-11-07 16:50:47,184][14395] Num frames 33600... +[2024-11-07 16:50:47,471][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 4.011 +[2024-11-07 16:50:47,474][14395] Avg episode reward: 4.369, avg true_objective: 4.011 +[2024-11-07 16:50:47,497][14395] Num frames 33700... +[2024-11-07 16:50:47,733][14395] Num frames 33800... +[2024-11-07 16:50:47,941][14395] Num frames 33900... +[2024-11-07 16:50:48,120][14395] Num frames 34000... +[2024-11-07 16:50:48,303][14395] Num frames 34100... +[2024-11-07 16:50:48,439][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.017 +[2024-11-07 16:50:48,442][14395] Avg episode reward: 4.382, avg true_objective: 4.017 +[2024-11-07 16:50:48,555][14395] Num frames 34200... +[2024-11-07 16:50:48,745][14395] Num frames 34300... +[2024-11-07 16:50:48,934][14395] Num frames 34400... +[2024-11-07 16:50:49,168][14395] Num frames 34500... +[2024-11-07 16:50:49,405][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.022 +[2024-11-07 16:50:49,407][14395] Avg episode reward: 4.394, avg true_objective: 4.022 +[2024-11-07 16:50:49,434][14395] Num frames 34600... +[2024-11-07 16:50:49,635][14395] Num frames 34700... +[2024-11-07 16:50:49,829][14395] Num frames 34800... +[2024-11-07 16:50:50,011][14395] Num frames 34900... +[2024-11-07 16:50:50,214][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.020 +[2024-11-07 16:50:50,216][14395] Avg episode reward: 4.388, avg true_objective: 4.020 +[2024-11-07 16:50:50,270][14395] Num frames 35000... +[2024-11-07 16:50:50,506][14395] Num frames 35100... +[2024-11-07 16:50:50,718][14395] Num frames 35200... +[2024-11-07 16:50:50,967][14395] Num frames 35300... +[2024-11-07 16:50:51,165][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.018 +[2024-11-07 16:50:51,170][14395] Avg episode reward: 4.382, avg true_objective: 4.018 +[2024-11-07 16:50:51,285][14395] Num frames 35400... +[2024-11-07 16:50:51,537][14395] Num frames 35500... +[2024-11-07 16:50:51,790][14395] Num frames 35600... +[2024-11-07 16:50:52,038][14395] Num frames 35700... +[2024-11-07 16:50:52,242][14395] Num frames 35800... +[2024-11-07 16:50:52,318][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.023 +[2024-11-07 16:50:52,322][14395] Avg episode reward: 4.394, avg true_objective: 4.023 +[2024-11-07 16:50:52,540][14395] Num frames 35900... +[2024-11-07 16:50:52,737][14395] Num frames 36000... +[2024-11-07 16:50:52,952][14395] Num frames 36100... +[2024-11-07 16:50:53,231][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.021 +[2024-11-07 16:50:53,233][14395] Avg episode reward: 4.388, avg true_objective: 4.021 +[2024-11-07 16:50:53,256][14395] Num frames 36200... +[2024-11-07 16:50:53,487][14395] Num frames 36300... +[2024-11-07 16:50:53,741][14395] Num frames 36400... +[2024-11-07 16:50:53,975][14395] Num frames 36500... +[2024-11-07 16:50:54,215][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.019 +[2024-11-07 16:50:54,218][14395] Avg episode reward: 4.382, avg true_objective: 4.019 +[2024-11-07 16:50:54,287][14395] Num frames 36600... +[2024-11-07 16:50:54,515][14395] Num frames 36700... +[2024-11-07 16:50:54,741][14395] Num frames 36800... +[2024-11-07 16:50:54,947][14395] Num frames 36900... +[2024-11-07 16:50:55,127][14395] Avg episode rewards: #0: 4.376, true rewards: #0: 4.017 +[2024-11-07 16:50:55,131][14395] Avg episode reward: 4.376, avg true_objective: 4.017 +[2024-11-07 16:50:55,220][14395] Num frames 37000... +[2024-11-07 16:50:55,422][14395] Num frames 37100... +[2024-11-07 16:50:55,626][14395] Num frames 37200... +[2024-11-07 16:50:55,830][14395] Num frames 37300... +[2024-11-07 16:50:55,971][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.015 +[2024-11-07 16:50:55,974][14395] Avg episode reward: 4.370, avg true_objective: 4.015 +[2024-11-07 16:50:56,100][14395] Num frames 37400... +[2024-11-07 16:50:56,319][14395] Num frames 37500... +[2024-11-07 16:50:56,549][14395] Num frames 37600... +[2024-11-07 16:50:56,793][14395] Num frames 37700... +[2024-11-07 16:50:56,911][14395] Avg episode rewards: #0: 4.365, true rewards: #0: 4.014 +[2024-11-07 16:50:56,925][14395] Avg episode reward: 4.365, avg true_objective: 4.014 +[2024-11-07 16:50:57,137][14395] Num frames 37800... +[2024-11-07 16:50:57,368][14395] Num frames 37900... +[2024-11-07 16:50:57,614][14395] Num frames 38000... +[2024-11-07 16:50:57,866][14395] Num frames 38100... +[2024-11-07 16:50:58,014][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.015 +[2024-11-07 16:50:58,017][14395] Avg episode reward: 4.373, avg true_objective: 4.015 +[2024-11-07 16:50:58,173][14395] Num frames 38200... +[2024-11-07 16:50:58,420][14395] Num frames 38300... +[2024-11-07 16:50:58,652][14395] Num frames 38400... +[2024-11-07 16:50:58,878][14395] Num frames 38500... +[2024-11-07 16:50:58,998][14395] Avg episode rewards: #0: 4.368, true rewards: #0: 4.013 +[2024-11-07 16:50:59,002][14395] Avg episode reward: 4.368, avg true_objective: 4.013 +[2024-11-07 16:50:59,195][14395] Num frames 38600... +[2024-11-07 16:50:59,418][14395] Num frames 38700... +[2024-11-07 16:50:59,674][14395] Num frames 38800... +[2024-11-07 16:50:59,973][14395] Num frames 38900... +[2024-11-07 16:51:00,350][14395] Avg episode rewards: #0: 4.379, true rewards: #0: 4.018 +[2024-11-07 16:51:00,351][14395] Avg episode reward: 4.379, avg true_objective: 4.018 +[2024-11-07 16:51:00,436][14395] Num frames 39000... +[2024-11-07 16:51:00,782][14395] Num frames 39100... +[2024-11-07 16:51:01,099][14395] Num frames 39200... +[2024-11-07 16:51:01,383][14395] Num frames 39300... +[2024-11-07 16:51:01,681][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.020 +[2024-11-07 16:51:01,683][14395] Avg episode reward: 4.387, avg true_objective: 4.020 +[2024-11-07 16:51:01,707][14395] Num frames 39400... +[2024-11-07 16:51:01,989][14395] Num frames 39500... +[2024-11-07 16:51:02,273][14395] Num frames 39600... +[2024-11-07 16:51:02,512][14395] Num frames 39700... +[2024-11-07 16:51:02,760][14395] Num frames 39800... +[2024-11-07 16:51:02,911][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.024 +[2024-11-07 16:51:02,912][14395] Avg episode reward: 4.398, avg true_objective: 4.024 +[2024-11-07 16:51:03,086][14395] Num frames 39900... +[2024-11-07 16:51:03,329][14395] Num frames 40000... +[2024-11-07 16:51:05,794][14395] Avg episode rewards: #0: 4.380, true rewards: #0: 4.010 +[2024-11-07 16:51:05,795][14395] Avg episode reward: 4.380, avg true_objective: 4.010 +[2024-11-07 16:51:05,822][14395] Num frames 40100... +[2024-11-07 16:51:06,057][14395] Num frames 40200... +[2024-11-07 16:51:06,248][14395] Num frames 40300... +[2024-11-07 16:51:06,439][14395] Num frames 40400... +[2024-11-07 16:51:06,681][14395] Num frames 40500... +[2024-11-07 16:51:06,917][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.019 +[2024-11-07 16:51:06,918][14395] Avg episode reward: 4.409, avg true_objective: 4.019 +[2024-11-07 16:51:06,975][14395] Num frames 40600... +[2024-11-07 16:51:07,158][14395] Num frames 40700... +[2024-11-07 16:51:07,361][14395] Num frames 40800... +[2024-11-07 16:51:07,568][14395] Num frames 40900... +[2024-11-07 16:51:07,745][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.019 +[2024-11-07 16:51:07,750][14395] Avg episode reward: 4.409, avg true_objective: 4.019 +[2024-11-07 16:51:07,854][14395] Num frames 41000... +[2024-11-07 16:51:08,044][14395] Num frames 41100... +[2024-11-07 16:51:08,278][14395] Num frames 41200... +[2024-11-07 16:51:08,519][14395] Num frames 41300... +[2024-11-07 16:51:08,672][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.013 +[2024-11-07 16:51:08,674][14395] Avg episode reward: 4.393, avg true_objective: 4.013 +[2024-11-07 16:51:08,810][14395] Num frames 41400... +[2024-11-07 16:51:09,001][14395] Num frames 41500... +[2024-11-07 16:51:09,236][14395] Num frames 41600... +[2024-11-07 16:51:09,493][14395] Num frames 41700... +[2024-11-07 16:51:09,703][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.013 +[2024-11-07 16:51:09,704][14395] Avg episode reward: 4.393, avg true_objective: 4.013 +[2024-11-07 16:51:09,813][14395] Num frames 41800... +[2024-11-07 16:51:10,074][14395] Num frames 41900... +[2024-11-07 16:51:10,350][14395] Num frames 42000... +[2024-11-07 16:51:10,640][14395] Num frames 42100... +[2024-11-07 16:51:10,837][14395] Avg episode rewards: #0: 4.376, true rewards: #0: 4.006 +[2024-11-07 16:51:10,839][14395] Avg episode reward: 4.376, avg true_objective: 4.006 +[2024-11-07 16:51:10,995][14395] Num frames 42200... +[2024-11-07 16:51:11,250][14395] Num frames 42300... +[2024-11-07 16:51:11,469][14395] Num frames 42400... +[2024-11-07 16:51:11,707][14395] Num frames 42500... +[2024-11-07 16:51:11,929][14395] Num frames 42600... +[2024-11-07 16:51:12,184][14395] Num frames 42700... +[2024-11-07 16:51:12,384][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:12,385][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:12,512][14395] Num frames 42800... +[2024-11-07 16:51:12,835][14395] Num frames 42900... +[2024-11-07 16:51:13,103][14395] Num frames 43000... +[2024-11-07 16:51:13,311][14395] Num frames 43100... +[2024-11-07 16:51:13,433][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:13,436][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:13,561][14395] Num frames 43200... +[2024-11-07 16:51:13,784][14395] Num frames 43300... +[2024-11-07 16:51:13,991][14395] Num frames 43400... +[2024-11-07 16:51:14,206][14395] Num frames 43500... +[2024-11-07 16:51:14,306][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:14,311][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:14,507][14395] Num frames 43600... +[2024-11-07 16:51:14,708][14395] Num frames 43700... +[2024-11-07 16:51:14,898][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:14,901][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:14,979][14395] Num frames 43800... +[2024-11-07 16:51:15,182][14395] Num frames 43900... +[2024-11-07 16:51:15,400][14395] Num frames 44000... +[2024-11-07 16:51:15,626][14395] Num frames 44100... +[2024-11-07 16:51:15,819][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:15,822][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:15,935][14395] Num frames 44200... +[2024-11-07 16:51:16,180][14395] Num frames 44300... +[2024-11-07 16:51:16,544][14395] Num frames 44400... +[2024-11-07 16:51:16,833][14395] Num frames 44500... +[2024-11-07 16:51:16,982][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:16,985][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:17,158][14395] Num frames 44600... +[2024-11-07 16:51:17,390][14395] Num frames 44700... +[2024-11-07 16:51:17,672][14395] Num frames 44800... +[2024-11-07 16:51:17,940][14395] Num frames 44900... +[2024-11-07 16:51:18,248][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:18,252][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:18,286][14395] Num frames 45000... +[2024-11-07 16:51:18,535][14395] Num frames 45100... +[2024-11-07 16:51:18,787][14395] Num frames 45200... +[2024-11-07 16:51:19,042][14395] Num frames 45300... +[2024-11-07 16:51:19,283][14395] Num frames 45400... +[2024-11-07 16:51:19,423][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:19,427][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:19,598][14395] Num frames 45500... +[2024-11-07 16:51:19,822][14395] Num frames 45600... +[2024-11-07 16:51:20,094][14395] Num frames 45700... +[2024-11-07 16:51:20,305][14395] Num frames 45800... +[2024-11-07 16:51:20,425][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:20,426][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:20,621][14395] Num frames 45900... +[2024-11-07 16:51:20,871][14395] Num frames 46000... +[2024-11-07 16:51:21,111][14395] Num frames 46100... +[2024-11-07 16:51:21,372][14395] Num frames 46200... +[2024-11-07 16:51:21,450][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:21,454][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:21,688][14395] Num frames 46300... +[2024-11-07 16:51:21,927][14395] Num frames 46400... +[2024-11-07 16:51:22,171][14395] Num frames 46500... +[2024-11-07 16:51:22,458][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:22,462][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:22,498][14395] Num frames 46600... +[2024-11-07 16:51:22,757][14395] Num frames 46700... +[2024-11-07 16:51:23,021][14395] Num frames 46800... +[2024-11-07 16:51:23,248][14395] Num frames 46900... +[2024-11-07 16:51:23,476][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:23,479][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:23,556][14395] Num frames 47000... +[2024-11-07 16:51:23,774][14395] Num frames 47100... +[2024-11-07 16:51:23,986][14395] Num frames 47200... +[2024-11-07 16:51:24,229][14395] Num frames 47300... +[2024-11-07 16:51:24,437][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:24,443][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:24,571][14395] Num frames 47400... +[2024-11-07 16:51:24,811][14395] Num frames 47500... +[2024-11-07 16:51:25,095][14395] Num frames 47600... +[2024-11-07 16:51:25,350][14395] Num frames 47700... +[2024-11-07 16:51:25,509][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:25,511][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:25,626][14395] Num frames 47800... +[2024-11-07 16:51:25,873][14395] Num frames 47900... +[2024-11-07 16:51:26,119][14395] Num frames 48000... +[2024-11-07 16:51:26,392][14395] Num frames 48100... +[2024-11-07 16:51:26,522][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:26,524][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:26,723][14395] Num frames 48200... +[2024-11-07 16:51:26,981][14395] Num frames 48300... +[2024-11-07 16:51:27,242][14395] Num frames 48400... +[2024-11-07 16:51:27,492][14395] Num frames 48500... +[2024-11-07 16:51:27,581][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 16:51:27,583][14395] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 16:51:27,787][14395] Num frames 48600... +[2024-11-07 16:51:28,047][14395] Num frames 48700... +[2024-11-07 16:51:28,269][14395] Num frames 48800... +[2024-11-07 16:51:28,610][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 16:51:28,613][14395] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 16:51:28,629][14395] Num frames 48900... +[2024-11-07 16:51:28,923][14395] Num frames 49000... +[2024-11-07 16:51:29,129][14395] Num frames 49100... +[2024-11-07 16:51:29,341][14395] Num frames 49200... +[2024-11-07 16:51:29,563][14395] Num frames 49300... +[2024-11-07 16:51:29,645][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:29,647][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:29,897][14395] Num frames 49400... +[2024-11-07 16:51:30,088][14395] Num frames 49500... +[2024-11-07 16:51:30,344][14395] Num frames 49600... +[2024-11-07 16:51:30,584][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:30,588][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:30,611][14395] Num frames 49700... +[2024-11-07 16:51:30,872][14395] Num frames 49800... +[2024-11-07 16:51:31,068][14395] Num frames 49900... +[2024-11-07 16:51:31,509][14395] Num frames 50000... +[2024-11-07 16:51:31,761][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:31,763][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:31,811][14395] Num frames 50100... +[2024-11-07 16:51:32,053][14395] Num frames 50200... +[2024-11-07 16:51:32,284][14395] Num frames 50300... +[2024-11-07 16:51:32,566][14395] Num frames 50400... +[2024-11-07 16:51:32,870][14395] Num frames 50500... +[2024-11-07 16:51:33,006][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:33,008][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:33,222][14395] Num frames 50600... +[2024-11-07 16:51:33,478][14395] Num frames 50700... +[2024-11-07 16:51:33,747][14395] Num frames 50800... +[2024-11-07 16:51:34,050][14395] Num frames 50900... +[2024-11-07 16:51:34,326][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:34,327][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:34,402][14395] Num frames 51000... +[2024-11-07 16:51:34,730][14395] Num frames 51100... +[2024-11-07 16:51:34,949][14395] Num frames 51200... +[2024-11-07 16:51:35,175][14395] Num frames 51300... +[2024-11-07 16:51:35,383][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:35,385][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:35,496][14395] Num frames 51400... +[2024-11-07 16:51:35,740][14395] Num frames 51500... +[2024-11-07 16:51:36,023][14395] Num frames 51600... +[2024-11-07 16:51:36,319][14395] Num frames 51700... +[2024-11-07 16:51:36,555][14395] Avg episode rewards: #0: 4.367, true rewards: #0: 3.997 +[2024-11-07 16:51:36,557][14395] Avg episode reward: 4.367, avg true_objective: 3.997 +[2024-11-07 16:51:36,726][14395] Num frames 51800... +[2024-11-07 16:51:36,976][14395] Num frames 51900... +[2024-11-07 16:51:37,229][14395] Num frames 52000... +[2024-11-07 16:51:37,519][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:37,520][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:37,531][14395] Num frames 52100... +[2024-11-07 16:51:37,781][14395] Num frames 52200... +[2024-11-07 16:51:40,064][14395] Num frames 52300... +[2024-11-07 16:51:40,310][14395] Num frames 52400... +[2024-11-07 16:51:40,573][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:40,578][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:40,660][14395] Num frames 52500... +[2024-11-07 16:51:40,900][14395] Num frames 52600... +[2024-11-07 16:51:41,131][14395] Num frames 52700... +[2024-11-07 16:51:41,381][14395] Num frames 52800... +[2024-11-07 16:51:41,598][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:41,599][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:41,711][14395] Num frames 52900... +[2024-11-07 16:51:41,970][14395] Num frames 53000... +[2024-11-07 16:51:42,237][14395] Num frames 53100... +[2024-11-07 16:51:42,466][14395] Num frames 53200... +[2024-11-07 16:51:42,702][14395] Num frames 53300... +[2024-11-07 16:51:42,784][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:42,786][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:43,037][14395] Num frames 53400... +[2024-11-07 16:51:43,348][14395] Num frames 53500... +[2024-11-07 16:51:43,595][14395] Num frames 53600... +[2024-11-07 16:51:43,912][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:43,915][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:43,934][14395] Num frames 53700... +[2024-11-07 16:51:44,185][14395] Num frames 53800... +[2024-11-07 16:51:44,429][14395] Num frames 53900... +[2024-11-07 16:51:44,715][14395] Num frames 54000... +[2024-11-07 16:51:44,959][14395] Num frames 54100... +[2024-11-07 16:51:45,116][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.000 +[2024-11-07 16:51:45,117][14395] Avg episode reward: 4.390, avg true_objective: 4.000 +[2024-11-07 16:51:45,354][14395] Num frames 54200... +[2024-11-07 16:51:45,570][14395] Num frames 54300... +[2024-11-07 16:51:45,816][14395] Num frames 54400... +[2024-11-07 16:51:45,976][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 3.987 +[2024-11-07 16:51:45,980][14395] Avg episode reward: 4.377, avg true_objective: 3.987 +[2024-11-07 16:51:46,164][14395] Num frames 54500... +[2024-11-07 16:51:46,426][14395] Num frames 54600... +[2024-11-07 16:51:46,681][14395] Num frames 54700... +[2024-11-07 16:51:46,932][14395] Num frames 54800... +[2024-11-07 16:51:47,181][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 3.987 +[2024-11-07 16:51:47,183][14395] Avg episode reward: 4.377, avg true_objective: 3.987 +[2024-11-07 16:51:47,273][14395] Num frames 54900... +[2024-11-07 16:51:47,529][14395] Num frames 55000... +[2024-11-07 16:51:47,795][14395] Num frames 55100... +[2024-11-07 16:51:48,035][14395] Num frames 55200... +[2024-11-07 16:51:48,242][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 16:51:48,244][14395] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 16:51:48,331][14395] Num frames 55300... +[2024-11-07 16:51:48,575][14395] Num frames 55400... +[2024-11-07 16:51:48,928][14395] Num frames 55500... +[2024-11-07 16:51:49,165][14395] Num frames 55600... +[2024-11-07 16:51:49,417][14395] Num frames 55700... +[2024-11-07 16:51:49,662][14395] Num frames 55800... +[2024-11-07 16:51:49,871][14395] Num frames 55900... +[2024-11-07 16:51:49,938][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.010 +[2024-11-07 16:51:49,939][14395] Avg episode reward: 4.420, avg true_objective: 4.010 +[2024-11-07 16:51:50,164][14395] Num frames 56000... +[2024-11-07 16:51:50,407][14395] Num frames 56100... +[2024-11-07 16:51:50,662][14395] Num frames 56200... +[2024-11-07 16:51:50,934][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:50,937][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:50,990][14395] Num frames 56300... +[2024-11-07 16:51:51,233][14395] Num frames 56400... +[2024-11-07 16:51:51,475][14395] Num frames 56500... +[2024-11-07 16:51:51,724][14395] Num frames 56600... +[2024-11-07 16:51:51,924][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:51,925][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:51,998][14395] Num frames 56700... +[2024-11-07 16:51:52,256][14395] Num frames 56800... +[2024-11-07 16:51:52,505][14395] Num frames 56900... +[2024-11-07 16:51:52,753][14395] Num frames 57000... +[2024-11-07 16:51:53,028][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:53,030][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:53,069][14395] Num frames 57100... +[2024-11-07 16:51:53,301][14395] Num frames 57200... +[2024-11-07 16:51:53,529][14395] Num frames 57300... +[2024-11-07 16:51:53,787][14395] Num frames 57400... +[2024-11-07 16:51:54,012][14395] Num frames 57500... +[2024-11-07 16:51:54,153][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:54,155][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:54,324][14395] Num frames 57600... +[2024-11-07 16:51:54,581][14395] Num frames 57700... +[2024-11-07 16:51:54,824][14395] Num frames 57800... +[2024-11-07 16:51:55,076][14395] Num frames 57900... +[2024-11-07 16:51:55,185][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:55,188][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:55,391][14395] Num frames 58000... +[2024-11-07 16:51:55,571][14395] Num frames 58100... +[2024-11-07 16:51:55,817][14395] Num frames 58200... +[2024-11-07 16:51:56,039][14395] Num frames 58300... +[2024-11-07 16:51:56,104][14395] Avg episode rewards: #0: 4.413, true rewards: #0: 4.013 +[2024-11-07 16:51:56,105][14395] Avg episode reward: 4.413, avg true_objective: 4.013 +[2024-11-07 16:51:56,357][14395] Num frames 58400... +[2024-11-07 16:51:56,611][14395] Num frames 58500... +[2024-11-07 16:51:56,834][14395] Num frames 58600... +[2024-11-07 16:51:57,083][14395] Num frames 58700... +[2024-11-07 16:51:57,272][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.019 +[2024-11-07 16:51:57,275][14395] Avg episode reward: 4.429, avg true_objective: 4.019 +[2024-11-07 16:51:57,420][14395] Num frames 58800... +[2024-11-07 16:51:57,690][14395] Num frames 58900... +[2024-11-07 16:51:57,909][14395] Num frames 59000... +[2024-11-07 16:51:58,159][14395] Num frames 59100... +[2024-11-07 16:51:58,420][14395] Num frames 59200... +[2024-11-07 16:51:58,473][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:51:58,477][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:51:58,768][14395] Num frames 59300... +[2024-11-07 16:51:58,971][14395] Num frames 59400... +[2024-11-07 16:51:59,177][14395] Num frames 59500... +[2024-11-07 16:51:59,397][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:51:59,399][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:51:59,436][14395] Num frames 59600... +[2024-11-07 16:51:59,640][14395] Num frames 59700... +[2024-11-07 16:51:59,845][14395] Num frames 59800... +[2024-11-07 16:52:00,028][14395] Num frames 59900... +[2024-11-07 16:52:00,216][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:00,217][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:00,278][14395] Num frames 60000... +[2024-11-07 16:52:00,482][14395] Num frames 60100... +[2024-11-07 16:52:00,771][14395] Num frames 60200... +[2024-11-07 16:52:00,880][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:00,884][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:01,084][14395] Num frames 60300... +[2024-11-07 16:52:01,344][14395] Num frames 60400... +[2024-11-07 16:52:01,610][14395] Num frames 60500... +[2024-11-07 16:52:01,877][14395] Num frames 60600... +[2024-11-07 16:52:01,954][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:01,955][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:02,195][14395] Num frames 60700... +[2024-11-07 16:52:02,447][14395] Num frames 60800... +[2024-11-07 16:52:02,653][14395] Num frames 60900... +[2024-11-07 16:52:02,910][14395] Num frames 61000... +[2024-11-07 16:52:03,086][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.019 +[2024-11-07 16:52:03,088][14395] Avg episode reward: 4.449, avg true_objective: 4.019 +[2024-11-07 16:52:03,193][14395] Num frames 61100... +[2024-11-07 16:52:03,459][14395] Num frames 61200... +[2024-11-07 16:52:03,667][14395] Num frames 61300... +[2024-11-07 16:52:03,911][14395] Num frames 61400... +[2024-11-07 16:52:04,039][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:04,044][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:04,219][14395] Num frames 61500... +[2024-11-07 16:52:04,436][14395] Num frames 61600... +[2024-11-07 16:52:04,697][14395] Num frames 61700... +[2024-11-07 16:52:04,910][14395] Num frames 61800... +[2024-11-07 16:52:05,017][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:05,018][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:05,232][14395] Num frames 61900... +[2024-11-07 16:52:05,499][14395] Num frames 62000... +[2024-11-07 16:52:05,789][14395] Num frames 62100... +[2024-11-07 16:52:06,033][14395] Num frames 62200... +[2024-11-07 16:52:06,258][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.019 +[2024-11-07 16:52:06,261][14395] Avg episode reward: 4.449, avg true_objective: 4.019 +[2024-11-07 16:52:06,322][14395] Num frames 62300... +[2024-11-07 16:52:06,554][14395] Num frames 62400... +[2024-11-07 16:52:06,843][14395] Num frames 62500... +[2024-11-07 16:52:07,050][14395] Num frames 62600... +[2024-11-07 16:52:07,230][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:07,232][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:07,325][14395] Num frames 62700... +[2024-11-07 16:52:07,585][14395] Num frames 62800... +[2024-11-07 16:52:07,819][14395] Num frames 62900... +[2024-11-07 16:52:08,044][14395] Num frames 63000... +[2024-11-07 16:52:08,186][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:08,187][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:08,342][14395] Num frames 63100... +[2024-11-07 16:52:08,610][14395] Num frames 63200... +[2024-11-07 16:52:08,855][14395] Num frames 63300... +[2024-11-07 16:52:09,102][14395] Num frames 63400... +[2024-11-07 16:52:09,221][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:09,223][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:09,422][14395] Num frames 63500... +[2024-11-07 16:52:09,664][14395] Num frames 63600... +[2024-11-07 16:52:09,873][14395] Num frames 63700... +[2024-11-07 16:52:10,115][14395] Num frames 63800... +[2024-11-07 16:52:10,191][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:10,192][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:10,412][14395] Num frames 63900... +[2024-11-07 16:52:10,673][14395] Num frames 64000... +[2024-11-07 16:52:10,925][14395] Num frames 64100... +[2024-11-07 16:52:11,201][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:11,206][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:11,246][14395] Num frames 64200... +[2024-11-07 16:52:11,479][14395] Num frames 64300... +[2024-11-07 16:52:11,802][14395] Num frames 64400... +[2024-11-07 16:52:12,061][14395] Num frames 64500... +[2024-11-07 16:52:14,337][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:14,342][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:14,418][14395] Num frames 64600... +[2024-11-07 16:52:14,651][14395] Num frames 64700... +[2024-11-07 16:52:14,886][14395] Num frames 64800... +[2024-11-07 16:52:15,120][14395] Num frames 64900... +[2024-11-07 16:52:15,668][14395] Num frames 65000... +[2024-11-07 16:52:15,779][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:15,782][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:15,983][14395] Num frames 65100... +[2024-11-07 16:52:16,228][14395] Num frames 65200... +[2024-11-07 16:52:16,471][14395] Num frames 65300... +[2024-11-07 16:52:16,709][14395] Num frames 65400... +[2024-11-07 16:52:16,779][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:16,784][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:17,014][14395] Num frames 65500... +[2024-11-07 16:52:17,241][14395] Num frames 65600... +[2024-11-07 16:52:17,487][14395] Num frames 65700... +[2024-11-07 16:52:17,774][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:17,778][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:17,817][14395] Num frames 65800... +[2024-11-07 16:52:18,070][14395] Num frames 65900... +[2024-11-07 16:52:18,313][14395] Num frames 66000... +[2024-11-07 16:52:18,547][14395] Num frames 66100... +[2024-11-07 16:52:18,776][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:18,778][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:18,856][14395] Num frames 66200... +[2024-11-07 16:52:19,110][14395] Num frames 66300... +[2024-11-07 16:52:19,361][14395] Num frames 66400... +[2024-11-07 16:52:19,609][14395] Num frames 66500... +[2024-11-07 16:52:19,809][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:19,812][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:19,946][14395] Num frames 66600... +[2024-11-07 16:52:20,191][14395] Num frames 66700... +[2024-11-07 16:52:20,610][14395] Num frames 66800... +[2024-11-07 16:52:20,866][14395] Num frames 66900... +[2024-11-07 16:52:21,109][14395] Num frames 67000... +[2024-11-07 16:52:21,184][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:21,188][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:21,417][14395] Num frames 67100... +[2024-11-07 16:52:21,654][14395] Num frames 67200... +[2024-11-07 16:52:21,898][14395] Num frames 67300... +[2024-11-07 16:52:22,170][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:22,171][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:22,192][14395] Num frames 67400... +[2024-11-07 16:52:22,428][14395] Num frames 67500... +[2024-11-07 16:52:22,679][14395] Num frames 67600... +[2024-11-07 16:52:22,929][14395] Num frames 67700... +[2024-11-07 16:52:23,170][14395] Num frames 67800... +[2024-11-07 16:52:23,372][14395] Num frames 67900... +[2024-11-07 16:52:23,598][14395] Num frames 68000... +[2024-11-07 16:52:23,651][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:23,654][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:23,914][14395] Num frames 68100... +[2024-11-07 16:52:24,153][14395] Num frames 68200... +[2024-11-07 16:52:24,406][14395] Num frames 68300... +[2024-11-07 16:52:24,641][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:24,646][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:24,694][14395] Num frames 68400... +[2024-11-07 16:52:24,881][14395] Num frames 68500... +[2024-11-07 16:52:25,065][14395] Num frames 68600... +[2024-11-07 16:52:25,247][14395] Num frames 68700... +[2024-11-07 16:52:25,429][14395] Num frames 68800... +[2024-11-07 16:52:25,562][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:25,566][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:25,743][14395] Num frames 68900... +[2024-11-07 16:52:25,969][14395] Num frames 69000... +[2024-11-07 16:52:26,221][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:26,223][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:26,269][14395] Num frames 69100... +[2024-11-07 16:52:26,493][14395] Num frames 69200... +[2024-11-07 16:52:26,761][14395] Num frames 69300... +[2024-11-07 16:52:27,002][14395] Num frames 69400... +[2024-11-07 16:52:27,249][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:27,252][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:27,357][14395] Num frames 69500... +[2024-11-07 16:52:27,704][14395] Num frames 69600... +[2024-11-07 16:52:27,946][14395] Num frames 69700... +[2024-11-07 16:52:28,164][14395] Num frames 69800... +[2024-11-07 16:52:28,354][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:28,358][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:28,503][14395] Num frames 69900... +[2024-11-07 16:52:28,775][14395] Num frames 70000... +[2024-11-07 16:52:29,033][14395] Num frames 70100... +[2024-11-07 16:52:29,239][14395] Num frames 70200... +[2024-11-07 16:52:29,457][14395] Num frames 70300... +[2024-11-07 16:52:29,696][14395] Num frames 70400... +[2024-11-07 16:52:29,748][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:29,752][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:29,990][14395] Num frames 70500... +[2024-11-07 16:52:30,359][14395] Num frames 70600... +[2024-11-07 16:52:30,577][14395] Num frames 70700... +[2024-11-07 16:52:30,820][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:30,822][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:30,859][14395] Num frames 70800... +[2024-11-07 16:52:31,093][14395] Num frames 70900... +[2024-11-07 16:52:31,351][14395] Num frames 71000... +[2024-11-07 16:52:31,584][14395] Num frames 71100... +[2024-11-07 16:52:31,807][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.032 +[2024-11-07 16:52:31,810][14395] Avg episode reward: 4.462, avg true_objective: 4.032 +[2024-11-07 16:52:31,899][14395] Num frames 71200... +[2024-11-07 16:52:32,123][14395] Num frames 71300... +[2024-11-07 16:52:32,326][14395] Num frames 71400... +[2024-11-07 16:52:32,538][14395] Num frames 71500... +[2024-11-07 16:52:32,721][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:32,726][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:32,833][14395] Num frames 71600... +[2024-11-07 16:52:33,022][14395] Num frames 71700... +[2024-11-07 16:52:33,210][14395] Num frames 71800... +[2024-11-07 16:52:33,396][14395] Num frames 71900... +[2024-11-07 16:52:33,523][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:33,526][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:33,687][14395] Num frames 72000... +[2024-11-07 16:52:33,936][14395] Num frames 72100... +[2024-11-07 16:52:34,159][14395] Num frames 72200... +[2024-11-07 16:52:34,487][14395] Num frames 72300... +[2024-11-07 16:52:34,600][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:34,604][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:34,858][14395] Num frames 72400... +[2024-11-07 16:52:35,137][14395] Num frames 72500... +[2024-11-07 16:52:35,424][14395] Num frames 72600... +[2024-11-07 16:52:35,731][14395] Num frames 72700... +[2024-11-07 16:52:35,798][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:35,803][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:36,087][14395] Num frames 72800... +[2024-11-07 16:52:36,379][14395] Num frames 72900... +[2024-11-07 16:52:36,674][14395] Num frames 73000... +[2024-11-07 16:52:36,978][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:36,983][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:37,036][14395] Num frames 73100... +[2024-11-07 16:52:37,327][14395] Num frames 73200... +[2024-11-07 16:52:37,618][14395] Num frames 73300... +[2024-11-07 16:52:37,906][14395] Num frames 73400... +[2024-11-07 16:52:38,179][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:38,181][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:38,254][14395] Num frames 73500... +[2024-11-07 16:52:38,543][14395] Num frames 73600... +[2024-11-07 16:52:38,826][14395] Num frames 73700... +[2024-11-07 16:52:39,148][14395] Num frames 73800... +[2024-11-07 16:52:39,383][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:39,384][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:39,540][14395] Num frames 73900... +[2024-11-07 16:52:39,794][14395] Num frames 74000... +[2024-11-07 16:52:40,005][14395] Num frames 74100... +[2024-11-07 16:52:40,785][14395] Num frames 74200... +[2024-11-07 16:52:40,971][14395] Avg episode rewards: #0: 4.410, true rewards: #0: 4.010 +[2024-11-07 16:52:40,974][14395] Avg episode reward: 4.410, avg true_objective: 4.010 +[2024-11-07 16:52:41,185][14395] Num frames 74300... +[2024-11-07 16:52:41,490][14395] Num frames 74400... +[2024-11-07 16:52:41,743][14395] Num frames 74500... +[2024-11-07 16:52:42,004][14395] Num frames 74600... +[2024-11-07 16:52:42,228][14395] Num frames 74700... +[2024-11-07 16:52:42,458][14395] Num frames 74800... +[2024-11-07 16:52:42,553][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.022 +[2024-11-07 16:52:42,558][14395] Avg episode reward: 4.442, avg true_objective: 4.022 +[2024-11-07 16:52:42,805][14395] Num frames 74900... +[2024-11-07 16:52:43,106][14395] Num frames 75000... +[2024-11-07 16:52:43,376][14395] Num frames 75100... +[2024-11-07 16:52:43,665][14395] Num frames 75200... +[2024-11-07 16:52:43,802][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.026 +[2024-11-07 16:52:43,803][14395] Avg episode reward: 4.456, avg true_objective: 4.026 +[2024-11-07 16:52:43,991][14395] Num frames 75300... +[2024-11-07 16:52:44,254][14395] Num frames 75400... +[2024-11-07 16:52:44,618][14395] Num frames 75500... +[2024-11-07 16:52:44,921][14395] Num frames 75600... +[2024-11-07 16:52:45,239][14395] Num frames 75700... +[2024-11-07 16:52:45,519][14395] Avg episode rewards: #0: 4.492, true rewards: #0: 4.042 +[2024-11-07 16:52:45,521][14395] Avg episode reward: 4.492, avg true_objective: 4.042 +[2024-11-07 16:52:45,585][14395] Num frames 75800... +[2024-11-07 16:52:45,955][14395] Num frames 75900... +[2024-11-07 16:52:46,264][14395] Num frames 76000... +[2024-11-07 16:52:48,565][14395] Num frames 76100... +[2024-11-07 16:52:48,788][14395] Avg episode rewards: #0: 4.475, true rewards: #0: 4.035 +[2024-11-07 16:52:48,793][14395] Avg episode reward: 4.475, avg true_objective: 4.035 +[2024-11-07 16:52:48,919][14395] Num frames 76200... +[2024-11-07 16:52:49,210][14395] Num frames 76300... +[2024-11-07 16:52:49,520][14395] Num frames 76400... +[2024-11-07 16:52:49,811][14395] Num frames 76500... +[2024-11-07 16:52:50,107][14395] Num frames 76600... +[2024-11-07 16:52:50,396][14395] Num frames 76700... +[2024-11-07 16:52:50,642][14395] Avg episode rewards: #0: 4.528, true rewards: #0: 4.058 +[2024-11-07 16:52:50,643][14395] Avg episode reward: 4.528, avg true_objective: 4.058 +[2024-11-07 16:52:50,752][14395] Num frames 76800... +[2024-11-07 16:52:51,053][14395] Num frames 76900... +[2024-11-07 16:52:51,350][14395] Num frames 77000... +[2024-11-07 16:52:51,710][14395] Num frames 77100... +[2024-11-07 16:52:52,022][14395] Num frames 77200... +[2024-11-07 16:52:52,271][14395] Num frames 77300... +[2024-11-07 16:52:52,358][14395] Avg episode rewards: #0: 4.564, true rewards: #0: 4.074 +[2024-11-07 16:52:52,361][14395] Avg episode reward: 4.564, avg true_objective: 4.074 +[2024-11-07 16:52:52,606][14395] Num frames 77400... +[2024-11-07 16:52:52,840][14395] Num frames 77500... +[2024-11-07 16:52:53,082][14395] Num frames 77600... +[2024-11-07 16:52:53,319][14395] Num frames 77700... +[2024-11-07 16:52:53,438][14395] Avg episode rewards: #0: 4.577, true rewards: #0: 4.077 +[2024-11-07 16:52:53,441][14395] Avg episode reward: 4.577, avg true_objective: 4.077 +[2024-11-07 16:52:53,643][14395] Num frames 77800... +[2024-11-07 16:52:53,896][14395] Num frames 77900... +[2024-11-07 16:52:54,139][14395] Num frames 78000... +[2024-11-07 16:52:54,386][14395] Num frames 78100... +[2024-11-07 16:52:54,556][14395] Avg episode rewards: #0: 4.600, true rewards: #0: 4.080 +[2024-11-07 16:52:54,558][14395] Avg episode reward: 4.600, avg true_objective: 4.080 +[2024-11-07 16:52:54,695][14395] Num frames 78200... +[2024-11-07 16:52:54,948][14395] Num frames 78300... +[2024-11-07 16:52:55,178][14395] Num frames 78400... +[2024-11-07 16:52:55,419][14395] Num frames 78500... +[2024-11-07 16:52:55,715][14395] Avg episode rewards: #0: 4.616, true rewards: #0: 4.086 +[2024-11-07 16:52:55,718][14395] Avg episode reward: 4.616, avg true_objective: 4.086 +[2024-11-07 16:52:55,758][14395] Num frames 78600... +[2024-11-07 16:52:55,997][14395] Num frames 78700... +[2024-11-07 16:52:56,232][14395] Num frames 78800... +[2024-11-07 16:52:56,463][14395] Num frames 78900... +[2024-11-07 16:52:56,697][14395] Num frames 79000... +[2024-11-07 16:52:56,852][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:56,854][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:57,010][14395] Num frames 79100... +[2024-11-07 16:52:57,251][14395] Num frames 79200... +[2024-11-07 16:52:57,497][14395] Num frames 79300... +[2024-11-07 16:52:57,733][14395] Num frames 79400... +[2024-11-07 16:52:57,872][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:57,877][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:58,105][14395] Num frames 79500... +[2024-11-07 16:52:58,354][14395] Num frames 79600... +[2024-11-07 16:52:58,609][14395] Num frames 79700... +[2024-11-07 16:52:58,851][14395] Num frames 79800... +[2024-11-07 16:52:59,079][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:59,084][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:59,160][14395] Num frames 79900... +[2024-11-07 16:52:59,409][14395] Num frames 80000... +[2024-11-07 16:52:59,659][14395] Num frames 80100... +[2024-11-07 16:52:59,889][14395] Num frames 80200... +[2024-11-07 16:53:00,125][14395] Num frames 80300... +[2024-11-07 16:53:00,218][14395] Avg episode rewards: #0: 4.623, true rewards: #0: 4.093 +[2024-11-07 16:53:00,220][14395] Avg episode reward: 4.623, avg true_objective: 4.093 +[2024-11-07 16:53:00,413][14395] Num frames 80400... +[2024-11-07 16:53:00,769][14395] Num frames 80500... +[2024-11-07 16:53:01,039][14395] Num frames 80600... +[2024-11-07 16:53:01,433][14395] Num frames 80700... +[2024-11-07 16:53:01,505][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.086 +[2024-11-07 16:53:01,506][14395] Avg episode reward: 4.606, avg true_objective: 4.086 +[2024-11-07 16:53:01,815][14395] Num frames 80800... +[2024-11-07 16:53:02,139][14395] Num frames 80900... +[2024-11-07 16:53:02,381][14395] Num frames 81000... +[2024-11-07 16:53:02,640][14395] Num frames 81100... +[2024-11-07 16:53:02,828][14395] Avg episode rewards: #0: 4.636, true rewards: #0: 4.106 +[2024-11-07 16:53:02,832][14395] Avg episode reward: 4.636, avg true_objective: 4.106 +[2024-11-07 16:53:02,978][14395] Num frames 81200... +[2024-11-07 16:53:03,224][14395] Num frames 81300... +[2024-11-07 16:53:03,473][14395] Num frames 81400... +[2024-11-07 16:53:03,577][14395] Avg episode rewards: #0: 4.594, true rewards: #0: 4.084 +[2024-11-07 16:53:03,583][14395] Avg episode reward: 4.594, avg true_objective: 4.084 +[2024-11-07 16:53:03,796][14395] Num frames 81500... +[2024-11-07 16:53:04,031][14395] Num frames 81600... +[2024-11-07 16:53:04,270][14395] Num frames 81700... +[2024-11-07 16:53:04,526][14395] Num frames 81800... +[2024-11-07 16:53:04,674][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.087 +[2024-11-07 16:53:04,679][14395] Avg episode reward: 4.607, avg true_objective: 4.087 +[2024-11-07 16:53:04,863][14395] Num frames 81900... +[2024-11-07 16:53:05,124][14395] Num frames 82000... +[2024-11-07 16:53:05,364][14395] Num frames 82100... +[2024-11-07 16:53:05,602][14395] Num frames 82200... +[2024-11-07 16:53:05,699][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.087 +[2024-11-07 16:53:05,703][14395] Avg episode reward: 4.607, avg true_objective: 4.087 +[2024-11-07 16:53:05,870][14395] Num frames 82300... +[2024-11-07 16:53:06,111][14395] Num frames 82400... +[2024-11-07 16:53:06,342][14395] Num frames 82500... +[2024-11-07 16:53:06,586][14395] Num frames 82600... +[2024-11-07 16:53:06,798][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.091 +[2024-11-07 16:53:06,802][14395] Avg episode reward: 4.611, avg true_objective: 4.091 +[2024-11-07 16:53:06,894][14395] Num frames 82700... +[2024-11-07 16:53:07,150][14395] Num frames 82800... +[2024-11-07 16:53:07,499][14395] Num frames 82900... +[2024-11-07 16:53:07,753][14395] Num frames 83000... +[2024-11-07 16:53:07,940][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.091 +[2024-11-07 16:53:07,941][14395] Avg episode reward: 4.611, avg true_objective: 4.091 +[2024-11-07 16:53:08,095][14395] Num frames 83100... +[2024-11-07 16:53:08,385][14395] Num frames 83200... +[2024-11-07 16:53:08,623][14395] Num frames 83300... +[2024-11-07 16:53:08,899][14395] Num frames 83400... +[2024-11-07 16:53:09,138][14395] Avg episode rewards: #0: 4.571, true rewards: #0: 4.071 +[2024-11-07 16:53:09,142][14395] Avg episode reward: 4.571, avg true_objective: 4.071 +[2024-11-07 16:53:09,242][14395] Num frames 83500... +[2024-11-07 16:53:09,492][14395] Num frames 83600... +[2024-11-07 16:53:09,736][14395] Num frames 83700... +[2024-11-07 16:53:09,989][14395] Num frames 83800... +[2024-11-07 16:53:10,269][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:10,270][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:10,320][14395] Num frames 83900... +[2024-11-07 16:53:10,600][14395] Num frames 84000... +[2024-11-07 16:53:10,885][14395] Num frames 84100... +[2024-11-07 16:53:11,171][14395] Num frames 84200... +[2024-11-07 16:53:11,428][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:11,430][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:11,519][14395] Num frames 84300... +[2024-11-07 16:53:11,807][14395] Num frames 84400... +[2024-11-07 16:53:12,114][14395] Num frames 84500... +[2024-11-07 16:53:12,422][14395] Num frames 84600... +[2024-11-07 16:53:12,623][14395] Avg episode rewards: #0: 4.597, true rewards: #0: 4.087 +[2024-11-07 16:53:12,624][14395] Avg episode reward: 4.597, avg true_objective: 4.087 +[2024-11-07 16:53:12,745][14395] Num frames 84700... +[2024-11-07 16:53:13,023][14395] Num frames 84800... +[2024-11-07 16:53:13,372][14395] Num frames 84900... +[2024-11-07 16:53:13,714][14395] Num frames 85000... +[2024-11-07 16:53:13,885][14395] Avg episode rewards: #0: 4.597, true rewards: #0: 4.087 +[2024-11-07 16:53:13,886][14395] Avg episode reward: 4.597, avg true_objective: 4.087 +[2024-11-07 16:53:14,058][14395] Num frames 85100... +[2024-11-07 16:53:14,315][14395] Num frames 85200... +[2024-11-07 16:53:14,631][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:14,632][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:14,659][14395] Num frames 85300... +[2024-11-07 16:53:14,931][14395] Num frames 85400... +[2024-11-07 16:53:15,253][14395] Num frames 85500... +[2024-11-07 16:53:15,505][14395] Num frames 85600... +[2024-11-07 16:53:15,900][14395] Avg episode rewards: #0: 4.568, true rewards: #0: 4.068 +[2024-11-07 16:53:15,902][14395] Avg episode reward: 4.568, avg true_objective: 4.068 +[2024-11-07 16:53:16,082][14395] Num frames 85700... +[2024-11-07 16:53:16,926][14395] Num frames 85800... +[2024-11-07 16:53:17,377][14395] Num frames 85900... +[2024-11-07 16:53:17,611][14395] Num frames 86000... +[2024-11-07 16:53:17,716][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:17,721][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:17,894][14395] Num frames 86100... +[2024-11-07 16:53:18,234][14395] Num frames 86200... +[2024-11-07 16:53:18,487][14395] Num frames 86300... +[2024-11-07 16:53:18,741][14395] Num frames 86400... +[2024-11-07 16:53:18,827][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:18,828][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:19,094][14395] Num frames 86500... +[2024-11-07 16:53:19,350][14395] Num frames 86600... +[2024-11-07 16:53:19,575][14395] Num frames 86700... +[2024-11-07 16:53:19,948][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:19,949][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:19,969][14395] Num frames 86800... +[2024-11-07 16:53:20,265][14395] Num frames 86900... +[2024-11-07 16:53:20,532][14395] Num frames 87000... +[2024-11-07 16:53:22,869][14395] Num frames 87100... +[2024-11-07 16:53:23,154][14395] Num frames 87200... +[2024-11-07 16:53:23,324][14395] Avg episode rewards: #0: 4.575, true rewards: #0: 4.065 +[2024-11-07 16:53:23,327][14395] Avg episode reward: 4.575, avg true_objective: 4.065 +[2024-11-07 16:53:23,482][14395] Num frames 87300... +[2024-11-07 16:53:23,784][14395] Num frames 87400... +[2024-11-07 16:53:24,057][14395] Num frames 87500... +[2024-11-07 16:53:24,273][14395] Num frames 87600... +[2024-11-07 16:53:24,527][14395] Num frames 87700... +[2024-11-07 16:53:24,810][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:24,813][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:24,876][14395] Num frames 87800... +[2024-11-07 16:53:25,156][14395] Num frames 87900... +[2024-11-07 16:53:25,453][14395] Num frames 88000... +[2024-11-07 16:53:25,719][14395] Num frames 88100... +[2024-11-07 16:53:25,954][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:25,957][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:26,054][14395] Num frames 88200... +[2024-11-07 16:53:26,314][14395] Num frames 88300... +[2024-11-07 16:53:26,611][14395] Num frames 88400... +[2024-11-07 16:53:26,894][14395] Num frames 88500... +[2024-11-07 16:53:27,108][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:27,112][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:27,256][14395] Num frames 88600... +[2024-11-07 16:53:27,531][14395] Num frames 88700... +[2024-11-07 16:53:27,803][14395] Num frames 88800... +[2024-11-07 16:53:28,044][14395] Num frames 88900... +[2024-11-07 16:53:28,106][14395] Avg episode rewards: #0: 4.608, true rewards: #0: 4.077 +[2024-11-07 16:53:28,107][14395] Avg episode reward: 4.608, avg true_objective: 4.077 +[2024-11-07 16:53:28,403][14395] Num frames 89000... +[2024-11-07 16:53:28,693][14395] Num frames 89100... +[2024-11-07 16:53:28,966][14395] Num frames 89200... +[2024-11-07 16:53:29,220][14395] Num frames 89300... +[2024-11-07 16:53:29,412][14395] Avg episode rewards: #0: 4.624, true rewards: #0: 4.084 +[2024-11-07 16:53:29,420][14395] Avg episode reward: 4.624, avg true_objective: 4.084 +[2024-11-07 16:53:29,557][14395] Num frames 89400... +[2024-11-07 16:53:29,832][14395] Num frames 89500... +[2024-11-07 16:53:30,094][14395] Num frames 89600... +[2024-11-07 16:53:30,342][14395] Num frames 89700... +[2024-11-07 16:53:30,619][14395] Num frames 89800... +[2024-11-07 16:53:30,765][14395] Avg episode rewards: #0: 4.644, true rewards: #0: 4.093 +[2024-11-07 16:53:30,770][14395] Avg episode reward: 4.644, avg true_objective: 4.093 +[2024-11-07 16:53:30,985][14395] Num frames 89900... +[2024-11-07 16:53:31,267][14395] Num frames 90000... +[2024-11-07 16:53:31,549][14395] Num frames 90100... +[2024-11-07 16:53:31,847][14395] Num frames 90200... +[2024-11-07 16:53:32,124][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.097 +[2024-11-07 16:53:32,131][14395] Avg episode reward: 4.647, avg true_objective: 4.097 +[2024-11-07 16:53:32,202][14395] Num frames 90300... +[2024-11-07 16:53:32,497][14395] Num frames 90400... +[2024-11-07 16:53:32,758][14395] Num frames 90500... +[2024-11-07 16:53:33,014][14395] Num frames 90600... +[2024-11-07 16:53:33,311][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.100 +[2024-11-07 16:53:33,313][14395] Avg episode reward: 4.660, avg true_objective: 4.100 +[2024-11-07 16:53:33,329][14395] Num frames 90700... +[2024-11-07 16:53:33,609][14395] Num frames 90800... +[2024-11-07 16:53:33,893][14395] Num frames 90900... +[2024-11-07 16:53:34,158][14395] Num frames 91000... +[2024-11-07 16:53:34,437][14395] Num frames 91100... +[2024-11-07 16:53:34,618][14395] Avg episode rewards: #0: 4.676, true rewards: #0: 4.106 +[2024-11-07 16:53:34,623][14395] Avg episode reward: 4.676, avg true_objective: 4.106 +[2024-11-07 16:53:34,802][14395] Num frames 91200... +[2024-11-07 16:53:35,079][14395] Num frames 91300... +[2024-11-07 16:53:35,360][14395] Num frames 91400... +[2024-11-07 16:53:35,644][14395] Num frames 91500... +[2024-11-07 16:53:35,788][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.100 +[2024-11-07 16:53:35,792][14395] Avg episode reward: 4.660, avg true_objective: 4.100 +[2024-11-07 16:53:36,033][14395] Num frames 91600... +[2024-11-07 16:53:36,318][14395] Num frames 91700... +[2024-11-07 16:53:36,612][14395] Avg episode rewards: #0: 4.631, true rewards: #0: 4.081 +[2024-11-07 16:53:36,618][14395] Avg episode reward: 4.631, avg true_objective: 4.081 +[2024-11-07 16:53:36,682][14395] Num frames 91800... +[2024-11-07 16:53:36,939][14395] Num frames 91900... +[2024-11-07 16:53:37,220][14395] Num frames 92000... +[2024-11-07 16:53:37,540][14395] Num frames 92100... +[2024-11-07 16:53:37,823][14395] Num frames 92200... +[2024-11-07 16:53:37,974][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.087 +[2024-11-07 16:53:37,976][14395] Avg episode reward: 4.647, avg true_objective: 4.087 +[2024-11-07 16:53:38,165][14395] Num frames 92300... +[2024-11-07 16:53:38,457][14395] Num frames 92400... +[2024-11-07 16:53:38,719][14395] Num frames 92500... +[2024-11-07 16:53:38,998][14395] Num frames 92600... +[2024-11-07 16:53:39,091][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.087 +[2024-11-07 16:53:39,093][14395] Avg episode reward: 4.647, avg true_objective: 4.087 +[2024-11-07 16:53:39,339][14395] Num frames 92700... +[2024-11-07 16:53:39,641][14395] Num frames 92800... +[2024-11-07 16:53:39,954][14395] Num frames 92900... +[2024-11-07 16:53:40,299][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:40,304][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:40,319][14395] Num frames 93000... +[2024-11-07 16:53:40,631][14395] Num frames 93100... +[2024-11-07 16:53:40,937][14395] Num frames 93200... +[2024-11-07 16:53:41,217][14395] Num frames 93300... +[2024-11-07 16:53:41,506][14395] Num frames 93400... +[2024-11-07 16:53:41,694][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:41,699][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:41,925][14395] Num frames 93500... +[2024-11-07 16:53:42,238][14395] Num frames 93600... +[2024-11-07 16:53:42,750][14395] Num frames 93700... +[2024-11-07 16:53:43,064][14395] Num frames 93800... +[2024-11-07 16:53:43,213][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:43,215][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:43,518][14395] Num frames 93900... +[2024-11-07 16:53:43,906][14395] Num frames 94000... +[2024-11-07 16:53:44,237][14395] Num frames 94100... +[2024-11-07 16:53:44,525][14395] Num frames 94200... +[2024-11-07 16:53:44,633][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:44,634][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:44,853][14395] Num frames 94300... +[2024-11-07 16:53:45,052][14395] Num frames 94400... +[2024-11-07 16:53:45,301][14395] Num frames 94500... +[2024-11-07 16:53:45,541][14395] Num frames 94600... +[2024-11-07 16:53:45,755][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:45,757][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:45,886][14395] Num frames 94700... +[2024-11-07 16:53:46,227][14395] Num frames 94800... +[2024-11-07 16:53:46,506][14395] Num frames 94900... +[2024-11-07 16:53:46,890][14395] Num frames 95000... +[2024-11-07 16:53:47,094][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:47,098][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:47,260][14395] Num frames 95100... +[2024-11-07 16:53:47,526][14395] Num frames 95200... +[2024-11-07 16:53:47,805][14395] Num frames 95300... +[2024-11-07 16:53:48,070][14395] Num frames 95400... +[2024-11-07 16:53:48,206][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.100 +[2024-11-07 16:53:48,210][14395] Avg episode reward: 4.640, avg true_objective: 4.100 +[2024-11-07 16:53:48,412][14395] Num frames 95500... +[2024-11-07 16:53:48,663][14395] Num frames 95600... +[2024-11-07 16:53:48,928][14395] Num frames 95700... +[2024-11-07 16:53:49,167][14395] Num frames 95800... +[2024-11-07 16:53:49,551][14395] Num frames 95900... +[2024-11-07 16:53:49,622][14395] Avg episode rewards: #0: 4.653, true rewards: #0: 4.102 +[2024-11-07 16:53:49,626][14395] Avg episode reward: 4.653, avg true_objective: 4.102 +[2024-11-07 16:53:49,893][14395] Num frames 96000... +[2024-11-07 16:53:50,159][14395] Num frames 96100... +[2024-11-07 16:53:50,409][14395] Num frames 96200... +[2024-11-07 16:53:50,649][14395] Avg episode rewards: #0: 4.653, true rewards: #0: 4.102 +[2024-11-07 16:53:50,650][14395] Avg episode reward: 4.653, avg true_objective: 4.102 +[2024-11-07 16:53:50,674][14395] Num frames 96300... +[2024-11-07 16:53:50,877][14395] Num frames 96400... +[2024-11-07 16:53:51,078][14395] Num frames 96500... +[2024-11-07 16:53:51,284][14395] Num frames 96600... +[2024-11-07 16:53:51,497][14395] Avg episode rewards: #0: 4.587, true rewards: #0: 4.077 +[2024-11-07 16:53:51,501][14395] Avg episode reward: 4.587, avg true_objective: 4.077 +[2024-11-07 16:53:51,572][14395] Num frames 96700... +[2024-11-07 16:53:51,801][14395] Num frames 96800... +[2024-11-07 16:53:52,011][14395] Num frames 96900... +[2024-11-07 16:53:52,230][14395] Num frames 97000... +[2024-11-07 16:53:52,428][14395] Avg episode rewards: #0: 4.587, true rewards: #0: 4.077 +[2024-11-07 16:53:52,429][14395] Avg episode reward: 4.587, avg true_objective: 4.077 +[2024-11-07 16:53:52,514][14395] Num frames 97100... +[2024-11-07 16:53:52,730][14395] Num frames 97200... +[2024-11-07 16:53:52,989][14395] Num frames 97300... +[2024-11-07 16:53:53,197][14395] Num frames 97400... +[2024-11-07 16:53:53,399][14395] Num frames 97500... +[2024-11-07 16:53:53,468][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.083 +[2024-11-07 16:53:53,473][14395] Avg episode reward: 4.603, avg true_objective: 4.083 +[2024-11-07 16:53:53,689][14395] Num frames 97600... +[2024-11-07 16:53:53,891][14395] Num frames 97700... +[2024-11-07 16:53:54,082][14395] Num frames 97800... +[2024-11-07 16:53:54,367][14395] Avg episode rewards: #0: 4.590, true rewards: #0: 4.080 +[2024-11-07 16:53:54,368][14395] Avg episode reward: 4.590, avg true_objective: 4.080 +[2024-11-07 16:53:54,398][14395] Num frames 97900... +[2024-11-07 16:53:54,685][14395] Num frames 98000... +[2024-11-07 16:53:54,944][14395] Num frames 98100... +[2024-11-07 16:53:57,242][14395] Num frames 98200... +[2024-11-07 16:53:57,491][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:57,493][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:57,575][14395] Num frames 98300... +[2024-11-07 16:53:57,842][14395] Num frames 98400... +[2024-11-07 16:53:58,072][14395] Num frames 98500... +[2024-11-07 16:53:58,312][14395] Num frames 98600... +[2024-11-07 16:53:58,512][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:58,514][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:58,599][14395] Num frames 98700... +[2024-11-07 16:53:58,826][14395] Num frames 98800... +[2024-11-07 16:53:59,015][14395] Num frames 98900... +[2024-11-07 16:53:59,209][14395] Num frames 99000... +[2024-11-07 16:53:59,377][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:59,379][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:59,500][14395] Num frames 99100... +[2024-11-07 16:53:59,704][14395] Num frames 99200... +[2024-11-07 16:53:59,941][14395] Num frames 99300... +[2024-11-07 16:54:00,149][14395] Num frames 99400... +[2024-11-07 16:54:00,258][14395] Avg episode rewards: #0: 4.557, true rewards: #0: 4.067 +[2024-11-07 16:54:00,259][14395] Avg episode reward: 4.557, avg true_objective: 4.067 +[2024-11-07 16:54:00,417][14395] Num frames 99500... +[2024-11-07 16:54:00,634][14395] Num frames 99600... +[2024-11-07 16:54:00,831][14395] Num frames 99700... +[2024-11-07 16:54:01,048][14395] Num frames 99800... +[2024-11-07 16:54:01,129][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:01,131][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:01,313][14395] Num frames 99900... +[2024-11-07 16:54:01,510][14395] Num frames 100000... +[2024-11-07 16:54:01,701][14395] Num frames 100100... +[2024-11-07 16:54:01,946][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:01,947][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:01,961][14395] Num frames 100200... +[2024-11-07 16:54:02,157][14395] Num frames 100300... +[2024-11-07 16:54:02,475][14395] Num frames 100400... +[2024-11-07 16:54:02,788][14395] Num frames 100500... +[2024-11-07 16:54:03,112][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:03,113][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:03,231][14395] Num frames 100600... +[2024-11-07 16:54:03,495][14395] Num frames 100700... +[2024-11-07 16:54:03,720][14395] Num frames 100800... +[2024-11-07 16:54:03,911][14395] Num frames 100900... +[2024-11-07 16:54:04,142][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.077 +[2024-11-07 16:54:04,145][14395] Avg episode reward: 4.567, avg true_objective: 4.077 +[2024-11-07 16:54:04,177][14395] Num frames 101000... +[2024-11-07 16:54:04,373][14395] Num frames 101100... +[2024-11-07 16:54:04,556][14395] Num frames 101200... +[2024-11-07 16:54:04,748][14395] Num frames 101300... +[2024-11-07 16:54:04,955][14395] Num frames 101400... +[2024-11-07 16:54:05,094][14395] Avg episode rewards: #0: 4.583, true rewards: #0: 4.083 +[2024-11-07 16:54:05,099][14395] Avg episode reward: 4.583, avg true_objective: 4.083 +[2024-11-07 16:54:05,246][14395] Num frames 101500... +[2024-11-07 16:54:05,435][14395] Num frames 101600... +[2024-11-07 16:54:05,618][14395] Num frames 101700... +[2024-11-07 16:54:05,806][14395] Num frames 101800... +[2024-11-07 16:54:05,909][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.077 +[2024-11-07 16:54:05,911][14395] Avg episode reward: 4.567, avg true_objective: 4.077 +[2024-11-07 16:54:06,061][14395] Num frames 101900... +[2024-11-07 16:54:06,249][14395] Num frames 102000... +[2024-11-07 16:54:06,452][14395] Num frames 102100... +[2024-11-07 16:54:06,636][14395] Num frames 102200... +[2024-11-07 16:54:06,825][14395] Num frames 102300... +[2024-11-07 16:54:07,032][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:07,034][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:07,118][14395] Num frames 102400... +[2024-11-07 16:54:07,348][14395] Num frames 102500... +[2024-11-07 16:54:07,540][14395] Num frames 102600... +[2024-11-07 16:54:07,726][14395] Num frames 102700... +[2024-11-07 16:54:07,885][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:07,890][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:07,994][14395] Num frames 102800... +[2024-11-07 16:54:08,202][14395] Num frames 102900... +[2024-11-07 16:54:08,389][14395] Num frames 103000... +[2024-11-07 16:54:08,571][14395] Num frames 103100... +[2024-11-07 16:54:08,766][14395] Num frames 103200... +[2024-11-07 16:54:08,825][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:08,830][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:09,076][14395] Num frames 103300... +[2024-11-07 16:54:09,316][14395] Num frames 103400... +[2024-11-07 16:54:09,537][14395] Num frames 103500... +[2024-11-07 16:54:09,762][14395] Num frames 103600... +[2024-11-07 16:54:09,863][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:09,868][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:10,068][14395] Num frames 103700... +[2024-11-07 16:54:10,300][14395] Num frames 103800... +[2024-11-07 16:54:10,919][14395] Num frames 103900... +[2024-11-07 16:54:11,144][14395] Num frames 104000... +[2024-11-07 16:54:12,461][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:12,465][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:13,690][14395] Num frames 104100... +[2024-11-07 16:54:13,912][14395] Num frames 104200... +[2024-11-07 16:54:14,125][14395] Num frames 104300... +[2024-11-07 16:54:14,376][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:14,382][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:14,439][14395] Num frames 104400... +[2024-11-07 16:54:14,662][14395] Num frames 104500... +[2024-11-07 16:54:14,876][14395] Num frames 104600... +[2024-11-07 16:54:15,117][14395] Num frames 104700... +[2024-11-07 16:54:15,380][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:15,382][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:15,454][14395] Num frames 104800... +[2024-11-07 16:54:15,708][14395] Num frames 104900... +[2024-11-07 16:54:15,945][14395] Num frames 105000... +[2024-11-07 16:54:16,206][14395] Num frames 105100... +[2024-11-07 16:54:16,395][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:16,396][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:16,512][14395] Num frames 105200... +[2024-11-07 16:54:16,811][14395] Num frames 105300... +[2024-11-07 16:54:17,063][14395] Num frames 105400... +[2024-11-07 16:54:17,308][14395] Num frames 105500... +[2024-11-07 16:54:17,506][14395] Avg episode rewards: #0: 4.609, true rewards: #0: 4.099 +[2024-11-07 16:54:17,508][14395] Avg episode reward: 4.609, avg true_objective: 4.099 +[2024-11-07 16:54:17,580][14395] Num frames 105600... +[2024-11-07 16:54:17,807][14395] Num frames 105700... +[2024-11-07 16:54:18,016][14395] Num frames 105800... +[2024-11-07 16:54:18,212][14395] Num frames 105900... +[2024-11-07 16:54:18,384][14395] Avg episode rewards: #0: 4.593, true rewards: #0: 4.093 +[2024-11-07 16:54:18,385][14395] Avg episode reward: 4.593, avg true_objective: 4.093 +[2024-11-07 16:54:18,495][14395] Num frames 106000... +[2024-11-07 16:54:18,762][14395] Num frames 106100... +[2024-11-07 16:54:18,997][14395] Num frames 106200... +[2024-11-07 16:54:19,191][14395] Num frames 106300... +[2024-11-07 16:54:19,391][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:19,394][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:19,475][14395] Num frames 106400... +[2024-11-07 16:54:19,690][14395] Num frames 106500... +[2024-11-07 16:54:19,890][14395] Num frames 106600... +[2024-11-07 16:54:20,097][14395] Num frames 106700... +[2024-11-07 16:54:20,270][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:20,274][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:20,382][14395] Num frames 106800... +[2024-11-07 16:54:20,575][14395] Num frames 106900... +[2024-11-07 16:54:20,767][14395] Num frames 107000... +[2024-11-07 16:54:20,956][14395] Num frames 107100... +[2024-11-07 16:54:21,084][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:21,087][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:21,243][14395] Num frames 107200... +[2024-11-07 16:54:21,444][14395] Num frames 107300... +[2024-11-07 16:54:21,650][14395] Num frames 107400... +[2024-11-07 16:54:21,850][14395] Num frames 107500... +[2024-11-07 16:54:22,073][14395] Avg episode rewards: #0: 4.623, true rewards: #0: 4.102 +[2024-11-07 16:54:22,077][14395] Avg episode reward: 4.623, avg true_objective: 4.102 +[2024-11-07 16:54:22,124][14395] Num frames 107600... +[2024-11-07 16:54:22,336][14395] Num frames 107700... +[2024-11-07 16:54:22,548][14395] Num frames 107800... +[2024-11-07 16:54:22,755][14395] Num frames 107900... +[2024-11-07 16:54:25,524][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:25,577][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:25,671][14395] Num frames 108000... +[2024-11-07 16:54:25,932][14395] Num frames 108100... +[2024-11-07 16:54:26,159][14395] Num frames 108200... +[2024-11-07 16:54:26,388][14395] Num frames 108300... +[2024-11-07 16:54:26,758][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:26,762][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:26,880][14395] Num frames 108400... +[2024-11-07 16:54:27,121][14395] Num frames 108500... +[2024-11-07 16:54:27,360][14395] Num frames 108600... +[2024-11-07 16:54:27,593][14395] Num frames 108700... +[2024-11-07 16:54:27,819][14395] Num frames 108800... +[2024-11-07 16:54:27,879][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:27,880][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:28,107][14395] Num frames 108900... +[2024-11-07 16:54:28,362][14395] Num frames 109000... +[2024-11-07 16:54:28,580][14395] Num frames 109100... +[2024-11-07 16:54:28,843][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:28,846][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:28,887][14395] Num frames 109200... +[2024-11-07 16:54:29,116][14395] Num frames 109300... +[2024-11-07 16:54:29,327][14395] Num frames 109400... +[2024-11-07 16:54:29,558][14395] Num frames 109500... +[2024-11-07 16:54:29,760][14395] Avg episode rewards: #0: 4.554, true rewards: #0: 4.074 +[2024-11-07 16:54:29,768][14395] Avg episode reward: 4.554, avg true_objective: 4.074 +[2024-11-07 16:54:29,853][14395] Num frames 109600... +[2024-11-07 16:54:30,080][14395] Num frames 109700... +[2024-11-07 16:54:30,315][14395] Num frames 109800... +[2024-11-07 16:54:30,547][14395] Num frames 109900... +[2024-11-07 16:54:30,723][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.086 +[2024-11-07 16:54:30,725][14395] Avg episode reward: 4.567, avg true_objective: 4.086 +[2024-11-07 16:54:30,834][14395] Num frames 110000... +[2024-11-07 16:54:31,083][14395] Num frames 110100... +[2024-11-07 16:54:31,350][14395] Num frames 110200... +[2024-11-07 16:54:34,075][14395] Num frames 110300... +[2024-11-07 16:54:34,291][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.090 +[2024-11-07 16:54:35,148][14395] Avg episode reward: 4.570, avg true_objective: 4.090 +[2024-11-07 16:54:35,216][14395] Num frames 110400... +[2024-11-07 16:54:35,428][14395] Num frames 110500... +[2024-11-07 16:54:35,638][14395] Num frames 110600... +[2024-11-07 16:54:35,843][14395] Num frames 110700... +[2024-11-07 16:54:36,016][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.090 +[2024-11-07 16:54:36,017][14395] Avg episode reward: 4.570, avg true_objective: 4.090 +[2024-11-07 16:54:36,106][14395] Num frames 110800... +[2024-11-07 16:54:36,329][14395] Num frames 110900... +[2024-11-07 16:54:36,546][14395] Num frames 111000... +[2024-11-07 16:54:36,754][14395] Num frames 111100... +[2024-11-07 16:54:36,968][14395] Num frames 111200... +[2024-11-07 16:54:37,030][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:37,035][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:37,256][14395] Num frames 111300... +[2024-11-07 16:54:37,478][14395] Num frames 111400... +[2024-11-07 16:54:37,692][14395] Num frames 111500... +[2024-11-07 16:54:37,929][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:37,933][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:37,981][14395] Num frames 111600... +[2024-11-07 16:54:38,192][14395] Num frames 111700... +[2024-11-07 16:54:38,416][14395] Num frames 111800... +[2024-11-07 16:54:38,625][14395] Num frames 111900... +[2024-11-07 16:54:38,808][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:38,811][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:38,878][14395] Num frames 112000... +[2024-11-07 16:54:39,064][14395] Num frames 112100... +[2024-11-07 16:54:39,281][14395] Num frames 112200... +[2024-11-07 16:54:39,467][14395] Num frames 112300... +[2024-11-07 16:54:39,622][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:39,625][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:39,737][14395] Num frames 112400... +[2024-11-07 16:54:39,929][14395] Num frames 112500... +[2024-11-07 16:54:40,120][14395] Num frames 112600... +[2024-11-07 16:54:40,303][14395] Num frames 112700... +[2024-11-07 16:54:40,428][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:40,431][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:41,822][14395] Num frames 112800... +[2024-11-07 16:54:42,020][14395] Num frames 112900... +[2024-11-07 16:54:42,219][14395] Num frames 113000... +[2024-11-07 16:54:42,413][14395] Num frames 113100... +[2024-11-07 16:54:43,483][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.086 +[2024-11-07 16:54:43,485][14395] Avg episode reward: 4.567, avg true_objective: 4.086 +[2024-11-07 16:54:43,518][14395] Num frames 113200... +[2024-11-07 16:54:43,747][14395] Num frames 113300... +[2024-11-07 16:54:44,167][14395] Num frames 113400... +[2024-11-07 16:54:44,364][14395] Num frames 113500... +[2024-11-07 16:54:44,558][14395] Num frames 113600... +[2024-11-07 16:54:44,750][14395] Num frames 113700... +[2024-11-07 16:54:44,938][14395] Num frames 113800... +[2024-11-07 16:54:45,044][14395] Avg episode rewards: #0: 4.632, true rewards: #0: 4.112 +[2024-11-07 16:54:45,046][14395] Avg episode reward: 4.632, avg true_objective: 4.112 +[2024-11-07 16:54:45,215][14395] Num frames 113900... +[2024-11-07 16:54:45,410][14395] Num frames 114000... +[2024-11-07 16:54:45,593][14395] Num frames 114100... +[2024-11-07 16:54:45,807][14395] Num frames 114200... +[2024-11-07 16:54:45,947][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:45,951][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:46,110][14395] Num frames 114300... +[2024-11-07 16:54:46,331][14395] Num frames 114400... +[2024-11-07 16:54:46,541][14395] Num frames 114500... +[2024-11-07 16:54:46,777][14395] Num frames 114600... +[2024-11-07 16:54:46,900][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:46,902][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:47,070][14395] Num frames 114700... +[2024-11-07 16:54:47,247][14395] Num frames 114800... +[2024-11-07 16:54:47,437][14395] Num frames 114900... +[2024-11-07 16:54:47,634][14395] Num frames 115000... +[2024-11-07 16:54:47,710][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:47,712][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:47,983][14395] Num frames 115100... +[2024-11-07 16:54:48,214][14395] Num frames 115200... +[2024-11-07 16:54:48,522][14395] Num frames 115300... +[2024-11-07 16:54:48,710][14395] Avg episode rewards: #0: 4.652, true rewards: #0: 4.112 +[2024-11-07 16:54:48,711][14395] Avg episode reward: 4.652, avg true_objective: 4.112 +[2024-11-07 16:54:48,789][14395] Num frames 115400... +[2024-11-07 16:54:49,043][14395] Num frames 115500... +[2024-11-07 16:54:49,253][14395] Num frames 115600... +[2024-11-07 16:54:49,469][14395] Num frames 115700... +[2024-11-07 16:54:49,600][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:49,604][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:49,732][14395] Num frames 115800... +[2024-11-07 16:54:49,942][14395] Num frames 115900... +[2024-11-07 16:54:50,147][14395] Num frames 116000... +[2024-11-07 16:54:50,335][14395] Num frames 116100... +[2024-11-07 16:54:50,451][14395] Avg episode rewards: #0: 4.590, true rewards: #0: 4.090 +[2024-11-07 16:54:50,453][14395] Avg episode reward: 4.590, avg true_objective: 4.090 +[2024-11-07 16:54:50,606][14395] Num frames 116200... +[2024-11-07 16:54:50,845][14395] Num frames 116300... +[2024-11-07 16:54:51,082][14395] Num frames 116400... +[2024-11-07 16:54:51,376][14395] Num frames 116500... +[2024-11-07 16:54:51,583][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:51,587][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:51,649][14395] Num frames 116600... +[2024-11-07 16:54:51,869][14395] Num frames 116700... +[2024-11-07 16:54:52,089][14395] Num frames 116800... +[2024-11-07 16:54:52,278][14395] Num frames 116900... +[2024-11-07 16:54:52,458][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:52,462][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:52,553][14395] Num frames 117000... +[2024-11-07 16:54:52,753][14395] Num frames 117100... +[2024-11-07 16:54:52,951][14395] Num frames 117200... +[2024-11-07 16:54:53,143][14395] Num frames 117300... +[2024-11-07 16:54:53,296][14395] Avg episode rewards: #0: 4.518, true rewards: #0: 4.058 +[2024-11-07 16:54:53,300][14395] Avg episode reward: 4.518, avg true_objective: 4.058 +[2024-11-07 16:54:53,422][14395] Num frames 117400... +[2024-11-07 16:54:53,627][14395] Num frames 117500... +[2024-11-07 16:54:53,834][14395] Num frames 117600... +[2024-11-07 16:54:54,032][14395] Num frames 117700... +[2024-11-07 16:54:54,264][14395] Avg episode rewards: #0: 4.498, true rewards: #0: 4.048 +[2024-11-07 16:54:54,267][14395] Avg episode reward: 4.498, avg true_objective: 4.048 +[2024-11-07 16:54:54,297][14395] Num frames 117800... +[2024-11-07 16:54:54,484][14395] Num frames 117900... +[2024-11-07 16:54:54,691][14395] Num frames 118000... +[2024-11-07 16:54:54,899][14395] Num frames 118100... +[2024-11-07 16:54:55,115][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.045 +[2024-11-07 16:54:55,118][14395] Avg episode reward: 4.485, avg true_objective: 4.045 +[2024-11-07 16:54:55,190][14395] Num frames 118200... +[2024-11-07 16:54:55,418][14395] Num frames 118300... +[2024-11-07 16:54:55,634][14395] Num frames 118400... +[2024-11-07 16:54:55,837][14395] Num frames 118500... +[2024-11-07 16:54:56,022][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 16:54:56,025][14395] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 16:54:56,127][14395] Num frames 118600... +[2024-11-07 16:54:56,326][14395] Num frames 118700... +[2024-11-07 16:54:56,531][14395] Num frames 118800... +[2024-11-07 16:54:56,724][14395] Num frames 118900... +[2024-11-07 16:54:56,941][14395] Avg episode rewards: #0: 4.459, true rewards: #0: 4.038 +[2024-11-07 16:54:56,947][14395] Avg episode reward: 4.459, avg true_objective: 4.038 +[2024-11-07 16:54:57,005][14395] Num frames 119000... +[2024-11-07 16:54:57,195][14395] Num frames 119100... +[2024-11-07 16:54:57,396][14395] Num frames 119200... +[2024-11-07 16:54:57,593][14395] Num frames 119300... +[2024-11-07 16:54:57,784][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 16:54:57,789][14395] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 16:54:57,887][14395] Num frames 119400... +[2024-11-07 16:54:58,084][14395] Num frames 119500... +[2024-11-07 16:54:58,283][14395] Num frames 119600... +[2024-11-07 16:54:58,485][14395] Num frames 119700... +[2024-11-07 16:54:58,630][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 16:54:58,631][14395] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 16:54:58,747][14395] Num frames 119800... +[2024-11-07 16:54:58,958][14395] Num frames 119900... +[2024-11-07 16:54:59,152][14395] Num frames 120000... +[2024-11-07 16:54:59,388][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:54:59,391][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:54:59,407][14395] Num frames 120100... +[2024-11-07 16:54:59,635][14395] Num frames 120200... +[2024-11-07 16:54:59,816][14395] Num frames 120300... +[2024-11-07 16:55:00,037][14395] Num frames 120400... +[2024-11-07 16:55:00,326][14395] Num frames 120500... +[2024-11-07 16:55:00,500][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:55:00,503][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:55:00,638][14395] Num frames 120600... +[2024-11-07 16:55:00,894][14395] Num frames 120700... +[2024-11-07 16:55:01,146][14395] Num frames 120800... +[2024-11-07 16:55:01,352][14395] Num frames 120900... +[2024-11-07 16:55:01,598][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:55:01,599][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:55:01,612][14395] Num frames 121000... +[2024-11-07 16:55:01,806][14395] Num frames 121100... +[2024-11-07 16:55:01,999][14395] Num frames 121200... +[2024-11-07 16:55:03,095][14395] Num frames 121300... +[2024-11-07 16:55:03,287][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:55:03,288][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:55:03,328][14395] Num frames 121400... +[2024-11-07 16:55:03,519][14395] Num frames 121500... +[2024-11-07 16:55:03,707][14395] Num frames 121600... +[2024-11-07 16:55:03,900][14395] Num frames 121700... +[2024-11-07 16:55:04,074][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.034 +[2024-11-07 16:55:04,079][14395] Avg episode reward: 4.444, avg true_objective: 4.034 +[2024-11-07 16:55:04,167][14395] Num frames 121800... +[2024-11-07 16:55:04,364][14395] Num frames 121900... +[2024-11-07 16:55:04,549][14395] Num frames 122000... +[2024-11-07 16:55:04,737][14395] Num frames 122100... +[2024-11-07 16:55:04,883][14395] Avg episode rewards: #0: 4.431, true rewards: #0: 4.031 +[2024-11-07 16:55:04,887][14395] Avg episode reward: 4.431, avg true_objective: 4.031 +[2024-11-07 16:55:05,016][14395] Num frames 122200... +[2024-11-07 16:55:05,204][14395] Num frames 122300... +[2024-11-07 16:55:05,397][14395] Num frames 122400... +[2024-11-07 16:55:05,593][14395] Num frames 122500... +[2024-11-07 16:55:05,827][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:05,828][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:05,844][14395] Num frames 122600... +[2024-11-07 16:55:06,046][14395] Num frames 122700... +[2024-11-07 16:55:06,239][14395] Num frames 122800... +[2024-11-07 16:55:06,433][14395] Num frames 122900... +[2024-11-07 16:55:06,628][14395] Num frames 123000... +[2024-11-07 16:55:06,768][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:06,769][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:06,878][14395] Num frames 123100... +[2024-11-07 16:55:07,064][14395] Num frames 123200... +[2024-11-07 16:55:07,755][14395] Num frames 123300... +[2024-11-07 16:55:11,298][14395] Num frames 123400... +[2024-11-07 16:55:11,431][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:11,433][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:11,583][14395] Num frames 123500... +[2024-11-07 16:55:11,794][14395] Num frames 123600... +[2024-11-07 16:55:11,983][14395] Num frames 123700... +[2024-11-07 16:55:12,182][14395] Num frames 123800... +[2024-11-07 16:55:12,259][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.034 +[2024-11-07 16:55:12,263][14395] Avg episode reward: 4.434, avg true_objective: 4.034 +[2024-11-07 16:55:12,485][14395] Num frames 123900... +[2024-11-07 16:55:12,686][14395] Num frames 124000... +[2024-11-07 16:55:12,883][14395] Num frames 124100... +[2024-11-07 16:55:13,117][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:13,120][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:13,151][14395] Num frames 124200... +[2024-11-07 16:55:13,337][14395] Num frames 124300... +[2024-11-07 16:55:13,525][14395] Num frames 124400... +[2024-11-07 16:55:13,709][14395] Num frames 124500... +[2024-11-07 16:55:13,911][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:13,914][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:13,976][14395] Num frames 124600... +[2024-11-07 16:55:14,168][14395] Num frames 124700... +[2024-11-07 16:55:14,360][14395] Num frames 124800... +[2024-11-07 16:55:14,571][14395] Num frames 124900... +[2024-11-07 16:55:14,774][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:14,775][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:14,861][14395] Num frames 125000... +[2024-11-07 16:55:15,126][14395] Num frames 125100... +[2024-11-07 16:55:15,349][14395] Num frames 125200... +[2024-11-07 16:55:15,546][14395] Num frames 125300... +[2024-11-07 16:55:15,759][14395] Num frames 125400... +[2024-11-07 16:55:15,965][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.044 +[2024-11-07 16:55:15,967][14395] Avg episode reward: 4.454, avg true_objective: 4.044 +[2024-11-07 16:55:16,026][14395] Num frames 125500... +[2024-11-07 16:55:16,304][14395] Num frames 125600... +[2024-11-07 16:55:16,553][14395] Num frames 125700... +[2024-11-07 16:55:16,809][14395] Num frames 125800... +[2024-11-07 16:55:16,984][14395] Avg episode rewards: #0: 4.467, true rewards: #0: 4.057 +[2024-11-07 16:55:16,987][14395] Avg episode reward: 4.467, avg true_objective: 4.057 +[2024-11-07 16:55:17,095][14395] Num frames 125900... +[2024-11-07 16:55:17,288][14395] Num frames 126000... +[2024-11-07 16:55:17,508][14395] Num frames 126100... +[2024-11-07 16:55:18,092][14395] Num frames 126200... +[2024-11-07 16:55:18,375][14395] Num frames 126300... +[2024-11-07 16:55:18,609][14395] Num frames 126400... +[2024-11-07 16:55:20,293][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.073 +[2024-11-07 16:55:20,294][14395] Avg episode reward: 4.503, avg true_objective: 4.073 +[2024-11-07 16:55:20,488][14395] Num frames 126500... +[2024-11-07 16:55:21,805][14395] Num frames 126600... +[2024-11-07 16:55:22,068][14395] Num frames 126700... +[2024-11-07 16:55:22,311][14395] Num frames 126800... +[2024-11-07 16:55:22,527][14395] Num frames 126900... +[2024-11-07 16:55:22,748][14395] Num frames 127000... +[2024-11-07 16:55:22,825][14395] Avg episode rewards: #0: 4.548, true rewards: #0: 4.098 +[2024-11-07 16:55:22,826][14395] Avg episode reward: 4.548, avg true_objective: 4.098 +[2024-11-07 16:55:23,027][14395] Num frames 127100... +[2024-11-07 16:55:23,268][14395] Num frames 127200... +[2024-11-07 16:55:23,451][14395] Num frames 127300... +[2024-11-07 16:55:23,678][14395] Avg episode rewards: #0: 4.548, true rewards: #0: 4.098 +[2024-11-07 16:55:23,682][14395] Avg episode reward: 4.548, avg true_objective: 4.098 +[2024-11-07 16:55:23,713][14395] Num frames 127400... +[2024-11-07 16:55:23,900][14395] Num frames 127500... +[2024-11-07 16:55:24,093][14395] Num frames 127600... +[2024-11-07 16:55:24,305][14395] Num frames 127700... +[2024-11-07 16:55:24,520][14395] Num frames 127800... +[2024-11-07 16:55:24,664][14395] Avg episode rewards: #0: 4.565, true rewards: #0: 4.105 +[2024-11-07 16:55:24,665][14395] Avg episode reward: 4.565, avg true_objective: 4.105 +[2024-11-07 16:55:24,785][14395] Num frames 127900... +[2024-11-07 16:55:25,011][14395] Num frames 128000... +[2024-11-07 16:55:25,217][14395] Num frames 128100... +[2024-11-07 16:55:25,407][14395] Num frames 128200... +[2024-11-07 16:55:25,574][14395] Avg episode rewards: #0: 4.552, true rewards: #0: 4.101 +[2024-11-07 16:55:25,580][14395] Avg episode reward: 4.552, avg true_objective: 4.101 +[2024-11-07 16:55:25,678][14395] Num frames 128300... +[2024-11-07 16:55:25,876][14395] Num frames 128400... +[2024-11-07 16:55:26,073][14395] Num frames 128500... +[2024-11-07 16:55:26,261][14395] Num frames 128600... +[2024-11-07 16:55:26,449][14395] Num frames 128700... +[2024-11-07 16:55:26,517][14395] Avg episode rewards: #0: 4.532, true rewards: #0: 4.092 +[2024-11-07 16:55:26,521][14395] Avg episode reward: 4.532, avg true_objective: 4.092 +[2024-11-07 16:55:26,718][14395] Num frames 128800... +[2024-11-07 16:55:26,900][14395] Num frames 128900... +[2024-11-07 16:55:27,115][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:55:27,119][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:55:27,175][14395] Num frames 129000... +[2024-11-07 16:55:27,381][14395] Num frames 129100... +[2024-11-07 16:55:27,581][14395] Num frames 129200... +[2024-11-07 16:55:27,773][14395] Num frames 129300... +[2024-11-07 16:55:27,971][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:55:27,973][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:55:28,055][14395] Num frames 129400... +[2024-11-07 16:55:28,252][14395] Num frames 129500... +[2024-11-07 16:55:28,441][14395] Num frames 129600... +[2024-11-07 16:55:28,634][14395] Num frames 129700... +[2024-11-07 16:55:28,694][14395] Avg episode rewards: #0: 4.520, true rewards: #0: 4.080 +[2024-11-07 16:55:28,698][14395] Avg episode reward: 4.520, avg true_objective: 4.080 +[2024-11-07 16:55:28,908][14395] Num frames 129800... +[2024-11-07 16:55:29,098][14395] Num frames 129900... +[2024-11-07 16:55:29,288][14395] Num frames 130000... +[2024-11-07 16:55:29,468][14395] Num frames 130100... +[2024-11-07 16:55:29,557][14395] Avg episode rewards: #0: 4.517, true rewards: #0: 4.077 +[2024-11-07 16:55:29,559][14395] Avg episode reward: 4.517, avg true_objective: 4.077 +[2024-11-07 16:55:29,750][14395] Num frames 130200... +[2024-11-07 16:55:29,967][14395] Num frames 130300... +[2024-11-07 16:55:30,173][14395] Num frames 130400... +[2024-11-07 16:55:30,362][14395] Num frames 130500... +[2024-11-07 16:55:30,422][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.067 +[2024-11-07 16:55:30,426][14395] Avg episode reward: 4.497, avg true_objective: 4.067 +[2024-11-07 16:55:30,620][14395] Num frames 130600... +[2024-11-07 16:55:30,822][14395] Num frames 130700... +[2024-11-07 16:55:31,002][14395] Num frames 130800... +[2024-11-07 16:55:31,073][14395] Avg episode rewards: #0: 4.473, true rewards: #0: 4.053 +[2024-11-07 16:55:31,076][14395] Avg episode reward: 4.473, avg true_objective: 4.053 +[2024-11-07 16:55:31,278][14395] Num frames 130900... +[2024-11-07 16:55:31,464][14395] Num frames 131000... +[2024-11-07 16:55:31,652][14395] Num frames 131100... +[2024-11-07 16:55:31,851][14395] Num frames 131200... +[2024-11-07 16:55:31,950][14395] Avg episode rewards: #0: 4.473, true rewards: #0: 4.053 +[2024-11-07 16:55:31,955][14395] Avg episode reward: 4.473, avg true_objective: 4.053 +[2024-11-07 16:55:32,125][14395] Num frames 131300... +[2024-11-07 16:55:32,313][14395] Num frames 131400... +[2024-11-07 16:55:32,606][14395] Num frames 131500... +[2024-11-07 16:55:32,795][14395] Num frames 131600... +[2024-11-07 16:55:32,863][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.046 +[2024-11-07 16:55:32,865][14395] Avg episode reward: 4.456, avg true_objective: 4.046 +[2024-11-07 16:55:33,067][14395] Num frames 131700... +[2024-11-07 16:55:33,275][14395] Num frames 131800... +[2024-11-07 16:55:33,510][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.037 +[2024-11-07 16:55:33,515][14395] Avg episode reward: 4.457, avg true_objective: 4.037 +[2024-11-07 16:55:33,545][14395] Num frames 131900... +[2024-11-07 16:55:33,738][14395] Num frames 132000... +[2024-11-07 16:55:33,952][14395] Num frames 132100... +[2024-11-07 16:55:34,144][14395] Num frames 132200... +[2024-11-07 16:55:34,346][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.049 +[2024-11-07 16:55:34,349][14395] Avg episode reward: 4.470, avg true_objective: 4.049 +[2024-11-07 16:55:34,407][14395] Num frames 132300... +[2024-11-07 16:55:34,597][14395] Num frames 132400... +[2024-11-07 16:55:34,790][14395] Num frames 132500... +[2024-11-07 16:55:34,985][14395] Num frames 132600... +[2024-11-07 16:55:35,155][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:55:35,157][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 16:55:35,242][14395] Num frames 132700... +[2024-11-07 16:55:35,451][14395] Num frames 132800... +[2024-11-07 16:55:35,645][14395] Num frames 132900... +[2024-11-07 16:55:35,852][14395] Num frames 133000... +[2024-11-07 16:56:34,547][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:56:35,186][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 16:58:32,328][14395] Num frames 133100... +[2024-11-07 16:59:01,673][14395] Num frames 133200... +[2024-11-07 16:59:02,459][14395] Num frames 133300... +[2024-11-07 16:59:02,790][14395] Num frames 133400... +[2024-11-07 16:59:06,979][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:59:07,279][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 17:02:09,363][14395] Num frames 133500... +[2024-11-07 17:11:47,637][14395] Num frames 133600... +[2024-11-07 17:21:28,546][14395] Num frames 133700... +[2024-11-07 17:34:34,423][14395] Num frames 133800... +[2024-11-07 17:37:08,638][14395] Avg episode rewards: #0: 4.437, true rewards: #0: 4.037 +[2024-11-07 17:37:09,069][14395] Avg episode reward: 4.437, avg true_objective: 4.037 +[2024-11-07 17:58:13,270][14395] Num frames 133900... +[2024-11-07 22:48:27,457][40007] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 22:48:27,493][40007] Rollout worker 0 uses device cpu +[2024-11-07 22:48:27,790][40007] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:27,791][40007] InferenceWorker_p0-w0: min num requests: 1 +[2024-11-07 22:48:27,797][40007] Starting all processes... +[2024-11-07 22:48:27,798][40007] Starting process learner_proc0 +[2024-11-07 22:48:27,876][40007] Starting all processes... +[2024-11-07 22:48:27,917][40007] Starting process inference_proc0-0 +[2024-11-07 22:48:27,918][40007] Starting process rollout_proc0 +[2024-11-07 22:48:30,816][40309] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 22:48:30,817][40308] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,817][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,817][40308] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 22:48:30,818][40302] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 22:48:30,891][40308] Num visible devices: 1 +[2024-11-07 22:48:30,891][40302] Num visible devices: 1 +[2024-11-07 22:48:30,931][40302] Starting seed is not provided +[2024-11-07 22:48:30,931][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,931][40302] Initializing actor-critic model on device cuda:0 +[2024-11-07 22:48:30,931][40302] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:48:30,935][40302] RunningMeanStd input shape: (1,) +[2024-11-07 22:48:30,948][40302] ConvEncoder: input_channels=3 +[2024-11-07 22:48:31,957][40302] Conv encoder output size: 512 +[2024-11-07 22:48:31,958][40302] Policy head output size: 512 +[2024-11-07 22:48:32,287][40302] Created Actor Critic model with architecture: +[2024-11-07 22:48:32,287][40302] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 22:48:33,883][40302] Using optimizer +[2024-11-07 22:48:40,573][40302] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 22:48:41,038][40302] Loading model from checkpoint +[2024-11-07 22:48:41,042][40302] Loaded experiment state at self.train_step=4884, self.env_steps=20004864 +[2024-11-07 22:48:41,043][40302] Initialized policy 0 weights for model version 4884 +[2024-11-07 22:48:41,063][40302] LearnerWorker_p0 finished initialization! +[2024-11-07 22:48:41,064][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:41,239][40308] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:48:41,241][40308] RunningMeanStd input shape: (1,) +[2024-11-07 22:48:41,253][40308] ConvEncoder: input_channels=3 +[2024-11-07 22:48:41,364][40308] Conv encoder output size: 512 +[2024-11-07 22:48:41,364][40308] Policy head output size: 512 +[2024-11-07 22:48:41,406][40007] Inference worker 0-0 is ready! +[2024-11-07 22:48:41,408][40007] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 22:48:41,440][40309] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 22:48:43,973][40309] Decorrelating experience for 0 frames... +[2024-11-07 22:48:44,308][40007] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 20004864. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:44,318][40309] Decorrelating experience for 32 frames... +[2024-11-07 22:48:47,781][40007] Heartbeat connected on Batcher_0 +[2024-11-07 22:48:47,785][40007] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 22:48:47,796][40007] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 22:48:49,019][40007] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 22:48:49,308][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 1.0. Samples: 5. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:54,308][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 93.5. Samples: 935. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:54,310][40007] Avg episode reward: [(0, '4.568')] +[2024-11-07 22:48:58,369][40302] Signal inference workers to stop experience collection... +[2024-11-07 22:48:58,375][40308] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 22:48:59,309][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 109.1. Samples: 1636. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:59,310][40007] Avg episode reward: [(0, '4.482')] +[2024-11-07 22:49:04,309][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 102.6. Samples: 2052. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:49:04,314][40007] Avg episode reward: [(0, '4.482')] +[2024-11-07 22:49:04,343][40302] Signal inference workers to resume experience collection... +[2024-11-07 22:49:04,344][40308] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 22:49:04,345][40302] Stopping Batcher_0... +[2024-11-07 22:49:04,345][40302] Loop batcher_evt_loop terminating... +[2024-11-07 22:49:04,358][40007] Component Batcher_0 stopped! +[2024-11-07 22:49:04,360][40308] Weights refcount: 2 0 +[2024-11-07 22:49:04,364][40308] Stopping InferenceWorker_p0-w0... +[2024-11-07 22:49:04,365][40308] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 22:49:04,364][40007] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 22:49:05,054][40309] Stopping RolloutWorker_w0... +[2024-11-07 22:49:05,055][40309] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 22:49:05,054][40007] Component RolloutWorker_w0 stopped! +[2024-11-07 22:49:05,574][40302] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:05,698][40302] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004691_19214336.pth +[2024-11-07 22:49:05,711][40302] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:05,830][40302] Stopping LearnerWorker_p0... +[2024-11-07 22:49:05,831][40302] Loop learner_proc0_evt_loop terminating... +[2024-11-07 22:49:05,830][40007] Component LearnerWorker_p0 stopped! +[2024-11-07 22:49:05,832][40007] Waiting for process learner_proc0 to stop... +[2024-11-07 22:49:07,103][40007] Waiting for process inference_proc0-0 to join... +[2024-11-07 22:49:07,104][40007] Waiting for process rollout_proc0 to join... +[2024-11-07 22:49:07,105][40007] Batcher 0 profile tree view: +batching: 0.1261, releasing_batches: 0.0012 +[2024-11-07 22:49:07,107][40007] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0807 +wait_policy: 0.0000 + wait_policy_total: 2.9407 +one_step: 0.0050 + handle_policy_step: 13.5353 + deserialize: 0.0998, stack: 0.0362, obs_to_device_normalize: 2.7488, forward: 8.5619, send_messages: 0.2222 + prepare_outputs: 1.5741 + to_cpu: 1.2227 +[2024-11-07 22:49:07,108][40007] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 3.7822 +train: 7.8824 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0024, kl_divergence: 0.0241, after_optimizer: 0.5007 + calculate_losses: 1.5819 + losses_init: 0.0000, forward_head: 0.6289, bptt_initial: 0.5272, tail: 0.0389, advantages_returns: 0.0028, losses: 0.2730 + bptt: 0.1021 + bptt_forward_core: 0.1016 + update: 5.7721 + clip: 0.3766 +[2024-11-07 22:49:07,111][40007] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0039, enqueue_policy_requests: 0.2836, env_step: 2.4835, overhead: 0.1487, complete_rollouts: 0.0108 +save_policy_outputs: 0.2191 + split_output_tensors: 0.0756 +[2024-11-07 22:49:07,113][40007] Loop Runner_EvtLoop terminating... +[2024-11-07 22:49:07,114][40007] Runner profile tree view: +main_loop: 39.3175 +[2024-11-07 22:49:07,116][40007] Collected {0: 20013056}, FPS: 208.4 +[2024-11-07 22:49:07,412][40007] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 22:49:07,413][40007] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 22:49:07,414][40007] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 22:49:07,416][40007] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 22:49:07,419][40007] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 22:49:07,420][40007] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 22:49:07,423][40007] Adding new argument 'max_num_episodes'=1000000 that is not in the saved config file! +[2024-11-07 22:49:07,424][40007] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 22:49:07,428][40007] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 22:49:07,429][40007] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 22:49:07,432][40007] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 22:49:07,433][40007] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 22:49:07,435][40007] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 22:49:07,436][40007] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 22:49:07,478][40007] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 22:49:07,485][40007] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:49:07,488][40007] RunningMeanStd input shape: (1,) +[2024-11-07 22:49:07,511][40007] ConvEncoder: input_channels=3 +[2024-11-07 22:49:07,648][40007] Conv encoder output size: 512 +[2024-11-07 22:49:07,649][40007] Policy head output size: 512 +[2024-11-07 22:49:08,368][40007] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:09,832][40007] Num frames 100... +[2024-11-07 22:49:10,163][40007] Num frames 200... +[2024-11-07 22:49:10,486][40007] Num frames 300... +[2024-11-07 22:49:10,797][40007] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 22:49:10,801][40007] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 22:49:10,863][40007] Num frames 400... +[2024-11-07 22:49:11,209][40007] Num frames 500... +[2024-11-07 22:49:11,533][40007] Num frames 600... +[2024-11-07 22:49:11,834][40007] Num frames 700... +[2024-11-07 22:49:12,165][40007] Num frames 800... +[2024-11-07 22:49:12,330][40007] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 22:49:12,332][40007] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 22:49:12,541][40007] Num frames 900... +[2024-11-07 22:49:14,275][40007] Num frames 1000... +[2024-11-07 22:49:14,626][40007] Num frames 1100... +[2024-11-07 22:49:14,877][40007] Num frames 1200... +[2024-11-07 22:49:14,976][40007] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 22:49:14,982][40007] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 22:49:15,177][40007] Num frames 1300... +[2024-11-07 22:49:15,409][40007] Num frames 1400... +[2024-11-07 22:49:15,627][40007] Num frames 1500... +[2024-11-07 22:49:15,866][40007] Num frames 1600... +[2024-11-07 22:49:15,918][40007] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2024-11-07 22:49:15,922][40007] Avg episode reward: 4.250, avg true_objective: 4.000 +[2024-11-07 22:49:16,166][40007] Num frames 1700... +[2024-11-07 22:49:16,408][40007] Num frames 1800... +[2024-11-07 22:49:16,661][40007] Num frames 1900... +[2024-11-07 22:49:16,878][40007] Num frames 2000... +[2024-11-07 22:49:16,971][40007] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2024-11-07 22:49:16,975][40007] Avg episode reward: 4.432, avg true_objective: 4.032 +[2024-11-07 22:49:17,188][40007] Num frames 2100... +[2024-11-07 22:49:17,441][40007] Num frames 2200... +[2024-11-07 22:49:17,673][40007] Num frames 2300... +[2024-11-07 22:49:17,898][40007] Num frames 2400... +[2024-11-07 22:49:17,951][40007] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 +[2024-11-07 22:49:17,956][40007] Avg episode reward: 4.333, avg true_objective: 4.000 +[2024-11-07 22:49:18,182][40007] Num frames 2500... +[2024-11-07 22:49:18,408][40007] Num frames 2600... +[2024-11-07 22:49:18,609][40007] Num frames 2700... +[2024-11-07 22:49:18,797][40007] Num frames 2800... +[2024-11-07 22:49:18,953][40007] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 +[2024-11-07 22:49:18,956][40007] Avg episode reward: 4.497, avg true_objective: 4.069 +[2024-11-07 22:49:19,060][40007] Num frames 2900... +[2024-11-07 22:49:19,231][40007] Num frames 3000... +[2024-11-07 22:49:19,409][40007] Num frames 3100... +[2024-11-07 22:49:19,586][40007] Num frames 3200... +[2024-11-07 22:49:19,744][40007] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2024-11-07 22:49:19,747][40007] Avg episode reward: 4.415, avg true_objective: 4.040 +[2024-11-07 22:49:19,916][40007] Num frames 3300... +[2024-11-07 22:49:20,115][40007] Num frames 3400... +[2024-11-07 22:49:20,307][40007] Num frames 3500... +[2024-11-07 22:49:20,511][40007] Num frames 3600... +[2024-11-07 22:49:20,727][40007] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 +[2024-11-07 22:49:20,732][40007] Avg episode reward: 4.533, avg true_objective: 4.089 +[2024-11-07 22:49:20,784][40007] Num frames 3700... +[2024-11-07 22:49:20,959][40007] Num frames 3800... +[2024-11-07 22:49:21,130][40007] Num frames 3900... +[2024-11-07 22:49:21,305][40007] Num frames 4000... +[2024-11-07 22:49:21,489][40007] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064 +[2024-11-07 22:49:21,495][40007] Avg episode reward: 4.464, avg true_objective: 4.064 +[2024-11-07 22:49:21,572][40007] Num frames 4100... +[2024-11-07 22:49:21,743][40007] Num frames 4200... +[2024-11-07 22:49:21,902][40007] Num frames 4300... +[2024-11-07 22:49:22,066][40007] Num frames 4400... +[2024-11-07 22:49:22,198][40007] Avg episode rewards: #0: 4.407, true rewards: #0: 4.044 +[2024-11-07 22:49:22,200][40007] Avg episode reward: 4.407, avg true_objective: 4.044 +[2024-11-07 22:49:22,291][40007] Num frames 4500... +[2024-11-07 22:49:22,449][40007] Num frames 4600... +[2024-11-07 22:49:22,632][40007] Num frames 4700... +[2024-11-07 22:49:22,816][40007] Num frames 4800... +[2024-11-07 22:49:22,923][40007] Avg episode rewards: #0: 4.360, true rewards: #0: 4.027 +[2024-11-07 22:49:22,928][40007] Avg episode reward: 4.360, avg true_objective: 4.027 +[2024-11-07 22:49:23,091][40007] Num frames 4900... +[2024-11-07 22:49:23,259][40007] Num frames 5000... +[2024-11-07 22:49:23,457][40007] Avg episode rewards: #0: 4.222, true rewards: #0: 3.914 +[2024-11-07 22:49:23,458][40007] Avg episode reward: 4.222, avg true_objective: 3.914 +[2024-11-07 22:49:23,487][40007] Num frames 5100... +[2024-11-07 22:49:23,655][40007] Num frames 5200... +[2024-11-07 22:49:23,808][40007] Num frames 5300... +[2024-11-07 22:49:23,994][40007] Num frames 5400... +[2024-11-07 22:49:24,192][40007] Avg episode rewards: #0: 4.194, true rewards: #0: 3.909 +[2024-11-07 22:49:24,197][40007] Avg episode reward: 4.194, avg true_objective: 3.909 +[2024-11-07 22:49:24,273][40007] Num frames 5500... +[2024-11-07 22:49:24,445][40007] Num frames 5600... +[2024-11-07 22:49:24,632][40007] Num frames 5700... +[2024-11-07 22:49:24,790][40007] Num frames 5800... +[2024-11-07 22:49:24,934][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.904 +[2024-11-07 22:49:24,936][40007] Avg episode reward: 4.171, avg true_objective: 3.904 +[2024-11-07 22:49:25,014][40007] Num frames 5900... +[2024-11-07 22:49:25,186][40007] Num frames 6000... +[2024-11-07 22:49:25,355][40007] Num frames 6100... +[2024-11-07 22:49:25,647][40007] Num frames 6200... +[2024-11-07 22:49:25,767][40007] Avg episode rewards: #0: 4.150, true rewards: #0: 3.900 +[2024-11-07 22:49:25,771][40007] Avg episode reward: 4.150, avg true_objective: 3.900 +[2024-11-07 22:49:25,919][40007] Num frames 6300... +[2024-11-07 22:49:26,140][40007] Num frames 6400... +[2024-11-07 22:49:26,384][40007] Num frames 6500... +[2024-11-07 22:49:26,579][40007] Num frames 6600... +[2024-11-07 22:49:26,674][40007] Avg episode rewards: #0: 4.132, true rewards: #0: 3.896 +[2024-11-07 22:49:26,678][40007] Avg episode reward: 4.132, avg true_objective: 3.896 +[2024-11-07 22:49:26,831][40007] Num frames 6700... +[2024-11-07 22:49:27,056][40007] Num frames 6800... +[2024-11-07 22:49:27,257][40007] Num frames 6900... +[2024-11-07 22:49:27,488][40007] Num frames 7000... +[2024-11-07 22:49:27,568][40007] Avg episode rewards: #0: 4.116, true rewards: #0: 3.893 +[2024-11-07 22:49:27,572][40007] Avg episode reward: 4.116, avg true_objective: 3.893 +[2024-11-07 22:49:27,784][40007] Num frames 7100... +[2024-11-07 22:49:28,056][40007] Num frames 7200... +[2024-11-07 22:49:28,297][40007] Num frames 7300... +[2024-11-07 22:49:28,536][40007] Num frames 7400... +[2024-11-07 22:49:28,724][40007] Avg episode rewards: #0: 4.187, true rewards: #0: 3.924 +[2024-11-07 22:49:28,727][40007] Avg episode reward: 4.187, avg true_objective: 3.924 +[2024-11-07 22:49:28,855][40007] Num frames 7500... +[2024-11-07 22:49:29,044][40007] Num frames 7600... +[2024-11-07 22:49:29,203][40007] Num frames 7700... +[2024-11-07 22:49:29,362][40007] Num frames 7800... +[2024-11-07 22:49:29,508][40007] Num frames 7900... +[2024-11-07 22:49:29,570][40007] Avg episode rewards: #0: 4.252, true rewards: #0: 3.952 +[2024-11-07 22:49:29,571][40007] Avg episode reward: 4.252, avg true_objective: 3.952 +[2024-11-07 22:49:29,725][40007] Num frames 8000... +[2024-11-07 22:49:29,880][40007] Num frames 8100... +[2024-11-07 22:49:30,028][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.886 +[2024-11-07 22:49:30,032][40007] Avg episode reward: 4.171, avg true_objective: 3.886 +[2024-11-07 22:49:30,113][40007] Num frames 8200... +[2024-11-07 22:49:30,274][40007] Num frames 8300... +[2024-11-07 22:49:30,444][40007] Num frames 8400... +[2024-11-07 22:49:30,630][40007] Num frames 8500... +[2024-11-07 22:49:30,788][40007] Avg episode rewards: #0: 4.156, true rewards: #0: 3.884 +[2024-11-07 22:49:30,789][40007] Avg episode reward: 4.156, avg true_objective: 3.884 +[2024-11-07 22:49:30,894][40007] Num frames 8600... +[2024-11-07 22:49:31,096][40007] Num frames 8700... +[2024-11-07 22:49:31,287][40007] Num frames 8800... +[2024-11-07 22:49:31,519][40007] Num frames 8900... +[2024-11-07 22:49:31,648][40007] Avg episode rewards: #0: 4.143, true rewards: #0: 3.882 +[2024-11-07 22:49:31,650][40007] Avg episode reward: 4.143, avg true_objective: 3.882 +[2024-11-07 22:49:31,827][40007] Num frames 9000... +[2024-11-07 22:49:32,076][40007] Num frames 9100... +[2024-11-07 22:49:32,265][40007] Num frames 9200... +[2024-11-07 22:49:32,480][40007] Num frames 9300... +[2024-11-07 22:49:32,556][40007] Avg episode rewards: #0: 4.130, true rewards: #0: 3.880 +[2024-11-07 22:49:32,559][40007] Avg episode reward: 4.130, avg true_objective: 3.880 +[2024-11-07 22:49:32,740][40007] Num frames 9400... +[2024-11-07 22:49:32,964][40007] Num frames 9500... +[2024-11-07 22:49:33,146][40007] Num frames 9600... +[2024-11-07 22:49:33,391][40007] Avg episode rewards: #0: 4.118, true rewards: #0: 3.878 +[2024-11-07 22:49:33,393][40007] Avg episode reward: 4.118, avg true_objective: 3.878 +[2024-11-07 22:49:33,407][40007] Num frames 9700... +[2024-11-07 22:49:33,683][40007] Num frames 9800... +[2024-11-07 22:49:33,884][40007] Num frames 9900... +[2024-11-07 22:49:34,124][40007] Num frames 10000... +[2024-11-07 22:49:34,348][40007] Avg episode rewards: #0: 4.108, true rewards: #0: 3.877 +[2024-11-07 22:49:34,352][40007] Avg episode reward: 4.108, avg true_objective: 3.877 +[2024-11-07 22:49:34,403][40007] Num frames 10100... +[2024-11-07 22:49:34,591][40007] Num frames 10200... +[2024-11-07 22:49:34,756][40007] Num frames 10300... +[2024-11-07 22:49:34,935][40007] Num frames 10400... +[2024-11-07 22:49:35,049][40007] Avg episode rewards: #0: 4.086, true rewards: #0: 3.863 +[2024-11-07 22:49:35,053][40007] Avg episode reward: 4.086, avg true_objective: 3.863 +[2024-11-07 22:49:35,195][40007] Num frames 10500... +[2024-11-07 22:49:35,363][40007] Num frames 10600... +[2024-11-07 22:49:35,559][40007] Avg episode rewards: #0: 4.031, true rewards: #0: 3.817 +[2024-11-07 22:49:35,563][40007] Avg episode reward: 4.031, avg true_objective: 3.817 +[2024-11-07 22:49:35,609][40007] Num frames 10700... +[2024-11-07 22:49:35,781][40007] Num frames 10800... +[2024-11-07 22:49:35,951][40007] Num frames 10900... +[2024-11-07 22:49:36,121][40007] Num frames 11000... +[2024-11-07 22:49:36,306][40007] Avg episode rewards: #0: 4.024, true rewards: #0: 3.818 +[2024-11-07 22:49:36,310][40007] Avg episode reward: 4.024, avg true_objective: 3.818 +[2024-11-07 22:49:36,392][40007] Num frames 11100... +[2024-11-07 22:49:36,566][40007] Num frames 11200... +[2024-11-07 22:49:36,741][40007] Num frames 11300... +[2024-11-07 22:49:36,908][40007] Num frames 11400... +[2024-11-07 22:49:37,071][40007] Num frames 11500... +[2024-11-07 22:49:37,212][40007] Avg episode rewards: #0: 4.117, true rewards: #0: 3.850 +[2024-11-07 22:49:37,215][40007] Avg episode reward: 4.117, avg true_objective: 3.850 +[2024-11-07 22:49:37,315][40007] Num frames 11600... +[2024-11-07 22:49:37,482][40007] Num frames 11700... +[2024-11-07 22:49:37,656][40007] Num frames 11800... +[2024-11-07 22:49:37,824][40007] Num frames 11900... +[2024-11-07 22:49:37,993][40007] Num frames 12000... +[2024-11-07 22:49:38,107][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.881 +[2024-11-07 22:49:38,111][40007] Avg episode reward: 4.171, avg true_objective: 3.881 +[2024-11-07 22:49:38,244][40007] Num frames 12100... +[2024-11-07 22:49:38,400][40007] Num frames 12200... +[2024-11-07 22:49:38,576][40007] Num frames 12300... +[2024-11-07 22:49:38,748][40007] Num frames 12400... +[2024-11-07 22:49:38,880][40007] Avg episode rewards: #0: 4.202, true rewards: #0: 3.890 +[2024-11-07 22:49:38,884][40007] Avg episode reward: 4.202, avg true_objective: 3.890 +[2024-11-07 22:49:38,990][40007] Num frames 12500... +[2024-11-07 22:49:39,174][40007] Num frames 12600... +[2024-11-07 22:49:39,348][40007] Num frames 12700... +[2024-11-07 22:49:39,512][40007] Num frames 12800... +[2024-11-07 22:49:39,622][40007] Avg episode rewards: #0: 4.191, true rewards: #0: 3.888 +[2024-11-07 22:49:39,627][40007] Avg episode reward: 4.191, avg true_objective: 3.888 +[2024-11-07 22:49:39,783][40007] Num frames 12900... +[2024-11-07 22:49:39,977][40007] Num frames 13000... +[2024-11-07 22:49:40,155][40007] Num frames 13100... +[2024-11-07 22:49:40,336][40007] Num frames 13200... +[2024-11-07 22:49:40,423][40007] Avg episode rewards: #0: 4.181, true rewards: #0: 3.887 +[2024-11-07 22:49:40,425][40007] Avg episode reward: 4.181, avg true_objective: 3.887 +[2024-11-07 22:49:40,622][40007] Num frames 13300... +[2024-11-07 22:49:40,845][40007] Num frames 13400... +[2024-11-07 22:49:41,047][40007] Num frames 13500... +[2024-11-07 22:49:41,259][40007] Num frames 13600... +[2024-11-07 22:49:41,468][40007] Num frames 13700... +[2024-11-07 22:49:41,666][40007] Avg episode rewards: #0: 4.274, true rewards: #0: 3.931 +[2024-11-07 22:49:41,668][40007] Avg episode reward: 4.274, avg true_objective: 3.931 +[2024-11-07 22:49:41,783][40007] Num frames 13800... +[2024-11-07 22:49:42,034][40007] Num frames 13900... +[2024-11-07 22:49:42,259][40007] Num frames 14000... +[2024-11-07 22:49:42,469][40007] Avg episode rewards: #0: 4.238, true rewards: #0: 3.905 +[2024-11-07 22:49:42,470][40007] Avg episode reward: 4.238, avg true_objective: 3.905 +[2024-11-07 22:49:42,574][40007] Num frames 14100... +[2024-11-07 22:49:42,825][40007] Num frames 14200... +[2024-11-07 22:49:43,010][40007] Num frames 14300... +[2024-11-07 22:49:43,275][40007] Num frames 14400... +[2024-11-07 22:49:43,418][40007] Avg episode rewards: #0: 4.228, true rewards: #0: 3.903 +[2024-11-07 22:49:43,422][40007] Avg episode reward: 4.228, avg true_objective: 3.903 +[2024-11-07 22:49:43,552][40007] Num frames 14500... +[2024-11-07 22:49:43,754][40007] Num frames 14600... +[2024-11-07 22:49:43,959][40007] Num frames 14700... +[2024-11-07 22:49:44,169][40007] Num frames 14800... +[2024-11-07 22:49:44,409][40007] Avg episode rewards: #0: 4.261, true rewards: #0: 3.918 +[2024-11-07 22:49:44,415][40007] Avg episode reward: 4.261, avg true_objective: 3.918 +[2024-11-07 22:49:44,461][40007] Num frames 14900... +[2024-11-07 22:49:44,656][40007] Num frames 15000... +[2024-11-07 22:49:44,852][40007] Num frames 15100... +[2024-11-07 22:49:45,050][40007] Num frames 15200... +[2024-11-07 22:49:45,138][40007] Avg episode rewards: #0: 4.235, true rewards: #0: 3.902 +[2024-11-07 22:49:45,139][40007] Avg episode reward: 4.235, avg true_objective: 3.902 +[2024-11-07 22:49:45,313][40007] Num frames 15300... +[2024-11-07 22:49:45,510][40007] Num frames 15400... +[2024-11-07 22:49:45,710][40007] Num frames 15500... +[2024-11-07 22:49:45,906][40007] Num frames 15600... +[2024-11-07 22:49:46,081][40007] Avg episode rewards: #0: 4.266, true rewards: #0: 3.916 +[2024-11-07 22:49:46,085][40007] Avg episode reward: 4.266, avg true_objective: 3.916 +[2024-11-07 22:49:46,179][40007] Num frames 15700... +[2024-11-07 22:49:47,795][40007] Num frames 15800... +[2024-11-07 22:49:47,977][40007] Num frames 15900... +[2024-11-07 22:49:48,166][40007] Num frames 16000... +[2024-11-07 22:49:48,316][40007] Avg episode rewards: #0: 4.256, true rewards: #0: 3.914 +[2024-11-07 22:49:48,318][40007] Avg episode reward: 4.256, avg true_objective: 3.914 +[2024-11-07 22:49:48,424][40007] Num frames 16100... +[2024-11-07 22:49:48,615][40007] Num frames 16200... +[2024-11-07 22:49:48,804][40007] Num frames 16300... +[2024-11-07 22:49:48,986][40007] Num frames 16400... +[2024-11-07 22:49:49,112][40007] Avg episode rewards: #0: 4.246, true rewards: #0: 3.912 +[2024-11-07 22:49:49,117][40007] Avg episode reward: 4.246, avg true_objective: 3.912 +[2024-11-07 22:49:49,277][40007] Num frames 16500... +[2024-11-07 22:49:49,471][40007] Num frames 16600... +[2024-11-07 22:49:49,656][40007] Num frames 16700... +[2024-11-07 22:49:49,838][40007] Num frames 16800... +[2024-11-07 22:49:49,983][40007] Avg episode rewards: #0: 4.267, true rewards: #0: 3.918 +[2024-11-07 22:49:49,989][40007] Avg episode reward: 4.267, avg true_objective: 3.918 +[2024-11-07 22:49:50,105][40007] Num frames 16900... +[2024-11-07 22:49:50,302][40007] Num frames 17000... +[2024-11-07 22:49:50,498][40007] Num frames 17100... +[2024-11-07 22:49:50,688][40007] Num frames 17200... +[2024-11-07 22:49:50,810][40007] Avg episode rewards: #0: 4.257, true rewards: #0: 3.916 +[2024-11-07 22:49:50,814][40007] Avg episode reward: 4.257, avg true_objective: 3.916 +[2024-11-07 22:49:50,973][40007] Num frames 17300... +[2024-11-07 22:49:51,167][40007] Num frames 17400... +[2024-11-07 22:49:51,362][40007] Num frames 17500... +[2024-11-07 22:49:51,553][40007] Num frames 17600... +[2024-11-07 22:49:51,640][40007] Avg episode rewards: #0: 4.248, true rewards: #0: 3.915 +[2024-11-07 22:49:51,645][40007] Avg episode reward: 4.248, avg true_objective: 3.915 +[2024-11-07 22:49:51,827][40007] Num frames 17700... +[2024-11-07 22:49:52,031][40007] Num frames 17800... +[2024-11-07 22:49:52,237][40007] Num frames 17900... +[2024-11-07 22:49:52,464][40007] Num frames 18000... +[2024-11-07 22:49:52,516][40007] Avg episode rewards: #0: 4.239, true rewards: #0: 3.913 +[2024-11-07 22:49:52,521][40007] Avg episode reward: 4.239, avg true_objective: 3.913 +[2024-11-07 22:49:52,730][40007] Num frames 18100... +[2024-11-07 22:49:52,940][40007] Num frames 18200... +[2024-11-07 22:49:53,135][40007] Num frames 18300... +[2024-11-07 22:49:53,346][40007] Num frames 18400... +[2024-11-07 22:49:53,507][40007] Avg episode rewards: #0: 4.266, true rewards: #0: 3.925 +[2024-11-07 22:49:53,508][40007] Avg episode reward: 4.266, avg true_objective: 3.925 +[2024-11-07 22:49:53,611][40007] Num frames 18500... +[2024-11-07 22:49:53,812][40007] Num frames 18600... +[2024-11-07 22:49:54,018][40007] Num frames 18700... +[2024-11-07 22:49:54,219][40007] Num frames 18800... +[2024-11-07 22:49:54,341][40007] Avg episode rewards: #0: 4.257, true rewards: #0: 3.923 +[2024-11-07 22:49:54,342][40007] Avg episode reward: 4.257, avg true_objective: 3.923 +[2024-11-07 22:49:54,495][40007] Num frames 18900... +[2024-11-07 22:49:54,676][40007] Num frames 19000... +[2024-11-07 22:49:54,862][40007] Num frames 19100... +[2024-11-07 22:49:55,048][40007] Num frames 19200... +[2024-11-07 22:49:55,236][40007] Num frames 19300... +[2024-11-07 22:49:55,314][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.941 +[2024-11-07 22:49:55,315][40007] Avg episode reward: 4.309, avg true_objective: 3.941 +[2024-11-07 22:49:55,478][40007] Num frames 19400... +[2024-11-07 22:49:55,670][40007] Num frames 19500... +[2024-11-07 22:49:55,859][40007] Num frames 19600... +[2024-11-07 22:49:56,089][40007] Avg episode rewards: #0: 4.299, true rewards: #0: 3.939 +[2024-11-07 22:49:56,092][40007] Avg episode reward: 4.299, avg true_objective: 3.939 +[2024-11-07 22:49:56,108][40007] Num frames 19700... +[2024-11-07 22:49:56,375][40007] Num frames 19800... +[2024-11-07 22:49:56,567][40007] Num frames 19900... +[2024-11-07 22:49:56,764][40007] Num frames 20000... +[2024-11-07 22:49:56,974][40007] Avg episode rewards: #0: 4.290, true rewards: #0: 3.937 +[2024-11-07 22:49:56,978][40007] Avg episode reward: 4.290, avg true_objective: 3.937 +[2024-11-07 22:49:57,036][40007] Num frames 20100... +[2024-11-07 22:49:57,215][40007] Num frames 20200... +[2024-11-07 22:49:57,397][40007] Num frames 20300... +[2024-11-07 22:49:57,572][40007] Num frames 20400... +[2024-11-07 22:49:57,803][40007] Avg episode rewards: #0: 4.326, true rewards: #0: 3.942 +[2024-11-07 22:49:57,806][40007] Avg episode reward: 4.326, avg true_objective: 3.942 +[2024-11-07 22:49:57,827][40007] Num frames 20500... +[2024-11-07 22:49:58,021][40007] Num frames 20600... +[2024-11-07 22:49:58,205][40007] Num frames 20700... +[2024-11-07 22:49:58,407][40007] Num frames 20800... +[2024-11-07 22:49:58,590][40007] Num frames 20900... +[2024-11-07 22:49:58,777][40007] Num frames 21000... +[2024-11-07 22:49:58,909][40007] Avg episode rewards: #0: 4.385, true rewards: #0: 3.970 +[2024-11-07 22:49:58,912][40007] Avg episode reward: 4.385, avg true_objective: 3.970 +[2024-11-07 22:49:59,036][40007] Num frames 21100... +[2024-11-07 22:49:59,217][40007] Num frames 21200... +[2024-11-07 22:49:59,399][40007] Num frames 21300... +[2024-11-07 22:49:59,579][40007] Num frames 21400... +[2024-11-07 22:49:59,681][40007] Avg episode rewards: #0: 4.375, true rewards: #0: 3.967 +[2024-11-07 22:49:59,685][40007] Avg episode reward: 4.375, avg true_objective: 3.967 +[2024-11-07 22:49:59,843][40007] Num frames 21500... +[2024-11-07 22:50:00,022][40007] Num frames 21600... +[2024-11-07 22:50:00,206][40007] Num frames 21700... +[2024-11-07 22:50:00,401][40007] Num frames 21800... +[2024-11-07 22:50:00,608][40007] Avg episode rewards: #0: 4.395, true rewards: #0: 3.977 +[2024-11-07 22:50:00,609][40007] Avg episode reward: 4.395, avg true_objective: 3.977 +[2024-11-07 22:50:00,682][40007] Num frames 21900... +[2024-11-07 22:50:00,914][40007] Num frames 22000... +[2024-11-07 22:50:01,173][40007] Num frames 22100... +[2024-11-07 22:50:01,405][40007] Num frames 22200... +[2024-11-07 22:50:01,598][40007] Avg episode rewards: #0: 4.385, true rewards: #0: 3.974 +[2024-11-07 22:50:01,604][40007] Avg episode reward: 4.385, avg true_objective: 3.974 +[2024-11-07 22:50:01,723][40007] Num frames 22300... +[2024-11-07 22:50:01,937][40007] Num frames 22400... +[2024-11-07 22:50:02,177][40007] Num frames 22500... +[2024-11-07 22:50:02,359][40007] Num frames 22600... +[2024-11-07 22:50:02,545][40007] Num frames 22700... +[2024-11-07 22:50:02,714][40007] Avg episode rewards: #0: 4.433, true rewards: #0: 3.994 +[2024-11-07 22:50:02,719][40007] Avg episode reward: 4.433, avg true_objective: 3.994 +[2024-11-07 22:50:02,802][40007] Num frames 22800... +[2024-11-07 22:50:03,005][40007] Num frames 22900... +[2024-11-07 22:50:03,199][40007] Num frames 23000... +[2024-11-07 22:50:03,441][40007] Num frames 23100... +[2024-11-07 22:50:03,618][40007] Avg episode rewards: #0: 4.423, true rewards: #0: 3.992 +[2024-11-07 22:50:03,619][40007] Avg episode reward: 4.423, avg true_objective: 3.992 +[2024-11-07 22:50:03,728][40007] Num frames 23200... +[2024-11-07 22:50:03,945][40007] Num frames 23300... +[2024-11-07 22:50:04,178][40007] Num frames 23400... +[2024-11-07 22:50:04,396][40007] Num frames 23500... +[2024-11-07 22:50:04,526][40007] Avg episode rewards: #0: 4.413, true rewards: #0: 3.989 +[2024-11-07 22:50:04,528][40007] Avg episode reward: 4.413, avg true_objective: 3.989 +[2024-11-07 22:50:04,679][40007] Num frames 23600... +[2024-11-07 22:50:04,889][40007] Num frames 23700... +[2024-11-07 22:50:05,080][40007] Num frames 23800... +[2024-11-07 22:50:05,272][40007] Num frames 23900... +[2024-11-07 22:50:05,373][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 3.987 +[2024-11-07 22:50:05,374][40007] Avg episode reward: 4.403, avg true_objective: 3.987 +[2024-11-07 22:50:05,534][40007] Num frames 24000... +[2024-11-07 22:50:05,732][40007] Num frames 24100... +[2024-11-07 22:50:05,958][40007] Num frames 24200... +[2024-11-07 22:50:06,207][40007] Avg episode rewards: #0: 4.409, true rewards: #0: 3.982 +[2024-11-07 22:50:06,208][40007] Avg episode reward: 4.409, avg true_objective: 3.982 +[2024-11-07 22:50:06,227][40007] Num frames 24300... +[2024-11-07 22:50:06,460][40007] Num frames 24400... +[2024-11-07 22:50:06,660][40007] Num frames 24500... +[2024-11-07 22:50:06,876][40007] Num frames 24600... +[2024-11-07 22:50:07,094][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.980 +[2024-11-07 22:50:07,097][40007] Avg episode reward: 4.399, avg true_objective: 3.980 +[2024-11-07 22:50:07,176][40007] Num frames 24700... +[2024-11-07 22:50:07,389][40007] Num frames 24800... +[2024-11-07 22:50:07,590][40007] Num frames 24900... +[2024-11-07 22:50:07,795][40007] Num frames 25000... +[2024-11-07 22:50:07,977][40007] Avg episode rewards: #0: 4.390, true rewards: #0: 3.978 +[2024-11-07 22:50:07,982][40007] Avg episode reward: 4.390, avg true_objective: 3.978 +[2024-11-07 22:50:08,086][40007] Num frames 25100... +[2024-11-07 22:50:08,284][40007] Num frames 25200... +[2024-11-07 22:50:08,505][40007] Num frames 25300... +[2024-11-07 22:50:08,714][40007] Num frames 25400... +[2024-11-07 22:50:08,863][40007] Avg episode rewards: #0: 4.382, true rewards: #0: 3.976 +[2024-11-07 22:50:08,867][40007] Avg episode reward: 4.382, avg true_objective: 3.976 +[2024-11-07 22:50:09,002][40007] Num frames 25500... +[2024-11-07 22:50:09,206][40007] Num frames 25600... +[2024-11-07 22:50:09,417][40007] Num frames 25700... +[2024-11-07 22:50:09,615][40007] Num frames 25800... +[2024-11-07 22:50:09,731][40007] Avg episode rewards: #0: 4.374, true rewards: #0: 3.974 +[2024-11-07 22:50:09,735][40007] Avg episode reward: 4.374, avg true_objective: 3.974 +[2024-11-07 22:50:09,905][40007] Num frames 25900... +[2024-11-07 22:50:10,137][40007] Num frames 26000... +[2024-11-07 22:50:10,390][40007] Num frames 26100... +[2024-11-07 22:50:10,698][40007] Num frames 26200... +[2024-11-07 22:50:10,780][40007] Avg episode rewards: #0: 4.365, true rewards: #0: 3.972 +[2024-11-07 22:50:10,784][40007] Avg episode reward: 4.365, avg true_objective: 3.972 +[2024-11-07 22:50:10,980][40007] Num frames 26300... +[2024-11-07 22:50:11,179][40007] Num frames 26400... +[2024-11-07 22:50:11,372][40007] Num frames 26500... +[2024-11-07 22:50:11,562][40007] Num frames 26600... +[2024-11-07 22:50:11,730][40007] Avg episode rewards: #0: 4.382, true rewards: #0: 3.979 +[2024-11-07 22:50:11,734][40007] Avg episode reward: 4.382, avg true_objective: 3.979 +[2024-11-07 22:50:11,837][40007] Num frames 26700... +[2024-11-07 22:50:12,038][40007] Num frames 26800... +[2024-11-07 22:50:12,239][40007] Num frames 26900... +[2024-11-07 22:50:12,495][40007] Num frames 27000... +[2024-11-07 22:50:12,743][40007] Num frames 27100... +[2024-11-07 22:50:12,817][40007] Avg episode rewards: #0: 4.398, true rewards: #0: 3.986 +[2024-11-07 22:50:12,819][40007] Avg episode reward: 4.398, avg true_objective: 3.986 +[2024-11-07 22:50:13,051][40007] Num frames 27200... +[2024-11-07 22:50:13,264][40007] Num frames 27300... +[2024-11-07 22:50:13,473][40007] Num frames 27400... +[2024-11-07 22:50:13,688][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 3.980 +[2024-11-07 22:50:13,689][40007] Avg episode reward: 4.400, avg true_objective: 3.980 +[2024-11-07 22:50:13,787][40007] Num frames 27500... +[2024-11-07 22:50:14,012][40007] Num frames 27600... +[2024-11-07 22:50:14,248][40007] Num frames 27700... +[2024-11-07 22:50:14,512][40007] Num frames 27800... +[2024-11-07 22:50:14,670][40007] Avg episode rewards: #0: 4.392, true rewards: #0: 3.978 +[2024-11-07 22:50:14,672][40007] Avg episode reward: 4.392, avg true_objective: 3.978 +[2024-11-07 22:50:14,824][40007] Num frames 27900... +[2024-11-07 22:50:15,031][40007] Num frames 28000... +[2024-11-07 22:50:15,229][40007] Num frames 28100... +[2024-11-07 22:50:15,458][40007] Num frames 28200... +[2024-11-07 22:50:15,593][40007] Avg episode rewards: #0: 4.384, true rewards: #0: 3.976 +[2024-11-07 22:50:15,594][40007] Avg episode reward: 4.384, avg true_objective: 3.976 +[2024-11-07 22:50:15,775][40007] Num frames 28300... +[2024-11-07 22:50:15,995][40007] Num frames 28400... +[2024-11-07 22:50:16,245][40007] Num frames 28500... +[2024-11-07 22:50:16,455][40007] Num frames 28600... +[2024-11-07 22:50:16,539][40007] Avg episode rewards: #0: 4.377, true rewards: #0: 3.974 +[2024-11-07 22:50:16,543][40007] Avg episode reward: 4.377, avg true_objective: 3.974 +[2024-11-07 22:50:16,746][40007] Num frames 28700... +[2024-11-07 22:50:16,954][40007] Num frames 28800... +[2024-11-07 22:50:17,146][40007] Avg episode rewards: #0: 4.352, true rewards: #0: 3.955 +[2024-11-07 22:50:17,150][40007] Avg episode reward: 4.352, avg true_objective: 3.955 +[2024-11-07 22:50:17,238][40007] Num frames 28900... +[2024-11-07 22:50:17,437][40007] Num frames 29000... +[2024-11-07 22:50:17,645][40007] Num frames 29100... +[2024-11-07 22:50:17,854][40007] Num frames 29200... +[2024-11-07 22:50:18,015][40007] Avg episode rewards: #0: 4.345, true rewards: #0: 3.953 +[2024-11-07 22:50:18,018][40007] Avg episode reward: 4.345, avg true_objective: 3.953 +[2024-11-07 22:50:18,128][40007] Num frames 29300... +[2024-11-07 22:50:18,318][40007] Num frames 29400... +[2024-11-07 22:50:18,507][40007] Num frames 29500... +[2024-11-07 22:50:18,712][40007] Num frames 29600... +[2024-11-07 22:50:18,842][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.951 +[2024-11-07 22:50:18,845][40007] Avg episode reward: 4.338, avg true_objective: 3.951 +[2024-11-07 22:50:19,020][40007] Num frames 29700... +[2024-11-07 22:50:19,208][40007] Num frames 29800... +[2024-11-07 22:50:19,397][40007] Num frames 29900... +[2024-11-07 22:50:19,581][40007] Num frames 30000... +[2024-11-07 22:50:19,680][40007] Avg episode rewards: #0: 4.332, true rewards: #0: 3.950 +[2024-11-07 22:50:19,685][40007] Avg episode reward: 4.332, avg true_objective: 3.950 +[2024-11-07 22:50:19,840][40007] Num frames 30100... +[2024-11-07 22:50:21,472][40007] Num frames 30200... +[2024-11-07 22:50:21,655][40007] Num frames 30300... +[2024-11-07 22:50:21,842][40007] Num frames 30400... +[2024-11-07 22:50:21,904][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.949 +[2024-11-07 22:50:21,907][40007] Avg episode reward: 4.325, avg true_objective: 3.949 +[2024-11-07 22:50:22,100][40007] Num frames 30500... +[2024-11-07 22:50:22,300][40007] Num frames 30600... +[2024-11-07 22:50:22,486][40007] Num frames 30700... +[2024-11-07 22:50:22,711][40007] Avg episode rewards: #0: 4.319, true rewards: #0: 3.947 +[2024-11-07 22:50:22,712][40007] Avg episode reward: 4.319, avg true_objective: 3.947 +[2024-11-07 22:50:22,740][40007] Num frames 30800... +[2024-11-07 22:50:22,933][40007] Num frames 30900... +[2024-11-07 22:50:23,122][40007] Num frames 31000... +[2024-11-07 22:50:23,310][40007] Num frames 31100... +[2024-11-07 22:50:23,499][40007] Avg episode rewards: #0: 4.313, true rewards: #0: 3.946 +[2024-11-07 22:50:23,502][40007] Avg episode reward: 4.313, avg true_objective: 3.946 +[2024-11-07 22:50:23,586][40007] Num frames 31200... +[2024-11-07 22:50:23,773][40007] Num frames 31300... +[2024-11-07 22:50:23,952][40007] Num frames 31400... +[2024-11-07 22:50:24,130][40007] Num frames 31500... +[2024-11-07 22:50:24,305][40007] Avg episode rewards: #0: 4.307, true rewards: #0: 3.944 +[2024-11-07 22:50:24,309][40007] Avg episode reward: 4.307, avg true_objective: 3.944 +[2024-11-07 22:50:24,424][40007] Num frames 31600... +[2024-11-07 22:50:24,618][40007] Num frames 31700... +[2024-11-07 22:50:24,800][40007] Num frames 31800... +[2024-11-07 22:50:24,986][40007] Num frames 31900... +[2024-11-07 22:50:25,179][40007] Num frames 32000... +[2024-11-07 22:50:25,378][40007] Avg episode rewards: #0: 4.342, true rewards: #0: 3.959 +[2024-11-07 22:50:25,382][40007] Avg episode reward: 4.342, avg true_objective: 3.959 +[2024-11-07 22:50:25,472][40007] Num frames 32100... +[2024-11-07 22:50:25,664][40007] Num frames 32200... +[2024-11-07 22:50:25,857][40007] Num frames 32300... +[2024-11-07 22:50:26,048][40007] Num frames 32400... +[2024-11-07 22:50:26,213][40007] Avg episode rewards: #0: 4.336, true rewards: #0: 3.958 +[2024-11-07 22:50:26,214][40007] Avg episode reward: 4.336, avg true_objective: 3.958 +[2024-11-07 22:50:26,308][40007] Num frames 32500... +[2024-11-07 22:50:26,501][40007] Num frames 32600... +[2024-11-07 22:50:26,696][40007] Num frames 32700... +[2024-11-07 22:50:26,886][40007] Num frames 32800... +[2024-11-07 22:50:27,012][40007] Avg episode rewards: #0: 4.330, true rewards: #0: 3.956 +[2024-11-07 22:50:27,016][40007] Avg episode reward: 4.330, avg true_objective: 3.956 +[2024-11-07 22:50:27,172][40007] Num frames 32900... +[2024-11-07 22:50:27,363][40007] Num frames 33000... +[2024-11-07 22:50:27,565][40007] Num frames 33100... +[2024-11-07 22:50:27,755][40007] Num frames 33200... +[2024-11-07 22:50:27,954][40007] Avg episode rewards: #0: 4.343, true rewards: #0: 3.962 +[2024-11-07 22:50:27,957][40007] Avg episode reward: 4.343, avg true_objective: 3.962 +[2024-11-07 22:50:28,007][40007] Num frames 33300... +[2024-11-07 22:50:28,228][40007] Num frames 33400... +[2024-11-07 22:50:28,525][40007] Num frames 33500... +[2024-11-07 22:50:28,692][40007] Num frames 33600... +[2024-11-07 22:50:28,865][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.961 +[2024-11-07 22:50:28,870][40007] Avg episode reward: 4.337, avg true_objective: 3.961 +[2024-11-07 22:50:28,953][40007] Num frames 33700... +[2024-11-07 22:50:29,168][40007] Num frames 33800... +[2024-11-07 22:50:29,394][40007] Num frames 33900... +[2024-11-07 22:50:29,572][40007] Num frames 34000... +[2024-11-07 22:50:29,702][40007] Avg episode rewards: #0: 4.342, true rewards: #0: 3.958 +[2024-11-07 22:50:29,707][40007] Avg episode reward: 4.342, avg true_objective: 3.958 +[2024-11-07 22:50:29,834][40007] Num frames 34100... +[2024-11-07 22:50:30,007][40007] Num frames 34200... +[2024-11-07 22:50:30,187][40007] Num frames 34300... +[2024-11-07 22:50:30,371][40007] Num frames 34400... +[2024-11-07 22:50:30,584][40007] Avg episode rewards: #0: 4.355, true rewards: #0: 3.964 +[2024-11-07 22:50:30,587][40007] Avg episode reward: 4.355, avg true_objective: 3.964 +[2024-11-07 22:50:30,633][40007] Num frames 34500... +[2024-11-07 22:50:30,813][40007] Num frames 34600... +[2024-11-07 22:50:30,993][40007] Num frames 34700... +[2024-11-07 22:50:31,180][40007] Num frames 34800... +[2024-11-07 22:50:31,373][40007] Avg episode rewards: #0: 4.349, true rewards: #0: 3.962 +[2024-11-07 22:50:31,376][40007] Avg episode reward: 4.349, avg true_objective: 3.962 +[2024-11-07 22:50:31,446][40007] Num frames 34900... +[2024-11-07 22:50:31,621][40007] Num frames 35000... +[2024-11-07 22:50:31,796][40007] Num frames 35100... +[2024-11-07 22:50:31,980][40007] Num frames 35200... +[2024-11-07 22:50:32,137][40007] Avg episode rewards: #0: 4.343, true rewards: #0: 3.961 +[2024-11-07 22:50:32,141][40007] Avg episode reward: 4.343, avg true_objective: 3.961 +[2024-11-07 22:50:32,237][40007] Num frames 35300... +[2024-11-07 22:50:32,416][40007] Num frames 35400... +[2024-11-07 22:50:32,596][40007] Num frames 35500... +[2024-11-07 22:50:32,673][40007] Avg episode rewards: #0: 4.323, true rewards: #0: 3.946 +[2024-11-07 22:50:32,676][40007] Avg episode reward: 4.323, avg true_objective: 3.946 +[2024-11-07 22:50:32,861][40007] Num frames 35600... +[2024-11-07 22:50:33,042][40007] Num frames 35700... +[2024-11-07 22:50:33,232][40007] Num frames 35800... +[2024-11-07 22:50:33,435][40007] Num frames 35900... +[2024-11-07 22:50:33,594][40007] Avg episode rewards: #0: 4.336, true rewards: #0: 3.951 +[2024-11-07 22:50:33,598][40007] Avg episode reward: 4.336, avg true_objective: 3.951 +[2024-11-07 22:50:33,694][40007] Num frames 36000... +[2024-11-07 22:50:33,875][40007] Num frames 36100... +[2024-11-07 22:50:34,075][40007] Num frames 36200... +[2024-11-07 22:50:34,275][40007] Num frames 36300... +[2024-11-07 22:50:34,410][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.950 +[2024-11-07 22:50:34,414][40007] Avg episode reward: 4.331, avg true_objective: 3.950 +[2024-11-07 22:50:34,541][40007] Num frames 36400... +[2024-11-07 22:50:34,746][40007] Num frames 36500... +[2024-11-07 22:50:34,950][40007] Num frames 36600... +[2024-11-07 22:50:35,157][40007] Num frames 36700... +[2024-11-07 22:50:35,293][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.949 +[2024-11-07 22:50:35,295][40007] Avg episode reward: 4.325, avg true_objective: 3.949 +[2024-11-07 22:50:35,461][40007] Num frames 36800... +[2024-11-07 22:50:35,710][40007] Num frames 36900... +[2024-11-07 22:50:35,936][40007] Num frames 37000... +[2024-11-07 22:50:36,147][40007] Num frames 37100... +[2024-11-07 22:50:36,226][40007] Avg episode rewards: #0: 4.320, true rewards: #0: 3.948 +[2024-11-07 22:50:36,228][40007] Avg episode reward: 4.320, avg true_objective: 3.948 +[2024-11-07 22:50:36,432][40007] Num frames 37200... +[2024-11-07 22:50:36,636][40007] Num frames 37300... +[2024-11-07 22:50:36,854][40007] Num frames 37400... +[2024-11-07 22:50:37,101][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.947 +[2024-11-07 22:50:37,103][40007] Avg episode reward: 4.315, avg true_objective: 3.947 +[2024-11-07 22:50:37,115][40007] Num frames 37500... +[2024-11-07 22:50:37,306][40007] Num frames 37600... +[2024-11-07 22:50:37,525][40007] Num frames 37700... +[2024-11-07 22:50:37,746][40007] Num frames 37800... +[2024-11-07 22:50:37,974][40007] Avg episode rewards: #0: 4.310, true rewards: #0: 3.946 +[2024-11-07 22:50:37,975][40007] Avg episode reward: 4.310, avg true_objective: 3.946 +[2024-11-07 22:50:38,024][40007] Num frames 37900... +[2024-11-07 22:50:38,269][40007] Num frames 38000... +[2024-11-07 22:50:38,499][40007] Num frames 38100... +[2024-11-07 22:50:38,714][40007] Num frames 38200... +[2024-11-07 22:50:38,905][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:38,909][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:39,010][40007] Num frames 38300... +[2024-11-07 22:50:39,220][40007] Num frames 38400... +[2024-11-07 22:50:39,437][40007] Num frames 38500... +[2024-11-07 22:50:39,634][40007] Num frames 38600... +[2024-11-07 22:50:39,829][40007] Num frames 38700... +[2024-11-07 22:50:39,980][40007] Avg episode rewards: #0: 4.321, true rewards: #0: 3.953 +[2024-11-07 22:50:39,986][40007] Avg episode reward: 4.321, avg true_objective: 3.953 +[2024-11-07 22:50:40,136][40007] Num frames 38800... +[2024-11-07 22:50:40,374][40007] Num frames 38900... +[2024-11-07 22:50:40,586][40007] Num frames 39000... +[2024-11-07 22:50:40,808][40007] Num frames 39100... +[2024-11-07 22:50:40,931][40007] Avg episode rewards: #0: 4.316, true rewards: #0: 3.952 +[2024-11-07 22:50:40,937][40007] Avg episode reward: 4.316, avg true_objective: 3.952 +[2024-11-07 22:50:41,117][40007] Num frames 39200... +[2024-11-07 22:50:41,351][40007] Num frames 39300... +[2024-11-07 22:50:41,648][40007] Num frames 39400... +[2024-11-07 22:50:41,853][40007] Num frames 39500... +[2024-11-07 22:50:42,056][40007] Num frames 39600... +[2024-11-07 22:50:42,132][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.961 +[2024-11-07 22:50:42,136][40007] Avg episode reward: 4.331, avg true_objective: 3.961 +[2024-11-07 22:50:42,355][40007] Num frames 39700... +[2024-11-07 22:50:42,550][40007] Num frames 39800... +[2024-11-07 22:50:42,741][40007] Num frames 39900... +[2024-11-07 22:50:42,985][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.961 +[2024-11-07 22:50:42,989][40007] Avg episode reward: 4.331, avg true_objective: 3.961 +[2024-11-07 22:50:43,031][40007] Num frames 40000... +[2024-11-07 22:50:43,244][40007] Num frames 40100... +[2024-11-07 22:50:43,446][40007] Num frames 40200... +[2024-11-07 22:50:43,629][40007] Num frames 40300... +[2024-11-07 22:50:43,836][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:43,839][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:43,907][40007] Num frames 40400... +[2024-11-07 22:50:44,085][40007] Num frames 40500... +[2024-11-07 22:50:44,264][40007] Num frames 40600... +[2024-11-07 22:50:44,445][40007] Num frames 40700... +[2024-11-07 22:50:44,609][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:44,611][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:44,701][40007] Num frames 40800... +[2024-11-07 22:50:44,910][40007] Num frames 40900... +[2024-11-07 22:50:45,091][40007] Num frames 41000... +[2024-11-07 22:50:45,282][40007] Num frames 41100... +[2024-11-07 22:50:45,432][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:45,433][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:45,544][40007] Num frames 41200... +[2024-11-07 22:50:45,724][40007] Num frames 41300... +[2024-11-07 22:50:45,909][40007] Num frames 41400... +[2024-11-07 22:50:46,077][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:46,083][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:46,169][40007] Num frames 41500... +[2024-11-07 22:50:46,354][40007] Num frames 41600... +[2024-11-07 22:50:46,532][40007] Num frames 41700... +[2024-11-07 22:50:46,713][40007] Num frames 41800... +[2024-11-07 22:50:46,855][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:46,858][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:46,968][40007] Num frames 41900... +[2024-11-07 22:50:47,148][40007] Num frames 42000... +[2024-11-07 22:50:47,328][40007] Num frames 42100... +[2024-11-07 22:50:47,502][40007] Num frames 42200... +[2024-11-07 22:50:47,614][40007] Avg episode rewards: #0: 4.288, true rewards: #0: 3.938 +[2024-11-07 22:50:47,618][40007] Avg episode reward: 4.288, avg true_objective: 3.938 +[2024-11-07 22:50:47,777][40007] Num frames 42300... +[2024-11-07 22:50:47,988][40007] Num frames 42400... +[2024-11-07 22:50:48,187][40007] Num frames 42500... +[2024-11-07 22:50:48,379][40007] Num frames 42600... +[2024-11-07 22:50:48,565][40007] Num frames 42700... +[2024-11-07 22:50:48,644][40007] Avg episode rewards: #0: 4.318, true rewards: #0: 3.948 +[2024-11-07 22:50:48,646][40007] Avg episode reward: 4.318, avg true_objective: 3.948 +[2024-11-07 22:50:48,834][40007] Num frames 42800... +[2024-11-07 22:50:49,026][40007] Num frames 42900... +[2024-11-07 22:50:49,220][40007] Num frames 43000... +[2024-11-07 22:50:49,471][40007] Avg episode rewards: #0: 4.301, true rewards: #0: 3.941 +[2024-11-07 22:50:49,476][40007] Avg episode reward: 4.301, avg true_objective: 3.941 +[2024-11-07 22:50:49,507][40007] Num frames 43100... +[2024-11-07 22:50:49,707][40007] Num frames 43200... +[2024-11-07 22:50:49,918][40007] Num frames 43300... +[2024-11-07 22:50:50,131][40007] Num frames 43400... +[2024-11-07 22:50:50,345][40007] Avg episode rewards: #0: 4.301, true rewards: #0: 3.941 +[2024-11-07 22:50:50,347][40007] Avg episode reward: 4.301, avg true_objective: 3.941 +[2024-11-07 22:50:50,409][40007] Num frames 43500... +[2024-11-07 22:50:50,613][40007] Num frames 43600... +[2024-11-07 22:50:50,804][40007] Num frames 43700... +[2024-11-07 22:50:50,994][40007] Num frames 43800... +[2024-11-07 22:50:51,195][40007] Num frames 43900... +[2024-11-07 22:50:51,390][40007] Num frames 44000... +[2024-11-07 22:50:51,495][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.957 +[2024-11-07 22:50:51,500][40007] Avg episode reward: 4.337, avg true_objective: 3.957 +[2024-11-07 22:50:51,668][40007] Num frames 44100... +[2024-11-07 22:50:51,850][40007] Num frames 44200... +[2024-11-07 22:50:52,032][40007] Num frames 44300... +[2024-11-07 22:50:52,218][40007] Num frames 44400... +[2024-11-07 22:50:52,287][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.957 +[2024-11-07 22:50:52,291][40007] Avg episode reward: 4.337, avg true_objective: 3.957 +[2024-11-07 22:50:52,482][40007] Num frames 44500... +[2024-11-07 22:50:52,672][40007] Num frames 44600... +[2024-11-07 22:50:52,863][40007] Num frames 44700... +[2024-11-07 22:50:53,093][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:53,097][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:53,139][40007] Num frames 44800... +[2024-11-07 22:50:53,328][40007] Num frames 44900... +[2024-11-07 22:50:53,515][40007] Num frames 45000... +[2024-11-07 22:50:55,158][40007] Num frames 45100... +[2024-11-07 22:50:55,351][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:55,354][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:55,431][40007] Num frames 45200... +[2024-11-07 22:50:55,615][40007] Num frames 45300... +[2024-11-07 22:50:55,807][40007] Num frames 45400... +[2024-11-07 22:50:55,992][40007] Num frames 45500... +[2024-11-07 22:50:56,171][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:56,176][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:56,280][40007] Num frames 45600... +[2024-11-07 22:50:56,463][40007] Num frames 45700... +[2024-11-07 22:50:56,651][40007] Num frames 45800... +[2024-11-07 22:50:56,835][40007] Num frames 45900... +[2024-11-07 22:50:56,975][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:56,976][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:57,093][40007] Num frames 46000... +[2024-11-07 22:50:57,281][40007] Num frames 46100... +[2024-11-07 22:50:57,461][40007] Num frames 46200... +[2024-11-07 22:50:57,647][40007] Num frames 46300... +[2024-11-07 22:50:57,758][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:57,762][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:57,923][40007] Num frames 46400... +[2024-11-07 22:50:58,116][40007] Num frames 46500... +[2024-11-07 22:50:58,363][40007] Num frames 46600... +[2024-11-07 22:50:58,570][40007] Num frames 46700... +[2024-11-07 22:50:58,650][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:58,654][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:58,848][40007] Num frames 46800... +[2024-11-07 22:50:59,045][40007] Num frames 46900... +[2024-11-07 22:50:59,243][40007] Num frames 47000... +[2024-11-07 22:50:59,487][40007] Avg episode rewards: #0: 4.334, true rewards: #0: 3.964 +[2024-11-07 22:50:59,492][40007] Avg episode reward: 4.334, avg true_objective: 3.964 +[2024-11-07 22:50:59,516][40007] Num frames 47100... +[2024-11-07 22:50:59,711][40007] Num frames 47200... +[2024-11-07 22:50:59,896][40007] Num frames 47300... +[2024-11-07 22:51:00,086][40007] Num frames 47400... +[2024-11-07 22:51:00,236][40007] Avg episode rewards: #0: 4.324, true rewards: #0: 3.954 +[2024-11-07 22:51:00,239][40007] Avg episode reward: 4.324, avg true_objective: 3.954 +[2024-11-07 22:51:00,357][40007] Num frames 47500... +[2024-11-07 22:51:00,552][40007] Num frames 47600... +[2024-11-07 22:51:00,753][40007] Num frames 47700... +[2024-11-07 22:51:00,936][40007] Num frames 47800... +[2024-11-07 22:51:01,054][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.967 +[2024-11-07 22:51:01,056][40007] Avg episode reward: 4.337, avg true_objective: 3.967 +[2024-11-07 22:51:01,219][40007] Num frames 47900... +[2024-11-07 22:51:01,414][40007] Num frames 48000... +[2024-11-07 22:51:01,676][40007] Num frames 48100... +[2024-11-07 22:51:01,885][40007] Num frames 48200... +[2024-11-07 22:51:01,972][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.967 +[2024-11-07 22:51:01,975][40007] Avg episode reward: 4.337, avg true_objective: 3.967 +[2024-11-07 22:51:02,174][40007] Num frames 48300... +[2024-11-07 22:51:02,396][40007] Num frames 48400... +[2024-11-07 22:51:02,617][40007] Num frames 48500... +[2024-11-07 22:51:02,790][40007] Num frames 48600... +[2024-11-07 22:51:02,902][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:02,905][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:03,049][40007] Num frames 48700... +[2024-11-07 22:51:03,228][40007] Num frames 48800... +[2024-11-07 22:51:03,418][40007] Num frames 48900... +[2024-11-07 22:51:03,597][40007] Num frames 49000... +[2024-11-07 22:51:03,680][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:03,683][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:03,854][40007] Num frames 49100... +[2024-11-07 22:51:04,032][40007] Num frames 49200... +[2024-11-07 22:51:04,270][40007] Num frames 49300... +[2024-11-07 22:51:04,500][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:04,504][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:04,519][40007] Num frames 49400... +[2024-11-07 22:51:04,720][40007] Num frames 49500... +[2024-11-07 22:51:04,905][40007] Num frames 49600... +[2024-11-07 22:51:05,080][40007] Num frames 49700... +[2024-11-07 22:51:05,268][40007] Num frames 49800... +[2024-11-07 22:51:05,416][40007] Avg episode rewards: #0: 4.367, true rewards: #0: 3.977 +[2024-11-07 22:51:05,420][40007] Avg episode reward: 4.367, avg true_objective: 3.977 +[2024-11-07 22:51:05,533][40007] Num frames 49900... +[2024-11-07 22:51:05,733][40007] Num frames 50000... +[2024-11-07 22:51:05,960][40007] Num frames 50100... +[2024-11-07 22:51:06,226][40007] Num frames 50200... +[2024-11-07 22:51:06,353][40007] Avg episode rewards: #0: 4.370, true rewards: #0: 3.980 +[2024-11-07 22:51:06,357][40007] Avg episode reward: 4.370, avg true_objective: 3.980 +[2024-11-07 22:51:06,496][40007] Num frames 50300... +[2024-11-07 22:51:06,672][40007] Num frames 50400... +[2024-11-07 22:51:06,852][40007] Num frames 50500... +[2024-11-07 22:51:07,035][40007] Num frames 50600... +[2024-11-07 22:51:07,249][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.999 +[2024-11-07 22:51:07,256][40007] Avg episode reward: 4.399, avg true_objective: 3.999 +[2024-11-07 22:51:07,310][40007] Num frames 50700... +[2024-11-07 22:51:07,508][40007] Num frames 50800... +[2024-11-07 22:51:07,710][40007] Num frames 50900... +[2024-11-07 22:51:07,926][40007] Num frames 51000... +[2024-11-07 22:51:08,139][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.999 +[2024-11-07 22:51:08,141][40007] Avg episode reward: 4.399, avg true_objective: 3.999 +[2024-11-07 22:51:08,238][40007] Num frames 51100... +[2024-11-07 22:51:08,450][40007] Num frames 51200... +[2024-11-07 22:51:08,691][40007] Num frames 51300... +[2024-11-07 22:51:08,955][40007] Num frames 51400... +[2024-11-07 22:51:09,144][40007] Avg episode rewards: #0: 4.370, true rewards: #0: 3.989 +[2024-11-07 22:51:09,146][40007] Avg episode reward: 4.370, avg true_objective: 3.989 +[2024-11-07 22:51:09,310][40007] Num frames 51500... +[2024-11-07 22:51:09,575][40007] Num frames 51600... +[2024-11-07 22:51:09,792][40007] Num frames 51700... +[2024-11-07 22:51:09,998][40007] Num frames 51800... +[2024-11-07 22:51:10,130][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.980 +[2024-11-07 22:51:10,132][40007] Avg episode reward: 4.350, avg true_objective: 3.980 +[2024-11-07 22:51:10,293][40007] Num frames 51900... +[2024-11-07 22:51:10,502][40007] Num frames 52000... +[2024-11-07 22:51:10,774][40007] Num frames 52100... +[2024-11-07 22:51:10,985][40007] Num frames 52200... +[2024-11-07 22:51:11,079][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.977 +[2024-11-07 22:51:11,080][40007] Avg episode reward: 4.337, avg true_objective: 3.977 +[2024-11-07 22:51:11,275][40007] Num frames 52300... +[2024-11-07 22:51:11,464][40007] Num frames 52400... +[2024-11-07 22:51:11,647][40007] Num frames 52500... +[2024-11-07 22:51:11,914][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.977 +[2024-11-07 22:51:11,915][40007] Avg episode reward: 4.337, avg true_objective: 3.977 +[2024-11-07 22:51:11,919][40007] Num frames 52600... +[2024-11-07 22:51:12,121][40007] Num frames 52700... +[2024-11-07 22:51:12,314][40007] Num frames 52800... +[2024-11-07 22:51:12,519][40007] Num frames 52900... +[2024-11-07 22:51:12,688][40007] Num frames 53000... +[2024-11-07 22:51:12,773][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.980 +[2024-11-07 22:51:12,774][40007] Avg episode reward: 4.350, avg true_objective: 3.980 +[2024-11-07 22:51:12,945][40007] Num frames 53100... +[2024-11-07 22:51:13,130][40007] Num frames 53200... +[2024-11-07 22:51:13,315][40007] Num frames 53300... +[2024-11-07 22:51:13,562][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.964 +[2024-11-07 22:51:13,564][40007] Avg episode reward: 4.314, avg true_objective: 3.964 +[2024-11-07 22:51:13,570][40007] Num frames 53400... +[2024-11-07 22:51:13,783][40007] Num frames 53500... +[2024-11-07 22:51:13,991][40007] Num frames 53600... +[2024-11-07 22:51:14,195][40007] Num frames 53700... +[2024-11-07 22:51:14,404][40007] Avg episode rewards: #0: 4.322, true rewards: #0: 3.972 +[2024-11-07 22:51:14,408][40007] Avg episode reward: 4.322, avg true_objective: 3.972 +[2024-11-07 22:51:14,469][40007] Num frames 53800... +[2024-11-07 22:51:14,659][40007] Num frames 53900... +[2024-11-07 22:51:14,869][40007] Num frames 54000... +[2024-11-07 22:51:15,059][40007] Num frames 54100... +[2024-11-07 22:51:15,257][40007] Avg episode rewards: #0: 4.322, true rewards: #0: 3.972 +[2024-11-07 22:51:15,261][40007] Avg episode reward: 4.322, avg true_objective: 3.972 +[2024-11-07 22:51:15,349][40007] Num frames 54200... +[2024-11-07 22:51:15,542][40007] Num frames 54300... +[2024-11-07 22:51:15,787][40007] Num frames 54400... +[2024-11-07 22:51:16,031][40007] Num frames 54500... +[2024-11-07 22:51:16,300][40007] Avg episode rewards: #0: 4.319, true rewards: #0: 3.969 +[2024-11-07 22:51:16,304][40007] Avg episode reward: 4.319, avg true_objective: 3.969 +[2024-11-07 22:51:16,367][40007] Num frames 54600... +[2024-11-07 22:51:16,601][40007] Num frames 54700... +[2024-11-07 22:51:16,833][40007] Num frames 54800... +[2024-11-07 22:51:17,063][40007] Num frames 54900... +[2024-11-07 22:51:17,257][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.975 +[2024-11-07 22:51:17,259][40007] Avg episode reward: 4.325, avg true_objective: 3.975 +[2024-11-07 22:51:17,331][40007] Num frames 55000... +[2024-11-07 22:51:17,547][40007] Num frames 55100... +[2024-11-07 22:51:17,763][40007] Num frames 55200... +[2024-11-07 22:51:17,977][40007] Num frames 55300... +[2024-11-07 22:51:18,146][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.969 +[2024-11-07 22:51:18,152][40007] Avg episode reward: 4.309, avg true_objective: 3.969 +[2024-11-07 22:51:18,275][40007] Num frames 55400... +[2024-11-07 22:51:18,486][40007] Num frames 55500... +[2024-11-07 22:51:18,704][40007] Num frames 55600... +[2024-11-07 22:51:18,943][40007] Num frames 55700... +[2024-11-07 22:51:19,074][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.969 +[2024-11-07 22:51:19,079][40007] Avg episode reward: 4.309, avg true_objective: 3.969 +[2024-11-07 22:51:19,243][40007] Num frames 55800... +[2024-11-07 22:51:19,441][40007] Num frames 55900... +[2024-11-07 22:51:19,647][40007] Num frames 56000... +[2024-11-07 22:51:19,749][40007] Avg episode rewards: #0: 4.299, true rewards: #0: 3.959 +[2024-11-07 22:51:19,751][40007] Avg episode reward: 4.299, avg true_objective: 3.959 +[2024-11-07 22:51:19,942][40007] Num frames 56100... +[2024-11-07 22:51:20,166][40007] Num frames 56200... +[2024-11-07 22:51:20,386][40007] Num frames 56300... +[2024-11-07 22:51:20,604][40007] Num frames 56400... +[2024-11-07 22:51:20,691][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:20,694][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:20,945][40007] Num frames 56500... +[2024-11-07 22:51:21,195][40007] Num frames 56600... +[2024-11-07 22:51:21,424][40007] Num frames 56700... +[2024-11-07 22:51:21,678][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:21,682][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:21,727][40007] Num frames 56800... +[2024-11-07 22:51:21,938][40007] Num frames 56900... +[2024-11-07 22:51:22,163][40007] Num frames 57000... +[2024-11-07 22:51:22,389][40007] Num frames 57100... +[2024-11-07 22:51:22,604][40007] Num frames 57200... +[2024-11-07 22:51:22,804][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.965 +[2024-11-07 22:51:22,806][40007] Avg episode reward: 4.315, avg true_objective: 3.965 +[2024-11-07 22:51:22,889][40007] Num frames 57300... +[2024-11-07 22:51:23,134][40007] Num frames 57400... +[2024-11-07 22:51:23,370][40007] Num frames 57500... +[2024-11-07 22:51:23,615][40007] Num frames 57600... +[2024-11-07 22:51:23,803][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.965 +[2024-11-07 22:51:23,808][40007] Avg episode reward: 4.315, avg true_objective: 3.965 +[2024-11-07 22:51:23,940][40007] Num frames 57700... +[2024-11-07 22:51:24,183][40007] Num frames 57800... +[2024-11-07 22:51:24,423][40007] Num frames 57900... +[2024-11-07 22:51:24,661][40007] Num frames 58000... +[2024-11-07 22:51:24,884][40007] Avg episode rewards: #0: 4.312, true rewards: #0: 3.962 +[2024-11-07 22:51:24,885][40007] Avg episode reward: 4.312, avg true_objective: 3.962 +[2024-11-07 22:51:24,951][40007] Num frames 58100... +[2024-11-07 22:51:25,168][40007] Num frames 58200... +[2024-11-07 22:51:25,379][40007] Num frames 58300... +[2024-11-07 22:51:25,578][40007] Num frames 58400... +[2024-11-07 22:51:25,756][40007] Avg episode rewards: #0: 4.312, true rewards: #0: 3.962 +[2024-11-07 22:51:25,761][40007] Avg episode reward: 4.312, avg true_objective: 3.962 +[2024-11-07 22:51:25,937][40007] Num frames 58500... +[2024-11-07 22:51:26,144][40007] Num frames 58600... +[2024-11-07 22:51:26,354][40007] Num frames 58700... +[2024-11-07 22:51:26,550][40007] Num frames 58800... +[2024-11-07 22:51:26,688][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:26,694][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:26,829][40007] Num frames 58900... +[2024-11-07 22:51:27,035][40007] Num frames 59000... +[2024-11-07 22:51:27,264][40007] Num frames 59100... +[2024-11-07 22:51:28,963][40007] Num frames 59200... +[2024-11-07 22:51:29,077][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:29,082][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:29,273][40007] Num frames 59300... +[2024-11-07 22:51:29,503][40007] Num frames 59400... +[2024-11-07 22:51:29,721][40007] Num frames 59500... +[2024-11-07 22:51:29,960][40007] Num frames 59600... +[2024-11-07 22:51:30,113][40007] Avg episode rewards: #0: 4.296, true rewards: #0: 3.956 +[2024-11-07 22:51:30,117][40007] Avg episode reward: 4.296, avg true_objective: 3.956 +[2024-11-07 22:51:30,285][40007] Num frames 59700... +[2024-11-07 22:51:30,522][40007] Num frames 59800... +[2024-11-07 22:51:30,761][40007] Num frames 59900... +[2024-11-07 22:51:31,013][40007] Num frames 60000... +[2024-11-07 22:51:31,249][40007] Num frames 60100... +[2024-11-07 22:51:31,430][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.965 +[2024-11-07 22:51:31,434][40007] Avg episode reward: 4.305, avg true_objective: 3.965 +[2024-11-07 22:51:31,570][40007] Num frames 60200... +[2024-11-07 22:51:31,807][40007] Num frames 60300... +[2024-11-07 22:51:32,058][40007] Num frames 60400... +[2024-11-07 22:51:32,280][40007] Num frames 60500... +[2024-11-07 22:51:32,567][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:32,571][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:32,586][40007] Num frames 60600... +[2024-11-07 22:51:32,837][40007] Num frames 60700... +[2024-11-07 22:51:33,090][40007] Num frames 60800... +[2024-11-07 22:51:33,309][40007] Num frames 60900... +[2024-11-07 22:51:33,558][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:33,562][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:33,622][40007] Num frames 61000... +[2024-11-07 22:51:33,855][40007] Num frames 61100... +[2024-11-07 22:51:34,063][40007] Num frames 61200... +[2024-11-07 22:51:34,250][40007] Num frames 61300... +[2024-11-07 22:51:34,435][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:34,438][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:34,542][40007] Num frames 61400... +[2024-11-07 22:51:34,762][40007] Num frames 61500... +[2024-11-07 22:51:34,976][40007] Num frames 61600... +[2024-11-07 22:51:35,191][40007] Num frames 61700... +[2024-11-07 22:51:35,356][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:35,360][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:35,486][40007] Num frames 61800... +[2024-11-07 22:51:35,701][40007] Num frames 61900... +[2024-11-07 22:51:35,920][40007] Num frames 62000... +[2024-11-07 22:51:36,138][40007] Num frames 62100... +[2024-11-07 22:51:36,365][40007] Num frames 62200... +[2024-11-07 22:51:36,563][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:36,568][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:36,663][40007] Num frames 62300... +[2024-11-07 22:51:36,879][40007] Num frames 62400... +[2024-11-07 22:51:37,096][40007] Num frames 62500... +[2024-11-07 22:51:37,316][40007] Num frames 62600... +[2024-11-07 22:51:37,543][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:37,547][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:37,618][40007] Num frames 62700... +[2024-11-07 22:51:37,843][40007] Num frames 62800... +[2024-11-07 22:51:38,073][40007] Num frames 62900... +[2024-11-07 22:51:38,286][40007] Num frames 63000... +[2024-11-07 22:51:38,535][40007] Avg episode rewards: #0: 4.306, true rewards: #0: 3.956 +[2024-11-07 22:51:38,539][40007] Avg episode reward: 4.306, avg true_objective: 3.956 +[2024-11-07 22:51:38,578][40007] Num frames 63100... +[2024-11-07 22:51:38,810][40007] Num frames 63200... +[2024-11-07 22:51:39,027][40007] Num frames 63300... +[2024-11-07 22:51:39,216][40007] Num frames 63400... +[2024-11-07 22:51:39,408][40007] Num frames 63500... +[2024-11-07 22:51:39,634][40007] Num frames 63600... +[2024-11-07 22:51:39,706][40007] Avg episode rewards: #0: 4.349, true rewards: #0: 3.969 +[2024-11-07 22:51:39,710][40007] Avg episode reward: 4.349, avg true_objective: 3.969 +[2024-11-07 22:51:39,962][40007] Num frames 63700... +[2024-11-07 22:51:40,186][40007] Num frames 63800... +[2024-11-07 22:51:40,422][40007] Num frames 63900... +[2024-11-07 22:51:40,664][40007] Num frames 64000... +[2024-11-07 22:51:40,870][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:40,871][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:40,989][40007] Num frames 64100... +[2024-11-07 22:51:41,229][40007] Num frames 64200... +[2024-11-07 22:51:41,441][40007] Num frames 64300... +[2024-11-07 22:51:41,700][40007] Num frames 64400... +[2024-11-07 22:51:41,855][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:41,860][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:42,028][40007] Num frames 64500... +[2024-11-07 22:51:42,268][40007] Num frames 64600... +[2024-11-07 22:51:42,498][40007] Num frames 64700... +[2024-11-07 22:51:42,721][40007] Num frames 64800... +[2024-11-07 22:51:42,835][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:42,836][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:43,025][40007] Num frames 64900... +[2024-11-07 22:51:43,258][40007] Num frames 65000... +[2024-11-07 22:51:43,602][40007] Num frames 65100... +[2024-11-07 22:51:43,866][40007] Num frames 65200... +[2024-11-07 22:51:43,940][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:43,942][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:44,264][40007] Num frames 65300... +[2024-11-07 22:51:44,527][40007] Num frames 65400... +[2024-11-07 22:51:44,752][40007] Num frames 65500... +[2024-11-07 22:51:45,059][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:45,060][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:45,085][40007] Num frames 65600... +[2024-11-07 22:51:45,431][40007] Num frames 65700... +[2024-11-07 22:51:45,800][40007] Num frames 65800... +[2024-11-07 22:51:46,038][40007] Num frames 65900... +[2024-11-07 22:51:46,383][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:46,388][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:46,494][40007] Num frames 66000... +[2024-11-07 22:51:46,694][40007] Num frames 66100... +[2024-11-07 22:51:46,886][40007] Num frames 66200... +[2024-11-07 22:51:47,084][40007] Num frames 66300... +[2024-11-07 22:51:47,256][40007] Avg episode rewards: #0: 4.340, true rewards: #0: 3.970 +[2024-11-07 22:51:47,260][40007] Avg episode reward: 4.340, avg true_objective: 3.970 +[2024-11-07 22:51:47,366][40007] Num frames 66400... +[2024-11-07 22:51:47,578][40007] Num frames 66500... +[2024-11-07 22:51:47,808][40007] Num frames 66600... +[2024-11-07 22:51:48,036][40007] Num frames 66700... +[2024-11-07 22:51:48,188][40007] Avg episode rewards: #0: 4.323, true rewards: #0: 3.963 +[2024-11-07 22:51:48,190][40007] Avg episode reward: 4.323, avg true_objective: 3.963 +[2024-11-07 22:51:48,308][40007] Num frames 66800... +[2024-11-07 22:51:48,499][40007] Num frames 66900... +[2024-11-07 22:51:48,692][40007] Num frames 67000... +[2024-11-07 22:51:48,888][40007] Num frames 67100... +[2024-11-07 22:51:49,000][40007] Avg episode rewards: #0: 4.317, true rewards: #0: 3.967 +[2024-11-07 22:51:49,004][40007] Avg episode reward: 4.317, avg true_objective: 3.967 +[2024-11-07 22:51:49,159][40007] Num frames 67200... +[2024-11-07 22:51:49,367][40007] Num frames 67300... +[2024-11-07 22:51:49,564][40007] Num frames 67400... +[2024-11-07 22:51:49,761][40007] Num frames 67500... +[2024-11-07 22:51:49,841][40007] Avg episode rewards: #0: 4.317, true rewards: #0: 3.967 +[2024-11-07 22:51:49,847][40007] Avg episode reward: 4.317, avg true_objective: 3.967 +[2024-11-07 22:51:50,048][40007] Num frames 67600... +[2024-11-07 22:51:50,240][40007] Num frames 67700... +[2024-11-07 22:51:50,423][40007] Num frames 67800... +[2024-11-07 22:51:50,611][40007] Num frames 67900... +[2024-11-07 22:51:50,720][40007] Avg episode rewards: #0: 4.330, true rewards: #0: 3.970 +[2024-11-07 22:51:50,724][40007] Avg episode reward: 4.330, avg true_objective: 3.970 +[2024-11-07 22:51:50,865][40007] Num frames 68000... +[2024-11-07 22:51:51,059][40007] Num frames 68100... +[2024-11-07 22:51:51,233][40007] Num frames 68200... +[2024-11-07 22:51:51,405][40007] Num frames 68300... +[2024-11-07 22:51:51,598][40007] Avg episode rewards: #0: 4.346, true rewards: #0: 3.976 +[2024-11-07 22:51:51,601][40007] Avg episode reward: 4.346, avg true_objective: 3.976 +[2024-11-07 22:51:51,670][40007] Num frames 68400... +[2024-11-07 22:51:51,853][40007] Num frames 68500... +[2024-11-07 22:51:52,037][40007] Num frames 68600... +[2024-11-07 22:51:52,153][40007] Avg episode rewards: #0: 4.346, true rewards: #0: 3.976 +[2024-11-07 22:51:52,155][40007] Avg episode reward: 4.346, avg true_objective: 3.976 +[2024-11-07 22:51:52,312][40007] Num frames 68700... +[2024-11-07 22:51:52,487][40007] Num frames 68800... +[2024-11-07 22:51:52,665][40007] Num frames 68900... +[2024-11-07 22:51:52,842][40007] Num frames 69000... +[2024-11-07 22:51:52,990][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:52,994][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:53,116][40007] Num frames 69100... +[2024-11-07 22:51:53,296][40007] Num frames 69200... +[2024-11-07 22:51:53,470][40007] Num frames 69300... +[2024-11-07 22:51:53,645][40007] Num frames 69400... +[2024-11-07 22:51:53,758][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:53,763][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:53,905][40007] Num frames 69500... +[2024-11-07 22:51:54,076][40007] Num frames 69600... +[2024-11-07 22:51:54,278][40007] Num frames 69700... +[2024-11-07 22:51:54,488][40007] Num frames 69800... +[2024-11-07 22:51:54,576][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:54,582][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:54,768][40007] Num frames 69900... +[2024-11-07 22:51:54,974][40007] Num frames 70000... +[2024-11-07 22:51:55,173][40007] Num frames 70100... +[2024-11-07 22:51:55,365][40007] Num frames 70200... +[2024-11-07 22:51:55,544][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:55,549][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:55,638][40007] Num frames 70300... +[2024-11-07 22:51:55,831][40007] Num frames 70400... +[2024-11-07 22:51:56,026][40007] Num frames 70500... +[2024-11-07 22:51:56,211][40007] Num frames 70600... +[2024-11-07 22:51:56,357][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:56,361][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:56,489][40007] Num frames 70700... +[2024-11-07 22:51:56,696][40007] Num frames 70800... +[2024-11-07 22:51:56,911][40007] Num frames 70900... +[2024-11-07 22:51:57,106][40007] Num frames 71000... +[2024-11-07 22:51:57,230][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:57,236][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:57,391][40007] Num frames 71100... +[2024-11-07 22:51:57,585][40007] Num frames 71200... +[2024-11-07 22:51:57,781][40007] Num frames 71300... +[2024-11-07 22:51:57,978][40007] Num frames 71400... +[2024-11-07 22:51:58,066][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:58,070][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:58,255][40007] Num frames 71500... +[2024-11-07 22:51:58,451][40007] Num frames 71600... +[2024-11-07 22:51:58,650][40007] Num frames 71700... +[2024-11-07 22:51:58,878][40007] Num frames 71800... +[2024-11-07 22:51:59,078][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:59,081][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:59,169][40007] Num frames 71900... +[2024-11-07 22:51:59,418][40007] Num frames 72000... +[2024-11-07 22:51:59,639][40007] Num frames 72100... +[2024-11-07 22:51:59,849][40007] Num frames 72200... +[2024-11-07 22:52:00,046][40007] Num frames 72300... +[2024-11-07 22:52:00,128][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:52:00,130][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:52:00,374][40007] Num frames 72400... +[2024-11-07 22:52:00,586][40007] Num frames 72500... +[2024-11-07 22:52:00,809][40007] Num frames 72600... +[2024-11-07 22:52:01,030][40007] Num frames 72700... +[2024-11-07 22:52:02,617][40007] Avg episode rewards: #0: 4.379, true rewards: #0: 3.989 +[2024-11-07 22:52:02,620][40007] Avg episode reward: 4.379, avg true_objective: 3.989 +[2024-11-07 22:52:02,801][40007] Num frames 72800... +[2024-11-07 22:52:02,984][40007] Num frames 72900... +[2024-11-07 22:52:03,171][40007] Num frames 73000... +[2024-11-07 22:52:03,360][40007] Num frames 73100... +[2024-11-07 22:52:03,439][40007] Avg episode rewards: #0: 4.363, true rewards: #0: 3.983 +[2024-11-07 22:52:03,443][40007] Avg episode reward: 4.363, avg true_objective: 3.983 +[2024-11-07 22:52:03,630][40007] Num frames 73200... +[2024-11-07 22:52:03,817][40007] Num frames 73300... +[2024-11-07 22:52:04,005][40007] Num frames 73400... +[2024-11-07 22:52:04,243][40007] Avg episode rewards: #0: 4.363, true rewards: #0: 3.983 +[2024-11-07 22:52:04,247][40007] Avg episode reward: 4.363, avg true_objective: 3.983 +[2024-11-07 22:52:04,276][40007] Num frames 73500... +[2024-11-07 22:52:04,493][40007] Num frames 73600... +[2024-11-07 22:52:04,696][40007] Num frames 73700... +[2024-11-07 22:52:04,898][40007] Num frames 73800... +[2024-11-07 22:52:05,107][40007] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 22:52:05,111][40007] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 22:52:05,181][40007] Num frames 73900... +[2024-11-07 22:52:05,380][40007] Num frames 74000... +[2024-11-07 22:52:05,575][40007] Num frames 74100... +[2024-11-07 22:52:05,794][40007] Num frames 74200... +[2024-11-07 22:52:05,982][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.978 +[2024-11-07 22:52:05,987][40007] Avg episode reward: 4.338, avg true_objective: 3.978 +[2024-11-07 22:52:06,085][40007] Num frames 74300... +[2024-11-07 22:52:06,285][40007] Num frames 74400... +[2024-11-07 22:52:06,479][40007] Num frames 74500... +[2024-11-07 22:52:06,670][40007] Num frames 74600... +[2024-11-07 22:52:06,823][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.978 +[2024-11-07 22:52:06,827][40007] Avg episode reward: 4.338, avg true_objective: 3.978 +[2024-11-07 22:52:06,947][40007] Num frames 74700... +[2024-11-07 22:52:07,146][40007] Num frames 74800... +[2024-11-07 22:52:07,335][40007] Num frames 74900... +[2024-11-07 22:52:07,522][40007] Num frames 75000... +[2024-11-07 22:52:07,765][40007] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 22:52:07,768][40007] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 22:52:07,801][40007] Num frames 75100... +[2024-11-07 22:52:08,004][40007] Num frames 75200... +[2024-11-07 22:52:08,197][40007] Num frames 75300... +[2024-11-07 22:52:08,380][40007] Num frames 75400... +[2024-11-07 22:52:08,581][40007] Num frames 75500... +[2024-11-07 22:52:08,723][40007] Avg episode rewards: #0: 4.383, true rewards: #0: 4.003 +[2024-11-07 22:52:08,726][40007] Avg episode reward: 4.383, avg true_objective: 4.003 +[2024-11-07 22:52:08,857][40007] Num frames 75600... +[2024-11-07 22:52:09,031][40007] Num frames 75700... +[2024-11-07 22:52:09,205][40007] Num frames 75800... +[2024-11-07 22:52:09,385][40007] Num frames 75900... +[2024-11-07 22:52:09,654][40007] Avg episode rewards: #0: 4.383, true rewards: #0: 4.003 +[2024-11-07 22:52:09,656][40007] Avg episode reward: 4.383, avg true_objective: 4.003 +[2024-11-07 22:52:09,686][40007] Num frames 76000... +[2024-11-07 22:52:09,882][40007] Num frames 76100... +[2024-11-07 22:52:10,057][40007] Num frames 76200... +[2024-11-07 22:52:10,241][40007] Num frames 76300... +[2024-11-07 22:52:10,416][40007] Num frames 76400... +[2024-11-07 22:52:10,558][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:10,562][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:10,689][40007] Num frames 76500... +[2024-11-07 22:52:10,875][40007] Num frames 76600... +[2024-11-07 22:52:11,076][40007] Num frames 76700... +[2024-11-07 22:52:11,252][40007] Num frames 76800... +[2024-11-07 22:52:11,402][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 22:52:11,405][40007] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 22:52:11,506][40007] Num frames 76900... +[2024-11-07 22:52:11,678][40007] Num frames 77000... +[2024-11-07 22:52:11,870][40007] Num frames 77100... +[2024-11-07 22:52:12,038][40007] Num frames 77200... +[2024-11-07 22:52:12,165][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 22:52:12,171][40007] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 22:52:12,304][40007] Num frames 77300... +[2024-11-07 22:52:12,506][40007] Num frames 77400... +[2024-11-07 22:52:12,727][40007] Num frames 77500... +[2024-11-07 22:52:12,933][40007] Num frames 77600... +[2024-11-07 22:52:13,159][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:13,160][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:13,188][40007] Num frames 77700... +[2024-11-07 22:52:13,399][40007] Num frames 77800... +[2024-11-07 22:52:13,631][40007] Num frames 77900... +[2024-11-07 22:52:13,866][40007] Num frames 78000... +[2024-11-07 22:52:14,069][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:14,071][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:14,143][40007] Num frames 78100... +[2024-11-07 22:52:14,332][40007] Num frames 78200... +[2024-11-07 22:52:14,513][40007] Num frames 78300... +[2024-11-07 22:52:14,692][40007] Num frames 78400... +[2024-11-07 22:52:14,858][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:14,862][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:14,963][40007] Num frames 78500... +[2024-11-07 22:52:15,165][40007] Num frames 78600... +[2024-11-07 22:52:15,396][40007] Num frames 78700... +[2024-11-07 22:52:15,612][40007] Num frames 78800... +[2024-11-07 22:52:15,787][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:15,788][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:15,922][40007] Num frames 78900... +[2024-11-07 22:52:16,120][40007] Num frames 79000... +[2024-11-07 22:52:16,299][40007] Num frames 79100... +[2024-11-07 22:52:16,470][40007] Num frames 79200... +[2024-11-07 22:52:16,568][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:16,572][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:16,736][40007] Num frames 79300... +[2024-11-07 22:52:16,938][40007] Num frames 79400... +[2024-11-07 22:52:17,144][40007] Num frames 79500... +[2024-11-07 22:52:17,341][40007] Num frames 79600... +[2024-11-07 22:52:17,413][40007] Avg episode rewards: #0: 4.380, true rewards: #0: 4.000 +[2024-11-07 22:52:17,416][40007] Avg episode reward: 4.380, avg true_objective: 4.000 +[2024-11-07 22:52:17,601][40007] Num frames 79700... +[2024-11-07 22:52:17,784][40007] Num frames 79800... +[2024-11-07 22:52:17,957][40007] Num frames 79900... +[2024-11-07 22:52:18,126][40007] Num frames 80000... +[2024-11-07 22:52:18,285][40007] Avg episode rewards: #0: 4.396, true rewards: #0: 4.006 +[2024-11-07 22:52:18,289][40007] Avg episode reward: 4.396, avg true_objective: 4.006 +[2024-11-07 22:52:18,382][40007] Num frames 80100... +[2024-11-07 22:52:18,575][40007] Num frames 80200... +[2024-11-07 22:52:18,765][40007] Num frames 80300... +[2024-11-07 22:52:18,944][40007] Num frames 80400... +[2024-11-07 22:52:19,071][40007] Avg episode rewards: #0: 4.396, true rewards: #0: 4.006 +[2024-11-07 22:52:19,074][40007] Avg episode reward: 4.396, avg true_objective: 4.006 +[2024-11-07 22:52:19,204][40007] Num frames 80500... +[2024-11-07 22:52:19,385][40007] Num frames 80600... +[2024-11-07 22:52:19,571][40007] Num frames 80700... +[2024-11-07 22:52:19,745][40007] Num frames 80800... +[2024-11-07 22:52:19,925][40007] Num frames 80900... +[2024-11-07 22:52:20,127][40007] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 22:52:20,131][40007] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 22:52:20,180][40007] Num frames 81000... +[2024-11-07 22:52:20,389][40007] Num frames 81100... +[2024-11-07 22:52:20,631][40007] Num frames 81200... +[2024-11-07 22:52:20,866][40007] Num frames 81300... +[2024-11-07 22:52:21,105][40007] Num frames 81400... +[2024-11-07 22:52:21,229][40007] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 22:52:21,232][40007] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 22:52:21,392][40007] Num frames 81500... +[2024-11-07 22:52:21,596][40007] Num frames 81600... +[2024-11-07 22:52:21,787][40007] Num frames 81700... +[2024-11-07 22:52:21,990][40007] Num frames 81800... +[2024-11-07 22:52:22,137][40007] Avg episode rewards: #0: 4.458, true rewards: #0: 4.038 +[2024-11-07 22:52:22,142][40007] Avg episode reward: 4.458, avg true_objective: 4.038 +[2024-11-07 22:52:22,267][40007] Num frames 81900... +[2024-11-07 22:52:22,465][40007] Num frames 82000... +[2024-11-07 22:52:22,668][40007] Num frames 82100... +[2024-11-07 22:52:22,858][40007] Num frames 82200... +[2024-11-07 22:52:23,086][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:23,089][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:23,116][40007] Num frames 82300... +[2024-11-07 22:52:23,322][40007] Num frames 82400... +[2024-11-07 22:52:23,536][40007] Num frames 82500... +[2024-11-07 22:52:23,768][40007] Num frames 82600... +[2024-11-07 22:52:23,999][40007] Num frames 82700... +[2024-11-07 22:52:24,158][40007] Avg episode rewards: #0: 4.491, true rewards: #0: 4.051 +[2024-11-07 22:52:24,162][40007] Avg episode reward: 4.491, avg true_objective: 4.051 +[2024-11-07 22:52:24,302][40007] Num frames 82800... +[2024-11-07 22:52:24,529][40007] Num frames 82900... +[2024-11-07 22:52:24,750][40007] Num frames 83000... +[2024-11-07 22:52:24,962][40007] Num frames 83100... +[2024-11-07 22:52:25,082][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:25,086][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:25,275][40007] Num frames 83200... +[2024-11-07 22:52:25,492][40007] Num frames 83300... +[2024-11-07 22:52:25,712][40007] Num frames 83400... +[2024-11-07 22:52:25,930][40007] Num frames 83500... +[2024-11-07 22:52:26,011][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:26,016][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:26,230][40007] Num frames 83600... +[2024-11-07 22:52:26,444][40007] Num frames 83700... +[2024-11-07 22:52:26,668][40007] Num frames 83800... +[2024-11-07 22:52:26,933][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:26,937][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:26,968][40007] Num frames 83900... +[2024-11-07 22:52:27,198][40007] Num frames 84000... +[2024-11-07 22:52:27,412][40007] Num frames 84100... +[2024-11-07 22:52:27,645][40007] Num frames 84200... +[2024-11-07 22:52:27,902][40007] Avg episode rewards: #0: 4.426, true rewards: #0: 4.026 +[2024-11-07 22:52:27,906][40007] Avg episode reward: 4.426, avg true_objective: 4.026 +[2024-11-07 22:52:27,982][40007] Num frames 84300... +[2024-11-07 22:52:28,230][40007] Num frames 84400... +[2024-11-07 22:52:28,459][40007] Num frames 84500... +[2024-11-07 22:52:28,707][40007] Num frames 84600... +[2024-11-07 22:52:28,941][40007] Num frames 84700... +[2024-11-07 22:52:29,059][40007] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 22:52:29,060][40007] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 22:52:29,241][40007] Num frames 84800... +[2024-11-07 22:52:29,477][40007] Num frames 84900... +[2024-11-07 22:52:29,713][40007] Num frames 85000... +[2024-11-07 22:52:29,945][40007] Num frames 85100... +[2024-11-07 22:52:30,033][40007] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 22:52:30,038][40007] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 22:52:30,279][40007] Num frames 85200... +[2024-11-07 22:52:30,501][40007] Num frames 85300... +[2024-11-07 22:52:30,738][40007] Num frames 85400... +[2024-11-07 22:52:31,023][40007] Num frames 85500... +[2024-11-07 22:52:31,223][40007] Avg episode rewards: #0: 4.458, true rewards: #0: 4.038 +[2024-11-07 22:52:31,227][40007] Avg episode reward: 4.458, avg true_objective: 4.038 +[2024-11-07 22:52:31,348][40007] Num frames 85600... +[2024-11-07 22:52:31,581][40007] Num frames 85700... +[2024-11-07 22:52:31,833][40007] Num frames 85800... +[2024-11-07 22:52:32,061][40007] Num frames 85900... +[2024-11-07 22:52:32,278][40007] Num frames 86000... +[2024-11-07 22:52:32,487][40007] Num frames 86100... +[2024-11-07 22:52:32,553][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:32,556][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:32,776][40007] Num frames 86200... +[2024-11-07 22:52:32,969][40007] Num frames 86300... +[2024-11-07 22:52:33,169][40007] Num frames 86400... +[2024-11-07 22:52:33,402][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:33,404][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:33,447][40007] Num frames 86500... +[2024-11-07 22:52:33,658][40007] Num frames 86600... +[2024-11-07 22:52:33,871][40007] Num frames 86700... +[2024-11-07 22:52:34,065][40007] Num frames 86800... +[2024-11-07 22:52:34,258][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:34,261][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:34,341][40007] Num frames 86900... +[2024-11-07 22:52:34,539][40007] Num frames 87000... +[2024-11-07 22:52:34,746][40007] Num frames 87100... +[2024-11-07 22:52:36,443][40007] Num frames 87200... +[2024-11-07 22:52:36,615][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:36,617][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:36,729][40007] Num frames 87300... +[2024-11-07 22:52:36,923][40007] Num frames 87400... +[2024-11-07 22:52:37,109][40007] Num frames 87500... +[2024-11-07 22:52:37,302][40007] Num frames 87600... +[2024-11-07 22:52:37,510][40007] Num frames 87700... +[2024-11-07 22:52:37,571][40007] Avg episode rewards: #0: 4.511, true rewards: #0: 4.061 +[2024-11-07 22:52:37,576][40007] Avg episode reward: 4.511, avg true_objective: 4.061 +[2024-11-07 22:52:37,806][40007] Num frames 87800... +[2024-11-07 22:52:38,116][40007] Num frames 87900... +[2024-11-07 22:52:38,324][40007] Num frames 88000... +[2024-11-07 22:52:38,579][40007] Avg episode rewards: #0: 4.504, true rewards: #0: 4.064 +[2024-11-07 22:52:38,580][40007] Avg episode reward: 4.504, avg true_objective: 4.064 +[2024-11-07 22:52:38,625][40007] Num frames 88100... +[2024-11-07 22:52:38,828][40007] Num frames 88200... +[2024-11-07 22:52:39,034][40007] Num frames 88300... +[2024-11-07 22:52:39,246][40007] Num frames 88400... +[2024-11-07 22:52:39,452][40007] Avg episode rewards: #0: 4.504, true rewards: #0: 4.064 +[2024-11-07 22:52:39,458][40007] Avg episode reward: 4.504, avg true_objective: 4.064 +[2024-11-07 22:52:39,553][40007] Num frames 88500... +[2024-11-07 22:52:39,773][40007] Num frames 88600... +[2024-11-07 22:52:39,979][40007] Num frames 88700... +[2024-11-07 22:52:40,174][40007] Num frames 88800... +[2024-11-07 22:52:40,363][40007] Num frames 88900... +[2024-11-07 22:52:40,460][40007] Avg episode rewards: #0: 4.520, true rewards: #0: 4.070 +[2024-11-07 22:52:40,462][40007] Avg episode reward: 4.520, avg true_objective: 4.070 +[2024-11-07 22:52:40,637][40007] Num frames 89000... +[2024-11-07 22:52:40,831][40007] Num frames 89100... +[2024-11-07 22:52:41,019][40007] Num frames 89200... +[2024-11-07 22:52:41,205][40007] Num frames 89300... +[2024-11-07 22:52:41,266][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:41,270][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:41,490][40007] Num frames 89400... +[2024-11-07 22:52:41,695][40007] Num frames 89500... +[2024-11-07 22:52:41,878][40007] Num frames 89600... +[2024-11-07 22:52:42,102][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:42,104][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:42,137][40007] Num frames 89700... +[2024-11-07 22:52:42,336][40007] Num frames 89800... +[2024-11-07 22:52:42,584][40007] Num frames 89900... +[2024-11-07 22:52:42,848][40007] Num frames 90000... +[2024-11-07 22:52:43,069][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:43,072][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:43,184][40007] Num frames 90100... +[2024-11-07 22:52:43,448][40007] Num frames 90200... +[2024-11-07 22:52:43,636][40007] Num frames 90300... +[2024-11-07 22:52:43,827][40007] Num frames 90400... +[2024-11-07 22:52:43,987][40007] Avg episode rewards: #0: 4.491, true rewards: #0: 4.061 +[2024-11-07 22:52:43,990][40007] Avg episode reward: 4.491, avg true_objective: 4.061 +[2024-11-07 22:52:44,095][40007] Num frames 90500... +[2024-11-07 22:52:44,417][40007] Num frames 90600... +[2024-11-07 22:52:44,715][40007] Num frames 90700... +[2024-11-07 22:52:44,810][40007] Avg episode rewards: #0: 4.478, true rewards: #0: 4.048 +[2024-11-07 22:52:44,811][40007] Avg episode reward: 4.478, avg true_objective: 4.048 +[2024-11-07 22:52:45,010][40007] Num frames 90800... +[2024-11-07 22:52:45,276][40007] Num frames 90900... +[2024-11-07 22:52:45,534][40007] Num frames 91000... +[2024-11-07 22:52:45,917][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:45,921][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:45,954][40007] Num frames 91100... +[2024-11-07 22:52:46,264][40007] Num frames 91200... +[2024-11-07 22:52:46,594][40007] Num frames 91300... +[2024-11-07 22:52:46,876][40007] Num frames 91400... +[2024-11-07 22:52:47,218][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:47,220][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:47,273][40007] Num frames 91500... +[2024-11-07 22:52:47,664][40007] Num frames 91600... +[2024-11-07 22:52:47,968][40007] Num frames 91700... +[2024-11-07 22:52:48,434][40007] Num frames 91800... +[2024-11-07 22:52:48,719][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:48,721][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:48,903][40007] Num frames 91900... +[2024-11-07 22:52:49,512][40007] Num frames 92000... +[2024-11-07 22:52:49,927][40007] Num frames 92100... +[2024-11-07 22:52:50,332][40007] Num frames 92200... +[2024-11-07 22:52:50,678][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:50,680][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:50,768][40007] Num frames 92300... +[2024-11-07 22:52:51,219][40007] Num frames 92400... +[2024-11-07 22:52:51,909][40007] Num frames 92500... +[2024-11-07 22:52:52,274][40007] Num frames 92600... +[2024-11-07 22:52:52,605][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:52,607][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:53,344][40007] Num frames 92700... +[2024-11-07 22:52:55,512][40007] Num frames 92800... +[2024-11-07 22:52:55,832][40007] Num frames 92900... +[2024-11-07 22:52:56,086][40007] Num frames 93000... +[2024-11-07 22:52:56,241][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:56,245][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:56,463][40007] Num frames 93100... +[2024-11-07 22:52:56,687][40007] Num frames 93200... +[2024-11-07 22:52:56,898][40007] Num frames 93300... +[2024-11-07 22:52:57,124][40007] Num frames 93400... +[2024-11-07 22:52:57,254][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:57,257][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:57,436][40007] Num frames 93500... +[2024-11-07 22:52:57,930][40007] Num frames 93600... +[2024-11-07 22:52:58,171][40007] Num frames 93700... +[2024-11-07 22:52:58,364][40007] Num frames 93800... +[2024-11-07 22:52:58,461][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:58,466][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:58,654][40007] Num frames 93900... +[2024-11-07 22:52:59,064][40007] Num frames 94000... +[2024-11-07 22:52:59,686][40007] Num frames 94100... +[2024-11-07 22:53:00,140][40007] Num frames 94200... +[2024-11-07 22:53:00,262][40007] Avg episode rewards: #0: 4.465, true rewards: #0: 4.045 +[2024-11-07 22:53:00,263][40007] Avg episode reward: 4.465, avg true_objective: 4.045 +[2024-11-07 22:53:00,407][40007] Num frames 94300... +[2024-11-07 22:53:00,723][40007] Num frames 94400... +[2024-11-07 22:53:00,960][40007] Num frames 94500... +[2024-11-07 22:53:01,205][40007] Num frames 94600... +[2024-11-07 22:53:01,331][40007] Avg episode rewards: #0: 4.465, true rewards: #0: 4.045 +[2024-11-07 22:53:01,333][40007] Avg episode reward: 4.465, avg true_objective: 4.045 +[2024-11-07 22:53:01,558][40007] Num frames 94700... +[2024-11-07 22:53:01,748][40007] Num frames 94800... +[2024-11-07 22:53:02,013][40007] Num frames 94900... +[2024-11-07 22:53:02,238][40007] Num frames 95000... +[2024-11-07 22:53:02,421][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.048 +[2024-11-07 22:53:02,424][40007] Avg episode reward: 4.468, avg true_objective: 4.048 +[2024-11-07 22:53:02,508][40007] Num frames 95100... +[2024-11-07 22:53:02,711][40007] Num frames 95200... +[2024-11-07 22:53:02,929][40007] Num frames 95300... +[2024-11-07 22:53:03,140][40007] Num frames 95400... +[2024-11-07 22:53:03,411][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.048 +[2024-11-07 22:53:03,416][40007] Avg episode reward: 4.468, avg true_objective: 4.048 +[2024-11-07 22:53:03,537][40007] Num frames 95500... +[2024-11-07 22:53:03,738][40007] Num frames 95600... +[2024-11-07 22:53:03,926][40007] Num frames 95700... +[2024-11-07 22:53:04,133][40007] Num frames 95800... +[2024-11-07 22:53:04,311][40007] Avg episode rewards: #0: 4.481, true rewards: #0: 4.051 +[2024-11-07 22:53:04,315][40007] Avg episode reward: 4.481, avg true_objective: 4.051 +[2024-11-07 22:53:04,405][40007] Num frames 95900... +[2024-11-07 22:53:04,638][40007] Num frames 96000... +[2024-11-07 22:53:04,848][40007] Num frames 96100... +[2024-11-07 22:53:05,052][40007] Num frames 96200... +[2024-11-07 22:53:05,205][40007] Avg episode rewards: #0: 4.481, true rewards: #0: 4.051 +[2024-11-07 22:53:05,206][40007] Avg episode reward: 4.481, avg true_objective: 4.051 +[2024-11-07 22:53:05,317][40007] Num frames 96300... +[2024-11-07 22:53:05,517][40007] Num frames 96400... +[2024-11-07 22:53:05,716][40007] Num frames 96500... +[2024-11-07 22:53:05,915][40007] Num frames 96600... +[2024-11-07 22:53:06,202][40007] Num frames 96700... +[2024-11-07 22:53:06,409][40007] Num frames 96800... +[2024-11-07 22:53:06,512][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:06,514][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:06,678][40007] Num frames 96900... +[2024-11-07 22:53:06,906][40007] Num frames 97000... +[2024-11-07 22:53:07,104][40007] Num frames 97100... +[2024-11-07 22:53:07,315][40007] Num frames 97200... +[2024-11-07 22:53:07,389][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:07,391][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:07,574][40007] Num frames 97300... +[2024-11-07 22:53:07,758][40007] Num frames 97400... +[2024-11-07 22:53:07,970][40007] Num frames 97500... +[2024-11-07 22:53:08,229][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:08,231][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:08,266][40007] Num frames 97600... +[2024-11-07 22:53:08,485][40007] Num frames 97700... +[2024-11-07 22:53:10,404][40007] Num frames 97800... +[2024-11-07 22:53:10,638][40007] Num frames 97900... +[2024-11-07 22:53:10,858][40007] Num frames 98000... +[2024-11-07 22:53:11,002][40007] Avg episode rewards: #0: 4.527, true rewards: #0: 4.077 +[2024-11-07 22:53:11,007][40007] Avg episode reward: 4.527, avg true_objective: 4.077 +[2024-11-07 22:53:11,165][40007] Num frames 98100... +[2024-11-07 22:53:11,381][40007] Num frames 98200... +[2024-11-07 22:53:11,585][40007] Num frames 98300... +[2024-11-07 22:53:11,778][40007] Num frames 98400... +[2024-11-07 22:53:12,077][40007] Avg episode rewards: #0: 4.543, true rewards: #0: 4.083 +[2024-11-07 22:53:12,083][40007] Avg episode reward: 4.543, avg true_objective: 4.083 +[2024-11-07 22:53:12,126][40007] Num frames 98500... +[2024-11-07 22:53:12,368][40007] Num frames 98600... +[2024-11-07 22:53:12,606][40007] Num frames 98700... +[2024-11-07 22:53:12,822][40007] Num frames 98800... +[2024-11-07 22:53:13,046][40007] Num frames 98900... +[2024-11-07 22:53:13,179][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:13,181][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:13,328][40007] Num frames 99000... +[2024-11-07 22:53:13,587][40007] Num frames 99100... +[2024-11-07 22:53:13,792][40007] Num frames 99200... +[2024-11-07 22:53:13,996][40007] Num frames 99300... +[2024-11-07 22:53:14,092][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:14,094][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:14,275][40007] Num frames 99400... +[2024-11-07 22:53:14,479][40007] Num frames 99500... +[2024-11-07 22:53:14,680][40007] Num frames 99600... +[2024-11-07 22:53:14,871][40007] Num frames 99700... +[2024-11-07 22:53:14,933][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:14,935][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:15,156][40007] Num frames 99800... +[2024-11-07 22:53:15,357][40007] Num frames 99900... +[2024-11-07 22:53:15,544][40007] Num frames 100000... +[2024-11-07 22:53:15,802][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:15,806][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:15,854][40007] Num frames 100100... +[2024-11-07 22:53:16,056][40007] Num frames 100200... +[2024-11-07 22:53:16,307][40007] Num frames 100300... +[2024-11-07 22:53:16,514][40007] Num frames 100400... +[2024-11-07 22:53:16,731][40007] Num frames 100500... +[2024-11-07 22:53:16,866][40007] Avg episode rewards: #0: 4.550, true rewards: #0: 4.090 +[2024-11-07 22:53:16,871][40007] Avg episode reward: 4.550, avg true_objective: 4.090 +[2024-11-07 22:53:17,136][40007] Num frames 100600... +[2024-11-07 22:53:17,487][40007] Num frames 100700... +[2024-11-07 22:53:18,563][40007] Num frames 100800... +[2024-11-07 22:53:19,121][40007] Num frames 100900... +[2024-11-07 22:53:19,588][40007] Avg episode rewards: #0: 4.517, true rewards: #0: 4.077 +[2024-11-07 22:53:19,590][40007] Avg episode reward: 4.517, avg true_objective: 4.077 +[2024-11-07 22:53:20,202][40007] Num frames 101000... +[2024-11-07 22:53:20,461][40007] Num frames 101100... +[2024-11-07 22:53:20,726][40007] Num frames 101200... +[2024-11-07 22:53:20,961][40007] Num frames 101300... +[2024-11-07 22:53:21,028][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:21,030][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:21,274][40007] Num frames 101400... +[2024-11-07 22:53:21,503][40007] Num frames 101500... +[2024-11-07 22:53:21,718][40007] Num frames 101600... +[2024-11-07 22:53:22,188][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:22,190][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:22,248][40007] Num frames 101700... +[2024-11-07 22:53:22,518][40007] Num frames 101800... +[2024-11-07 22:53:22,781][40007] Num frames 101900... +[2024-11-07 22:53:23,035][40007] Num frames 102000... +[2024-11-07 22:53:23,592][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:23,594][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:23,787][40007] Num frames 102100... +[2024-11-07 22:53:24,282][40007] Num frames 102200... +[2024-11-07 22:53:24,572][40007] Num frames 102300... +[2024-11-07 22:53:24,934][40007] Num frames 102400... +[2024-11-07 22:53:25,170][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:25,172][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:25,293][40007] Num frames 102500... +[2024-11-07 22:53:25,514][40007] Num frames 102600... +[2024-11-07 22:53:25,702][40007] Num frames 102700... +[2024-11-07 22:53:25,884][40007] Num frames 102800... +[2024-11-07 22:53:26,022][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.058 +[2024-11-07 22:53:26,026][40007] Avg episode reward: 4.468, avg true_objective: 4.058 +[2024-11-07 22:53:26,162][40007] Num frames 102900... +[2024-11-07 22:53:26,363][40007] Num frames 103000... +[2024-11-07 22:53:26,560][40007] Num frames 103100... +[2024-11-07 22:53:26,761][40007] Num frames 103200... +[2024-11-07 22:53:26,866][40007] Avg episode rewards: #0: 4.454, true rewards: #0: 4.054 +[2024-11-07 22:53:26,869][40007] Avg episode reward: 4.454, avg true_objective: 4.054 +[2024-11-07 22:53:27,061][40007] Num frames 103300... +[2024-11-07 22:53:27,334][40007] Num frames 103400... +[2024-11-07 22:53:27,630][40007] Num frames 103500... +[2024-11-07 22:53:27,886][40007] Num frames 103600... +[2024-11-07 22:53:28,131][40007] Avg episode rewards: #0: 4.448, true rewards: #0: 4.058 +[2024-11-07 22:53:28,134][40007] Avg episode reward: 4.448, avg true_objective: 4.058 +[2024-11-07 22:53:28,212][40007] Num frames 103700... +[2024-11-07 22:53:28,414][40007] Num frames 103800... +[2024-11-07 22:53:28,622][40007] Num frames 103900... +[2024-11-07 22:53:28,856][40007] Num frames 104000... +[2024-11-07 22:53:29,076][40007] Num frames 104100... +[2024-11-07 22:53:29,227][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:29,231][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:29,353][40007] Num frames 104200... +[2024-11-07 22:53:29,532][40007] Num frames 104300... +[2024-11-07 22:53:29,743][40007] Num frames 104400... +[2024-11-07 22:53:30,021][40007] Num frames 104500... +[2024-11-07 22:53:30,168][40007] Avg episode rewards: #0: 4.408, true rewards: #0: 4.048 +[2024-11-07 22:53:30,171][40007] Avg episode reward: 4.408, avg true_objective: 4.048 +[2024-11-07 22:53:30,346][40007] Num frames 104600... +[2024-11-07 22:53:30,606][40007] Num frames 104700... +[2024-11-07 22:53:30,828][40007] Num frames 104800... +[2024-11-07 22:53:31,089][40007] Num frames 104900... +[2024-11-07 22:53:31,342][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:31,347][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:31,416][40007] Num frames 105000... +[2024-11-07 22:53:31,657][40007] Num frames 105100... +[2024-11-07 22:53:31,895][40007] Num frames 105200... +[2024-11-07 22:53:32,146][40007] Num frames 105300... +[2024-11-07 22:53:32,359][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:32,361][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:32,461][40007] Num frames 105400... +[2024-11-07 22:53:32,717][40007] Num frames 105500... +[2024-11-07 22:53:32,987][40007] Num frames 105600... +[2024-11-07 22:53:33,275][40007] Num frames 105700... +[2024-11-07 22:53:33,435][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:33,438][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:33,639][40007] Num frames 105800... +[2024-11-07 22:53:33,864][40007] Num frames 105900... +[2024-11-07 22:53:34,093][40007] Num frames 106000... +[2024-11-07 22:53:34,405][40007] Num frames 106100... +[2024-11-07 22:53:34,568][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:34,572][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:34,803][40007] Num frames 106200... +[2024-11-07 22:53:35,091][40007] Num frames 106300... +[2024-11-07 22:53:35,435][40007] Num frames 106400... +[2024-11-07 22:53:36,061][40007] Num frames 106500... +[2024-11-07 22:53:36,175][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:36,179][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:36,376][40007] Num frames 106600... +[2024-11-07 22:53:36,647][40007] Num frames 106700... +[2024-11-07 22:53:37,405][40007] Num frames 106800... +[2024-11-07 22:53:37,643][40007] Num frames 106900... +[2024-11-07 22:53:37,863][40007] Avg episode rewards: #0: 4.441, true rewards: #0: 4.061 +[2024-11-07 22:53:37,869][40007] Avg episode reward: 4.441, avg true_objective: 4.061 +[2024-11-07 22:53:37,983][40007] Num frames 107000... +[2024-11-07 22:53:38,221][40007] Num frames 107100... +[2024-11-07 22:53:38,486][40007] Num frames 107200... +[2024-11-07 22:53:38,709][40007] Num frames 107300... +[2024-11-07 22:53:38,925][40007] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064 +[2024-11-07 22:53:38,928][40007] Avg episode reward: 4.464, avg true_objective: 4.064 +[2024-11-07 22:53:38,991][40007] Num frames 107400... +[2024-11-07 22:53:39,193][40007] Num frames 107500... +[2024-11-07 22:53:39,393][40007] Num frames 107600... +[2024-11-07 22:53:39,597][40007] Num frames 107700... +[2024-11-07 22:53:39,796][40007] Num frames 107800... +[2024-11-07 22:53:39,925][40007] Avg episode rewards: #0: 4.480, true rewards: #0: 4.070 +[2024-11-07 22:53:39,935][40007] Avg episode reward: 4.480, avg true_objective: 4.070 +[2024-11-07 22:53:40,085][40007] Num frames 107900... +[2024-11-07 22:53:40,283][40007] Num frames 108000... +[2024-11-07 22:53:40,573][40007] Num frames 108100... +[2024-11-07 22:53:40,849][40007] Num frames 108200... +[2024-11-07 22:53:41,115][40007] Avg episode rewards: #0: 4.497, true rewards: #0: 4.077 +[2024-11-07 22:53:41,118][40007] Avg episode reward: 4.497, avg true_objective: 4.077 +[2024-11-07 22:53:41,183][40007] Num frames 108300... +[2024-11-07 22:53:41,390][40007] Num frames 108400... +[2024-11-07 22:53:41,583][40007] Num frames 108500... +[2024-11-07 22:53:41,775][40007] Num frames 108600... +[2024-11-07 22:53:41,969][40007] Avg episode rewards: #0: 4.484, true rewards: #0: 4.074 +[2024-11-07 22:53:41,975][40007] Avg episode reward: 4.484, avg true_objective: 4.074 +[2024-11-07 22:53:42,076][40007] Num frames 108700... +[2024-11-07 22:53:42,274][40007] Num frames 108800... +[2024-11-07 22:53:42,498][40007] Num frames 108900... +[2024-11-07 22:53:44,138][40007] Num frames 109000... +[2024-11-07 22:53:44,289][40007] Avg episode rewards: #0: 4.467, true rewards: #0: 4.067 +[2024-11-07 22:53:44,290][40007] Avg episode reward: 4.467, avg true_objective: 4.067 +[2024-11-07 22:53:44,437][40007] Num frames 109100... +[2024-11-07 22:53:44,813][40007] Num frames 109200... +[2024-11-07 22:53:45,141][40007] Num frames 109300... +[2024-11-07 22:53:45,403][40007] Num frames 109400... +[2024-11-07 22:53:45,969][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:53:45,971][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:53:45,990][40007] Num frames 109500... +[2024-11-07 22:53:46,251][40007] Num frames 109600... +[2024-11-07 22:53:46,561][40007] Num frames 109700... +[2024-11-07 22:53:47,056][40007] Num frames 109800... +[2024-11-07 22:53:47,460][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:47,461][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:47,540][40007] Num frames 109900... +[2024-11-07 22:53:47,970][40007] Num frames 110000... +[2024-11-07 22:53:48,288][40007] Num frames 110100... +[2024-11-07 22:53:48,570][40007] Num frames 110200... +[2024-11-07 22:53:48,803][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:48,809][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:48,918][40007] Num frames 110300... +[2024-11-07 22:53:49,153][40007] Num frames 110400... +[2024-11-07 22:53:49,354][40007] Num frames 110500... +[2024-11-07 22:53:49,684][40007] Num frames 110600... +[2024-11-07 22:53:49,842][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:49,845][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:50,008][40007] Num frames 110700... +[2024-11-07 22:53:50,227][40007] Num frames 110800... +[2024-11-07 22:53:50,436][40007] Num frames 110900... +[2024-11-07 22:53:50,672][40007] Num frames 111000... +[2024-11-07 22:53:50,799][40007] Avg episode rewards: #0: 4.467, true rewards: #0: 4.077 +[2024-11-07 22:53:50,801][40007] Avg episode reward: 4.467, avg true_objective: 4.077 +[2024-11-07 22:53:50,976][40007] Num frames 111100... +[2024-11-07 22:53:51,244][40007] Num frames 111200... +[2024-11-07 22:53:51,471][40007] Num frames 111300... +[2024-11-07 22:53:51,716][40007] Num frames 111400... +[2024-11-07 22:53:51,962][40007] Num frames 111500... +[2024-11-07 22:53:52,224][40007] Num frames 111600... +[2024-11-07 22:53:52,383][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:52,385][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:52,517][40007] Num frames 111700... +[2024-11-07 22:53:52,756][40007] Num frames 111800... +[2024-11-07 22:53:53,008][40007] Num frames 111900... +[2024-11-07 22:53:53,278][40007] Num frames 112000... +[2024-11-07 22:53:53,399][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:53,400][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:53,614][40007] Num frames 112100... +[2024-11-07 22:53:53,859][40007] Num frames 112200... +[2024-11-07 22:53:54,159][40007] Num frames 112300... +[2024-11-07 22:53:54,451][40007] Num frames 112400... +[2024-11-07 22:53:54,523][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:54,526][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:54,788][40007] Num frames 112500... +[2024-11-07 22:53:55,103][40007] Num frames 112600... +[2024-11-07 22:53:55,395][40007] Num frames 112700... +[2024-11-07 22:53:55,702][40007] Avg episode rewards: #0: 4.503, true rewards: #0: 4.093 +[2024-11-07 22:53:55,707][40007] Avg episode reward: 4.503, avg true_objective: 4.093 +[2024-11-07 22:53:55,768][40007] Num frames 112800... +[2024-11-07 22:53:56,005][40007] Num frames 112900... +[2024-11-07 22:53:56,259][40007] Num frames 113000... +[2024-11-07 22:53:56,479][40007] Num frames 113100... +[2024-11-07 22:53:56,745][40007] Avg episode rewards: #0: 4.486, true rewards: #0: 4.086 +[2024-11-07 22:53:56,747][40007] Avg episode reward: 4.486, avg true_objective: 4.086 +[2024-11-07 22:53:56,798][40007] Num frames 113200... +[2024-11-07 22:53:56,996][40007] Num frames 113300... +[2024-11-07 22:53:57,232][40007] Num frames 113400... +[2024-11-07 22:53:57,467][40007] Num frames 113500... +[2024-11-07 22:53:57,696][40007] Num frames 113600... +[2024-11-07 22:53:57,940][40007] Num frames 113700... +[2024-11-07 22:53:58,050][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:58,056][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:58,276][40007] Num frames 113800... +[2024-11-07 22:53:58,501][40007] Num frames 113900... +[2024-11-07 22:53:58,755][40007] Num frames 114000... +[2024-11-07 22:53:58,885][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:53:58,889][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:53:59,052][40007] Num frames 114100... +[2024-11-07 22:53:59,301][40007] Num frames 114200... +[2024-11-07 22:53:59,517][40007] Num frames 114300... +[2024-11-07 22:53:59,765][40007] Num frames 114400... +[2024-11-07 22:53:59,857][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:53:59,862][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:00,063][40007] Num frames 114500... +[2024-11-07 22:54:00,293][40007] Num frames 114600... +[2024-11-07 22:54:00,523][40007] Num frames 114700... +[2024-11-07 22:54:00,738][40007] Num frames 114800... +[2024-11-07 22:54:00,944][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:00,946][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:01,082][40007] Num frames 114900... +[2024-11-07 22:54:01,369][40007] Num frames 115000... +[2024-11-07 22:54:01,632][40007] Num frames 115100... +[2024-11-07 22:54:01,907][40007] Num frames 115200... +[2024-11-07 22:54:02,109][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:02,112][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:02,255][40007] Num frames 115300... +[2024-11-07 22:54:02,484][40007] Num frames 115400... +[2024-11-07 22:54:02,720][40007] Num frames 115500... +[2024-11-07 22:54:02,958][40007] Num frames 115600... +[2024-11-07 22:54:03,151][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:03,152][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:03,326][40007] Num frames 115700... +[2024-11-07 22:54:03,565][40007] Num frames 115800... +[2024-11-07 22:54:03,811][40007] Num frames 115900... +[2024-11-07 22:54:04,067][40007] Num frames 116000... +[2024-11-07 22:54:04,163][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:54:04,165][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:04,382][40007] Num frames 116100... +[2024-11-07 22:54:04,620][40007] Num frames 116200... +[2024-11-07 22:54:04,845][40007] Num frames 116300... +[2024-11-07 22:54:05,095][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:54:05,098][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:54:05,106][40007] Num frames 116400... +[2024-11-07 22:54:05,336][40007] Num frames 116500... +[2024-11-07 22:54:05,532][40007] Num frames 116600... +[2024-11-07 22:54:05,790][40007] Num frames 116700... +[2024-11-07 22:54:06,029][40007] Avg episode rewards: #0: 4.479, true rewards: #0: 4.079 +[2024-11-07 22:54:06,036][40007] Avg episode reward: 4.479, avg true_objective: 4.079 +[2024-11-07 22:54:06,414][40007] Num frames 116800... +[2024-11-07 22:54:06,648][40007] Num frames 116900... +[2024-11-07 22:54:06,855][40007] Num frames 117000... +[2024-11-07 22:54:07,071][40007] Num frames 117100... +[2024-11-07 22:54:07,277][40007] Avg episode rewards: #0: 4.463, true rewards: #0: 4.073 +[2024-11-07 22:54:07,278][40007] Avg episode reward: 4.463, avg true_objective: 4.073 +[2024-11-07 22:54:07,354][40007] Num frames 117200... +[2024-11-07 22:54:07,568][40007] Num frames 117300... +[2024-11-07 22:54:07,777][40007] Num frames 117400... +[2024-11-07 22:54:08,049][40007] Num frames 117500... +[2024-11-07 22:54:08,269][40007] Num frames 117600... +[2024-11-07 22:54:08,360][40007] Avg episode rewards: #0: 4.476, true rewards: #0: 4.076 +[2024-11-07 22:54:08,363][40007] Avg episode reward: 4.476, avg true_objective: 4.076 +[2024-11-07 22:54:08,548][40007] Num frames 117700... +[2024-11-07 22:54:08,752][40007] Num frames 117800... +[2024-11-07 22:54:08,953][40007] Num frames 117900... +[2024-11-07 22:54:09,211][40007] Avg episode rewards: #0: 4.476, true rewards: #0: 4.076 +[2024-11-07 22:54:09,216][40007] Avg episode reward: 4.476, avg true_objective: 4.076 +[2024-11-07 22:54:09,228][40007] Num frames 118000... +[2024-11-07 22:54:09,449][40007] Num frames 118100... +[2024-11-07 22:54:09,682][40007] Num frames 118200... +[2024-11-07 22:54:09,910][40007] Num frames 118300... +[2024-11-07 22:54:10,124][40007] Num frames 118400... +[2024-11-07 22:54:10,361][40007] Num frames 118500... +[2024-11-07 22:54:10,514][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:54:10,521][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:54:10,653][40007] Num frames 118600... +[2024-11-07 22:54:10,878][40007] Num frames 118700... +[2024-11-07 22:54:11,125][40007] Num frames 118800... +[2024-11-07 22:54:11,374][40007] Num frames 118900... +[2024-11-07 22:54:11,643][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:54:11,645][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:11,681][40007] Num frames 119000... +[2024-11-07 22:54:11,915][40007] Num frames 119100... +[2024-11-07 22:54:12,138][40007] Num frames 119200... +[2024-11-07 22:54:12,364][40007] Num frames 119300... +[2024-11-07 22:54:12,591][40007] Num frames 119400... +[2024-11-07 22:54:12,664][40007] Avg episode rewards: #0: 4.525, true rewards: #0: 4.095 +[2024-11-07 22:54:12,669][40007] Avg episode reward: 4.525, avg true_objective: 4.095 +[2024-11-07 22:54:12,904][40007] Num frames 119500... +[2024-11-07 22:54:13,125][40007] Num frames 119600... +[2024-11-07 22:54:13,328][40007] Num frames 119700... +[2024-11-07 22:54:13,592][40007] Num frames 119800... +[2024-11-07 22:54:14,047][40007] Avg episode rewards: #0: 4.542, true rewards: #0: 4.102 +[2024-11-07 22:54:14,048][40007] Avg episode reward: 4.542, avg true_objective: 4.102 +[2024-11-07 22:54:14,162][40007] Num frames 119900... +[2024-11-07 22:54:14,432][40007] Num frames 120000... +[2024-11-07 22:54:14,673][40007] Num frames 120100... +[2024-11-07 22:54:14,905][40007] Num frames 120200... +[2024-11-07 22:54:15,121][40007] Num frames 120300... +[2024-11-07 22:54:44,296][40007] Avg episode rewards: #0: 4.558, true rewards: #0: 4.108 +[2024-11-07 22:54:44,748][40007] Avg episode reward: 4.558, avg true_objective: 4.108 +[2024-11-07 23:13:49,674][41694] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 23:13:49,700][41694] Rollout worker 0 uses device cpu +[2024-11-07 23:13:49,701][41694] Rollout worker 1 uses device cpu +[2024-11-07 23:13:49,702][41694] Rollout worker 2 uses device cpu +[2024-11-07 23:13:49,703][41694] Rollout worker 3 uses device cpu +[2024-11-07 23:13:49,704][41694] Rollout worker 4 uses device cpu +[2024-11-07 23:13:49,705][41694] Rollout worker 5 uses device cpu +[2024-11-07 23:13:49,708][41694] Rollout worker 6 uses device cpu +[2024-11-07 23:13:49,710][41694] Rollout worker 7 uses device cpu +[2024-11-07 23:13:50,027][41694] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:50,029][41694] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 23:13:50,062][41694] Starting all processes... +[2024-11-07 23:13:50,064][41694] Starting process learner_proc0 +[2024-11-07 23:13:50,313][41694] Starting all processes... +[2024-11-07 23:13:50,372][41694] Starting process inference_proc0-0 +[2024-11-07 23:13:50,373][41694] Starting process rollout_proc0 +[2024-11-07 23:13:50,374][41694] Starting process rollout_proc1 +[2024-11-07 23:13:50,374][41694] Starting process rollout_proc2 +[2024-11-07 23:13:50,375][41694] Starting process rollout_proc3 +[2024-11-07 23:13:50,376][41694] Starting process rollout_proc4 +[2024-11-07 23:13:50,376][41694] Starting process rollout_proc5 +[2024-11-07 23:13:50,377][41694] Starting process rollout_proc6 +[2024-11-07 23:13:50,378][41694] Starting process rollout_proc7 +[2024-11-07 23:13:56,106][42009] Worker 5 uses CPU cores [5] +[2024-11-07 23:13:57,184][42004] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:57,184][42004] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 23:13:57,333][42010] Worker 4 uses CPU cores [4] +[2024-11-07 23:13:57,431][42004] Num visible devices: 1 +[2024-11-07 23:13:57,692][42007] Worker 2 uses CPU cores [2] +[2024-11-07 23:13:57,971][42005] Worker 0 uses CPU cores [0] +[2024-11-07 23:13:57,981][42017] Worker 6 uses CPU cores [6] +[2024-11-07 23:13:58,028][42008] Worker 3 uses CPU cores [3] +[2024-11-07 23:13:58,229][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:58,230][41991] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 23:13:58,251][41991] Num visible devices: 1 +[2024-11-07 23:13:58,259][41991] Starting seed is not provided +[2024-11-07 23:13:58,260][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:58,260][41991] Initializing actor-critic model on device cuda:0 +[2024-11-07 23:13:58,260][41991] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 23:13:58,264][41991] RunningMeanStd input shape: (1,) +[2024-11-07 23:13:58,279][41991] ConvEncoder: input_channels=3 +[2024-11-07 23:13:58,521][42018] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 23:13:58,619][42006] Worker 1 uses CPU cores [1] +[2024-11-07 23:13:59,504][41991] Conv encoder output size: 512 +[2024-11-07 23:13:59,505][41991] Policy head output size: 512 +[2024-11-07 23:13:59,884][41991] Created Actor Critic model with architecture: +[2024-11-07 23:13:59,885][41991] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 23:14:02,026][41991] Using optimizer +[2024-11-07 23:14:08,174][41991] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 23:14:08,593][41991] Loading model from checkpoint +[2024-11-07 23:14:08,598][41991] Loaded experiment state at self.train_step=4886, self.env_steps=20013056 +[2024-11-07 23:14:08,598][41991] Initialized policy 0 weights for model version 4886 +[2024-11-07 23:14:08,608][41991] LearnerWorker_p0 finished initialization! +[2024-11-07 23:14:08,608][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:14:08,861][42004] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 23:14:08,862][42004] RunningMeanStd input shape: (1,) +[2024-11-07 23:14:08,875][42004] ConvEncoder: input_channels=3 +[2024-11-07 23:14:08,978][42004] Conv encoder output size: 512 +[2024-11-07 23:14:08,978][42004] Policy head output size: 512 +[2024-11-07 23:14:09,034][41694] Inference worker 0-0 is ready! +[2024-11-07 23:14:09,035][41694] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 23:14:09,147][42008] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,148][42010] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,157][42009] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,159][42007] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,164][42005] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,165][42006] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,190][42017] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,204][42018] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:10,019][41694] Heartbeat connected on Batcher_0 +[2024-11-07 23:14:10,026][41694] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 23:14:10,074][41694] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 23:14:12,072][42018] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42010] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42006] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42009] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42005] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42007] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,074][42017] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,358][42008] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,422][42006] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,450][42005] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,458][42007] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,497][42018] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,746][42008] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,844][42010] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,932][41694] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 20013056. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:12,955][42007] Decorrelating experience for 64 frames... +[2024-11-07 23:14:12,959][42006] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,040][42005] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,050][42018] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,271][42009] Decorrelating experience for 32 frames... +[2024-11-07 23:14:13,363][42010] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,416][42008] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,492][42007] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,547][42005] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,573][41694] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 23:14:13,627][42018] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,659][41694] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 23:14:13,773][42006] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,812][41694] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 23:14:13,866][42009] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,875][41694] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 23:14:13,922][42008] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,007][41694] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 23:14:14,011][42010] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,079][41694] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 23:14:14,160][42017] Decorrelating experience for 32 frames... +[2024-11-07 23:14:14,288][42009] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,336][41694] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 23:14:14,529][42017] Decorrelating experience for 64 frames... +[2024-11-07 23:14:14,847][42017] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,896][41694] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 23:14:17,932][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20013056. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:21,624][41991] Signal inference workers to stop experience collection... +[2024-11-07 23:14:21,635][42004] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 23:14:22,932][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20013056. Throughput: 0: 122.4. Samples: 1224. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:22,935][41694] Avg episode reward: [(0, '2.104')] +[2024-11-07 23:14:27,931][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20013056. Throughput: 0: 150.5. Samples: 2258. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:27,933][41694] Avg episode reward: [(0, '2.104')] +[2024-11-07 23:14:30,864][41991] Signal inference workers to resume experience collection... +[2024-11-07 23:14:30,864][42004] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 23:14:32,932][41694] Fps is (10 sec: 1638.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20029440. Throughput: 0: 112.9. Samples: 2258. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 23:14:32,935][41694] Avg episode reward: [(0, '3.393')] +[2024-11-07 23:14:36,549][42004] Updated weights for policy 0, policy_version 4896 (0.0195) +[2024-11-07 23:14:37,932][41694] Fps is (10 sec: 4505.3, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 20058112. Throughput: 0: 408.4. Samples: 10210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:14:37,936][41694] Avg episode reward: [(0, '4.164')] +[2024-11-07 23:14:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 20078592. Throughput: 0: 558.8. Samples: 16764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:14:42,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:14:47,102][42004] Updated weights for policy 0, policy_version 4906 (0.0075) +[2024-11-07 23:14:47,932][41694] Fps is (10 sec: 3686.5, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 20094976. Throughput: 0: 575.8. Samples: 20154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:47,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-07 23:14:52,932][41694] Fps is (10 sec: 2457.5, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 20103168. Throughput: 0: 574.5. Samples: 22980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:52,935][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:14:57,935][41694] Fps is (10 sec: 2048.0, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 20115456. Throughput: 0: 583.4. Samples: 26254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:57,938][41694] Avg episode reward: [(0, '4.157')] +[2024-11-07 23:15:01,763][42004] Updated weights for policy 0, policy_version 4916 (0.0068) +[2024-11-07 23:15:02,932][41694] Fps is (10 sec: 3686.6, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 20140032. Throughput: 0: 642.8. Samples: 28926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:02,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:15:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 20172800. Throughput: 0: 809.1. Samples: 37632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:07,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-07 23:15:08,163][42004] Updated weights for policy 0, policy_version 4926 (0.0029) +[2024-11-07 23:15:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 20205568. Throughput: 0: 1032.8. Samples: 48736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:12,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-07 23:15:14,684][42004] Updated weights for policy 0, policy_version 4936 (0.0036) +[2024-11-07 23:15:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 3754.7, 300 sec: 3465.9). Total num frames: 20238336. Throughput: 0: 1116.1. Samples: 52484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-07 23:15:20,923][42004] Updated weights for policy 0, policy_version 4946 (0.0042) +[2024-11-07 23:15:22,932][41694] Fps is (10 sec: 6553.7, 60 sec: 4300.8, 300 sec: 3686.4). Total num frames: 20271104. Throughput: 0: 1158.6. Samples: 62346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-07 23:15:22,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:15:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 4642.1, 300 sec: 3713.7). Total num frames: 20291584. Throughput: 0: 1180.7. Samples: 69894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:27,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-07 23:15:28,550][42004] Updated weights for policy 0, policy_version 4956 (0.0032) +[2024-11-07 23:15:32,932][41694] Fps is (10 sec: 6143.6, 60 sec: 5051.7, 300 sec: 3993.6). Total num frames: 20332544. Throughput: 0: 1225.3. Samples: 75292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:32,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-07 23:15:33,956][42004] Updated weights for policy 0, policy_version 4966 (0.0026) +[2024-11-07 23:15:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 5120.0, 300 sec: 4144.2). Total num frames: 20365312. Throughput: 0: 1398.1. Samples: 85892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:15:37,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-07 23:15:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004972_20365312.pth... +[2024-11-07 23:15:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth +[2024-11-07 23:15:40,320][42004] Updated weights for policy 0, policy_version 4976 (0.0030) +[2024-11-07 23:15:42,932][41694] Fps is (10 sec: 6963.6, 60 sec: 5393.1, 300 sec: 4323.6). Total num frames: 20402176. Throughput: 0: 1554.9. Samples: 96226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:15:42,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:15:45,861][42004] Updated weights for policy 0, policy_version 4986 (0.0027) +[2024-11-07 23:15:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 5666.2, 300 sec: 4440.9). Total num frames: 20434944. Throughput: 0: 1614.1. Samples: 101560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:15:47,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-07 23:15:51,414][42004] Updated weights for policy 0, policy_version 4996 (0.0027) +[2024-11-07 23:15:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6144.1, 300 sec: 4587.5). Total num frames: 20471808. Throughput: 0: 1670.3. Samples: 112796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:52,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-07 23:15:57,128][42004] Updated weights for policy 0, policy_version 5006 (0.0047) +[2024-11-07 23:15:59,216][41694] Fps is (10 sec: 6533.8, 60 sec: 6416.3, 300 sec: 4663.1). Total num frames: 20508672. Throughput: 0: 1615.9. Samples: 123526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:59,217][41694] Avg episode reward: [(0, '4.346')] +[2024-11-07 23:16:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 4729.0). Total num frames: 20533248. Throughput: 0: 1640.8. Samples: 126320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:16:02,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-07 23:16:04,614][42004] Updated weights for policy 0, policy_version 5016 (0.0027) +[2024-11-07 23:16:07,932][41694] Fps is (10 sec: 7049.1, 60 sec: 6621.9, 300 sec: 4844.0). Total num frames: 20570112. Throughput: 0: 1649.9. Samples: 136590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:07,933][41694] Avg episode reward: [(0, '4.181')] +[2024-11-07 23:16:10,252][42004] Updated weights for policy 0, policy_version 5026 (0.0044) +[2024-11-07 23:16:12,934][41694] Fps is (10 sec: 6961.5, 60 sec: 6621.6, 300 sec: 4915.1). Total num frames: 20602880. Throughput: 0: 1718.6. Samples: 147236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:12,938][41694] Avg episode reward: [(0, '4.410')] +[2024-11-07 23:16:16,077][42004] Updated weights for policy 0, policy_version 5036 (0.0029) +[2024-11-07 23:16:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 5013.5). Total num frames: 20639744. Throughput: 0: 1710.3. Samples: 152256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:16:17,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-07 23:16:21,467][42004] Updated weights for policy 0, policy_version 5046 (0.0032) +[2024-11-07 23:16:22,932][41694] Fps is (10 sec: 7374.6, 60 sec: 6758.4, 300 sec: 5104.2). Total num frames: 20676608. Throughput: 0: 1728.0. Samples: 163654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:16:22,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-07 23:16:27,084][42004] Updated weights for policy 0, policy_version 5056 (0.0040) +[2024-11-07 23:16:27,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7031.4, 300 sec: 5188.2). Total num frames: 20713472. Throughput: 0: 1742.8. Samples: 174652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:16:27,934][41694] Avg episode reward: [(0, '4.304')] +[2024-11-07 23:16:33,140][41694] Fps is (10 sec: 6018.6, 60 sec: 6735.1, 300 sec: 5170.8). Total num frames: 20738048. Throughput: 0: 1737.1. Samples: 180092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:16:33,142][41694] Avg episode reward: [(0, '4.632')] +[2024-11-07 23:16:34,644][42004] Updated weights for policy 0, policy_version 5066 (0.0025) +[2024-11-07 23:16:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.3, 300 sec: 5225.9). Total num frames: 20770816. Throughput: 0: 1661.1. Samples: 187546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:16:37,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-07 23:16:40,113][42004] Updated weights for policy 0, policy_version 5076 (0.0031) +[2024-11-07 23:16:42,934][41694] Fps is (10 sec: 7109.4, 60 sec: 6758.1, 300 sec: 5297.4). Total num frames: 20807680. Throughput: 0: 1720.1. Samples: 198726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:42,937][41694] Avg episode reward: [(0, '4.650')] +[2024-11-07 23:16:46,338][42004] Updated weights for policy 0, policy_version 5086 (0.0036) +[2024-11-07 23:16:47,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 5338.0). Total num frames: 20840448. Throughput: 0: 1714.4. Samples: 203470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:16:47,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-07 23:16:51,674][42004] Updated weights for policy 0, policy_version 5096 (0.0042) +[2024-11-07 23:16:52,932][41694] Fps is (10 sec: 7374.5, 60 sec: 6826.6, 300 sec: 5427.2). Total num frames: 20881408. Throughput: 0: 1735.5. Samples: 214688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:52,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:16:57,541][42004] Updated weights for policy 0, policy_version 5106 (0.0038) +[2024-11-07 23:16:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6906.2, 300 sec: 5461.3). Total num frames: 20914176. Throughput: 0: 1731.8. Samples: 225164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:16:57,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-07 23:17:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 5517.5). Total num frames: 20951040. Throughput: 0: 1747.8. Samples: 230908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:17:02,935][41694] Avg episode reward: [(0, '4.391')] +[2024-11-07 23:17:03,411][42004] Updated weights for policy 0, policy_version 5116 (0.0028) +[2024-11-07 23:17:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 5500.3). Total num frames: 20975616. Throughput: 0: 1697.2. Samples: 240026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:07,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:17:10,440][42004] Updated weights for policy 0, policy_version 5126 (0.0049) +[2024-11-07 23:17:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6827.0, 300 sec: 5552.4). Total num frames: 21012480. Throughput: 0: 1662.6. Samples: 249466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:12,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:17:16,420][42004] Updated weights for policy 0, policy_version 5136 (0.0027) +[2024-11-07 23:17:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 5579.4). Total num frames: 21045248. Throughput: 0: 1661.7. Samples: 254522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:17,935][41694] Avg episode reward: [(0, '4.523')] +[2024-11-07 23:17:22,272][42004] Updated weights for policy 0, policy_version 5146 (0.0024) +[2024-11-07 23:17:22,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 5626.6). Total num frames: 21082112. Throughput: 0: 1710.8. Samples: 264532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:22,934][41694] Avg episode reward: [(0, '4.213')] +[2024-11-07 23:17:27,768][42004] Updated weights for policy 0, policy_version 5156 (0.0033) +[2024-11-07 23:17:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.5, 300 sec: 5671.4). Total num frames: 21118976. Throughput: 0: 1717.5. Samples: 276010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:27,937][41694] Avg episode reward: [(0, '4.523')] +[2024-11-07 23:17:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6987.5, 300 sec: 5713.9). Total num frames: 21155840. Throughput: 0: 1737.2. Samples: 281646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:32,938][41694] Avg episode reward: [(0, '4.265')] +[2024-11-07 23:17:33,205][42004] Updated weights for policy 0, policy_version 5166 (0.0033) +[2024-11-07 23:17:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.6, 300 sec: 5754.4). Total num frames: 21192704. Throughput: 0: 1722.6. Samples: 292206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:37,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:17:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005174_21192704.pth... +[2024-11-07 23:17:38,053][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth +[2024-11-07 23:17:40,601][42004] Updated weights for policy 0, policy_version 5176 (0.0030) +[2024-11-07 23:17:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6827.0, 300 sec: 5734.4). Total num frames: 21217280. Throughput: 0: 1670.8. Samples: 300350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:42,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-07 23:17:46,033][42004] Updated weights for policy 0, policy_version 5186 (0.0028) +[2024-11-07 23:17:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 5772.5). Total num frames: 21254144. Throughput: 0: 1669.1. Samples: 306018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:47,933][41694] Avg episode reward: [(0, '4.708')] +[2024-11-07 23:17:51,536][42004] Updated weights for policy 0, policy_version 5196 (0.0033) +[2024-11-07 23:17:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 5808.9). Total num frames: 21291008. Throughput: 0: 1718.7. Samples: 317366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:17:52,933][41694] Avg episode reward: [(0, '4.642')] +[2024-11-07 23:17:57,743][42004] Updated weights for policy 0, policy_version 5206 (0.0032) +[2024-11-07 23:17:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 5825.4). Total num frames: 21323776. Throughput: 0: 1733.9. Samples: 327490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:57,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:18:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 5841.3). Total num frames: 21356544. Throughput: 0: 1731.6. Samples: 332444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:02,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-07 23:18:04,001][42004] Updated weights for policy 0, policy_version 5216 (0.0026) +[2024-11-07 23:18:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 5873.8). Total num frames: 21393408. Throughput: 0: 1729.5. Samples: 342358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:07,936][41694] Avg episode reward: [(0, '4.212')] +[2024-11-07 23:18:09,558][42004] Updated weights for policy 0, policy_version 5226 (0.0030) +[2024-11-07 23:18:14,270][41694] Fps is (10 sec: 6141.4, 60 sec: 6744.5, 300 sec: 5855.4). Total num frames: 21426176. Throughput: 0: 1674.0. Samples: 353580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:14,273][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:18:16,659][42004] Updated weights for policy 0, policy_version 5236 (0.0027) +[2024-11-07 23:18:17,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.6, 300 sec: 5884.9). Total num frames: 21454848. Throughput: 0: 1654.8. Samples: 356110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:17,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-07 23:18:22,263][42004] Updated weights for policy 0, policy_version 5246 (0.0035) +[2024-11-07 23:18:22,931][41694] Fps is (10 sec: 7566.0, 60 sec: 6826.7, 300 sec: 5914.6). Total num frames: 21491712. Throughput: 0: 1663.6. Samples: 367066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:22,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-07 23:18:27,832][42004] Updated weights for policy 0, policy_version 5256 (0.0029) +[2024-11-07 23:18:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 5943.2). Total num frames: 21528576. Throughput: 0: 1729.6. Samples: 378184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:18:27,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-07 23:18:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5955.0). Total num frames: 21561344. Throughput: 0: 1719.5. Samples: 383396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:18:32,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-07 23:18:34,228][42004] Updated weights for policy 0, policy_version 5266 (0.0037) +[2024-11-07 23:18:37,931][41694] Fps is (10 sec: 6554.0, 60 sec: 6690.1, 300 sec: 5966.3). Total num frames: 21594112. Throughput: 0: 1688.1. Samples: 393332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:37,933][41694] Avg episode reward: [(0, '4.271')] +[2024-11-07 23:18:39,927][42004] Updated weights for policy 0, policy_version 5276 (0.0034) +[2024-11-07 23:18:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 5992.3). Total num frames: 21630976. Throughput: 0: 1702.7. Samples: 404112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:42,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:18:45,483][42004] Updated weights for policy 0, policy_version 5286 (0.0026) +[2024-11-07 23:18:47,984][41694] Fps is (10 sec: 6112.0, 60 sec: 6684.3, 300 sec: 5971.6). Total num frames: 21655552. Throughput: 0: 1710.9. Samples: 409522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:47,986][41694] Avg episode reward: [(0, '4.212')] +[2024-11-07 23:18:52,505][42004] Updated weights for policy 0, policy_version 5296 (0.0030) +[2024-11-07 23:18:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 5997.7). Total num frames: 21692416. Throughput: 0: 1678.9. Samples: 417910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-07 23:18:57,932][41694] Fps is (10 sec: 6176.2, 60 sec: 6553.6, 300 sec: 5978.7). Total num frames: 21716992. Throughput: 0: 1672.5. Samples: 426604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:57,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-07 23:19:00,980][42004] Updated weights for policy 0, policy_version 5306 (0.0030) +[2024-11-07 23:19:02,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6348.8, 300 sec: 5946.3). Total num frames: 21737472. Throughput: 0: 1635.7. Samples: 429716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:02,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-07 23:19:07,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6144.0, 300 sec: 5928.8). Total num frames: 21762048. Throughput: 0: 1537.3. Samples: 436246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:07,936][41694] Avg episode reward: [(0, '4.461')] +[2024-11-07 23:19:10,443][42004] Updated weights for policy 0, policy_version 5316 (0.0047) +[2024-11-07 23:19:12,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6074.7, 300 sec: 5998.2). Total num frames: 21782528. Throughput: 0: 1438.8. Samples: 442930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:12,935][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:19:17,938][41694] Fps is (10 sec: 4912.2, 60 sec: 5938.6, 300 sec: 6095.3). Total num frames: 21811200. Throughput: 0: 1410.2. Samples: 446864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:17,944][41694] Avg episode reward: [(0, '4.178')] +[2024-11-07 23:19:18,276][42004] Updated weights for policy 0, policy_version 5326 (0.0037) +[2024-11-07 23:19:22,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5666.1, 300 sec: 6164.8). Total num frames: 21831680. Throughput: 0: 1349.5. Samples: 454060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:22,942][41694] Avg episode reward: [(0, '4.190')] +[2024-11-07 23:19:26,855][42004] Updated weights for policy 0, policy_version 5336 (0.0041) +[2024-11-07 23:19:27,932][41694] Fps is (10 sec: 4918.1, 60 sec: 5529.6, 300 sec: 6206.5). Total num frames: 21860352. Throughput: 0: 1281.0. Samples: 461756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:27,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-07 23:19:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 5461.3, 300 sec: 6206.5). Total num frames: 21889024. Throughput: 0: 1256.3. Samples: 465990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:32,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-07 23:19:34,427][42004] Updated weights for policy 0, policy_version 5346 (0.0041) +[2024-11-07 23:19:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5324.8, 300 sec: 6220.4). Total num frames: 21913600. Throughput: 0: 1240.8. Samples: 473744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:19:37,933][41694] Avg episode reward: [(0, '4.723')] +[2024-11-07 23:19:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005350_21913600.pth... +[2024-11-07 23:19:38,175][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004972_20365312.pth +[2024-11-07 23:19:42,932][41694] Fps is (10 sec: 4505.7, 60 sec: 5051.7, 300 sec: 6234.3). Total num frames: 21934080. Throughput: 0: 1191.4. Samples: 480218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:19:42,934][41694] Avg episode reward: [(0, '4.619')] +[2024-11-07 23:19:43,630][42004] Updated weights for policy 0, policy_version 5356 (0.0051) +[2024-11-07 23:19:47,936][41694] Fps is (10 sec: 4094.4, 60 sec: 4987.5, 300 sec: 6275.8). Total num frames: 21954560. Throughput: 0: 1187.8. Samples: 483174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:19:47,937][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:19:52,815][42004] Updated weights for policy 0, policy_version 5366 (0.0027) +[2024-11-07 23:19:52,931][41694] Fps is (10 sec: 4505.6, 60 sec: 4778.7, 300 sec: 6317.6). Total num frames: 21979136. Throughput: 0: 1189.9. Samples: 489792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:52,935][41694] Avg episode reward: [(0, '4.538')] +[2024-11-07 23:19:57,950][41694] Fps is (10 sec: 4090.2, 60 sec: 4640.7, 300 sec: 6289.4). Total num frames: 21995520. Throughput: 0: 1165.0. Samples: 495378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:57,961][41694] Avg episode reward: [(0, '4.533')] +[2024-11-07 23:20:02,487][42004] Updated weights for policy 0, policy_version 5376 (0.0033) +[2024-11-07 23:20:02,932][41694] Fps is (10 sec: 4095.8, 60 sec: 4710.4, 300 sec: 6262.0). Total num frames: 22020096. Throughput: 0: 1163.1. Samples: 499198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:20:02,935][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:20:07,934][41694] Fps is (10 sec: 4923.3, 60 sec: 4710.3, 300 sec: 6234.2). Total num frames: 22044672. Throughput: 0: 1167.5. Samples: 506600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:20:07,936][41694] Avg episode reward: [(0, '4.482')] +[2024-11-07 23:20:10,718][42004] Updated weights for policy 0, policy_version 5386 (0.0024) +[2024-11-07 23:20:12,933][41694] Fps is (10 sec: 5324.3, 60 sec: 4846.8, 300 sec: 6220.3). Total num frames: 22073344. Throughput: 0: 1176.6. Samples: 514704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:12,935][41694] Avg episode reward: [(0, '4.449')] +[2024-11-07 23:20:17,932][41694] Fps is (10 sec: 5325.7, 60 sec: 4779.2, 300 sec: 6192.6). Total num frames: 22097920. Throughput: 0: 1167.0. Samples: 518506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:17,934][41694] Avg episode reward: [(0, '4.275')] +[2024-11-07 23:20:18,027][42004] Updated weights for policy 0, policy_version 5396 (0.0057) +[2024-11-07 23:20:22,932][41694] Fps is (10 sec: 5325.5, 60 sec: 4915.2, 300 sec: 6220.4). Total num frames: 22126592. Throughput: 0: 1177.3. Samples: 526722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:22,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-07 23:20:25,432][42004] Updated weights for policy 0, policy_version 5406 (0.0041) +[2024-11-07 23:20:29,142][41694] Fps is (10 sec: 5115.1, 60 sec: 4818.0, 300 sec: 6153.5). Total num frames: 22155264. Throughput: 0: 1197.7. Samples: 535564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:29,145][41694] Avg episode reward: [(0, '4.491')] +[2024-11-07 23:20:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 4846.9, 300 sec: 6150.9). Total num frames: 22179840. Throughput: 0: 1220.7. Samples: 538100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:32,938][41694] Avg episode reward: [(0, '4.315')] +[2024-11-07 23:20:33,213][42004] Updated weights for policy 0, policy_version 5416 (0.0034) +[2024-11-07 23:20:37,932][41694] Fps is (10 sec: 6058.3, 60 sec: 4915.2, 300 sec: 6123.2). Total num frames: 22208512. Throughput: 0: 1277.1. Samples: 547262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:37,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:20:39,932][42004] Updated weights for policy 0, policy_version 5426 (0.0052) +[2024-11-07 23:20:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 5120.0, 300 sec: 6123.2). Total num frames: 22241280. Throughput: 0: 1373.2. Samples: 557148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:42,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-07 23:20:46,436][42004] Updated weights for policy 0, policy_version 5436 (0.0036) +[2024-11-07 23:20:47,932][41694] Fps is (10 sec: 6553.3, 60 sec: 5325.1, 300 sec: 6109.3). Total num frames: 22274048. Throughput: 0: 1388.6. Samples: 561686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:47,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-07 23:20:52,627][42004] Updated weights for policy 0, policy_version 5446 (0.0035) +[2024-11-07 23:20:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6122.0). Total num frames: 22306816. Throughput: 0: 1446.7. Samples: 571700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:52,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:20:57,932][41694] Fps is (10 sec: 6144.3, 60 sec: 5667.8, 300 sec: 6109.3). Total num frames: 22335488. Throughput: 0: 1457.7. Samples: 580300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:57,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:20:59,892][42004] Updated weights for policy 0, policy_version 5456 (0.0037) +[2024-11-07 23:21:02,932][41694] Fps is (10 sec: 4915.3, 60 sec: 5597.9, 300 sec: 6053.7). Total num frames: 22355968. Throughput: 0: 1471.6. Samples: 584726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:02,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:21:07,843][42004] Updated weights for policy 0, policy_version 5466 (0.0037) +[2024-11-07 23:21:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5734.6, 300 sec: 6053.8). Total num frames: 22388736. Throughput: 0: 1440.8. Samples: 591556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:21:07,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-07 23:21:12,937][41694] Fps is (10 sec: 6140.8, 60 sec: 5734.0, 300 sec: 6025.9). Total num frames: 22417408. Throughput: 0: 1489.6. Samples: 600802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:12,941][41694] Avg episode reward: [(0, '4.328')] +[2024-11-07 23:21:14,990][42004] Updated weights for policy 0, policy_version 5476 (0.0043) +[2024-11-07 23:21:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 5802.7, 300 sec: 5998.2). Total num frames: 22446080. Throughput: 0: 1484.7. Samples: 604910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:17,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-07 23:21:21,421][42004] Updated weights for policy 0, policy_version 5486 (0.0031) +[2024-11-07 23:21:22,931][41694] Fps is (10 sec: 6147.2, 60 sec: 5870.9, 300 sec: 5984.3). Total num frames: 22478848. Throughput: 0: 1492.1. Samples: 614406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:21:22,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-07 23:21:27,498][42004] Updated weights for policy 0, policy_version 5496 (0.0032) +[2024-11-07 23:21:27,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6061.5, 300 sec: 6016.3). Total num frames: 22511616. Throughput: 0: 1499.8. Samples: 624638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:27,938][41694] Avg episode reward: [(0, '4.468')] +[2024-11-07 23:21:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6007.5, 300 sec: 5998.2). Total num frames: 22540288. Throughput: 0: 1503.3. Samples: 629336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:32,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-07 23:21:34,659][42004] Updated weights for policy 0, policy_version 5506 (0.0043) +[2024-11-07 23:21:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5870.9, 300 sec: 5942.7). Total num frames: 22560768. Throughput: 0: 1440.4. Samples: 636518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:37,937][41694] Avg episode reward: [(0, '4.585')] +[2024-11-07 23:21:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005509_22564864.pth... +[2024-11-07 23:21:38,088][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005174_21192704.pth +[2024-11-07 23:21:42,231][42004] Updated weights for policy 0, policy_version 5516 (0.0044) +[2024-11-07 23:21:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 5939.2, 300 sec: 5956.6). Total num frames: 22597632. Throughput: 0: 1449.9. Samples: 645546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:42,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-07 23:21:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5939.2, 300 sec: 5928.8). Total num frames: 22630400. Throughput: 0: 1452.7. Samples: 650096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:21:47,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-07 23:21:48,520][42004] Updated weights for policy 0, policy_version 5526 (0.0062) +[2024-11-07 23:21:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 5928.8). Total num frames: 22663168. Throughput: 0: 1524.6. Samples: 660162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:21:52,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:21:54,601][42004] Updated weights for policy 0, policy_version 5536 (0.0032) +[2024-11-07 23:21:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 22695936. Throughput: 0: 1541.2. Samples: 670150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:57,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:22:00,899][42004] Updated weights for policy 0, policy_version 5546 (0.0038) +[2024-11-07 23:22:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6144.0, 300 sec: 5928.8). Total num frames: 22724608. Throughput: 0: 1560.8. Samples: 675148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:02,936][41694] Avg episode reward: [(0, '4.534')] +[2024-11-07 23:22:07,544][42004] Updated weights for policy 0, policy_version 5556 (0.0030) +[2024-11-07 23:22:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6144.0, 300 sec: 5914.9). Total num frames: 22757376. Throughput: 0: 1560.0. Samples: 684608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:07,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-07 23:22:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6076.3, 300 sec: 5887.1). Total num frames: 22781952. Throughput: 0: 1495.3. Samples: 691928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:12,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:22:14,920][42004] Updated weights for policy 0, policy_version 5566 (0.0039) +[2024-11-07 23:22:17,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 5887.1). Total num frames: 22818816. Throughput: 0: 1510.2. Samples: 697294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:17,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-07 23:22:20,818][42004] Updated weights for policy 0, policy_version 5576 (0.0026) +[2024-11-07 23:22:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6212.3, 300 sec: 5873.2). Total num frames: 22851584. Throughput: 0: 1581.9. Samples: 707702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:22,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-07 23:22:26,369][42004] Updated weights for policy 0, policy_version 5586 (0.0028) +[2024-11-07 23:22:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6280.5, 300 sec: 5873.2). Total num frames: 22888448. Throughput: 0: 1627.2. Samples: 718768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:27,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:22:32,216][42004] Updated weights for policy 0, policy_version 5596 (0.0031) +[2024-11-07 23:22:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 5873.2). Total num frames: 22925312. Throughput: 0: 1646.0. Samples: 724164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:32,933][41694] Avg episode reward: [(0, '4.653')] +[2024-11-07 23:22:37,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 5887.1). Total num frames: 22953984. Throughput: 0: 1642.1. Samples: 734056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:37,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-07 23:22:38,714][42004] Updated weights for policy 0, policy_version 5606 (0.0027) +[2024-11-07 23:22:44,075][41694] Fps is (10 sec: 5513.7, 60 sec: 6364.1, 300 sec: 5850.6). Total num frames: 22986752. Throughput: 0: 1595.8. Samples: 743786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:44,076][41694] Avg episode reward: [(0, '4.503')] +[2024-11-07 23:22:45,978][42004] Updated weights for policy 0, policy_version 5616 (0.0037) +[2024-11-07 23:22:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 5845.5). Total num frames: 23015424. Throughput: 0: 1588.3. Samples: 746620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:47,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-07 23:22:51,740][42004] Updated weights for policy 0, policy_version 5626 (0.0028) +[2024-11-07 23:22:52,932][41694] Fps is (10 sec: 7398.7, 60 sec: 6485.2, 300 sec: 5859.3). Total num frames: 23052288. Throughput: 0: 1613.8. Samples: 757230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:52,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-07 23:22:57,215][42004] Updated weights for policy 0, policy_version 5636 (0.0028) +[2024-11-07 23:22:57,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6553.5, 300 sec: 5873.2). Total num frames: 23089152. Throughput: 0: 1699.5. Samples: 768408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:57,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-07 23:23:02,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6621.9, 300 sec: 5859.4). Total num frames: 23121920. Throughput: 0: 1702.7. Samples: 773914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:02,933][41694] Avg episode reward: [(0, '4.717')] +[2024-11-07 23:23:03,287][42004] Updated weights for policy 0, policy_version 5646 (0.0034) +[2024-11-07 23:23:07,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.2, 300 sec: 5900.0). Total num frames: 23158784. Throughput: 0: 1695.4. Samples: 783996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:07,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-07 23:23:09,043][42004] Updated weights for policy 0, policy_version 5656 (0.0038) +[2024-11-07 23:23:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 5873.2). Total num frames: 23187456. Throughput: 0: 1667.9. Samples: 793824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:23:12,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:23:16,067][42004] Updated weights for policy 0, policy_version 5666 (0.0044) +[2024-11-07 23:23:17,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.3, 300 sec: 5817.7). Total num frames: 23207936. Throughput: 0: 1645.0. Samples: 798188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:17,935][41694] Avg episode reward: [(0, '4.365')] +[2024-11-07 23:23:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 5817.7). Total num frames: 23244800. Throughput: 0: 1583.9. Samples: 805330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:22,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-07 23:23:23,421][42004] Updated weights for policy 0, policy_version 5676 (0.0037) +[2024-11-07 23:23:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 5817.7). Total num frames: 23277568. Throughput: 0: 1649.5. Samples: 816126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:23:27,935][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:23:29,165][42004] Updated weights for policy 0, policy_version 5686 (0.0028) +[2024-11-07 23:23:32,933][41694] Fps is (10 sec: 6552.6, 60 sec: 6416.9, 300 sec: 5817.7). Total num frames: 23310336. Throughput: 0: 1664.1. Samples: 821506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:32,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:23:36,403][42004] Updated weights for policy 0, policy_version 5696 (0.0046) +[2024-11-07 23:23:37,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6417.0, 300 sec: 5789.9). Total num frames: 23339008. Throughput: 0: 1610.0. Samples: 829680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:37,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-07 23:23:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005698_23339008.pth... +[2024-11-07 23:23:38,105][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005350_21913600.pth +[2024-11-07 23:23:42,932][41694] Fps is (10 sec: 5735.2, 60 sec: 6472.1, 300 sec: 5804.8). Total num frames: 23367680. Throughput: 0: 1552.7. Samples: 838278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:42,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:23:43,328][42004] Updated weights for policy 0, policy_version 5706 (0.0031) +[2024-11-07 23:23:47,932][41694] Fps is (10 sec: 5734.8, 60 sec: 6348.8, 300 sec: 5776.1). Total num frames: 23396352. Throughput: 0: 1531.5. Samples: 842830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:47,940][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:23:51,790][42004] Updated weights for policy 0, policy_version 5716 (0.0031) +[2024-11-07 23:23:52,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6075.8, 300 sec: 5762.2). Total num frames: 23416832. Throughput: 0: 1460.5. Samples: 849720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:52,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-07 23:23:57,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5939.3, 300 sec: 5789.9). Total num frames: 23445504. Throughput: 0: 1420.7. Samples: 857754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:57,937][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:23:59,290][42004] Updated weights for policy 0, policy_version 5726 (0.0042) +[2024-11-07 23:24:02,933][41694] Fps is (10 sec: 5323.8, 60 sec: 5802.5, 300 sec: 5789.9). Total num frames: 23470080. Throughput: 0: 1411.7. Samples: 861716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:24:02,936][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:24:06,806][42004] Updated weights for policy 0, policy_version 5736 (0.0044) +[2024-11-07 23:24:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 5666.1, 300 sec: 5817.7). Total num frames: 23498752. Throughput: 0: 1434.7. Samples: 869892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:07,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-07 23:24:12,933][41694] Fps is (10 sec: 5734.7, 60 sec: 5666.0, 300 sec: 5817.8). Total num frames: 23527424. Throughput: 0: 1397.9. Samples: 879034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:12,936][41694] Avg episode reward: [(0, '4.633')] +[2024-11-07 23:24:13,903][42004] Updated weights for policy 0, policy_version 5746 (0.0032) +[2024-11-07 23:24:17,937][41694] Fps is (10 sec: 5732.5, 60 sec: 5802.4, 300 sec: 5845.4). Total num frames: 23556096. Throughput: 0: 1359.2. Samples: 882672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:17,940][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:24:21,250][42004] Updated weights for policy 0, policy_version 5756 (0.0034) +[2024-11-07 23:24:22,931][41694] Fps is (10 sec: 5735.2, 60 sec: 5666.1, 300 sec: 5845.5). Total num frames: 23584768. Throughput: 0: 1368.1. Samples: 891244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:24:22,936][41694] Avg episode reward: [(0, '4.290')] +[2024-11-07 23:24:27,931][41694] Fps is (10 sec: 4916.8, 60 sec: 5461.3, 300 sec: 5817.7). Total num frames: 23605248. Throughput: 0: 1330.7. Samples: 898160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:24:27,938][41694] Avg episode reward: [(0, '4.351')] +[2024-11-07 23:24:29,028][42004] Updated weights for policy 0, policy_version 5766 (0.0037) +[2024-11-07 23:24:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 5529.7, 300 sec: 5859.4). Total num frames: 23642112. Throughput: 0: 1352.1. Samples: 903676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:32,938][41694] Avg episode reward: [(0, '4.261')] +[2024-11-07 23:24:34,852][42004] Updated weights for policy 0, policy_version 5776 (0.0028) +[2024-11-07 23:24:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 5598.0, 300 sec: 5901.0). Total num frames: 23674880. Throughput: 0: 1424.4. Samples: 913818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:37,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:24:41,092][42004] Updated weights for policy 0, policy_version 5786 (0.0030) +[2024-11-07 23:24:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 5666.1, 300 sec: 5942.7). Total num frames: 23707648. Throughput: 0: 1462.8. Samples: 923580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:42,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-07 23:24:47,311][42004] Updated weights for policy 0, policy_version 5796 (0.0042) +[2024-11-07 23:24:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5802.7, 300 sec: 5984.3). Total num frames: 23744512. Throughput: 0: 1483.7. Samples: 928478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:47,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-07 23:24:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 6026.3). Total num frames: 23773184. Throughput: 0: 1524.3. Samples: 938484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:24:52,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-07 23:24:54,130][42004] Updated weights for policy 0, policy_version 5806 (0.0042) +[2024-11-07 23:24:58,973][41694] Fps is (10 sec: 4822.6, 60 sec: 5770.8, 300 sec: 6004.8). Total num frames: 23797760. Throughput: 0: 1474.5. Samples: 946918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:58,976][41694] Avg episode reward: [(0, '4.464')] +[2024-11-07 23:25:02,260][42004] Updated weights for policy 0, policy_version 5816 (0.0037) +[2024-11-07 23:25:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5871.1, 300 sec: 6026.0). Total num frames: 23822336. Throughput: 0: 1485.3. Samples: 949506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:02,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-07 23:25:07,932][41694] Fps is (10 sec: 6858.1, 60 sec: 6007.5, 300 sec: 6053.8). Total num frames: 23859200. Throughput: 0: 1510.0. Samples: 959192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:07,938][41694] Avg episode reward: [(0, '4.595')] +[2024-11-07 23:25:08,392][42004] Updated weights for policy 0, policy_version 5826 (0.0035) +[2024-11-07 23:25:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6075.9, 300 sec: 6081.5). Total num frames: 23891968. Throughput: 0: 1589.9. Samples: 969704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-07 23:25:14,685][42004] Updated weights for policy 0, policy_version 5836 (0.0042) +[2024-11-07 23:25:17,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6076.0, 300 sec: 6081.5). Total num frames: 23920640. Throughput: 0: 1562.5. Samples: 973990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:17,935][41694] Avg episode reward: [(0, '4.223')] +[2024-11-07 23:25:22,028][42004] Updated weights for policy 0, policy_version 5846 (0.0033) +[2024-11-07 23:25:22,933][41694] Fps is (10 sec: 5733.6, 60 sec: 6075.6, 300 sec: 6106.6). Total num frames: 23949312. Throughput: 0: 1517.5. Samples: 982108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:22,935][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:25:27,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6280.5, 300 sec: 6109.3). Total num frames: 23982080. Throughput: 0: 1522.0. Samples: 992070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:27,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-07 23:25:28,323][42004] Updated weights for policy 0, policy_version 5856 (0.0045) +[2024-11-07 23:25:32,932][41694] Fps is (10 sec: 5325.5, 60 sec: 6007.5, 300 sec: 6081.5). Total num frames: 24002560. Throughput: 0: 1513.8. Samples: 996598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:32,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:25:36,148][42004] Updated weights for policy 0, policy_version 5866 (0.0036) +[2024-11-07 23:25:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6007.5, 300 sec: 6081.5). Total num frames: 24035328. Throughput: 0: 1457.8. Samples: 1004084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:37,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-07 23:25:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005869_24039424.pth... +[2024-11-07 23:25:38,135][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005509_22564864.pth +[2024-11-07 23:25:42,091][42004] Updated weights for policy 0, policy_version 5876 (0.0042) +[2024-11-07 23:25:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6075.8, 300 sec: 6095.4). Total num frames: 24072192. Throughput: 0: 1534.2. Samples: 1014358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:42,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-07 23:25:47,633][42004] Updated weights for policy 0, policy_version 5886 (0.0028) +[2024-11-07 23:25:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6075.7, 300 sec: 6109.3). Total num frames: 24109056. Throughput: 0: 1561.3. Samples: 1019766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:47,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:25:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6212.3, 300 sec: 6137.1). Total num frames: 24145920. Throughput: 0: 1594.9. Samples: 1030962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:52,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-07 23:25:53,347][42004] Updated weights for policy 0, policy_version 5896 (0.0037) +[2024-11-07 23:25:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6460.8, 300 sec: 6178.7). Total num frames: 24178688. Throughput: 0: 1592.7. Samples: 1041378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:57,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-07 23:25:59,291][42004] Updated weights for policy 0, policy_version 5906 (0.0039) +[2024-11-07 23:26:02,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6417.0, 300 sec: 6164.8). Total num frames: 24207360. Throughput: 0: 1604.0. Samples: 1046168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:02,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:26:07,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6144.0, 300 sec: 6137.2). Total num frames: 24227840. Throughput: 0: 1571.6. Samples: 1052826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:07,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:26:08,280][42004] Updated weights for policy 0, policy_version 5916 (0.0022) +[2024-11-07 23:26:12,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6144.0, 300 sec: 6150.9). Total num frames: 24260608. Throughput: 0: 1546.5. Samples: 1061662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:12,936][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:26:14,568][42004] Updated weights for policy 0, policy_version 5926 (0.0036) +[2024-11-07 23:26:17,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6212.4, 300 sec: 6150.9). Total num frames: 24293376. Throughput: 0: 1554.6. Samples: 1066554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:17,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:26:20,283][42004] Updated weights for policy 0, policy_version 5936 (0.0027) +[2024-11-07 23:26:22,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6349.0, 300 sec: 6164.8). Total num frames: 24330240. Throughput: 0: 1628.2. Samples: 1077354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:22,934][41694] Avg episode reward: [(0, '4.285')] +[2024-11-07 23:26:26,023][42004] Updated weights for policy 0, policy_version 5946 (0.0043) +[2024-11-07 23:26:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.1, 300 sec: 6192.6). Total num frames: 24367104. Throughput: 0: 1640.8. Samples: 1088194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:27,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-07 23:26:31,565][42004] Updated weights for policy 0, policy_version 5956 (0.0034) +[2024-11-07 23:26:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 24403968. Throughput: 0: 1640.7. Samples: 1093596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:32,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-07 23:26:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6220.4). Total num frames: 24432640. Throughput: 0: 1608.9. Samples: 1103362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:37,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:26:38,146][42004] Updated weights for policy 0, policy_version 5966 (0.0043) +[2024-11-07 23:26:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6348.8, 300 sec: 6178.7). Total num frames: 24453120. Throughput: 0: 1527.2. Samples: 1110102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:26:42,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-07 23:26:45,798][42004] Updated weights for policy 0, policy_version 5976 (0.0037) +[2024-11-07 23:26:47,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6348.8, 300 sec: 6192.6). Total num frames: 24489984. Throughput: 0: 1545.0. Samples: 1115694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:47,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:26:51,308][42004] Updated weights for policy 0, policy_version 5986 (0.0038) +[2024-11-07 23:26:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6348.8, 300 sec: 6206.5). Total num frames: 24526848. Throughput: 0: 1644.8. Samples: 1126842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:26:52,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-07 23:26:57,167][42004] Updated weights for policy 0, policy_version 5996 (0.0029) +[2024-11-07 23:26:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6417.2, 300 sec: 6234.3). Total num frames: 24563712. Throughput: 0: 1681.5. Samples: 1137328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:26:57,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-07 23:27:02,899][42004] Updated weights for policy 0, policy_version 6006 (0.0031) +[2024-11-07 23:27:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 6248.1). Total num frames: 24600576. Throughput: 0: 1693.2. Samples: 1142748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:02,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-07 23:27:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 24633344. Throughput: 0: 1687.2. Samples: 1153278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:07,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:27:08,868][42004] Updated weights for policy 0, policy_version 6016 (0.0045) +[2024-11-07 23:27:14,568][41694] Fps is (10 sec: 5279.7, 60 sec: 6512.5, 300 sec: 6213.7). Total num frames: 24662016. Throughput: 0: 1593.0. Samples: 1162488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:14,572][41694] Avg episode reward: [(0, '4.386')] +[2024-11-07 23:27:17,005][42004] Updated weights for policy 0, policy_version 6026 (0.0035) +[2024-11-07 23:27:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 24686592. Throughput: 0: 1581.9. Samples: 1164784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:17,934][41694] Avg episode reward: [(0, '4.582')] +[2024-11-07 23:27:22,838][42004] Updated weights for policy 0, policy_version 6036 (0.0031) +[2024-11-07 23:27:22,931][41694] Fps is (10 sec: 7346.7, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 24723456. Throughput: 0: 1593.1. Samples: 1175052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:22,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:27:27,934][41694] Fps is (10 sec: 7371.6, 60 sec: 6553.4, 300 sec: 6220.3). Total num frames: 24760320. Throughput: 0: 1701.3. Samples: 1186664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:27,937][41694] Avg episode reward: [(0, '4.607')] +[2024-11-07 23:27:28,032][42004] Updated weights for policy 0, policy_version 6046 (0.0020) +[2024-11-07 23:27:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6248.1). Total num frames: 24797184. Throughput: 0: 1700.2. Samples: 1192204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:32,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:27:33,635][42004] Updated weights for policy 0, policy_version 6056 (0.0034) +[2024-11-07 23:27:37,931][41694] Fps is (10 sec: 7374.2, 60 sec: 6690.1, 300 sec: 6286.4). Total num frames: 24834048. Throughput: 0: 1699.2. Samples: 1203304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:37,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-07 23:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006063_24834048.pth... +[2024-11-07 23:27:38,148][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005698_23339008.pth +[2024-11-07 23:27:39,400][42004] Updated weights for policy 0, policy_version 6066 (0.0027) +[2024-11-07 23:27:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6289.8). Total num frames: 24870912. Throughput: 0: 1706.1. Samples: 1214104. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-07 23:27:45,275][42004] Updated weights for policy 0, policy_version 6076 (0.0037) +[2024-11-07 23:27:48,571][41694] Fps is (10 sec: 5774.7, 60 sec: 6687.2, 300 sec: 6234.6). Total num frames: 24895488. Throughput: 0: 1669.3. Samples: 1218934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:48,573][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:27:52,702][42004] Updated weights for policy 0, policy_version 6086 (0.0032) +[2024-11-07 23:27:52,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6234.3). Total num frames: 24928256. Throughput: 0: 1625.6. Samples: 1226432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:52,934][41694] Avg episode reward: [(0, '4.220')] +[2024-11-07 23:27:57,932][41694] Fps is (10 sec: 7438.9, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 24965120. Throughput: 0: 1735.3. Samples: 1237738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:57,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:27:58,424][42004] Updated weights for policy 0, policy_version 6096 (0.0036) +[2024-11-07 23:28:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6234.2). Total num frames: 24997888. Throughput: 0: 1741.2. Samples: 1243138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:28:02,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-07 23:28:04,183][42004] Updated weights for policy 0, policy_version 6106 (0.0028) +[2024-11-07 23:28:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 25038848. Throughput: 0: 1753.1. Samples: 1253942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:07,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:28:09,356][42004] Updated weights for policy 0, policy_version 6116 (0.0027) +[2024-11-07 23:28:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7088.3, 300 sec: 6331.4). Total num frames: 25075712. Throughput: 0: 1755.5. Samples: 1265658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:12,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:28:14,875][42004] Updated weights for policy 0, policy_version 6126 (0.0026) +[2024-11-07 23:28:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.5, 300 sec: 6317.6). Total num frames: 25108480. Throughput: 0: 1750.7. Samples: 1270986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:17,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:28:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 25128960. Throughput: 0: 1705.7. Samples: 1280062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:22,936][41694] Avg episode reward: [(0, '4.511')] +[2024-11-07 23:28:23,347][42004] Updated weights for policy 0, policy_version 6136 (0.0054) +[2024-11-07 23:28:27,933][41694] Fps is (10 sec: 5733.8, 60 sec: 6758.5, 300 sec: 6289.8). Total num frames: 25165824. Throughput: 0: 1638.4. Samples: 1287836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:27,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:28:28,862][42004] Updated weights for policy 0, policy_version 6146 (0.0032) +[2024-11-07 23:28:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6317.6). Total num frames: 25202688. Throughput: 0: 1670.0. Samples: 1293014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-07 23:28:34,259][42004] Updated weights for policy 0, policy_version 6156 (0.0028) +[2024-11-07 23:28:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.4, 300 sec: 6345.3). Total num frames: 25239552. Throughput: 0: 1741.2. Samples: 1304786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:37,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:28:39,750][42004] Updated weights for policy 0, policy_version 6166 (0.0031) +[2024-11-07 23:28:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6387.0). Total num frames: 25280512. Throughput: 0: 1740.6. Samples: 1316064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:42,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-07 23:28:44,818][42004] Updated weights for policy 0, policy_version 6176 (0.0026) +[2024-11-07 23:28:47,934][41694] Fps is (10 sec: 7780.6, 60 sec: 7106.9, 300 sec: 6442.5). Total num frames: 25317376. Throughput: 0: 1756.0. Samples: 1322164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:28:47,936][41694] Avg episode reward: [(0, '4.397')] +[2024-11-07 23:28:50,273][42004] Updated weights for policy 0, policy_version 6186 (0.0034) +[2024-11-07 23:28:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.8, 300 sec: 6470.3). Total num frames: 25354240. Throughput: 0: 1758.3. Samples: 1333064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:28:52,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:28:57,932][41694] Fps is (10 sec: 5735.7, 60 sec: 6826.7, 300 sec: 6456.4). Total num frames: 25374720. Throughput: 0: 1651.6. Samples: 1339978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:28:57,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-07 23:28:58,624][42004] Updated weights for policy 0, policy_version 6196 (0.0036) +[2024-11-07 23:29:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6470.3). Total num frames: 25407488. Throughput: 0: 1642.7. Samples: 1344906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:29:04,766][42004] Updated weights for policy 0, policy_version 6206 (0.0037) +[2024-11-07 23:29:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6484.2). Total num frames: 25440256. Throughput: 0: 1655.1. Samples: 1354540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:07,935][41694] Avg episode reward: [(0, '4.580')] +[2024-11-07 23:29:10,362][42004] Updated weights for policy 0, policy_version 6216 (0.0031) +[2024-11-07 23:29:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6525.9). Total num frames: 25481216. Throughput: 0: 1738.2. Samples: 1366052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:12,936][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:29:15,596][42004] Updated weights for policy 0, policy_version 6226 (0.0033) +[2024-11-07 23:29:17,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6553.6). Total num frames: 25518080. Throughput: 0: 1751.9. Samples: 1371850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:17,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-07 23:29:21,080][42004] Updated weights for policy 0, policy_version 6236 (0.0038) +[2024-11-07 23:29:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6609.1). Total num frames: 25554944. Throughput: 0: 1741.4. Samples: 1383150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:22,934][41694] Avg episode reward: [(0, '4.333')] +[2024-11-07 23:29:27,082][42004] Updated weights for policy 0, policy_version 6246 (0.0034) +[2024-11-07 23:29:27,931][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.6, 300 sec: 6595.3). Total num frames: 25587712. Throughput: 0: 1713.4. Samples: 1393168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:27,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:29:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6553.6). Total num frames: 25608192. Throughput: 0: 1674.3. Samples: 1397506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:32,935][41694] Avg episode reward: [(0, '4.278')] +[2024-11-07 23:29:34,946][42004] Updated weights for policy 0, policy_version 6256 (0.0032) +[2024-11-07 23:29:37,933][41694] Fps is (10 sec: 5733.7, 60 sec: 6758.3, 300 sec: 6567.5). Total num frames: 25645056. Throughput: 0: 1610.5. Samples: 1405538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:37,946][41694] Avg episode reward: [(0, '4.174')] +[2024-11-07 23:29:37,979][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006261_25645056.pth... +[2024-11-07 23:29:38,276][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005869_24039424.pth +[2024-11-07 23:29:40,981][42004] Updated weights for policy 0, policy_version 6266 (0.0027) +[2024-11-07 23:29:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6553.6). Total num frames: 25677824. Throughput: 0: 1682.3. Samples: 1415680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:29:42,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:29:46,446][42004] Updated weights for policy 0, policy_version 6276 (0.0023) +[2024-11-07 23:29:47,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6622.1, 300 sec: 6581.4). Total num frames: 25714688. Throughput: 0: 1699.9. Samples: 1421402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:47,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-07 23:29:51,667][42004] Updated weights for policy 0, policy_version 6286 (0.0028) +[2024-11-07 23:29:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6660.4). Total num frames: 25755648. Throughput: 0: 1746.4. Samples: 1433126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:52,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-07 23:29:57,308][42004] Updated weights for policy 0, policy_version 6296 (0.0029) +[2024-11-07 23:29:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 25788416. Throughput: 0: 1739.1. Samples: 1444312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:57,933][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:30:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6895.0, 300 sec: 6650.8). Total num frames: 25821184. Throughput: 0: 1711.0. Samples: 1448844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:02,933][41694] Avg episode reward: [(0, '4.644')] +[2024-11-07 23:30:03,872][42004] Updated weights for policy 0, policy_version 6306 (0.0030) +[2024-11-07 23:30:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 25841664. Throughput: 0: 1596.0. Samples: 1454970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:07,934][41694] Avg episode reward: [(0, '4.767')] +[2024-11-07 23:30:11,725][42004] Updated weights for policy 0, policy_version 6316 (0.0028) +[2024-11-07 23:30:12,935][41694] Fps is (10 sec: 5322.9, 60 sec: 6553.2, 300 sec: 6623.0). Total num frames: 25874432. Throughput: 0: 1606.9. Samples: 1465484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:12,941][41694] Avg episode reward: [(0, '4.262')] +[2024-11-07 23:30:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 25907200. Throughput: 0: 1615.0. Samples: 1470182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:17,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:30:18,105][42004] Updated weights for policy 0, policy_version 6326 (0.0030) +[2024-11-07 23:30:22,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6485.4, 300 sec: 6650.8). Total num frames: 25944064. Throughput: 0: 1664.0. Samples: 1480418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:22,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-07 23:30:23,705][42004] Updated weights for policy 0, policy_version 6336 (0.0033) +[2024-11-07 23:30:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 25985024. Throughput: 0: 1699.7. Samples: 1492166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:30:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-07 23:30:28,901][42004] Updated weights for policy 0, policy_version 6346 (0.0032) +[2024-11-07 23:30:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 26017792. Throughput: 0: 1700.3. Samples: 1497914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:32,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:30:35,290][42004] Updated weights for policy 0, policy_version 6356 (0.0025) +[2024-11-07 23:30:37,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 26050560. Throughput: 0: 1646.8. Samples: 1507232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:37,933][41694] Avg episode reward: [(0, '4.198')] +[2024-11-07 23:30:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 26066944. Throughput: 0: 1541.0. Samples: 1513656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:42,936][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:30:43,647][42004] Updated weights for policy 0, policy_version 6366 (0.0034) +[2024-11-07 23:30:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 26103808. Throughput: 0: 1551.4. Samples: 1518658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:47,935][41694] Avg episode reward: [(0, '4.585')] +[2024-11-07 23:30:49,286][42004] Updated weights for policy 0, policy_version 6376 (0.0027) +[2024-11-07 23:30:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 26140672. Throughput: 0: 1660.4. Samples: 1529686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:52,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-07 23:30:54,577][42004] Updated weights for policy 0, policy_version 6386 (0.0024) +[2024-11-07 23:30:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 26181632. Throughput: 0: 1693.6. Samples: 1541690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:57,934][41694] Avg episode reward: [(0, '4.274')] +[2024-11-07 23:30:59,951][42004] Updated weights for policy 0, policy_version 6396 (0.0026) +[2024-11-07 23:31:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 26214400. Throughput: 0: 1710.0. Samples: 1547130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:02,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-07 23:31:05,896][42004] Updated weights for policy 0, policy_version 6406 (0.0026) +[2024-11-07 23:31:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 26251264. Throughput: 0: 1713.1. Samples: 1557506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:07,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:31:12,325][42004] Updated weights for policy 0, policy_version 6416 (0.0038) +[2024-11-07 23:31:14,825][41694] Fps is (10 sec: 5510.5, 60 sec: 6552.1, 300 sec: 6691.2). Total num frames: 26279936. Throughput: 0: 1595.6. Samples: 1566988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:14,828][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:31:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 26300416. Throughput: 0: 1569.3. Samples: 1568534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:17,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:31:20,553][42004] Updated weights for policy 0, policy_version 6426 (0.0037) +[2024-11-07 23:31:22,935][41694] Fps is (10 sec: 7073.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 26337280. Throughput: 0: 1585.1. Samples: 1578560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:22,938][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:31:26,110][42004] Updated weights for policy 0, policy_version 6436 (0.0028) +[2024-11-07 23:31:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 26374144. Throughput: 0: 1692.4. Samples: 1589814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:27,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-07 23:31:31,425][42004] Updated weights for policy 0, policy_version 6446 (0.0035) +[2024-11-07 23:31:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 26411008. Throughput: 0: 1705.7. Samples: 1595414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:32,933][41694] Avg episode reward: [(0, '4.736')] +[2024-11-07 23:31:36,890][42004] Updated weights for policy 0, policy_version 6456 (0.0024) +[2024-11-07 23:31:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 26447872. Throughput: 0: 1714.3. Samples: 1606832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:31:37,936][41694] Avg episode reward: [(0, '4.123')] +[2024-11-07 23:31:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006457_26447872.pth... +[2024-11-07 23:31:38,129][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006063_24834048.pth +[2024-11-07 23:31:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 26480640. Throughput: 0: 1660.9. Samples: 1616430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:42,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:31:43,488][42004] Updated weights for policy 0, policy_version 6466 (0.0029) +[2024-11-07 23:31:49,066][41694] Fps is (10 sec: 5518.1, 60 sec: 6633.0, 300 sec: 6694.5). Total num frames: 26509312. Throughput: 0: 1607.5. Samples: 1621290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:49,070][41694] Avg episode reward: [(0, '4.351')] +[2024-11-07 23:31:51,036][42004] Updated weights for policy 0, policy_version 6476 (0.0033) +[2024-11-07 23:31:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26537984. Throughput: 0: 1583.4. Samples: 1628760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:52,933][41694] Avg episode reward: [(0, '4.316')] +[2024-11-07 23:31:56,644][42004] Updated weights for policy 0, policy_version 6486 (0.0035) +[2024-11-07 23:31:57,932][41694] Fps is (10 sec: 7392.2, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 26574848. Throughput: 0: 1691.3. Samples: 1639894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-07 23:32:02,388][42004] Updated weights for policy 0, policy_version 6496 (0.0033) +[2024-11-07 23:32:02,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 26607616. Throughput: 0: 1706.4. Samples: 1645324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:02,940][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:32:07,799][42004] Updated weights for policy 0, policy_version 6506 (0.0029) +[2024-11-07 23:32:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6771.7). Total num frames: 26648576. Throughput: 0: 1727.1. Samples: 1656278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:07,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-07 23:32:12,932][41694] Fps is (10 sec: 7783.3, 60 sec: 6978.6, 300 sec: 6775.8). Total num frames: 26685440. Throughput: 0: 1731.3. Samples: 1667722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:32:12,937][41694] Avg episode reward: [(0, '4.245')] +[2024-11-07 23:32:13,506][42004] Updated weights for policy 0, policy_version 6516 (0.0035) +[2024-11-07 23:32:17,934][41694] Fps is (10 sec: 6551.9, 60 sec: 6894.7, 300 sec: 6747.9). Total num frames: 26714112. Throughput: 0: 1708.9. Samples: 1672320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:17,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-07 23:32:19,598][42004] Updated weights for policy 0, policy_version 6526 (0.0035) +[2024-11-07 23:32:23,300][41694] Fps is (10 sec: 5135.4, 60 sec: 6649.3, 300 sec: 6698.0). Total num frames: 26738688. Throughput: 0: 1671.5. Samples: 1682668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:23,302][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:32:27,437][42004] Updated weights for policy 0, policy_version 6536 (0.0032) +[2024-11-07 23:32:27,933][41694] Fps is (10 sec: 5735.0, 60 sec: 6621.7, 300 sec: 6692.4). Total num frames: 26771456. Throughput: 0: 1627.1. Samples: 1689652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:27,936][41694] Avg episode reward: [(0, '4.388')] +[2024-11-07 23:32:32,931][41694] Fps is (10 sec: 7229.9, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26808320. Throughput: 0: 1677.7. Samples: 1694882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:32,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:32:32,964][42004] Updated weights for policy 0, policy_version 6546 (0.0028) +[2024-11-07 23:32:37,932][41694] Fps is (10 sec: 7783.4, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 26849280. Throughput: 0: 1726.2. Samples: 1706440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:32:37,935][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:32:38,655][42004] Updated weights for policy 0, policy_version 6556 (0.0023) +[2024-11-07 23:32:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6762.6). Total num frames: 26886144. Throughput: 0: 1730.4. Samples: 1717762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:32:42,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:32:43,818][42004] Updated weights for policy 0, policy_version 6566 (0.0030) +[2024-11-07 23:32:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6958.2, 300 sec: 6748.0). Total num frames: 26918912. Throughput: 0: 1735.9. Samples: 1723438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:32:47,933][41694] Avg episode reward: [(0, '4.246')] +[2024-11-07 23:32:50,126][42004] Updated weights for policy 0, policy_version 6576 (0.0037) +[2024-11-07 23:32:52,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 26951680. Throughput: 0: 1699.9. Samples: 1732776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:52,936][41694] Avg episode reward: [(0, '4.361')] +[2024-11-07 23:32:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26972160. Throughput: 0: 1623.1. Samples: 1740760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-07 23:32:58,240][42004] Updated weights for policy 0, policy_version 6586 (0.0032) +[2024-11-07 23:33:02,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 27004928. Throughput: 0: 1601.0. Samples: 1744360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:02,956][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:33:04,735][42004] Updated weights for policy 0, policy_version 6596 (0.0034) +[2024-11-07 23:33:07,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 27037696. Throughput: 0: 1605.3. Samples: 1754314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:33:10,331][42004] Updated weights for policy 0, policy_version 6606 (0.0031) +[2024-11-07 23:33:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 27074560. Throughput: 0: 1674.8. Samples: 1765018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:12,933][41694] Avg episode reward: [(0, '4.316')] +[2024-11-07 23:33:15,771][42004] Updated weights for policy 0, policy_version 6616 (0.0035) +[2024-11-07 23:33:17,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 27111424. Throughput: 0: 1690.1. Samples: 1770940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:17,937][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:33:21,960][42004] Updated weights for policy 0, policy_version 6626 (0.0036) +[2024-11-07 23:33:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6800.2, 300 sec: 6706.4). Total num frames: 27144192. Throughput: 0: 1663.5. Samples: 1781296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:22,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:33:27,834][42004] Updated weights for policy 0, policy_version 6636 (0.0025) +[2024-11-07 23:33:27,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 27181056. Throughput: 0: 1635.1. Samples: 1791344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:27,935][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:33:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 27197440. Throughput: 0: 1623.2. Samples: 1796482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:32,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:33:36,020][42004] Updated weights for policy 0, policy_version 6646 (0.0038) +[2024-11-07 23:33:37,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 27234304. Throughput: 0: 1559.6. Samples: 1802956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:37,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-07 23:33:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006649_27234304.pth... +[2024-11-07 23:33:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006261_25645056.pth +[2024-11-07 23:33:41,940][42004] Updated weights for policy 0, policy_version 6656 (0.0026) +[2024-11-07 23:33:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6348.9, 300 sec: 6609.2). Total num frames: 27267072. Throughput: 0: 1613.7. Samples: 1813376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:33:47,348][42004] Updated weights for policy 0, policy_version 6666 (0.0027) +[2024-11-07 23:33:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 27308032. Throughput: 0: 1661.6. Samples: 1819134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:47,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:33:52,886][42004] Updated weights for policy 0, policy_version 6676 (0.0033) +[2024-11-07 23:33:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 27344896. Throughput: 0: 1688.5. Samples: 1830296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:52,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-07 23:33:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 27373568. Throughput: 0: 1669.3. Samples: 1840136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:57,934][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:33:59,676][42004] Updated weights for policy 0, policy_version 6686 (0.0055) +[2024-11-07 23:34:02,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 27406336. Throughput: 0: 1635.7. Samples: 1844546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:34:02,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-07 23:34:07,644][42004] Updated weights for policy 0, policy_version 6696 (0.0032) +[2024-11-07 23:34:07,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 27426816. Throughput: 0: 1574.2. Samples: 1852138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:07,935][41694] Avg episode reward: [(0, '4.383')] +[2024-11-07 23:34:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 27459584. Throughput: 0: 1551.9. Samples: 1861178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:34:12,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:34:13,807][42004] Updated weights for policy 0, policy_version 6706 (0.0038) +[2024-11-07 23:34:17,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6417.2, 300 sec: 6581.4). Total num frames: 27496448. Throughput: 0: 1552.7. Samples: 1866354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:17,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:34:19,622][42004] Updated weights for policy 0, policy_version 6716 (0.0040) +[2024-11-07 23:34:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 27529216. Throughput: 0: 1643.8. Samples: 1876926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:22,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:34:25,154][42004] Updated weights for policy 0, policy_version 6726 (0.0036) +[2024-11-07 23:34:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6636.9). Total num frames: 27566080. Throughput: 0: 1657.0. Samples: 1887942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:34:27,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-07 23:34:31,611][42004] Updated weights for policy 0, policy_version 6736 (0.0034) +[2024-11-07 23:34:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6623.1). Total num frames: 27598848. Throughput: 0: 1636.0. Samples: 1892752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:32,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:34:37,221][42004] Updated weights for policy 0, policy_version 6746 (0.0030) +[2024-11-07 23:34:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 27631616. Throughput: 0: 1620.8. Samples: 1903232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:34:37,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-07 23:34:42,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6567.5). Total num frames: 27652096. Throughput: 0: 1542.2. Samples: 1909536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:34:42,935][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:34:45,717][42004] Updated weights for policy 0, policy_version 6756 (0.0031) +[2024-11-07 23:34:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6280.5, 300 sec: 6539.7). Total num frames: 27684864. Throughput: 0: 1553.8. Samples: 1914468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:34:47,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:34:51,262][42004] Updated weights for policy 0, policy_version 6766 (0.0036) +[2024-11-07 23:34:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6280.6, 300 sec: 6553.6). Total num frames: 27721728. Throughput: 0: 1631.2. Samples: 1925542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:52,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:34:56,878][42004] Updated weights for policy 0, policy_version 6776 (0.0026) +[2024-11-07 23:34:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 27758592. Throughput: 0: 1668.8. Samples: 1936272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:57,933][41694] Avg episode reward: [(0, '4.197')] +[2024-11-07 23:35:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 27787264. Throughput: 0: 1661.4. Samples: 1941118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:02,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-07 23:35:03,745][42004] Updated weights for policy 0, policy_version 6786 (0.0044) +[2024-11-07 23:35:07,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.7, 300 sec: 6595.3). Total num frames: 27820032. Throughput: 0: 1616.7. Samples: 1949678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:07,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-07 23:35:10,120][42004] Updated weights for policy 0, policy_version 6796 (0.0040) +[2024-11-07 23:35:12,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 27848704. Throughput: 0: 1591.9. Samples: 1959578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:12,937][41694] Avg episode reward: [(0, '4.258')] +[2024-11-07 23:35:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6212.3, 300 sec: 6525.8). Total num frames: 27869184. Throughput: 0: 1514.7. Samples: 1960914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:35:17,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-07 23:35:18,777][42004] Updated weights for policy 0, policy_version 6806 (0.0040) +[2024-11-07 23:35:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6212.3, 300 sec: 6498.1). Total num frames: 27901952. Throughput: 0: 1482.0. Samples: 1969922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:22,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:35:25,065][42004] Updated weights for policy 0, policy_version 6816 (0.0034) +[2024-11-07 23:35:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6498.1). Total num frames: 27934720. Throughput: 0: 1579.8. Samples: 1980628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:27,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-07 23:35:31,043][42004] Updated weights for policy 0, policy_version 6826 (0.0045) +[2024-11-07 23:35:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6212.3, 300 sec: 6511.9). Total num frames: 27971584. Throughput: 0: 1580.7. Samples: 1985600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:32,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:35:37,216][42004] Updated weights for policy 0, policy_version 6836 (0.0028) +[2024-11-07 23:35:37,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6143.9, 300 sec: 6553.6). Total num frames: 28000256. Throughput: 0: 1561.3. Samples: 1995800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:35:37,935][41694] Avg episode reward: [(0, '4.242')] +[2024-11-07 23:35:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006837_28004352.pth... +[2024-11-07 23:35:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006457_26447872.pth +[2024-11-07 23:35:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6539.7). Total num frames: 28033024. Throughput: 0: 1520.1. Samples: 2004678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:42,934][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:35:43,895][42004] Updated weights for policy 0, policy_version 6846 (0.0041) +[2024-11-07 23:35:48,720][41694] Fps is (10 sec: 5695.5, 60 sec: 6199.1, 300 sec: 6494.6). Total num frames: 28061696. Throughput: 0: 1500.1. Samples: 2009804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:48,722][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:35:51,215][42004] Updated weights for policy 0, policy_version 6856 (0.0029) +[2024-11-07 23:35:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6484.2). Total num frames: 28094464. Throughput: 0: 1510.9. Samples: 2017668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:52,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-07 23:35:57,027][42004] Updated weights for policy 0, policy_version 6866 (0.0037) +[2024-11-07 23:35:57,931][41694] Fps is (10 sec: 7114.6, 60 sec: 6144.0, 300 sec: 6484.2). Total num frames: 28127232. Throughput: 0: 1528.8. Samples: 2028376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:57,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-07 23:36:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6470.3). Total num frames: 28160000. Throughput: 0: 1612.7. Samples: 2033484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:02,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:36:03,208][42004] Updated weights for policy 0, policy_version 6876 (0.0024) +[2024-11-07 23:36:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6280.5, 300 sec: 6540.0). Total num frames: 28196864. Throughput: 0: 1633.7. Samples: 2043440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:07,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-07 23:36:09,357][42004] Updated weights for policy 0, policy_version 6886 (0.0029) +[2024-11-07 23:36:12,933][41694] Fps is (10 sec: 6143.3, 60 sec: 6212.1, 300 sec: 6511.9). Total num frames: 28221440. Throughput: 0: 1586.7. Samples: 2052030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:12,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-07 23:36:16,686][42004] Updated weights for policy 0, policy_version 6896 (0.0031) +[2024-11-07 23:36:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 28254208. Throughput: 0: 1568.1. Samples: 2056166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:17,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-07 23:36:22,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6212.2, 300 sec: 6442.5). Total num frames: 28274688. Throughput: 0: 1565.6. Samples: 2066250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:22,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-07 23:36:24,342][42004] Updated weights for policy 0, policy_version 6906 (0.0027) +[2024-11-07 23:36:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6442.5). Total num frames: 28311552. Throughput: 0: 1542.6. Samples: 2074094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:27,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:36:29,886][42004] Updated weights for policy 0, policy_version 6916 (0.0031) +[2024-11-07 23:36:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6280.5, 300 sec: 6442.5). Total num frames: 28348416. Throughput: 0: 1584.8. Samples: 2079870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:36:32,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:36:35,174][42004] Updated weights for policy 0, policy_version 6926 (0.0030) +[2024-11-07 23:36:37,933][41694] Fps is (10 sec: 7781.2, 60 sec: 6485.3, 300 sec: 6470.3). Total num frames: 28389376. Throughput: 0: 1642.2. Samples: 2091568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:36:40,714][42004] Updated weights for policy 0, policy_version 6936 (0.0025) +[2024-11-07 23:36:42,940][41694] Fps is (10 sec: 7776.0, 60 sec: 6552.7, 300 sec: 6523.0). Total num frames: 28426240. Throughput: 0: 1647.9. Samples: 2102544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:42,945][41694] Avg episode reward: [(0, '4.534')] +[2024-11-07 23:36:46,680][42004] Updated weights for policy 0, policy_version 6946 (0.0031) +[2024-11-07 23:36:47,934][41694] Fps is (10 sec: 6553.0, 60 sec: 6640.6, 300 sec: 6498.0). Total num frames: 28454912. Throughput: 0: 1642.8. Samples: 2107412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:47,936][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:36:52,425][42004] Updated weights for policy 0, policy_version 6956 (0.0025) +[2024-11-07 23:36:52,931][41694] Fps is (10 sec: 6559.1, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 28491776. Throughput: 0: 1652.1. Samples: 2117786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:52,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-07 23:36:57,932][41694] Fps is (10 sec: 6145.5, 60 sec: 6485.3, 300 sec: 6470.3). Total num frames: 28516352. Throughput: 0: 1641.8. Samples: 2125908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:36:57,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-07 23:36:59,540][42004] Updated weights for policy 0, policy_version 6966 (0.0025) +[2024-11-07 23:37:02,933][41694] Fps is (10 sec: 6143.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 28553216. Throughput: 0: 1677.8. Samples: 2131668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:37:02,936][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:37:05,420][42004] Updated weights for policy 0, policy_version 6976 (0.0028) +[2024-11-07 23:37:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 28590080. Throughput: 0: 1687.2. Samples: 2142172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:37:07,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:37:10,950][42004] Updated weights for policy 0, policy_version 6986 (0.0029) +[2024-11-07 23:37:12,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.5, 300 sec: 6484.2). Total num frames: 28626944. Throughput: 0: 1761.0. Samples: 2153340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:12,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:37:16,506][42004] Updated weights for policy 0, policy_version 6996 (0.0030) +[2024-11-07 23:37:17,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6826.5, 300 sec: 6534.0). Total num frames: 28663808. Throughput: 0: 1752.6. Samples: 2158738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:17,936][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:37:22,863][42004] Updated weights for policy 0, policy_version 7006 (0.0035) +[2024-11-07 23:37:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6525.9). Total num frames: 28696576. Throughput: 0: 1704.5. Samples: 2168266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:22,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:37:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 7031.5, 300 sec: 6525.8). Total num frames: 28733440. Throughput: 0: 1713.5. Samples: 2179636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:27,936][41694] Avg episode reward: [(0, '4.392')] +[2024-11-07 23:37:28,237][42004] Updated weights for policy 0, policy_version 7016 (0.0031) +[2024-11-07 23:37:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6470.3). Total num frames: 28758016. Throughput: 0: 1686.6. Samples: 2183306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:32,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:37:35,541][42004] Updated weights for policy 0, policy_version 7026 (0.0023) +[2024-11-07 23:37:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.6, 300 sec: 6470.3). Total num frames: 28794880. Throughput: 0: 1671.7. Samples: 2193012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:37,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:37:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007030_28794880.pth... +[2024-11-07 23:37:38,313][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006649_27234304.pth +[2024-11-07 23:37:41,453][42004] Updated weights for policy 0, policy_version 7036 (0.0037) +[2024-11-07 23:37:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6691.1, 300 sec: 6470.3). Total num frames: 28827648. Throughput: 0: 1719.7. Samples: 2203296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:42,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-07 23:37:47,367][42004] Updated weights for policy 0, policy_version 7046 (0.0042) +[2024-11-07 23:37:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.9, 300 sec: 6484.2). Total num frames: 28864512. Throughput: 0: 1701.7. Samples: 2208242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:37:47,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-07 23:37:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 28897280. Throughput: 0: 1713.9. Samples: 2219296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:52,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-07 23:37:53,155][42004] Updated weights for policy 0, policy_version 7056 (0.0028) +[2024-11-07 23:37:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6539.7). Total num frames: 28934144. Throughput: 0: 1698.1. Samples: 2229754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:37:57,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:37:58,757][42004] Updated weights for policy 0, policy_version 7066 (0.0025) +[2024-11-07 23:38:02,933][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6539.7). Total num frames: 28966912. Throughput: 0: 1688.5. Samples: 2234720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:38:02,937][41694] Avg episode reward: [(0, '4.730')] +[2024-11-07 23:38:06,920][42004] Updated weights for policy 0, policy_version 7076 (0.0043) +[2024-11-07 23:38:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 28987392. Throughput: 0: 1628.7. Samples: 2241558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:07,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-07 23:38:12,416][42004] Updated weights for policy 0, policy_version 7086 (0.0032) +[2024-11-07 23:38:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6498.1). Total num frames: 29028352. Throughput: 0: 1629.2. Samples: 2252948. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:12,936][41694] Avg episode reward: [(0, '4.234')] +[2024-11-07 23:38:17,724][42004] Updated weights for policy 0, policy_version 7096 (0.0027) +[2024-11-07 23:38:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.2, 300 sec: 6511.9). Total num frames: 29065216. Throughput: 0: 1675.5. Samples: 2258702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:17,935][41694] Avg episode reward: [(0, '4.413')] +[2024-11-07 23:38:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6512.0). Total num frames: 29102080. Throughput: 0: 1709.8. Samples: 2269954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:22,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-07 23:38:23,124][42004] Updated weights for policy 0, policy_version 7106 (0.0023) +[2024-11-07 23:38:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.1, 300 sec: 6567.5). Total num frames: 29134848. Throughput: 0: 1714.2. Samples: 2280436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:27,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-07 23:38:29,286][42004] Updated weights for policy 0, policy_version 7116 (0.0035) +[2024-11-07 23:38:32,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6894.8, 300 sec: 6567.5). Total num frames: 29171712. Throughput: 0: 1722.4. Samples: 2285752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:32,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:38:34,570][42004] Updated weights for policy 0, policy_version 7126 (0.0031) +[2024-11-07 23:38:38,835][41694] Fps is (10 sec: 6386.3, 60 sec: 6725.4, 300 sec: 6547.4). Total num frames: 29204480. Throughput: 0: 1695.9. Samples: 2297144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:38,840][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:38:42,085][42004] Updated weights for policy 0, policy_version 7136 (0.0024) +[2024-11-07 23:38:42,932][41694] Fps is (10 sec: 6144.8, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 29233152. Throughput: 0: 1665.6. Samples: 2304706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:42,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-07 23:38:47,366][42004] Updated weights for policy 0, policy_version 7146 (0.0023) +[2024-11-07 23:38:47,931][41694] Fps is (10 sec: 7654.7, 60 sec: 6826.7, 300 sec: 6539.7). Total num frames: 29274112. Throughput: 0: 1684.8. Samples: 2310538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:47,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-07 23:38:52,635][42004] Updated weights for policy 0, policy_version 7156 (0.0025) +[2024-11-07 23:38:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6567.5). Total num frames: 29310976. Throughput: 0: 1789.6. Samples: 2322092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:52,935][41694] Avg episode reward: [(0, '4.236')] +[2024-11-07 23:38:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6581.4). Total num frames: 29347840. Throughput: 0: 1790.4. Samples: 2333518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:57,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-07 23:38:58,183][42004] Updated weights for policy 0, policy_version 7166 (0.0029) +[2024-11-07 23:39:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6609.2). Total num frames: 29376512. Throughput: 0: 1765.2. Samples: 2338134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:02,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-07 23:39:04,730][42004] Updated weights for policy 0, policy_version 7176 (0.0026) +[2024-11-07 23:39:07,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7168.0, 300 sec: 6636.9). Total num frames: 29417472. Throughput: 0: 1739.2. Samples: 2348220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:07,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-07 23:39:10,452][42004] Updated weights for policy 0, policy_version 7186 (0.0032) +[2024-11-07 23:39:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6581.4). Total num frames: 29437952. Throughput: 0: 1705.8. Samples: 2357196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:12,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:39:17,779][42004] Updated weights for policy 0, policy_version 7196 (0.0041) +[2024-11-07 23:39:17,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6595.2). Total num frames: 29474816. Throughput: 0: 1675.8. Samples: 2361162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:17,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-07 23:39:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6595.2). Total num frames: 29511680. Throughput: 0: 1704.5. Samples: 2372306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:22,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:39:23,161][42004] Updated weights for policy 0, policy_version 7206 (0.0029) +[2024-11-07 23:39:27,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6609.1). Total num frames: 29548544. Throughput: 0: 1759.6. Samples: 2383888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:27,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:39:28,544][42004] Updated weights for policy 0, policy_version 7216 (0.0035) +[2024-11-07 23:39:32,934][41694] Fps is (10 sec: 7371.1, 60 sec: 6894.8, 300 sec: 6623.0). Total num frames: 29585408. Throughput: 0: 1752.9. Samples: 2389424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:32,937][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:39:34,526][42004] Updated weights for policy 0, policy_version 7226 (0.0034) +[2024-11-07 23:39:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7069.6, 300 sec: 6678.6). Total num frames: 29622272. Throughput: 0: 1719.0. Samples: 2399448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:37,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-07 23:39:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007232_29622272.pth... +[2024-11-07 23:39:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006837_28004352.pth +[2024-11-07 23:39:40,531][42004] Updated weights for policy 0, policy_version 7236 (0.0035) +[2024-11-07 23:39:42,932][41694] Fps is (10 sec: 6964.6, 60 sec: 7031.4, 300 sec: 6678.5). Total num frames: 29655040. Throughput: 0: 1698.1. Samples: 2409932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:42,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-07 23:39:47,917][42004] Updated weights for policy 0, policy_version 7246 (0.0036) +[2024-11-07 23:39:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 29679616. Throughput: 0: 1716.7. Samples: 2415386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:47,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-07 23:39:52,931][41694] Fps is (10 sec: 5734.8, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 29712384. Throughput: 0: 1664.1. Samples: 2423106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:52,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-07 23:39:53,506][42004] Updated weights for policy 0, policy_version 7256 (0.0028) +[2024-11-07 23:39:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 29753344. Throughput: 0: 1715.4. Samples: 2434390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:57,938][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:39:58,925][42004] Updated weights for policy 0, policy_version 7266 (0.0033) +[2024-11-07 23:40:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 29786112. Throughput: 0: 1743.3. Samples: 2439608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:40:02,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:40:05,201][42004] Updated weights for policy 0, policy_version 7276 (0.0036) +[2024-11-07 23:40:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 29818880. Throughput: 0: 1711.8. Samples: 2449336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:40:07,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-07 23:40:11,719][42004] Updated weights for policy 0, policy_version 7286 (0.0036) +[2024-11-07 23:40:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 29851648. Throughput: 0: 1670.8. Samples: 2459072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:40:12,935][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:40:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 29880320. Throughput: 0: 1650.3. Samples: 2463684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:40:17,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:40:17,990][42004] Updated weights for policy 0, policy_version 7296 (0.0028) +[2024-11-07 23:40:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 29904896. Throughput: 0: 1589.1. Samples: 2470958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:22,936][41694] Avg episode reward: [(0, '4.443')] +[2024-11-07 23:40:25,770][42004] Updated weights for policy 0, policy_version 7306 (0.0029) +[2024-11-07 23:40:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 29937664. Throughput: 0: 1583.5. Samples: 2481190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:40:31,630][42004] Updated weights for policy 0, policy_version 7316 (0.0033) +[2024-11-07 23:40:32,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.3, 300 sec: 6678.6). Total num frames: 29970432. Throughput: 0: 1581.0. Samples: 2486532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:32,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-07 23:40:37,935][41694] Fps is (10 sec: 6551.6, 60 sec: 6348.5, 300 sec: 6678.5). Total num frames: 30003200. Throughput: 0: 1598.4. Samples: 2495040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:37,936][41694] Avg episode reward: [(0, '4.268')] +[2024-11-07 23:40:38,340][42004] Updated weights for policy 0, policy_version 7326 (0.0037) +[2024-11-07 23:40:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 6710.4). Total num frames: 30035968. Throughput: 0: 1569.4. Samples: 2505012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:42,934][41694] Avg episode reward: [(0, '4.261')] +[2024-11-07 23:40:44,978][42004] Updated weights for policy 0, policy_version 7336 (0.0027) +[2024-11-07 23:40:47,931][41694] Fps is (10 sec: 6555.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 30068736. Throughput: 0: 1557.0. Samples: 2509672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:40:50,471][42004] Updated weights for policy 0, policy_version 7346 (0.0024) +[2024-11-07 23:40:52,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 30105600. Throughput: 0: 1589.9. Samples: 2520880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:52,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-07 23:40:57,783][42004] Updated weights for policy 0, policy_version 7356 (0.0024) +[2024-11-07 23:40:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6280.5, 300 sec: 6678.6). Total num frames: 30130176. Throughput: 0: 1554.0. Samples: 2529004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:57,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-07 23:41:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6664.7). Total num frames: 30162944. Throughput: 0: 1565.9. Samples: 2534150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:02,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-07 23:41:03,770][42004] Updated weights for policy 0, policy_version 7366 (0.0035) +[2024-11-07 23:41:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6692.5). Total num frames: 30195712. Throughput: 0: 1623.1. Samples: 2543998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:07,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-07 23:41:11,132][42004] Updated weights for policy 0, policy_version 7376 (0.0036) +[2024-11-07 23:41:12,937][41694] Fps is (10 sec: 5733.4, 60 sec: 6143.8, 300 sec: 6664.6). Total num frames: 30220288. Throughput: 0: 1572.6. Samples: 2551960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:41:12,940][41694] Avg episode reward: [(0, '4.510')] +[2024-11-07 23:41:17,648][42004] Updated weights for policy 0, policy_version 7386 (0.0028) +[2024-11-07 23:41:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 30253056. Throughput: 0: 1545.7. Samples: 2556088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:17,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-07 23:41:22,932][41694] Fps is (10 sec: 6554.8, 60 sec: 6348.8, 300 sec: 6692.4). Total num frames: 30285824. Throughput: 0: 1585.5. Samples: 2566384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:22,936][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:41:23,443][42004] Updated weights for policy 0, policy_version 7396 (0.0029) +[2024-11-07 23:41:29,096][41694] Fps is (10 sec: 6236.8, 60 sec: 6294.9, 300 sec: 6666.1). Total num frames: 30322688. Throughput: 0: 1579.8. Samples: 2577942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:29,098][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:41:30,630][42004] Updated weights for policy 0, policy_version 7406 (0.0031) +[2024-11-07 23:41:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6636.9). Total num frames: 30347264. Throughput: 0: 1566.0. Samples: 2580144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:32,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:41:36,136][42004] Updated weights for policy 0, policy_version 7416 (0.0030) +[2024-11-07 23:41:37,931][41694] Fps is (10 sec: 7417.6, 60 sec: 6417.4, 300 sec: 6651.0). Total num frames: 30388224. Throughput: 0: 1567.7. Samples: 2591426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:37,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:41:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007419_30388224.pth... +[2024-11-07 23:41:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007030_28794880.pth +[2024-11-07 23:41:41,616][42004] Updated weights for policy 0, policy_version 7426 (0.0026) +[2024-11-07 23:41:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 30425088. Throughput: 0: 1633.2. Samples: 2602498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:42,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-07 23:41:47,462][42004] Updated weights for policy 0, policy_version 7436 (0.0039) +[2024-11-07 23:41:47,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 30457856. Throughput: 0: 1637.4. Samples: 2607832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:47,935][41694] Avg episode reward: [(0, '4.175')] +[2024-11-07 23:41:52,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 30490624. Throughput: 0: 1636.4. Samples: 2617638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-07 23:41:53,486][42004] Updated weights for policy 0, policy_version 7446 (0.0041) +[2024-11-07 23:41:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 30531584. Throughput: 0: 1714.3. Samples: 2629100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:57,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:41:58,736][42004] Updated weights for policy 0, policy_version 7456 (0.0030) +[2024-11-07 23:42:03,170][41694] Fps is (10 sec: 6000.9, 60 sec: 6459.7, 300 sec: 6645.4). Total num frames: 30552064. Throughput: 0: 1735.1. Samples: 2634582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:03,172][41694] Avg episode reward: [(0, '4.232')] +[2024-11-07 23:42:06,492][42004] Updated weights for policy 0, policy_version 7466 (0.0029) +[2024-11-07 23:42:07,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 30588928. Throughput: 0: 1676.6. Samples: 2641830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:07,936][41694] Avg episode reward: [(0, '4.475')] +[2024-11-07 23:42:12,091][42004] Updated weights for policy 0, policy_version 7476 (0.0027) +[2024-11-07 23:42:12,933][41694] Fps is (10 sec: 7552.2, 60 sec: 6758.5, 300 sec: 6650.8). Total num frames: 30625792. Throughput: 0: 1706.3. Samples: 2652738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:12,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:42:17,356][42004] Updated weights for policy 0, policy_version 7486 (0.0027) +[2024-11-07 23:42:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 30666752. Throughput: 0: 1744.3. Samples: 2658640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:17,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:42:22,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 30699520. Throughput: 0: 1737.9. Samples: 2669632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:22,944][41694] Avg episode reward: [(0, '4.489')] +[2024-11-07 23:42:23,446][42004] Updated weights for policy 0, policy_version 7496 (0.0036) +[2024-11-07 23:42:27,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6706.3). Total num frames: 30736384. Throughput: 0: 1720.3. Samples: 2679912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:27,935][41694] Avg episode reward: [(0, '4.171')] +[2024-11-07 23:42:28,892][42004] Updated weights for policy 0, policy_version 7506 (0.0028) +[2024-11-07 23:42:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 30773248. Throughput: 0: 1728.6. Samples: 2685616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:32,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-07 23:42:34,351][42004] Updated weights for policy 0, policy_version 7516 (0.0031) +[2024-11-07 23:42:37,931][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 30797824. Throughput: 0: 1739.8. Samples: 2695928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:37,933][41694] Avg episode reward: [(0, '4.165')] +[2024-11-07 23:42:41,464][42004] Updated weights for policy 0, policy_version 7526 (0.0024) +[2024-11-07 23:42:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 30834688. Throughput: 0: 1688.6. Samples: 2705086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:42,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-07 23:42:46,980][42004] Updated weights for policy 0, policy_version 7536 (0.0031) +[2024-11-07 23:42:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6895.0, 300 sec: 6692.4). Total num frames: 30871552. Throughput: 0: 1700.7. Samples: 2710710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:47,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:42:52,268][42004] Updated weights for policy 0, policy_version 7546 (0.0029) +[2024-11-07 23:42:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 30912512. Throughput: 0: 1786.5. Samples: 2722222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:52,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-07 23:42:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 30945280. Throughput: 0: 1776.5. Samples: 2732678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-07 23:42:58,160][42004] Updated weights for policy 0, policy_version 7556 (0.0036) +[2024-11-07 23:43:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7128.1, 300 sec: 6748.0). Total num frames: 30978048. Throughput: 0: 1770.9. Samples: 2738332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:02,936][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:43:04,028][42004] Updated weights for policy 0, policy_version 7566 (0.0032) +[2024-11-07 23:43:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6748.0). Total num frames: 31019008. Throughput: 0: 1763.9. Samples: 2749006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:07,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-07 23:43:09,315][42004] Updated weights for policy 0, policy_version 7576 (0.0032) +[2024-11-07 23:43:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6895.1, 300 sec: 6692.5). Total num frames: 31039488. Throughput: 0: 1703.1. Samples: 2756552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-07 23:43:16,856][42004] Updated weights for policy 0, policy_version 7586 (0.0036) +[2024-11-07 23:43:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 31080448. Throughput: 0: 1696.9. Samples: 2761978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:17,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-07 23:43:22,466][42004] Updated weights for policy 0, policy_version 7596 (0.0025) +[2024-11-07 23:43:22,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 31113216. Throughput: 0: 1723.0. Samples: 2773464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:22,935][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:43:27,898][42004] Updated weights for policy 0, policy_version 7606 (0.0041) +[2024-11-07 23:43:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.3, 300 sec: 6720.2). Total num frames: 31154176. Throughput: 0: 1768.6. Samples: 2784674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:43:27,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:43:32,933][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.8, 300 sec: 6740.8). Total num frames: 31186944. Throughput: 0: 1751.6. Samples: 2789532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:32,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-07 23:43:33,794][42004] Updated weights for policy 0, policy_version 7616 (0.0048) +[2024-11-07 23:43:37,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 6748.0). Total num frames: 31223808. Throughput: 0: 1734.4. Samples: 2800270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:37,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:43:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007623_31223808.pth... +[2024-11-07 23:43:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007232_29622272.pth +[2024-11-07 23:43:39,221][42004] Updated weights for policy 0, policy_version 7626 (0.0029) +[2024-11-07 23:43:42,932][41694] Fps is (10 sec: 7373.7, 60 sec: 7099.7, 300 sec: 6734.1). Total num frames: 31260672. Throughput: 0: 1744.9. Samples: 2811198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:42,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-07 23:43:46,907][42004] Updated weights for policy 0, policy_version 7636 (0.0040) +[2024-11-07 23:43:47,932][41694] Fps is (10 sec: 5734.7, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 31281152. Throughput: 0: 1689.4. Samples: 2814356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:47,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:43:52,286][42004] Updated weights for policy 0, policy_version 7646 (0.0027) +[2024-11-07 23:43:52,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 31322112. Throughput: 0: 1682.7. Samples: 2824728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:52,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-07 23:43:57,493][42004] Updated weights for policy 0, policy_version 7656 (0.0024) +[2024-11-07 23:43:57,931][41694] Fps is (10 sec: 7782.9, 60 sec: 6895.0, 300 sec: 6720.2). Total num frames: 31358976. Throughput: 0: 1772.1. Samples: 2836294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:57,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:44:02,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 31391744. Throughput: 0: 1780.1. Samples: 2842084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:44:02,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-07 23:44:03,719][42004] Updated weights for policy 0, policy_version 7666 (0.0042) +[2024-11-07 23:44:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 31428608. Throughput: 0: 1731.7. Samples: 2851390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:07,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-07 23:44:09,379][42004] Updated weights for policy 0, policy_version 7676 (0.0030) +[2024-11-07 23:44:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 31465472. Throughput: 0: 1737.1. Samples: 2862844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:12,933][41694] Avg episode reward: [(0, '4.122')] +[2024-11-07 23:44:15,367][42004] Updated weights for policy 0, policy_version 7686 (0.0028) +[2024-11-07 23:44:19,627][41694] Fps is (10 sec: 5953.8, 60 sec: 6771.9, 300 sec: 6695.6). Total num frames: 31498240. Throughput: 0: 1667.2. Samples: 2867380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:19,629][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:44:22,617][42004] Updated weights for policy 0, policy_version 7696 (0.0032) +[2024-11-07 23:44:22,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6826.5, 300 sec: 6692.4). Total num frames: 31522816. Throughput: 0: 1663.2. Samples: 2875114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:22,938][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:44:27,932][41694] Fps is (10 sec: 7398.2, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 31559680. Throughput: 0: 1666.8. Samples: 2886204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:27,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-07 23:44:28,208][42004] Updated weights for policy 0, policy_version 7706 (0.0028) +[2024-11-07 23:44:32,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6826.8, 300 sec: 6692.5). Total num frames: 31596544. Throughput: 0: 1722.4. Samples: 2891864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:32,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:44:33,667][42004] Updated weights for policy 0, policy_version 7716 (0.0026) +[2024-11-07 23:44:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6706.3). Total num frames: 31633408. Throughput: 0: 1741.3. Samples: 2903084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:37,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:44:39,457][42004] Updated weights for policy 0, policy_version 7726 (0.0034) +[2024-11-07 23:44:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 31666176. Throughput: 0: 1710.0. Samples: 2913242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:42,934][41694] Avg episode reward: [(0, '4.174')] +[2024-11-07 23:44:45,460][42004] Updated weights for policy 0, policy_version 7736 (0.0035) +[2024-11-07 23:44:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 31703040. Throughput: 0: 1694.3. Samples: 2918326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:47,938][41694] Avg episode reward: [(0, '4.470')] +[2024-11-07 23:44:50,879][42004] Updated weights for policy 0, policy_version 7746 (0.0036) +[2024-11-07 23:44:53,714][41694] Fps is (10 sec: 6077.8, 60 sec: 6738.8, 300 sec: 6688.6). Total num frames: 31731712. Throughput: 0: 1712.0. Samples: 2929768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:53,716][41694] Avg episode reward: [(0, '4.537')] +[2024-11-07 23:44:57,933][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 31764480. Throughput: 0: 1664.2. Samples: 2937734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:57,935][41694] Avg episode reward: [(0, '4.537')] +[2024-11-07 23:44:57,989][42004] Updated weights for policy 0, policy_version 7756 (0.0039) +[2024-11-07 23:45:02,931][41694] Fps is (10 sec: 7554.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 31801344. Throughput: 0: 1753.6. Samples: 2943320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:02,935][41694] Avg episode reward: [(0, '4.457')] +[2024-11-07 23:45:03,816][42004] Updated weights for policy 0, policy_version 7766 (0.0033) +[2024-11-07 23:45:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 31838208. Throughput: 0: 1751.3. Samples: 2953920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:07,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:45:09,678][42004] Updated weights for policy 0, policy_version 7776 (0.0030) +[2024-11-07 23:45:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 31866880. Throughput: 0: 1717.7. Samples: 2963500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:45:12,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:45:16,644][42004] Updated weights for policy 0, policy_version 7786 (0.0034) +[2024-11-07 23:45:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6884.7, 300 sec: 6761.9). Total num frames: 31899648. Throughput: 0: 1685.6. Samples: 2967718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:17,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-07 23:45:22,543][42004] Updated weights for policy 0, policy_version 7796 (0.0031) +[2024-11-07 23:45:22,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.9, 300 sec: 6761.9). Total num frames: 31932416. Throughput: 0: 1657.6. Samples: 2977674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:22,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-07 23:45:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 31952896. Throughput: 0: 1623.9. Samples: 2986316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:27,936][41694] Avg episode reward: [(0, '4.243')] +[2024-11-07 23:45:30,334][42004] Updated weights for policy 0, policy_version 7806 (0.0034) +[2024-11-07 23:45:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6734.2). Total num frames: 31989760. Throughput: 0: 1599.4. Samples: 2990300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:45:32,933][41694] Avg episode reward: [(0, '4.282')] +[2024-11-07 23:45:36,131][42004] Updated weights for policy 0, policy_version 7816 (0.0026) +[2024-11-07 23:45:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 32026624. Throughput: 0: 1609.3. Samples: 3000926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:37,934][41694] Avg episode reward: [(0, '4.502')] +[2024-11-07 23:45:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007819_32026624.pth... +[2024-11-07 23:45:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007419_30388224.pth +[2024-11-07 23:45:41,991][42004] Updated weights for policy 0, policy_version 7826 (0.0032) +[2024-11-07 23:45:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 32059392. Throughput: 0: 1636.9. Samples: 3011396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:42,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:45:47,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6417.1, 300 sec: 6720.2). Total num frames: 32088064. Throughput: 0: 1613.7. Samples: 3015938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:47,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-07 23:45:48,632][42004] Updated weights for policy 0, policy_version 7836 (0.0033) +[2024-11-07 23:45:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6640.2, 300 sec: 6761.9). Total num frames: 32124928. Throughput: 0: 1597.4. Samples: 3025804. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:45:52,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-07 23:45:54,267][42004] Updated weights for policy 0, policy_version 7846 (0.0028) +[2024-11-07 23:45:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 32161792. Throughput: 0: 1629.9. Samples: 3036846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:45:57,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:45:59,794][42004] Updated weights for policy 0, policy_version 7856 (0.0035) +[2024-11-07 23:46:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6734.1). Total num frames: 32182272. Throughput: 0: 1658.6. Samples: 3042356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:02,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:46:07,527][42004] Updated weights for policy 0, policy_version 7866 (0.0028) +[2024-11-07 23:46:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6775.8). Total num frames: 32219136. Throughput: 0: 1595.7. Samples: 3049482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:07,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-07 23:46:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 32251904. Throughput: 0: 1632.4. Samples: 3059774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:12,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:46:13,665][42004] Updated weights for policy 0, policy_version 7876 (0.0036) +[2024-11-07 23:46:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 32288768. Throughput: 0: 1655.6. Samples: 3064800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:17,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-07 23:46:19,633][42004] Updated weights for policy 0, policy_version 7886 (0.0034) +[2024-11-07 23:46:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6788.7). Total num frames: 32317440. Throughput: 0: 1639.5. Samples: 3074706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:22,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-07 23:46:25,643][42004] Updated weights for policy 0, policy_version 7896 (0.0022) +[2024-11-07 23:46:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 32358400. Throughput: 0: 1646.0. Samples: 3085464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:27,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-07 23:46:31,087][42004] Updated weights for policy 0, policy_version 7906 (0.0024) +[2024-11-07 23:46:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 32395264. Throughput: 0: 1674.2. Samples: 3091278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:32,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-07 23:46:37,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 32419840. Throughput: 0: 1630.1. Samples: 3099158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:37,934][41694] Avg episode reward: [(0, '4.242')] +[2024-11-07 23:46:38,389][42004] Updated weights for policy 0, policy_version 7916 (0.0023) +[2024-11-07 23:46:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 32456704. Throughput: 0: 1636.4. Samples: 3110484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:42,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-07 23:46:43,769][42004] Updated weights for policy 0, policy_version 7926 (0.0030) +[2024-11-07 23:46:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 32493568. Throughput: 0: 1640.8. Samples: 3116194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:46:47,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:46:49,225][42004] Updated weights for policy 0, policy_version 7936 (0.0027) +[2024-11-07 23:46:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 32530432. Throughput: 0: 1731.4. Samples: 3127394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:46:52,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-07 23:46:55,146][42004] Updated weights for policy 0, policy_version 7946 (0.0034) +[2024-11-07 23:46:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6836.8). Total num frames: 32567296. Throughput: 0: 1734.8. Samples: 3137842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:46:57,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:47:00,443][42004] Updated weights for policy 0, policy_version 7956 (0.0022) +[2024-11-07 23:47:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 32604160. Throughput: 0: 1754.9. Samples: 3143770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:02,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:47:05,949][42004] Updated weights for policy 0, policy_version 7966 (0.0032) +[2024-11-07 23:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 32641024. Throughput: 0: 1780.3. Samples: 3154818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:07,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-07 23:47:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 32661504. Throughput: 0: 1702.4. Samples: 3162074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:12,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:47:13,568][42004] Updated weights for policy 0, policy_version 7976 (0.0023) +[2024-11-07 23:47:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 32702464. Throughput: 0: 1699.7. Samples: 3167766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:17,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:47:18,927][42004] Updated weights for policy 0, policy_version 7986 (0.0030) +[2024-11-07 23:47:22,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 32739328. Throughput: 0: 1776.9. Samples: 3179118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:22,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:47:24,285][42004] Updated weights for policy 0, policy_version 7996 (0.0031) +[2024-11-07 23:47:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6894.8, 300 sec: 6775.7). Total num frames: 32772096. Throughput: 0: 1760.5. Samples: 3189708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:47:27,935][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:47:30,354][42004] Updated weights for policy 0, policy_version 8006 (0.0026) +[2024-11-07 23:47:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 32808960. Throughput: 0: 1753.9. Samples: 3195118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:32,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:47:35,726][42004] Updated weights for policy 0, policy_version 8016 (0.0021) +[2024-11-07 23:47:37,931][41694] Fps is (10 sec: 7783.4, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 32849920. Throughput: 0: 1755.6. Samples: 3206394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:37,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-07 23:47:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008020_32849920.pth... +[2024-11-07 23:47:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007623_31223808.pth +[2024-11-07 23:47:41,125][42004] Updated weights for policy 0, policy_version 8026 (0.0017) +[2024-11-07 23:47:44,573][41694] Fps is (10 sec: 6333.1, 60 sec: 6910.6, 300 sec: 6779.7). Total num frames: 32882688. Throughput: 0: 1715.8. Samples: 3217870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:44,576][41694] Avg episode reward: [(0, '4.555')] +[2024-11-07 23:47:47,931][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 32911360. Throughput: 0: 1693.4. Samples: 3219972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:47,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:47:48,321][42004] Updated weights for policy 0, policy_version 8036 (0.0024) +[2024-11-07 23:47:52,932][41694] Fps is (10 sec: 7840.7, 60 sec: 6963.2, 300 sec: 6789.6). Total num frames: 32948224. Throughput: 0: 1708.0. Samples: 3231678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:52,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:47:53,779][42004] Updated weights for policy 0, policy_version 8046 (0.0025) +[2024-11-07 23:47:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 32976896. Throughput: 0: 1747.5. Samples: 3240712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:57,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-07 23:48:02,475][42004] Updated weights for policy 0, policy_version 8056 (0.0048) +[2024-11-07 23:48:02,942][41694] Fps is (10 sec: 4910.1, 60 sec: 6552.4, 300 sec: 6706.1). Total num frames: 32997376. Throughput: 0: 1694.6. Samples: 3244042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:02,944][41694] Avg episode reward: [(0, '4.196')] +[2024-11-07 23:48:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.4, 300 sec: 6748.0). Total num frames: 33030144. Throughput: 0: 1626.7. Samples: 3252320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:07,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-07 23:48:08,599][42004] Updated weights for policy 0, policy_version 8066 (0.0037) +[2024-11-07 23:48:12,932][41694] Fps is (10 sec: 7380.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 33071104. Throughput: 0: 1645.3. Samples: 3263744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:12,934][41694] Avg episode reward: [(0, '4.220')] +[2024-11-07 23:48:14,010][42004] Updated weights for policy 0, policy_version 8076 (0.0028) +[2024-11-07 23:48:18,779][41694] Fps is (10 sec: 6419.1, 60 sec: 6529.6, 300 sec: 6714.8). Total num frames: 33099776. Throughput: 0: 1687.4. Samples: 3272482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:18,781][41694] Avg episode reward: [(0, '4.239')] +[2024-11-07 23:48:21,375][42004] Updated weights for policy 0, policy_version 8086 (0.0027) +[2024-11-07 23:48:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 33128448. Throughput: 0: 1566.9. Samples: 3276904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:22,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-07 23:48:26,778][42004] Updated weights for policy 0, policy_version 8096 (0.0037) +[2024-11-07 23:48:27,932][41694] Fps is (10 sec: 7608.0, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 33169408. Throughput: 0: 1625.7. Samples: 3288356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:27,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-07 23:48:32,096][42004] Updated weights for policy 0, policy_version 8106 (0.0026) +[2024-11-07 23:48:32,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 33206272. Throughput: 0: 1644.8. Samples: 3293990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:32,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-07 23:48:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33243136. Throughput: 0: 1630.4. Samples: 3305046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:37,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-07 23:48:37,937][42004] Updated weights for policy 0, policy_version 8116 (0.0033) +[2024-11-07 23:48:42,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6808.2, 300 sec: 6775.8). Total num frames: 33280000. Throughput: 0: 1679.5. Samples: 3316290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:42,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-07 23:48:43,195][42004] Updated weights for policy 0, policy_version 8126 (0.0024) +[2024-11-07 23:48:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 33316864. Throughput: 0: 1738.2. Samples: 3322242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:47,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:48:48,556][42004] Updated weights for policy 0, policy_version 8136 (0.0029) +[2024-11-07 23:48:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33341440. Throughput: 0: 1676.2. Samples: 3327748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:52,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-07 23:48:55,735][42004] Updated weights for policy 0, policy_version 8146 (0.0023) +[2024-11-07 23:48:57,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 33382400. Throughput: 0: 1727.7. Samples: 3341492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:48:57,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:49:01,051][42004] Updated weights for policy 0, policy_version 8156 (0.0029) +[2024-11-07 23:49:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6964.4, 300 sec: 6734.1). Total num frames: 33415168. Throughput: 0: 1696.8. Samples: 3347402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:49:02,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:49:06,616][42004] Updated weights for policy 0, policy_version 8166 (0.0039) +[2024-11-07 23:49:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 33456128. Throughput: 0: 1809.1. Samples: 3358312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:07,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:49:12,423][42004] Updated weights for policy 0, policy_version 8176 (0.0033) +[2024-11-07 23:49:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6787.0). Total num frames: 33488896. Throughput: 0: 1793.3. Samples: 3369056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:12,933][41694] Avg episode reward: [(0, '4.183')] +[2024-11-07 23:49:17,787][42004] Updated weights for policy 0, policy_version 8186 (0.0032) +[2024-11-07 23:49:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7270.7, 300 sec: 6803.6). Total num frames: 33529856. Throughput: 0: 1793.2. Samples: 3374684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:17,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:49:22,932][41694] Fps is (10 sec: 7781.7, 60 sec: 7304.4, 300 sec: 6803.5). Total num frames: 33566720. Throughput: 0: 1803.0. Samples: 3386184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:22,937][41694] Avg episode reward: [(0, '4.369')] +[2024-11-07 23:49:23,190][42004] Updated weights for policy 0, policy_version 8196 (0.0036) +[2024-11-07 23:49:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 33587200. Throughput: 0: 1729.6. Samples: 3394122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:27,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-07 23:49:30,632][42004] Updated weights for policy 0, policy_version 8206 (0.0036) +[2024-11-07 23:49:32,932][41694] Fps is (10 sec: 6144.5, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 33628160. Throughput: 0: 1713.2. Samples: 3399334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:32,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:49:35,974][42004] Updated weights for policy 0, policy_version 8216 (0.0033) +[2024-11-07 23:49:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 33665024. Throughput: 0: 1838.8. Samples: 3410496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:49:37,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:49:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008219_33665024.pth... +[2024-11-07 23:49:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007819_32026624.pth +[2024-11-07 23:49:41,600][42004] Updated weights for policy 0, policy_version 8226 (0.0024) +[2024-11-07 23:49:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 33701888. Throughput: 0: 1785.7. Samples: 3421850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:42,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:49:47,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6765.9). Total num frames: 33722368. Throughput: 0: 1729.1. Samples: 3425210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:47,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-07 23:49:49,665][42004] Updated weights for policy 0, policy_version 8236 (0.0031) +[2024-11-07 23:49:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 33755136. Throughput: 0: 1673.0. Samples: 3433598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:52,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:49:55,445][42004] Updated weights for policy 0, policy_version 8246 (0.0022) +[2024-11-07 23:49:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 33792000. Throughput: 0: 1673.6. Samples: 3444368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:57,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:50:02,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 33812480. Throughput: 0: 1653.2. Samples: 3449080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:02,934][41694] Avg episode reward: [(0, '4.281')] +[2024-11-07 23:50:03,116][42004] Updated weights for policy 0, policy_version 8256 (0.0037) +[2024-11-07 23:50:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33849344. Throughput: 0: 1565.7. Samples: 3456638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:07,936][41694] Avg episode reward: [(0, '4.307')] +[2024-11-07 23:50:08,971][42004] Updated weights for policy 0, policy_version 8266 (0.0031) +[2024-11-07 23:50:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 33886208. Throughput: 0: 1637.2. Samples: 3467794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-07 23:50:14,487][42004] Updated weights for policy 0, policy_version 8276 (0.0029) +[2024-11-07 23:50:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 33923072. Throughput: 0: 1640.5. Samples: 3473156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:17,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:50:20,331][42004] Updated weights for policy 0, policy_version 8286 (0.0021) +[2024-11-07 23:50:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6789.6). Total num frames: 33955840. Throughput: 0: 1623.2. Samples: 3483538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:22,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-07 23:50:25,777][42004] Updated weights for policy 0, policy_version 8296 (0.0031) +[2024-11-07 23:50:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 33996800. Throughput: 0: 1628.4. Samples: 3495130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:27,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-07 23:50:31,226][42004] Updated weights for policy 0, policy_version 8306 (0.0034) +[2024-11-07 23:50:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 34033664. Throughput: 0: 1675.9. Samples: 3500626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:32,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-07 23:50:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 34054144. Throughput: 0: 1649.6. Samples: 3507830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:37,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-07 23:50:39,027][42004] Updated weights for policy 0, policy_version 8316 (0.0029) +[2024-11-07 23:50:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 34086912. Throughput: 0: 1647.7. Samples: 3518514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:42,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-07 23:50:44,714][42004] Updated weights for policy 0, policy_version 8326 (0.0027) +[2024-11-07 23:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.5, 300 sec: 6789.6). Total num frames: 34127872. Throughput: 0: 1663.9. Samples: 3523954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:47,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-07 23:50:49,928][42004] Updated weights for policy 0, policy_version 8336 (0.0034) +[2024-11-07 23:50:52,936][41694] Fps is (10 sec: 7779.1, 60 sec: 6826.2, 300 sec: 6789.5). Total num frames: 34164736. Throughput: 0: 1759.0. Samples: 3535802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:52,937][41694] Avg episode reward: [(0, '4.480')] +[2024-11-07 23:50:55,555][42004] Updated weights for policy 0, policy_version 8346 (0.0029) +[2024-11-07 23:50:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 34197504. Throughput: 0: 1740.3. Samples: 3546108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:57,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-07 23:51:02,674][42004] Updated weights for policy 0, policy_version 8356 (0.0024) +[2024-11-07 23:51:02,932][41694] Fps is (10 sec: 6146.6, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 34226176. Throughput: 0: 1721.9. Samples: 3550642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:02,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-07 23:51:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 34250752. Throughput: 0: 1653.6. Samples: 3557952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-07 23:51:12,508][42004] Updated weights for policy 0, policy_version 8366 (0.0047) +[2024-11-07 23:51:12,931][41694] Fps is (10 sec: 4096.0, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 34267136. Throughput: 0: 1521.8. Samples: 3563612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:12,934][41694] Avg episode reward: [(0, '4.199')] +[2024-11-07 23:51:17,932][41694] Fps is (10 sec: 4505.9, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 34295808. Throughput: 0: 1503.2. Samples: 3568268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:17,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:51:19,073][42004] Updated weights for policy 0, policy_version 8376 (0.0039) +[2024-11-07 23:51:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 34336768. Throughput: 0: 1568.5. Samples: 3578410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:22,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-07 23:51:24,338][42004] Updated weights for policy 0, policy_version 8386 (0.0033) +[2024-11-07 23:51:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6280.5, 300 sec: 6706.3). Total num frames: 34373632. Throughput: 0: 1584.0. Samples: 3589796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:27,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:51:30,014][42004] Updated weights for policy 0, policy_version 8396 (0.0037) +[2024-11-07 23:51:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6280.5, 300 sec: 6748.0). Total num frames: 34410496. Throughput: 0: 1580.2. Samples: 3595064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:32,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-07 23:51:35,229][42004] Updated weights for policy 0, policy_version 8406 (0.0033) +[2024-11-07 23:51:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 34447360. Throughput: 0: 1579.3. Samples: 3606864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:37,933][41694] Avg episode reward: [(0, '4.677')] +[2024-11-07 23:51:38,016][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008411_34451456.pth... +[2024-11-07 23:51:38,132][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008020_32849920.pth +[2024-11-07 23:51:40,706][42004] Updated weights for policy 0, policy_version 8416 (0.0029) +[2024-11-07 23:51:45,073][41694] Fps is (10 sec: 6072.2, 60 sec: 6393.6, 300 sec: 6699.3). Total num frames: 34484224. Throughput: 0: 1523.6. Samples: 3617934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:45,076][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:51:47,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6280.5, 300 sec: 6692.4). Total num frames: 34504704. Throughput: 0: 1529.6. Samples: 3619476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:47,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-07 23:51:48,513][42004] Updated weights for policy 0, policy_version 8426 (0.0032) +[2024-11-07 23:51:52,932][41694] Fps is (10 sec: 7297.4, 60 sec: 6281.0, 300 sec: 6692.4). Total num frames: 34541568. Throughput: 0: 1602.9. Samples: 3630082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:52,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-07 23:51:54,100][42004] Updated weights for policy 0, policy_version 8436 (0.0033) +[2024-11-07 23:51:57,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6416.9, 300 sec: 6706.3). Total num frames: 34582528. Throughput: 0: 1740.6. Samples: 3641940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:57,935][41694] Avg episode reward: [(0, '4.391')] +[2024-11-07 23:51:59,220][42004] Updated weights for policy 0, policy_version 8446 (0.0024) +[2024-11-07 23:52:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 34615296. Throughput: 0: 1767.9. Samples: 3647822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:02,933][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:52:05,363][42004] Updated weights for policy 0, policy_version 8456 (0.0031) +[2024-11-07 23:52:07,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 34656256. Throughput: 0: 1766.3. Samples: 3657892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:07,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-07 23:52:10,491][42004] Updated weights for policy 0, policy_version 8466 (0.0024) +[2024-11-07 23:52:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 34693120. Throughput: 0: 1772.2. Samples: 3669546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:12,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-07 23:52:16,016][42004] Updated weights for policy 0, policy_version 8476 (0.0025) +[2024-11-07 23:52:19,575][41694] Fps is (10 sec: 5980.6, 60 sec: 6976.9, 300 sec: 6696.8). Total num frames: 34725888. Throughput: 0: 1718.4. Samples: 3675216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:19,576][41694] Avg episode reward: [(0, '4.599')] +[2024-11-07 23:52:22,935][41694] Fps is (10 sec: 5323.0, 60 sec: 6826.3, 300 sec: 6692.4). Total num frames: 34746368. Throughput: 0: 1666.9. Samples: 3681882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:22,937][41694] Avg episode reward: [(0, '4.603')] +[2024-11-07 23:52:24,439][42004] Updated weights for policy 0, policy_version 8486 (0.0027) +[2024-11-07 23:52:27,932][41694] Fps is (10 sec: 6371.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 34779136. Throughput: 0: 1718.3. Samples: 3691578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:27,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:52:30,293][42004] Updated weights for policy 0, policy_version 8496 (0.0031) +[2024-11-07 23:52:32,931][41694] Fps is (10 sec: 6965.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 34816000. Throughput: 0: 1725.4. Samples: 3697118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:32,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-07 23:52:36,015][42004] Updated weights for policy 0, policy_version 8506 (0.0028) +[2024-11-07 23:52:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6715.9). Total num frames: 34852864. Throughput: 0: 1722.4. Samples: 3707588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:37,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-07 23:52:41,580][42004] Updated weights for policy 0, policy_version 8516 (0.0028) +[2024-11-07 23:52:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7008.6, 300 sec: 6706.3). Total num frames: 34889728. Throughput: 0: 1704.4. Samples: 3718636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:42,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-07 23:52:47,416][42004] Updated weights for policy 0, policy_version 8526 (0.0037) +[2024-11-07 23:52:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.3, 300 sec: 6692.5). Total num frames: 34922496. Throughput: 0: 1693.2. Samples: 3724014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:52:47,935][41694] Avg episode reward: [(0, '4.336')] +[2024-11-07 23:52:54,116][41694] Fps is (10 sec: 5493.6, 60 sec: 6694.6, 300 sec: 6665.7). Total num frames: 34951168. Throughput: 0: 1663.0. Samples: 3734696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:52:54,119][41694] Avg episode reward: [(0, '4.324')] +[2024-11-07 23:52:55,450][42004] Updated weights for policy 0, policy_version 8536 (0.0025) +[2024-11-07 23:52:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6622.0, 300 sec: 6720.5). Total num frames: 34979840. Throughput: 0: 1587.6. Samples: 3740988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:52:57,936][41694] Avg episode reward: [(0, '4.231')] +[2024-11-07 23:53:01,011][42004] Updated weights for policy 0, policy_version 8546 (0.0034) +[2024-11-07 23:53:02,931][41694] Fps is (10 sec: 7433.8, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 35016704. Throughput: 0: 1648.9. Samples: 3746708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:53:02,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-07 23:53:06,679][42004] Updated weights for policy 0, policy_version 8556 (0.0032) +[2024-11-07 23:53:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 35053568. Throughput: 0: 1683.5. Samples: 3757634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:53:07,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:53:12,576][42004] Updated weights for policy 0, policy_version 8566 (0.0034) +[2024-11-07 23:53:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6753.5). Total num frames: 35086336. Throughput: 0: 1700.3. Samples: 3768090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:12,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-07 23:53:17,648][42004] Updated weights for policy 0, policy_version 8576 (0.0026) +[2024-11-07 23:53:17,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6878.4, 300 sec: 6775.7). Total num frames: 35127296. Throughput: 0: 1708.9. Samples: 3774020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:17,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:53:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.6, 300 sec: 6761.9). Total num frames: 35164160. Throughput: 0: 1732.1. Samples: 3785534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:22,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:53:23,112][42004] Updated weights for policy 0, policy_version 8586 (0.0033) +[2024-11-07 23:53:28,644][41694] Fps is (10 sec: 6118.4, 60 sec: 6814.1, 300 sec: 6717.9). Total num frames: 35192832. Throughput: 0: 1591.4. Samples: 3791382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:28,645][41694] Avg episode reward: [(0, '4.221')] +[2024-11-07 23:53:30,721][42004] Updated weights for policy 0, policy_version 8596 (0.0026) +[2024-11-07 23:53:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 35221504. Throughput: 0: 1658.4. Samples: 3798640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:32,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-07 23:53:36,119][42004] Updated weights for policy 0, policy_version 8606 (0.0036) +[2024-11-07 23:53:37,932][41694] Fps is (10 sec: 7497.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 35262464. Throughput: 0: 1719.0. Samples: 3810014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:37,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-07 23:53:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008609_35262464.pth... +[2024-11-07 23:53:38,074][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008219_33665024.pth +[2024-11-07 23:53:41,296][42004] Updated weights for policy 0, policy_version 8616 (0.0026) +[2024-11-07 23:53:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 35299328. Throughput: 0: 1790.0. Samples: 3821538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-07 23:53:47,231][42004] Updated weights for policy 0, policy_version 8626 (0.0035) +[2024-11-07 23:53:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 35336192. Throughput: 0: 1777.4. Samples: 3826692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:53:47,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-07 23:53:52,409][42004] Updated weights for policy 0, policy_version 8636 (0.0033) +[2024-11-07 23:53:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7173.0, 300 sec: 6748.0). Total num frames: 35373056. Throughput: 0: 1788.1. Samples: 3838098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:53:52,934][41694] Avg episode reward: [(0, '4.601')] +[2024-11-07 23:53:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 35409920. Throughput: 0: 1804.2. Samples: 3849278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:57,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-07 23:53:57,941][42004] Updated weights for policy 0, policy_version 8646 (0.0035) +[2024-11-07 23:54:03,206][41694] Fps is (10 sec: 5980.1, 60 sec: 6931.5, 300 sec: 6700.1). Total num frames: 35434496. Throughput: 0: 1785.4. Samples: 3854850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:03,209][41694] Avg episode reward: [(0, '4.378')] +[2024-11-07 23:54:05,943][42004] Updated weights for policy 0, policy_version 8656 (0.0027) +[2024-11-07 23:54:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 35467264. Throughput: 0: 1686.8. Samples: 3861442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:07,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:54:11,524][42004] Updated weights for policy 0, policy_version 8666 (0.0030) +[2024-11-07 23:54:12,932][41694] Fps is (10 sec: 7159.4, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 35504128. Throughput: 0: 1835.0. Samples: 3872650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:54:12,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-07 23:54:16,912][42004] Updated weights for policy 0, policy_version 8676 (0.0031) +[2024-11-07 23:54:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6692.5). Total num frames: 35540992. Throughput: 0: 1775.2. Samples: 3878526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:54:17,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-07 23:54:22,706][42004] Updated weights for policy 0, policy_version 8686 (0.0049) +[2024-11-07 23:54:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 35577856. Throughput: 0: 1755.1. Samples: 3888992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:22,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-07 23:54:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7115.9, 300 sec: 6734.1). Total num frames: 35614720. Throughput: 0: 1742.3. Samples: 3899942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:27,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:54:28,220][42004] Updated weights for policy 0, policy_version 8696 (0.0023) +[2024-11-07 23:54:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6734.1). Total num frames: 35651584. Throughput: 0: 1755.2. Samples: 3905678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:32,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-07 23:54:33,573][42004] Updated weights for policy 0, policy_version 8706 (0.0023) +[2024-11-07 23:54:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 35672064. Throughput: 0: 1739.6. Samples: 3916380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:37,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:54:41,437][42004] Updated weights for policy 0, policy_version 8716 (0.0037) +[2024-11-07 23:54:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 35708928. Throughput: 0: 1658.5. Samples: 3923910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:42,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:54:46,864][42004] Updated weights for policy 0, policy_version 8726 (0.0026) +[2024-11-07 23:54:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 35749888. Throughput: 0: 1666.2. Samples: 3929372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:47,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-07 23:54:52,400][42004] Updated weights for policy 0, policy_version 8736 (0.0035) +[2024-11-07 23:54:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35782656. Throughput: 0: 1759.6. Samples: 3940626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:52,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:54:57,618][42004] Updated weights for policy 0, policy_version 8746 (0.0025) +[2024-11-07 23:54:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 35823616. Throughput: 0: 1766.3. Samples: 3952132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:57,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-07 23:55:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7063.7, 300 sec: 6803.5). Total num frames: 35856384. Throughput: 0: 1760.4. Samples: 3957742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0) +[2024-11-07 23:55:02,935][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:55:03,595][42004] Updated weights for policy 0, policy_version 8756 (0.0023) +[2024-11-07 23:55:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 35897344. Throughput: 0: 1763.6. Samples: 3968356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0) +[2024-11-07 23:55:07,936][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:55:08,816][42004] Updated weights for policy 0, policy_version 8766 (0.0034) +[2024-11-07 23:55:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35913728. Throughput: 0: 1687.8. Samples: 3975892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:55:12,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-07 23:55:17,209][42004] Updated weights for policy 0, policy_version 8776 (0.2340) +[2024-11-07 23:55:17,946][41694] Fps is (10 sec: 5317.2, 60 sec: 6825.0, 300 sec: 6761.5). Total num frames: 35950592. Throughput: 0: 1658.7. Samples: 3980342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:17,947][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:55:22,801][42004] Updated weights for policy 0, policy_version 8786 (0.0040) +[2024-11-07 23:55:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35987456. Throughput: 0: 1659.4. Samples: 3991052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:22,933][41694] Avg episode reward: [(0, '4.555')] +[2024-11-07 23:55:27,931][41694] Fps is (10 sec: 6562.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 36016128. Throughput: 0: 1713.8. Samples: 4001032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:27,934][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:55:29,033][42004] Updated weights for policy 0, policy_version 8796 (0.0034) +[2024-11-07 23:55:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 36057088. Throughput: 0: 1714.4. Samples: 4006518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:32,933][41694] Avg episode reward: [(0, '4.237')] +[2024-11-07 23:55:34,308][42004] Updated weights for policy 0, policy_version 8806 (0.0021) +[2024-11-07 23:55:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7031.4, 300 sec: 6803.5). Total num frames: 36093952. Throughput: 0: 1715.6. Samples: 4017830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:37,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-07 23:55:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008812_36093952.pth... +[2024-11-07 23:55:38,142][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008411_34451456.pth +[2024-11-07 23:55:39,774][42004] Updated weights for policy 0, policy_version 8816 (0.0029) +[2024-11-07 23:55:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 36130816. Throughput: 0: 1707.2. Samples: 4028958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:42,933][41694] Avg episode reward: [(0, '4.229')] +[2024-11-07 23:55:47,605][42004] Updated weights for policy 0, policy_version 8826 (0.0026) +[2024-11-07 23:55:47,932][41694] Fps is (10 sec: 5734.8, 60 sec: 6690.1, 300 sec: 6734.2). Total num frames: 36151296. Throughput: 0: 1693.5. Samples: 4033950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:47,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-07 23:55:52,829][42004] Updated weights for policy 0, policy_version 8836 (0.0032) +[2024-11-07 23:55:52,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 36192256. Throughput: 0: 1637.1. Samples: 4042028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:52,935][41694] Avg episode reward: [(0, '4.270')] +[2024-11-07 23:55:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 36229120. Throughput: 0: 1722.8. Samples: 4053418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:57,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-07 23:55:58,394][42004] Updated weights for policy 0, policy_version 8846 (0.0026) +[2024-11-07 23:56:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36261888. Throughput: 0: 1738.8. Samples: 4058562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:02,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:56:04,311][42004] Updated weights for policy 0, policy_version 8856 (0.0040) +[2024-11-07 23:56:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 36298752. Throughput: 0: 1739.9. Samples: 4069348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:56:07,933][41694] Avg episode reward: [(0, '4.735')] +[2024-11-07 23:56:10,146][42004] Updated weights for policy 0, policy_version 8866 (0.0031) +[2024-11-07 23:56:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 36331520. Throughput: 0: 1732.3. Samples: 4078986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:12,934][41694] Avg episode reward: [(0, '4.703')] +[2024-11-07 23:56:16,844][42004] Updated weights for policy 0, policy_version 8876 (0.0026) +[2024-11-07 23:56:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6896.6, 300 sec: 6872.9). Total num frames: 36364288. Throughput: 0: 1709.1. Samples: 4083426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:17,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:56:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 36384768. Throughput: 0: 1630.2. Samples: 4091186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:56:22,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:56:24,397][42004] Updated weights for policy 0, policy_version 8886 (0.0031) +[2024-11-07 23:56:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36421632. Throughput: 0: 1619.5. Samples: 4101834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:27,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-07 23:56:29,619][42004] Updated weights for policy 0, policy_version 8896 (0.0026) +[2024-11-07 23:56:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 36458496. Throughput: 0: 1638.8. Samples: 4107694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:32,936][41694] Avg episode reward: [(0, '4.500')] +[2024-11-07 23:56:35,530][42004] Updated weights for policy 0, policy_version 8906 (0.0032) +[2024-11-07 23:56:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6867.3). Total num frames: 36495360. Throughput: 0: 1695.6. Samples: 4118328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:56:40,689][42004] Updated weights for policy 0, policy_version 8916 (0.0030) +[2024-11-07 23:56:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 36536320. Throughput: 0: 1705.4. Samples: 4130160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:42,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:56:45,868][42004] Updated weights for policy 0, policy_version 8926 (0.0027) +[2024-11-07 23:56:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 36573184. Throughput: 0: 1721.9. Samples: 4136046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:47,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-07 23:56:51,225][42004] Updated weights for policy 0, policy_version 8936 (0.0030) +[2024-11-07 23:56:52,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7031.5, 300 sec: 6886.9). Total num frames: 36614144. Throughput: 0: 1737.9. Samples: 4147554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:56:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 36634624. Throughput: 0: 1678.4. Samples: 4154514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:57,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-07 23:56:58,928][42004] Updated weights for policy 0, policy_version 8946 (0.0030) +[2024-11-07 23:57:02,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36667392. Throughput: 0: 1704.9. Samples: 4160146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:02,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:57:04,842][42004] Updated weights for policy 0, policy_version 8956 (0.0026) +[2024-11-07 23:57:07,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6758.1, 300 sec: 6817.4). Total num frames: 36704256. Throughput: 0: 1762.0. Samples: 4170482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:07,937][41694] Avg episode reward: [(0, '4.325')] +[2024-11-07 23:57:10,580][42004] Updated weights for policy 0, policy_version 8966 (0.0031) +[2024-11-07 23:57:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6869.6). Total num frames: 36741120. Throughput: 0: 1774.2. Samples: 4181674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:12,932][41694] Avg episode reward: [(0, '4.236')] +[2024-11-07 23:57:15,557][42004] Updated weights for policy 0, policy_version 8976 (0.0027) +[2024-11-07 23:57:17,931][41694] Fps is (10 sec: 7784.2, 60 sec: 6963.2, 300 sec: 6900.8). Total num frames: 36782080. Throughput: 0: 1782.2. Samples: 4187892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:17,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-07 23:57:21,067][42004] Updated weights for policy 0, policy_version 8986 (0.0029) +[2024-11-07 23:57:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.3, 300 sec: 6914.6). Total num frames: 36818944. Throughput: 0: 1794.6. Samples: 4199084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:22,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-07 23:57:26,409][42004] Updated weights for policy 0, policy_version 8996 (0.0020) +[2024-11-07 23:57:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 6914.6). Total num frames: 36855808. Throughput: 0: 1789.2. Samples: 4210674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:27,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-07 23:57:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 36876288. Throughput: 0: 1700.6. Samples: 4212572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:32,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-07 23:57:34,170][42004] Updated weights for policy 0, policy_version 9006 (0.0027) +[2024-11-07 23:57:37,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 36913152. Throughput: 0: 1682.5. Samples: 4223268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:57:37,978][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009013_36917248.pth... +[2024-11-07 23:57:38,134][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008609_35262464.pth +[2024-11-07 23:57:39,933][42004] Updated weights for policy 0, policy_version 9016 (0.0032) +[2024-11-07 23:57:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 36950016. Throughput: 0: 1753.9. Samples: 4233438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:42,933][41694] Avg episode reward: [(0, '4.248')] +[2024-11-07 23:57:45,664][42004] Updated weights for policy 0, policy_version 9026 (0.0034) +[2024-11-07 23:57:47,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 36982784. Throughput: 0: 1750.6. Samples: 4238924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:47,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-07 23:57:51,668][42004] Updated weights for policy 0, policy_version 9036 (0.0039) +[2024-11-07 23:57:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 37019648. Throughput: 0: 1747.5. Samples: 4249114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:52,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-07 23:57:57,045][42004] Updated weights for policy 0, policy_version 9046 (0.0029) +[2024-11-07 23:57:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 37056512. Throughput: 0: 1754.6. Samples: 4260630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:57,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-07 23:58:02,655][42004] Updated weights for policy 0, policy_version 9056 (0.0022) +[2024-11-07 23:58:04,713][41694] Fps is (10 sec: 6258.2, 60 sec: 6895.0, 300 sec: 6873.1). Total num frames: 37093376. Throughput: 0: 1673.2. Samples: 4266168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:04,715][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:58:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.9, 300 sec: 6873.0). Total num frames: 37113856. Throughput: 0: 1655.8. Samples: 4273594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:07,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-07 23:58:10,040][42004] Updated weights for policy 0, policy_version 9066 (0.0029) +[2024-11-07 23:58:12,932][41694] Fps is (10 sec: 6977.1, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37150720. Throughput: 0: 1642.4. Samples: 4284580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:12,934][41694] Avg episode reward: [(0, '4.072')] +[2024-11-07 23:58:16,022][42004] Updated weights for policy 0, policy_version 9076 (0.0029) +[2024-11-07 23:58:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 37187584. Throughput: 0: 1712.7. Samples: 4289642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:58:17,935][41694] Avg episode reward: [(0, '4.253')] +[2024-11-07 23:58:21,602][42004] Updated weights for policy 0, policy_version 9086 (0.0026) +[2024-11-07 23:58:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6903.5). Total num frames: 37224448. Throughput: 0: 1718.4. Samples: 4300596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:22,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:58:26,789][42004] Updated weights for policy 0, policy_version 9096 (0.0031) +[2024-11-07 23:58:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 37265408. Throughput: 0: 1752.4. Samples: 4312296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:27,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-07 23:58:31,901][42004] Updated weights for policy 0, policy_version 9106 (0.0031) +[2024-11-07 23:58:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 37302272. Throughput: 0: 1761.3. Samples: 4318182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:58:32,934][41694] Avg episode reward: [(0, '4.599')] +[2024-11-07 23:58:39,080][41694] Fps is (10 sec: 6245.9, 60 sec: 6899.5, 300 sec: 6874.0). Total num frames: 37335040. Throughput: 0: 1755.0. Samples: 4330104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:39,083][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:58:39,289][42004] Updated weights for policy 0, policy_version 9116 (0.0033) +[2024-11-07 23:58:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 37363712. Throughput: 0: 1710.4. Samples: 4337596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:42,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-07 23:58:44,615][42004] Updated weights for policy 0, policy_version 9126 (0.0029) +[2024-11-07 23:58:47,932][41694] Fps is (10 sec: 7866.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 37404672. Throughput: 0: 1788.9. Samples: 4343484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:58:47,935][41694] Avg episode reward: [(0, '4.231')] +[2024-11-07 23:58:50,433][42004] Updated weights for policy 0, policy_version 9136 (0.0029) +[2024-11-07 23:58:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 37437440. Throughput: 0: 1781.8. Samples: 4353774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:52,936][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:58:55,882][42004] Updated weights for policy 0, policy_version 9146 (0.0026) +[2024-11-07 23:58:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6963.2, 300 sec: 6921.0). Total num frames: 37474304. Throughput: 0: 1793.2. Samples: 4365276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:57,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-07 23:59:01,336][42004] Updated weights for policy 0, policy_version 9156 (0.0025) +[2024-11-07 23:59:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7176.2, 300 sec: 6928.5). Total num frames: 37511168. Throughput: 0: 1809.5. Samples: 4371070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:59:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:59:06,729][42004] Updated weights for policy 0, policy_version 9166 (0.0027) +[2024-11-07 23:59:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 37552128. Throughput: 0: 1815.4. Samples: 4382290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:07,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:59:13,461][41694] Fps is (10 sec: 6224.1, 60 sec: 7037.6, 300 sec: 6888.4). Total num frames: 37576704. Throughput: 0: 1662.2. Samples: 4387974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:13,463][41694] Avg episode reward: [(0, '4.322')] +[2024-11-07 23:59:14,325][42004] Updated weights for policy 0, policy_version 9176 (0.0026) +[2024-11-07 23:59:17,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 37609472. Throughput: 0: 1711.0. Samples: 4395176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:17,938][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:59:19,884][42004] Updated weights for policy 0, policy_version 9186 (0.0024) +[2024-11-07 23:59:22,931][41694] Fps is (10 sec: 6920.0, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 37642240. Throughput: 0: 1732.6. Samples: 4406082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:22,934][41694] Avg episode reward: [(0, '4.282')] +[2024-11-07 23:59:26,210][42004] Updated weights for policy 0, policy_version 9196 (0.0034) +[2024-11-07 23:59:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37675008. Throughput: 0: 1731.0. Samples: 4415490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:27,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:59:32,034][42004] Updated weights for policy 0, policy_version 9206 (0.0054) +[2024-11-07 23:59:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 37711872. Throughput: 0: 1716.0. Samples: 4420706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:59:32,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-07 23:59:37,585][42004] Updated weights for policy 0, policy_version 9216 (0.0039) +[2024-11-07 23:59:37,934][41694] Fps is (10 sec: 7371.1, 60 sec: 7029.2, 300 sec: 6914.5). Total num frames: 37748736. Throughput: 0: 1733.6. Samples: 4431790. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:37,936][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:59:37,973][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009216_37748736.pth... +[2024-11-07 23:59:38,111][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008812_36093952.pth +[2024-11-07 23:59:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 37785600. Throughput: 0: 1722.7. Samples: 4442796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:42,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:59:43,163][42004] Updated weights for policy 0, policy_version 9226 (0.0043) +[2024-11-07 23:59:47,931][41694] Fps is (10 sec: 5735.9, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 37806080. Throughput: 0: 1721.4. Samples: 4448532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:47,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-07 23:59:50,597][42004] Updated weights for policy 0, policy_version 9236 (0.0029) +[2024-11-07 23:59:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37847040. Throughput: 0: 1640.4. Samples: 4456108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:59:52,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:59:56,328][42004] Updated weights for policy 0, policy_version 9246 (0.0037) +[2024-11-07 23:59:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 37879808. Throughput: 0: 1766.3. Samples: 4466524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:59:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:00:02,783][42004] Updated weights for policy 0, policy_version 9256 (0.0037) +[2024-11-08 00:00:02,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 37912576. Throughput: 0: 1693.1. Samples: 4471368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:02,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:00:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6900.7). Total num frames: 37949440. Throughput: 0: 1685.2. Samples: 4481918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:07,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:00:08,049][42004] Updated weights for policy 0, policy_version 9266 (0.0028) +[2024-11-08 00:00:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6887.4, 300 sec: 6901.1). Total num frames: 37986304. Throughput: 0: 1732.0. Samples: 4493428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:12,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:00:13,673][42004] Updated weights for policy 0, policy_version 9276 (0.0028) +[2024-11-08 00:00:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 38023168. Throughput: 0: 1722.4. Samples: 4498214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:17,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:00:19,486][42004] Updated weights for policy 0, policy_version 9286 (0.0033) +[2024-11-08 00:00:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 38043648. Throughput: 0: 1678.4. Samples: 4507312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:22,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:00:27,359][42004] Updated weights for policy 0, policy_version 9296 (0.0027) +[2024-11-08 00:00:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 38080512. Throughput: 0: 1629.5. Samples: 4516122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:27,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:00:32,533][42004] Updated weights for policy 0, policy_version 9306 (0.0031) +[2024-11-08 00:00:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6758.3, 300 sec: 6859.1). Total num frames: 38117376. Throughput: 0: 1630.8. Samples: 4521920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:32,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:00:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.7, 300 sec: 6859.1). Total num frames: 38154240. Throughput: 0: 1718.1. Samples: 4533424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:37,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:00:38,217][42004] Updated weights for policy 0, policy_version 9316 (0.0034) +[2024-11-08 00:00:42,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 38191104. Throughput: 0: 1719.0. Samples: 4543878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:42,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:00:43,780][42004] Updated weights for policy 0, policy_version 9326 (0.0027) +[2024-11-08 00:00:47,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.3, 300 sec: 6900.7). Total num frames: 38227968. Throughput: 0: 1732.9. Samples: 4549352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:47,936][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 00:00:49,424][42004] Updated weights for policy 0, policy_version 9336 (0.0020) +[2024-11-08 00:00:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 38264832. Throughput: 0: 1750.3. Samples: 4560684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:52,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:00:56,855][42004] Updated weights for policy 0, policy_version 9346 (0.0029) +[2024-11-08 00:00:57,931][41694] Fps is (10 sec: 6144.9, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 38289408. Throughput: 0: 1663.4. Samples: 4568282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:57,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:01:02,284][42004] Updated weights for policy 0, policy_version 9356 (0.0032) +[2024-11-08 00:01:02,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 38326272. Throughput: 0: 1682.8. Samples: 4573938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:02,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 00:01:07,782][42004] Updated weights for policy 0, policy_version 9366 (0.0026) +[2024-11-08 00:01:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 38363136. Throughput: 0: 1725.5. Samples: 4584960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:01:07,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:01:12,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38391808. Throughput: 0: 1751.4. Samples: 4594936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:12,937][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:01:14,490][42004] Updated weights for policy 0, policy_version 9376 (0.0034) +[2024-11-08 00:01:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 38424576. Throughput: 0: 1720.2. Samples: 4599330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:17,935][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 00:01:20,044][42004] Updated weights for policy 0, policy_version 9386 (0.0032) +[2024-11-08 00:01:22,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 38465536. Throughput: 0: 1713.7. Samples: 4610538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:01:22,940][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:01:25,708][42004] Updated weights for policy 0, policy_version 9396 (0.0022) +[2024-11-08 00:01:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 38502400. Throughput: 0: 1729.6. Samples: 4621710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:27,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:01:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38522880. Throughput: 0: 1680.1. Samples: 4624956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:32,934][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:01:33,382][42004] Updated weights for policy 0, policy_version 9406 (0.0031) +[2024-11-08 00:01:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 38559744. Throughput: 0: 1639.3. Samples: 4634452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:37,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:01:38,053][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009415_38563840.pth... +[2024-11-08 00:01:38,217][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009013_36917248.pth +[2024-11-08 00:01:38,623][42004] Updated weights for policy 0, policy_version 9416 (0.0025) +[2024-11-08 00:01:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 38600704. Throughput: 0: 1729.2. Samples: 4646098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:42,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:01:43,921][42004] Updated weights for policy 0, policy_version 9426 (0.0026) +[2024-11-08 00:01:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.6, 300 sec: 6845.2). Total num frames: 38633472. Throughput: 0: 1722.4. Samples: 4651444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:47,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:01:49,978][42004] Updated weights for policy 0, policy_version 9436 (0.0033) +[2024-11-08 00:01:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 38670336. Throughput: 0: 1712.7. Samples: 4662032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:52,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 00:01:55,341][42004] Updated weights for policy 0, policy_version 9446 (0.0028) +[2024-11-08 00:01:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 38711296. Throughput: 0: 1749.0. Samples: 4673640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:57,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 00:02:00,740][42004] Updated weights for policy 0, policy_version 9456 (0.0031) +[2024-11-08 00:02:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6914.7). Total num frames: 38744064. Throughput: 0: 1774.8. Samples: 4679198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:02,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:02:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38768640. Throughput: 0: 1682.4. Samples: 4686246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:07,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:02:08,386][42004] Updated weights for policy 0, policy_version 9466 (0.0032) +[2024-11-08 00:02:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 38805504. Throughput: 0: 1681.6. Samples: 4697384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:12,935][41694] Avg episode reward: [(0, '4.681')] +[2024-11-08 00:02:13,838][42004] Updated weights for policy 0, policy_version 9476 (0.0025) +[2024-11-08 00:02:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 38842368. Throughput: 0: 1735.4. Samples: 4703048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:17,940][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 00:02:19,568][42004] Updated weights for policy 0, policy_version 9486 (0.0024) +[2024-11-08 00:02:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 38875136. Throughput: 0: 1760.2. Samples: 4713660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:22,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 00:02:25,134][42004] Updated weights for policy 0, policy_version 9496 (0.0032) +[2024-11-08 00:02:27,933][41694] Fps is (10 sec: 7371.6, 60 sec: 6894.7, 300 sec: 6914.6). Total num frames: 38916096. Throughput: 0: 1759.5. Samples: 4725276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:27,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:02:30,260][42004] Updated weights for policy 0, policy_version 9506 (0.0029) +[2024-11-08 00:02:32,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 38957056. Throughput: 0: 1770.9. Samples: 4731136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:02:35,514][42004] Updated weights for policy 0, policy_version 9516 (0.0031) +[2024-11-08 00:02:39,609][41694] Fps is (10 sec: 6314.5, 60 sec: 6973.0, 300 sec: 6875.5). Total num frames: 38989824. Throughput: 0: 1731.6. Samples: 4742858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:39,613][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 00:02:42,876][42004] Updated weights for policy 0, policy_version 9526 (0.0025) +[2024-11-08 00:02:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.1, 300 sec: 6900.7). Total num frames: 39018496. Throughput: 0: 1708.9. Samples: 4750540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:42,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:02:47,932][41694] Fps is (10 sec: 7874.3, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 39055360. Throughput: 0: 1718.6. Samples: 4756536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:02:47,936][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:02:48,116][42004] Updated weights for policy 0, policy_version 9536 (0.0020) +[2024-11-08 00:02:52,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 39088128. Throughput: 0: 1805.2. Samples: 4767482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:02:52,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 00:02:54,187][42004] Updated weights for policy 0, policy_version 9546 (0.0043) +[2024-11-08 00:02:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6895.0, 300 sec: 6928.7). Total num frames: 39124992. Throughput: 0: 1786.0. Samples: 4777752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:02:57,933][41694] Avg episode reward: [(0, '4.247')] +[2024-11-08 00:02:59,719][42004] Updated weights for policy 0, policy_version 9556 (0.0028) +[2024-11-08 00:03:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 39161856. Throughput: 0: 1787.2. Samples: 4783474. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:02,934][41694] Avg episode reward: [(0, '4.185')] +[2024-11-08 00:03:05,567][42004] Updated weights for policy 0, policy_version 9566 (0.0035) +[2024-11-08 00:03:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 39198720. Throughput: 0: 1784.0. Samples: 4793940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:07,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 00:03:10,993][42004] Updated weights for policy 0, policy_version 9576 (0.0032) +[2024-11-08 00:03:13,947][41694] Fps is (10 sec: 5949.5, 60 sec: 6914.5, 300 sec: 6890.9). Total num frames: 39227392. Throughput: 0: 1615.2. Samples: 4799598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:13,949][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:03:17,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 39256064. Throughput: 0: 1678.9. Samples: 4806686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:17,934][41694] Avg episode reward: [(0, '4.203')] +[2024-11-08 00:03:18,624][42004] Updated weights for policy 0, policy_version 9586 (0.0025) +[2024-11-08 00:03:22,932][41694] Fps is (10 sec: 7750.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 39297024. Throughput: 0: 1741.9. Samples: 4818320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:22,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:03:23,952][42004] Updated weights for policy 0, policy_version 9596 (0.0026) +[2024-11-08 00:03:27,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.1, 300 sec: 6873.0). Total num frames: 39329792. Throughput: 0: 1743.0. Samples: 4828972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:27,933][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 00:03:30,039][42004] Updated weights for policy 0, policy_version 9606 (0.0028) +[2024-11-08 00:03:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6913.7). Total num frames: 39366656. Throughput: 0: 1722.4. Samples: 4834044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:32,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:03:35,449][42004] Updated weights for policy 0, policy_version 9616 (0.0027) +[2024-11-08 00:03:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7093.2, 300 sec: 6914.6). Total num frames: 39403520. Throughput: 0: 1732.7. Samples: 4845456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:37,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:03:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009620_39403520.pth... +[2024-11-08 00:03:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009216_37748736.pth +[2024-11-08 00:03:40,928][42004] Updated weights for policy 0, policy_version 9626 (0.0045) +[2024-11-08 00:03:42,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 39440384. Throughput: 0: 1751.6. Samples: 4856578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:42,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 00:03:48,378][41694] Fps is (10 sec: 5881.6, 60 sec: 6776.3, 300 sec: 6862.6). Total num frames: 39464960. Throughput: 0: 1738.0. Samples: 4862460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:48,381][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:03:48,420][42004] Updated weights for policy 0, policy_version 9636 (0.0028) +[2024-11-08 00:03:52,938][41694] Fps is (10 sec: 5731.6, 60 sec: 6826.0, 300 sec: 6858.9). Total num frames: 39497728. Throughput: 0: 1679.5. Samples: 4869530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:52,945][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:03:54,290][42004] Updated weights for policy 0, policy_version 9646 (0.0032) +[2024-11-08 00:03:57,932][41694] Fps is (10 sec: 6859.7, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 39530496. Throughput: 0: 1812.5. Samples: 4879322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:57,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 00:04:01,068][42004] Updated weights for policy 0, policy_version 9656 (0.0043) +[2024-11-08 00:04:02,931][41694] Fps is (10 sec: 6147.8, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 39559168. Throughput: 0: 1717.3. Samples: 4883964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:02,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:04:07,380][42004] Updated weights for policy 0, policy_version 9666 (0.0035) +[2024-11-08 00:04:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6843.6). Total num frames: 39591936. Throughput: 0: 1664.2. Samples: 4893210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:07,934][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 00:04:12,933][41694] Fps is (10 sec: 6552.8, 60 sec: 6735.7, 300 sec: 6831.3). Total num frames: 39624704. Throughput: 0: 1639.3. Samples: 4902742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:12,936][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 00:04:13,826][42004] Updated weights for policy 0, policy_version 9676 (0.0028) +[2024-11-08 00:04:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 39661568. Throughput: 0: 1643.5. Samples: 4908000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:17,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:04:19,430][42004] Updated weights for policy 0, policy_version 9686 (0.0034) +[2024-11-08 00:04:22,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6417.1, 300 sec: 6803.5). Total num frames: 39682048. Throughput: 0: 1615.4. Samples: 4918150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:22,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:04:27,147][42004] Updated weights for policy 0, policy_version 9696 (0.0025) +[2024-11-08 00:04:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 39718912. Throughput: 0: 1540.8. Samples: 4925914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:27,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:04:32,760][42004] Updated weights for policy 0, policy_version 9706 (0.0031) +[2024-11-08 00:04:32,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6485.1, 300 sec: 6803.5). Total num frames: 39755776. Throughput: 0: 1550.5. Samples: 4931544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:32,938][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 00:04:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 39792640. Throughput: 0: 1618.0. Samples: 4942328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:37,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:04:38,476][42004] Updated weights for policy 0, policy_version 9716 (0.0027) +[2024-11-08 00:04:42,932][41694] Fps is (10 sec: 6554.4, 60 sec: 6348.9, 300 sec: 6831.3). Total num frames: 39821312. Throughput: 0: 1626.8. Samples: 4952530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:04:45,128][42004] Updated weights for policy 0, policy_version 9726 (0.0040) +[2024-11-08 00:04:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6533.9, 300 sec: 6803.5). Total num frames: 39854080. Throughput: 0: 1615.6. Samples: 4956666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:04:47,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:04:51,556][42004] Updated weights for policy 0, policy_version 9736 (0.0043) +[2024-11-08 00:04:52,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6417.7, 300 sec: 6789.6). Total num frames: 39882752. Throughput: 0: 1622.8. Samples: 4966238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:04:52,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 00:04:57,932][41694] Fps is (10 sec: 4505.4, 60 sec: 6144.0, 300 sec: 6734.1). Total num frames: 39899136. Throughput: 0: 1540.9. Samples: 4972082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:57,936][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 00:05:00,688][42004] Updated weights for policy 0, policy_version 9746 (0.0038) +[2024-11-08 00:05:02,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6212.3, 300 sec: 6720.2). Total num frames: 39931904. Throughput: 0: 1520.9. Samples: 4976442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:02,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:05:07,599][42004] Updated weights for policy 0, policy_version 9756 (0.0031) +[2024-11-08 00:05:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6143.9, 300 sec: 6692.4). Total num frames: 39960576. Throughput: 0: 1490.3. Samples: 4985214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:07,935][41694] Avg episode reward: [(0, '4.129')] +[2024-11-08 00:05:12,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6075.6, 300 sec: 6664.6). Total num frames: 39989248. Throughput: 0: 1519.1. Samples: 4994276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:12,936][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 00:05:14,610][42004] Updated weights for policy 0, policy_version 9766 (0.0053) +[2024-11-08 00:05:17,932][41694] Fps is (10 sec: 5325.1, 60 sec: 5871.0, 300 sec: 6678.6). Total num frames: 40013824. Throughput: 0: 1484.6. Samples: 4998348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:17,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:05:22,937][41694] Fps is (10 sec: 4913.5, 60 sec: 5938.6, 300 sec: 6636.8). Total num frames: 40038400. Throughput: 0: 1401.5. Samples: 5005402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:22,941][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 00:05:22,963][42004] Updated weights for policy 0, policy_version 9776 (0.0026) +[2024-11-08 00:05:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 5939.2, 300 sec: 6636.9). Total num frames: 40075264. Throughput: 0: 1383.3. Samples: 5014778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:27,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:05:29,190][42004] Updated weights for policy 0, policy_version 9786 (0.0025) +[2024-11-08 00:05:32,932][41694] Fps is (10 sec: 5327.9, 60 sec: 5598.0, 300 sec: 6567.5). Total num frames: 40091648. Throughput: 0: 1376.1. Samples: 5018592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:32,933][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 00:05:37,473][42004] Updated weights for policy 0, policy_version 9796 (0.0035) +[2024-11-08 00:05:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5529.6, 300 sec: 6553.6). Total num frames: 40124416. Throughput: 0: 1331.2. Samples: 5026144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:37,934][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:05:38,045][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009797_40128512.pth... +[2024-11-08 00:05:38,206][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009415_38563840.pth +[2024-11-08 00:05:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 5666.2, 300 sec: 6553.6). Total num frames: 40161280. Throughput: 0: 1427.1. Samples: 5036302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:42,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:05:43,487][42004] Updated weights for policy 0, policy_version 9806 (0.0027) +[2024-11-08 00:05:47,934][41694] Fps is (10 sec: 6961.9, 60 sec: 5665.9, 300 sec: 6539.7). Total num frames: 40194048. Throughput: 0: 1443.8. Samples: 5041416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:47,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:05:49,544][42004] Updated weights for policy 0, policy_version 9816 (0.0049) +[2024-11-08 00:05:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5734.4, 300 sec: 6567.5). Total num frames: 40226816. Throughput: 0: 1474.2. Samples: 5051554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:52,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:05:56,432][42004] Updated weights for policy 0, policy_version 9826 (0.0033) +[2024-11-08 00:05:57,932][41694] Fps is (10 sec: 6145.2, 60 sec: 5939.2, 300 sec: 6539.7). Total num frames: 40255488. Throughput: 0: 1464.2. Samples: 5060164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:57,935][41694] Avg episode reward: [(0, '4.167')] +[2024-11-08 00:06:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 6511.9). Total num frames: 40284160. Throughput: 0: 1473.5. Samples: 5064654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:02,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:06:03,273][42004] Updated weights for policy 0, policy_version 9836 (0.0026) +[2024-11-08 00:06:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5734.4, 300 sec: 6484.2). Total num frames: 40304640. Throughput: 0: 1449.5. Samples: 5070620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:06:07,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:06:11,268][42004] Updated weights for policy 0, policy_version 9846 (0.0032) +[2024-11-08 00:06:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 5802.9, 300 sec: 6484.2). Total num frames: 40337408. Throughput: 0: 1467.3. Samples: 5080804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:06:12,937][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:06:17,868][42004] Updated weights for policy 0, policy_version 9856 (0.0032) +[2024-11-08 00:06:17,931][41694] Fps is (10 sec: 6553.8, 60 sec: 5939.2, 300 sec: 6456.4). Total num frames: 40370176. Throughput: 0: 1482.9. Samples: 5085320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:17,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 00:06:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6076.3, 300 sec: 6442.5). Total num frames: 40402944. Throughput: 0: 1544.4. Samples: 5095640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:22,942][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 00:06:23,836][42004] Updated weights for policy 0, policy_version 9866 (0.0033) +[2024-11-08 00:06:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 6484.2). Total num frames: 40435712. Throughput: 0: 1532.6. Samples: 5105268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:06:30,621][42004] Updated weights for policy 0, policy_version 9876 (0.0031) +[2024-11-08 00:06:32,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6456.4). Total num frames: 40464384. Throughput: 0: 1514.7. Samples: 5109574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:32,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 00:06:36,901][42004] Updated weights for policy 0, policy_version 9886 (0.0027) +[2024-11-08 00:06:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 40497152. Throughput: 0: 1502.8. Samples: 5119180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:37,936][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 00:06:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5939.2, 300 sec: 6387.0). Total num frames: 40517632. Throughput: 0: 1452.7. Samples: 5125536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:42,933][41694] Avg episode reward: [(0, '4.686')] +[2024-11-08 00:06:45,315][42004] Updated weights for policy 0, policy_version 9896 (0.2151) +[2024-11-08 00:06:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 5939.4, 300 sec: 6373.1). Total num frames: 40550400. Throughput: 0: 1463.7. Samples: 5130520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:47,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 00:06:51,222][42004] Updated weights for policy 0, policy_version 9906 (0.0039) +[2024-11-08 00:06:52,933][41694] Fps is (10 sec: 6552.9, 60 sec: 5939.1, 300 sec: 6345.3). Total num frames: 40583168. Throughput: 0: 1565.4. Samples: 5141064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:52,938][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 00:06:57,111][42004] Updated weights for policy 0, policy_version 9916 (0.0027) +[2024-11-08 00:06:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6075.7, 300 sec: 6359.2). Total num frames: 40620032. Throughput: 0: 1570.0. Samples: 5151456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:06:57,934][41694] Avg episode reward: [(0, '4.130')] +[2024-11-08 00:07:02,931][41694] Fps is (10 sec: 6554.3, 60 sec: 6075.7, 300 sec: 6373.1). Total num frames: 40648704. Throughput: 0: 1583.2. Samples: 5156566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:02,934][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 00:07:03,708][42004] Updated weights for policy 0, policy_version 9926 (0.0034) +[2024-11-08 00:07:07,936][41694] Fps is (10 sec: 5732.0, 60 sec: 6211.8, 300 sec: 6345.2). Total num frames: 40677376. Throughput: 0: 1534.8. Samples: 5164712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:07:07,938][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:07:10,326][42004] Updated weights for policy 0, policy_version 9936 (0.0036) +[2024-11-08 00:07:14,596][41694] Fps is (10 sec: 5267.4, 60 sec: 6044.6, 300 sec: 6295.9). Total num frames: 40710144. Throughput: 0: 1493.1. Samples: 5174940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:07:14,597][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 00:07:17,932][41694] Fps is (10 sec: 5736.7, 60 sec: 6075.7, 300 sec: 6303.7). Total num frames: 40734720. Throughput: 0: 1486.2. Samples: 5176452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:17,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:07:18,341][42004] Updated weights for policy 0, policy_version 9946 (0.0029) +[2024-11-08 00:07:22,932][41694] Fps is (10 sec: 6878.9, 60 sec: 6075.7, 300 sec: 6275.9). Total num frames: 40767488. Throughput: 0: 1492.9. Samples: 5186360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:22,935][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 00:07:24,568][42004] Updated weights for policy 0, policy_version 9956 (0.0029) +[2024-11-08 00:07:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6075.7, 300 sec: 6248.1). Total num frames: 40800256. Throughput: 0: 1578.7. Samples: 5196578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:27,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:07:30,502][42004] Updated weights for policy 0, policy_version 9966 (0.0044) +[2024-11-08 00:07:32,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6212.2, 300 sec: 6297.8). Total num frames: 40837120. Throughput: 0: 1583.8. Samples: 5201790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:32,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:07:36,977][42004] Updated weights for policy 0, policy_version 9976 (0.0066) +[2024-11-08 00:07:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6262.0). Total num frames: 40865792. Throughput: 0: 1565.4. Samples: 5211506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:37,937][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 00:07:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009977_40865792.pth... +[2024-11-08 00:07:38,102][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009620_39403520.pth +[2024-11-08 00:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.4, 300 sec: 6234.2). Total num frames: 40894464. Throughput: 0: 1533.4. Samples: 5220462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:07:43,643][42004] Updated weights for policy 0, policy_version 9986 (0.0046) +[2024-11-08 00:07:48,978][41694] Fps is (10 sec: 5191.0, 60 sec: 6105.8, 300 sec: 6198.4). Total num frames: 40923136. Throughput: 0: 1494.9. Samples: 5225400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:48,980][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:07:51,894][42004] Updated weights for policy 0, policy_version 9996 (0.0036) +[2024-11-08 00:07:52,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6075.8, 300 sec: 6178.7). Total num frames: 40947712. Throughput: 0: 1493.8. Samples: 5231928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:52,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 00:07:57,770][42004] Updated weights for policy 0, policy_version 10006 (0.0032) +[2024-11-08 00:07:57,933][41694] Fps is (10 sec: 6861.4, 60 sec: 6075.6, 300 sec: 6178.7). Total num frames: 40984576. Throughput: 0: 1558.2. Samples: 5242470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:57,937][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:08:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6144.0, 300 sec: 6164.8). Total num frames: 41017344. Throughput: 0: 1574.2. Samples: 5247292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 00:08:04,092][42004] Updated weights for policy 0, policy_version 10016 (0.0034) +[2024-11-08 00:08:07,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6144.4, 300 sec: 6186.1). Total num frames: 41046016. Throughput: 0: 1566.1. Samples: 5256834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:07,933][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 00:08:10,699][42004] Updated weights for policy 0, policy_version 10026 (0.0043) +[2024-11-08 00:08:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6319.3, 300 sec: 6178.7). Total num frames: 41078784. Throughput: 0: 1543.9. Samples: 5266052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:12,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:08:17,285][42004] Updated weights for policy 0, policy_version 10036 (0.0047) +[2024-11-08 00:08:17,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6150.9). Total num frames: 41111552. Throughput: 0: 1528.7. Samples: 5270580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:17,935][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:08:23,369][41694] Fps is (10 sec: 5101.8, 60 sec: 6031.8, 300 sec: 6100.2). Total num frames: 41132032. Throughput: 0: 1523.5. Samples: 5280728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:23,371][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 00:08:25,447][42004] Updated weights for policy 0, policy_version 10046 (0.0038) +[2024-11-08 00:08:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6075.7, 300 sec: 6095.4). Total num frames: 41164800. Throughput: 0: 1483.4. Samples: 5287216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:27,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 00:08:31,440][42004] Updated weights for policy 0, policy_version 10056 (0.0032) +[2024-11-08 00:08:32,934][41694] Fps is (10 sec: 6851.5, 60 sec: 6007.3, 300 sec: 6081.5). Total num frames: 41197568. Throughput: 0: 1525.4. Samples: 5292448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:32,938][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:08:37,389][42004] Updated weights for policy 0, policy_version 10066 (0.0038) +[2024-11-08 00:08:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6075.7, 300 sec: 6067.7). Total num frames: 41230336. Throughput: 0: 1573.4. Samples: 5302732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:37,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 00:08:42,931][41694] Fps is (10 sec: 6964.8, 60 sec: 6212.4, 300 sec: 6118.5). Total num frames: 41267200. Throughput: 0: 1569.7. Samples: 5313104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:42,941][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 00:08:43,357][42004] Updated weights for policy 0, policy_version 10076 (0.0042) +[2024-11-08 00:08:47,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6322.5, 300 sec: 6095.5). Total num frames: 41295872. Throughput: 0: 1566.3. Samples: 5317778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:47,938][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:08:50,200][42004] Updated weights for policy 0, policy_version 10086 (0.0041) +[2024-11-08 00:08:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6095.4). Total num frames: 41328640. Throughput: 0: 1558.3. Samples: 5326958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:52,937][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:08:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6075.8, 300 sec: 6067.6). Total num frames: 41349120. Throughput: 0: 1530.6. Samples: 5334928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:57,937][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 00:08:58,206][42004] Updated weights for policy 0, policy_version 10096 (0.0024) +[2024-11-08 00:09:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6075.7, 300 sec: 6067.6). Total num frames: 41381888. Throughput: 0: 1515.6. Samples: 5338780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:02,936][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:09:04,565][42004] Updated weights for policy 0, policy_version 10106 (0.0029) +[2024-11-08 00:09:07,933][41694] Fps is (10 sec: 6552.8, 60 sec: 6143.9, 300 sec: 6067.6). Total num frames: 41414656. Throughput: 0: 1516.8. Samples: 5348322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:07,937][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:09:10,773][42004] Updated weights for policy 0, policy_version 10116 (0.0039) +[2024-11-08 00:09:12,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6143.9, 300 sec: 6053.7). Total num frames: 41447424. Throughput: 0: 1580.8. Samples: 5358354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:12,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 00:09:17,031][42004] Updated weights for policy 0, policy_version 10126 (0.0030) +[2024-11-08 00:09:17,932][41694] Fps is (10 sec: 6554.6, 60 sec: 6144.0, 300 sec: 6095.4). Total num frames: 41480192. Throughput: 0: 1569.5. Samples: 5363070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:17,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:09:22,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6326.6, 300 sec: 6067.6). Total num frames: 41508864. Throughput: 0: 1538.2. Samples: 5371952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:22,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 00:09:24,186][42004] Updated weights for policy 0, policy_version 10136 (0.0030) +[2024-11-08 00:09:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6053.8). Total num frames: 41541632. Throughput: 0: 1530.6. Samples: 5381980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:27,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 00:09:29,723][42004] Updated weights for policy 0, policy_version 10146 (0.0029) +[2024-11-08 00:09:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6144.2, 300 sec: 6012.1). Total num frames: 41566208. Throughput: 0: 1551.9. Samples: 5387612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:32,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:09:37,214][42004] Updated weights for policy 0, policy_version 10156 (0.0029) +[2024-11-08 00:09:37,933][41694] Fps is (10 sec: 6143.0, 60 sec: 6212.1, 300 sec: 6039.8). Total num frames: 41603072. Throughput: 0: 1513.4. Samples: 5395062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:37,937][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:09:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010157_41603072.pth... +[2024-11-08 00:09:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009797_40128512.pth +[2024-11-08 00:09:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6144.0, 300 sec: 6039.9). Total num frames: 41635840. Throughput: 0: 1569.9. Samples: 5405572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 00:09:43,238][42004] Updated weights for policy 0, policy_version 10166 (0.0033) +[2024-11-08 00:09:47,931][41694] Fps is (10 sec: 6964.5, 60 sec: 6280.6, 300 sec: 6067.6). Total num frames: 41672704. Throughput: 0: 1598.2. Samples: 5410698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:47,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:09:48,701][42004] Updated weights for policy 0, policy_version 10176 (0.0032) +[2024-11-08 00:09:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6348.8, 300 sec: 6137.1). Total num frames: 41709568. Throughput: 0: 1638.8. Samples: 5422064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:52,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:09:54,658][42004] Updated weights for policy 0, policy_version 10186 (0.0042) +[2024-11-08 00:09:57,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6553.6, 300 sec: 6137.0). Total num frames: 41742336. Throughput: 0: 1637.9. Samples: 5432060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:57,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:10:00,543][42004] Updated weights for policy 0, policy_version 10196 (0.0038) +[2024-11-08 00:10:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6151.0). Total num frames: 41775104. Throughput: 0: 1656.1. Samples: 5437594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:02,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:10:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6417.2, 300 sec: 6137.1). Total num frames: 41799680. Throughput: 0: 1632.8. Samples: 5445428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 00:10:08,307][42004] Updated weights for policy 0, policy_version 10206 (0.0030) +[2024-11-08 00:10:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 6178.7). Total num frames: 41836544. Throughput: 0: 1633.6. Samples: 5455494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:12,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 00:10:13,889][42004] Updated weights for policy 0, policy_version 10216 (0.0026) +[2024-11-08 00:10:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6192.7). Total num frames: 41865216. Throughput: 0: 1612.6. Samples: 5460178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:10:17,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:10:20,702][42004] Updated weights for policy 0, policy_version 10226 (0.0050) +[2024-11-08 00:10:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6178.7). Total num frames: 41897984. Throughput: 0: 1652.7. Samples: 5469430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:10:22,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:10:26,359][42004] Updated weights for policy 0, policy_version 10236 (0.0028) +[2024-11-08 00:10:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6553.5, 300 sec: 6248.1). Total num frames: 41934848. Throughput: 0: 1662.7. Samples: 5480396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:27,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 00:10:32,540][42004] Updated weights for policy 0, policy_version 10246 (0.0035) +[2024-11-08 00:10:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 41967616. Throughput: 0: 1659.6. Samples: 5485382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:32,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:10:37,801][42004] Updated weights for policy 0, policy_version 10256 (0.0032) +[2024-11-08 00:10:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.6, 300 sec: 6262.0). Total num frames: 42008576. Throughput: 0: 1647.7. Samples: 5496210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:37,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:10:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 42029056. Throughput: 0: 1594.4. Samples: 5503806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:10:45,297][42004] Updated weights for policy 0, policy_version 10266 (0.0025) +[2024-11-08 00:10:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6234.3). Total num frames: 42065920. Throughput: 0: 1595.5. Samples: 5509390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:10:50,634][42004] Updated weights for policy 0, policy_version 10276 (0.0028) +[2024-11-08 00:10:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6275.9). Total num frames: 42106880. Throughput: 0: 1677.9. Samples: 5520934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:52,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:10:55,745][42004] Updated weights for policy 0, policy_version 10286 (0.0026) +[2024-11-08 00:10:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6303.7). Total num frames: 42143744. Throughput: 0: 1717.7. Samples: 5532792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:57,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:11:01,647][42004] Updated weights for policy 0, policy_version 10296 (0.0037) +[2024-11-08 00:11:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6345.3). Total num frames: 42176512. Throughput: 0: 1736.1. Samples: 5538302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:02,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:11:07,871][42004] Updated weights for policy 0, policy_version 10306 (0.0032) +[2024-11-08 00:11:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 6359.2). Total num frames: 42213376. Throughput: 0: 1731.2. Samples: 5547334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:07,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 00:11:12,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6345.3). Total num frames: 42242048. Throughput: 0: 1714.1. Samples: 5557528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:12,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:11:16,375][42004] Updated weights for policy 0, policy_version 10316 (0.0025) +[2024-11-08 00:11:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6303.7). Total num frames: 42262528. Throughput: 0: 1649.6. Samples: 5559612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:17,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 00:11:22,569][42004] Updated weights for policy 0, policy_version 10326 (0.0025) +[2024-11-08 00:11:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6621.8, 300 sec: 6303.7). Total num frames: 42295296. Throughput: 0: 1614.9. Samples: 5568882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:22,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:11:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6622.0, 300 sec: 6331.4). Total num frames: 42332160. Throughput: 0: 1680.7. Samples: 5579438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:27,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:11:28,301][42004] Updated weights for policy 0, policy_version 10336 (0.0029) +[2024-11-08 00:11:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6345.3). Total num frames: 42369024. Throughput: 0: 1670.8. Samples: 5584578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:32,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:11:33,777][42004] Updated weights for policy 0, policy_version 10346 (0.0030) +[2024-11-08 00:11:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6387.0). Total num frames: 42401792. Throughput: 0: 1653.2. Samples: 5595326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:37,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 00:11:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010352_42401792.pth... +[2024-11-08 00:11:38,088][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009977_40865792.pth +[2024-11-08 00:11:39,942][42004] Updated weights for policy 0, policy_version 10356 (0.0027) +[2024-11-08 00:11:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6400.9). Total num frames: 42438656. Throughput: 0: 1615.5. Samples: 5605488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:42,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:11:45,646][42004] Updated weights for policy 0, policy_version 10366 (0.0023) +[2024-11-08 00:11:49,835][41694] Fps is (10 sec: 5849.7, 60 sec: 6550.6, 300 sec: 6359.8). Total num frames: 42471424. Throughput: 0: 1553.1. Samples: 5611148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:49,837][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 00:11:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6359.2). Total num frames: 42496000. Throughput: 0: 1585.6. Samples: 5618684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:52,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 00:11:53,008][42004] Updated weights for policy 0, policy_version 10376 (0.0032) +[2024-11-08 00:11:57,932][41694] Fps is (10 sec: 8094.5, 60 sec: 6553.6, 300 sec: 6400.9). Total num frames: 42536960. Throughput: 0: 1614.5. Samples: 5630182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:57,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 00:11:58,419][42004] Updated weights for policy 0, policy_version 10386 (0.0026) +[2024-11-08 00:12:02,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6621.7, 300 sec: 6428.7). Total num frames: 42573824. Throughput: 0: 1701.6. Samples: 5636186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:02,935][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 00:12:03,702][42004] Updated weights for policy 0, policy_version 10396 (0.0029) +[2024-11-08 00:12:07,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.8, 300 sec: 6479.0). Total num frames: 42610688. Throughput: 0: 1743.6. Samples: 5647346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:07,940][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 00:12:09,344][42004] Updated weights for policy 0, policy_version 10406 (0.0021) +[2024-11-08 00:12:12,932][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.4, 300 sec: 6484.2). Total num frames: 42647552. Throughput: 0: 1744.8. Samples: 5657956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:12,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:12:15,031][42004] Updated weights for policy 0, policy_version 10416 (0.0028) +[2024-11-08 00:12:17,932][41694] Fps is (10 sec: 7373.5, 60 sec: 7031.5, 300 sec: 6498.1). Total num frames: 42684416. Throughput: 0: 1753.6. Samples: 5663492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:17,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:12:20,287][42004] Updated weights for policy 0, policy_version 10426 (0.0026) +[2024-11-08 00:12:24,181][41694] Fps is (10 sec: 6189.7, 60 sec: 6888.0, 300 sec: 6470.7). Total num frames: 42717184. Throughput: 0: 1729.7. Samples: 5675326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:24,185][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 00:12:27,673][42004] Updated weights for policy 0, policy_version 10436 (0.0023) +[2024-11-08 00:12:27,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6894.8, 300 sec: 6470.3). Total num frames: 42745856. Throughput: 0: 1716.8. Samples: 5682744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:27,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 00:12:32,931][41694] Fps is (10 sec: 7489.6, 60 sec: 6894.9, 300 sec: 6498.1). Total num frames: 42782720. Throughput: 0: 1796.0. Samples: 5688548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:32,933][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 00:12:32,936][42004] Updated weights for policy 0, policy_version 10446 (0.0033) +[2024-11-08 00:12:37,931][41694] Fps is (10 sec: 7783.0, 60 sec: 7031.5, 300 sec: 6539.7). Total num frames: 42823680. Throughput: 0: 1807.6. Samples: 5700028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:37,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 00:12:38,432][42004] Updated weights for policy 0, policy_version 10456 (0.0038) +[2024-11-08 00:12:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6590.9). Total num frames: 42860544. Throughput: 0: 1798.7. Samples: 5711124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:42,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 00:12:44,155][42004] Updated weights for policy 0, policy_version 10466 (0.0026) +[2024-11-08 00:12:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7261.9, 300 sec: 6595.3). Total num frames: 42893312. Throughput: 0: 1779.2. Samples: 5716250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:47,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:12:49,832][42004] Updated weights for policy 0, policy_version 10476 (0.0025) +[2024-11-08 00:12:52,933][41694] Fps is (10 sec: 6962.4, 60 sec: 7236.1, 300 sec: 6595.3). Total num frames: 42930176. Throughput: 0: 1777.5. Samples: 5727334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:52,935][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:12:55,275][42004] Updated weights for policy 0, policy_version 10486 (0.0026) +[2024-11-08 00:12:58,549][41694] Fps is (10 sec: 6172.3, 60 sec: 6959.8, 300 sec: 6567.6). Total num frames: 42958848. Throughput: 0: 1645.3. Samples: 5733010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:58,553][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 00:13:02,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6895.1, 300 sec: 6581.4). Total num frames: 42987520. Throughput: 0: 1704.4. Samples: 5740190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 00:13:03,132][42004] Updated weights for policy 0, policy_version 10496 (0.0034) +[2024-11-08 00:13:07,932][41694] Fps is (10 sec: 6985.1, 60 sec: 6895.0, 300 sec: 6595.3). Total num frames: 43024384. Throughput: 0: 1730.8. Samples: 5751050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:07,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 00:13:08,566][42004] Updated weights for policy 0, policy_version 10506 (0.0043) +[2024-11-08 00:13:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6623.0). Total num frames: 43065344. Throughput: 0: 1772.7. Samples: 5762512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:12,934][41694] Avg episode reward: [(0, '4.728')] +[2024-11-08 00:13:13,748][42004] Updated weights for policy 0, policy_version 10516 (0.0021) +[2024-11-08 00:13:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6674.6). Total num frames: 43098112. Throughput: 0: 1767.6. Samples: 5768092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:17,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:13:19,861][42004] Updated weights for policy 0, policy_version 10526 (0.0036) +[2024-11-08 00:13:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7181.0, 300 sec: 6692.5). Total num frames: 43139072. Throughput: 0: 1746.8. Samples: 5778636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:22,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 00:13:25,146][42004] Updated weights for policy 0, policy_version 10536 (0.0044) +[2024-11-08 00:13:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.1, 300 sec: 6706.4). Total num frames: 43175936. Throughput: 0: 1750.7. Samples: 5789906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:13:30,479][42004] Updated weights for policy 0, policy_version 10546 (0.0030) +[2024-11-08 00:13:33,061][41694] Fps is (10 sec: 5661.2, 60 sec: 6880.1, 300 sec: 6661.8). Total num frames: 43196416. Throughput: 0: 1762.0. Samples: 5795768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:33,062][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 00:13:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 43233280. Throughput: 0: 1673.1. Samples: 5802622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:37,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 00:13:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010555_43233280.pth... +[2024-11-08 00:13:38,080][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010157_41603072.pth +[2024-11-08 00:13:38,339][42004] Updated weights for policy 0, policy_version 10556 (0.0033) +[2024-11-08 00:13:42,932][41694] Fps is (10 sec: 7469.4, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 43270144. Throughput: 0: 1821.9. Samples: 5813872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:42,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:13:43,566][42004] Updated weights for policy 0, policy_version 10566 (0.0032) +[2024-11-08 00:13:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 43311104. Throughput: 0: 1773.7. Samples: 5820008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:47,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 00:13:48,669][42004] Updated weights for policy 0, policy_version 10576 (0.0028) +[2024-11-08 00:13:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.3, 300 sec: 6775.8). Total num frames: 43347968. Throughput: 0: 1783.2. Samples: 5831294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:52,935][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:13:54,443][42004] Updated weights for policy 0, policy_version 10586 (0.0041) +[2024-11-08 00:13:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7173.6, 300 sec: 6789.6). Total num frames: 43384832. Throughput: 0: 1780.7. Samples: 5842642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:57,934][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 00:13:59,679][42004] Updated weights for policy 0, policy_version 10596 (0.0026) +[2024-11-08 00:14:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6803.6). Total num frames: 43421696. Throughput: 0: 1782.7. Samples: 5848312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:02,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:14:07,655][42004] Updated weights for policy 0, policy_version 10606 (0.0027) +[2024-11-08 00:14:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 43442176. Throughput: 0: 1763.5. Samples: 5857992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:07,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 00:14:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 43479040. Throughput: 0: 1682.5. Samples: 5865618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:12,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 00:14:13,482][42004] Updated weights for policy 0, policy_version 10616 (0.0026) +[2024-11-08 00:14:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 43515904. Throughput: 0: 1684.0. Samples: 5871330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:17,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 00:14:18,523][42004] Updated weights for policy 0, policy_version 10626 (0.0043) +[2024-11-08 00:14:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 43556864. Throughput: 0: 1793.1. Samples: 5883310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:22,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:14:24,062][42004] Updated weights for policy 0, policy_version 10636 (0.0024) +[2024-11-08 00:14:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 43589632. Throughput: 0: 1780.0. Samples: 5893972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:27,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:14:29,536][42004] Updated weights for policy 0, policy_version 10646 (0.0034) +[2024-11-08 00:14:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7251.9, 300 sec: 6873.0). Total num frames: 43630592. Throughput: 0: 1776.0. Samples: 5899928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:32,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:14:34,832][42004] Updated weights for policy 0, policy_version 10656 (0.0028) +[2024-11-08 00:14:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.2, 300 sec: 6886.8). Total num frames: 43667456. Throughput: 0: 1785.0. Samples: 5911618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:37,934][41694] Avg episode reward: [(0, '4.104')] +[2024-11-08 00:14:42,514][42004] Updated weights for policy 0, policy_version 10666 (0.0033) +[2024-11-08 00:14:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6963.1, 300 sec: 6831.3). Total num frames: 43687936. Throughput: 0: 1689.8. Samples: 5918686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:42,943][41694] Avg episode reward: [(0, '4.209')] +[2024-11-08 00:14:47,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.5, 300 sec: 6817.4). Total num frames: 43720704. Throughput: 0: 1671.2. Samples: 5923518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:47,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:14:48,517][42004] Updated weights for policy 0, policy_version 10676 (0.0043) +[2024-11-08 00:14:52,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 43761664. Throughput: 0: 1699.6. Samples: 5934472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:52,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 00:14:54,002][42004] Updated weights for policy 0, policy_version 10686 (0.0035) +[2024-11-08 00:14:57,931][41694] Fps is (10 sec: 7373.7, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 43794432. Throughput: 0: 1772.0. Samples: 5945356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:57,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 00:14:59,904][42004] Updated weights for policy 0, policy_version 10696 (0.0026) +[2024-11-08 00:15:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 43827200. Throughput: 0: 1754.6. Samples: 5950286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:15:02,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:15:05,773][42004] Updated weights for policy 0, policy_version 10706 (0.0033) +[2024-11-08 00:15:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 43864064. Throughput: 0: 1725.4. Samples: 5960954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:15:07,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 00:15:11,340][42004] Updated weights for policy 0, policy_version 10716 (0.0024) +[2024-11-08 00:15:12,937][41694] Fps is (10 sec: 7368.9, 60 sec: 7030.8, 300 sec: 6900.6). Total num frames: 43900928. Throughput: 0: 1725.8. Samples: 5971644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:12,941][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:15:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 43917312. Throughput: 0: 1664.7. Samples: 5974838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:17,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:15:20,269][42004] Updated weights for policy 0, policy_version 10726 (0.0048) +[2024-11-08 00:15:22,932][41694] Fps is (10 sec: 4917.8, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 43950080. Throughput: 0: 1572.9. Samples: 5982400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:22,933][41694] Avg episode reward: [(0, '4.211')] +[2024-11-08 00:15:25,715][42004] Updated weights for policy 0, policy_version 10736 (0.0023) +[2024-11-08 00:15:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 43991040. Throughput: 0: 1665.4. Samples: 5993628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:15:27,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 00:15:31,220][42004] Updated weights for policy 0, policy_version 10746 (0.0033) +[2024-11-08 00:15:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 44023808. Throughput: 0: 1685.3. Samples: 5999356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:15:32,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 00:15:36,746][42004] Updated weights for policy 0, policy_version 10756 (0.0022) +[2024-11-08 00:15:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6900.7). Total num frames: 44064768. Throughput: 0: 1685.5. Samples: 6010318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:37,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 00:15:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010758_44064768.pth... +[2024-11-08 00:15:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010352_42401792.pth +[2024-11-08 00:15:42,191][42004] Updated weights for policy 0, policy_version 10766 (0.0029) +[2024-11-08 00:15:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6895.0, 300 sec: 6900.7). Total num frames: 44101632. Throughput: 0: 1697.6. Samples: 6021748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:15:42,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:15:47,619][42004] Updated weights for policy 0, policy_version 10776 (0.0029) +[2024-11-08 00:15:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 44138496. Throughput: 0: 1711.7. Samples: 6027312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:15:47,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:15:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 44158976. Throughput: 0: 1631.0. Samples: 6034350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:52,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:15:55,462][42004] Updated weights for policy 0, policy_version 10786 (0.0031) +[2024-11-08 00:15:57,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 44195840. Throughput: 0: 1640.2. Samples: 6045444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:57,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:16:00,768][42004] Updated weights for policy 0, policy_version 10796 (0.0029) +[2024-11-08 00:16:02,933][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 44232704. Throughput: 0: 1701.7. Samples: 6051412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:02,935][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 00:16:06,738][42004] Updated weights for policy 0, policy_version 10806 (0.0031) +[2024-11-08 00:16:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 44269568. Throughput: 0: 1757.9. Samples: 6061504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:07,934][41694] Avg episode reward: [(0, '4.226')] +[2024-11-08 00:16:12,774][42004] Updated weights for policy 0, policy_version 10816 (0.0036) +[2024-11-08 00:16:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.7, 300 sec: 6914.6). Total num frames: 44302336. Throughput: 0: 1734.1. Samples: 6071662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:12,933][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:16:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 44339200. Throughput: 0: 1725.5. Samples: 6077004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:17,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 00:16:18,271][42004] Updated weights for policy 0, policy_version 10826 (0.0033) +[2024-11-08 00:16:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 44376064. Throughput: 0: 1739.1. Samples: 6088576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:22,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 00:16:25,877][42004] Updated weights for policy 0, policy_version 10836 (0.0027) +[2024-11-08 00:16:27,932][41694] Fps is (10 sec: 5733.9, 60 sec: 6758.3, 300 sec: 6872.9). Total num frames: 44396544. Throughput: 0: 1640.0. Samples: 6095548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:27,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:16:31,462][42004] Updated weights for policy 0, policy_version 10846 (0.0035) +[2024-11-08 00:16:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 44433408. Throughput: 0: 1636.8. Samples: 6100970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:32,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 00:16:36,772][42004] Updated weights for policy 0, policy_version 10856 (0.0025) +[2024-11-08 00:16:37,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 44470272. Throughput: 0: 1741.5. Samples: 6112718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:37,936][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 00:16:42,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6690.2, 300 sec: 6931.6). Total num frames: 44503040. Throughput: 0: 1710.6. Samples: 6122420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:42,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 00:16:43,025][42004] Updated weights for policy 0, policy_version 10866 (0.0037) +[2024-11-08 00:16:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 44544000. Throughput: 0: 1704.0. Samples: 6128090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:47,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:16:48,337][42004] Updated weights for policy 0, policy_version 10876 (0.0032) +[2024-11-08 00:16:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 44580864. Throughput: 0: 1730.0. Samples: 6139356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:52,935][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 00:16:53,884][42004] Updated weights for policy 0, policy_version 10886 (0.0025) +[2024-11-08 00:16:59,870][41694] Fps is (10 sec: 5832.7, 60 sec: 6745.3, 300 sec: 6869.5). Total num frames: 44613632. Throughput: 0: 1675.4. Samples: 6150302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:59,872][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:17:01,883][42004] Updated weights for policy 0, policy_version 10896 (0.0026) +[2024-11-08 00:17:02,934][41694] Fps is (10 sec: 5323.7, 60 sec: 6689.9, 300 sec: 6859.0). Total num frames: 44634112. Throughput: 0: 1661.3. Samples: 6151768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:02,936][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:17:07,635][42004] Updated weights for policy 0, policy_version 10906 (0.0029) +[2024-11-08 00:17:07,931][41694] Fps is (10 sec: 7113.3, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 44670976. Throughput: 0: 1636.1. Samples: 6162198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:07,933][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 00:17:12,933][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.3, 300 sec: 6859.0). Total num frames: 44707840. Throughput: 0: 1735.6. Samples: 6173650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:12,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 00:17:13,167][42004] Updated weights for policy 0, policy_version 10916 (0.0029) +[2024-11-08 00:17:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6902.2). Total num frames: 44744704. Throughput: 0: 1731.7. Samples: 6178894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:17,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:17:18,582][42004] Updated weights for policy 0, policy_version 10926 (0.0026) +[2024-11-08 00:17:22,932][41694] Fps is (10 sec: 7783.1, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 44785664. Throughput: 0: 1731.5. Samples: 6190634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:22,935][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 00:17:23,713][42004] Updated weights for policy 0, policy_version 10936 (0.0030) +[2024-11-08 00:17:27,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.8, 300 sec: 6914.6). Total num frames: 44822528. Throughput: 0: 1769.2. Samples: 6202036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:27,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 00:17:29,294][42004] Updated weights for policy 0, policy_version 10946 (0.0026) +[2024-11-08 00:17:34,273][41694] Fps is (10 sec: 6139.5, 60 sec: 6877.7, 300 sec: 6855.7). Total num frames: 44855296. Throughput: 0: 1720.2. Samples: 6207806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:34,275][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 00:17:37,166][42004] Updated weights for policy 0, policy_version 10956 (0.0025) +[2024-11-08 00:17:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 44879872. Throughput: 0: 1669.7. Samples: 6214492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:17:37,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 00:17:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010957_44879872.pth... +[2024-11-08 00:17:38,226][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010555_43233280.pth +[2024-11-08 00:17:42,771][42004] Updated weights for policy 0, policy_version 10966 (0.0023) +[2024-11-08 00:17:42,931][41694] Fps is (10 sec: 7096.2, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 44916736. Throughput: 0: 1741.1. Samples: 6225276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:17:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:17:47,914][42004] Updated weights for policy 0, policy_version 10976 (0.0027) +[2024-11-08 00:17:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 44957696. Throughput: 0: 1762.4. Samples: 6231072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:17:47,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:17:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6915.2). Total num frames: 44994560. Throughput: 0: 1795.6. Samples: 6242998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:17:52,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:17:53,045][42004] Updated weights for policy 0, policy_version 10986 (0.0031) +[2024-11-08 00:17:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7195.7, 300 sec: 6928.5). Total num frames: 45031424. Throughput: 0: 1797.8. Samples: 6254548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:57,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:17:58,666][42004] Updated weights for policy 0, policy_version 10996 (0.0041) +[2024-11-08 00:18:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7236.5, 300 sec: 6928.5). Total num frames: 45068288. Throughput: 0: 1793.0. Samples: 6259582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:02,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:18:04,477][42004] Updated weights for policy 0, policy_version 11006 (0.0029) +[2024-11-08 00:18:08,879][41694] Fps is (10 sec: 5612.2, 60 sec: 6922.1, 300 sec: 6850.9). Total num frames: 45092864. Throughput: 0: 1740.0. Samples: 6270584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:08,881][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:18:12,236][42004] Updated weights for policy 0, policy_version 11016 (0.0019) +[2024-11-08 00:18:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 45125632. Throughput: 0: 1678.1. Samples: 6277550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:18:12,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:18:17,332][42004] Updated weights for policy 0, policy_version 11026 (0.0029) +[2024-11-08 00:18:17,932][41694] Fps is (10 sec: 8144.3, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 45166592. Throughput: 0: 1735.2. Samples: 6283564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:18:17,938][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 00:18:22,560][42004] Updated weights for policy 0, policy_version 11036 (0.0028) +[2024-11-08 00:18:22,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 45203456. Throughput: 0: 1795.3. Samples: 6295282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:22,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:18:27,766][42004] Updated weights for policy 0, policy_version 11046 (0.0030) +[2024-11-08 00:18:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.5, 300 sec: 6945.4). Total num frames: 45244416. Throughput: 0: 1821.8. Samples: 6307258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:27,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 00:18:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7262.1, 300 sec: 6942.4). Total num frames: 45281280. Throughput: 0: 1815.2. Samples: 6312754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:32,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 00:18:33,497][42004] Updated weights for policy 0, policy_version 11056 (0.0041) +[2024-11-08 00:18:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 45318144. Throughput: 0: 1791.9. Samples: 6323632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:37,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:18:38,825][42004] Updated weights for policy 0, policy_version 11066 (0.0039) +[2024-11-08 00:18:43,290][41694] Fps is (10 sec: 5931.3, 60 sec: 7057.6, 300 sec: 6878.5). Total num frames: 45342720. Throughput: 0: 1652.4. Samples: 6329498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:43,292][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 00:18:46,233][42004] Updated weights for policy 0, policy_version 11076 (0.0033) +[2024-11-08 00:18:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 45379584. Throughput: 0: 1720.8. Samples: 6337016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:47,933][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 00:18:51,411][42004] Updated weights for policy 0, policy_version 11086 (0.0026) +[2024-11-08 00:18:52,931][41694] Fps is (10 sec: 7647.1, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 45416448. Throughput: 0: 1778.6. Samples: 6348934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:52,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:18:56,690][42004] Updated weights for policy 0, policy_version 11096 (0.0029) +[2024-11-08 00:18:57,933][41694] Fps is (10 sec: 7781.1, 60 sec: 7099.5, 300 sec: 6900.7). Total num frames: 45457408. Throughput: 0: 1840.7. Samples: 6360382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:57,935][41694] Avg episode reward: [(0, '4.203')] +[2024-11-08 00:19:02,498][42004] Updated weights for policy 0, policy_version 11106 (0.0027) +[2024-11-08 00:19:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 45490176. Throughput: 0: 1826.7. Samples: 6365764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 00:19:07,932][41694] Fps is (10 sec: 6964.4, 60 sec: 7352.4, 300 sec: 6942.4). Total num frames: 45527040. Throughput: 0: 1794.1. Samples: 6376018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:07,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:19:08,691][42004] Updated weights for policy 0, policy_version 11116 (0.0027) +[2024-11-08 00:19:12,933][41694] Fps is (10 sec: 6962.4, 60 sec: 7236.2, 300 sec: 6928.5). Total num frames: 45559808. Throughput: 0: 1758.9. Samples: 6386412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:12,936][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:19:14,220][42004] Updated weights for policy 0, policy_version 11126 (0.0032) +[2024-11-08 00:19:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 45580288. Throughput: 0: 1763.2. Samples: 6392098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:17,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 00:19:21,847][42004] Updated weights for policy 0, policy_version 11136 (0.0025) +[2024-11-08 00:19:22,931][41694] Fps is (10 sec: 6144.7, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 45621248. Throughput: 0: 1680.6. Samples: 6399260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:19:22,933][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 00:19:27,067][42004] Updated weights for policy 0, policy_version 11146 (0.0024) +[2024-11-08 00:19:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 45658112. Throughput: 0: 1822.2. Samples: 6410844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:19:27,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 00:19:32,782][42004] Updated weights for policy 0, policy_version 11156 (0.0039) +[2024-11-08 00:19:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 45694976. Throughput: 0: 1763.2. Samples: 6416360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:32,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 00:19:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 45727744. Throughput: 0: 1726.5. Samples: 6426626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:37,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011164_45727744.pth... +[2024-11-08 00:19:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010758_44064768.pth +[2024-11-08 00:19:39,188][42004] Updated weights for policy 0, policy_version 11166 (0.0032) +[2024-11-08 00:19:42,932][41694] Fps is (10 sec: 6553.8, 60 sec: 7005.1, 300 sec: 6914.6). Total num frames: 45760512. Throughput: 0: 1681.4. Samples: 6436044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:42,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:19:45,346][42004] Updated weights for policy 0, policy_version 11176 (0.0033) +[2024-11-08 00:19:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 45793280. Throughput: 0: 1676.9. Samples: 6441226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:47,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 00:19:52,904][42004] Updated weights for policy 0, policy_version 11186 (0.0030) +[2024-11-08 00:19:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 45817856. Throughput: 0: 1649.9. Samples: 6450264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:52,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:19:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6622.1, 300 sec: 6873.0). Total num frames: 45854720. Throughput: 0: 1633.1. Samples: 6459898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:57,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:19:58,236][42004] Updated weights for policy 0, policy_version 11196 (0.0045) +[2024-11-08 00:20:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 45891584. Throughput: 0: 1632.1. Samples: 6465542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:02,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:20:03,885][42004] Updated weights for policy 0, policy_version 11206 (0.0026) +[2024-11-08 00:20:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6873.1). Total num frames: 45928448. Throughput: 0: 1716.3. Samples: 6476494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:07,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 00:20:09,196][42004] Updated weights for policy 0, policy_version 11216 (0.0027) +[2024-11-08 00:20:12,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.3, 300 sec: 6928.5). Total num frames: 45961216. Throughput: 0: 1700.7. Samples: 6487376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:12,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:20:16,102][42004] Updated weights for policy 0, policy_version 11226 (0.0035) +[2024-11-08 00:20:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 45993984. Throughput: 0: 1666.3. Samples: 6491344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:17,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:20:21,609][42004] Updated weights for policy 0, policy_version 11236 (0.0031) +[2024-11-08 00:20:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6826.6, 300 sec: 6914.6). Total num frames: 46030848. Throughput: 0: 1675.3. Samples: 6502014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:22,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 00:20:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.8, 300 sec: 6886.8). Total num frames: 46055424. Throughput: 0: 1633.3. Samples: 6509544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:27,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:20:29,136][42004] Updated weights for policy 0, policy_version 11246 (0.0030) +[2024-11-08 00:20:32,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 46092288. Throughput: 0: 1644.2. Samples: 6515214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:32,933][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 00:20:34,453][42004] Updated weights for policy 0, policy_version 11256 (0.0030) +[2024-11-08 00:20:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 46125056. Throughput: 0: 1693.3. Samples: 6526464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:37,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:20:40,306][42004] Updated weights for policy 0, policy_version 11266 (0.0025) +[2024-11-08 00:20:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 46161920. Throughput: 0: 1715.2. Samples: 6537082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:42,936][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:20:46,348][42004] Updated weights for policy 0, policy_version 11276 (0.0025) +[2024-11-08 00:20:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 46194688. Throughput: 0: 1694.3. Samples: 6541784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:47,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:20:51,995][42004] Updated weights for policy 0, policy_version 11286 (0.0028) +[2024-11-08 00:20:52,939][41694] Fps is (10 sec: 6958.2, 60 sec: 6894.1, 300 sec: 6900.6). Total num frames: 46231552. Throughput: 0: 1686.9. Samples: 6552416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:52,941][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 00:20:57,091][42004] Updated weights for policy 0, policy_version 11296 (0.0018) +[2024-11-08 00:20:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 46272512. Throughput: 0: 1713.0. Samples: 6564462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:21:02,932][41694] Fps is (10 sec: 6148.3, 60 sec: 6690.2, 300 sec: 6859.1). Total num frames: 46292992. Throughput: 0: 1693.7. Samples: 6567562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:02,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:21:04,847][42004] Updated weights for policy 0, policy_version 11306 (0.0032) +[2024-11-08 00:21:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 46329856. Throughput: 0: 1665.4. Samples: 6576958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:07,934][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 00:21:10,362][42004] Updated weights for policy 0, policy_version 11316 (0.0029) +[2024-11-08 00:21:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 46366720. Throughput: 0: 1736.5. Samples: 6587686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:12,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 00:21:16,242][42004] Updated weights for policy 0, policy_version 11326 (0.0034) +[2024-11-08 00:21:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 46399488. Throughput: 0: 1729.8. Samples: 6593056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:17,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 00:21:22,265][42004] Updated weights for policy 0, policy_version 11336 (0.0024) +[2024-11-08 00:21:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6914.6). Total num frames: 46436352. Throughput: 0: 1710.7. Samples: 6603446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:22,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:21:27,367][42004] Updated weights for policy 0, policy_version 11346 (0.0027) +[2024-11-08 00:21:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 46477312. Throughput: 0: 1735.2. Samples: 6615168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:27,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:21:32,578][42004] Updated weights for policy 0, policy_version 11356 (0.0031) +[2024-11-08 00:21:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 46514176. Throughput: 0: 1759.4. Samples: 6620958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:32,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 00:21:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 46538752. Throughput: 0: 1693.8. Samples: 6628626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:37,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 00:21:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011362_46538752.pth... +[2024-11-08 00:21:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010957_44879872.pth +[2024-11-08 00:21:40,008][42004] Updated weights for policy 0, policy_version 11366 (0.0028) +[2024-11-08 00:21:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 46575616. Throughput: 0: 1679.7. Samples: 6640048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:42,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 00:21:45,337][42004] Updated weights for policy 0, policy_version 11376 (0.0042) +[2024-11-08 00:21:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 46612480. Throughput: 0: 1740.7. Samples: 6645894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:47,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 00:21:50,913][42004] Updated weights for policy 0, policy_version 11386 (0.0032) +[2024-11-08 00:21:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6964.0, 300 sec: 6946.4). Total num frames: 46649344. Throughput: 0: 1777.7. Samples: 6656956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:52,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 00:21:56,736][42004] Updated weights for policy 0, policy_version 11396 (0.0035) +[2024-11-08 00:21:57,935][41694] Fps is (10 sec: 7370.0, 60 sec: 6894.5, 300 sec: 6956.2). Total num frames: 46686208. Throughput: 0: 1776.4. Samples: 6667632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:21:57,938][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 00:22:02,276][42004] Updated weights for policy 0, policy_version 11406 (0.0042) +[2024-11-08 00:22:02,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 46723072. Throughput: 0: 1783.3. Samples: 6673306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:02,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 00:22:09,456][41694] Fps is (10 sec: 6044.1, 60 sec: 6923.9, 300 sec: 6906.7). Total num frames: 46755840. Throughput: 0: 1734.9. Samples: 6684162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:09,460][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:22:09,784][42004] Updated weights for policy 0, policy_version 11416 (0.0033) +[2024-11-08 00:22:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 46780416. Throughput: 0: 1700.4. Samples: 6691688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:12,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:22:15,109][42004] Updated weights for policy 0, policy_version 11426 (0.0021) +[2024-11-08 00:22:17,931][41694] Fps is (10 sec: 7732.3, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 46821376. Throughput: 0: 1702.9. Samples: 6697590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:17,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 00:22:20,412][42004] Updated weights for policy 0, policy_version 11436 (0.0024) +[2024-11-08 00:22:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 46858240. Throughput: 0: 1787.7. Samples: 6709074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:22,938][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:22:26,202][42004] Updated weights for policy 0, policy_version 11446 (0.0029) +[2024-11-08 00:22:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6932.2). Total num frames: 46891008. Throughput: 0: 1771.2. Samples: 6719750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:27,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 00:22:31,716][42004] Updated weights for policy 0, policy_version 11456 (0.0031) +[2024-11-08 00:22:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 46931968. Throughput: 0: 1760.7. Samples: 6725128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:32,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:22:37,011][42004] Updated weights for policy 0, policy_version 11466 (0.0029) +[2024-11-08 00:22:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 46968832. Throughput: 0: 1774.0. Samples: 6736786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:37,937][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:22:43,832][41694] Fps is (10 sec: 6388.2, 60 sec: 6994.8, 300 sec: 6907.4). Total num frames: 47001600. Throughput: 0: 1633.5. Samples: 6742606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:43,833][41694] Avg episode reward: [(0, '4.610')] +[2024-11-08 00:22:44,320][42004] Updated weights for policy 0, policy_version 11476 (0.0024) +[2024-11-08 00:22:47,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 47030272. Throughput: 0: 1711.8. Samples: 6750338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:22:49,615][42004] Updated weights for policy 0, policy_version 11486 (0.0025) +[2024-11-08 00:22:52,931][41694] Fps is (10 sec: 7652.1, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 47071232. Throughput: 0: 1788.8. Samples: 6761932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:52,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 00:22:54,943][42004] Updated weights for policy 0, policy_version 11496 (0.0021) +[2024-11-08 00:22:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.9, 300 sec: 6914.6). Total num frames: 47108096. Throughput: 0: 1819.6. Samples: 6773568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:57,935][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:23:00,705][42004] Updated weights for policy 0, policy_version 11506 (0.0028) +[2024-11-08 00:23:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.2, 300 sec: 6964.7). Total num frames: 47140864. Throughput: 0: 1801.8. Samples: 6778670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:02,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 00:23:06,408][42004] Updated weights for policy 0, policy_version 11516 (0.0033) +[2024-11-08 00:23:07,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7214.6, 300 sec: 6956.2). Total num frames: 47177728. Throughput: 0: 1779.8. Samples: 6789168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:07,934][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 00:23:12,081][42004] Updated weights for policy 0, policy_version 11526 (0.0032) +[2024-11-08 00:23:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 47214592. Throughput: 0: 1781.9. Samples: 6799936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:12,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:23:18,211][41694] Fps is (10 sec: 5977.3, 60 sec: 6930.9, 300 sec: 6894.2). Total num frames: 47239168. Throughput: 0: 1782.7. Samples: 6805846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:18,214][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:23:19,449][42004] Updated weights for policy 0, policy_version 11536 (0.0030) +[2024-11-08 00:23:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 47276032. Throughput: 0: 1696.3. Samples: 6813118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:22,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 00:23:25,219][42004] Updated weights for policy 0, policy_version 11546 (0.0027) +[2024-11-08 00:23:27,932][41694] Fps is (10 sec: 7163.6, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 47308800. Throughput: 0: 1845.4. Samples: 6823988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:27,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 00:23:30,584][42004] Updated weights for policy 0, policy_version 11556 (0.0021) +[2024-11-08 00:23:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 47345664. Throughput: 0: 1765.2. Samples: 6829770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:32,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:23:36,526][42004] Updated weights for policy 0, policy_version 11566 (0.0030) +[2024-11-08 00:23:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6923.0). Total num frames: 47382528. Throughput: 0: 1738.0. Samples: 6840140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:37,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 00:23:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011568_47382528.pth... +[2024-11-08 00:23:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011164_45727744.pth +[2024-11-08 00:23:41,874][42004] Updated weights for policy 0, policy_version 11576 (0.0022) +[2024-11-08 00:23:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7138.6, 300 sec: 6928.5). Total num frames: 47423488. Throughput: 0: 1736.8. Samples: 6851722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:42,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:23:47,039][42004] Updated weights for policy 0, policy_version 11586 (0.0025) +[2024-11-08 00:23:47,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 47460352. Throughput: 0: 1754.2. Samples: 6857608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:47,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 00:23:52,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 47484928. Throughput: 0: 1753.8. Samples: 6868088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:52,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:23:54,615][42004] Updated weights for policy 0, policy_version 11596 (0.0030) +[2024-11-08 00:23:57,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 47521792. Throughput: 0: 1706.3. Samples: 6876720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:57,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:23:59,778][42004] Updated weights for policy 0, policy_version 11606 (0.0027) +[2024-11-08 00:24:02,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 47558656. Throughput: 0: 1716.1. Samples: 6882588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:02,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:24:05,792][42004] Updated weights for policy 0, policy_version 11616 (0.0030) +[2024-11-08 00:24:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6895.0, 300 sec: 6886.9). Total num frames: 47591424. Throughput: 0: 1771.4. Samples: 6892830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:07,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 00:24:11,544][42004] Updated weights for policy 0, policy_version 11626 (0.0032) +[2024-11-08 00:24:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 47628288. Throughput: 0: 1768.7. Samples: 6903578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:12,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 00:24:16,941][42004] Updated weights for policy 0, policy_version 11636 (0.0027) +[2024-11-08 00:24:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7133.0, 300 sec: 6928.5). Total num frames: 47665152. Throughput: 0: 1765.9. Samples: 6909234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:17,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:24:22,227][42004] Updated weights for policy 0, policy_version 11646 (0.0033) +[2024-11-08 00:24:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 47706112. Throughput: 0: 1791.4. Samples: 6920754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:22,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:24:27,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 47726592. Throughput: 0: 1701.6. Samples: 6928296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:27,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:24:29,678][42004] Updated weights for policy 0, policy_version 11656 (0.0025) +[2024-11-08 00:24:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 47767552. Throughput: 0: 1698.0. Samples: 6934018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:32,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:24:34,967][42004] Updated weights for policy 0, policy_version 11666 (0.0025) +[2024-11-08 00:24:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 47804416. Throughput: 0: 1718.8. Samples: 6945434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:37,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:24:40,850][42004] Updated weights for policy 0, policy_version 11676 (0.0032) +[2024-11-08 00:24:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 47837184. Throughput: 0: 1759.0. Samples: 6955876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:42,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 00:24:46,339][42004] Updated weights for policy 0, policy_version 11686 (0.0026) +[2024-11-08 00:24:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6970.1). Total num frames: 47874048. Throughput: 0: 1749.6. Samples: 6961320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:47,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:24:51,878][42004] Updated weights for policy 0, policy_version 11696 (0.0045) +[2024-11-08 00:24:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6970.1). Total num frames: 47910912. Throughput: 0: 1775.6. Samples: 6972732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:24:52,933][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 00:24:57,264][42004] Updated weights for policy 0, policy_version 11706 (0.0029) +[2024-11-08 00:24:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 47951872. Throughput: 0: 1790.4. Samples: 6984144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:24:57,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 00:25:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 47972352. Throughput: 0: 1758.3. Samples: 6988360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:02,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:25:04,894][42004] Updated weights for policy 0, policy_version 11716 (0.0027) +[2024-11-08 00:25:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 48009216. Throughput: 0: 1694.0. Samples: 6996986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:07,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 00:25:10,261][42004] Updated weights for policy 0, policy_version 11726 (0.0036) +[2024-11-08 00:25:12,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6963.1, 300 sec: 6956.2). Total num frames: 48046080. Throughput: 0: 1769.3. Samples: 7007916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:12,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 00:25:16,751][42004] Updated weights for policy 0, policy_version 11736 (0.0043) +[2024-11-08 00:25:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 48078848. Throughput: 0: 1744.9. Samples: 7012540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:17,935][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:25:22,193][42004] Updated weights for policy 0, policy_version 11746 (0.0026) +[2024-11-08 00:25:22,931][41694] Fps is (10 sec: 6963.9, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 48115712. Throughput: 0: 1725.2. Samples: 7023066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:25:22,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-08 00:25:27,579][42004] Updated weights for policy 0, policy_version 11756 (0.0039) +[2024-11-08 00:25:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6984.0). Total num frames: 48152576. Throughput: 0: 1750.3. Samples: 7034640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:25:27,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:25:32,786][42004] Updated weights for policy 0, policy_version 11766 (0.0036) +[2024-11-08 00:25:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 48193536. Throughput: 0: 1757.9. Samples: 7040424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:32,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:25:37,934][41694] Fps is (10 sec: 6142.4, 60 sec: 6826.4, 300 sec: 6956.2). Total num frames: 48214016. Throughput: 0: 1667.5. Samples: 7047776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:37,938][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:25:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011771_48214016.pth... +[2024-11-08 00:25:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011362_46538752.pth +[2024-11-08 00:25:40,433][42004] Updated weights for policy 0, policy_version 11776 (0.0024) +[2024-11-08 00:25:42,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 48250880. Throughput: 0: 1667.9. Samples: 7059200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:42,936][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 00:25:45,915][42004] Updated weights for policy 0, policy_version 11786 (0.0028) +[2024-11-08 00:25:47,931][41694] Fps is (10 sec: 7374.7, 60 sec: 6894.9, 300 sec: 6970.3). Total num frames: 48287744. Throughput: 0: 1702.1. Samples: 7064954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:47,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:25:51,713][42004] Updated weights for policy 0, policy_version 11796 (0.0033) +[2024-11-08 00:25:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 48324608. Throughput: 0: 1739.4. Samples: 7075260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:52,935][41694] Avg episode reward: [(0, '4.198')] +[2024-11-08 00:25:56,788][42004] Updated weights for policy 0, policy_version 11806 (0.0024) +[2024-11-08 00:25:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 48365568. Throughput: 0: 1765.9. Samples: 7087378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:57,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:26:02,269][42004] Updated weights for policy 0, policy_version 11816 (0.0026) +[2024-11-08 00:26:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 48402432. Throughput: 0: 1789.0. Samples: 7093044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:26:02,934][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 00:26:07,783][42004] Updated weights for policy 0, policy_version 11826 (0.0032) +[2024-11-08 00:26:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 48439296. Throughput: 0: 1801.5. Samples: 7104136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:26:07,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:26:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.8, 300 sec: 6970.1). Total num frames: 48455680. Throughput: 0: 1676.9. Samples: 7110102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:26:12,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:26:16,215][42004] Updated weights for policy 0, policy_version 11836 (0.0040) +[2024-11-08 00:26:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 48492544. Throughput: 0: 1662.8. Samples: 7115252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:17,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 00:26:21,882][42004] Updated weights for policy 0, policy_version 11846 (0.0023) +[2024-11-08 00:26:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48525312. Throughput: 0: 1742.4. Samples: 7126180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:22,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:26:27,892][42004] Updated weights for policy 0, policy_version 11856 (0.0031) +[2024-11-08 00:26:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48562176. Throughput: 0: 1715.9. Samples: 7136414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:27,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 00:26:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 48599040. Throughput: 0: 1707.8. Samples: 7141804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:32,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:26:33,239][42004] Updated weights for policy 0, policy_version 11866 (0.0030) +[2024-11-08 00:26:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.7, 300 sec: 6984.0). Total num frames: 48635904. Throughput: 0: 1733.1. Samples: 7153250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:37,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 00:26:38,700][42004] Updated weights for policy 0, policy_version 11876 (0.0027) +[2024-11-08 00:26:44,521][41694] Fps is (10 sec: 6361.9, 60 sec: 6850.1, 300 sec: 6946.6). Total num frames: 48672768. Throughput: 0: 1661.7. Samples: 7164794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:44,523][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:26:45,897][42004] Updated weights for policy 0, policy_version 11886 (0.0028) +[2024-11-08 00:26:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48697344. Throughput: 0: 1644.8. Samples: 7167062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:26:47,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:26:51,198][42004] Updated weights for policy 0, policy_version 11896 (0.0029) +[2024-11-08 00:26:52,931][41694] Fps is (10 sec: 7791.8, 60 sec: 6895.0, 300 sec: 6956.3). Total num frames: 48738304. Throughput: 0: 1656.0. Samples: 7178656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:26:52,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:26:57,084][42004] Updated weights for policy 0, policy_version 11906 (0.0032) +[2024-11-08 00:26:57,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 48771072. Throughput: 0: 1757.2. Samples: 7189176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:57,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 00:27:02,637][42004] Updated weights for policy 0, policy_version 11916 (0.0036) +[2024-11-08 00:27:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6992.4). Total num frames: 48807936. Throughput: 0: 1771.3. Samples: 7194962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:27:02,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:27:07,874][42004] Updated weights for policy 0, policy_version 11926 (0.0031) +[2024-11-08 00:27:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 7011.8). Total num frames: 48848896. Throughput: 0: 1774.0. Samples: 7206010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:07,935][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 00:27:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 48885760. Throughput: 0: 1809.0. Samples: 7217820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:12,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 00:27:13,133][42004] Updated weights for policy 0, policy_version 11936 (0.0029) +[2024-11-08 00:27:18,915][41694] Fps is (10 sec: 5966.9, 60 sec: 6918.1, 300 sec: 6947.0). Total num frames: 48914432. Throughput: 0: 1775.9. Samples: 7223466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:18,917][41694] Avg episode reward: [(0, '4.096')] +[2024-11-08 00:27:20,672][42004] Updated weights for policy 0, policy_version 11946 (0.0031) +[2024-11-08 00:27:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 48947200. Throughput: 0: 1727.7. Samples: 7230994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:22,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:27:25,937][42004] Updated weights for policy 0, policy_version 11956 (0.0019) +[2024-11-08 00:27:27,934][41694] Fps is (10 sec: 7720.5, 60 sec: 7031.2, 300 sec: 6956.2). Total num frames: 48984064. Throughput: 0: 1783.8. Samples: 7242236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:27,937][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:27:31,997][42004] Updated weights for policy 0, policy_version 11966 (0.0028) +[2024-11-08 00:27:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 49016832. Throughput: 0: 1780.4. Samples: 7247180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:27:32,934][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 00:27:37,234][42004] Updated weights for policy 0, policy_version 11976 (0.0029) +[2024-11-08 00:27:37,934][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.2, 300 sec: 6991.4). Total num frames: 49057792. Throughput: 0: 1778.1. Samples: 7258676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:27:37,936][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011977_49057792.pth... +[2024-11-08 00:27:38,082][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011568_47382528.pth +[2024-11-08 00:27:42,442][42004] Updated weights for policy 0, policy_version 11986 (0.0032) +[2024-11-08 00:27:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7222.8, 300 sec: 6997.9). Total num frames: 49094656. Throughput: 0: 1801.9. Samples: 7270260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:42,933][41694] Avg episode reward: [(0, '4.118')] +[2024-11-08 00:27:47,762][42004] Updated weights for policy 0, policy_version 11996 (0.0035) +[2024-11-08 00:27:47,933][41694] Fps is (10 sec: 7783.6, 60 sec: 7304.5, 300 sec: 6997.9). Total num frames: 49135616. Throughput: 0: 1802.8. Samples: 7276090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:47,934][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 00:27:53,286][41694] Fps is (10 sec: 6329.4, 60 sec: 6990.2, 300 sec: 6947.9). Total num frames: 49160192. Throughput: 0: 1805.0. Samples: 7287876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:53,287][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 00:27:55,090][42004] Updated weights for policy 0, policy_version 12006 (0.0028) +[2024-11-08 00:27:57,931][41694] Fps is (10 sec: 6144.7, 60 sec: 7099.7, 300 sec: 6970.2). Total num frames: 49197056. Throughput: 0: 1721.2. Samples: 7295272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:57,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 00:28:00,567][42004] Updated weights for policy 0, policy_version 12016 (0.0047) +[2024-11-08 00:28:02,931][41694] Fps is (10 sec: 7218.9, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 49229824. Throughput: 0: 1762.6. Samples: 7301050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:02,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 00:28:06,790][42004] Updated weights for policy 0, policy_version 12026 (0.0022) +[2024-11-08 00:28:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 49266688. Throughput: 0: 1770.8. Samples: 7310680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:07,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 00:28:12,059][42004] Updated weights for policy 0, policy_version 12036 (0.0024) +[2024-11-08 00:28:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 7004.6). Total num frames: 49303552. Throughput: 0: 1780.4. Samples: 7322348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:28:12,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 00:28:17,136][42004] Updated weights for policy 0, policy_version 12046 (0.0030) +[2024-11-08 00:28:17,933][41694] Fps is (10 sec: 7781.7, 60 sec: 7287.3, 300 sec: 7011.8). Total num frames: 49344512. Throughput: 0: 1803.9. Samples: 7328356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:28:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:28:22,339][42004] Updated weights for policy 0, policy_version 12056 (0.0025) +[2024-11-08 00:28:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7304.6, 300 sec: 7039.6). Total num frames: 49385472. Throughput: 0: 1811.6. Samples: 7340194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:22,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:28:27,932][41694] Fps is (10 sec: 6144.6, 60 sec: 7031.8, 300 sec: 6984.0). Total num frames: 49405952. Throughput: 0: 1750.8. Samples: 7349046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:27,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:28:29,741][42004] Updated weights for policy 0, policy_version 12066 (0.0032) +[2024-11-08 00:28:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 49446912. Throughput: 0: 1723.8. Samples: 7353658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:32,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:28:35,245][42004] Updated weights for policy 0, policy_version 12076 (0.0025) +[2024-11-08 00:28:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.8, 300 sec: 6970.1). Total num frames: 49479680. Throughput: 0: 1723.1. Samples: 7364806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:37,936][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 00:28:41,094][42004] Updated weights for policy 0, policy_version 12086 (0.0035) +[2024-11-08 00:28:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6970.1). Total num frames: 49516544. Throughput: 0: 1781.2. Samples: 7375428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:42,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:28:46,344][42004] Updated weights for policy 0, policy_version 12096 (0.0024) +[2024-11-08 00:28:47,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 49553408. Throughput: 0: 1779.7. Samples: 7381140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:47,935][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 00:28:51,766][42004] Updated weights for policy 0, policy_version 12106 (0.0031) +[2024-11-08 00:28:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7279.2, 300 sec: 7025.7). Total num frames: 49594368. Throughput: 0: 1823.8. Samples: 7392750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:52,933][41694] Avg episode reward: [(0, '4.226')] +[2024-11-08 00:28:56,975][42004] Updated weights for policy 0, policy_version 12116 (0.0029) +[2024-11-08 00:28:57,931][41694] Fps is (10 sec: 7783.2, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 49631232. Throughput: 0: 1824.1. Samples: 7404432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:57,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:29:02,937][41694] Fps is (10 sec: 5731.1, 60 sec: 7030.8, 300 sec: 6983.9). Total num frames: 49651712. Throughput: 0: 1811.5. Samples: 7409880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:02,940][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:29:04,700][42004] Updated weights for policy 0, policy_version 12126 (0.0026) +[2024-11-08 00:29:07,934][41694] Fps is (10 sec: 5733.1, 60 sec: 7031.2, 300 sec: 6984.0). Total num frames: 49688576. Throughput: 0: 1705.9. Samples: 7416964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:07,938][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:29:10,555][42004] Updated weights for policy 0, policy_version 12136 (0.0023) +[2024-11-08 00:29:12,932][41694] Fps is (10 sec: 6967.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 49721344. Throughput: 0: 1737.6. Samples: 7427240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:12,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 00:29:16,046][42004] Updated weights for policy 0, policy_version 12146 (0.0026) +[2024-11-08 00:29:17,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6963.3, 300 sec: 6970.1). Total num frames: 49762304. Throughput: 0: 1762.7. Samples: 7432978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:17,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:29:21,252][42004] Updated weights for policy 0, policy_version 12156 (0.0023) +[2024-11-08 00:29:22,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 7039.6). Total num frames: 49803264. Throughput: 0: 1776.1. Samples: 7444730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:29:26,574][42004] Updated weights for policy 0, policy_version 12166 (0.0031) +[2024-11-08 00:29:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7236.2, 300 sec: 7025.7). Total num frames: 49840128. Throughput: 0: 1798.7. Samples: 7456368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:27,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:29:31,965][42004] Updated weights for policy 0, policy_version 12176 (0.0026) +[2024-11-08 00:29:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 49876992. Throughput: 0: 1796.8. Samples: 7461996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:32,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:29:37,932][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6997.9). Total num frames: 49901568. Throughput: 0: 1726.3. Samples: 7470434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:29:37,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 00:29:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012183_49901568.pth... +[2024-11-08 00:29:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011771_48214016.pth +[2024-11-08 00:29:39,412][42004] Updated weights for policy 0, policy_version 12186 (0.0044) +[2024-11-08 00:29:42,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6963.0, 300 sec: 6984.0). Total num frames: 49934336. Throughput: 0: 1686.9. Samples: 7480344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:29:42,935][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:29:45,832][42004] Updated weights for policy 0, policy_version 12196 (0.0036) +[2024-11-08 00:29:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6895.0, 300 sec: 6970.1). Total num frames: 49967104. Throughput: 0: 1669.1. Samples: 7484982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:47,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:29:51,131][42004] Updated weights for policy 0, policy_version 12206 (0.0028) +[2024-11-08 00:29:52,932][41694] Fps is (10 sec: 6964.6, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 50003968. Throughput: 0: 1765.4. Samples: 7496402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:52,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 00:29:56,860][42004] Updated weights for policy 0, policy_version 12216 (0.0041) +[2024-11-08 00:29:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 50044928. Throughput: 0: 1773.4. Samples: 7507042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:57,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:30:02,446][42004] Updated weights for policy 0, policy_version 12226 (0.0028) +[2024-11-08 00:30:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7100.4, 300 sec: 7011.8). Total num frames: 50077696. Throughput: 0: 1769.0. Samples: 7512582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:02,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:30:07,925][42004] Updated weights for policy 0, policy_version 12236 (0.0030) +[2024-11-08 00:30:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.3, 300 sec: 7025.7). Total num frames: 50118656. Throughput: 0: 1753.6. Samples: 7523644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:07,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 00:30:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50139136. Throughput: 0: 1656.1. Samples: 7530890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:12,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:30:16,123][42004] Updated weights for policy 0, policy_version 12246 (0.0025) +[2024-11-08 00:30:17,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 50167808. Throughput: 0: 1642.7. Samples: 7535918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:17,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 00:30:22,631][42004] Updated weights for policy 0, policy_version 12256 (0.0032) +[2024-11-08 00:30:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6621.8, 300 sec: 6942.4). Total num frames: 50200576. Throughput: 0: 1647.4. Samples: 7544566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:22,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 00:30:27,668][42004] Updated weights for policy 0, policy_version 12266 (0.0033) +[2024-11-08 00:30:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6942.4). Total num frames: 50241536. Throughput: 0: 1695.5. Samples: 7556636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:27,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:30:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6998.0). Total num frames: 50278400. Throughput: 0: 1714.9. Samples: 7562152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:32,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:30:33,089][42004] Updated weights for policy 0, policy_version 12276 (0.0022) +[2024-11-08 00:30:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 50319360. Throughput: 0: 1725.4. Samples: 7574046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:37,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:30:38,195][42004] Updated weights for policy 0, policy_version 12286 (0.0026) +[2024-11-08 00:30:42,937][41694] Fps is (10 sec: 7778.4, 60 sec: 7031.1, 300 sec: 7011.7). Total num frames: 50356224. Throughput: 0: 1751.0. Samples: 7585848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:30:42,938][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 00:30:45,668][42004] Updated weights for policy 0, policy_version 12296 (0.0030) +[2024-11-08 00:30:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 50380800. Throughput: 0: 1669.9. Samples: 7587726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:30:47,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 00:30:51,153][42004] Updated weights for policy 0, policy_version 12306 (0.0030) +[2024-11-08 00:30:52,932][41694] Fps is (10 sec: 5737.4, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 50413568. Throughput: 0: 1667.2. Samples: 7598670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:30:52,935][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 00:30:56,702][42004] Updated weights for policy 0, policy_version 12316 (0.0023) +[2024-11-08 00:30:57,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6826.6, 300 sec: 6956.2). Total num frames: 50454528. Throughput: 0: 1753.6. Samples: 7609802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:30:57,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:31:02,133][42004] Updated weights for policy 0, policy_version 12326 (0.0022) +[2024-11-08 00:31:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 50491392. Throughput: 0: 1773.7. Samples: 7615734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:02,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 00:31:07,339][42004] Updated weights for policy 0, policy_version 12336 (0.0023) +[2024-11-08 00:31:07,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6894.9, 300 sec: 7039.6). Total num frames: 50532352. Throughput: 0: 1833.4. Samples: 7627068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:07,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 00:31:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 50565120. Throughput: 0: 1800.3. Samples: 7637648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:12,935][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:31:13,326][42004] Updated weights for policy 0, policy_version 12346 (0.0035) +[2024-11-08 00:31:19,571][41694] Fps is (10 sec: 5630.5, 60 sec: 6977.4, 300 sec: 6986.9). Total num frames: 50597888. Throughput: 0: 1733.7. Samples: 7643012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:19,576][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 00:31:20,796][42004] Updated weights for policy 0, policy_version 12356 (0.0032) +[2024-11-08 00:31:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 50622464. Throughput: 0: 1701.6. Samples: 7650618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:22,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 00:31:26,648][42004] Updated weights for policy 0, policy_version 12366 (0.0027) +[2024-11-08 00:31:27,931][41694] Fps is (10 sec: 7348.8, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50659328. Throughput: 0: 1668.0. Samples: 7660900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:27,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 00:31:32,031][42004] Updated weights for policy 0, policy_version 12376 (0.0030) +[2024-11-08 00:31:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50696192. Throughput: 0: 1751.7. Samples: 7666552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:32,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 00:31:37,096][42004] Updated weights for policy 0, policy_version 12386 (0.0024) +[2024-11-08 00:31:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.2, 300 sec: 7035.8). Total num frames: 50737152. Throughput: 0: 1774.8. Samples: 7678536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:37,933][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 00:31:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012387_50737152.pth... +[2024-11-08 00:31:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011977_49057792.pth +[2024-11-08 00:31:42,341][42004] Updated weights for policy 0, policy_version 12396 (0.0027) +[2024-11-08 00:31:42,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7032.1, 300 sec: 7053.5). Total num frames: 50778112. Throughput: 0: 1791.7. Samples: 7690428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:42,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 00:31:47,458][42004] Updated weights for policy 0, policy_version 12406 (0.0031) +[2024-11-08 00:31:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7236.3, 300 sec: 7039.6). Total num frames: 50814976. Throughput: 0: 1792.0. Samples: 7696374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:47,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:31:53,942][41694] Fps is (10 sec: 5951.9, 60 sec: 7049.2, 300 sec: 7001.7). Total num frames: 50843648. Throughput: 0: 1756.5. Samples: 7707886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:53,944][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:31:55,064][42004] Updated weights for policy 0, policy_version 12416 (0.0029) +[2024-11-08 00:31:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.3, 300 sec: 6997.9). Total num frames: 50872320. Throughput: 0: 1714.3. Samples: 7714792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:57,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:32:01,009][42004] Updated weights for policy 0, policy_version 12426 (0.0030) +[2024-11-08 00:32:02,932][41694] Fps is (10 sec: 6834.9, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 50905088. Throughput: 0: 1776.6. Samples: 7720046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:02,934][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 00:32:06,647][42004] Updated weights for policy 0, policy_version 12436 (0.0029) +[2024-11-08 00:32:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6984.0). Total num frames: 50946048. Throughput: 0: 1780.6. Samples: 7730744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 00:32:11,857][42004] Updated weights for policy 0, policy_version 12446 (0.0024) +[2024-11-08 00:32:12,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.5, 300 sec: 7049.2). Total num frames: 50987008. Throughput: 0: 1813.3. Samples: 7742500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:12,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:32:17,025][42004] Updated weights for policy 0, policy_version 12456 (0.0024) +[2024-11-08 00:32:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7299.2, 300 sec: 7039.6). Total num frames: 51023872. Throughput: 0: 1819.5. Samples: 7748430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:17,934][41694] Avg episode reward: [(0, '4.250')] +[2024-11-08 00:32:22,366][42004] Updated weights for policy 0, policy_version 12466 (0.0022) +[2024-11-08 00:32:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 7053.5). Total num frames: 51064832. Throughput: 0: 1813.5. Samples: 7760142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:22,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 00:32:28,355][41694] Fps is (10 sec: 5894.4, 60 sec: 7050.0, 300 sec: 7001.7). Total num frames: 51085312. Throughput: 0: 1665.2. Samples: 7766066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:28,356][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:32:30,000][42004] Updated weights for policy 0, policy_version 12476 (0.0029) +[2024-11-08 00:32:32,931][41694] Fps is (10 sec: 5324.8, 60 sec: 7031.5, 300 sec: 6984.1). Total num frames: 51118080. Throughput: 0: 1699.5. Samples: 7772852. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:32,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 00:32:36,109][42004] Updated weights for policy 0, policy_version 12486 (0.0038) +[2024-11-08 00:32:37,932][41694] Fps is (10 sec: 7271.1, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 51154944. Throughput: 0: 1709.3. Samples: 7783078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:37,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:32:41,571][42004] Updated weights for policy 0, policy_version 12496 (0.0036) +[2024-11-08 00:32:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6970.2). Total num frames: 51191808. Throughput: 0: 1770.8. Samples: 7794480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:32:42,935][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 00:32:46,732][42004] Updated weights for policy 0, policy_version 12506 (0.0027) +[2024-11-08 00:32:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 7034.1). Total num frames: 51232768. Throughput: 0: 1783.9. Samples: 7800322. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:32:47,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 00:32:51,957][42004] Updated weights for policy 0, policy_version 12516 (0.0027) +[2024-11-08 00:32:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7221.4, 300 sec: 7025.7). Total num frames: 51269632. Throughput: 0: 1807.4. Samples: 7812076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:52,934][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 00:32:57,295][42004] Updated weights for policy 0, policy_version 12526 (0.0023) +[2024-11-08 00:32:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 7053.5). Total num frames: 51310592. Throughput: 0: 1806.3. Samples: 7823782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:57,934][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:33:02,957][41694] Fps is (10 sec: 6128.5, 60 sec: 7096.7, 300 sec: 6997.3). Total num frames: 51331072. Throughput: 0: 1796.8. Samples: 7829332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:33:02,960][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:33:05,521][42004] Updated weights for policy 0, policy_version 12536 (0.0044) +[2024-11-08 00:33:07,939][41694] Fps is (10 sec: 5324.4, 60 sec: 6963.1, 300 sec: 6984.0). Total num frames: 51363840. Throughput: 0: 1667.7. Samples: 7835188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:07,942][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:33:11,305][42004] Updated weights for policy 0, policy_version 12546 (0.0025) +[2024-11-08 00:33:12,931][41694] Fps is (10 sec: 6980.9, 60 sec: 6894.9, 300 sec: 6970.2). Total num frames: 51400704. Throughput: 0: 1795.5. Samples: 7846102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:12,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:33:16,274][42004] Updated weights for policy 0, policy_version 12556 (0.0031) +[2024-11-08 00:33:17,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 51441664. Throughput: 0: 1765.8. Samples: 7852314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:17,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 00:33:21,462][42004] Updated weights for policy 0, policy_version 12566 (0.0031) +[2024-11-08 00:33:22,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 51478528. Throughput: 0: 1802.9. Samples: 7864208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:22,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:33:26,634][42004] Updated weights for policy 0, policy_version 12576 (0.0030) +[2024-11-08 00:33:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7287.7, 300 sec: 7025.7). Total num frames: 51519488. Throughput: 0: 1814.4. Samples: 7876126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:27,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 00:33:31,789][42004] Updated weights for policy 0, policy_version 12586 (0.0027) +[2024-11-08 00:33:32,932][41694] Fps is (10 sec: 8192.4, 60 sec: 7372.8, 300 sec: 7053.4). Total num frames: 51560448. Throughput: 0: 1814.8. Samples: 7881990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:32,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:33:37,932][41694] Fps is (10 sec: 6143.5, 60 sec: 7099.6, 300 sec: 6997.9). Total num frames: 51580928. Throughput: 0: 1774.7. Samples: 7891940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:37,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012593_51580928.pth... +[2024-11-08 00:33:38,122][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012183_49901568.pth +[2024-11-08 00:33:39,747][42004] Updated weights for policy 0, policy_version 12596 (0.0025) +[2024-11-08 00:33:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 7031.5, 300 sec: 6984.1). Total num frames: 51613696. Throughput: 0: 1688.5. Samples: 7899766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:42,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 00:33:45,236][42004] Updated weights for policy 0, policy_version 12606 (0.0032) +[2024-11-08 00:33:47,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51654656. Throughput: 0: 1699.1. Samples: 7905748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:47,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:33:50,296][42004] Updated weights for policy 0, policy_version 12616 (0.0032) +[2024-11-08 00:33:52,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 51695616. Throughput: 0: 1838.3. Samples: 7917910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:52,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:33:55,450][42004] Updated weights for policy 0, policy_version 12626 (0.0024) +[2024-11-08 00:33:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 7053.6). Total num frames: 51732480. Throughput: 0: 1855.5. Samples: 7929600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:33:57,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 00:34:01,008][42004] Updated weights for policy 0, policy_version 12636 (0.0031) +[2024-11-08 00:34:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7307.6, 300 sec: 7053.5). Total num frames: 51769344. Throughput: 0: 1843.9. Samples: 7935288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:34:02,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 00:34:06,660][42004] Updated weights for policy 0, policy_version 12646 (0.0034) +[2024-11-08 00:34:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7372.8, 300 sec: 7067.3). Total num frames: 51806208. Throughput: 0: 1816.1. Samples: 7945932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:07,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:34:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51822592. Throughput: 0: 1699.8. Samples: 7952616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:12,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 00:34:14,774][42004] Updated weights for policy 0, policy_version 12656 (0.0034) +[2024-11-08 00:34:17,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6963.0, 300 sec: 6970.1). Total num frames: 51859456. Throughput: 0: 1681.0. Samples: 7957636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:17,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:34:20,235][42004] Updated weights for policy 0, policy_version 12666 (0.0026) +[2024-11-08 00:34:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51900416. Throughput: 0: 1717.3. Samples: 7969216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:22,933][41694] Avg episode reward: [(0, '4.206')] +[2024-11-08 00:34:25,090][42004] Updated weights for policy 0, policy_version 12676 (0.0032) +[2024-11-08 00:34:27,932][41694] Fps is (10 sec: 8192.8, 60 sec: 7031.4, 300 sec: 6997.9). Total num frames: 51941376. Throughput: 0: 1812.6. Samples: 7981334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:34:27,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:34:30,448][42004] Updated weights for policy 0, policy_version 12686 (0.0024) +[2024-11-08 00:34:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 7039.6). Total num frames: 51978240. Throughput: 0: 1810.4. Samples: 7987216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:34:32,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:34:35,777][42004] Updated weights for policy 0, policy_version 12696 (0.0030) +[2024-11-08 00:34:37,931][41694] Fps is (10 sec: 7782.9, 60 sec: 7304.6, 300 sec: 7067.4). Total num frames: 52019200. Throughput: 0: 1797.7. Samples: 7998804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:37,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 00:34:41,119][42004] Updated weights for policy 0, policy_version 12706 (0.0020) +[2024-11-08 00:34:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.5, 300 sec: 7067.3). Total num frames: 52051968. Throughput: 0: 1783.0. Samples: 8009836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:42,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 00:34:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 52072448. Throughput: 0: 1721.3. Samples: 8012748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:47,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:34:49,547][42004] Updated weights for policy 0, policy_version 12716 (0.0043) +[2024-11-08 00:34:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 52109312. Throughput: 0: 1677.5. Samples: 8021420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:52,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:34:54,980][42004] Updated weights for policy 0, policy_version 12726 (0.0024) +[2024-11-08 00:34:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 52146176. Throughput: 0: 1786.8. Samples: 8033020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:34:57,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:35:00,152][42004] Updated weights for policy 0, policy_version 12736 (0.0032) +[2024-11-08 00:35:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6997.9). Total num frames: 52183040. Throughput: 0: 1808.1. Samples: 8038998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:02,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:35:05,799][42004] Updated weights for policy 0, policy_version 12746 (0.0024) +[2024-11-08 00:35:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.3, 300 sec: 7067.3). Total num frames: 52224000. Throughput: 0: 1794.4. Samples: 8049964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:07,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:35:11,288][42004] Updated weights for policy 0, policy_version 12756 (0.0024) +[2024-11-08 00:35:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7236.3, 300 sec: 7081.2). Total num frames: 52256768. Throughput: 0: 1767.6. Samples: 8060874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:12,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 00:35:17,850][42004] Updated weights for policy 0, policy_version 12766 (0.0036) +[2024-11-08 00:35:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 7168.2, 300 sec: 7081.2). Total num frames: 52289536. Throughput: 0: 1739.6. Samples: 8065500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:17,935][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:35:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 52305920. Throughput: 0: 1609.6. Samples: 8071236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:22,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 00:35:26,006][42004] Updated weights for policy 0, policy_version 12776 (0.0034) +[2024-11-08 00:35:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6997.9). Total num frames: 52342784. Throughput: 0: 1602.0. Samples: 8081926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:27,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:35:31,569][42004] Updated weights for policy 0, policy_version 12786 (0.0027) +[2024-11-08 00:35:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6984.0). Total num frames: 52379648. Throughput: 0: 1661.7. Samples: 8087524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:32,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 00:35:36,616][42004] Updated weights for policy 0, policy_version 12796 (0.0032) +[2024-11-08 00:35:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6998.0). Total num frames: 52420608. Throughput: 0: 1734.0. Samples: 8099450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:37,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:35:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012798_52420608.pth... +[2024-11-08 00:35:38,104][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012387_50737152.pth +[2024-11-08 00:35:41,859][42004] Updated weights for policy 0, policy_version 12806 (0.0032) +[2024-11-08 00:35:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 7039.6). Total num frames: 52457472. Throughput: 0: 1736.3. Samples: 8111152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:42,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 00:35:47,124][42004] Updated weights for policy 0, policy_version 12816 (0.0026) +[2024-11-08 00:35:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 7067.3). Total num frames: 52498432. Throughput: 0: 1733.1. Samples: 8116986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:47,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 00:35:55,130][41694] Fps is (10 sec: 6043.8, 60 sec: 6782.9, 300 sec: 6987.5). Total num frames: 52531200. Throughput: 0: 1644.2. Samples: 8127566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:55,131][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 00:35:55,468][42004] Updated weights for policy 0, policy_version 12826 (0.0033) +[2024-11-08 00:35:57,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 52551680. Throughput: 0: 1619.9. Samples: 8133770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:35:57,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 00:36:01,305][42004] Updated weights for policy 0, policy_version 12836 (0.0038) +[2024-11-08 00:36:02,932][41694] Fps is (10 sec: 6825.3, 60 sec: 6690.1, 300 sec: 6956.3). Total num frames: 52584448. Throughput: 0: 1641.0. Samples: 8139346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:02,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 00:36:07,003][42004] Updated weights for policy 0, policy_version 12846 (0.0027) +[2024-11-08 00:36:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6970.1). Total num frames: 52621312. Throughput: 0: 1749.7. Samples: 8149972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:07,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 00:36:12,586][42004] Updated weights for policy 0, policy_version 12856 (0.0032) +[2024-11-08 00:36:12,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6690.0, 300 sec: 7023.0). Total num frames: 52658176. Throughput: 0: 1753.9. Samples: 8160854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:12,937][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:36:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 7025.7). Total num frames: 52695040. Throughput: 0: 1754.1. Samples: 8166458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:17,934][41694] Avg episode reward: [(0, '4.781')] +[2024-11-08 00:36:18,010][42004] Updated weights for policy 0, policy_version 12866 (0.0031) +[2024-11-08 00:36:22,932][41694] Fps is (10 sec: 7373.6, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 52731904. Throughput: 0: 1744.6. Samples: 8177956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:22,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:36:23,535][42004] Updated weights for policy 0, policy_version 12876 (0.0026) +[2024-11-08 00:36:29,714][41694] Fps is (10 sec: 5909.7, 60 sec: 6828.6, 300 sec: 6969.7). Total num frames: 52764672. Throughput: 0: 1650.6. Samples: 8188370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:36:29,716][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:36:31,837][42004] Updated weights for policy 0, policy_version 12886 (0.0029) +[2024-11-08 00:36:32,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52785152. Throughput: 0: 1611.6. Samples: 8189506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:36:32,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 00:36:37,401][42004] Updated weights for policy 0, policy_version 12896 (0.0026) +[2024-11-08 00:36:37,932][41694] Fps is (10 sec: 7476.8, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52826112. Throughput: 0: 1699.9. Samples: 8200322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:37,933][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:36:42,631][42004] Updated weights for policy 0, policy_version 12906 (0.0028) +[2024-11-08 00:36:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52862976. Throughput: 0: 1742.4. Samples: 8212178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:42,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 00:36:47,717][42004] Updated weights for policy 0, policy_version 12916 (0.0022) +[2024-11-08 00:36:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.4, 300 sec: 7008.0). Total num frames: 52903936. Throughput: 0: 1747.9. Samples: 8218002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:47,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:36:52,713][42004] Updated weights for policy 0, policy_version 12926 (0.0026) +[2024-11-08 00:36:52,933][41694] Fps is (10 sec: 8191.0, 60 sec: 7157.1, 300 sec: 7025.6). Total num frames: 52944896. Throughput: 0: 1783.1. Samples: 8230212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:52,935][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 00:36:57,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 52981760. Throughput: 0: 1794.2. Samples: 8241590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:57,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:36:58,519][42004] Updated weights for policy 0, policy_version 12936 (0.0042) +[2024-11-08 00:37:04,045][41694] Fps is (10 sec: 5529.0, 60 sec: 6903.4, 300 sec: 6957.8). Total num frames: 53006336. Throughput: 0: 1737.6. Samples: 8246584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:04,047][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:37:06,848][42004] Updated weights for policy 0, policy_version 12946 (0.0046) +[2024-11-08 00:37:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 53035008. Throughput: 0: 1659.9. Samples: 8252650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:07,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 00:37:12,191][42004] Updated weights for policy 0, policy_version 12956 (0.0029) +[2024-11-08 00:37:12,931][41694] Fps is (10 sec: 7375.0, 60 sec: 6895.1, 300 sec: 6942.4). Total num frames: 53071872. Throughput: 0: 1754.3. Samples: 8264184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:12,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 00:37:17,117][42004] Updated weights for policy 0, policy_version 12966 (0.0027) +[2024-11-08 00:37:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 53112832. Throughput: 0: 1796.8. Samples: 8270362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:37:17,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 00:37:22,133][42004] Updated weights for policy 0, policy_version 12976 (0.0020) +[2024-11-08 00:37:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.5, 300 sec: 7021.9). Total num frames: 53153792. Throughput: 0: 1827.9. Samples: 8282578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:37:22,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 00:37:27,267][42004] Updated weights for policy 0, policy_version 12986 (0.0024) +[2024-11-08 00:37:27,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7387.5, 300 sec: 7039.6). Total num frames: 53194752. Throughput: 0: 1835.5. Samples: 8294774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 00:37:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 53227520. Throughput: 0: 1825.4. Samples: 8300142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:32,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:37:33,240][42004] Updated weights for policy 0, policy_version 12996 (0.0032) +[2024-11-08 00:37:38,608][41694] Fps is (10 sec: 5371.0, 60 sec: 7020.6, 300 sec: 6968.0). Total num frames: 53252096. Throughput: 0: 1757.7. Samples: 8310494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:38,611][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 00:37:38,650][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013001_53252096.pth... +[2024-11-08 00:37:38,780][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012593_51580928.pth +[2024-11-08 00:37:41,311][42004] Updated weights for policy 0, policy_version 13006 (0.0028) +[2024-11-08 00:37:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 53280768. Throughput: 0: 1673.7. Samples: 8316906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:42,933][41694] Avg episode reward: [(0, '4.697')] +[2024-11-08 00:37:46,833][42004] Updated weights for policy 0, policy_version 13016 (0.0019) +[2024-11-08 00:37:47,931][41694] Fps is (10 sec: 7468.6, 60 sec: 6963.3, 300 sec: 6956.3). Total num frames: 53321728. Throughput: 0: 1727.9. Samples: 8322414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:47,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 00:37:51,823][42004] Updated weights for policy 0, policy_version 13026 (0.0028) +[2024-11-08 00:37:52,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.4, 300 sec: 6956.3). Total num frames: 53362688. Throughput: 0: 1821.5. Samples: 8334616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:52,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:37:56,786][42004] Updated weights for policy 0, policy_version 13036 (0.0031) +[2024-11-08 00:37:57,932][41694] Fps is (10 sec: 8191.5, 60 sec: 7031.4, 300 sec: 7026.3). Total num frames: 53403648. Throughput: 0: 1842.1. Samples: 8347080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:57,937][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:38:02,080][42004] Updated weights for policy 0, policy_version 13046 (0.0025) +[2024-11-08 00:38:02,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7373.1, 300 sec: 7039.6). Total num frames: 53440512. Throughput: 0: 1837.0. Samples: 8353028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:02,935][41694] Avg episode reward: [(0, '4.752')] +[2024-11-08 00:38:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 7304.5, 300 sec: 7025.7). Total num frames: 53473280. Throughput: 0: 1793.6. Samples: 8363292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:07,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:38:08,085][42004] Updated weights for policy 0, policy_version 13056 (0.0033) +[2024-11-08 00:38:13,280][41694] Fps is (10 sec: 5541.4, 60 sec: 7058.7, 300 sec: 6961.9). Total num frames: 53497856. Throughput: 0: 1622.9. Samples: 8368368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:13,281][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:38:15,926][42004] Updated weights for policy 0, policy_version 13066 (0.0032) +[2024-11-08 00:38:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53530624. Throughput: 0: 1673.3. Samples: 8375442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:17,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:38:21,186][42004] Updated weights for policy 0, policy_version 13076 (0.0025) +[2024-11-08 00:38:22,932][41694] Fps is (10 sec: 7638.8, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53571584. Throughput: 0: 1727.3. Samples: 8387052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:22,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 00:38:26,339][42004] Updated weights for policy 0, policy_version 13086 (0.0023) +[2024-11-08 00:38:27,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53612544. Throughput: 0: 1826.3. Samples: 8399090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:38:27,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:38:31,247][42004] Updated weights for policy 0, policy_version 13096 (0.0029) +[2024-11-08 00:38:32,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 53653504. Throughput: 0: 1841.3. Samples: 8405274. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:38:32,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 00:38:36,370][42004] Updated weights for policy 0, policy_version 13106 (0.0020) +[2024-11-08 00:38:37,934][41694] Fps is (10 sec: 7780.5, 60 sec: 7387.6, 300 sec: 7039.5). Total num frames: 53690368. Throughput: 0: 1841.5. Samples: 8417488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:37,938][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:38:42,271][42004] Updated weights for policy 0, policy_version 13116 (0.0031) +[2024-11-08 00:38:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7440.9, 300 sec: 7025.7). Total num frames: 53727232. Throughput: 0: 1794.3. Samples: 8427824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:42,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 00:38:47,932][41694] Fps is (10 sec: 5735.7, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 53747712. Throughput: 0: 1790.3. Samples: 8433592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:47,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:38:49,902][42004] Updated weights for policy 0, policy_version 13126 (0.0030) +[2024-11-08 00:38:52,931][41694] Fps is (10 sec: 5735.1, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 53784576. Throughput: 0: 1718.7. Samples: 8440632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:52,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 00:38:55,114][42004] Updated weights for policy 0, policy_version 13136 (0.0024) +[2024-11-08 00:38:57,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 53825536. Throughput: 0: 1884.6. Samples: 8452520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:57,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:39:00,221][42004] Updated weights for policy 0, policy_version 13146 (0.0032) +[2024-11-08 00:39:02,933][41694] Fps is (10 sec: 7781.3, 60 sec: 7031.3, 300 sec: 6970.1). Total num frames: 53862400. Throughput: 0: 1849.6. Samples: 8458674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:02,936][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:39:05,818][42004] Updated weights for policy 0, policy_version 13156 (0.0036) +[2024-11-08 00:39:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 53899264. Throughput: 0: 1836.4. Samples: 8469688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:07,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:39:11,242][42004] Updated weights for policy 0, policy_version 13166 (0.0020) +[2024-11-08 00:39:12,933][41694] Fps is (10 sec: 7373.0, 60 sec: 7347.1, 300 sec: 7039.6). Total num frames: 53936128. Throughput: 0: 1815.7. Samples: 8480798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:12,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:39:16,902][42004] Updated weights for policy 0, policy_version 13176 (0.0027) +[2024-11-08 00:39:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 53972992. Throughput: 0: 1793.3. Samples: 8485972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:17,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:39:22,932][41694] Fps is (10 sec: 5734.9, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 53993472. Throughput: 0: 1738.6. Samples: 8495720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:22,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 00:39:24,829][42004] Updated weights for policy 0, policy_version 13186 (0.0031) +[2024-11-08 00:39:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 54030336. Throughput: 0: 1701.4. Samples: 8504386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:27,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 00:39:30,048][42004] Updated weights for policy 0, policy_version 13196 (0.0030) +[2024-11-08 00:39:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 54071296. Throughput: 0: 1701.7. Samples: 8510166. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:32,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 00:39:35,243][42004] Updated weights for policy 0, policy_version 13206 (0.0031) +[2024-11-08 00:39:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.8, 300 sec: 6984.0). Total num frames: 54112256. Throughput: 0: 1810.7. Samples: 8522112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:37,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:39:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013211_54112256.pth... +[2024-11-08 00:39:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012798_52420608.pth +[2024-11-08 00:39:40,435][42004] Updated weights for policy 0, policy_version 13216 (0.0022) +[2024-11-08 00:39:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.6, 300 sec: 7039.6). Total num frames: 54149120. Throughput: 0: 1801.7. Samples: 8533598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:42,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 00:39:46,163][42004] Updated weights for policy 0, policy_version 13226 (0.0031) +[2024-11-08 00:39:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 54181888. Throughput: 0: 1786.7. Samples: 8539074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:47,936][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 00:39:51,840][42004] Updated weights for policy 0, policy_version 13236 (0.0031) +[2024-11-08 00:39:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 7039.6). Total num frames: 54222848. Throughput: 0: 1776.3. Samples: 8549622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:52,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:39:57,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 54243328. Throughput: 0: 1689.8. Samples: 8556836. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:39:57,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:39:59,445][42004] Updated weights for policy 0, policy_version 13246 (0.0030) +[2024-11-08 00:40:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.4, 300 sec: 6970.1). Total num frames: 54280192. Throughput: 0: 1703.0. Samples: 8562606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:40:02,933][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 00:40:05,101][42004] Updated weights for policy 0, policy_version 13256 (0.0028) +[2024-11-08 00:40:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 54317056. Throughput: 0: 1727.0. Samples: 8573434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:07,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 00:40:10,382][42004] Updated weights for policy 0, policy_version 13266 (0.0024) +[2024-11-08 00:40:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.3, 300 sec: 6997.9). Total num frames: 54353920. Throughput: 0: 1790.8. Samples: 8584972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:12,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 00:40:16,226][42004] Updated weights for policy 0, policy_version 13276 (0.0031) +[2024-11-08 00:40:17,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6894.8, 300 sec: 7053.4). Total num frames: 54386688. Throughput: 0: 1772.2. Samples: 8589918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:17,935][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 00:40:22,417][42004] Updated weights for policy 0, policy_version 13286 (0.0032) +[2024-11-08 00:40:22,932][41694] Fps is (10 sec: 6553.2, 60 sec: 7099.7, 300 sec: 7039.5). Total num frames: 54419456. Throughput: 0: 1731.3. Samples: 8600022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:22,936][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 00:40:27,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 54456320. Throughput: 0: 1717.4. Samples: 8610882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:27,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:40:27,946][42004] Updated weights for policy 0, policy_version 13296 (0.0024) +[2024-11-08 00:40:32,932][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 54480896. Throughput: 0: 1683.9. Samples: 8614848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:32,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 00:40:35,471][42004] Updated weights for policy 0, policy_version 13306 (0.0028) +[2024-11-08 00:40:37,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 54517760. Throughput: 0: 1650.0. Samples: 8623872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:40:37,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:40:40,807][42004] Updated weights for policy 0, policy_version 13316 (0.0026) +[2024-11-08 00:40:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6984.0). Total num frames: 54558720. Throughput: 0: 1750.9. Samples: 8635626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:40:42,935][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:40:46,034][42004] Updated weights for policy 0, policy_version 13326 (0.0023) +[2024-11-08 00:40:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 7050.5). Total num frames: 54595584. Throughput: 0: 1747.8. Samples: 8641258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:40:51,283][42004] Updated weights for policy 0, policy_version 13336 (0.0032) +[2024-11-08 00:40:52,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 7053.5). Total num frames: 54632448. Throughput: 0: 1771.3. Samples: 8653144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:52,933][41694] Avg episode reward: [(0, '4.161')] +[2024-11-08 00:40:57,210][42004] Updated weights for policy 0, policy_version 13346 (0.0030) +[2024-11-08 00:40:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 7067.3). Total num frames: 54669312. Throughput: 0: 1746.0. Samples: 8663542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:57,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 00:41:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 54702080. Throughput: 0: 1755.7. Samples: 8668922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:02,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 00:41:02,977][42004] Updated weights for policy 0, policy_version 13356 (0.0025) +[2024-11-08 00:41:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 54722560. Throughput: 0: 1686.6. Samples: 8675920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:07,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 00:41:10,902][42004] Updated weights for policy 0, policy_version 13366 (0.0037) +[2024-11-08 00:41:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 54759424. Throughput: 0: 1671.5. Samples: 8686100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:12,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:41:16,608][42004] Updated weights for policy 0, policy_version 13376 (0.0031) +[2024-11-08 00:41:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.8, 300 sec: 6997.9). Total num frames: 54796288. Throughput: 0: 1701.8. Samples: 8691430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:17,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:41:21,884][42004] Updated weights for policy 0, policy_version 13386 (0.0032) +[2024-11-08 00:41:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6895.0, 300 sec: 7054.4). Total num frames: 54833152. Throughput: 0: 1763.8. Samples: 8703244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:22,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 00:41:27,845][42004] Updated weights for policy 0, policy_version 13396 (0.0034) +[2024-11-08 00:41:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 7067.3). Total num frames: 54870016. Throughput: 0: 1734.2. Samples: 8713666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:27,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:41:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 54902784. Throughput: 0: 1715.9. Samples: 8718472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:32,933][41694] Avg episode reward: [(0, '4.144')] +[2024-11-08 00:41:34,044][42004] Updated weights for policy 0, policy_version 13406 (0.0035) +[2024-11-08 00:41:39,989][41694] Fps is (10 sec: 5775.0, 60 sec: 6798.3, 300 sec: 6990.8). Total num frames: 54939648. Throughput: 0: 1606.5. Samples: 8728740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:39,991][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 00:41:40,005][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013413_54939648.pth... +[2024-11-08 00:41:40,142][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013001_53252096.pth +[2024-11-08 00:41:41,792][42004] Updated weights for policy 0, policy_version 13416 (0.0036) +[2024-11-08 00:41:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.2, 300 sec: 6970.2). Total num frames: 54960128. Throughput: 0: 1609.7. Samples: 8735980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:42,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:41:47,144][42004] Updated weights for policy 0, policy_version 13426 (0.0023) +[2024-11-08 00:41:47,932][41694] Fps is (10 sec: 7219.7, 60 sec: 6690.1, 300 sec: 6956.3). Total num frames: 54996992. Throughput: 0: 1607.3. Samples: 8741250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:47,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 00:41:52,456][42004] Updated weights for policy 0, policy_version 13436 (0.0022) +[2024-11-08 00:41:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6970.2). Total num frames: 55037952. Throughput: 0: 1716.6. Samples: 8753168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:52,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:41:57,449][42004] Updated weights for policy 0, policy_version 13446 (0.0030) +[2024-11-08 00:41:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 7038.4). Total num frames: 55074816. Throughput: 0: 1762.2. Samples: 8765400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:57,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:42:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 7025.7). Total num frames: 55107584. Throughput: 0: 1757.6. Samples: 8770520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:02,940][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:42:03,714][42004] Updated weights for policy 0, policy_version 13456 (0.0029) +[2024-11-08 00:42:07,931][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 55144448. Throughput: 0: 1718.5. Samples: 8780578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:07,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:42:09,114][42004] Updated weights for policy 0, policy_version 13466 (0.0029) +[2024-11-08 00:42:14,389][41694] Fps is (10 sec: 6077.5, 60 sec: 6798.1, 300 sec: 6963.5). Total num frames: 55177216. Throughput: 0: 1679.9. Samples: 8791708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:14,392][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:42:16,960][42004] Updated weights for policy 0, policy_version 13476 (0.0029) +[2024-11-08 00:42:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 55201792. Throughput: 0: 1664.6. Samples: 8793380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:17,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:42:22,068][42004] Updated weights for policy 0, policy_version 13486 (0.0028) +[2024-11-08 00:42:22,932][41694] Fps is (10 sec: 7671.6, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 55242752. Throughput: 0: 1774.4. Samples: 8804938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:22,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:42:27,064][42004] Updated weights for policy 0, policy_version 13496 (0.0031) +[2024-11-08 00:42:27,932][41694] Fps is (10 sec: 8191.4, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 55283712. Throughput: 0: 1809.4. Samples: 8817402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:42:32,214][42004] Updated weights for policy 0, policy_version 13506 (0.0032) +[2024-11-08 00:42:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.5, 300 sec: 7041.8). Total num frames: 55324672. Throughput: 0: 1825.7. Samples: 8823406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:32,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 00:42:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 7210.4, 300 sec: 7039.6). Total num frames: 55357440. Throughput: 0: 1801.6. Samples: 8834242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:37,934][41694] Avg episode reward: [(0, '4.750')] +[2024-11-08 00:42:38,117][42004] Updated weights for policy 0, policy_version 13516 (0.0033) +[2024-11-08 00:42:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7304.5, 300 sec: 7039.6). Total num frames: 55398400. Throughput: 0: 1780.2. Samples: 8845508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:42,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 00:42:43,409][42004] Updated weights for policy 0, policy_version 13526 (0.0025) +[2024-11-08 00:42:48,879][41694] Fps is (10 sec: 5986.3, 60 sec: 6989.4, 300 sec: 6961.7). Total num frames: 55422976. Throughput: 0: 1757.4. Samples: 8851268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:48,881][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 00:42:51,211][42004] Updated weights for policy 0, policy_version 13536 (0.0026) +[2024-11-08 00:42:52,933][41694] Fps is (10 sec: 5323.9, 60 sec: 6894.7, 300 sec: 6942.3). Total num frames: 55451648. Throughput: 0: 1722.7. Samples: 8858104. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:52,937][41694] Avg episode reward: [(0, '4.802')] +[2024-11-08 00:42:56,742][42004] Updated weights for policy 0, policy_version 13546 (0.0022) +[2024-11-08 00:42:57,932][41694] Fps is (10 sec: 7692.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 55492608. Throughput: 0: 1780.3. Samples: 8869228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:57,936][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 00:43:02,094][42004] Updated weights for policy 0, policy_version 13556 (0.0028) +[2024-11-08 00:43:02,931][41694] Fps is (10 sec: 7783.9, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 55529472. Throughput: 0: 1819.7. Samples: 8875266. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:02,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:43:07,392][42004] Updated weights for policy 0, policy_version 13566 (0.0024) +[2024-11-08 00:43:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 7020.1). Total num frames: 55566336. Throughput: 0: 1814.8. Samples: 8886606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:07,935][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:43:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7276.5, 300 sec: 7025.7). Total num frames: 55603200. Throughput: 0: 1769.0. Samples: 8897004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:12,932][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:43:13,198][42004] Updated weights for policy 0, policy_version 13576 (0.0025) +[2024-11-08 00:43:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 55644160. Throughput: 0: 1767.5. Samples: 8902946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:17,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 00:43:18,406][42004] Updated weights for policy 0, policy_version 13586 (0.0023) +[2024-11-08 00:43:23,379][41694] Fps is (10 sec: 5880.8, 60 sec: 6979.4, 300 sec: 6945.7). Total num frames: 55664640. Throughput: 0: 1759.0. Samples: 8914184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:23,383][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 00:43:26,407][42004] Updated weights for policy 0, policy_version 13596 (0.0028) +[2024-11-08 00:43:27,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 55697408. Throughput: 0: 1679.1. Samples: 8921066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:43:27,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 00:43:31,573][42004] Updated weights for policy 0, policy_version 13606 (0.0027) +[2024-11-08 00:43:32,933][41694] Fps is (10 sec: 7717.4, 60 sec: 6894.8, 300 sec: 6942.4). Total num frames: 55738368. Throughput: 0: 1717.7. Samples: 8926940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:43:32,935][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 00:43:36,578][42004] Updated weights for policy 0, policy_version 13616 (0.0019) +[2024-11-08 00:43:37,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.4, 300 sec: 6956.3). Total num frames: 55779328. Throughput: 0: 1799.8. Samples: 8939092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:37,935][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:43:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013618_55779328.pth... +[2024-11-08 00:43:38,082][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013211_54112256.pth +[2024-11-08 00:43:42,192][42004] Updated weights for policy 0, policy_version 13626 (0.0037) +[2024-11-08 00:43:42,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 55816192. Throughput: 0: 1801.7. Samples: 8950304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:42,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:43:47,362][42004] Updated weights for policy 0, policy_version 13636 (0.0018) +[2024-11-08 00:43:47,932][41694] Fps is (10 sec: 7782.9, 60 sec: 7352.4, 300 sec: 7025.7). Total num frames: 55857152. Throughput: 0: 1794.8. Samples: 8956030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:47,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:43:52,691][42004] Updated weights for policy 0, policy_version 13646 (0.0031) +[2024-11-08 00:43:52,939][41694] Fps is (10 sec: 7776.8, 60 sec: 7372.1, 300 sec: 7011.6). Total num frames: 55894016. Throughput: 0: 1803.0. Samples: 8967754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:52,941][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 00:43:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 55914496. Throughput: 0: 1760.3. Samples: 8976220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:57,934][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 00:44:00,380][42004] Updated weights for policy 0, policy_version 13656 (0.0032) +[2024-11-08 00:44:02,931][41694] Fps is (10 sec: 5738.6, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 55951360. Throughput: 0: 1726.7. Samples: 8980646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:02,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 00:44:05,939][42004] Updated weights for policy 0, policy_version 13666 (0.0025) +[2024-11-08 00:44:07,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.8, 300 sec: 6970.2). Total num frames: 55992320. Throughput: 0: 1741.5. Samples: 8991772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:44:07,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 00:44:11,117][42004] Updated weights for policy 0, policy_version 13676 (0.0030) +[2024-11-08 00:44:12,934][41694] Fps is (10 sec: 7780.4, 60 sec: 7099.4, 300 sec: 6970.1). Total num frames: 56029184. Throughput: 0: 1833.4. Samples: 9003572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:44:12,938][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:44:16,685][42004] Updated weights for policy 0, policy_version 13686 (0.0025) +[2024-11-08 00:44:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 56066048. Throughput: 0: 1817.5. Samples: 9008726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:44:17,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:44:21,960][42004] Updated weights for policy 0, policy_version 13696 (0.0024) +[2024-11-08 00:44:22,932][41694] Fps is (10 sec: 7374.6, 60 sec: 7359.4, 300 sec: 7025.7). Total num frames: 56102912. Throughput: 0: 1804.5. Samples: 9020292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:44:22,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:44:27,211][42004] Updated weights for policy 0, policy_version 13706 (0.0026) +[2024-11-08 00:44:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7441.1, 300 sec: 7025.7). Total num frames: 56143872. Throughput: 0: 1821.0. Samples: 9032250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:27,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 00:44:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.6, 300 sec: 6942.4). Total num frames: 56160256. Throughput: 0: 1818.5. Samples: 9037864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:32,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 00:44:35,049][42004] Updated weights for policy 0, policy_version 13716 (0.0031) +[2024-11-08 00:44:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 56201216. Throughput: 0: 1712.0. Samples: 9044782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:37,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 00:44:40,214][42004] Updated weights for policy 0, policy_version 13726 (0.0025) +[2024-11-08 00:44:42,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 56242176. Throughput: 0: 1781.6. Samples: 9056394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 00:44:45,694][42004] Updated weights for policy 0, policy_version 13736 (0.0024) +[2024-11-08 00:44:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 56274944. Throughput: 0: 1806.9. Samples: 9061956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:47,933][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 00:44:51,669][42004] Updated weights for policy 0, policy_version 13746 (0.0027) +[2024-11-08 00:44:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6964.1, 300 sec: 7011.8). Total num frames: 56311808. Throughput: 0: 1791.0. Samples: 9072366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:52,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:44:57,155][42004] Updated weights for policy 0, policy_version 13756 (0.0029) +[2024-11-08 00:44:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 56348672. Throughput: 0: 1775.2. Samples: 9083452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:57,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:45:02,613][42004] Updated weights for policy 0, policy_version 13766 (0.0033) +[2024-11-08 00:45:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 56385536. Throughput: 0: 1786.3. Samples: 9089110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:45:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 56406016. Throughput: 0: 1727.5. Samples: 9098030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:07,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 00:45:10,398][42004] Updated weights for policy 0, policy_version 13776 (0.0028) +[2024-11-08 00:45:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.2, 300 sec: 6970.2). Total num frames: 56442880. Throughput: 0: 1663.2. Samples: 9107094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:12,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 00:45:16,233][42004] Updated weights for policy 0, policy_version 13786 (0.0027) +[2024-11-08 00:45:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.2). Total num frames: 56475648. Throughput: 0: 1653.9. Samples: 9112290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:17,934][41694] Avg episode reward: [(0, '4.686')] +[2024-11-08 00:45:22,163][42004] Updated weights for policy 0, policy_version 13796 (0.0045) +[2024-11-08 00:45:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56512512. Throughput: 0: 1728.3. Samples: 9122554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:22,935][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:45:27,776][42004] Updated weights for policy 0, policy_version 13806 (0.0034) +[2024-11-08 00:45:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 7011.8). Total num frames: 56549376. Throughput: 0: 1708.9. Samples: 9133296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:27,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:45:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 56586240. Throughput: 0: 1712.4. Samples: 9139012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:32,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 00:45:33,289][42004] Updated weights for policy 0, policy_version 13816 (0.0028) +[2024-11-08 00:45:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6997.9). Total num frames: 56623104. Throughput: 0: 1734.9. Samples: 9150438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:37,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 00:45:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013825_56627200.pth... +[2024-11-08 00:45:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013413_54939648.pth +[2024-11-08 00:45:38,484][42004] Updated weights for policy 0, policy_version 13826 (0.0038) +[2024-11-08 00:45:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 56647680. Throughput: 0: 1655.5. Samples: 9157948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:42,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:45:46,003][42004] Updated weights for policy 0, policy_version 13836 (0.0023) +[2024-11-08 00:45:47,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 56684544. Throughput: 0: 1655.7. Samples: 9163618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:47,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 00:45:51,360][42004] Updated weights for policy 0, policy_version 13846 (0.0033) +[2024-11-08 00:45:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 56721408. Throughput: 0: 1715.3. Samples: 9175218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:52,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:45:57,212][42004] Updated weights for policy 0, policy_version 13856 (0.0024) +[2024-11-08 00:45:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56758272. Throughput: 0: 1745.5. Samples: 9185642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:57,934][41694] Avg episode reward: [(0, '4.205')] +[2024-11-08 00:46:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7011.8). Total num frames: 56791040. Throughput: 0: 1747.7. Samples: 9190938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:02,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:46:03,013][42004] Updated weights for policy 0, policy_version 13866 (0.0031) +[2024-11-08 00:46:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 56832000. Throughput: 0: 1764.5. Samples: 9201956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:07,935][41694] Avg episode reward: [(0, '4.719')] +[2024-11-08 00:46:08,322][42004] Updated weights for policy 0, policy_version 13876 (0.0036) +[2024-11-08 00:46:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 56864768. Throughput: 0: 1762.6. Samples: 9212612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:46:12,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 00:46:16,495][42004] Updated weights for policy 0, policy_version 13886 (0.0024) +[2024-11-08 00:46:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.6, 300 sec: 6956.3). Total num frames: 56885248. Throughput: 0: 1697.7. Samples: 9215410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:46:17,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 00:46:21,885][42004] Updated weights for policy 0, policy_version 13896 (0.0030) +[2024-11-08 00:46:22,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6956.2). Total num frames: 56922112. Throughput: 0: 1666.6. Samples: 9225436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:46:22,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 00:46:27,321][42004] Updated weights for policy 0, policy_version 13906 (0.0036) +[2024-11-08 00:46:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56958976. Throughput: 0: 1750.6. Samples: 9236724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:46:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:46:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 7019.1). Total num frames: 56995840. Throughput: 0: 1732.6. Samples: 9241584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:32,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:46:33,247][42004] Updated weights for policy 0, policy_version 13916 (0.0024) +[2024-11-08 00:46:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7039.6). Total num frames: 57036800. Throughput: 0: 1725.5. Samples: 9252864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:37,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 00:46:38,484][42004] Updated weights for policy 0, policy_version 13926 (0.0033) +[2024-11-08 00:46:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 57073664. Throughput: 0: 1744.4. Samples: 9264142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:42,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:46:44,027][42004] Updated weights for policy 0, policy_version 13936 (0.0031) +[2024-11-08 00:46:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 57110528. Throughput: 0: 1754.6. Samples: 9269896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:47,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 00:46:51,619][42004] Updated weights for policy 0, policy_version 13946 (0.0035) +[2024-11-08 00:46:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 57131008. Throughput: 0: 1672.4. Samples: 9277214. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:52,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 00:46:57,103][42004] Updated weights for policy 0, policy_version 13956 (0.0033) +[2024-11-08 00:46:57,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 57167872. Throughput: 0: 1678.2. Samples: 9288132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:57,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:47:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 57200640. Throughput: 0: 1741.7. Samples: 9293788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:47:02,935][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:47:03,106][42004] Updated weights for policy 0, policy_version 13966 (0.0033) +[2024-11-08 00:47:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7018.7). Total num frames: 57237504. Throughput: 0: 1733.3. Samples: 9303434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:47:07,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:47:08,851][42004] Updated weights for policy 0, policy_version 13976 (0.0032) +[2024-11-08 00:47:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 7025.7). Total num frames: 57274368. Throughput: 0: 1734.6. Samples: 9314780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:12,933][41694] Avg episode reward: [(0, '4.745')] +[2024-11-08 00:47:14,238][42004] Updated weights for policy 0, policy_version 13986 (0.0034) +[2024-11-08 00:47:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 57311232. Throughput: 0: 1756.3. Samples: 9320618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:17,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 00:47:19,752][42004] Updated weights for policy 0, policy_version 13996 (0.0027) +[2024-11-08 00:47:24,493][41694] Fps is (10 sec: 6022.9, 60 sec: 6853.2, 300 sec: 6947.3). Total num frames: 57344000. Throughput: 0: 1695.9. Samples: 9331828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:24,495][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 00:47:27,219][42004] Updated weights for policy 0, policy_version 14006 (0.0022) +[2024-11-08 00:47:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 57372672. Throughput: 0: 1669.0. Samples: 9339246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:27,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 00:47:32,694][42004] Updated weights for policy 0, policy_version 14016 (0.0026) +[2024-11-08 00:47:32,931][41694] Fps is (10 sec: 7766.1, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 57409536. Throughput: 0: 1665.1. Samples: 9344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:32,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 00:47:37,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 57446400. Throughput: 0: 1758.4. Samples: 9356346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:37,934][41694] Avg episode reward: [(0, '4.719')] +[2024-11-08 00:47:37,961][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014026_57450496.pth... +[2024-11-08 00:47:37,960][42004] Updated weights for policy 0, policy_version 14026 (0.0025) +[2024-11-08 00:47:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013618_55779328.pth +[2024-11-08 00:47:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 7020.5). Total num frames: 57487360. Throughput: 0: 1771.3. Samples: 9367842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:42,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 00:47:43,340][42004] Updated weights for policy 0, policy_version 14036 (0.0029) +[2024-11-08 00:47:47,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 57524224. Throughput: 0: 1775.4. Samples: 9373682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:47,933][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 00:47:48,553][42004] Updated weights for policy 0, policy_version 14046 (0.0027) +[2024-11-08 00:47:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.2, 300 sec: 7025.7). Total num frames: 57565184. Throughput: 0: 1820.8. Samples: 9385372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:52,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:47:53,870][42004] Updated weights for policy 0, policy_version 14056 (0.0029) +[2024-11-08 00:47:58,874][41694] Fps is (10 sec: 5989.3, 60 sec: 6922.8, 300 sec: 6961.8). Total num frames: 57589760. Throughput: 0: 1662.8. Samples: 9391172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:58,876][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:48:02,108][42004] Updated weights for policy 0, policy_version 14066 (0.0026) +[2024-11-08 00:48:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 57618432. Throughput: 0: 1710.5. Samples: 9397590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:48:07,875][42004] Updated weights for policy 0, policy_version 14076 (0.0023) +[2024-11-08 00:48:07,932][41694] Fps is (10 sec: 7235.1, 60 sec: 6963.2, 300 sec: 6956.2). Total num frames: 57655296. Throughput: 0: 1749.5. Samples: 9407826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:07,933][41694] Avg episode reward: [(0, '4.730')] +[2024-11-08 00:48:12,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 57692160. Throughput: 0: 1770.1. Samples: 9418898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:48:12,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 00:48:13,391][42004] Updated weights for policy 0, policy_version 14086 (0.0030) +[2024-11-08 00:48:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 7008.5). Total num frames: 57729024. Throughput: 0: 1775.5. Samples: 9424724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:48:17,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 00:48:18,632][42004] Updated weights for policy 0, policy_version 14096 (0.0030) +[2024-11-08 00:48:22,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7289.4, 300 sec: 7025.7). Total num frames: 57769984. Throughput: 0: 1783.9. Samples: 9436620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:22,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 00:48:23,971][42004] Updated weights for policy 0, policy_version 14106 (0.0027) +[2024-11-08 00:48:27,932][41694] Fps is (10 sec: 7782.7, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 57806848. Throughput: 0: 1776.8. Samples: 9447798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:27,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 00:48:29,440][42004] Updated weights for policy 0, policy_version 14116 (0.0033) +[2024-11-08 00:48:33,239][41694] Fps is (10 sec: 5563.3, 60 sec: 6927.7, 300 sec: 6935.2). Total num frames: 57827328. Throughput: 0: 1755.3. Samples: 9453208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:48:33,241][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 00:48:37,537][42004] Updated weights for policy 0, policy_version 14126 (0.0047) +[2024-11-08 00:48:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 57860096. Throughput: 0: 1651.5. Samples: 9459688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:48:37,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:48:42,819][42004] Updated weights for policy 0, policy_version 14136 (0.0042) +[2024-11-08 00:48:42,931][41694] Fps is (10 sec: 7606.7, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 57901056. Throughput: 0: 1820.1. Samples: 9471362. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:48:42,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:48:47,921][42004] Updated weights for policy 0, policy_version 14146 (0.0036) +[2024-11-08 00:48:47,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.2, 300 sec: 6942.5). Total num frames: 57942016. Throughput: 0: 1770.8. Samples: 9477274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:48:47,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:48:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 57978880. Throughput: 0: 1807.6. Samples: 9489166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:52,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:48:53,096][42004] Updated weights for policy 0, policy_version 14156 (0.0027) +[2024-11-08 00:48:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7282.3, 300 sec: 7011.8). Total num frames: 58019840. Throughput: 0: 1820.5. Samples: 9500820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:57,939][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:48:58,460][42004] Updated weights for policy 0, policy_version 14166 (0.0033) +[2024-11-08 00:49:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6984.0). Total num frames: 58052608. Throughput: 0: 1811.5. Samples: 9506240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:02,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:49:04,733][42004] Updated weights for policy 0, policy_version 14176 (0.0035) +[2024-11-08 00:49:07,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6894.8, 300 sec: 6914.6). Total num frames: 58068992. Throughput: 0: 1742.2. Samples: 9515022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:07,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:49:12,502][42004] Updated weights for policy 0, policy_version 14186 (0.0037) +[2024-11-08 00:49:12,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 58105856. Throughput: 0: 1674.8. Samples: 9523164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:12,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:49:17,626][42004] Updated weights for policy 0, policy_version 14196 (0.0030) +[2024-11-08 00:49:17,931][41694] Fps is (10 sec: 7783.2, 60 sec: 6963.3, 300 sec: 6928.5). Total num frames: 58146816. Throughput: 0: 1697.4. Samples: 9529070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:17,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:49:22,840][42004] Updated weights for policy 0, policy_version 14206 (0.0027) +[2024-11-08 00:49:22,932][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 58187776. Throughput: 0: 1803.4. Samples: 9540842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:22,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:49:27,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6997.9). Total num frames: 58224640. Throughput: 0: 1808.9. Samples: 9552764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:27,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 00:49:27,996][42004] Updated weights for policy 0, policy_version 14216 (0.0033) +[2024-11-08 00:49:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7342.2, 300 sec: 6997.9). Total num frames: 58265600. Throughput: 0: 1806.1. Samples: 9558548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:32,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:49:33,282][42004] Updated weights for policy 0, policy_version 14226 (0.0027) +[2024-11-08 00:49:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 6984.0). Total num frames: 58302464. Throughput: 0: 1804.2. Samples: 9570356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:37,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 00:49:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014234_58302464.pth... +[2024-11-08 00:49:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013825_56627200.pth +[2024-11-08 00:49:38,911][42004] Updated weights for policy 0, policy_version 14236 (0.0035) +[2024-11-08 00:49:42,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 58318848. Throughput: 0: 1685.3. Samples: 9576660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:42,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:49:47,186][42004] Updated weights for policy 0, policy_version 14246 (0.0033) +[2024-11-08 00:49:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 58355712. Throughput: 0: 1672.4. Samples: 9581500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:49:47,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 00:49:52,754][42004] Updated weights for policy 0, policy_version 14256 (0.0021) +[2024-11-08 00:49:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 58392576. Throughput: 0: 1721.7. Samples: 9592496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:49:52,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 00:49:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58429440. Throughput: 0: 1796.1. Samples: 9603986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:57,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:49:58,028][42004] Updated weights for policy 0, policy_version 14266 (0.0027) +[2024-11-08 00:50:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6984.0). Total num frames: 58466304. Throughput: 0: 1794.6. Samples: 9609826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:02,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:50:03,556][42004] Updated weights for policy 0, policy_version 14276 (0.0035) +[2024-11-08 00:50:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.6, 300 sec: 6997.9). Total num frames: 58507264. Throughput: 0: 1779.1. Samples: 9620900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:50:07,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:50:08,809][42004] Updated weights for policy 0, policy_version 14286 (0.0033) +[2024-11-08 00:50:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 58535936. Throughput: 0: 1738.8. Samples: 9631008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:50:12,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 00:50:17,936][41694] Fps is (10 sec: 4503.6, 60 sec: 6757.9, 300 sec: 6914.5). Total num frames: 58552320. Throughput: 0: 1680.5. Samples: 9634178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:50:17,950][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 00:50:18,098][42004] Updated weights for policy 0, policy_version 14296 (0.0033) +[2024-11-08 00:50:22,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 58589184. Throughput: 0: 1585.9. Samples: 9641720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:50:22,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:50:23,620][42004] Updated weights for policy 0, policy_version 14306 (0.0024) +[2024-11-08 00:50:27,932][41694] Fps is (10 sec: 7785.5, 60 sec: 6758.3, 300 sec: 6928.5). Total num frames: 58630144. Throughput: 0: 1707.3. Samples: 9653488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:27,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:50:28,915][42004] Updated weights for policy 0, policy_version 14316 (0.0026) +[2024-11-08 00:50:32,931][41694] Fps is (10 sec: 7782.9, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 58667008. Throughput: 0: 1726.4. Samples: 9659188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:32,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:50:34,094][42004] Updated weights for policy 0, policy_version 14326 (0.0031) +[2024-11-08 00:50:37,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 58707968. Throughput: 0: 1742.6. Samples: 9670912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:37,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:50:39,486][42004] Updated weights for policy 0, policy_version 14336 (0.0030) +[2024-11-08 00:50:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 58744832. Throughput: 0: 1747.7. Samples: 9682634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:42,941][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 00:50:44,916][42004] Updated weights for policy 0, policy_version 14346 (0.0021) +[2024-11-08 00:50:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 58781696. Throughput: 0: 1737.3. Samples: 9688006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:47,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:50:52,847][42004] Updated weights for policy 0, policy_version 14356 (0.0036) +[2024-11-08 00:50:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58802176. Throughput: 0: 1638.6. Samples: 9694638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:52,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:50:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 58839040. Throughput: 0: 1672.8. Samples: 9706284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:57,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:50:58,031][42004] Updated weights for policy 0, policy_version 14366 (0.0026) +[2024-11-08 00:51:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58875904. Throughput: 0: 1723.2. Samples: 9711712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:02,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 00:51:03,750][42004] Updated weights for policy 0, policy_version 14376 (0.0032) +[2024-11-08 00:51:07,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 58912768. Throughput: 0: 1799.7. Samples: 9722706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:07,937][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 00:51:09,347][42004] Updated weights for policy 0, policy_version 14386 (0.0036) +[2024-11-08 00:51:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 58945536. Throughput: 0: 1765.9. Samples: 9732952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:51:12,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:51:15,356][42004] Updated weights for policy 0, policy_version 14396 (0.0037) +[2024-11-08 00:51:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.5, 300 sec: 6984.0). Total num frames: 58982400. Throughput: 0: 1762.4. Samples: 9738494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:51:17,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:51:21,059][42004] Updated weights for policy 0, policy_version 14406 (0.0034) +[2024-11-08 00:51:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.1, 300 sec: 6984.0). Total num frames: 59019264. Throughput: 0: 1737.7. Samples: 9749110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:51:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 59039744. Throughput: 0: 1636.9. Samples: 9756296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:27,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:51:28,585][42004] Updated weights for policy 0, policy_version 14416 (0.0028) +[2024-11-08 00:51:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 59080704. Throughput: 0: 1652.1. Samples: 9762348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:32,934][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:51:33,579][42004] Updated weights for policy 0, policy_version 14426 (0.0025) +[2024-11-08 00:51:37,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 59121664. Throughput: 0: 1779.1. Samples: 9774698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:37,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 00:51:37,965][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014434_59121664.pth... +[2024-11-08 00:51:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014026_57450496.pth +[2024-11-08 00:51:38,651][42004] Updated weights for policy 0, policy_version 14436 (0.0026) +[2024-11-08 00:51:42,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 59162624. Throughput: 0: 1782.7. Samples: 9786504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:42,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 00:51:43,862][42004] Updated weights for policy 0, policy_version 14446 (0.0030) +[2024-11-08 00:51:47,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 59199488. Throughput: 0: 1795.7. Samples: 9792520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:47,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:51:49,065][42004] Updated weights for policy 0, policy_version 14456 (0.0023) +[2024-11-08 00:51:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 59236352. Throughput: 0: 1811.6. Samples: 9804226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:51:54,719][42004] Updated weights for policy 0, policy_version 14466 (0.0023) +[2024-11-08 00:51:59,473][41694] Fps is (10 sec: 6033.3, 60 sec: 6988.6, 300 sec: 6975.4). Total num frames: 59269120. Throughput: 0: 1756.0. Samples: 9814680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:59,475][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 00:52:02,431][42004] Updated weights for policy 0, policy_version 14476 (0.0031) +[2024-11-08 00:52:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 59293696. Throughput: 0: 1729.7. Samples: 9816330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:52:02,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:52:07,932][41694] Fps is (10 sec: 7263.4, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 59330560. Throughput: 0: 1736.2. Samples: 9827240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:52:07,937][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 00:52:07,992][42004] Updated weights for policy 0, policy_version 14486 (0.0027) +[2024-11-08 00:52:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 59371520. Throughput: 0: 1843.6. Samples: 9839256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:52:12,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:52:13,095][42004] Updated weights for policy 0, policy_version 14496 (0.0027) +[2024-11-08 00:52:17,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7168.0, 300 sec: 7049.1). Total num frames: 59412480. Throughput: 0: 1840.4. Samples: 9845166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:17,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:52:18,268][42004] Updated weights for policy 0, policy_version 14506 (0.0024) +[2024-11-08 00:52:22,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 7053.5). Total num frames: 59453440. Throughput: 0: 1839.9. Samples: 9857494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:22,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 00:52:23,287][42004] Updated weights for policy 0, policy_version 14516 (0.0023) +[2024-11-08 00:52:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7509.4, 300 sec: 7053.5). Total num frames: 59490304. Throughput: 0: 1827.0. Samples: 9868720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:27,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:52:28,856][42004] Updated weights for policy 0, policy_version 14526 (0.0030) +[2024-11-08 00:52:33,834][41694] Fps is (10 sec: 6011.0, 60 sec: 7196.3, 300 sec: 7004.3). Total num frames: 59518976. Throughput: 0: 1776.9. Samples: 9874086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:33,835][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 00:52:36,416][42004] Updated weights for policy 0, policy_version 14536 (0.0033) +[2024-11-08 00:52:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 59551744. Throughput: 0: 1719.8. Samples: 9881618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:37,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:52:41,797][42004] Updated weights for policy 0, policy_version 14546 (0.0022) +[2024-11-08 00:52:42,931][41694] Fps is (10 sec: 7654.2, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 59588608. Throughput: 0: 1808.1. Samples: 9893256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:42,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 00:52:46,788][42004] Updated weights for policy 0, policy_version 14556 (0.0029) +[2024-11-08 00:52:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 59629568. Throughput: 0: 1844.3. Samples: 9899322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:47,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 00:52:51,794][42004] Updated weights for policy 0, policy_version 14566 (0.0026) +[2024-11-08 00:52:52,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 7076.1). Total num frames: 59670528. Throughput: 0: 1873.2. Samples: 9911532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:52,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:52:57,005][42004] Updated weights for policy 0, policy_version 14576 (0.0029) +[2024-11-08 00:52:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7497.1, 300 sec: 7081.2). Total num frames: 59707392. Throughput: 0: 1871.2. Samples: 9923462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:57,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 00:53:02,883][42004] Updated weights for policy 0, policy_version 14586 (0.0027) +[2024-11-08 00:53:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7509.3, 300 sec: 7081.2). Total num frames: 59744256. Throughput: 0: 1859.2. Samples: 9928830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:02,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:53:08,225][41694] Fps is (10 sec: 5571.0, 60 sec: 7201.1, 300 sec: 7018.7). Total num frames: 59764736. Throughput: 0: 1810.5. Samples: 9939496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:08,227][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 00:53:10,386][42004] Updated weights for policy 0, policy_version 14596 (0.0032) +[2024-11-08 00:53:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 59801600. Throughput: 0: 1734.7. Samples: 9946780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:53:12,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 00:53:15,552][42004] Updated weights for policy 0, policy_version 14606 (0.0025) +[2024-11-08 00:53:17,932][41694] Fps is (10 sec: 8017.6, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 59842560. Throughput: 0: 1787.5. Samples: 9952912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:53:17,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:53:20,669][42004] Updated weights for policy 0, policy_version 14616 (0.0023) +[2024-11-08 00:53:22,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 59883520. Throughput: 0: 1851.6. Samples: 9964940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:22,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:53:25,785][42004] Updated weights for policy 0, policy_version 14626 (0.0023) +[2024-11-08 00:53:27,932][41694] Fps is (10 sec: 8191.3, 60 sec: 7236.1, 300 sec: 7116.4). Total num frames: 59924480. Throughput: 0: 1861.6. Samples: 9977030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:27,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:53:30,882][42004] Updated weights for policy 0, policy_version 14636 (0.0030) +[2024-11-08 00:53:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7485.4, 300 sec: 7122.9). Total num frames: 59961344. Throughput: 0: 1856.8. Samples: 9982878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:32,935][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 00:53:36,756][42004] Updated weights for policy 0, policy_version 14646 (0.0030) +[2024-11-08 00:53:37,932][41694] Fps is (10 sec: 7373.4, 60 sec: 7441.1, 300 sec: 7109.0). Total num frames: 59998208. Throughput: 0: 1826.8. Samples: 9993738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:37,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:53:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014648_59998208.pth... +[2024-11-08 00:53:38,182][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014234_58302464.pth +[2024-11-08 00:53:42,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 60018688. Throughput: 0: 1744.4. Samples: 10001958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:42,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 00:53:44,341][42004] Updated weights for policy 0, policy_version 14656 (0.0040) +[2024-11-08 00:53:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 60055552. Throughput: 0: 1727.0. Samples: 10006544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:47,934][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 00:53:49,553][42004] Updated weights for policy 0, policy_version 14666 (0.0026) +[2024-11-08 00:53:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 60096512. Throughput: 0: 1764.7. Samples: 10018392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:52,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:53:54,713][42004] Updated weights for policy 0, policy_version 14676 (0.0029) +[2024-11-08 00:53:57,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7067.3). Total num frames: 60137472. Throughput: 0: 1859.3. Samples: 10030450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:57,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:53:59,744][42004] Updated weights for policy 0, policy_version 14686 (0.0030) +[2024-11-08 00:54:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7136.8). Total num frames: 60174336. Throughput: 0: 1856.5. Samples: 10036456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:02,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 00:54:05,515][42004] Updated weights for policy 0, policy_version 14696 (0.0026) +[2024-11-08 00:54:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7409.0, 300 sec: 7122.9). Total num frames: 60207104. Throughput: 0: 1826.9. Samples: 10047150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:54:07,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 00:54:11,697][42004] Updated weights for policy 0, policy_version 14706 (0.0032) +[2024-11-08 00:54:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7372.8, 300 sec: 7109.0). Total num frames: 60243968. Throughput: 0: 1780.9. Samples: 10057168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:54:12,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 00:54:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.4, 300 sec: 7039.6). Total num frames: 60264448. Throughput: 0: 1772.5. Samples: 10062640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:17,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:54:19,058][42004] Updated weights for policy 0, policy_version 14716 (0.0028) +[2024-11-08 00:54:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 60305408. Throughput: 0: 1706.0. Samples: 10070510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:22,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:54:24,175][42004] Updated weights for policy 0, policy_version 14726 (0.0025) +[2024-11-08 00:54:27,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.6, 300 sec: 7053.4). Total num frames: 60346368. Throughput: 0: 1790.0. Samples: 10082506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:27,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:54:29,269][42004] Updated weights for policy 0, policy_version 14736 (0.0032) +[2024-11-08 00:54:32,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7099.7, 300 sec: 7067.3). Total num frames: 60387328. Throughput: 0: 1826.8. Samples: 10088748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:32,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 00:54:34,354][42004] Updated weights for policy 0, policy_version 14746 (0.0030) +[2024-11-08 00:54:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 7136.8). Total num frames: 60424192. Throughput: 0: 1827.6. Samples: 10100634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:37,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:54:39,806][42004] Updated weights for policy 0, policy_version 14756 (0.0032) +[2024-11-08 00:54:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7304.5, 300 sec: 7122.9). Total num frames: 60456960. Throughput: 0: 1795.2. Samples: 10111234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:54:42,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 00:54:45,780][42004] Updated weights for policy 0, policy_version 14766 (0.0027) +[2024-11-08 00:54:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7372.8, 300 sec: 7136.8). Total num frames: 60497920. Throughput: 0: 1776.0. Samples: 10116374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:54:47,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:54:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7081.2). Total num frames: 60518400. Throughput: 0: 1724.4. Samples: 10124748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:52,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 00:54:53,134][42004] Updated weights for policy 0, policy_version 14776 (0.0029) +[2024-11-08 00:54:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 7095.1). Total num frames: 60559360. Throughput: 0: 1749.6. Samples: 10135902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:57,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 00:54:58,317][42004] Updated weights for policy 0, policy_version 14786 (0.0033) +[2024-11-08 00:55:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7081.2). Total num frames: 60596224. Throughput: 0: 1758.9. Samples: 10141788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:02,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:55:03,581][42004] Updated weights for policy 0, policy_version 14796 (0.0027) +[2024-11-08 00:55:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 7122.9). Total num frames: 60637184. Throughput: 0: 1846.9. Samples: 10153620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:07,933][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 00:55:08,625][42004] Updated weights for policy 0, policy_version 14806 (0.0031) +[2024-11-08 00:55:12,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7236.3, 300 sec: 7206.3). Total num frames: 60678144. Throughput: 0: 1844.1. Samples: 10165490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:12,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:55:14,403][42004] Updated weights for policy 0, policy_version 14816 (0.0029) +[2024-11-08 00:55:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7304.5, 300 sec: 7164.5). Total num frames: 60702720. Throughput: 0: 1801.9. Samples: 10169834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:55:17,935][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 00:55:21,033][42004] Updated weights for policy 0, policy_version 14826 (0.0028) +[2024-11-08 00:55:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7236.3, 300 sec: 7150.7). Total num frames: 60739584. Throughput: 0: 1749.2. Samples: 10179346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:55:22,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 00:55:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 7095.1). Total num frames: 60760064. Throughput: 0: 1668.0. Samples: 10186296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:27,934][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 00:55:28,855][42004] Updated weights for policy 0, policy_version 14836 (0.0028) +[2024-11-08 00:55:32,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 7081.2). Total num frames: 60796928. Throughput: 0: 1677.4. Samples: 10191858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:32,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 00:55:34,139][42004] Updated weights for policy 0, policy_version 14846 (0.0022) +[2024-11-08 00:55:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7095.1). Total num frames: 60837888. Throughput: 0: 1755.8. Samples: 10203758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:37,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:55:37,972][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014854_60841984.pth... +[2024-11-08 00:55:38,083][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014434_59121664.pth +[2024-11-08 00:55:39,018][42004] Updated weights for policy 0, policy_version 14856 (0.0023) +[2024-11-08 00:55:42,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7031.4, 300 sec: 7109.0). Total num frames: 60878848. Throughput: 0: 1783.2. Samples: 10216148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:42,936][41694] Avg episode reward: [(0, '4.713')] +[2024-11-08 00:55:44,195][42004] Updated weights for policy 0, policy_version 14866 (0.0024) +[2024-11-08 00:55:47,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7031.5, 300 sec: 7178.4). Total num frames: 60919808. Throughput: 0: 1783.7. Samples: 10222056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:47,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:55:49,496][42004] Updated weights for policy 0, policy_version 14876 (0.0033) +[2024-11-08 00:55:52,932][41694] Fps is (10 sec: 7373.3, 60 sec: 7236.3, 300 sec: 7164.5). Total num frames: 60952576. Throughput: 0: 1757.6. Samples: 10232714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:55:52,937][41694] Avg episode reward: [(0, '4.235')] +[2024-11-08 00:55:55,284][42004] Updated weights for policy 0, policy_version 14886 (0.0031) +[2024-11-08 00:55:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7168.0, 300 sec: 7164.5). Total num frames: 60989440. Throughput: 0: 1750.0. Samples: 10244240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:55:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:56:00,706][42004] Updated weights for policy 0, policy_version 14896 (0.0037) +[2024-11-08 00:56:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 7164.5). Total num frames: 61026304. Throughput: 0: 1780.0. Samples: 10249936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:56:02,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:56:06,117][42004] Updated weights for policy 0, policy_version 14906 (0.0031) +[2024-11-08 00:56:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7192.3). Total num frames: 61067264. Throughput: 0: 1816.7. Samples: 10261096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:56:07,940][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:56:11,971][42004] Updated weights for policy 0, policy_version 14916 (0.0030) +[2024-11-08 00:56:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 7178.4). Total num frames: 61100032. Throughput: 0: 1891.7. Samples: 10271424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:12,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:56:17,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6894.9, 300 sec: 7109.0). Total num frames: 61116416. Throughput: 0: 1815.7. Samples: 10273566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:17,937][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 00:56:20,716][42004] Updated weights for policy 0, policy_version 14926 (0.0026) +[2024-11-08 00:56:22,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6826.6, 300 sec: 7150.6). Total num frames: 61149184. Throughput: 0: 1745.3. Samples: 10282298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:22,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:56:27,347][42004] Updated weights for policy 0, policy_version 14936 (0.0038) +[2024-11-08 00:56:27,933][41694] Fps is (10 sec: 6553.4, 60 sec: 7031.3, 300 sec: 7122.8). Total num frames: 61181952. Throughput: 0: 1673.9. Samples: 10291474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:27,936][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 00:56:32,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6963.2, 300 sec: 7095.1). Total num frames: 61214720. Throughput: 0: 1644.3. Samples: 10296052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:32,934][41694] Avg episode reward: [(0, '4.166')] +[2024-11-08 00:56:33,573][42004] Updated weights for policy 0, policy_version 14946 (0.0034) +[2024-11-08 00:56:37,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6894.9, 300 sec: 7081.2). Total num frames: 61251584. Throughput: 0: 1657.5. Samples: 10307300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:37,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 00:56:38,678][42004] Updated weights for policy 0, policy_version 14956 (0.0027) +[2024-11-08 00:56:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 7081.2). Total num frames: 61288448. Throughput: 0: 1648.3. Samples: 10318414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:42,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:56:44,247][42004] Updated weights for policy 0, policy_version 14966 (0.0026) +[2024-11-08 00:56:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 7095.1). Total num frames: 61329408. Throughput: 0: 1648.6. Samples: 10324122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:56:47,934][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 00:56:49,554][42004] Updated weights for policy 0, policy_version 14976 (0.0028) +[2024-11-08 00:56:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 7146.3). Total num frames: 61366272. Throughput: 0: 1662.8. Samples: 10335922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:56:52,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:56:54,700][42004] Updated weights for policy 0, policy_version 14986 (0.0028) +[2024-11-08 00:56:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 7150.6). Total num frames: 61403136. Throughput: 0: 1686.2. Samples: 10347304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:57,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:57:00,717][42004] Updated weights for policy 0, policy_version 14996 (0.0024) +[2024-11-08 00:57:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 7136.8). Total num frames: 61435904. Throughput: 0: 1748.6. Samples: 10352252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:02,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:57:06,703][42004] Updated weights for policy 0, policy_version 15006 (0.0030) +[2024-11-08 00:57:07,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6690.0, 300 sec: 7109.0). Total num frames: 61468672. Throughput: 0: 1781.5. Samples: 10362466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:07,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 00:57:12,870][42004] Updated weights for policy 0, policy_version 15016 (0.0028) +[2024-11-08 00:57:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7095.1). Total num frames: 61505536. Throughput: 0: 1798.1. Samples: 10372386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:12,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:57:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 7031.5, 300 sec: 7067.3). Total num frames: 61538304. Throughput: 0: 1808.0. Samples: 10377412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:17,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 00:57:18,740][42004] Updated weights for policy 0, policy_version 15026 (0.0028) +[2024-11-08 00:57:22,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.3, 300 sec: 7039.6). Total num frames: 61566976. Throughput: 0: 1775.7. Samples: 10387206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:22,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 00:57:25,877][42004] Updated weights for policy 0, policy_version 15036 (0.0027) +[2024-11-08 00:57:27,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.3, 300 sec: 7075.1). Total num frames: 61599744. Throughput: 0: 1733.8. Samples: 10396436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:27,935][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:57:31,853][42004] Updated weights for policy 0, policy_version 15046 (0.0023) +[2024-11-08 00:57:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.3, 300 sec: 7053.5). Total num frames: 61632512. Throughput: 0: 1722.3. Samples: 10401624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:32,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 00:57:37,395][42004] Updated weights for policy 0, policy_version 15056 (0.0032) +[2024-11-08 00:57:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 7067.3). Total num frames: 61673472. Throughput: 0: 1695.6. Samples: 10412224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:37,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:57:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015057_61673472.pth... +[2024-11-08 00:57:38,062][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014648_59998208.pth +[2024-11-08 00:57:42,602][42004] Updated weights for policy 0, policy_version 15066 (0.0022) +[2024-11-08 00:57:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 61710336. Throughput: 0: 1703.0. Samples: 10423940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:57:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 00:57:47,699][42004] Updated weights for policy 0, policy_version 15076 (0.0026) +[2024-11-08 00:57:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 61751296. Throughput: 0: 1727.4. Samples: 10429986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:57:47,933][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 00:57:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7031.4, 300 sec: 7053.5). Total num frames: 61788160. Throughput: 0: 1754.4. Samples: 10441412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:52,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 00:57:53,209][42004] Updated weights for policy 0, policy_version 15086 (0.0031) +[2024-11-08 00:57:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.7, 300 sec: 7011.8). Total num frames: 61812736. Throughput: 0: 1717.8. Samples: 10449688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:57,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:58:00,466][42004] Updated weights for policy 0, policy_version 15096 (0.0024) +[2024-11-08 00:58:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 7060.5). Total num frames: 61845504. Throughput: 0: 1725.0. Samples: 10455036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:02,936][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 00:58:06,806][42004] Updated weights for policy 0, policy_version 15106 (0.0034) +[2024-11-08 00:58:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 7053.5). Total num frames: 61882368. Throughput: 0: 1722.0. Samples: 10464696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:07,936][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:58:12,243][42004] Updated weights for policy 0, policy_version 15116 (0.0026) +[2024-11-08 00:58:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 7039.6). Total num frames: 61919232. Throughput: 0: 1766.4. Samples: 10475922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:12,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:58:17,349][42004] Updated weights for policy 0, policy_version 15126 (0.0028) +[2024-11-08 00:58:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 61960192. Throughput: 0: 1783.3. Samples: 10481874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:17,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:58:22,750][42004] Updated weights for policy 0, policy_version 15136 (0.0024) +[2024-11-08 00:58:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 61997056. Throughput: 0: 1809.5. Samples: 10493652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:22,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 00:58:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 62033920. Throughput: 0: 1800.9. Samples: 10504982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:27,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 00:58:28,099][42004] Updated weights for policy 0, policy_version 15146 (0.0024) +[2024-11-08 00:58:32,932][41694] Fps is (10 sec: 6143.8, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 62058496. Throughput: 0: 1727.6. Samples: 10507730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:32,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 00:58:35,249][42004] Updated weights for policy 0, policy_version 15156 (0.0029) +[2024-11-08 00:58:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 62095360. Throughput: 0: 1718.5. Samples: 10518744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:37,934][41694] Avg episode reward: [(0, '4.198')] +[2024-11-08 00:58:41,297][42004] Updated weights for policy 0, policy_version 15166 (0.0025) +[2024-11-08 00:58:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.4, 300 sec: 7039.6). Total num frames: 62132224. Throughput: 0: 1761.3. Samples: 10528946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:42,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 00:58:46,478][42004] Updated weights for policy 0, policy_version 15176 (0.0024) +[2024-11-08 00:58:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 62169088. Throughput: 0: 1776.3. Samples: 10534970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:47,943][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:58:51,542][42004] Updated weights for policy 0, policy_version 15186 (0.0026) +[2024-11-08 00:58:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 62210048. Throughput: 0: 1827.2. Samples: 10546920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:52,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:58:56,843][42004] Updated weights for policy 0, policy_version 15196 (0.0038) +[2024-11-08 00:58:57,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7304.6, 300 sec: 7039.6). Total num frames: 62251008. Throughput: 0: 1836.7. Samples: 10558574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:57,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 00:59:03,759][41694] Fps is (10 sec: 6052.8, 60 sec: 7070.5, 300 sec: 6992.2). Total num frames: 62275584. Throughput: 0: 1796.1. Samples: 10564184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:03,761][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 00:59:04,364][42004] Updated weights for policy 0, policy_version 15206 (0.0037) +[2024-11-08 00:59:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 62308352. Throughput: 0: 1736.5. Samples: 10571796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:07,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 00:59:09,761][42004] Updated weights for policy 0, policy_version 15216 (0.0032) +[2024-11-08 00:59:12,931][41694] Fps is (10 sec: 7144.9, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 62341120. Throughput: 0: 1720.1. Samples: 10582388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:12,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:59:15,960][42004] Updated weights for policy 0, policy_version 15226 (0.0034) +[2024-11-08 00:59:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 62377984. Throughput: 0: 1770.0. Samples: 10587380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:17,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:59:21,337][42004] Updated weights for policy 0, policy_version 15236 (0.0024) +[2024-11-08 00:59:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 7025.7). Total num frames: 62418944. Throughput: 0: 1777.3. Samples: 10598724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:22,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:59:26,614][42004] Updated weights for policy 0, policy_version 15246 (0.0029) +[2024-11-08 00:59:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 62455808. Throughput: 0: 1805.1. Samples: 10610176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:27,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:59:31,899][42004] Updated weights for policy 0, policy_version 15256 (0.0036) +[2024-11-08 00:59:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 62492672. Throughput: 0: 1795.6. Samples: 10615772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:32,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:59:37,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7099.8, 300 sec: 6997.9). Total num frames: 62521344. Throughput: 0: 1792.3. Samples: 10627572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:37,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:59:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015264_62521344.pth... +[2024-11-08 00:59:38,061][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014854_60841984.pth +[2024-11-08 00:59:38,907][42004] Updated weights for policy 0, policy_version 15266 (0.0026) +[2024-11-08 00:59:42,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 62558208. Throughput: 0: 1713.2. Samples: 10635670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:42,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:59:44,763][42004] Updated weights for policy 0, policy_version 15276 (0.0033) +[2024-11-08 00:59:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 7025.7). Total num frames: 62590976. Throughput: 0: 1733.5. Samples: 10640756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:47,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 00:59:50,568][42004] Updated weights for policy 0, policy_version 15286 (0.0029) +[2024-11-08 00:59:52,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 62627840. Throughput: 0: 1767.2. Samples: 10651322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:52,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 00:59:56,011][42004] Updated weights for policy 0, policy_version 15296 (0.0027) +[2024-11-08 00:59:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 62664704. Throughput: 0: 1780.3. Samples: 10662502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:57,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 01:00:01,756][42004] Updated weights for policy 0, policy_version 15306 (0.0035) +[2024-11-08 01:00:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7199.0, 300 sec: 6997.9). Total num frames: 62701568. Throughput: 0: 1792.6. Samples: 10668046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:02,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:00:06,975][42004] Updated weights for policy 0, policy_version 15316 (0.0024) +[2024-11-08 01:00:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 62738432. Throughput: 0: 1787.5. Samples: 10679162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:07,933][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:00:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 62767104. Throughput: 0: 1723.9. Samples: 10687752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:12,933][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 01:00:14,039][42004] Updated weights for policy 0, policy_version 15326 (0.0029) +[2024-11-08 01:00:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7031.4, 300 sec: 6984.0). Total num frames: 62799872. Throughput: 0: 1713.0. Samples: 10692856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:17,933][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:00:20,845][42004] Updated weights for policy 0, policy_version 15336 (0.0037) +[2024-11-08 01:00:22,935][41694] Fps is (10 sec: 6141.6, 60 sec: 6826.2, 300 sec: 7011.7). Total num frames: 62828544. Throughput: 0: 1650.4. Samples: 10701846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:22,939][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 01:00:26,504][42004] Updated weights for policy 0, policy_version 15346 (0.0026) +[2024-11-08 01:00:27,934][41694] Fps is (10 sec: 6552.1, 60 sec: 6826.4, 300 sec: 7011.7). Total num frames: 62865408. Throughput: 0: 1716.8. Samples: 10712930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:00:27,939][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 01:00:31,661][42004] Updated weights for policy 0, policy_version 15356 (0.0028) +[2024-11-08 01:00:32,931][41694] Fps is (10 sec: 7785.5, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 62906368. Throughput: 0: 1734.3. Samples: 10718800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:00:32,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:00:36,979][42004] Updated weights for policy 0, policy_version 15366 (0.0023) +[2024-11-08 01:00:37,932][41694] Fps is (10 sec: 7784.1, 60 sec: 7031.4, 300 sec: 6997.9). Total num frames: 62943232. Throughput: 0: 1760.6. Samples: 10730548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:37,938][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:00:42,299][42004] Updated weights for policy 0, policy_version 15376 (0.0030) +[2024-11-08 01:00:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6997.9). Total num frames: 62984192. Throughput: 0: 1766.4. Samples: 10741988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:42,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:00:47,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63008768. Throughput: 0: 1715.6. Samples: 10745248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:47,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 01:00:49,358][42004] Updated weights for policy 0, policy_version 15386 (0.0037) +[2024-11-08 01:00:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63045632. Throughput: 0: 1710.0. Samples: 10756114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:52,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:00:55,127][42004] Updated weights for policy 0, policy_version 15396 (0.0030) +[2024-11-08 01:00:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63082496. Throughput: 0: 1753.3. Samples: 10766652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:57,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 01:01:00,403][42004] Updated weights for policy 0, policy_version 15406 (0.0028) +[2024-11-08 01:01:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 63119360. Throughput: 0: 1772.2. Samples: 10772606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:02,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:01:05,967][42004] Updated weights for policy 0, policy_version 15416 (0.0025) +[2024-11-08 01:01:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63156224. Throughput: 0: 1815.7. Samples: 10783544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:07,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 01:01:11,848][42004] Updated weights for policy 0, policy_version 15426 (0.0024) +[2024-11-08 01:01:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 63193088. Throughput: 0: 1806.2. Samples: 10794204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:12,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 01:01:17,232][42004] Updated weights for policy 0, policy_version 15436 (0.0025) +[2024-11-08 01:01:19,376][41694] Fps is (10 sec: 6084.2, 60 sec: 6932.8, 300 sec: 7005.3). Total num frames: 63225856. Throughput: 0: 1741.2. Samples: 10799672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:19,378][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 01:01:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7100.2, 300 sec: 7025.7). Total num frames: 63254528. Throughput: 0: 1721.1. Samples: 10807996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:22,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:01:24,664][42004] Updated weights for policy 0, policy_version 15446 (0.0032) +[2024-11-08 01:01:27,932][41694] Fps is (10 sec: 7180.9, 60 sec: 7031.7, 300 sec: 7025.7). Total num frames: 63287296. Throughput: 0: 1690.6. Samples: 10818068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:27,937][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:01:30,678][42004] Updated weights for policy 0, policy_version 15456 (0.0051) +[2024-11-08 01:01:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63324160. Throughput: 0: 1732.3. Samples: 10823202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:32,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:01:35,652][42004] Updated weights for policy 0, policy_version 15466 (0.0029) +[2024-11-08 01:01:37,931][41694] Fps is (10 sec: 7783.1, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 63365120. Throughput: 0: 1761.5. Samples: 10835380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:37,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 01:01:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015470_63365120.pth... +[2024-11-08 01:01:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015057_61673472.pth +[2024-11-08 01:01:40,913][42004] Updated weights for policy 0, policy_version 15476 (0.0030) +[2024-11-08 01:01:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63401984. Throughput: 0: 1784.4. Samples: 10846952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:42,935][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 01:01:46,315][42004] Updated weights for policy 0, policy_version 15486 (0.0026) +[2024-11-08 01:01:47,932][41694] Fps is (10 sec: 7781.7, 60 sec: 7236.2, 300 sec: 7039.5). Total num frames: 63442944. Throughput: 0: 1779.7. Samples: 10852696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:47,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:01:51,584][42004] Updated weights for policy 0, policy_version 15496 (0.0027) +[2024-11-08 01:01:53,358][41694] Fps is (10 sec: 6678.3, 60 sec: 7049.6, 300 sec: 7001.7). Total num frames: 63471616. Throughput: 0: 1777.5. Samples: 10864292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:53,360][41694] Avg episode reward: [(0, '4.747')] +[2024-11-08 01:01:57,931][41694] Fps is (10 sec: 6144.6, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 63504384. Throughput: 0: 1738.5. Samples: 10872434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:57,934][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 01:01:58,674][42004] Updated weights for policy 0, policy_version 15506 (0.0035) +[2024-11-08 01:02:02,932][41694] Fps is (10 sec: 6845.7, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 63537152. Throughput: 0: 1791.0. Samples: 10877680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:02,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:02:05,108][42004] Updated weights for policy 0, policy_version 15516 (0.0021) +[2024-11-08 01:02:07,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6894.9, 300 sec: 6997.9). Total num frames: 63569920. Throughput: 0: 1770.3. Samples: 10887658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:07,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:02:10,633][42004] Updated weights for policy 0, policy_version 15526 (0.0040) +[2024-11-08 01:02:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63610880. Throughput: 0: 1796.6. Samples: 10898914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:12,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 01:02:15,827][42004] Updated weights for policy 0, policy_version 15536 (0.0023) +[2024-11-08 01:02:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 7205.0, 300 sec: 7053.5). Total num frames: 63647744. Throughput: 0: 1813.2. Samples: 10904794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:17,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 01:02:21,082][42004] Updated weights for policy 0, policy_version 15546 (0.0026) +[2024-11-08 01:02:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.3, 300 sec: 7081.2). Total num frames: 63688704. Throughput: 0: 1801.2. Samples: 10916432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:02:22,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:02:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 7099.8, 300 sec: 7053.4). Total num frames: 63713280. Throughput: 0: 1735.7. Samples: 10925058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:02:27,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:02:28,262][42004] Updated weights for policy 0, policy_version 15556 (0.0027) +[2024-11-08 01:02:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 63750144. Throughput: 0: 1718.9. Samples: 10930044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:32,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:02:33,673][42004] Updated weights for policy 0, policy_version 15566 (0.0037) +[2024-11-08 01:02:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63782912. Throughput: 0: 1721.4. Samples: 10941020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:37,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:02:39,849][42004] Updated weights for policy 0, policy_version 15576 (0.0033) +[2024-11-08 01:02:42,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 63815680. Throughput: 0: 1736.7. Samples: 10950586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:42,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 01:02:45,778][42004] Updated weights for policy 0, policy_version 15586 (0.0030) +[2024-11-08 01:02:47,937][41694] Fps is (10 sec: 6549.8, 60 sec: 6757.8, 300 sec: 6983.9). Total num frames: 63848448. Throughput: 0: 1742.2. Samples: 10956090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:47,939][41694] Avg episode reward: [(0, '4.192')] +[2024-11-08 01:02:52,308][42004] Updated weights for policy 0, policy_version 15596 (0.0045) +[2024-11-08 01:02:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6944.3, 300 sec: 7025.7). Total num frames: 63885312. Throughput: 0: 1731.1. Samples: 10965556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:52,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:02:57,747][42004] Updated weights for policy 0, policy_version 15606 (0.0031) +[2024-11-08 01:02:57,932][41694] Fps is (10 sec: 7376.7, 60 sec: 6963.1, 300 sec: 7039.6). Total num frames: 63922176. Throughput: 0: 1730.9. Samples: 10976806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:03:02,931][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.7, 300 sec: 6997.9). Total num frames: 63946752. Throughput: 0: 1707.6. Samples: 10981634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:02,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:03:05,159][42004] Updated weights for policy 0, policy_version 15616 (0.0032) +[2024-11-08 01:03:07,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 63979520. Throughput: 0: 1632.3. Samples: 10989884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:07,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:03:10,949][42004] Updated weights for policy 0, policy_version 15626 (0.0045) +[2024-11-08 01:03:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6970.1). Total num frames: 64016384. Throughput: 0: 1671.5. Samples: 11000274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:12,936][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:03:16,719][42004] Updated weights for policy 0, policy_version 15636 (0.0041) +[2024-11-08 01:03:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6970.1). Total num frames: 64053248. Throughput: 0: 1677.1. Samples: 11005514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:17,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:03:22,091][42004] Updated weights for policy 0, policy_version 15646 (0.0031) +[2024-11-08 01:03:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6970.1). Total num frames: 64090112. Throughput: 0: 1692.8. Samples: 11017196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:22,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:03:27,316][42004] Updated weights for policy 0, policy_version 15656 (0.0031) +[2024-11-08 01:03:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 64131072. Throughput: 0: 1736.8. Samples: 11028740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:27,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 01:03:32,625][42004] Updated weights for policy 0, policy_version 15666 (0.0031) +[2024-11-08 01:03:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 64167936. Throughput: 0: 1740.1. Samples: 11034386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:32,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 01:03:37,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.6, 300 sec: 6984.0). Total num frames: 64192512. Throughput: 0: 1714.5. Samples: 11042710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:37,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:03:37,990][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015673_64196608.pth... +[2024-11-08 01:03:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015264_62521344.pth +[2024-11-08 01:03:39,662][42004] Updated weights for policy 0, policy_version 15676 (0.0029) +[2024-11-08 01:03:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6984.0). Total num frames: 64229376. Throughput: 0: 1713.1. Samples: 11053896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:03:42,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 01:03:45,405][42004] Updated weights for policy 0, policy_version 15686 (0.0034) +[2024-11-08 01:03:47,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.9, 300 sec: 6970.1). Total num frames: 64266240. Throughput: 0: 1727.1. Samples: 11059354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:03:47,933][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 01:03:50,931][42004] Updated weights for policy 0, policy_version 15696 (0.0030) +[2024-11-08 01:03:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 64299008. Throughput: 0: 1787.2. Samples: 11070308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:52,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:03:57,725][42004] Updated weights for policy 0, policy_version 15706 (0.0041) +[2024-11-08 01:03:57,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.7, 300 sec: 6989.7). Total num frames: 64331776. Throughput: 0: 1756.4. Samples: 11079314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:57,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 01:04:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 64364544. Throughput: 0: 1758.8. Samples: 11084660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:02,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 01:04:03,865][42004] Updated weights for policy 0, policy_version 15716 (0.0033) +[2024-11-08 01:04:09,024][41694] Fps is (10 sec: 5539.2, 60 sec: 6771.7, 300 sec: 6930.6). Total num frames: 64393216. Throughput: 0: 1662.8. Samples: 11093836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:09,025][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 01:04:11,949][42004] Updated weights for policy 0, policy_version 15726 (0.0043) +[2024-11-08 01:04:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 64417792. Throughput: 0: 1609.3. Samples: 11101160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:12,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 01:04:17,931][41694] Fps is (10 sec: 6437.4, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 64450560. Throughput: 0: 1584.0. Samples: 11105666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:17,933][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 01:04:18,223][42004] Updated weights for policy 0, policy_version 15736 (0.0028) +[2024-11-08 01:04:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6872.9). Total num frames: 64483328. Throughput: 0: 1619.9. Samples: 11115604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:22,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 01:04:24,548][42004] Updated weights for policy 0, policy_version 15746 (0.0028) +[2024-11-08 01:04:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6872.9). Total num frames: 64520192. Throughput: 0: 1604.4. Samples: 11126096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:27,940][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 01:04:30,090][42004] Updated weights for policy 0, policy_version 15756 (0.0036) +[2024-11-08 01:04:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6900.7). Total num frames: 64557056. Throughput: 0: 1603.3. Samples: 11131502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:32,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 01:04:35,812][42004] Updated weights for policy 0, policy_version 15766 (0.0029) +[2024-11-08 01:04:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 64589824. Throughput: 0: 1601.5. Samples: 11142374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:37,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 01:04:43,059][41694] Fps is (10 sec: 5662.5, 60 sec: 6403.5, 300 sec: 6856.1). Total num frames: 64614400. Throughput: 0: 1505.1. Samples: 11147236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:43,060][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:04:43,695][42004] Updated weights for policy 0, policy_version 15776 (0.0032) +[2024-11-08 01:04:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6845.2). Total num frames: 64647168. Throughput: 0: 1548.2. Samples: 11154328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:47,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:04:49,818][42004] Updated weights for policy 0, policy_version 15786 (0.0029) +[2024-11-08 01:04:52,932][41694] Fps is (10 sec: 6223.0, 60 sec: 6280.5, 300 sec: 6817.4). Total num frames: 64675840. Throughput: 0: 1603.2. Samples: 11164228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:52,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:04:56,619][42004] Updated weights for policy 0, policy_version 15796 (0.0048) +[2024-11-08 01:04:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6803.5). Total num frames: 64708608. Throughput: 0: 1603.3. Samples: 11173310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:57,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 01:05:02,607][42004] Updated weights for policy 0, policy_version 15806 (0.0043) +[2024-11-08 01:05:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6789.6). Total num frames: 64741376. Throughput: 0: 1621.8. Samples: 11178648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:02,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:05:07,932][41694] Fps is (10 sec: 6962.6, 60 sec: 6535.9, 300 sec: 6817.4). Total num frames: 64778240. Throughput: 0: 1627.3. Samples: 11188832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:07,934][41694] Avg episode reward: [(0, '4.121')] +[2024-11-08 01:05:08,483][42004] Updated weights for policy 0, policy_version 15816 (0.0028) +[2024-11-08 01:05:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 64811008. Throughput: 0: 1622.4. Samples: 11199104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:12,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 01:05:14,768][42004] Updated weights for policy 0, policy_version 15826 (0.0040) +[2024-11-08 01:05:17,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6280.5, 300 sec: 6775.8). Total num frames: 64827392. Throughput: 0: 1605.6. Samples: 11203752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:17,933][41694] Avg episode reward: [(0, '4.668')] +[2024-11-08 01:05:22,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6212.3, 300 sec: 6748.0). Total num frames: 64856064. Throughput: 0: 1490.1. Samples: 11209430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:22,934][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 01:05:23,564][42004] Updated weights for policy 0, policy_version 15836 (0.0045) +[2024-11-08 01:05:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6734.1). Total num frames: 64892928. Throughput: 0: 1613.2. Samples: 11219626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:27,934][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 01:05:29,675][42004] Updated weights for policy 0, policy_version 15846 (0.0022) +[2024-11-08 01:05:32,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6143.9, 300 sec: 6720.2). Total num frames: 64925696. Throughput: 0: 1557.5. Samples: 11224416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:32,936][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:05:35,486][42004] Updated weights for policy 0, policy_version 15856 (0.0041) +[2024-11-08 01:05:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 64962560. Throughput: 0: 1579.4. Samples: 11235302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:37,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 01:05:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015860_64962560.pth... +[2024-11-08 01:05:38,127][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015470_63365120.pth +[2024-11-08 01:05:41,253][42004] Updated weights for policy 0, policy_version 15866 (0.0039) +[2024-11-08 01:05:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6362.2, 300 sec: 6734.1). Total num frames: 64995328. Throughput: 0: 1613.7. Samples: 11245928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:42,938][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:05:47,194][42004] Updated weights for policy 0, policy_version 15876 (0.0036) +[2024-11-08 01:05:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 65032192. Throughput: 0: 1602.8. Samples: 11250776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:47,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:05:52,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6280.4, 300 sec: 6678.5). Total num frames: 65052672. Throughput: 0: 1544.6. Samples: 11258340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:52,937][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 01:05:55,051][42004] Updated weights for policy 0, policy_version 15886 (0.0029) +[2024-11-08 01:05:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6280.5, 300 sec: 6664.7). Total num frames: 65085440. Throughput: 0: 1528.8. Samples: 11267898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:57,933][41694] Avg episode reward: [(0, '4.145')] +[2024-11-08 01:06:00,975][42004] Updated weights for policy 0, policy_version 15896 (0.0039) +[2024-11-08 01:06:02,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6280.5, 300 sec: 6650.8). Total num frames: 65118208. Throughput: 0: 1546.4. Samples: 11273340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:02,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:06:07,800][42004] Updated weights for policy 0, policy_version 15906 (0.0027) +[2024-11-08 01:06:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6212.3, 300 sec: 6636.9). Total num frames: 65150976. Throughput: 0: 1618.6. Samples: 11282266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:07,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 01:06:12,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6212.3, 300 sec: 6669.6). Total num frames: 65183744. Throughput: 0: 1619.5. Samples: 11292502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:12,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 01:06:13,712][42004] Updated weights for policy 0, policy_version 15916 (0.0031) +[2024-11-08 01:06:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 65220608. Throughput: 0: 1622.5. Samples: 11297428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:17,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:06:19,590][42004] Updated weights for policy 0, policy_version 15926 (0.0034) +[2024-11-08 01:06:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 65257472. Throughput: 0: 1623.3. Samples: 11308352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:22,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:06:26,916][42004] Updated weights for policy 0, policy_version 15936 (0.0028) +[2024-11-08 01:06:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 65277952. Throughput: 0: 1559.6. Samples: 11316108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:27,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 01:06:32,570][42004] Updated weights for policy 0, policy_version 15946 (0.0025) +[2024-11-08 01:06:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 65314816. Throughput: 0: 1570.4. Samples: 11321444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:32,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 01:06:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 65351680. Throughput: 0: 1647.2. Samples: 11332464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:37,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 01:06:38,478][42004] Updated weights for policy 0, policy_version 15956 (0.0029) +[2024-11-08 01:06:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6581.4). Total num frames: 65384448. Throughput: 0: 1666.9. Samples: 11342910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:42,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:06:44,072][42004] Updated weights for policy 0, policy_version 15966 (0.0049) +[2024-11-08 01:06:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6632.6). Total num frames: 65425408. Throughput: 0: 1674.0. Samples: 11348668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:06:49,605][42004] Updated weights for policy 0, policy_version 15976 (0.0035) +[2024-11-08 01:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.5, 300 sec: 6623.0). Total num frames: 65458176. Throughput: 0: 1719.2. Samples: 11359632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:52,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:06:55,519][42004] Updated weights for policy 0, policy_version 15986 (0.0029) +[2024-11-08 01:06:59,380][41694] Fps is (10 sec: 5366.5, 60 sec: 6532.4, 300 sec: 6576.8). Total num frames: 65486848. Throughput: 0: 1656.3. Samples: 11369436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:59,384][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:07:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 65511424. Throughput: 0: 1638.9. Samples: 11371180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:02,936][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 01:07:04,406][42004] Updated weights for policy 0, policy_version 15996 (0.0036) +[2024-11-08 01:07:07,932][41694] Fps is (10 sec: 6705.8, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 65544192. Throughput: 0: 1599.3. Samples: 11380322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:07,937][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 01:07:10,214][42004] Updated weights for policy 0, policy_version 16006 (0.0026) +[2024-11-08 01:07:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6525.8). Total num frames: 65572864. Throughput: 0: 1643.2. Samples: 11390052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:12,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:07:16,347][42004] Updated weights for policy 0, policy_version 16016 (0.0036) +[2024-11-08 01:07:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 65609728. Throughput: 0: 1640.7. Samples: 11395274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:17,934][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 01:07:22,077][42004] Updated weights for policy 0, policy_version 16026 (0.0029) +[2024-11-08 01:07:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 65646592. Throughput: 0: 1638.3. Samples: 11406188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:22,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 01:07:27,412][42004] Updated weights for policy 0, policy_version 16036 (0.0025) +[2024-11-08 01:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6553.6). Total num frames: 65683456. Throughput: 0: 1660.3. Samples: 11417622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:27,934][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 01:07:33,373][41694] Fps is (10 sec: 6276.5, 60 sec: 6573.5, 300 sec: 6529.9). Total num frames: 65712128. Throughput: 0: 1636.8. Samples: 11423048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:33,374][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:07:34,663][42004] Updated weights for policy 0, policy_version 16046 (0.0025) +[2024-11-08 01:07:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6539.7). Total num frames: 65744896. Throughput: 0: 1587.4. Samples: 11431064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:37,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 01:07:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016051_65744896.pth... +[2024-11-08 01:07:38,099][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015673_64196608.pth +[2024-11-08 01:07:40,396][42004] Updated weights for policy 0, policy_version 16056 (0.0026) +[2024-11-08 01:07:42,932][41694] Fps is (10 sec: 7284.6, 60 sec: 6621.8, 300 sec: 6553.7). Total num frames: 65781760. Throughput: 0: 1659.3. Samples: 11441702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:42,937][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:07:46,562][42004] Updated weights for policy 0, policy_version 16066 (0.0025) +[2024-11-08 01:07:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6539.7). Total num frames: 65814528. Throughput: 0: 1673.6. Samples: 11446492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:47,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:07:52,407][42004] Updated weights for policy 0, policy_version 16076 (0.0032) +[2024-11-08 01:07:52,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6485.3, 300 sec: 6525.8). Total num frames: 65847296. Throughput: 0: 1700.6. Samples: 11456848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 01:07:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6785.7, 300 sec: 6567.5). Total num frames: 65884160. Throughput: 0: 1724.8. Samples: 11467668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:07:57,934][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:07:57,976][42004] Updated weights for policy 0, policy_version 16086 (0.0042) +[2024-11-08 01:08:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6567.5). Total num frames: 65916928. Throughput: 0: 1729.5. Samples: 11473100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:08:02,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 01:08:04,033][42004] Updated weights for policy 0, policy_version 16096 (0.0047) +[2024-11-08 01:08:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6525.8). Total num frames: 65941504. Throughput: 0: 1691.2. Samples: 11482294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:08:07,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 01:08:11,656][42004] Updated weights for policy 0, policy_version 16106 (0.0034) +[2024-11-08 01:08:12,940][41694] Fps is (10 sec: 6138.6, 60 sec: 6757.4, 300 sec: 6525.6). Total num frames: 65978368. Throughput: 0: 1621.2. Samples: 11490592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:08:12,942][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 01:08:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 66007040. Throughput: 0: 1626.7. Samples: 11495532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:17,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:08:17,995][42004] Updated weights for policy 0, policy_version 16116 (0.0029) +[2024-11-08 01:08:22,932][41694] Fps is (10 sec: 6149.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 66039808. Throughput: 0: 1641.7. Samples: 11504940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:22,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:08:24,358][42004] Updated weights for policy 0, policy_version 16126 (0.0031) +[2024-11-08 01:08:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6553.5, 300 sec: 6470.3). Total num frames: 66076672. Throughput: 0: 1639.1. Samples: 11515460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:27,938][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 01:08:30,142][42004] Updated weights for policy 0, policy_version 16136 (0.0034) +[2024-11-08 01:08:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6739.7, 300 sec: 6511.9). Total num frames: 66113536. Throughput: 0: 1646.7. Samples: 11520596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:32,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:08:35,608][42004] Updated weights for policy 0, policy_version 16146 (0.0032) +[2024-11-08 01:08:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6511.9). Total num frames: 66150400. Throughput: 0: 1664.2. Samples: 11531738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:37,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 01:08:42,816][42004] Updated weights for policy 0, policy_version 16156 (0.0030) +[2024-11-08 01:08:42,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 66174976. Throughput: 0: 1603.2. Samples: 11539812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:42,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:08:47,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 66211840. Throughput: 0: 1602.3. Samples: 11545202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:47,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:08:48,360][42004] Updated weights for policy 0, policy_version 16166 (0.0026) +[2024-11-08 01:08:52,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6484.2). Total num frames: 66244608. Throughput: 0: 1638.6. Samples: 11556030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:52,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:08:54,474][42004] Updated weights for policy 0, policy_version 16176 (0.0031) +[2024-11-08 01:08:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 66281472. Throughput: 0: 1682.1. Samples: 11566274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:57,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:09:00,267][42004] Updated weights for policy 0, policy_version 16186 (0.0039) +[2024-11-08 01:09:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6536.1). Total num frames: 66314240. Throughput: 0: 1691.3. Samples: 11571642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:02,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:09:06,095][42004] Updated weights for policy 0, policy_version 16196 (0.0033) +[2024-11-08 01:09:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6553.6). Total num frames: 66351104. Throughput: 0: 1715.8. Samples: 11582150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:07,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:09:11,639][42004] Updated weights for policy 0, policy_version 16206 (0.0028) +[2024-11-08 01:09:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6827.6, 300 sec: 6567.5). Total num frames: 66387968. Throughput: 0: 1725.6. Samples: 11593110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:12,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 01:09:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6539.7). Total num frames: 66412544. Throughput: 0: 1667.3. Samples: 11595626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:17,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 01:09:18,949][42004] Updated weights for policy 0, policy_version 16216 (0.0028) +[2024-11-08 01:09:22,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6826.7, 300 sec: 6539.7). Total num frames: 66449408. Throughput: 0: 1658.6. Samples: 11606376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:22,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:09:24,789][42004] Updated weights for policy 0, policy_version 16226 (0.0035) +[2024-11-08 01:09:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 66482176. Throughput: 0: 1705.0. Samples: 11616538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:27,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:09:30,794][42004] Updated weights for policy 0, policy_version 16236 (0.0034) +[2024-11-08 01:09:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.2, 300 sec: 6525.8). Total num frames: 66514944. Throughput: 0: 1702.0. Samples: 11621794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:32,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:09:36,210][42004] Updated weights for policy 0, policy_version 16246 (0.0031) +[2024-11-08 01:09:37,933][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6570.3). Total num frames: 66551808. Throughput: 0: 1710.8. Samples: 11633018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:37,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:09:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016249_66555904.pth... +[2024-11-08 01:09:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015860_64962560.pth +[2024-11-08 01:09:41,913][42004] Updated weights for policy 0, policy_version 16256 (0.0044) +[2024-11-08 01:09:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6581.4). Total num frames: 66588672. Throughput: 0: 1724.9. Samples: 11643896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:09:49,066][41694] Fps is (10 sec: 6253.9, 60 sec: 6700.0, 300 sec: 6570.0). Total num frames: 66621440. Throughput: 0: 1681.2. Samples: 11649202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:49,068][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:09:49,249][42004] Updated weights for policy 0, policy_version 16266 (0.0030) +[2024-11-08 01:09:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6581.4). Total num frames: 66650112. Throughput: 0: 1671.2. Samples: 11657354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:52,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:09:54,771][42004] Updated weights for policy 0, policy_version 16276 (0.0036) +[2024-11-08 01:09:57,932][41694] Fps is (10 sec: 7392.3, 60 sec: 6758.4, 300 sec: 6595.2). Total num frames: 66686976. Throughput: 0: 1673.7. Samples: 11668424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:57,935][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:10:01,114][42004] Updated weights for policy 0, policy_version 16286 (0.0032) +[2024-11-08 01:10:02,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6567.5). Total num frames: 66715648. Throughput: 0: 1713.5. Samples: 11672736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 01:10:07,222][42004] Updated weights for policy 0, policy_version 16296 (0.0029) +[2024-11-08 01:10:07,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6690.1, 300 sec: 6581.4). Total num frames: 66752512. Throughput: 0: 1687.2. Samples: 11682298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:07,933][41694] Avg episode reward: [(0, '4.164')] +[2024-11-08 01:10:12,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 66785280. Throughput: 0: 1700.4. Samples: 11693056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:12,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 01:10:13,116][42004] Updated weights for policy 0, policy_version 16306 (0.0024) +[2024-11-08 01:10:17,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 66818048. Throughput: 0: 1684.6. Samples: 11697602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:17,934][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 01:10:19,352][42004] Updated weights for policy 0, policy_version 16316 (0.0024) +[2024-11-08 01:10:23,024][41694] Fps is (10 sec: 5682.1, 60 sec: 6543.5, 300 sec: 6607.1). Total num frames: 66842624. Throughput: 0: 1661.0. Samples: 11707916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:23,026][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:10:26,594][42004] Updated weights for policy 0, policy_version 16326 (0.0027) +[2024-11-08 01:10:27,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 66879488. Throughput: 0: 1601.9. Samples: 11715984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:27,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 01:10:32,932][41694] Fps is (10 sec: 6614.3, 60 sec: 6553.5, 300 sec: 6595.2). Total num frames: 66908160. Throughput: 0: 1628.6. Samples: 11720642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:32,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:10:33,243][42004] Updated weights for policy 0, policy_version 16336 (0.0026) +[2024-11-08 01:10:37,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 66940928. Throughput: 0: 1612.2. Samples: 11729906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:37,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:10:39,510][42004] Updated weights for policy 0, policy_version 16346 (0.0048) +[2024-11-08 01:10:42,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 66977792. Throughput: 0: 1598.9. Samples: 11740376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:10:42,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 01:10:45,367][42004] Updated weights for policy 0, policy_version 16356 (0.0041) +[2024-11-08 01:10:47,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6610.3, 300 sec: 6636.9). Total num frames: 67010560. Throughput: 0: 1617.8. Samples: 11745534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:10:47,935][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:10:50,887][42004] Updated weights for policy 0, policy_version 16366 (0.0021) +[2024-11-08 01:10:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 67047424. Throughput: 0: 1652.8. Samples: 11756674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:52,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:10:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6417.0, 300 sec: 6623.0). Total num frames: 67072000. Throughput: 0: 1592.3. Samples: 11764708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:57,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:10:58,018][42004] Updated weights for policy 0, policy_version 16376 (0.0037) +[2024-11-08 01:11:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 67108864. Throughput: 0: 1615.8. Samples: 11770310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:02,934][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:11:03,825][42004] Updated weights for policy 0, policy_version 16386 (0.0033) +[2024-11-08 01:11:07,940][41694] Fps is (10 sec: 6957.8, 60 sec: 6484.4, 300 sec: 6636.7). Total num frames: 67141632. Throughput: 0: 1620.7. Samples: 11780712. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:07,944][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 01:11:10,053][42004] Updated weights for policy 0, policy_version 16396 (0.0055) +[2024-11-08 01:11:12,937][41694] Fps is (10 sec: 6550.1, 60 sec: 6484.8, 300 sec: 6622.9). Total num frames: 67174400. Throughput: 0: 1640.0. Samples: 11789794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:12,939][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 01:11:16,204][42004] Updated weights for policy 0, policy_version 16406 (0.0044) +[2024-11-08 01:11:17,932][41694] Fps is (10 sec: 6558.8, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 67207168. Throughput: 0: 1652.4. Samples: 11794998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:11:17,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 01:11:21,862][42004] Updated weights for policy 0, policy_version 16416 (0.0028) +[2024-11-08 01:11:22,932][41694] Fps is (10 sec: 7376.4, 60 sec: 6768.8, 300 sec: 6678.6). Total num frames: 67248128. Throughput: 0: 1691.8. Samples: 11806036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:11:22,935][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 01:11:27,610][42004] Updated weights for policy 0, policy_version 16426 (0.0029) +[2024-11-08 01:11:27,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.0, 300 sec: 6664.7). Total num frames: 67280896. Throughput: 0: 1699.7. Samples: 11816864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:27,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 01:11:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 67305472. Throughput: 0: 1670.5. Samples: 11820706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:32,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:11:34,846][42004] Updated weights for policy 0, policy_version 16436 (0.0029) +[2024-11-08 01:11:37,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 67342336. Throughput: 0: 1635.9. Samples: 11830288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:37,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 01:11:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016441_67342336.pth... +[2024-11-08 01:11:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016051_65744896.pth +[2024-11-08 01:11:40,738][42004] Updated weights for policy 0, policy_version 16446 (0.0028) +[2024-11-08 01:11:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 67375104. Throughput: 0: 1674.4. Samples: 11840056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:42,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:11:47,255][42004] Updated weights for policy 0, policy_version 16456 (0.0034) +[2024-11-08 01:11:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 67407872. Throughput: 0: 1647.4. Samples: 11844442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:47,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:11:52,576][42004] Updated weights for policy 0, policy_version 16466 (0.0034) +[2024-11-08 01:11:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6669.7). Total num frames: 67444736. Throughput: 0: 1667.1. Samples: 11855720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:11:52,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 01:11:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 67481600. Throughput: 0: 1710.0. Samples: 11866734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:11:57,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:11:58,101][42004] Updated weights for policy 0, policy_version 16476 (0.0032) +[2024-11-08 01:12:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 67518464. Throughput: 0: 1722.6. Samples: 11872516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:02,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 01:12:05,525][42004] Updated weights for policy 0, policy_version 16486 (0.0031) +[2024-11-08 01:12:07,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6691.1, 300 sec: 6678.6). Total num frames: 67543040. Throughput: 0: 1642.4. Samples: 11879944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:07,933][41694] Avg episode reward: [(0, '4.725')] +[2024-11-08 01:12:11,058][42004] Updated weights for policy 0, policy_version 16496 (0.0027) +[2024-11-08 01:12:12,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.9, 300 sec: 6678.6). Total num frames: 67579904. Throughput: 0: 1651.0. Samples: 11891158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:12,935][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:12:17,058][42004] Updated weights for policy 0, policy_version 16506 (0.0047) +[2024-11-08 01:12:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 67612672. Throughput: 0: 1674.6. Samples: 11896064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:12:22,800][42004] Updated weights for policy 0, policy_version 16516 (0.0037) +[2024-11-08 01:12:22,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 67649536. Throughput: 0: 1696.0. Samples: 11906610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:22,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 01:12:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6702.5). Total num frames: 67686400. Throughput: 0: 1725.9. Samples: 11917722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:27,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 01:12:28,412][42004] Updated weights for policy 0, policy_version 16526 (0.0024) +[2024-11-08 01:12:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 67723264. Throughput: 0: 1748.9. Samples: 11923142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:32,933][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 01:12:34,000][42004] Updated weights for policy 0, policy_version 16536 (0.0036) +[2024-11-08 01:12:38,822][41694] Fps is (10 sec: 6017.5, 60 sec: 6726.8, 300 sec: 6658.5). Total num frames: 67751936. Throughput: 0: 1711.4. Samples: 11934256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:38,824][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 01:12:41,102][42004] Updated weights for policy 0, policy_version 16546 (0.0025) +[2024-11-08 01:12:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 67784704. Throughput: 0: 1676.5. Samples: 11942174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:42,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 01:12:46,801][42004] Updated weights for policy 0, policy_version 16556 (0.0034) +[2024-11-08 01:12:47,932][41694] Fps is (10 sec: 7194.4, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 67817472. Throughput: 0: 1668.2. Samples: 11947586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:47,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 01:12:52,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6758.3, 300 sec: 6664.7). Total num frames: 67850240. Throughput: 0: 1725.8. Samples: 11957606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:12:52,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:12:53,180][42004] Updated weights for policy 0, policy_version 16566 (0.0028) +[2024-11-08 01:12:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 67891200. Throughput: 0: 1722.0. Samples: 11968648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:12:57,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 01:12:58,420][42004] Updated weights for policy 0, policy_version 16576 (0.0024) +[2024-11-08 01:13:02,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 67923968. Throughput: 0: 1742.9. Samples: 11974496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:13:02,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:13:04,509][42004] Updated weights for policy 0, policy_version 16586 (0.0029) +[2024-11-08 01:13:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6720.4). Total num frames: 67960832. Throughput: 0: 1726.5. Samples: 11984304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:07,935][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:13:10,030][42004] Updated weights for policy 0, policy_version 16596 (0.0035) +[2024-11-08 01:13:12,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 67985408. Throughput: 0: 1689.6. Samples: 11993752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:12,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:13:17,235][42004] Updated weights for policy 0, policy_version 16606 (0.0043) +[2024-11-08 01:13:17,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68022272. Throughput: 0: 1657.9. Samples: 11997746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:13:17,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 01:13:22,860][42004] Updated weights for policy 0, policy_version 16616 (0.0027) +[2024-11-08 01:13:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68059136. Throughput: 0: 1697.9. Samples: 12009150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:13:22,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 01:13:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68096000. Throughput: 0: 1727.2. Samples: 12019900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:27,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 01:13:28,367][42004] Updated weights for policy 0, policy_version 16626 (0.0039) +[2024-11-08 01:13:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68132864. Throughput: 0: 1732.9. Samples: 12025568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:32,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:13:33,731][42004] Updated weights for policy 0, policy_version 16636 (0.0029) +[2024-11-08 01:13:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7068.1, 300 sec: 6761.9). Total num frames: 68169728. Throughput: 0: 1760.3. Samples: 12036820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:37,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 01:13:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016643_68169728.pth... +[2024-11-08 01:13:38,061][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016249_66555904.pth +[2024-11-08 01:13:39,284][42004] Updated weights for policy 0, policy_version 16646 (0.0024) +[2024-11-08 01:13:42,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.3, 300 sec: 6761.8). Total num frames: 68206592. Throughput: 0: 1765.3. Samples: 12048090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:42,936][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 01:13:44,816][42004] Updated weights for policy 0, policy_version 16656 (0.0032) +[2024-11-08 01:13:47,934][41694] Fps is (10 sec: 6552.7, 60 sec: 6963.0, 300 sec: 6748.0). Total num frames: 68235264. Throughput: 0: 1752.6. Samples: 12053366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:47,937][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 01:13:51,591][42004] Updated weights for policy 0, policy_version 16666 (0.0029) +[2024-11-08 01:13:52,932][41694] Fps is (10 sec: 6554.3, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 68272128. Throughput: 0: 1737.0. Samples: 12062468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:52,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 01:13:57,246][42004] Updated weights for policy 0, policy_version 16676 (0.0029) +[2024-11-08 01:13:57,932][41694] Fps is (10 sec: 7373.8, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 68308992. Throughput: 0: 1766.3. Samples: 12073234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:57,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:14:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 68341760. Throughput: 0: 1795.1. Samples: 12078524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:02,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 01:14:03,238][42004] Updated weights for policy 0, policy_version 16686 (0.0027) +[2024-11-08 01:14:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 68382720. Throughput: 0: 1788.0. Samples: 12089612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:07,934][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 01:14:08,277][42004] Updated weights for policy 0, policy_version 16696 (0.0021) +[2024-11-08 01:14:12,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7236.2, 300 sec: 6803.5). Total num frames: 68419584. Throughput: 0: 1816.1. Samples: 12101628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:12,936][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 01:14:13,448][42004] Updated weights for policy 0, policy_version 16706 (0.0030) +[2024-11-08 01:14:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 6817.4). Total num frames: 68460544. Throughput: 0: 1820.8. Samples: 12107502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:17,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 01:14:20,692][42004] Updated weights for policy 0, policy_version 16716 (0.0030) +[2024-11-08 01:14:22,932][41694] Fps is (10 sec: 6554.0, 60 sec: 7099.7, 300 sec: 6789.6). Total num frames: 68485120. Throughput: 0: 1742.2. Samples: 12115220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:22,934][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 01:14:26,109][42004] Updated weights for policy 0, policy_version 16726 (0.0033) +[2024-11-08 01:14:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7099.7, 300 sec: 6803.5). Total num frames: 68521984. Throughput: 0: 1748.4. Samples: 12126766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:27,935][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:14:32,259][42004] Updated weights for policy 0, policy_version 16736 (0.0045) +[2024-11-08 01:14:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 68554752. Throughput: 0: 1729.5. Samples: 12131192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:32,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:14:37,491][42004] Updated weights for policy 0, policy_version 16746 (0.0025) +[2024-11-08 01:14:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 68591616. Throughput: 0: 1780.6. Samples: 12142596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:37,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:14:42,934][41694] Fps is (10 sec: 7371.3, 60 sec: 7031.4, 300 sec: 6829.7). Total num frames: 68628480. Throughput: 0: 1795.7. Samples: 12154044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:42,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 01:14:43,002][42004] Updated weights for policy 0, policy_version 16756 (0.0039) +[2024-11-08 01:14:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.1, 300 sec: 6831.3). Total num frames: 68665344. Throughput: 0: 1780.9. Samples: 12158666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:47,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:14:48,988][42004] Updated weights for policy 0, policy_version 16766 (0.0034) +[2024-11-08 01:14:54,217][41694] Fps is (10 sec: 5807.9, 60 sec: 6883.9, 300 sec: 6774.0). Total num frames: 68694016. Throughput: 0: 1729.9. Samples: 12169682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:14:54,219][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:14:56,672][42004] Updated weights for policy 0, policy_version 16776 (0.0028) +[2024-11-08 01:14:57,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 68722688. Throughput: 0: 1674.9. Samples: 12176998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:14:57,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 01:15:02,847][42004] Updated weights for policy 0, policy_version 16786 (0.0033) +[2024-11-08 01:15:02,931][41694] Fps is (10 sec: 7050.7, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 68755456. Throughput: 0: 1661.6. Samples: 12182272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:15:02,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 01:15:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 68788224. Throughput: 0: 1693.8. Samples: 12191440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:15:08,861][42004] Updated weights for policy 0, policy_version 16796 (0.0031) +[2024-11-08 01:15:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 68825088. Throughput: 0: 1678.8. Samples: 12202310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:12,934][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:15:14,980][42004] Updated weights for policy 0, policy_version 16806 (0.0037) +[2024-11-08 01:15:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6819.5). Total num frames: 68853760. Throughput: 0: 1682.8. Samples: 12206918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:17,935][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:15:20,971][42004] Updated weights for policy 0, policy_version 16816 (0.0037) +[2024-11-08 01:15:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 68890624. Throughput: 0: 1657.8. Samples: 12217198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:22,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:15:28,164][42004] Updated weights for policy 0, policy_version 16826 (0.0032) +[2024-11-08 01:15:28,165][41694] Fps is (10 sec: 6404.2, 60 sec: 6596.2, 300 sec: 6812.0). Total num frames: 68919296. Throughput: 0: 1522.9. Samples: 12222928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:28,172][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:15:32,934][41694] Fps is (10 sec: 6142.5, 60 sec: 6621.6, 300 sec: 6817.4). Total num frames: 68952064. Throughput: 0: 1602.9. Samples: 12230802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:32,936][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 01:15:33,537][42004] Updated weights for policy 0, policy_version 16836 (0.0034) +[2024-11-08 01:15:37,932][41694] Fps is (10 sec: 7129.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 68988928. Throughput: 0: 1651.5. Samples: 12241878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:37,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 01:15:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016843_68988928.pth... +[2024-11-08 01:15:38,164][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016441_67342336.pth +[2024-11-08 01:15:39,930][42004] Updated weights for policy 0, policy_version 16846 (0.0030) +[2024-11-08 01:15:42,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6553.8, 300 sec: 6817.4). Total num frames: 69021696. Throughput: 0: 1661.2. Samples: 12251750. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:42,935][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 01:15:45,311][42004] Updated weights for policy 0, policy_version 16856 (0.0033) +[2024-11-08 01:15:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 69058560. Throughput: 0: 1672.8. Samples: 12257548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:47,935][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:15:50,785][42004] Updated weights for policy 0, policy_version 16866 (0.0029) +[2024-11-08 01:15:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6836.7, 300 sec: 6859.1). Total num frames: 69095424. Throughput: 0: 1717.9. Samples: 12268744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:52,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:15:56,178][42004] Updated weights for policy 0, policy_version 16876 (0.0030) +[2024-11-08 01:15:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69132288. Throughput: 0: 1725.2. Samples: 12279944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:57,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 01:16:02,932][41694] Fps is (10 sec: 6143.4, 60 sec: 6690.0, 300 sec: 6831.5). Total num frames: 69156864. Throughput: 0: 1745.8. Samples: 12285480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:16:02,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 01:16:03,779][42004] Updated weights for policy 0, policy_version 16886 (0.0032) +[2024-11-08 01:16:07,933][41694] Fps is (10 sec: 6143.3, 60 sec: 6758.3, 300 sec: 6845.3). Total num frames: 69193728. Throughput: 0: 1681.7. Samples: 12292878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:07,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:16:09,367][42004] Updated weights for policy 0, policy_version 16896 (0.0031) +[2024-11-08 01:16:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 69222400. Throughput: 0: 1778.7. Samples: 12302554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:12,936][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:16:15,994][42004] Updated weights for policy 0, policy_version 16906 (0.0035) +[2024-11-08 01:16:17,932][41694] Fps is (10 sec: 6554.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 69259264. Throughput: 0: 1701.3. Samples: 12307358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:17,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 01:16:21,462][42004] Updated weights for policy 0, policy_version 16916 (0.0037) +[2024-11-08 01:16:22,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 69296128. Throughput: 0: 1707.3. Samples: 12318708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:22,933][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 01:16:26,655][42004] Updated weights for policy 0, policy_version 16926 (0.0032) +[2024-11-08 01:16:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6990.3, 300 sec: 6886.8). Total num frames: 69337088. Throughput: 0: 1745.4. Samples: 12330292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:27,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:16:32,129][42004] Updated weights for policy 0, policy_version 16936 (0.0034) +[2024-11-08 01:16:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.7, 300 sec: 6886.8). Total num frames: 69373952. Throughput: 0: 1740.5. Samples: 12335868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:32,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 01:16:37,931][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69398528. Throughput: 0: 1690.7. Samples: 12344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:37,934][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 01:16:39,346][42004] Updated weights for policy 0, policy_version 16946 (0.0036) +[2024-11-08 01:16:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69435392. Throughput: 0: 1676.6. Samples: 12355390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:42,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 01:16:45,055][42004] Updated weights for policy 0, policy_version 16956 (0.0033) +[2024-11-08 01:16:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69468160. Throughput: 0: 1666.4. Samples: 12360468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:47,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:16:50,594][42004] Updated weights for policy 0, policy_version 16966 (0.0024) +[2024-11-08 01:16:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69509120. Throughput: 0: 1747.4. Samples: 12371508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:52,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 01:16:56,074][42004] Updated weights for policy 0, policy_version 16976 (0.0030) +[2024-11-08 01:16:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69545984. Throughput: 0: 1775.5. Samples: 12382450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:16:57,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:17:02,070][42004] Updated weights for policy 0, policy_version 16986 (0.0029) +[2024-11-08 01:17:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 69578752. Throughput: 0: 1796.1. Samples: 12388182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:17:07,423][42004] Updated weights for policy 0, policy_version 16996 (0.0033) +[2024-11-08 01:17:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 69615616. Throughput: 0: 1781.0. Samples: 12398852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:07,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:17:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 69640192. Throughput: 0: 1696.4. Samples: 12406630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:12,934][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 01:17:14,615][42004] Updated weights for policy 0, policy_version 17006 (0.0034) +[2024-11-08 01:17:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 69677056. Throughput: 0: 1704.0. Samples: 12412550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:17,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 01:17:20,554][42004] Updated weights for policy 0, policy_version 17016 (0.0041) +[2024-11-08 01:17:22,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6963.0, 300 sec: 6872.9). Total num frames: 69713920. Throughput: 0: 1728.7. Samples: 12422622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:17:22,938][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:17:26,007][42004] Updated weights for policy 0, policy_version 17026 (0.0023) +[2024-11-08 01:17:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 69750784. Throughput: 0: 1744.4. Samples: 12433890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:17:27,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:17:31,430][42004] Updated weights for policy 0, policy_version 17036 (0.0036) +[2024-11-08 01:17:32,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6894.9, 300 sec: 6921.6). Total num frames: 69787648. Throughput: 0: 1757.6. Samples: 12439562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:32,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:17:36,837][42004] Updated weights for policy 0, policy_version 17046 (0.0033) +[2024-11-08 01:17:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 69828608. Throughput: 0: 1768.6. Samples: 12451094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:37,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 01:17:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017048_69828608.pth... +[2024-11-08 01:17:38,051][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016643_68169728.pth +[2024-11-08 01:17:42,313][42004] Updated weights for policy 0, policy_version 17056 (0.0034) +[2024-11-08 01:17:44,322][41694] Fps is (10 sec: 6473.1, 60 sec: 6939.0, 300 sec: 6896.0). Total num frames: 69861376. Throughput: 0: 1722.7. Samples: 12462366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:44,323][41694] Avg episode reward: [(0, '4.218')] +[2024-11-08 01:17:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 69885952. Throughput: 0: 1694.6. Samples: 12464438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:47,934][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 01:17:49,774][42004] Updated weights for policy 0, policy_version 17066 (0.0042) +[2024-11-08 01:17:52,931][41694] Fps is (10 sec: 7135.9, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 69922816. Throughput: 0: 1705.6. Samples: 12475606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:17:55,108][42004] Updated weights for policy 0, policy_version 17076 (0.0028) +[2024-11-08 01:17:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.1, 300 sec: 6914.6). Total num frames: 69963776. Throughput: 0: 1787.0. Samples: 12487044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:17:57,935][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 01:18:01,000][42004] Updated weights for policy 0, policy_version 17086 (0.0032) +[2024-11-08 01:18:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 69992448. Throughput: 0: 1761.4. Samples: 12491812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:02,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 01:18:06,795][42004] Updated weights for policy 0, policy_version 17096 (0.0025) +[2024-11-08 01:18:07,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 70033408. Throughput: 0: 1771.7. Samples: 12502344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:07,934][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 01:18:12,269][42004] Updated weights for policy 0, policy_version 17106 (0.0030) +[2024-11-08 01:18:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 70070272. Throughput: 0: 1771.6. Samples: 12513614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:12,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:18:18,275][41694] Fps is (10 sec: 5939.7, 60 sec: 6923.5, 300 sec: 6892.7). Total num frames: 70094848. Throughput: 0: 1754.5. Samples: 12519118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:18,277][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:18:19,536][42004] Updated weights for policy 0, policy_version 17116 (0.0036) +[2024-11-08 01:18:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.4, 300 sec: 6900.7). Total num frames: 70131712. Throughput: 0: 1691.6. Samples: 12527216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:18:24,811][42004] Updated weights for policy 0, policy_version 17126 (0.0027) +[2024-11-08 01:18:27,931][41694] Fps is (10 sec: 7635.4, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70168576. Throughput: 0: 1749.0. Samples: 12538640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:27,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 01:18:30,189][42004] Updated weights for policy 0, policy_version 17136 (0.0033) +[2024-11-08 01:18:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70205440. Throughput: 0: 1780.0. Samples: 12544538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:32,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 01:18:36,365][42004] Updated weights for policy 0, policy_version 17146 (0.0026) +[2024-11-08 01:18:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6886.9). Total num frames: 70238208. Throughput: 0: 1751.3. Samples: 12554416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:18:37,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 01:18:41,668][42004] Updated weights for policy 0, policy_version 17156 (0.0032) +[2024-11-08 01:18:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7128.3, 300 sec: 6928.5). Total num frames: 70279168. Throughput: 0: 1753.8. Samples: 12565966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:18:42,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:18:47,163][42004] Updated weights for policy 0, policy_version 17166 (0.0030) +[2024-11-08 01:18:47,934][41694] Fps is (10 sec: 7780.5, 60 sec: 7167.7, 300 sec: 6928.4). Total num frames: 70316032. Throughput: 0: 1769.6. Samples: 12571448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:47,936][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 01:18:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 70340608. Throughput: 0: 1766.4. Samples: 12581834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:52,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:18:54,339][42004] Updated weights for policy 0, policy_version 17176 (0.0025) +[2024-11-08 01:18:57,932][41694] Fps is (10 sec: 6555.1, 60 sec: 6963.3, 300 sec: 6914.6). Total num frames: 70381568. Throughput: 0: 1723.2. Samples: 12591156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:57,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:18:59,827][42004] Updated weights for policy 0, policy_version 17186 (0.0032) +[2024-11-08 01:19:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 70414336. Throughput: 0: 1727.2. Samples: 12596246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:19:02,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 01:19:05,873][42004] Updated weights for policy 0, policy_version 17196 (0.0030) +[2024-11-08 01:19:07,934][41694] Fps is (10 sec: 6552.3, 60 sec: 6894.7, 300 sec: 6872.9). Total num frames: 70447104. Throughput: 0: 1760.1. Samples: 12606426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:19:07,936][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:19:12,227][42004] Updated weights for policy 0, policy_version 17206 (0.0044) +[2024-11-08 01:19:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 70479872. Throughput: 0: 1724.4. Samples: 12616238. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:12,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:19:17,911][42004] Updated weights for policy 0, policy_version 17216 (0.0039) +[2024-11-08 01:19:17,931][41694] Fps is (10 sec: 6964.7, 60 sec: 7072.0, 300 sec: 6886.8). Total num frames: 70516736. Throughput: 0: 1711.5. Samples: 12621554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:17,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:19:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7031.4, 300 sec: 6886.8). Total num frames: 70553600. Throughput: 0: 1735.7. Samples: 12632522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:22,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:19:23,478][42004] Updated weights for policy 0, policy_version 17226 (0.0033) +[2024-11-08 01:19:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 70578176. Throughput: 0: 1657.2. Samples: 12640542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:27,933][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 01:19:30,543][42004] Updated weights for policy 0, policy_version 17236 (0.0024) +[2024-11-08 01:19:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 70615040. Throughput: 0: 1663.4. Samples: 12646300. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:32,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:19:36,075][42004] Updated weights for policy 0, policy_version 17246 (0.0028) +[2024-11-08 01:19:37,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 70651904. Throughput: 0: 1680.8. Samples: 12657472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:37,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:19:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017249_70651904.pth... +[2024-11-08 01:19:38,136][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016843_68988928.pth +[2024-11-08 01:19:42,288][42004] Updated weights for policy 0, policy_version 17256 (0.0046) +[2024-11-08 01:19:42,933][41694] Fps is (10 sec: 6553.1, 60 sec: 6690.0, 300 sec: 6831.3). Total num frames: 70680576. Throughput: 0: 1691.8. Samples: 12667288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:42,935][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 01:19:47,790][42004] Updated weights for policy 0, policy_version 17266 (0.0038) +[2024-11-08 01:19:47,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6758.7, 300 sec: 6903.0). Total num frames: 70721536. Throughput: 0: 1696.3. Samples: 12672580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:47,934][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 01:19:52,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70758400. Throughput: 0: 1716.2. Samples: 12683652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:19:52,939][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 01:19:53,410][42004] Updated weights for policy 0, policy_version 17276 (0.0031) +[2024-11-08 01:19:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 70795264. Throughput: 0: 1749.9. Samples: 12694982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:19:57,938][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:20:00,617][42004] Updated weights for policy 0, policy_version 17286 (0.0022) +[2024-11-08 01:20:02,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 70815744. Throughput: 0: 1697.9. Samples: 12697960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:20:02,936][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 01:20:06,367][42004] Updated weights for policy 0, policy_version 17296 (0.0025) +[2024-11-08 01:20:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.6, 300 sec: 6872.9). Total num frames: 70852608. Throughput: 0: 1676.7. Samples: 12707974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:20:07,935][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 01:20:12,094][42004] Updated weights for policy 0, policy_version 17306 (0.0057) +[2024-11-08 01:20:12,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 70889472. Throughput: 0: 1738.3. Samples: 12718766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:20:12,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:20:17,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6690.0, 300 sec: 6872.9). Total num frames: 70918144. Throughput: 0: 1706.7. Samples: 12723104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:17,936][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 01:20:19,103][42004] Updated weights for policy 0, policy_version 17316 (0.0024) +[2024-11-08 01:20:22,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6621.9, 300 sec: 6892.3). Total num frames: 70950912. Throughput: 0: 1661.4. Samples: 12732234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:22,936][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 01:20:24,894][42004] Updated weights for policy 0, policy_version 17326 (0.0031) +[2024-11-08 01:20:27,932][41694] Fps is (10 sec: 6554.2, 60 sec: 6758.4, 300 sec: 6886.9). Total num frames: 70983680. Throughput: 0: 1676.3. Samples: 12742718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:27,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:20:30,868][42004] Updated weights for policy 0, policy_version 17336 (0.0040) +[2024-11-08 01:20:34,771][41694] Fps is (10 sec: 5881.5, 60 sec: 6557.4, 300 sec: 6844.2). Total num frames: 71020544. Throughput: 0: 1614.5. Samples: 12748200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:34,772][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 01:20:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6845.2). Total num frames: 71041024. Throughput: 0: 1584.5. Samples: 12754952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:37,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 01:20:38,820][42004] Updated weights for policy 0, policy_version 17346 (0.0035) +[2024-11-08 01:20:42,932][41694] Fps is (10 sec: 6524.9, 60 sec: 6553.8, 300 sec: 6831.3). Total num frames: 71073792. Throughput: 0: 1566.2. Samples: 12765460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:42,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 01:20:44,902][42004] Updated weights for policy 0, policy_version 17356 (0.0030) +[2024-11-08 01:20:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6831.3). Total num frames: 71110656. Throughput: 0: 1607.7. Samples: 12770308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:47,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:20:51,126][42004] Updated weights for policy 0, policy_version 17366 (0.0050) +[2024-11-08 01:20:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6817.4). Total num frames: 71143424. Throughput: 0: 1605.2. Samples: 12780210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:52,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:20:56,540][42004] Updated weights for policy 0, policy_version 17376 (0.0039) +[2024-11-08 01:20:57,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6417.1, 300 sec: 6859.1). Total num frames: 71180288. Throughput: 0: 1618.6. Samples: 12791604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:57,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 01:21:02,165][42004] Updated weights for policy 0, policy_version 17386 (0.0026) +[2024-11-08 01:21:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 71217152. Throughput: 0: 1643.9. Samples: 12797076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:21:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:21:08,880][41694] Fps is (10 sec: 6359.9, 60 sec: 6518.8, 300 sec: 6850.9). Total num frames: 71249920. Throughput: 0: 1647.1. Samples: 12807918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:08,882][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:21:09,417][42004] Updated weights for policy 0, policy_version 17396 (0.0026) +[2024-11-08 01:21:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6831.3). Total num frames: 71274496. Throughput: 0: 1605.1. Samples: 12814950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:12,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 01:21:15,738][42004] Updated weights for policy 0, policy_version 17406 (0.0041) +[2024-11-08 01:21:17,931][41694] Fps is (10 sec: 6335.4, 60 sec: 6485.4, 300 sec: 6817.4). Total num frames: 71307264. Throughput: 0: 1662.3. Samples: 12819946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:17,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 01:21:21,375][42004] Updated weights for policy 0, policy_version 17416 (0.0043) +[2024-11-08 01:21:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 71344128. Throughput: 0: 1685.9. Samples: 12830818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:22,934][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 01:21:27,346][42004] Updated weights for policy 0, policy_version 17426 (0.0028) +[2024-11-08 01:21:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 71380992. Throughput: 0: 1684.1. Samples: 12841246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:27,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 01:21:32,672][42004] Updated weights for policy 0, policy_version 17436 (0.0027) +[2024-11-08 01:21:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6831.2, 300 sec: 6845.2). Total num frames: 71417856. Throughput: 0: 1705.4. Samples: 12847050. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:32,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 01:21:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 71454720. Throughput: 0: 1740.0. Samples: 12858508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:37,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:21:37,999][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017446_71458816.pth... +[2024-11-08 01:21:38,001][42004] Updated weights for policy 0, policy_version 17446 (0.0029) +[2024-11-08 01:21:38,185][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017048_69828608.pth +[2024-11-08 01:21:42,959][41694] Fps is (10 sec: 6127.2, 60 sec: 6755.3, 300 sec: 6816.8). Total num frames: 71479296. Throughput: 0: 1606.5. Samples: 12863940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:42,961][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 01:21:45,650][42004] Updated weights for policy 0, policy_version 17456 (0.0024) +[2024-11-08 01:21:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71516160. Throughput: 0: 1651.3. Samples: 12871384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:21:51,306][42004] Updated weights for policy 0, policy_version 17466 (0.0031) +[2024-11-08 01:21:52,931][41694] Fps is (10 sec: 7393.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 71553024. Throughput: 0: 1686.8. Samples: 12882222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:52,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 01:21:57,167][42004] Updated weights for policy 0, policy_version 17476 (0.0029) +[2024-11-08 01:21:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71585792. Throughput: 0: 1726.5. Samples: 12892642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:21:57,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 01:22:02,790][42004] Updated weights for policy 0, policy_version 17486 (0.0024) +[2024-11-08 01:22:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71622656. Throughput: 0: 1737.8. Samples: 12898146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:02,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 01:22:07,723][42004] Updated weights for policy 0, policy_version 17496 (0.0036) +[2024-11-08 01:22:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7005.7, 300 sec: 6859.1). Total num frames: 71663616. Throughput: 0: 1760.5. Samples: 12910042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:07,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:22:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 71700480. Throughput: 0: 1787.6. Samples: 12921690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:12,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 01:22:13,067][42004] Updated weights for policy 0, policy_version 17506 (0.0029) +[2024-11-08 01:22:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6817.5). Total num frames: 71725056. Throughput: 0: 1786.9. Samples: 12927458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:17,934][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 01:22:20,743][42004] Updated weights for policy 0, policy_version 17516 (0.0027) +[2024-11-08 01:22:22,933][41694] Fps is (10 sec: 6143.0, 60 sec: 6963.1, 300 sec: 6817.4). Total num frames: 71761920. Throughput: 0: 1687.2. Samples: 12934434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:22,935][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:22:26,122][42004] Updated weights for policy 0, policy_version 17526 (0.0026) +[2024-11-08 01:22:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 71798784. Throughput: 0: 1826.0. Samples: 12946060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:27,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:22:31,848][42004] Updated weights for policy 0, policy_version 17536 (0.0025) +[2024-11-08 01:22:32,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 71831552. Throughput: 0: 1769.7. Samples: 12951020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:32,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 01:22:37,209][42004] Updated weights for policy 0, policy_version 17546 (0.0033) +[2024-11-08 01:22:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6849.7). Total num frames: 71872512. Throughput: 0: 1780.0. Samples: 12962320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:37,934][41694] Avg episode reward: [(0, '4.831')] +[2024-11-08 01:22:42,708][42004] Updated weights for policy 0, policy_version 17556 (0.0034) +[2024-11-08 01:22:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7171.3, 300 sec: 6859.1). Total num frames: 71909376. Throughput: 0: 1800.0. Samples: 12973642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:42,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:22:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 71946240. Throughput: 0: 1797.7. Samples: 12979042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:47,936][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 01:22:48,083][42004] Updated weights for policy 0, policy_version 17566 (0.0030) +[2024-11-08 01:22:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 71970816. Throughput: 0: 1737.9. Samples: 12988246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:52,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:22:55,673][42004] Updated weights for policy 0, policy_version 17576 (0.0026) +[2024-11-08 01:22:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 72007680. Throughput: 0: 1693.1. Samples: 12997878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:57,933][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 01:23:01,083][42004] Updated weights for policy 0, policy_version 17586 (0.0029) +[2024-11-08 01:23:02,935][41694] Fps is (10 sec: 6960.7, 60 sec: 6962.8, 300 sec: 6803.4). Total num frames: 72040448. Throughput: 0: 1692.9. Samples: 13003642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:02,937][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:23:07,237][42004] Updated weights for policy 0, policy_version 17596 (0.0030) +[2024-11-08 01:23:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 72077312. Throughput: 0: 1756.6. Samples: 13013480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:07,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:23:12,499][42004] Updated weights for policy 0, policy_version 17606 (0.0033) +[2024-11-08 01:23:12,931][41694] Fps is (10 sec: 7375.4, 60 sec: 6894.9, 300 sec: 6853.2). Total num frames: 72114176. Throughput: 0: 1753.1. Samples: 13024948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:23:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 72151040. Throughput: 0: 1763.6. Samples: 13030382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:17,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 01:23:18,067][42004] Updated weights for policy 0, policy_version 17616 (0.0029) +[2024-11-08 01:23:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6859.1). Total num frames: 72192000. Throughput: 0: 1772.6. Samples: 13042088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:22,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 01:23:23,361][42004] Updated weights for policy 0, policy_version 17626 (0.0028) +[2024-11-08 01:23:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 72212480. Throughput: 0: 1682.0. Samples: 13049332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:27,937][41694] Avg episode reward: [(0, '4.189')] +[2024-11-08 01:23:31,055][42004] Updated weights for policy 0, policy_version 17636 (0.0029) +[2024-11-08 01:23:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 72249344. Throughput: 0: 1683.6. Samples: 13054802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:32,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:23:36,965][42004] Updated weights for policy 0, policy_version 17646 (0.0030) +[2024-11-08 01:23:37,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 72282112. Throughput: 0: 1716.9. Samples: 13065508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:37,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 01:23:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017647_72282112.pth... +[2024-11-08 01:23:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017249_70651904.pth +[2024-11-08 01:23:42,580][42004] Updated weights for policy 0, policy_version 17656 (0.0029) +[2024-11-08 01:23:42,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6826.5, 300 sec: 6789.7). Total num frames: 72318976. Throughput: 0: 1742.8. Samples: 13076306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:42,936][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 01:23:47,585][42004] Updated weights for policy 0, policy_version 17666 (0.0024) +[2024-11-08 01:23:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 72359936. Throughput: 0: 1747.0. Samples: 13082252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:47,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:23:52,854][42004] Updated weights for policy 0, policy_version 17676 (0.0022) +[2024-11-08 01:23:52,932][41694] Fps is (10 sec: 8193.0, 60 sec: 7168.0, 300 sec: 6845.2). Total num frames: 72400896. Throughput: 0: 1793.7. Samples: 13094196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:52,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:23:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 72433664. Throughput: 0: 1787.1. Samples: 13105368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:57,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:23:58,568][42004] Updated weights for policy 0, policy_version 17686 (0.0034) +[2024-11-08 01:24:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6895.3, 300 sec: 6803.6). Total num frames: 72454144. Throughput: 0: 1734.0. Samples: 13108412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:02,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:24:06,957][42004] Updated weights for policy 0, policy_version 17696 (0.0034) +[2024-11-08 01:24:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 72486912. Throughput: 0: 1653.4. Samples: 13116490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:24:07,933][41694] Avg episode reward: [(0, '4.095')] +[2024-11-08 01:24:12,713][42004] Updated weights for policy 0, policy_version 17706 (0.0025) +[2024-11-08 01:24:12,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 72523776. Throughput: 0: 1733.3. Samples: 13127328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:24:12,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 01:24:17,791][42004] Updated weights for policy 0, policy_version 17716 (0.0028) +[2024-11-08 01:24:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 72564736. Throughput: 0: 1730.9. Samples: 13132694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:17,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:24:22,844][42004] Updated weights for policy 0, policy_version 17726 (0.0030) +[2024-11-08 01:24:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 72605696. Throughput: 0: 1772.1. Samples: 13145254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:22,932][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:24:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6873.0). Total num frames: 72642560. Throughput: 0: 1792.8. Samples: 13156982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:27,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 01:24:28,190][42004] Updated weights for policy 0, policy_version 17736 (0.0033) +[2024-11-08 01:24:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.0, 300 sec: 6873.0). Total num frames: 72679424. Throughput: 0: 1789.8. Samples: 13162792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:24:32,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 01:24:35,679][42004] Updated weights for policy 0, policy_version 17746 (0.0033) +[2024-11-08 01:24:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 72704000. Throughput: 0: 1689.2. Samples: 13170210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:24:37,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 01:24:41,195][42004] Updated weights for policy 0, policy_version 17756 (0.0027) +[2024-11-08 01:24:42,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6963.1, 300 sec: 6831.2). Total num frames: 72736768. Throughput: 0: 1686.6. Samples: 13181270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:42,936][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:24:47,443][42004] Updated weights for policy 0, policy_version 17766 (0.0032) +[2024-11-08 01:24:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 72769536. Throughput: 0: 1717.7. Samples: 13185710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:47,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 01:24:52,731][42004] Updated weights for policy 0, policy_version 17776 (0.0028) +[2024-11-08 01:24:52,933][41694] Fps is (10 sec: 7373.8, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 72810496. Throughput: 0: 1781.1. Samples: 13196640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:52,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:24:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 72847360. Throughput: 0: 1797.1. Samples: 13208198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:57,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:24:58,233][42004] Updated weights for policy 0, policy_version 17786 (0.0027) +[2024-11-08 01:25:02,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7099.8, 300 sec: 6872.9). Total num frames: 72880128. Throughput: 0: 1790.3. Samples: 13213258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:02,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:25:04,285][42004] Updated weights for policy 0, policy_version 17796 (0.0029) +[2024-11-08 01:25:09,662][41694] Fps is (10 sec: 5586.9, 60 sec: 6900.7, 300 sec: 6819.1). Total num frames: 72912896. Throughput: 0: 1676.5. Samples: 13223600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:09,666][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:25:12,077][42004] Updated weights for policy 0, policy_version 17806 (0.0037) +[2024-11-08 01:25:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 72937472. Throughput: 0: 1642.0. Samples: 13230874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:12,936][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 01:25:17,932][41694] Fps is (10 sec: 6439.2, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 72966144. Throughput: 0: 1619.5. Samples: 13235670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:17,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 01:25:18,659][42004] Updated weights for policy 0, policy_version 17816 (0.0048) +[2024-11-08 01:25:22,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 72998912. Throughput: 0: 1652.7. Samples: 13244582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:22,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:25:24,626][42004] Updated weights for policy 0, policy_version 17826 (0.0024) +[2024-11-08 01:25:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6888.1). Total num frames: 73039872. Throughput: 0: 1656.5. Samples: 13255808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:27,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:25:30,082][42004] Updated weights for policy 0, policy_version 17836 (0.0033) +[2024-11-08 01:25:32,933][41694] Fps is (10 sec: 7781.7, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 73076736. Throughput: 0: 1682.8. Samples: 13261438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:32,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:25:35,533][42004] Updated weights for policy 0, policy_version 17846 (0.0040) +[2024-11-08 01:25:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 73113600. Throughput: 0: 1689.9. Samples: 13272686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:25:37,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 01:25:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017850_73113600.pth... +[2024-11-08 01:25:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017446_71458816.pth +[2024-11-08 01:25:41,157][42004] Updated weights for policy 0, policy_version 17856 (0.0032) +[2024-11-08 01:25:43,609][41694] Fps is (10 sec: 6138.5, 60 sec: 6683.3, 300 sec: 6871.1). Total num frames: 73142272. Throughput: 0: 1534.5. Samples: 13278290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:25:43,610][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 01:25:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 73175040. Throughput: 0: 1612.0. Samples: 13285796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:47,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 01:25:48,405][42004] Updated weights for policy 0, policy_version 17866 (0.0039) +[2024-11-08 01:25:52,932][41694] Fps is (10 sec: 7029.6, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 73207808. Throughput: 0: 1688.6. Samples: 13296664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:52,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:25:54,556][42004] Updated weights for policy 0, policy_version 17876 (0.0025) +[2024-11-08 01:25:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6872.9). Total num frames: 73244672. Throughput: 0: 1692.1. Samples: 13307016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:26:00,087][42004] Updated weights for policy 0, policy_version 17886 (0.0035) +[2024-11-08 01:26:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6895.1). Total num frames: 73277440. Throughput: 0: 1710.9. Samples: 13312660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:02,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 01:26:05,824][42004] Updated weights for policy 0, policy_version 17896 (0.0027) +[2024-11-08 01:26:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6888.8, 300 sec: 6914.6). Total num frames: 73314304. Throughput: 0: 1750.8. Samples: 13323368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:07,933][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:26:12,106][42004] Updated weights for policy 0, policy_version 17906 (0.0046) +[2024-11-08 01:26:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 73347072. Throughput: 0: 1719.3. Samples: 13333176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:12,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 01:26:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 73367552. Throughput: 0: 1709.4. Samples: 13338358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:17,936][41694] Avg episode reward: [(0, '4.716')] +[2024-11-08 01:26:19,666][42004] Updated weights for policy 0, policy_version 17916 (0.0024) +[2024-11-08 01:26:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 73404416. Throughput: 0: 1630.6. Samples: 13346064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:22,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 01:26:25,573][42004] Updated weights for policy 0, policy_version 17926 (0.0030) +[2024-11-08 01:26:27,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 73437184. Throughput: 0: 1755.9. Samples: 13356116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:27,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 01:26:31,331][42004] Updated weights for policy 0, policy_version 17936 (0.0032) +[2024-11-08 01:26:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6845.2). Total num frames: 73474048. Throughput: 0: 1683.2. Samples: 13361538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:32,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:26:36,754][42004] Updated weights for policy 0, policy_version 17946 (0.0027) +[2024-11-08 01:26:37,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6690.1, 300 sec: 6901.4). Total num frames: 73515008. Throughput: 0: 1692.6. Samples: 13372830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:37,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:26:42,544][42004] Updated weights for policy 0, policy_version 17956 (0.0027) +[2024-11-08 01:26:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6835.5, 300 sec: 6886.8). Total num frames: 73547776. Throughput: 0: 1701.8. Samples: 13383598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:42,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:26:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 73584640. Throughput: 0: 1695.7. Samples: 13388966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:26:48,512][42004] Updated weights for policy 0, policy_version 17966 (0.0034) +[2024-11-08 01:26:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73605120. Throughput: 0: 1640.6. Samples: 13397196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:52,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 01:26:56,204][42004] Updated weights for policy 0, policy_version 17976 (0.0032) +[2024-11-08 01:26:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73641984. Throughput: 0: 1630.1. Samples: 13406532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:57,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:27:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 73666560. Throughput: 0: 1619.2. Samples: 13411222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:02,935][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:27:03,051][42004] Updated weights for policy 0, policy_version 17986 (0.0037) +[2024-11-08 01:27:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 73703424. Throughput: 0: 1664.2. Samples: 13420956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:07,935][41694] Avg episode reward: [(0, '4.160')] +[2024-11-08 01:27:08,640][42004] Updated weights for policy 0, policy_version 17996 (0.0027) +[2024-11-08 01:27:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73744384. Throughput: 0: 1686.5. Samples: 13432008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:12,935][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 01:27:13,886][42004] Updated weights for policy 0, policy_version 18006 (0.0025) +[2024-11-08 01:27:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 73781248. Throughput: 0: 1695.8. Samples: 13437848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:17,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:27:19,345][42004] Updated weights for policy 0, policy_version 18016 (0.0031) +[2024-11-08 01:27:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 73818112. Throughput: 0: 1692.0. Samples: 13448970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:22,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:27:26,755][42004] Updated weights for policy 0, policy_version 18026 (0.0031) +[2024-11-08 01:27:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 73842688. Throughput: 0: 1627.3. Samples: 13456826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:27,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:27:32,701][42004] Updated weights for policy 0, policy_version 18036 (0.0034) +[2024-11-08 01:27:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 73875456. Throughput: 0: 1625.9. Samples: 13462132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:32,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:27:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 73912320. Throughput: 0: 1659.0. Samples: 13471850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:37,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018045_73912320.pth... +[2024-11-08 01:27:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017647_72282112.pth +[2024-11-08 01:27:38,468][42004] Updated weights for policy 0, policy_version 18046 (0.0028) +[2024-11-08 01:27:42,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 73949184. Throughput: 0: 1708.8. Samples: 13483428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:42,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:27:43,816][42004] Updated weights for policy 0, policy_version 18056 (0.0023) +[2024-11-08 01:27:47,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 73986048. Throughput: 0: 1729.8. Samples: 13489066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:47,935][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 01:27:49,322][42004] Updated weights for policy 0, policy_version 18066 (0.0025) +[2024-11-08 01:27:52,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 74022912. Throughput: 0: 1765.4. Samples: 13500400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:52,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:27:54,726][42004] Updated weights for policy 0, policy_version 18076 (0.0031) +[2024-11-08 01:27:57,932][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 74063872. Throughput: 0: 1770.9. Samples: 13511698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:28:02,151][42004] Updated weights for policy 0, policy_version 18086 (0.0028) +[2024-11-08 01:28:02,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 74084352. Throughput: 0: 1696.3. Samples: 13514180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:02,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 01:28:07,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6789.6). Total num frames: 74117120. Throughput: 0: 1663.9. Samples: 13523846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:07,935][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:28:08,372][42004] Updated weights for policy 0, policy_version 18096 (0.0023) +[2024-11-08 01:28:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 74153984. Throughput: 0: 1725.5. Samples: 13534472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:12,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:28:13,991][42004] Updated weights for policy 0, policy_version 18106 (0.0032) +[2024-11-08 01:28:17,932][41694] Fps is (10 sec: 6963.9, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 74186752. Throughput: 0: 1718.7. Samples: 13539476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:17,935][41694] Avg episode reward: [(0, '4.256')] +[2024-11-08 01:28:19,878][42004] Updated weights for policy 0, policy_version 18116 (0.0022) +[2024-11-08 01:28:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 74223616. Throughput: 0: 1747.8. Samples: 13550502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:28:25,301][42004] Updated weights for policy 0, policy_version 18126 (0.0028) +[2024-11-08 01:28:27,931][41694] Fps is (10 sec: 7782.9, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 74264576. Throughput: 0: 1743.3. Samples: 13561876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:27,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 01:28:30,608][42004] Updated weights for policy 0, policy_version 18136 (0.0024) +[2024-11-08 01:28:34,397][41694] Fps is (10 sec: 6430.3, 60 sec: 6863.8, 300 sec: 6797.5). Total num frames: 74297344. Throughput: 0: 1688.1. Samples: 13567502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:34,399][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:28:37,863][42004] Updated weights for policy 0, policy_version 18146 (0.0033) +[2024-11-08 01:28:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6803.6). Total num frames: 74326016. Throughput: 0: 1664.3. Samples: 13575292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:37,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:28:42,933][41694] Fps is (10 sec: 7197.8, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 74358784. Throughput: 0: 1649.3. Samples: 13585920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:42,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:28:43,892][42004] Updated weights for policy 0, policy_version 18156 (0.0036) +[2024-11-08 01:28:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 74395648. Throughput: 0: 1717.0. Samples: 13591444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:47,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 01:28:49,316][42004] Updated weights for policy 0, policy_version 18166 (0.0028) +[2024-11-08 01:28:52,932][41694] Fps is (10 sec: 7373.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 74432512. Throughput: 0: 1752.6. Samples: 13602710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:28:52,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:28:54,618][42004] Updated weights for policy 0, policy_version 18176 (0.0034) +[2024-11-08 01:28:57,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 74473472. Throughput: 0: 1773.6. Samples: 13614284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:28:57,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:28:59,987][42004] Updated weights for policy 0, policy_version 18186 (0.0040) +[2024-11-08 01:29:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 74506240. Throughput: 0: 1790.4. Samples: 13620042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:02,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:29:05,968][42004] Updated weights for policy 0, policy_version 18196 (0.0054) +[2024-11-08 01:29:08,564][41694] Fps is (10 sec: 5778.4, 60 sec: 6890.7, 300 sec: 6802.8). Total num frames: 74534912. Throughput: 0: 1749.5. Samples: 13630338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:08,568][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 01:29:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 74563584. Throughput: 0: 1678.8. Samples: 13637420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:12,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:29:14,041][42004] Updated weights for policy 0, policy_version 18206 (0.0026) +[2024-11-08 01:29:17,933][41694] Fps is (10 sec: 6558.1, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 74596352. Throughput: 0: 1709.1. Samples: 13641910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:17,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:29:20,041][42004] Updated weights for policy 0, policy_version 18216 (0.0033) +[2024-11-08 01:29:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 74629120. Throughput: 0: 1710.5. Samples: 13652264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:22,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:29:25,843][42004] Updated weights for policy 0, policy_version 18226 (0.0026) +[2024-11-08 01:29:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.0, 300 sec: 6734.1). Total num frames: 74665984. Throughput: 0: 1712.3. Samples: 13662972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:27,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:29:31,833][42004] Updated weights for policy 0, policy_version 18236 (0.0032) +[2024-11-08 01:29:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6927.6, 300 sec: 6775.8). Total num frames: 74702848. Throughput: 0: 1699.2. Samples: 13667906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:32,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:29:37,106][42004] Updated weights for policy 0, policy_version 18246 (0.0032) +[2024-11-08 01:29:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 74739712. Throughput: 0: 1702.9. Samples: 13679342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:37,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 01:29:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018247_74739712.pth... +[2024-11-08 01:29:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017850_73113600.pth +[2024-11-08 01:29:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.3, 300 sec: 6748.0). Total num frames: 74760192. Throughput: 0: 1623.5. Samples: 13687342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:42,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 01:29:45,569][42004] Updated weights for policy 0, policy_version 18256 (0.0024) +[2024-11-08 01:29:47,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 74788864. Throughput: 0: 1571.4. Samples: 13690756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:47,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 01:29:52,898][42004] Updated weights for policy 0, policy_version 18266 (0.0031) +[2024-11-08 01:29:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 74817536. Throughput: 0: 1556.0. Samples: 13699372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:52,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 01:29:57,933][41694] Fps is (10 sec: 5733.9, 60 sec: 6212.1, 300 sec: 6664.7). Total num frames: 74846208. Throughput: 0: 1578.7. Samples: 13708462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:57,935][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 01:29:59,276][42004] Updated weights for policy 0, policy_version 18276 (0.0045) +[2024-11-08 01:30:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6690.0). Total num frames: 74874880. Throughput: 0: 1588.2. Samples: 13713376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:02,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:30:06,200][42004] Updated weights for policy 0, policy_version 18286 (0.0035) +[2024-11-08 01:30:07,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6278.5, 300 sec: 6678.6). Total num frames: 74907648. Throughput: 0: 1553.9. Samples: 13722190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:07,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:30:12,000][42004] Updated weights for policy 0, policy_version 18296 (0.0034) +[2024-11-08 01:30:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 74944512. Throughput: 0: 1545.2. Samples: 13732506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:12,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:30:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6075.8, 300 sec: 6650.8). Total num frames: 74960896. Throughput: 0: 1529.9. Samples: 13736754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:17,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:30:20,481][42004] Updated weights for policy 0, policy_version 18306 (0.0029) +[2024-11-08 01:30:22,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 74993664. Throughput: 0: 1432.0. Samples: 13743782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:30:22,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 01:30:26,624][42004] Updated weights for policy 0, policy_version 18316 (0.0024) +[2024-11-08 01:30:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 75030528. Throughput: 0: 1479.3. Samples: 13753912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:30:27,936][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:30:32,093][42004] Updated weights for policy 0, policy_version 18326 (0.0026) +[2024-11-08 01:30:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 75067392. Throughput: 0: 1523.6. Samples: 13759318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:32,934][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 01:30:37,656][42004] Updated weights for policy 0, policy_version 18336 (0.0023) +[2024-11-08 01:30:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6075.7, 300 sec: 6666.1). Total num frames: 75104256. Throughput: 0: 1581.8. Samples: 13770552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:37,934][41694] Avg episode reward: [(0, '4.238')] +[2024-11-08 01:30:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6664.7). Total num frames: 75141120. Throughput: 0: 1635.7. Samples: 13782066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:42,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:30:43,092][42004] Updated weights for policy 0, policy_version 18346 (0.0029) +[2024-11-08 01:30:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 75177984. Throughput: 0: 1643.4. Samples: 13787330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:47,937][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 01:30:48,509][42004] Updated weights for policy 0, policy_version 18356 (0.0030) +[2024-11-08 01:30:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6417.0, 300 sec: 6636.9). Total num frames: 75202560. Throughput: 0: 1622.0. Samples: 13795180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:52,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:30:56,047][42004] Updated weights for policy 0, policy_version 18366 (0.0037) +[2024-11-08 01:30:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.5, 300 sec: 6636.9). Total num frames: 75235328. Throughput: 0: 1631.5. Samples: 13805922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:57,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 01:31:01,896][42004] Updated weights for policy 0, policy_version 18376 (0.0025) +[2024-11-08 01:31:02,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 75272192. Throughput: 0: 1649.3. Samples: 13810972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:31:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:31:07,609][42004] Updated weights for policy 0, policy_version 18386 (0.0032) +[2024-11-08 01:31:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 75309056. Throughput: 0: 1733.5. Samples: 13821788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:31:07,933][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 01:31:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 75345920. Throughput: 0: 1739.9. Samples: 13832204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:12,933][41694] Avg episode reward: [(0, '4.216')] +[2024-11-08 01:31:13,536][42004] Updated weights for policy 0, policy_version 18396 (0.0040) +[2024-11-08 01:31:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.3, 300 sec: 6692.4). Total num frames: 75378688. Throughput: 0: 1738.9. Samples: 13837568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:17,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:31:19,173][42004] Updated weights for policy 0, policy_version 18406 (0.0037) +[2024-11-08 01:31:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 75415552. Throughput: 0: 1732.3. Samples: 13848504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:22,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:31:27,413][42004] Updated weights for policy 0, policy_version 18416 (0.0037) +[2024-11-08 01:31:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 75431936. Throughput: 0: 1618.4. Samples: 13854892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:27,938][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 01:31:32,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 75464704. Throughput: 0: 1598.0. Samples: 13859240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:32,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 01:31:34,194][42004] Updated weights for policy 0, policy_version 18426 (0.0027) +[2024-11-08 01:31:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 75497472. Throughput: 0: 1645.4. Samples: 13869224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:37,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:31:37,996][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018433_75501568.pth... +[2024-11-08 01:31:38,113][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018045_73912320.pth +[2024-11-08 01:31:39,717][42004] Updated weights for policy 0, policy_version 18436 (0.0033) +[2024-11-08 01:31:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 75538432. Throughput: 0: 1655.7. Samples: 13880430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:42,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:31:44,981][42004] Updated weights for policy 0, policy_version 18446 (0.0027) +[2024-11-08 01:31:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 75575296. Throughput: 0: 1671.3. Samples: 13886182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:47,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:31:50,554][42004] Updated weights for policy 0, policy_version 18456 (0.0032) +[2024-11-08 01:31:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 75612160. Throughput: 0: 1680.4. Samples: 13897408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:52,933][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 01:31:56,065][42004] Updated weights for policy 0, policy_version 18466 (0.0031) +[2024-11-08 01:31:59,382][41694] Fps is (10 sec: 6081.0, 60 sec: 6665.5, 300 sec: 6673.5). Total num frames: 75644928. Throughput: 0: 1644.5. Samples: 13908594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:59,384][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:32:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 75669504. Throughput: 0: 1621.1. Samples: 13910520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:02,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 01:32:04,016][42004] Updated weights for policy 0, policy_version 18476 (0.0042) +[2024-11-08 01:32:07,932][41694] Fps is (10 sec: 6707.5, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 75702272. Throughput: 0: 1589.7. Samples: 13920042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:07,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 01:32:10,002][42004] Updated weights for policy 0, policy_version 18486 (0.0040) +[2024-11-08 01:32:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 75739136. Throughput: 0: 1694.0. Samples: 13931120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:32:15,363][42004] Updated weights for policy 0, policy_version 18496 (0.0032) +[2024-11-08 01:32:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 75776000. Throughput: 0: 1723.4. Samples: 13936794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:17,933][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:32:20,864][42004] Updated weights for policy 0, policy_version 18506 (0.0050) +[2024-11-08 01:32:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 75812864. Throughput: 0: 1752.2. Samples: 13948074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:22,937][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:32:26,341][42004] Updated weights for policy 0, policy_version 18516 (0.0023) +[2024-11-08 01:32:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 75853824. Throughput: 0: 1753.0. Samples: 13959316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:27,933][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 01:32:33,366][41694] Fps is (10 sec: 6281.1, 60 sec: 6845.4, 300 sec: 6654.9). Total num frames: 75878400. Throughput: 0: 1734.9. Samples: 13965006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:33,367][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:32:33,471][42004] Updated weights for policy 0, policy_version 18526 (0.0033) +[2024-11-08 01:32:37,932][41694] Fps is (10 sec: 5733.9, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 75911168. Throughput: 0: 1678.2. Samples: 13972928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:37,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 01:32:39,408][42004] Updated weights for policy 0, policy_version 18536 (0.0026) +[2024-11-08 01:32:42,931][41694] Fps is (10 sec: 7279.3, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 75948032. Throughput: 0: 1707.1. Samples: 13982938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:42,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:32:45,202][42004] Updated weights for policy 0, policy_version 18546 (0.0028) +[2024-11-08 01:32:47,931][41694] Fps is (10 sec: 6963.9, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 75980800. Throughput: 0: 1729.1. Samples: 13988330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:47,933][41694] Avg episode reward: [(0, '4.134')] +[2024-11-08 01:32:50,995][42004] Updated weights for policy 0, policy_version 18556 (0.0036) +[2024-11-08 01:32:52,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76017664. Throughput: 0: 1750.5. Samples: 13998814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:52,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:32:56,476][42004] Updated weights for policy 0, policy_version 18566 (0.0033) +[2024-11-08 01:32:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6995.8, 300 sec: 6678.6). Total num frames: 76054528. Throughput: 0: 1757.0. Samples: 14010184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:57,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 01:33:02,102][42004] Updated weights for policy 0, policy_version 18576 (0.0036) +[2024-11-08 01:33:02,931][41694] Fps is (10 sec: 7373.2, 60 sec: 7031.5, 300 sec: 6692.5). Total num frames: 76091392. Throughput: 0: 1756.1. Samples: 14015818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:02,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 01:33:08,006][41694] Fps is (10 sec: 5692.0, 60 sec: 6818.2, 300 sec: 6635.2). Total num frames: 76111872. Throughput: 0: 1625.2. Samples: 14021326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:08,007][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 01:33:09,787][42004] Updated weights for policy 0, policy_version 18586 (0.0025) +[2024-11-08 01:33:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76144640. Throughput: 0: 1636.7. Samples: 14032966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:12,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:33:16,080][42004] Updated weights for policy 0, policy_version 18596 (0.0038) +[2024-11-08 01:33:17,931][41694] Fps is (10 sec: 7015.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76181504. Throughput: 0: 1638.3. Samples: 14038018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:33:17,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 01:33:21,454][42004] Updated weights for policy 0, policy_version 18606 (0.0027) +[2024-11-08 01:33:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76218368. Throughput: 0: 1695.0. Samples: 14049200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:33:22,935][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 01:33:26,922][42004] Updated weights for policy 0, policy_version 18616 (0.0030) +[2024-11-08 01:33:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6684.0). Total num frames: 76259328. Throughput: 0: 1725.6. Samples: 14060592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:27,933][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 01:33:32,242][42004] Updated weights for policy 0, policy_version 18626 (0.0027) +[2024-11-08 01:33:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7014.0, 300 sec: 6678.6). Total num frames: 76296192. Throughput: 0: 1732.1. Samples: 14066276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:32,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:33:37,793][42004] Updated weights for policy 0, policy_version 18636 (0.0029) +[2024-11-08 01:33:37,932][41694] Fps is (10 sec: 7372.1, 60 sec: 7031.5, 300 sec: 6692.5). Total num frames: 76333056. Throughput: 0: 1749.9. Samples: 14077560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:37,936][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:33:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018636_76333056.pth... +[2024-11-08 01:33:38,109][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018247_74739712.pth +[2024-11-08 01:33:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76353536. Throughput: 0: 1663.2. Samples: 14085030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:42,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 01:33:46,045][42004] Updated weights for policy 0, policy_version 18646 (0.0031) +[2024-11-08 01:33:47,931][41694] Fps is (10 sec: 4915.6, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 76382208. Throughput: 0: 1637.0. Samples: 14089482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:47,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 01:33:51,879][42004] Updated weights for policy 0, policy_version 18656 (0.0029) +[2024-11-08 01:33:52,932][41694] Fps is (10 sec: 6962.5, 60 sec: 6758.3, 300 sec: 6609.1). Total num frames: 76423168. Throughput: 0: 1744.6. Samples: 14099704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:52,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:33:57,144][42004] Updated weights for policy 0, policy_version 18666 (0.0026) +[2024-11-08 01:33:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76460032. Throughput: 0: 1741.7. Samples: 14111344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:57,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 01:34:02,699][42004] Updated weights for policy 0, policy_version 18676 (0.0027) +[2024-11-08 01:34:02,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6758.4, 300 sec: 6665.1). Total num frames: 76496896. Throughput: 0: 1755.9. Samples: 14117032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:02,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:34:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7040.1, 300 sec: 6678.6). Total num frames: 76533760. Throughput: 0: 1748.6. Samples: 14127886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:07,935][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:34:08,214][42004] Updated weights for policy 0, policy_version 18686 (0.0026) +[2024-11-08 01:34:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6706.4). Total num frames: 76574720. Throughput: 0: 1754.5. Samples: 14139544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:12,934][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 01:34:13,385][42004] Updated weights for policy 0, policy_version 18696 (0.0027) +[2024-11-08 01:34:17,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 76595200. Throughput: 0: 1737.0. Samples: 14144440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:17,942][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:34:21,423][42004] Updated weights for policy 0, policy_version 18706 (0.0032) +[2024-11-08 01:34:22,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.6, 300 sec: 6650.8). Total num frames: 76627968. Throughput: 0: 1650.2. Samples: 14151818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:22,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:34:27,144][42004] Updated weights for policy 0, policy_version 18716 (0.0040) +[2024-11-08 01:34:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 76664832. Throughput: 0: 1724.3. Samples: 14162626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:27,935][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 01:34:32,505][42004] Updated weights for policy 0, policy_version 18726 (0.0030) +[2024-11-08 01:34:32,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 76701696. Throughput: 0: 1752.7. Samples: 14168354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:32,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 01:34:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 76738560. Throughput: 0: 1768.5. Samples: 14179284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:37,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:34:38,028][42004] Updated weights for policy 0, policy_version 18736 (0.0049) +[2024-11-08 01:34:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 76775424. Throughput: 0: 1760.8. Samples: 14190578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:42,935][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 01:34:43,574][42004] Updated weights for policy 0, policy_version 18746 (0.0026) +[2024-11-08 01:34:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 76812288. Throughput: 0: 1758.8. Samples: 14196180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:47,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 01:34:50,995][42004] Updated weights for policy 0, policy_version 18756 (0.0026) +[2024-11-08 01:34:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.8, 300 sec: 6734.1). Total num frames: 76832768. Throughput: 0: 1686.4. Samples: 14203772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:52,936][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 01:34:57,119][42004] Updated weights for policy 0, policy_version 18766 (0.0028) +[2024-11-08 01:34:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 76869632. Throughput: 0: 1650.8. Samples: 14213832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:57,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 01:35:02,545][42004] Updated weights for policy 0, policy_version 18776 (0.0030) +[2024-11-08 01:35:02,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6826.5, 300 sec: 6775.7). Total num frames: 76906496. Throughput: 0: 1671.2. Samples: 14219646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:02,935][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 01:35:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.7, 300 sec: 6775.7). Total num frames: 76943360. Throughput: 0: 1748.8. Samples: 14230512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:07,935][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:35:08,099][42004] Updated weights for policy 0, policy_version 18786 (0.0033) +[2024-11-08 01:35:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.3, 300 sec: 6845.2). Total num frames: 76980224. Throughput: 0: 1758.6. Samples: 14241764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:12,936][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 01:35:13,714][42004] Updated weights for policy 0, policy_version 18796 (0.0030) +[2024-11-08 01:35:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 77012992. Throughput: 0: 1738.5. Samples: 14246586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:17,935][41694] Avg episode reward: [(0, '4.200')] +[2024-11-08 01:35:19,654][42004] Updated weights for policy 0, policy_version 18806 (0.0032) +[2024-11-08 01:35:22,931][41694] Fps is (10 sec: 6963.7, 60 sec: 7031.6, 300 sec: 6845.2). Total num frames: 77049856. Throughput: 0: 1737.4. Samples: 14257468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:22,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 01:35:27,587][42004] Updated weights for policy 0, policy_version 18816 (0.2051) +[2024-11-08 01:35:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77070336. Throughput: 0: 1639.3. Samples: 14264346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:27,937][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:35:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77107200. Throughput: 0: 1627.8. Samples: 14269432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:35:33,055][42004] Updated weights for policy 0, policy_version 18826 (0.0027) +[2024-11-08 01:35:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77144064. Throughput: 0: 1717.3. Samples: 14281050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:37,933][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 01:35:37,987][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018835_77148160.pth... +[2024-11-08 01:35:38,116][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018433_75501568.pth +[2024-11-08 01:35:38,637][42004] Updated weights for policy 0, policy_version 18836 (0.0034) +[2024-11-08 01:35:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6758.3, 300 sec: 6789.6). Total num frames: 77180928. Throughput: 0: 1731.4. Samples: 14291746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:42,934][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 01:35:44,305][42004] Updated weights for policy 0, policy_version 18846 (0.0029) +[2024-11-08 01:35:47,939][41694] Fps is (10 sec: 7371.3, 60 sec: 6758.2, 300 sec: 6831.3). Total num frames: 77217792. Throughput: 0: 1719.8. Samples: 14297038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:47,942][41694] Avg episode reward: [(0, '4.173')] +[2024-11-08 01:35:49,859][42004] Updated weights for policy 0, policy_version 18856 (0.0025) +[2024-11-08 01:35:52,932][41694] Fps is (10 sec: 7373.4, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 77254656. Throughput: 0: 1726.1. Samples: 14308188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:52,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:35:55,373][42004] Updated weights for policy 0, policy_version 18866 (0.0024) +[2024-11-08 01:35:59,234][41694] Fps is (10 sec: 6161.9, 60 sec: 6815.3, 300 sec: 6801.3). Total num frames: 77287424. Throughput: 0: 1673.3. Samples: 14319240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:59,238][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 01:36:02,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 77307904. Throughput: 0: 1649.6. Samples: 14320818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:02,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 01:36:03,671][42004] Updated weights for policy 0, policy_version 18876 (0.0022) +[2024-11-08 01:36:07,931][41694] Fps is (10 sec: 6593.2, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 77344768. Throughput: 0: 1631.4. Samples: 14330880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:07,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:36:09,342][42004] Updated weights for policy 0, policy_version 18886 (0.0027) +[2024-11-08 01:36:12,935][41694] Fps is (10 sec: 6960.9, 60 sec: 6621.6, 300 sec: 6775.7). Total num frames: 77377536. Throughput: 0: 1692.4. Samples: 14340508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:12,937][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 01:36:15,858][42004] Updated weights for policy 0, policy_version 18896 (0.0044) +[2024-11-08 01:36:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 77410304. Throughput: 0: 1689.8. Samples: 14345472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 01:36:21,427][42004] Updated weights for policy 0, policy_version 18906 (0.0026) +[2024-11-08 01:36:22,932][41694] Fps is (10 sec: 6965.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 77447168. Throughput: 0: 1676.5. Samples: 14356492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:22,936][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:36:26,719][42004] Updated weights for policy 0, policy_version 18916 (0.0041) +[2024-11-08 01:36:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 77488128. Throughput: 0: 1695.5. Samples: 14368040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 01:36:33,465][41694] Fps is (10 sec: 6221.6, 60 sec: 6698.8, 300 sec: 6819.0). Total num frames: 77512704. Throughput: 0: 1680.4. Samples: 14373550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:33,467][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:36:34,539][42004] Updated weights for policy 0, policy_version 18926 (0.0035) +[2024-11-08 01:36:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 77541376. Throughput: 0: 1599.3. Samples: 14380154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:36:37,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 01:36:40,237][42004] Updated weights for policy 0, policy_version 18936 (0.0028) +[2024-11-08 01:36:42,931][41694] Fps is (10 sec: 6923.1, 60 sec: 6622.0, 300 sec: 6789.6). Total num frames: 77578240. Throughput: 0: 1650.3. Samples: 14391356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:36:42,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 01:36:45,573][42004] Updated weights for policy 0, policy_version 18946 (0.0044) +[2024-11-08 01:36:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.4, 300 sec: 6803.5). Total num frames: 77619200. Throughput: 0: 1697.1. Samples: 14397188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:47,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 01:36:50,748][42004] Updated weights for policy 0, policy_version 18956 (0.0025) +[2024-11-08 01:36:52,933][41694] Fps is (10 sec: 8191.0, 60 sec: 6758.3, 300 sec: 6865.0). Total num frames: 77660160. Throughput: 0: 1738.5. Samples: 14409116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:52,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:36:55,996][42004] Updated weights for policy 0, policy_version 18966 (0.0031) +[2024-11-08 01:36:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6978.1, 300 sec: 6873.0). Total num frames: 77697024. Throughput: 0: 1785.1. Samples: 14420832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:57,934][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 01:37:01,557][42004] Updated weights for policy 0, policy_version 18976 (0.0036) +[2024-11-08 01:37:02,932][41694] Fps is (10 sec: 7373.6, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 77733888. Throughput: 0: 1797.4. Samples: 14426354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:02,934][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:37:07,971][41694] Fps is (10 sec: 5304.0, 60 sec: 6754.0, 300 sec: 6816.5). Total num frames: 77750272. Throughput: 0: 1666.1. Samples: 14431532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:07,973][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:37:09,805][42004] Updated weights for policy 0, policy_version 18986 (0.0037) +[2024-11-08 01:37:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6827.0, 300 sec: 6817.4). Total num frames: 77787136. Throughput: 0: 1666.1. Samples: 14443014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:12,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:37:15,409][42004] Updated weights for policy 0, policy_version 18996 (0.0031) +[2024-11-08 01:37:17,931][41694] Fps is (10 sec: 7401.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 77824000. Throughput: 0: 1690.3. Samples: 14448712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:37:17,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:37:20,802][42004] Updated weights for policy 0, policy_version 19006 (0.0025) +[2024-11-08 01:37:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 77864960. Throughput: 0: 1777.8. Samples: 14460156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:37:22,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 01:37:26,112][42004] Updated weights for policy 0, policy_version 19016 (0.0022) +[2024-11-08 01:37:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6869.2). Total num frames: 77901824. Throughput: 0: 1787.3. Samples: 14471784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:27,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:37:31,533][42004] Updated weights for policy 0, policy_version 19026 (0.0031) +[2024-11-08 01:37:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7163.5, 300 sec: 6873.0). Total num frames: 77938688. Throughput: 0: 1776.6. Samples: 14477136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:32,936][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:37:37,090][42004] Updated weights for policy 0, policy_version 19036 (0.0024) +[2024-11-08 01:37:37,939][41694] Fps is (10 sec: 7367.3, 60 sec: 7235.4, 300 sec: 6872.8). Total num frames: 77975552. Throughput: 0: 1765.6. Samples: 14488580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:37,941][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 01:37:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019037_77975552.pth... +[2024-11-08 01:37:38,119][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018636_76333056.pth +[2024-11-08 01:37:42,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 77991936. Throughput: 0: 1652.9. Samples: 14495214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:42,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 01:37:45,374][42004] Updated weights for policy 0, policy_version 19046 (0.0041) +[2024-11-08 01:37:47,932][41694] Fps is (10 sec: 5328.7, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 78028800. Throughput: 0: 1637.4. Samples: 14500038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:47,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:37:50,798][42004] Updated weights for policy 0, policy_version 19056 (0.0029) +[2024-11-08 01:37:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.8, 300 sec: 6831.3). Total num frames: 78069760. Throughput: 0: 1780.6. Samples: 14511590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:52,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 01:37:55,912][42004] Updated weights for policy 0, policy_version 19066 (0.0024) +[2024-11-08 01:37:57,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78106624. Throughput: 0: 1786.1. Samples: 14523388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:37:57,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 01:38:01,596][42004] Updated weights for policy 0, policy_version 19076 (0.0026) +[2024-11-08 01:38:02,935][41694] Fps is (10 sec: 7370.5, 60 sec: 6826.3, 300 sec: 6888.5). Total num frames: 78143488. Throughput: 0: 1778.2. Samples: 14528736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:02,948][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:38:06,921][42004] Updated weights for policy 0, policy_version 19086 (0.0026) +[2024-11-08 01:38:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7172.7, 300 sec: 6900.7). Total num frames: 78180352. Throughput: 0: 1770.3. Samples: 14539820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:07,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:38:12,846][42004] Updated weights for policy 0, policy_version 19096 (0.0036) +[2024-11-08 01:38:12,931][41694] Fps is (10 sec: 7375.1, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78217216. Throughput: 0: 1747.0. Samples: 14550398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 01:38:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78233600. Throughput: 0: 1710.9. Samples: 14554126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:17,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:38:21,085][42004] Updated weights for policy 0, policy_version 19106 (0.0042) +[2024-11-08 01:38:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 78270464. Throughput: 0: 1633.3. Samples: 14562066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:22,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 01:38:26,501][42004] Updated weights for policy 0, policy_version 19116 (0.0022) +[2024-11-08 01:38:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 78307328. Throughput: 0: 1740.4. Samples: 14573534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:27,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:38:31,688][42004] Updated weights for policy 0, policy_version 19126 (0.0024) +[2024-11-08 01:38:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78348288. Throughput: 0: 1761.9. Samples: 14579324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:32,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 01:38:36,990][42004] Updated weights for policy 0, policy_version 19136 (0.0026) +[2024-11-08 01:38:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6827.5, 300 sec: 6886.8). Total num frames: 78385152. Throughput: 0: 1763.8. Samples: 14590962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:37,933][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 01:38:42,447][42004] Updated weights for policy 0, policy_version 19146 (0.0029) +[2024-11-08 01:38:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 78422016. Throughput: 0: 1752.9. Samples: 14602270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:42,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:38:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78458880. Throughput: 0: 1758.1. Samples: 14607846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:47,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:38:48,273][42004] Updated weights for policy 0, policy_version 19156 (0.0031) +[2024-11-08 01:38:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 78479360. Throughput: 0: 1652.8. Samples: 14614194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:52,932][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 01:38:56,218][42004] Updated weights for policy 0, policy_version 19166 (0.0027) +[2024-11-08 01:38:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 78516224. Throughput: 0: 1660.7. Samples: 14625130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:57,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:39:01,841][42004] Updated weights for policy 0, policy_version 19176 (0.0026) +[2024-11-08 01:39:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.7, 300 sec: 6831.3). Total num frames: 78548992. Throughput: 0: 1696.5. Samples: 14630468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:02,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:39:07,231][42004] Updated weights for policy 0, policy_version 19186 (0.0029) +[2024-11-08 01:39:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 78589952. Throughput: 0: 1769.0. Samples: 14641674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:07,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:39:12,813][42004] Updated weights for policy 0, policy_version 19196 (0.0035) +[2024-11-08 01:39:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 78626816. Throughput: 0: 1763.3. Samples: 14652880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:12,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 01:39:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78663680. Throughput: 0: 1755.3. Samples: 14658312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:17,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 01:39:18,440][42004] Updated weights for policy 0, policy_version 19206 (0.0035) +[2024-11-08 01:39:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 78696448. Throughput: 0: 1720.5. Samples: 14668386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:22,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:39:26,599][42004] Updated weights for policy 0, policy_version 19216 (0.0030) +[2024-11-08 01:39:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78716928. Throughput: 0: 1626.6. Samples: 14675468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:27,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:39:32,199][42004] Updated weights for policy 0, policy_version 19226 (0.0032) +[2024-11-08 01:39:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78753792. Throughput: 0: 1620.8. Samples: 14680782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:32,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 01:39:37,603][42004] Updated weights for policy 0, policy_version 19236 (0.0020) +[2024-11-08 01:39:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78790656. Throughput: 0: 1733.0. Samples: 14692180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:37,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:39:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019236_78790656.pth... +[2024-11-08 01:39:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018835_77148160.pth +[2024-11-08 01:39:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78827520. Throughput: 0: 1741.9. Samples: 14703516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:42,935][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 01:39:43,070][42004] Updated weights for policy 0, policy_version 19246 (0.0029) +[2024-11-08 01:39:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 78868480. Throughput: 0: 1752.1. Samples: 14709312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:47,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 01:39:48,215][42004] Updated weights for policy 0, policy_version 19256 (0.0019) +[2024-11-08 01:39:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.4, 300 sec: 6886.8). Total num frames: 78901248. Throughput: 0: 1755.2. Samples: 14720656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:52,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:39:54,223][42004] Updated weights for policy 0, policy_version 19266 (0.0038) +[2024-11-08 01:39:59,967][41694] Fps is (10 sec: 5445.1, 60 sec: 6734.7, 300 sec: 6825.9). Total num frames: 78934016. Throughput: 0: 1652.6. Samples: 14730612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:59,972][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 01:40:02,550][42004] Updated weights for policy 0, policy_version 19276 (0.0028) +[2024-11-08 01:40:02,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 78954496. Throughput: 0: 1634.5. Samples: 14731864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:02,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 01:40:07,932][41694] Fps is (10 sec: 7200.2, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 78991360. Throughput: 0: 1639.4. Samples: 14742160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:07,935][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 01:40:08,341][42004] Updated weights for policy 0, policy_version 19286 (0.0029) +[2024-11-08 01:40:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 79028224. Throughput: 0: 1723.1. Samples: 14753008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:12,933][41694] Avg episode reward: [(0, '4.746')] +[2024-11-08 01:40:14,176][42004] Updated weights for policy 0, policy_version 19296 (0.0033) +[2024-11-08 01:40:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 79060992. Throughput: 0: 1717.7. Samples: 14758078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:40:20,277][42004] Updated weights for policy 0, policy_version 19306 (0.0045) +[2024-11-08 01:40:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 79097856. Throughput: 0: 1693.9. Samples: 14768404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:22,933][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 01:40:25,730][42004] Updated weights for policy 0, policy_version 19316 (0.0030) +[2024-11-08 01:40:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 79130624. Throughput: 0: 1681.2. Samples: 14779170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:27,934][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 01:40:31,559][42004] Updated weights for policy 0, policy_version 19326 (0.0030) +[2024-11-08 01:40:34,582][41694] Fps is (10 sec: 5625.4, 60 sec: 6643.9, 300 sec: 6807.1). Total num frames: 79163392. Throughput: 0: 1608.3. Samples: 14784338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:34,585][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 01:40:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 79187968. Throughput: 0: 1565.1. Samples: 14791086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:37,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:40:39,696][42004] Updated weights for policy 0, policy_version 19336 (0.0034) +[2024-11-08 01:40:42,932][41694] Fps is (10 sec: 6867.6, 60 sec: 6553.6, 300 sec: 6789.7). Total num frames: 79220736. Throughput: 0: 1657.0. Samples: 14801802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:42,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:40:45,221][42004] Updated weights for policy 0, policy_version 19346 (0.0027) +[2024-11-08 01:40:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 79261696. Throughput: 0: 1681.0. Samples: 14807508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:47,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:40:50,395][42004] Updated weights for policy 0, policy_version 19356 (0.0028) +[2024-11-08 01:40:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6847.6). Total num frames: 79298560. Throughput: 0: 1715.9. Samples: 14819374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:52,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 01:40:55,831][42004] Updated weights for policy 0, policy_version 19366 (0.0040) +[2024-11-08 01:40:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6925.1, 300 sec: 6872.9). Total num frames: 79335424. Throughput: 0: 1727.5. Samples: 14830746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:40:57,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:41:01,931][42004] Updated weights for policy 0, policy_version 19376 (0.0041) +[2024-11-08 01:41:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 79368192. Throughput: 0: 1724.2. Samples: 14835666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:41:02,935][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 01:41:09,141][41694] Fps is (10 sec: 5481.3, 60 sec: 6624.9, 300 sec: 6817.3). Total num frames: 79396864. Throughput: 0: 1680.7. Samples: 14846066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:41:09,143][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 01:41:10,090][42004] Updated weights for policy 0, policy_version 19386 (0.0039) +[2024-11-08 01:41:12,935][41694] Fps is (10 sec: 5322.8, 60 sec: 6553.2, 300 sec: 6817.3). Total num frames: 79421440. Throughput: 0: 1611.2. Samples: 14851680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:12,939][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 01:41:16,683][42004] Updated weights for policy 0, policy_version 19396 (0.0029) +[2024-11-08 01:41:17,931][41694] Fps is (10 sec: 6523.3, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 79454208. Throughput: 0: 1665.9. Samples: 14856556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:17,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:41:21,938][42004] Updated weights for policy 0, policy_version 19406 (0.0032) +[2024-11-08 01:41:22,931][41694] Fps is (10 sec: 6965.9, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 79491072. Throughput: 0: 1698.1. Samples: 14867502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:22,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:41:27,059][42004] Updated weights for policy 0, policy_version 19416 (0.0035) +[2024-11-08 01:41:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6857.6). Total num frames: 79532032. Throughput: 0: 1724.8. Samples: 14879416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:27,933][41694] Avg episode reward: [(0, '4.653')] +[2024-11-08 01:41:32,255][42004] Updated weights for policy 0, policy_version 19426 (0.0026) +[2024-11-08 01:41:32,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7019.7, 300 sec: 6886.8). Total num frames: 79572992. Throughput: 0: 1731.4. Samples: 14885420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:32,935][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:41:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 79605760. Throughput: 0: 1703.6. Samples: 14896034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:37,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:41:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019435_79605760.pth... +[2024-11-08 01:41:38,048][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019037_77975552.pth +[2024-11-08 01:41:38,267][42004] Updated weights for policy 0, policy_version 19436 (0.0031) +[2024-11-08 01:41:43,818][41694] Fps is (10 sec: 5267.3, 60 sec: 6727.3, 300 sec: 6797.0). Total num frames: 79630336. Throughput: 0: 1539.8. Samples: 14901402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:43,820][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:41:46,273][42004] Updated weights for policy 0, policy_version 19446 (0.0023) +[2024-11-08 01:41:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 79659008. Throughput: 0: 1609.2. Samples: 14908082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:47,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:41:52,059][42004] Updated weights for policy 0, policy_version 19456 (0.0030) +[2024-11-08 01:41:52,932][41694] Fps is (10 sec: 7191.1, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 79695872. Throughput: 0: 1657.9. Samples: 14918666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:52,934][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:41:57,434][42004] Updated weights for policy 0, policy_version 19466 (0.0034) +[2024-11-08 01:41:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 79732736. Throughput: 0: 1738.8. Samples: 14929918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:57,937][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 01:42:02,823][42004] Updated weights for policy 0, policy_version 19476 (0.0024) +[2024-11-08 01:42:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6860.0). Total num frames: 79773696. Throughput: 0: 1759.9. Samples: 14935752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:02,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:42:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7036.8, 300 sec: 6859.1). Total num frames: 79810560. Throughput: 0: 1765.4. Samples: 14946944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:07,935][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:42:08,441][42004] Updated weights for policy 0, policy_version 19486 (0.0036) +[2024-11-08 01:42:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.9, 300 sec: 6845.2). Total num frames: 79843328. Throughput: 0: 1729.1. Samples: 14957226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:12,932][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:42:14,263][42004] Updated weights for policy 0, policy_version 19496 (0.0034) +[2024-11-08 01:42:18,380][41694] Fps is (10 sec: 5488.2, 60 sec: 6843.8, 300 sec: 6779.3). Total num frames: 79867904. Throughput: 0: 1702.3. Samples: 14962788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:18,384][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 01:42:22,220][42004] Updated weights for policy 0, policy_version 19506 (0.0029) +[2024-11-08 01:42:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 79900672. Throughput: 0: 1630.7. Samples: 14969416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:22,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 01:42:27,668][42004] Updated weights for policy 0, policy_version 19516 (0.0024) +[2024-11-08 01:42:27,932][41694] Fps is (10 sec: 7290.1, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 79937536. Throughput: 0: 1798.3. Samples: 14980730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:27,933][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 01:42:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.9). Total num frames: 79974400. Throughput: 0: 1737.8. Samples: 14986284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:32,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:42:33,079][42004] Updated weights for policy 0, policy_version 19526 (0.0033) +[2024-11-08 01:42:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 80015360. Throughput: 0: 1764.2. Samples: 14998054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:37,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:42:38,421][42004] Updated weights for policy 0, policy_version 19536 (0.0024) +[2024-11-08 01:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7067.6, 300 sec: 6845.2). Total num frames: 80048128. Throughput: 0: 1748.1. Samples: 15008584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:42,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 01:42:44,397][42004] Updated weights for policy 0, policy_version 19546 (0.0025) +[2024-11-08 01:42:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 80084992. Throughput: 0: 1733.5. Samples: 15013760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:47,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 01:42:49,778][42004] Updated weights for policy 0, policy_version 19556 (0.0040) +[2024-11-08 01:42:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6895.0, 300 sec: 6789.6). Total num frames: 80109568. Throughput: 0: 1723.4. Samples: 15024498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:52,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:42:57,023][42004] Updated weights for policy 0, policy_version 19566 (0.0024) +[2024-11-08 01:42:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 80146432. Throughput: 0: 1686.7. Samples: 15033126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:57,933][41694] Avg episode reward: [(0, '4.741')] +[2024-11-08 01:43:02,409][42004] Updated weights for policy 0, policy_version 19576 (0.0037) +[2024-11-08 01:43:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 80183296. Throughput: 0: 1707.7. Samples: 15038870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:02,933][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 01:43:07,428][42004] Updated weights for policy 0, policy_version 19586 (0.0034) +[2024-11-08 01:43:07,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 80228352. Throughput: 0: 1805.2. Samples: 15050648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:07,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 01:43:12,761][42004] Updated weights for policy 0, policy_version 19596 (0.0032) +[2024-11-08 01:43:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 80265216. Throughput: 0: 1818.0. Samples: 15062540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:12,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 01:43:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7222.0, 300 sec: 6872.9). Total num frames: 80297984. Throughput: 0: 1802.7. Samples: 15067404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:17,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 01:43:18,767][42004] Updated weights for policy 0, policy_version 19606 (0.0040) +[2024-11-08 01:43:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 6873.0). Total num frames: 80334848. Throughput: 0: 1783.5. Samples: 15078310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:22,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:43:24,205][42004] Updated weights for policy 0, policy_version 19616 (0.0024) +[2024-11-08 01:43:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 80355328. Throughput: 0: 1704.6. Samples: 15085290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:27,936][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:43:32,214][42004] Updated weights for policy 0, policy_version 19626 (0.0030) +[2024-11-08 01:43:32,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 80392192. Throughput: 0: 1700.0. Samples: 15090262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:32,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:43:37,813][42004] Updated weights for policy 0, policy_version 19636 (0.0024) +[2024-11-08 01:43:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 80429056. Throughput: 0: 1706.6. Samples: 15101294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:37,933][41694] Avg episode reward: [(0, '4.335')] +[2024-11-08 01:43:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019636_80429056.pth... +[2024-11-08 01:43:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019236_78790656.pth +[2024-11-08 01:43:42,760][42004] Updated weights for policy 0, policy_version 19646 (0.0024) +[2024-11-08 01:43:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 80470016. Throughput: 0: 1784.8. Samples: 15113444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:42,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 01:43:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 80506880. Throughput: 0: 1789.9. Samples: 15119418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:47,933][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 01:43:48,066][42004] Updated weights for policy 0, policy_version 19656 (0.0025) +[2024-11-08 01:43:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.2, 300 sec: 6873.0). Total num frames: 80543744. Throughput: 0: 1770.8. Samples: 15130334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:52,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 01:43:53,674][42004] Updated weights for policy 0, policy_version 19666 (0.0033) +[2024-11-08 01:43:57,932][41694] Fps is (10 sec: 7781.9, 60 sec: 7304.4, 300 sec: 6900.7). Total num frames: 80584704. Throughput: 0: 1768.2. Samples: 15142112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:57,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 01:43:58,814][42004] Updated weights for policy 0, policy_version 19676 (0.0027) +[2024-11-08 01:44:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 80601088. Throughput: 0: 1760.7. Samples: 15146636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:02,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 01:44:06,795][42004] Updated weights for policy 0, policy_version 19686 (0.0026) +[2024-11-08 01:44:07,931][41694] Fps is (10 sec: 5734.9, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 80642048. Throughput: 0: 1691.5. Samples: 15154428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:07,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 01:44:12,260][42004] Updated weights for policy 0, policy_version 19696 (0.0028) +[2024-11-08 01:44:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 80678912. Throughput: 0: 1787.7. Samples: 15165738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:12,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:44:17,532][42004] Updated weights for policy 0, policy_version 19706 (0.0026) +[2024-11-08 01:44:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 80715776. Throughput: 0: 1802.8. Samples: 15171390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:17,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 01:44:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 80752640. Throughput: 0: 1816.2. Samples: 15183024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:22,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:44:23,235][42004] Updated weights for policy 0, policy_version 19716 (0.0023) +[2024-11-08 01:44:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 80789504. Throughput: 0: 1784.9. Samples: 15193764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:27,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 01:44:28,654][42004] Updated weights for policy 0, policy_version 19726 (0.0021) +[2024-11-08 01:44:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7304.5, 300 sec: 6914.6). Total num frames: 80830464. Throughput: 0: 1783.3. Samples: 15199668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:32,935][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 01:44:33,877][42004] Updated weights for policy 0, policy_version 19736 (0.0027) +[2024-11-08 01:44:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 80850944. Throughput: 0: 1713.2. Samples: 15207428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:37,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:44:41,590][42004] Updated weights for policy 0, policy_version 19746 (0.0026) +[2024-11-08 01:44:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 80887808. Throughput: 0: 1695.0. Samples: 15218388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:42,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 01:44:46,751][42004] Updated weights for policy 0, policy_version 19756 (0.0031) +[2024-11-08 01:44:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 80928768. Throughput: 0: 1719.5. Samples: 15224012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:47,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 01:44:51,959][42004] Updated weights for policy 0, policy_version 19766 (0.0033) +[2024-11-08 01:44:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6934.7). Total num frames: 80965632. Throughput: 0: 1812.7. Samples: 15235998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:52,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 01:44:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 80998400. Throughput: 0: 1787.5. Samples: 15246176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:57,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:44:58,152][42004] Updated weights for policy 0, policy_version 19776 (0.0034) +[2024-11-08 01:45:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7236.2, 300 sec: 6928.5). Total num frames: 81035264. Throughput: 0: 1781.5. Samples: 15251560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:02,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:45:03,611][42004] Updated weights for policy 0, policy_version 19786 (0.0035) +[2024-11-08 01:45:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 81072128. Throughput: 0: 1775.7. Samples: 15262930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 01:45:11,430][42004] Updated weights for policy 0, policy_version 19796 (0.0032) +[2024-11-08 01:45:12,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 81092608. Throughput: 0: 1685.8. Samples: 15269624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:12,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:45:17,560][42004] Updated weights for policy 0, policy_version 19806 (0.0026) +[2024-11-08 01:45:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6872.9). Total num frames: 81125376. Throughput: 0: 1665.4. Samples: 15274612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:17,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:45:22,710][42004] Updated weights for policy 0, policy_version 19816 (0.0027) +[2024-11-08 01:45:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 81166336. Throughput: 0: 1735.6. Samples: 15285528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:22,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:45:27,934][41694] Fps is (10 sec: 7781.1, 60 sec: 6894.7, 300 sec: 6953.5). Total num frames: 81203200. Throughput: 0: 1757.0. Samples: 15297454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:27,939][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:45:28,186][42004] Updated weights for policy 0, policy_version 19826 (0.0026) +[2024-11-08 01:45:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 81240064. Throughput: 0: 1743.0. Samples: 15302448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:32,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 01:45:33,791][42004] Updated weights for policy 0, policy_version 19836 (0.0034) +[2024-11-08 01:45:37,931][41694] Fps is (10 sec: 7374.2, 60 sec: 7099.7, 300 sec: 6970.1). Total num frames: 81276928. Throughput: 0: 1725.7. Samples: 15313654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:37,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 01:45:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019843_81276928.pth... +[2024-11-08 01:45:38,178][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019435_79605760.pth +[2024-11-08 01:45:39,330][42004] Updated weights for policy 0, policy_version 19846 (0.0032) +[2024-11-08 01:45:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7099.6, 300 sec: 6956.2). Total num frames: 81313792. Throughput: 0: 1750.8. Samples: 15324964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:42,936][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:45:47,203][42004] Updated weights for policy 0, policy_version 19856 (0.0032) +[2024-11-08 01:45:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 81334272. Throughput: 0: 1672.1. Samples: 15326802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:47,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:45:52,606][42004] Updated weights for policy 0, policy_version 19866 (0.0025) +[2024-11-08 01:45:52,931][41694] Fps is (10 sec: 5735.0, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 81371136. Throughput: 0: 1651.2. Samples: 15337234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:52,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 01:45:57,774][42004] Updated weights for policy 0, policy_version 19876 (0.0030) +[2024-11-08 01:45:57,937][41694] Fps is (10 sec: 7778.0, 60 sec: 6894.3, 300 sec: 6928.4). Total num frames: 81412096. Throughput: 0: 1766.5. Samples: 15349128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:57,946][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:46:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6970.9). Total num frames: 81444864. Throughput: 0: 1779.0. Samples: 15354668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:02,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 01:46:03,789][42004] Updated weights for policy 0, policy_version 19886 (0.0027) +[2024-11-08 01:46:07,932][41694] Fps is (10 sec: 6966.7, 60 sec: 6826.6, 300 sec: 6984.1). Total num frames: 81481728. Throughput: 0: 1761.9. Samples: 15364816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:07,935][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:46:09,744][42004] Updated weights for policy 0, policy_version 19896 (0.0038) +[2024-11-08 01:46:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6984.0). Total num frames: 81514496. Throughput: 0: 1724.6. Samples: 15375058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:12,934][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:46:15,477][42004] Updated weights for policy 0, policy_version 19906 (0.0036) +[2024-11-08 01:46:19,954][41694] Fps is (10 sec: 5792.3, 60 sec: 6868.3, 300 sec: 6936.5). Total num frames: 81551360. Throughput: 0: 1658.2. Samples: 15380422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:19,955][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 01:46:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 81571840. Throughput: 0: 1641.1. Samples: 15387504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:22,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 01:46:23,004][42004] Updated weights for policy 0, policy_version 19916 (0.0032) +[2024-11-08 01:46:27,931][41694] Fps is (10 sec: 7701.4, 60 sec: 6826.9, 300 sec: 6914.6). Total num frames: 81612800. Throughput: 0: 1654.7. Samples: 15399422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:27,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:46:28,225][42004] Updated weights for policy 0, policy_version 19926 (0.0030) +[2024-11-08 01:46:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 81649664. Throughput: 0: 1741.5. Samples: 15405170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:32,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:46:33,504][42004] Updated weights for policy 0, policy_version 19936 (0.0037) +[2024-11-08 01:46:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6991.2). Total num frames: 81686528. Throughput: 0: 1768.0. Samples: 15416796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:37,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:46:39,380][42004] Updated weights for policy 0, policy_version 19946 (0.0041) +[2024-11-08 01:46:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6997.9). Total num frames: 81723392. Throughput: 0: 1735.5. Samples: 15427216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:46:44,786][42004] Updated weights for policy 0, policy_version 19956 (0.0028) +[2024-11-08 01:46:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 7011.8). Total num frames: 81764352. Throughput: 0: 1741.5. Samples: 15433036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:47,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:46:50,107][42004] Updated weights for policy 0, policy_version 19966 (0.0032) +[2024-11-08 01:46:54,450][41694] Fps is (10 sec: 6045.2, 60 sec: 6857.9, 300 sec: 6948.3). Total num frames: 81793024. Throughput: 0: 1712.4. Samples: 15444472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:54,452][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 01:46:57,788][42004] Updated weights for policy 0, policy_version 19976 (0.0028) +[2024-11-08 01:46:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6827.3, 300 sec: 6942.4). Total num frames: 81821696. Throughput: 0: 1699.7. Samples: 15451546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:57,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 01:47:02,931][41694] Fps is (10 sec: 7726.9, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 81858560. Throughput: 0: 1784.9. Samples: 15457134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:02,933][41694] Avg episode reward: [(0, '4.726')] +[2024-11-08 01:47:03,325][42004] Updated weights for policy 0, policy_version 19986 (0.0028) +[2024-11-08 01:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6895.0, 300 sec: 6956.3). Total num frames: 81895424. Throughput: 0: 1795.8. Samples: 15468316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:07,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 01:47:08,710][42004] Updated weights for policy 0, policy_version 19996 (0.0026) +[2024-11-08 01:47:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6994.7). Total num frames: 81928192. Throughput: 0: 1764.1. Samples: 15478806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:12,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 01:47:14,967][42004] Updated weights for policy 0, policy_version 20006 (0.0024) +[2024-11-08 01:47:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7135.4, 300 sec: 6997.9). Total num frames: 81965056. Throughput: 0: 1752.6. Samples: 15484038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:17,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:47:20,134][42004] Updated weights for policy 0, policy_version 20016 (0.0033) +[2024-11-08 01:47:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.2, 300 sec: 7011.8). Total num frames: 82006016. Throughput: 0: 1753.1. Samples: 15495688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:22,938][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:47:25,477][42004] Updated weights for policy 0, policy_version 20026 (0.0030) +[2024-11-08 01:47:28,919][41694] Fps is (10 sec: 6337.6, 60 sec: 6917.6, 300 sec: 6960.7). Total num frames: 82034688. Throughput: 0: 1615.4. Samples: 15501504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:28,922][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:47:32,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82063360. Throughput: 0: 1678.3. Samples: 15508560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:32,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 01:47:33,015][42004] Updated weights for policy 0, policy_version 20036 (0.0026) +[2024-11-08 01:47:37,932][41694] Fps is (10 sec: 7725.9, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 82104320. Throughput: 0: 1743.2. Samples: 15520268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:37,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 01:47:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020045_82104320.pth... +[2024-11-08 01:47:38,066][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019636_80429056.pth +[2024-11-08 01:47:38,219][42004] Updated weights for policy 0, policy_version 20046 (0.0029) +[2024-11-08 01:47:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 82141184. Throughput: 0: 1782.4. Samples: 15531756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:42,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:47:43,664][42004] Updated weights for policy 0, policy_version 20056 (0.0035) +[2024-11-08 01:47:47,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6963.1, 300 sec: 7025.6). Total num frames: 82182144. Throughput: 0: 1790.0. Samples: 15537688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:47,936][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:47:48,787][42004] Updated weights for policy 0, policy_version 20066 (0.0030) +[2024-11-08 01:47:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7284.1, 300 sec: 7025.7). Total num frames: 82219008. Throughput: 0: 1800.0. Samples: 15549316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:52,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 01:47:54,126][42004] Updated weights for policy 0, policy_version 20076 (0.0031) +[2024-11-08 01:47:57,937][41694] Fps is (10 sec: 7373.7, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 82255872. Throughput: 0: 1820.0. Samples: 15560706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:57,948][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:48:00,108][42004] Updated weights for policy 0, policy_version 20086 (0.0025) +[2024-11-08 01:48:03,415][41694] Fps is (10 sec: 5469.8, 60 sec: 6907.5, 300 sec: 6931.0). Total num frames: 82276352. Throughput: 0: 1791.5. Samples: 15565520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:03,416][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 01:48:07,889][42004] Updated weights for policy 0, policy_version 20096 (0.0033) +[2024-11-08 01:48:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 82313216. Throughput: 0: 1699.6. Samples: 15572170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:07,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 01:48:12,932][41694] Fps is (10 sec: 7747.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 82350080. Throughput: 0: 1862.5. Samples: 15583478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:12,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:48:13,370][42004] Updated weights for policy 0, policy_version 20106 (0.0032) +[2024-11-08 01:48:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 82386944. Throughput: 0: 1791.1. Samples: 15589158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:17,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 01:48:18,788][42004] Updated weights for policy 0, policy_version 20116 (0.0026) +[2024-11-08 01:48:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 82423808. Throughput: 0: 1780.7. Samples: 15600400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:22,933][41694] Avg episode reward: [(0, '4.178')] +[2024-11-08 01:48:24,071][42004] Updated weights for policy 0, policy_version 20126 (0.0024) +[2024-11-08 01:48:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7287.9, 300 sec: 7025.7). Total num frames: 82464768. Throughput: 0: 1785.3. Samples: 15612094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:27,932][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 01:48:29,512][42004] Updated weights for policy 0, policy_version 20136 (0.0030) +[2024-11-08 01:48:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 82497536. Throughput: 0: 1776.0. Samples: 15617604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:32,932][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 01:48:35,559][42004] Updated weights for policy 0, policy_version 20146 (0.0028) +[2024-11-08 01:48:37,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6942.3). Total num frames: 82518016. Throughput: 0: 1730.2. Samples: 15627176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:37,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:48:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82554880. Throughput: 0: 1655.4. Samples: 15635200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:42,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 01:48:43,152][42004] Updated weights for policy 0, policy_version 20156 (0.0034) +[2024-11-08 01:48:47,931][41694] Fps is (10 sec: 7783.2, 60 sec: 6895.1, 300 sec: 6956.3). Total num frames: 82595840. Throughput: 0: 1698.2. Samples: 15641116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:47,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:48:48,362][42004] Updated weights for policy 0, policy_version 20166 (0.0031) +[2024-11-08 01:48:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82632704. Throughput: 0: 1786.8. Samples: 15652574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:52,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 01:48:53,740][42004] Updated weights for policy 0, policy_version 20176 (0.0019) +[2024-11-08 01:48:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 82673664. Throughput: 0: 1793.3. Samples: 15664178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:57,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:48:58,897][42004] Updated weights for policy 0, policy_version 20186 (0.0025) +[2024-11-08 01:49:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7226.3, 300 sec: 6997.9). Total num frames: 82706432. Throughput: 0: 1794.4. Samples: 15669906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:02,936][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 01:49:05,098][42004] Updated weights for policy 0, policy_version 20196 (0.0030) +[2024-11-08 01:49:07,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 82739200. Throughput: 0: 1761.4. Samples: 15679664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:07,937][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:49:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 82759680. Throughput: 0: 1666.4. Samples: 15687084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:12,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:49:13,035][42004] Updated weights for policy 0, policy_version 20206 (0.0031) +[2024-11-08 01:49:17,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82800640. Throughput: 0: 1658.0. Samples: 15692212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:17,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:49:18,211][42004] Updated weights for policy 0, policy_version 20216 (0.0023) +[2024-11-08 01:49:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82837504. Throughput: 0: 1703.5. Samples: 15703832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:22,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:49:23,665][42004] Updated weights for policy 0, policy_version 20226 (0.0024) +[2024-11-08 01:49:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82878464. Throughput: 0: 1785.9. Samples: 15715564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:27,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 01:49:28,836][42004] Updated weights for policy 0, policy_version 20236 (0.0031) +[2024-11-08 01:49:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6997.9). Total num frames: 82915328. Throughput: 0: 1777.7. Samples: 15721114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:49:32,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 01:49:34,321][42004] Updated weights for policy 0, policy_version 20246 (0.0032) +[2024-11-08 01:49:37,937][41694] Fps is (10 sec: 7368.8, 60 sec: 7235.7, 300 sec: 6997.8). Total num frames: 82952192. Throughput: 0: 1777.3. Samples: 15732562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:49:37,949][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:49:37,962][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020252_82952192.pth... +[2024-11-08 01:49:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019843_81276928.pth +[2024-11-08 01:49:40,274][42004] Updated weights for policy 0, policy_version 20256 (0.0031) +[2024-11-08 01:49:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6970.1). Total num frames: 82984960. Throughput: 0: 1742.4. Samples: 15742586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:42,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 01:49:47,931][41694] Fps is (10 sec: 5327.7, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 83005440. Throughput: 0: 1723.7. Samples: 15747472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:47,934][41694] Avg episode reward: [(0, '4.648')] +[2024-11-08 01:49:48,182][42004] Updated weights for policy 0, policy_version 20266 (0.0029) +[2024-11-08 01:49:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 83046400. Throughput: 0: 1685.2. Samples: 15755496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:52,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 01:49:53,358][42004] Updated weights for policy 0, policy_version 20276 (0.0030) +[2024-11-08 01:49:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 83083264. Throughput: 0: 1781.6. Samples: 15767258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:49:57,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:49:58,678][42004] Updated weights for policy 0, policy_version 20286 (0.0027) +[2024-11-08 01:50:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 83120128. Throughput: 0: 1796.6. Samples: 15773058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:50:02,936][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 01:50:04,117][42004] Updated weights for policy 0, policy_version 20296 (0.0030) +[2024-11-08 01:50:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 83161088. Throughput: 0: 1786.4. Samples: 15784218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:50:07,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:50:09,367][42004] Updated weights for policy 0, policy_version 20306 (0.0031) +[2024-11-08 01:50:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 83193856. Throughput: 0: 1771.3. Samples: 15795272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:12,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:50:16,235][42004] Updated weights for policy 0, policy_version 20316 (0.0034) +[2024-11-08 01:50:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7031.4, 300 sec: 6970.1). Total num frames: 83222528. Throughput: 0: 1738.9. Samples: 15799364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:17,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:50:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 83243008. Throughput: 0: 1633.7. Samples: 15806070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:22,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 01:50:24,161][42004] Updated weights for policy 0, policy_version 20326 (0.0043) +[2024-11-08 01:50:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 83279872. Throughput: 0: 1644.3. Samples: 15816578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:27,935][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 01:50:29,630][42004] Updated weights for policy 0, policy_version 20336 (0.0023) +[2024-11-08 01:50:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 83320832. Throughput: 0: 1657.6. Samples: 15822062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:32,934][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 01:50:34,996][42004] Updated weights for policy 0, policy_version 20346 (0.0020) +[2024-11-08 01:50:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6759.0, 300 sec: 6928.5). Total num frames: 83357696. Throughput: 0: 1737.6. Samples: 15833688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:37,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 01:50:40,352][42004] Updated weights for policy 0, policy_version 20356 (0.0033) +[2024-11-08 01:50:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 83394560. Throughput: 0: 1733.3. Samples: 15845256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:42,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:50:45,922][42004] Updated weights for policy 0, policy_version 20366 (0.0029) +[2024-11-08 01:50:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 83431424. Throughput: 0: 1728.8. Samples: 15850852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:47,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:50:51,704][42004] Updated weights for policy 0, policy_version 20376 (0.0024) +[2024-11-08 01:50:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6970.3). Total num frames: 83468288. Throughput: 0: 1714.0. Samples: 15861350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:50:52,933][41694] Avg episode reward: [(0, '4.710')] +[2024-11-08 01:50:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 83488768. Throughput: 0: 1627.7. Samples: 15868518. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:50:57,934][41694] Avg episode reward: [(0, '4.764')] +[2024-11-08 01:50:59,267][42004] Updated weights for policy 0, policy_version 20386 (0.0027) +[2024-11-08 01:51:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 83521536. Throughput: 0: 1664.7. Samples: 15874276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:02,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:51:05,155][42004] Updated weights for policy 0, policy_version 20396 (0.0026) +[2024-11-08 01:51:07,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6942.4). Total num frames: 83562496. Throughput: 0: 1748.0. Samples: 15884728. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:07,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:51:10,805][42004] Updated weights for policy 0, policy_version 20406 (0.0025) +[2024-11-08 01:51:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6976.3). Total num frames: 83595264. Throughput: 0: 1750.7. Samples: 15895360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:12,935][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 01:51:16,728][42004] Updated weights for policy 0, policy_version 20416 (0.0025) +[2024-11-08 01:51:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 83632128. Throughput: 0: 1737.5. Samples: 15900250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:17,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:51:22,781][42004] Updated weights for policy 0, policy_version 20426 (0.0040) +[2024-11-08 01:51:22,932][41694] Fps is (10 sec: 6963.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 83664896. Throughput: 0: 1713.3. Samples: 15910786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:22,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:51:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 83697664. Throughput: 0: 1686.1. Samples: 15921132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:27,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 01:51:30,912][42004] Updated weights for policy 0, policy_version 20436 (0.0027) +[2024-11-08 01:51:32,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 83718144. Throughput: 0: 1606.0. Samples: 15923120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:32,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:51:36,340][42004] Updated weights for policy 0, policy_version 20446 (0.0022) +[2024-11-08 01:51:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 83755008. Throughput: 0: 1607.3. Samples: 15933678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:37,935][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 01:51:37,992][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020449_83759104.pth... +[2024-11-08 01:51:38,108][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020045_82104320.pth +[2024-11-08 01:51:41,706][42004] Updated weights for policy 0, policy_version 20456 (0.0024) +[2024-11-08 01:51:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 83795968. Throughput: 0: 1701.7. Samples: 15945096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:42,934][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 01:51:47,167][42004] Updated weights for policy 0, policy_version 20466 (0.0035) +[2024-11-08 01:51:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6950.4). Total num frames: 83832832. Throughput: 0: 1692.0. Samples: 15950414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:47,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 01:51:52,502][42004] Updated weights for policy 0, policy_version 20476 (0.0034) +[2024-11-08 01:51:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6942.4). Total num frames: 83869696. Throughput: 0: 1721.5. Samples: 15962194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:52,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:51:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 83902464. Throughput: 0: 1714.5. Samples: 15972514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:57,935][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 01:51:58,531][42004] Updated weights for policy 0, policy_version 20486 (0.0028) +[2024-11-08 01:52:04,991][41694] Fps is (10 sec: 5774.2, 60 sec: 6732.2, 300 sec: 6880.5). Total num frames: 83939328. Throughput: 0: 1649.6. Samples: 15977878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:04,994][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:52:06,563][42004] Updated weights for policy 0, policy_version 20496 (0.0026) +[2024-11-08 01:52:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.8, 300 sec: 6886.8). Total num frames: 83959808. Throughput: 0: 1643.4. Samples: 15984738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:07,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 01:52:11,953][42004] Updated weights for policy 0, policy_version 20506 (0.0038) +[2024-11-08 01:52:12,931][41694] Fps is (10 sec: 7221.7, 60 sec: 6690.2, 300 sec: 6886.8). Total num frames: 83996672. Throughput: 0: 1660.0. Samples: 15995834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:12,935][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 01:52:17,136][42004] Updated weights for policy 0, policy_version 20516 (0.0030) +[2024-11-08 01:52:17,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 84037632. Throughput: 0: 1747.0. Samples: 16001736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:17,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 01:52:22,271][42004] Updated weights for policy 0, policy_version 20526 (0.0035) +[2024-11-08 01:52:22,933][41694] Fps is (10 sec: 8191.1, 60 sec: 6894.8, 300 sec: 6951.7). Total num frames: 84078592. Throughput: 0: 1778.4. Samples: 16013708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:22,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 01:52:27,775][42004] Updated weights for policy 0, policy_version 20536 (0.0026) +[2024-11-08 01:52:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 84115456. Throughput: 0: 1778.6. Samples: 16025132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:27,940][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:52:32,931][41694] Fps is (10 sec: 6964.0, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 84148224. Throughput: 0: 1766.3. Samples: 16029898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:32,932][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 01:52:33,540][42004] Updated weights for policy 0, policy_version 20546 (0.0033) +[2024-11-08 01:52:39,430][41694] Fps is (10 sec: 5699.8, 60 sec: 6926.8, 300 sec: 6879.7). Total num frames: 84180992. Throughput: 0: 1694.7. Samples: 16040994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:39,434][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:52:41,500][42004] Updated weights for policy 0, policy_version 20556 (0.0030) +[2024-11-08 01:52:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 84205568. Throughput: 0: 1670.6. Samples: 16047690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 01:52:47,039][42004] Updated weights for policy 0, policy_version 20566 (0.0027) +[2024-11-08 01:52:47,931][41694] Fps is (10 sec: 7226.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 84242432. Throughput: 0: 1751.7. Samples: 16053098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:47,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 01:52:52,223][42004] Updated weights for policy 0, policy_version 20576 (0.0023) +[2024-11-08 01:52:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84283392. Throughput: 0: 1783.3. Samples: 16064988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:52,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:52:57,334][42004] Updated weights for policy 0, policy_version 20586 (0.0026) +[2024-11-08 01:52:57,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7031.5, 300 sec: 6953.8). Total num frames: 84324352. Throughput: 0: 1804.9. Samples: 16077054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:57,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 01:53:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7210.7, 300 sec: 6928.5). Total num frames: 84357120. Throughput: 0: 1798.2. Samples: 16082656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:02,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 01:53:03,121][42004] Updated weights for policy 0, policy_version 20596 (0.0022) +[2024-11-08 01:53:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.6, 300 sec: 6942.4). Total num frames: 84398080. Throughput: 0: 1766.6. Samples: 16093204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:53:07,934][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 01:53:08,369][42004] Updated weights for policy 0, policy_version 20606 (0.0035) +[2024-11-08 01:53:14,322][41694] Fps is (10 sec: 6113.0, 60 sec: 7005.6, 300 sec: 6882.2). Total num frames: 84426752. Throughput: 0: 1600.6. Samples: 16099386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:53:14,324][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 01:53:16,286][42004] Updated weights for policy 0, policy_version 20616 (0.0022) +[2024-11-08 01:53:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 84451328. Throughput: 0: 1689.9. Samples: 16105942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:17,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 01:53:21,855][42004] Updated weights for policy 0, policy_version 20626 (0.0034) +[2024-11-08 01:53:22,932][41694] Fps is (10 sec: 7612.1, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84492288. Throughput: 0: 1744.4. Samples: 16116880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:22,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:53:27,350][42004] Updated weights for policy 0, policy_version 20636 (0.0042) +[2024-11-08 01:53:27,933][41694] Fps is (10 sec: 7781.7, 60 sec: 6894.8, 300 sec: 6886.8). Total num frames: 84529152. Throughput: 0: 1791.4. Samples: 16128304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:27,935][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 01:53:32,555][42004] Updated weights for policy 0, policy_version 20646 (0.0024) +[2024-11-08 01:53:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 84566016. Throughput: 0: 1796.6. Samples: 16133946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:32,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:53:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 7211.6, 300 sec: 6942.4). Total num frames: 84602880. Throughput: 0: 1774.0. Samples: 16144820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:37,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 01:53:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020655_84602880.pth... +[2024-11-08 01:53:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020252_82952192.pth +[2024-11-08 01:53:38,460][42004] Updated weights for policy 0, policy_version 20656 (0.0032) +[2024-11-08 01:53:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 84639744. Throughput: 0: 1755.3. Samples: 16156040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:42,935][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 01:53:43,729][42004] Updated weights for policy 0, policy_version 20666 (0.0024) +[2024-11-08 01:53:48,813][41694] Fps is (10 sec: 6022.8, 60 sec: 6997.0, 300 sec: 6880.2). Total num frames: 84668416. Throughput: 0: 1724.4. Samples: 16161772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:48,816][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 01:53:51,434][42004] Updated weights for policy 0, policy_version 20676 (0.0027) +[2024-11-08 01:53:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 84697088. Throughput: 0: 1683.9. Samples: 16168978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:52,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:53:56,721][42004] Updated weights for policy 0, policy_version 20686 (0.0047) +[2024-11-08 01:53:57,931][41694] Fps is (10 sec: 7636.2, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 84738048. Throughput: 0: 1858.4. Samples: 16180430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:57,934][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 01:54:02,365][42004] Updated weights for policy 0, policy_version 20696 (0.0030) +[2024-11-08 01:54:02,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 84774912. Throughput: 0: 1779.9. Samples: 16186036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:02,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:54:07,880][42004] Updated weights for policy 0, policy_version 20706 (0.0045) +[2024-11-08 01:54:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 84811776. Throughput: 0: 1784.5. Samples: 16197182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:54:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7128.4, 300 sec: 6928.5). Total num frames: 84844544. Throughput: 0: 1755.8. Samples: 16207314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:12,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:54:13,851][42004] Updated weights for policy 0, policy_version 20716 (0.0021) +[2024-11-08 01:54:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 84881408. Throughput: 0: 1753.5. Samples: 16212854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:17,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 01:54:19,112][42004] Updated weights for policy 0, policy_version 20726 (0.0029) +[2024-11-08 01:54:22,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 84905984. Throughput: 0: 1764.2. Samples: 16224210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:22,936][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 01:54:26,595][42004] Updated weights for policy 0, policy_version 20736 (0.0030) +[2024-11-08 01:54:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84942848. Throughput: 0: 1688.7. Samples: 16232030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:27,933][41694] Avg episode reward: [(0, '4.772')] +[2024-11-08 01:54:31,811][42004] Updated weights for policy 0, policy_version 20746 (0.0021) +[2024-11-08 01:54:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6887.0). Total num frames: 84983808. Throughput: 0: 1719.5. Samples: 16237636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:32,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:54:36,976][42004] Updated weights for policy 0, policy_version 20756 (0.0026) +[2024-11-08 01:54:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 85020672. Throughput: 0: 1794.9. Samples: 16249748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:37,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 01:54:42,909][42004] Updated weights for policy 0, policy_version 20766 (0.0036) +[2024-11-08 01:54:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 85057536. Throughput: 0: 1781.3. Samples: 16260590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:42,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:54:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7136.3, 300 sec: 6928.5). Total num frames: 85090304. Throughput: 0: 1765.7. Samples: 16265492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:47,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:54:48,555][42004] Updated weights for policy 0, policy_version 20776 (0.0026) +[2024-11-08 01:54:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 85131264. Throughput: 0: 1772.1. Samples: 16276928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:52,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 01:54:53,922][42004] Updated weights for policy 0, policy_version 20786 (0.0024) +[2024-11-08 01:54:57,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 85151744. Throughput: 0: 1722.5. Samples: 16284826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:57,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 01:55:01,814][42004] Updated weights for policy 0, policy_version 20796 (0.0030) +[2024-11-08 01:55:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 85184512. Throughput: 0: 1699.4. Samples: 16289328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 01:55:07,225][42004] Updated weights for policy 0, policy_version 20806 (0.0026) +[2024-11-08 01:55:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 85225472. Throughput: 0: 1693.1. Samples: 16300398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:07,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 01:55:12,613][42004] Updated weights for policy 0, policy_version 20816 (0.0029) +[2024-11-08 01:55:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 85262336. Throughput: 0: 1775.7. Samples: 16311938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:12,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:55:17,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 85291008. Throughput: 0: 1758.1. Samples: 16316750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:17,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:55:19,370][42004] Updated weights for policy 0, policy_version 20826 (0.0029) +[2024-11-08 01:55:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 85327872. Throughput: 0: 1698.4. Samples: 16326174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:22,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:55:24,790][42004] Updated weights for policy 0, policy_version 20836 (0.0020) +[2024-11-08 01:55:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 85364736. Throughput: 0: 1713.4. Samples: 16337692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:27,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 01:55:32,444][42004] Updated weights for policy 0, policy_version 20846 (0.0026) +[2024-11-08 01:55:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 85389312. Throughput: 0: 1725.6. Samples: 16343144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:32,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 01:55:37,616][42004] Updated weights for policy 0, policy_version 20856 (0.0040) +[2024-11-08 01:55:37,933][41694] Fps is (10 sec: 6143.6, 60 sec: 6758.3, 300 sec: 6886.8). Total num frames: 85426176. Throughput: 0: 1640.9. Samples: 16350770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:37,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:55:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020856_85426176.pth... +[2024-11-08 01:55:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020449_83759104.pth +[2024-11-08 01:55:42,926][42004] Updated weights for policy 0, policy_version 20866 (0.0039) +[2024-11-08 01:55:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 85467136. Throughput: 0: 1727.3. Samples: 16362552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:55:47,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 85504000. Throughput: 0: 1756.7. Samples: 16368378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:47,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:55:48,389][42004] Updated weights for policy 0, policy_version 20876 (0.0032) +[2024-11-08 01:55:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 85536768. Throughput: 0: 1745.5. Samples: 16378946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:52,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 01:55:54,320][42004] Updated weights for policy 0, policy_version 20886 (0.0033) +[2024-11-08 01:55:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 85573632. Throughput: 0: 1733.0. Samples: 16389922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:55:57,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 01:55:59,788][42004] Updated weights for policy 0, policy_version 20896 (0.0026) +[2024-11-08 01:56:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 85610496. Throughput: 0: 1748.0. Samples: 16395410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:02,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:56:07,797][42004] Updated weights for policy 0, policy_version 20906 (0.0049) +[2024-11-08 01:56:07,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 85630976. Throughput: 0: 1701.5. Samples: 16402742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:07,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 01:56:12,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 85663744. Throughput: 0: 1659.3. Samples: 16412362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:12,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:56:13,845][42004] Updated weights for policy 0, policy_version 20916 (0.0030) +[2024-11-08 01:56:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 85700608. Throughput: 0: 1651.0. Samples: 16417438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:17,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 01:56:19,457][42004] Updated weights for policy 0, policy_version 20926 (0.0025) +[2024-11-08 01:56:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 85733376. Throughput: 0: 1729.2. Samples: 16428584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:22,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:56:25,437][42004] Updated weights for policy 0, policy_version 20936 (0.0032) +[2024-11-08 01:56:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 85770240. Throughput: 0: 1694.9. Samples: 16438824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:27,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:56:30,885][42004] Updated weights for policy 0, policy_version 20946 (0.0027) +[2024-11-08 01:56:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 85807104. Throughput: 0: 1692.3. Samples: 16444530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:32,936][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:56:36,207][42004] Updated weights for policy 0, policy_version 20956 (0.0025) +[2024-11-08 01:56:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 85848064. Throughput: 0: 1714.8. Samples: 16456112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:37,934][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 01:56:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 85868544. Throughput: 0: 1622.5. Samples: 16462934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:42,933][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 01:56:43,992][42004] Updated weights for policy 0, policy_version 20966 (0.0032) +[2024-11-08 01:56:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 85905408. Throughput: 0: 1628.0. Samples: 16468670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:47,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 01:56:49,389][42004] Updated weights for policy 0, policy_version 20976 (0.0026) +[2024-11-08 01:56:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 85942272. Throughput: 0: 1720.7. Samples: 16480174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:52,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:56:54,630][42004] Updated weights for policy 0, policy_version 20986 (0.0030) +[2024-11-08 01:56:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6963.2). Total num frames: 85979136. Throughput: 0: 1756.9. Samples: 16491420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:56:57,935][41694] Avg episode reward: [(0, '4.177')] +[2024-11-08 01:57:00,874][42004] Updated weights for policy 0, policy_version 20996 (0.0034) +[2024-11-08 01:57:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6956.3). Total num frames: 86011904. Throughput: 0: 1750.4. Samples: 16496208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:57:02,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 01:57:06,676][42004] Updated weights for policy 0, policy_version 21006 (0.0038) +[2024-11-08 01:57:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6956.2). Total num frames: 86048768. Throughput: 0: 1729.2. Samples: 16506400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:57:07,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 01:57:11,940][42004] Updated weights for policy 0, policy_version 21016 (0.0023) +[2024-11-08 01:57:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 86085632. Throughput: 0: 1761.4. Samples: 16518088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:12,933][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 01:57:17,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 86106112. Throughput: 0: 1685.4. Samples: 16520374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:17,933][41694] Avg episode reward: [(0, '4.193')] +[2024-11-08 01:57:19,622][42004] Updated weights for policy 0, policy_version 21026 (0.0028) +[2024-11-08 01:57:22,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86147072. Throughput: 0: 1661.9. Samples: 16530898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:22,937][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:57:24,863][42004] Updated weights for policy 0, policy_version 21036 (0.0022) +[2024-11-08 01:57:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 86183936. Throughput: 0: 1767.3. Samples: 16542462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:27,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 01:57:30,340][42004] Updated weights for policy 0, policy_version 21046 (0.0026) +[2024-11-08 01:57:32,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6894.9, 300 sec: 6949.9). Total num frames: 86220800. Throughput: 0: 1764.8. Samples: 16548086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:32,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 01:57:36,112][42004] Updated weights for policy 0, policy_version 21056 (0.0025) +[2024-11-08 01:57:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 86257664. Throughput: 0: 1744.8. Samples: 16558688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:37,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:57:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021059_86257664.pth... +[2024-11-08 01:57:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020655_84602880.pth +[2024-11-08 01:57:41,849][42004] Updated weights for policy 0, policy_version 21066 (0.0040) +[2024-11-08 01:57:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 86294528. Throughput: 0: 1739.3. Samples: 16569688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:42,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 01:57:46,994][42004] Updated weights for policy 0, policy_version 21076 (0.0024) +[2024-11-08 01:57:50,001][41694] Fps is (10 sec: 6108.5, 60 sec: 6863.0, 300 sec: 6894.0). Total num frames: 86331392. Throughput: 0: 1683.4. Samples: 16575446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:50,004][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:57:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86355968. Throughput: 0: 1701.3. Samples: 16582958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:52,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:57:54,476][42004] Updated weights for policy 0, policy_version 21086 (0.0029) +[2024-11-08 01:57:57,932][41694] Fps is (10 sec: 7747.6, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 86392832. Throughput: 0: 1692.9. Samples: 16594270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:57:57,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:57:59,912][42004] Updated weights for policy 0, policy_version 21096 (0.0035) +[2024-11-08 01:58:02,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 86429696. Throughput: 0: 1771.6. Samples: 16600094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:58:02,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 01:58:05,918][42004] Updated weights for policy 0, policy_version 21106 (0.0031) +[2024-11-08 01:58:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6933.4). Total num frames: 86462464. Throughput: 0: 1763.0. Samples: 16610234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:58:07,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 01:58:11,745][42004] Updated weights for policy 0, policy_version 21116 (0.0038) +[2024-11-08 01:58:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 86499328. Throughput: 0: 1742.8. Samples: 16620888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:12,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:58:17,288][42004] Updated weights for policy 0, policy_version 21126 (0.0035) +[2024-11-08 01:58:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 86536192. Throughput: 0: 1742.0. Samples: 16626476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:17,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:58:24,480][41694] Fps is (10 sec: 6029.4, 60 sec: 6854.6, 300 sec: 6878.5). Total num frames: 86568960. Throughput: 0: 1701.2. Samples: 16637878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:24,482][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:58:24,774][42004] Updated weights for policy 0, policy_version 21136 (0.0029) +[2024-11-08 01:58:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 6872.9). Total num frames: 86593536. Throughput: 0: 1677.4. Samples: 16645170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:27,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:58:30,063][42004] Updated weights for policy 0, policy_version 21146 (0.0025) +[2024-11-08 01:58:32,931][41694] Fps is (10 sec: 7754.8, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 86634496. Throughput: 0: 1757.2. Samples: 16650884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:32,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 01:58:35,450][42004] Updated weights for policy 0, policy_version 21156 (0.0028) +[2024-11-08 01:58:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86671360. Throughput: 0: 1767.5. Samples: 16662494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:37,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 01:58:41,287][42004] Updated weights for policy 0, policy_version 21166 (0.0037) +[2024-11-08 01:58:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6921.4). Total num frames: 86704128. Throughput: 0: 1744.0. Samples: 16672752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:42,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:58:46,699][42004] Updated weights for policy 0, policy_version 21176 (0.0029) +[2024-11-08 01:58:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7141.3, 300 sec: 6942.4). Total num frames: 86745088. Throughput: 0: 1739.6. Samples: 16678376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:47,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:58:51,993][42004] Updated weights for policy 0, policy_version 21186 (0.0027) +[2024-11-08 01:58:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.8, 300 sec: 6928.5). Total num frames: 86781952. Throughput: 0: 1772.3. Samples: 16689988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:52,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 01:58:58,969][41694] Fps is (10 sec: 5937.8, 60 sec: 6844.9, 300 sec: 6876.5). Total num frames: 86810624. Throughput: 0: 1630.4. Samples: 16695946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:58,970][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:58:59,496][42004] Updated weights for policy 0, policy_version 21196 (0.0030) +[2024-11-08 01:59:02,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86843392. Throughput: 0: 1705.5. Samples: 16703224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:59:02,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 01:59:05,130][42004] Updated weights for policy 0, policy_version 21206 (0.0034) +[2024-11-08 01:59:07,933][41694] Fps is (10 sec: 7767.6, 60 sec: 6963.0, 300 sec: 6900.7). Total num frames: 86880256. Throughput: 0: 1761.3. Samples: 16714410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:59:07,935][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:59:10,427][42004] Updated weights for policy 0, policy_version 21216 (0.0030) +[2024-11-08 01:59:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 86917120. Throughput: 0: 1780.8. Samples: 16725308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:12,934][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 01:59:16,412][42004] Updated weights for policy 0, policy_version 21226 (0.0024) +[2024-11-08 01:59:17,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 86953984. Throughput: 0: 1766.5. Samples: 16730378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:17,938][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:59:21,789][42004] Updated weights for policy 0, policy_version 21236 (0.0031) +[2024-11-08 01:59:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7217.8, 300 sec: 6942.4). Total num frames: 86990848. Throughput: 0: 1761.7. Samples: 16741772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:22,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 01:59:27,071][42004] Updated weights for policy 0, policy_version 21246 (0.0025) +[2024-11-08 01:59:27,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 87027712. Throughput: 0: 1791.5. Samples: 16753370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:27,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 01:59:33,463][41694] Fps is (10 sec: 5833.8, 60 sec: 6902.0, 300 sec: 6874.4). Total num frames: 87052288. Throughput: 0: 1772.7. Samples: 16759090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:33,466][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:59:34,815][42004] Updated weights for policy 0, policy_version 21256 (0.0047) +[2024-11-08 01:59:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 87085056. Throughput: 0: 1683.1. Samples: 16765726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:37,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:59:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021261_87085056.pth... +[2024-11-08 01:59:38,042][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020856_85426176.pth +[2024-11-08 01:59:40,548][42004] Updated weights for policy 0, policy_version 21266 (0.0029) +[2024-11-08 01:59:42,931][41694] Fps is (10 sec: 7354.3, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 87121920. Throughput: 0: 1832.1. Samples: 16776488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:42,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 01:59:46,398][42004] Updated weights for policy 0, policy_version 21276 (0.0040) +[2024-11-08 01:59:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 87154688. Throughput: 0: 1744.3. Samples: 16781720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:47,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:59:52,453][42004] Updated weights for policy 0, policy_version 21286 (0.0030) +[2024-11-08 01:59:52,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6758.3, 300 sec: 6900.7). Total num frames: 87187456. Throughput: 0: 1725.9. Samples: 16792072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:52,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:59:57,828][42004] Updated weights for policy 0, policy_version 21296 (0.0032) +[2024-11-08 01:59:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7085.6, 300 sec: 6928.5). Total num frames: 87228416. Throughput: 0: 1728.3. Samples: 16803082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:57,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 02:00:02,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 87261184. Throughput: 0: 1739.3. Samples: 16808646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:02,935][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 02:00:03,491][42004] Updated weights for policy 0, policy_version 21306 (0.0031) +[2024-11-08 02:00:07,953][41694] Fps is (10 sec: 5722.6, 60 sec: 6756.2, 300 sec: 6858.6). Total num frames: 87285760. Throughput: 0: 1604.9. Samples: 16814026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:07,955][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:00:11,151][42004] Updated weights for policy 0, policy_version 21316 (0.2265) +[2024-11-08 02:00:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 87322624. Throughput: 0: 1629.2. Samples: 16826686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:12,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 02:00:17,448][42004] Updated weights for policy 0, policy_version 21326 (0.0028) +[2024-11-08 02:00:17,932][41694] Fps is (10 sec: 6567.4, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 87351296. Throughput: 0: 1627.9. Samples: 16831478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:17,933][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 02:00:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 87384064. Throughput: 0: 1674.5. Samples: 16841080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:22,939][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 02:00:23,556][42004] Updated weights for policy 0, policy_version 21336 (0.0038) +[2024-11-08 02:00:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 87425024. Throughput: 0: 1685.5. Samples: 16852336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:27,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 02:00:28,842][42004] Updated weights for policy 0, policy_version 21346 (0.0030) +[2024-11-08 02:00:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6887.7, 300 sec: 6900.7). Total num frames: 87461888. Throughput: 0: 1694.7. Samples: 16857980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:32,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:00:34,200][42004] Updated weights for policy 0, policy_version 21356 (0.0035) +[2024-11-08 02:00:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 87498752. Throughput: 0: 1719.8. Samples: 16869462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:37,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:00:39,705][42004] Updated weights for policy 0, policy_version 21366 (0.0029) +[2024-11-08 02:00:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 87519232. Throughput: 0: 1647.4. Samples: 16877212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:42,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:00:47,271][42004] Updated weights for policy 0, policy_version 21376 (0.0035) +[2024-11-08 02:00:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 87560192. Throughput: 0: 1630.2. Samples: 16882004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:47,935][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:00:52,894][42004] Updated weights for policy 0, policy_version 21386 (0.0034) +[2024-11-08 02:00:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.8, 300 sec: 6859.1). Total num frames: 87597056. Throughput: 0: 1765.8. Samples: 16893448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:00:52,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 02:00:57,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 87629824. Throughput: 0: 1716.9. Samples: 16903948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:00:57,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:00:58,588][42004] Updated weights for policy 0, policy_version 21396 (0.0025) +[2024-11-08 02:01:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 87666688. Throughput: 0: 1740.2. Samples: 16909786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:02,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:01:04,170][42004] Updated weights for policy 0, policy_version 21406 (0.0035) +[2024-11-08 02:01:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6965.6, 300 sec: 6914.6). Total num frames: 87703552. Throughput: 0: 1764.1. Samples: 16920466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:07,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 02:01:10,178][42004] Updated weights for policy 0, policy_version 21416 (0.0043) +[2024-11-08 02:01:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 87736320. Throughput: 0: 1734.8. Samples: 16930402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:12,933][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 02:01:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 87756800. Throughput: 0: 1712.0. Samples: 16935018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 02:01:18,351][42004] Updated weights for policy 0, policy_version 21426 (0.0024) +[2024-11-08 02:01:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 87793664. Throughput: 0: 1628.9. Samples: 16942762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:22,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:01:24,116][42004] Updated weights for policy 0, policy_version 21436 (0.0028) +[2024-11-08 02:01:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 87826432. Throughput: 0: 1681.5. Samples: 16952880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:27,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:01:30,097][42004] Updated weights for policy 0, policy_version 21446 (0.0021) +[2024-11-08 02:01:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 87863296. Throughput: 0: 1691.9. Samples: 16958140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:32,936][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 02:01:35,604][42004] Updated weights for policy 0, policy_version 21456 (0.0026) +[2024-11-08 02:01:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 87900160. Throughput: 0: 1688.7. Samples: 16969442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:37,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:01:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021460_87900160.pth... +[2024-11-08 02:01:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021059_86257664.pth +[2024-11-08 02:01:41,040][42004] Updated weights for policy 0, policy_version 21466 (0.0034) +[2024-11-08 02:01:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 87937024. Throughput: 0: 1704.9. Samples: 16980670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:42,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 02:01:46,423][42004] Updated weights for policy 0, policy_version 21476 (0.0029) +[2024-11-08 02:01:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 87973888. Throughput: 0: 1695.4. Samples: 16986078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:47,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 02:01:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 87994368. Throughput: 0: 1632.5. Samples: 16993928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:52,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 02:01:54,267][42004] Updated weights for policy 0, policy_version 21486 (0.0032) +[2024-11-08 02:01:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 88031232. Throughput: 0: 1642.7. Samples: 17004322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:57,935][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 02:01:59,943][42004] Updated weights for policy 0, policy_version 21496 (0.0025) +[2024-11-08 02:02:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 88064000. Throughput: 0: 1658.5. Samples: 17009652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:02:02,933][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 02:02:06,023][42004] Updated weights for policy 0, policy_version 21506 (0.0022) +[2024-11-08 02:02:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 88100864. Throughput: 0: 1709.8. Samples: 17019702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:07,936][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 02:02:11,641][42004] Updated weights for policy 0, policy_version 21516 (0.0032) +[2024-11-08 02:02:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 88137728. Throughput: 0: 1729.6. Samples: 17030714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:12,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:02:16,932][42004] Updated weights for policy 0, policy_version 21526 (0.0025) +[2024-11-08 02:02:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 88174592. Throughput: 0: 1739.2. Samples: 17036406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:17,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 02:02:22,244][42004] Updated weights for policy 0, policy_version 21536 (0.0033) +[2024-11-08 02:02:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 88215552. Throughput: 0: 1746.3. Samples: 17048026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:22,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:02:27,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88236032. Throughput: 0: 1655.6. Samples: 17055174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:27,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:02:29,902][42004] Updated weights for policy 0, policy_version 21546 (0.0032) +[2024-11-08 02:02:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88272896. Throughput: 0: 1661.6. Samples: 17060848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:32,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:02:35,664][42004] Updated weights for policy 0, policy_version 21556 (0.0029) +[2024-11-08 02:02:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88309760. Throughput: 0: 1725.9. Samples: 17071596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:37,935][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:02:41,056][42004] Updated weights for policy 0, policy_version 21566 (0.0025) +[2024-11-08 02:02:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6879.6). Total num frames: 88346624. Throughput: 0: 1753.0. Samples: 17083206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:42,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 02:02:46,667][42004] Updated weights for policy 0, policy_version 21576 (0.0026) +[2024-11-08 02:02:47,938][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 88383488. Throughput: 0: 1744.5. Samples: 17088156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:47,941][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:02:52,073][42004] Updated weights for policy 0, policy_version 21586 (0.0021) +[2024-11-08 02:02:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6873.0). Total num frames: 88420352. Throughput: 0: 1775.9. Samples: 17099618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:02:52,934][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 02:02:57,398][42004] Updated weights for policy 0, policy_version 21596 (0.0022) +[2024-11-08 02:02:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6873.0). Total num frames: 88457216. Throughput: 0: 1786.8. Samples: 17111122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:02:57,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:03:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 88477696. Throughput: 0: 1708.1. Samples: 17113272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:03:02,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:03:05,508][42004] Updated weights for policy 0, policy_version 21606 (0.0022) +[2024-11-08 02:03:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 88510464. Throughput: 0: 1666.0. Samples: 17122996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:07,940][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:03:11,711][42004] Updated weights for policy 0, policy_version 21616 (0.0032) +[2024-11-08 02:03:12,936][41694] Fps is (10 sec: 6960.3, 60 sec: 6826.2, 300 sec: 6817.3). Total num frames: 88547328. Throughput: 0: 1724.9. Samples: 17132802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:12,941][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:03:17,052][42004] Updated weights for policy 0, policy_version 21626 (0.0027) +[2024-11-08 02:03:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6867.3). Total num frames: 88584192. Throughput: 0: 1720.4. Samples: 17138266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:17,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:03:22,298][42004] Updated weights for policy 0, policy_version 21636 (0.0034) +[2024-11-08 02:03:22,932][41694] Fps is (10 sec: 7785.3, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 88625152. Throughput: 0: 1746.1. Samples: 17150172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:22,935][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 02:03:27,809][42004] Updated weights for policy 0, policy_version 21646 (0.0038) +[2024-11-08 02:03:27,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 88662016. Throughput: 0: 1741.1. Samples: 17161554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:27,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 02:03:34,801][41694] Fps is (10 sec: 6211.8, 60 sec: 6885.2, 300 sec: 6829.7). Total num frames: 88698880. Throughput: 0: 1685.4. Samples: 17167150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:34,804][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:03:35,288][42004] Updated weights for policy 0, policy_version 21656 (0.0032) +[2024-11-08 02:03:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88719360. Throughput: 0: 1666.5. Samples: 17174612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:37,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:03:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021660_88719360.pth... +[2024-11-08 02:03:38,112][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021261_87085056.pth +[2024-11-08 02:03:41,126][42004] Updated weights for policy 0, policy_version 21666 (0.0045) +[2024-11-08 02:03:42,931][41694] Fps is (10 sec: 6549.2, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 88752128. Throughput: 0: 1642.6. Samples: 17185040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:42,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:03:47,001][42004] Updated weights for policy 0, policy_version 21676 (0.0028) +[2024-11-08 02:03:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 88788992. Throughput: 0: 1703.6. Samples: 17189934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:47,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 02:03:52,311][42004] Updated weights for policy 0, policy_version 21686 (0.0028) +[2024-11-08 02:03:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6869.3). Total num frames: 88829952. Throughput: 0: 1749.8. Samples: 17201738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:52,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 02:03:57,856][42004] Updated weights for policy 0, policy_version 21696 (0.0040) +[2024-11-08 02:03:57,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 88866816. Throughput: 0: 1775.2. Samples: 17212680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:57,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:04:02,933][41694] Fps is (10 sec: 6962.3, 60 sec: 7031.3, 300 sec: 6845.2). Total num frames: 88899584. Throughput: 0: 1776.6. Samples: 17218214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:02,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:04:03,861][42004] Updated weights for policy 0, policy_version 21706 (0.0040) +[2024-11-08 02:04:09,457][41694] Fps is (10 sec: 5330.7, 60 sec: 6790.5, 300 sec: 6782.3). Total num frames: 88928256. Throughput: 0: 1667.1. Samples: 17227736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:09,459][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 02:04:12,392][42004] Updated weights for policy 0, policy_version 21716 (0.0041) +[2024-11-08 02:04:12,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6690.6, 300 sec: 6761.9). Total num frames: 88948736. Throughput: 0: 1613.4. Samples: 17234158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:12,934][41694] Avg episode reward: [(0, '4.178')] +[2024-11-08 02:04:17,931][41694] Fps is (10 sec: 6283.4, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 88981504. Throughput: 0: 1668.7. Samples: 17239122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:17,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:04:18,565][42004] Updated weights for policy 0, policy_version 21726 (0.0024) +[2024-11-08 02:04:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 89018368. Throughput: 0: 1663.5. Samples: 17249470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:22,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:04:23,957][42004] Updated weights for policy 0, policy_version 21736 (0.0025) +[2024-11-08 02:04:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6815.8). Total num frames: 89059328. Throughput: 0: 1692.8. Samples: 17261218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:27,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 02:04:29,157][42004] Updated weights for policy 0, policy_version 21746 (0.0021) +[2024-11-08 02:04:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6905.3, 300 sec: 6831.3). Total num frames: 89100288. Throughput: 0: 1718.3. Samples: 17267256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:32,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 02:04:34,398][42004] Updated weights for policy 0, policy_version 21756 (0.0026) +[2024-11-08 02:04:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 89137152. Throughput: 0: 1713.5. Samples: 17278846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:37,934][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 02:04:39,814][42004] Updated weights for policy 0, policy_version 21766 (0.0042) +[2024-11-08 02:04:44,090][41694] Fps is (10 sec: 5873.4, 60 sec: 6764.4, 300 sec: 6790.8). Total num frames: 89165824. Throughput: 0: 1561.2. Samples: 17284740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:44,094][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 02:04:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6789.7). Total num frames: 89190400. Throughput: 0: 1616.6. Samples: 17290958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:47,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:04:47,961][42004] Updated weights for policy 0, policy_version 21776 (0.0034) +[2024-11-08 02:04:52,931][41694] Fps is (10 sec: 6948.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 89227264. Throughput: 0: 1683.6. Samples: 17300928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:52,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:04:54,002][42004] Updated weights for policy 0, policy_version 21786 (0.0024) +[2024-11-08 02:04:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 89264128. Throughput: 0: 1731.0. Samples: 17312052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:57,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:04:59,420][42004] Updated weights for policy 0, policy_version 21796 (0.0025) +[2024-11-08 02:05:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.3, 300 sec: 6831.8). Total num frames: 89300992. Throughput: 0: 1752.2. Samples: 17317970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:02,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 02:05:04,724][42004] Updated weights for policy 0, policy_version 21806 (0.0022) +[2024-11-08 02:05:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7074.8, 300 sec: 6845.2). Total num frames: 89341952. Throughput: 0: 1780.7. Samples: 17329600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 02:05:10,022][42004] Updated weights for policy 0, policy_version 21816 (0.0026) +[2024-11-08 02:05:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 89374720. Throughput: 0: 1767.0. Samples: 17340734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:12,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:05:16,105][42004] Updated weights for policy 0, policy_version 21826 (0.0036) +[2024-11-08 02:05:18,548][41694] Fps is (10 sec: 5401.3, 60 sec: 6892.3, 300 sec: 6817.0). Total num frames: 89399296. Throughput: 0: 1713.7. Samples: 17345428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:18,553][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:05:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 89427968. Throughput: 0: 1622.5. Samples: 17351860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:22,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:05:24,607][42004] Updated weights for policy 0, policy_version 21836 (0.0031) +[2024-11-08 02:05:27,931][41694] Fps is (10 sec: 6548.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 89460736. Throughput: 0: 1752.7. Samples: 17361580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:27,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 02:05:30,120][42004] Updated weights for policy 0, policy_version 21846 (0.0040) +[2024-11-08 02:05:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89501696. Throughput: 0: 1700.9. Samples: 17367500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:32,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:05:35,562][42004] Updated weights for policy 0, policy_version 21856 (0.0028) +[2024-11-08 02:05:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 89538560. Throughput: 0: 1735.9. Samples: 17379046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:37,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 02:05:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021860_89538560.pth... +[2024-11-08 02:05:38,063][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021460_87900160.pth +[2024-11-08 02:05:40,725][42004] Updated weights for policy 0, policy_version 21866 (0.0033) +[2024-11-08 02:05:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6961.0, 300 sec: 6831.3). Total num frames: 89575424. Throughput: 0: 1752.4. Samples: 17390912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:42,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 02:05:46,155][42004] Updated weights for policy 0, policy_version 21876 (0.0027) +[2024-11-08 02:05:47,937][41694] Fps is (10 sec: 7777.9, 60 sec: 7099.0, 300 sec: 6845.0). Total num frames: 89616384. Throughput: 0: 1740.2. Samples: 17396288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:47,940][41694] Avg episode reward: [(0, '4.582')] +[2024-11-08 02:05:53,065][41694] Fps is (10 sec: 6062.9, 60 sec: 6811.5, 300 sec: 6800.4). Total num frames: 89636864. Throughput: 0: 1602.4. Samples: 17401924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:53,067][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:05:54,020][42004] Updated weights for policy 0, policy_version 21886 (0.0025) +[2024-11-08 02:05:57,932][41694] Fps is (10 sec: 5327.8, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 89669632. Throughput: 0: 1622.1. Samples: 17413730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:57,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 02:06:00,176][42004] Updated weights for policy 0, policy_version 21896 (0.0031) +[2024-11-08 02:06:02,932][41694] Fps is (10 sec: 6642.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 89702400. Throughput: 0: 1654.9. Samples: 17418878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:02,943][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 02:06:05,748][42004] Updated weights for policy 0, policy_version 21906 (0.0027) +[2024-11-08 02:06:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 89739264. Throughput: 0: 1727.1. Samples: 17429580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:07,935][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:06:11,480][42004] Updated weights for policy 0, policy_version 21916 (0.0043) +[2024-11-08 02:06:12,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6690.0, 300 sec: 6845.1). Total num frames: 89776128. Throughput: 0: 1752.9. Samples: 17440464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:12,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 02:06:17,182][42004] Updated weights for policy 0, policy_version 21926 (0.0026) +[2024-11-08 02:06:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6966.6, 300 sec: 6845.2). Total num frames: 89812992. Throughput: 0: 1731.4. Samples: 17445412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:17,932][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 02:06:22,577][42004] Updated weights for policy 0, policy_version 21936 (0.0030) +[2024-11-08 02:06:22,933][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.3, 300 sec: 6859.0). Total num frames: 89849856. Throughput: 0: 1730.0. Samples: 17456896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:22,936][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:06:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 89870336. Throughput: 0: 1643.9. Samples: 17464886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:27,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:06:30,847][42004] Updated weights for policy 0, policy_version 21946 (0.0042) +[2024-11-08 02:06:32,931][41694] Fps is (10 sec: 5325.5, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89903104. Throughput: 0: 1613.0. Samples: 17468862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:32,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 02:06:36,595][42004] Updated weights for policy 0, policy_version 21956 (0.0032) +[2024-11-08 02:06:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89939968. Throughput: 0: 1726.1. Samples: 17479370. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:37,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:06:41,767][42004] Updated weights for policy 0, policy_version 21966 (0.0026) +[2024-11-08 02:06:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 89980928. Throughput: 0: 1723.6. Samples: 17491290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:42,933][41694] Avg episode reward: [(0, '4.157')] +[2024-11-08 02:06:46,952][42004] Updated weights for policy 0, policy_version 21976 (0.0022) +[2024-11-08 02:06:47,933][41694] Fps is (10 sec: 7780.9, 60 sec: 6690.6, 300 sec: 6859.0). Total num frames: 90017792. Throughput: 0: 1736.1. Samples: 17497004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:47,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:06:52,108][42004] Updated weights for policy 0, policy_version 21986 (0.0019) +[2024-11-08 02:06:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7047.2, 300 sec: 6873.0). Total num frames: 90058752. Throughput: 0: 1766.1. Samples: 17509054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:52,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 02:06:57,354][42004] Updated weights for policy 0, policy_version 21996 (0.0024) +[2024-11-08 02:06:57,931][41694] Fps is (10 sec: 8193.7, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 90099712. Throughput: 0: 1782.2. Samples: 17520660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:06:57,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 02:07:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 90116096. Throughput: 0: 1782.2. Samples: 17525612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:02,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 02:07:06,239][42004] Updated weights for policy 0, policy_version 22006 (0.0039) +[2024-11-08 02:07:07,931][41694] Fps is (10 sec: 4505.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90144768. Throughput: 0: 1654.0. Samples: 17531322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:07,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:07:11,974][42004] Updated weights for policy 0, policy_version 22016 (0.0033) +[2024-11-08 02:07:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 90181632. Throughput: 0: 1715.1. Samples: 17542068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 02:07:17,002][42004] Updated weights for policy 0, policy_version 22026 (0.0024) +[2024-11-08 02:07:17,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90222592. Throughput: 0: 1755.6. Samples: 17547864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:07:22,201][42004] Updated weights for policy 0, policy_version 22036 (0.0032) +[2024-11-08 02:07:22,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 90263552. Throughput: 0: 1789.2. Samples: 17559886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:22,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:07:27,562][42004] Updated weights for policy 0, policy_version 22046 (0.0027) +[2024-11-08 02:07:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 90300416. Throughput: 0: 1787.1. Samples: 17571710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:27,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 02:07:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 6873.0). Total num frames: 90337280. Throughput: 0: 1785.7. Samples: 17577356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:32,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:07:32,941][42004] Updated weights for policy 0, policy_version 22056 (0.0024) +[2024-11-08 02:07:37,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6963.1, 300 sec: 6817.4). Total num frames: 90357760. Throughput: 0: 1699.6. Samples: 17585538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:37,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:07:38,099][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022061_90361856.pth... +[2024-11-08 02:07:38,217][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021660_88719360.pth +[2024-11-08 02:07:41,123][42004] Updated weights for policy 0, policy_version 22066 (0.0035) +[2024-11-08 02:07:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90390528. Throughput: 0: 1645.1. Samples: 17594690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:42,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:07:46,775][42004] Updated weights for policy 0, policy_version 22076 (0.0019) +[2024-11-08 02:07:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.1, 300 sec: 6817.4). Total num frames: 90431488. Throughput: 0: 1646.6. Samples: 17599710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:47,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 02:07:52,068][42004] Updated weights for policy 0, policy_version 22086 (0.0035) +[2024-11-08 02:07:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 90468352. Throughput: 0: 1781.9. Samples: 17611508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:52,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 02:07:57,214][42004] Updated weights for policy 0, policy_version 22096 (0.0026) +[2024-11-08 02:07:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 90509312. Throughput: 0: 1809.6. Samples: 17623500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:57,932][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:08:02,868][42004] Updated weights for policy 0, policy_version 22106 (0.0027) +[2024-11-08 02:08:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 90546176. Throughput: 0: 1804.3. Samples: 17629058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:08:02,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:08:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7304.5, 300 sec: 6900.8). Total num frames: 90583040. Throughput: 0: 1777.7. Samples: 17639882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:08:07,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:08:08,317][42004] Updated weights for policy 0, policy_version 22116 (0.0022) +[2024-11-08 02:08:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 90599424. Throughput: 0: 1659.7. Samples: 17646396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:12,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:08:16,961][42004] Updated weights for policy 0, policy_version 22126 (0.0030) +[2024-11-08 02:08:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90632192. Throughput: 0: 1640.0. Samples: 17651154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:17,934][41694] Avg episode reward: [(0, '4.240')] +[2024-11-08 02:08:22,514][42004] Updated weights for policy 0, policy_version 22136 (0.0029) +[2024-11-08 02:08:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90669056. Throughput: 0: 1695.9. Samples: 17661854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:22,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:08:27,862][42004] Updated weights for policy 0, policy_version 22146 (0.0034) +[2024-11-08 02:08:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.7, 300 sec: 6860.9). Total num frames: 90710016. Throughput: 0: 1747.9. Samples: 17673346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:27,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 02:08:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 90746880. Throughput: 0: 1761.4. Samples: 17678974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:32,935][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:08:33,196][42004] Updated weights for policy 0, policy_version 22156 (0.0035) +[2024-11-08 02:08:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6886.8). Total num frames: 90783744. Throughput: 0: 1752.9. Samples: 17690390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:37,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:08:38,656][42004] Updated weights for policy 0, policy_version 22166 (0.0031) +[2024-11-08 02:08:42,932][41694] Fps is (10 sec: 7372.0, 60 sec: 7167.9, 300 sec: 6886.8). Total num frames: 90820608. Throughput: 0: 1740.8. Samples: 17701840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:42,935][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 02:08:46,334][42004] Updated weights for policy 0, policy_version 22176 (0.0031) +[2024-11-08 02:08:47,933][41694] Fps is (10 sec: 5733.5, 60 sec: 6826.5, 300 sec: 6817.4). Total num frames: 90841088. Throughput: 0: 1689.1. Samples: 17705070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:47,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:08:52,625][42004] Updated weights for policy 0, policy_version 22186 (0.0055) +[2024-11-08 02:08:52,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90873856. Throughput: 0: 1626.8. Samples: 17713088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:52,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 02:08:57,929][42004] Updated weights for policy 0, policy_version 22196 (0.0030) +[2024-11-08 02:08:57,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 90914816. Throughput: 0: 1735.2. Samples: 17724480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:57,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:09:02,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6894.7). Total num frames: 90951680. Throughput: 0: 1757.2. Samples: 17730226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:09:02,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:09:03,506][42004] Updated weights for policy 0, policy_version 22206 (0.0035) +[2024-11-08 02:09:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 90988544. Throughput: 0: 1767.2. Samples: 17741376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:09:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 02:09:08,698][42004] Updated weights for policy 0, policy_version 22216 (0.0020) +[2024-11-08 02:09:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 91025408. Throughput: 0: 1767.8. Samples: 17752896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:12,935][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 02:09:14,138][42004] Updated weights for policy 0, policy_version 22226 (0.0025) +[2024-11-08 02:09:17,932][41694] Fps is (10 sec: 7372.2, 60 sec: 7167.9, 300 sec: 6928.5). Total num frames: 91062272. Throughput: 0: 1766.3. Samples: 17758460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:17,934][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:09:22,136][42004] Updated weights for policy 0, policy_version 22236 (0.0037) +[2024-11-08 02:09:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 91082752. Throughput: 0: 1664.0. Samples: 17765272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:22,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:09:27,931][41694] Fps is (10 sec: 5325.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 91115520. Throughput: 0: 1637.3. Samples: 17775518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:27,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:09:27,995][42004] Updated weights for policy 0, policy_version 22246 (0.0024) +[2024-11-08 02:09:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 91156480. Throughput: 0: 1693.8. Samples: 17781290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:32,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:09:33,202][42004] Updated weights for policy 0, policy_version 22256 (0.0028) +[2024-11-08 02:09:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6894.9, 300 sec: 6914.0). Total num frames: 91197440. Throughput: 0: 1782.5. Samples: 17793302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:37,937][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:09:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022265_91197440.pth... +[2024-11-08 02:09:38,040][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021860_89538560.pth +[2024-11-08 02:09:38,334][42004] Updated weights for policy 0, policy_version 22266 (0.0027) +[2024-11-08 02:09:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.1, 300 sec: 6928.5). Total num frames: 91234304. Throughput: 0: 1789.6. Samples: 17805012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:42,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:09:43,663][42004] Updated weights for policy 0, policy_version 22276 (0.0022) +[2024-11-08 02:09:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.2, 300 sec: 6928.5). Total num frames: 91271168. Throughput: 0: 1791.2. Samples: 17810830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:47,935][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 02:09:49,025][42004] Updated weights for policy 0, policy_version 22286 (0.0030) +[2024-11-08 02:09:55,147][41694] Fps is (10 sec: 6035.5, 60 sec: 6978.6, 300 sec: 6876.8). Total num frames: 91308032. Throughput: 0: 1706.8. Samples: 17821962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:55,149][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 02:09:57,508][42004] Updated weights for policy 0, policy_version 22296 (0.0024) +[2024-11-08 02:09:57,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 91324416. Throughput: 0: 1667.5. Samples: 17827934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:57,933][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 02:10:02,932][41694] Fps is (10 sec: 6839.7, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 91361280. Throughput: 0: 1659.2. Samples: 17833126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:02,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:10:03,297][42004] Updated weights for policy 0, policy_version 22306 (0.0041) +[2024-11-08 02:10:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 91398144. Throughput: 0: 1749.0. Samples: 17843976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:07,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 02:10:08,674][42004] Updated weights for policy 0, policy_version 22316 (0.0033) +[2024-11-08 02:10:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 6915.2). Total num frames: 91435008. Throughput: 0: 1774.7. Samples: 17855380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:12,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 02:10:14,522][42004] Updated weights for policy 0, policy_version 22326 (0.0035) +[2024-11-08 02:10:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.5, 300 sec: 6914.6). Total num frames: 91467776. Throughput: 0: 1753.3. Samples: 17860188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 02:10:20,451][42004] Updated weights for policy 0, policy_version 22336 (0.0029) +[2024-11-08 02:10:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 91504640. Throughput: 0: 1718.9. Samples: 17870654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:22,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 02:10:26,064][42004] Updated weights for policy 0, policy_version 22346 (0.0029) +[2024-11-08 02:10:29,313][41694] Fps is (10 sec: 6117.8, 60 sec: 6873.2, 300 sec: 6868.5). Total num frames: 91537408. Throughput: 0: 1650.5. Samples: 17881566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:29,316][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 02:10:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 91561984. Throughput: 0: 1606.5. Samples: 17883124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:32,943][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 02:10:34,064][42004] Updated weights for policy 0, policy_version 22356 (0.0046) +[2024-11-08 02:10:37,932][41694] Fps is (10 sec: 7128.7, 60 sec: 6690.1, 300 sec: 6859.0). Total num frames: 91598848. Throughput: 0: 1672.4. Samples: 17893516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:37,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 02:10:39,545][42004] Updated weights for policy 0, policy_version 22366 (0.0024) +[2024-11-08 02:10:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6831.4). Total num frames: 91631616. Throughput: 0: 1709.4. Samples: 17904856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:42,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:10:45,149][42004] Updated weights for policy 0, policy_version 22376 (0.0033) +[2024-11-08 02:10:47,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.1, 300 sec: 6903.8). Total num frames: 91672576. Throughput: 0: 1718.5. Samples: 17910458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:10:50,489][42004] Updated weights for policy 0, policy_version 22386 (0.0036) +[2024-11-08 02:10:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6946.6, 300 sec: 6914.6). Total num frames: 91709440. Throughput: 0: 1730.8. Samples: 17921860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:52,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:10:56,035][42004] Updated weights for policy 0, policy_version 22396 (0.0026) +[2024-11-08 02:10:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 91746304. Throughput: 0: 1723.5. Samples: 17932936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:57,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 02:11:03,754][41694] Fps is (10 sec: 5298.7, 60 sec: 6667.1, 300 sec: 6853.8). Total num frames: 91766784. Throughput: 0: 1697.8. Samples: 17937984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:03,757][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 02:11:04,453][42004] Updated weights for policy 0, policy_version 22406 (0.0065) +[2024-11-08 02:11:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 91795456. Throughput: 0: 1631.0. Samples: 17944048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:07,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 02:11:10,299][42004] Updated weights for policy 0, policy_version 22416 (0.0041) +[2024-11-08 02:11:12,932][41694] Fps is (10 sec: 7140.4, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 91832320. Throughput: 0: 1667.2. Samples: 17954288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:12,935][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:11:16,286][42004] Updated weights for policy 0, policy_version 22426 (0.0043) +[2024-11-08 02:11:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 91865088. Throughput: 0: 1695.5. Samples: 17959420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:17,934][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 02:11:22,071][42004] Updated weights for policy 0, policy_version 22436 (0.0023) +[2024-11-08 02:11:22,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 91901952. Throughput: 0: 1700.6. Samples: 17970042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:22,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:11:27,419][42004] Updated weights for policy 0, policy_version 22446 (0.0045) +[2024-11-08 02:11:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6847.8, 300 sec: 6900.7). Total num frames: 91938816. Throughput: 0: 1701.9. Samples: 17981442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:27,934][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 02:11:32,714][42004] Updated weights for policy 0, policy_version 22456 (0.0028) +[2024-11-08 02:11:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 91979776. Throughput: 0: 1705.0. Samples: 17987184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:32,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 02:11:38,225][41694] Fps is (10 sec: 5968.6, 60 sec: 6657.6, 300 sec: 6838.4). Total num frames: 92000256. Throughput: 0: 1573.3. Samples: 17993122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:38,230][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:11:38,244][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022461_92000256.pth... +[2024-11-08 02:11:38,354][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022061_90361856.pth +[2024-11-08 02:11:41,065][42004] Updated weights for policy 0, policy_version 22466 (0.0042) +[2024-11-08 02:11:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 92033024. Throughput: 0: 1591.7. Samples: 18004562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:42,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:11:46,519][42004] Updated weights for policy 0, policy_version 22476 (0.0032) +[2024-11-08 02:11:47,932][41694] Fps is (10 sec: 7174.0, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 92069888. Throughput: 0: 1630.1. Samples: 18010000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:47,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 02:11:51,807][42004] Updated weights for policy 0, policy_version 22486 (0.0029) +[2024-11-08 02:11:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 92106752. Throughput: 0: 1724.4. Samples: 18021646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:52,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:11:57,234][42004] Updated weights for policy 0, policy_version 22496 (0.0031) +[2024-11-08 02:11:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 92147712. Throughput: 0: 1753.3. Samples: 18033186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:57,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:12:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6990.8, 300 sec: 6900.7). Total num frames: 92180480. Throughput: 0: 1761.7. Samples: 18038696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:02,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:12:02,958][42004] Updated weights for policy 0, policy_version 22506 (0.0032) +[2024-11-08 02:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 92213248. Throughput: 0: 1727.4. Samples: 18047776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:07,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:12:10,090][42004] Updated weights for policy 0, policy_version 22516 (0.0032) +[2024-11-08 02:12:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 92229632. Throughput: 0: 1626.8. Samples: 18054650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:12,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:12:17,904][42004] Updated weights for policy 0, policy_version 22526 (0.0046) +[2024-11-08 02:12:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92266496. Throughput: 0: 1585.6. Samples: 18058538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:17,934][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 02:12:22,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92303360. Throughput: 0: 1720.2. Samples: 18070026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:22,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:12:23,287][42004] Updated weights for policy 0, policy_version 22536 (0.0027) +[2024-11-08 02:12:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92340224. Throughput: 0: 1706.6. Samples: 18081360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:27,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:12:28,676][42004] Updated weights for policy 0, policy_version 22546 (0.0023) +[2024-11-08 02:12:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 92381184. Throughput: 0: 1712.8. Samples: 18087076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:32,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 02:12:33,929][42004] Updated weights for policy 0, policy_version 22556 (0.0041) +[2024-11-08 02:12:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6997.5, 300 sec: 6873.0). Total num frames: 92418048. Throughput: 0: 1703.9. Samples: 18098322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:37,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 02:12:39,589][42004] Updated weights for policy 0, policy_version 22566 (0.0031) +[2024-11-08 02:12:42,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6963.1, 300 sec: 6845.2). Total num frames: 92450816. Throughput: 0: 1690.8. Samples: 18109274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:42,936][41694] Avg episode reward: [(0, '4.257')] +[2024-11-08 02:12:47,829][42004] Updated weights for policy 0, policy_version 22576 (0.0027) +[2024-11-08 02:12:47,934][41694] Fps is (10 sec: 5323.6, 60 sec: 6689.9, 300 sec: 6789.6). Total num frames: 92471296. Throughput: 0: 1677.4. Samples: 18114182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:47,936][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 02:12:52,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 92508160. Throughput: 0: 1630.7. Samples: 18121156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:52,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:12:53,262][42004] Updated weights for policy 0, policy_version 22586 (0.0021) +[2024-11-08 02:12:57,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 92545024. Throughput: 0: 1736.0. Samples: 18132772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:57,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:12:58,578][42004] Updated weights for policy 0, policy_version 22596 (0.0028) +[2024-11-08 02:13:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 92581888. Throughput: 0: 1780.0. Samples: 18138636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:02,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:13:04,155][42004] Updated weights for policy 0, policy_version 22606 (0.0034) +[2024-11-08 02:13:07,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 92622848. Throughput: 0: 1766.3. Samples: 18149510. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:07,936][41694] Avg episode reward: [(0, '4.790')] +[2024-11-08 02:13:09,605][42004] Updated weights for policy 0, policy_version 22616 (0.0028) +[2024-11-08 02:13:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 92659712. Throughput: 0: 1766.6. Samples: 18160858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:12,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 02:13:15,103][42004] Updated weights for policy 0, policy_version 22626 (0.0032) +[2024-11-08 02:13:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 92692480. Throughput: 0: 1764.2. Samples: 18166466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:17,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:13:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92712960. Throughput: 0: 1670.2. Samples: 18173482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:22,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 02:13:23,423][42004] Updated weights for policy 0, policy_version 22636 (0.0045) +[2024-11-08 02:13:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92749824. Throughput: 0: 1652.9. Samples: 18183652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:27,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:13:28,742][42004] Updated weights for policy 0, policy_version 22646 (0.0029) +[2024-11-08 02:13:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 92786688. Throughput: 0: 1671.8. Samples: 18189408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:13:34,072][42004] Updated weights for policy 0, policy_version 22656 (0.0040) +[2024-11-08 02:13:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 92827648. Throughput: 0: 1771.2. Samples: 18200862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:37,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 02:13:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022663_92827648.pth... +[2024-11-08 02:13:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022265_91197440.pth +[2024-11-08 02:13:39,413][42004] Updated weights for policy 0, policy_version 22666 (0.0026) +[2024-11-08 02:13:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.1, 300 sec: 6859.1). Total num frames: 92864512. Throughput: 0: 1767.3. Samples: 18212300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:42,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:13:44,930][42004] Updated weights for policy 0, policy_version 22676 (0.0043) +[2024-11-08 02:13:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.3, 300 sec: 6872.9). Total num frames: 92901376. Throughput: 0: 1762.6. Samples: 18217954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:47,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:13:50,447][42004] Updated weights for policy 0, policy_version 22686 (0.0029) +[2024-11-08 02:13:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 92938240. Throughput: 0: 1763.4. Samples: 18228864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:13:52,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 02:13:57,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92954624. Throughput: 0: 1655.0. Samples: 18235332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:13:57,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:13:58,591][42004] Updated weights for policy 0, policy_version 22696 (0.0027) +[2024-11-08 02:14:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92991488. Throughput: 0: 1657.7. Samples: 18241062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:14:02,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 02:14:04,135][42004] Updated weights for policy 0, policy_version 22706 (0.0028) +[2024-11-08 02:14:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 93028352. Throughput: 0: 1735.0. Samples: 18251558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:14:07,936][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:14:09,829][42004] Updated weights for policy 0, policy_version 22716 (0.0029) +[2024-11-08 02:14:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 93065216. Throughput: 0: 1757.7. Samples: 18262750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:12,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:14:15,462][42004] Updated weights for policy 0, policy_version 22726 (0.0032) +[2024-11-08 02:14:17,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 93102080. Throughput: 0: 1749.6. Samples: 18268140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:17,936][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 02:14:20,849][42004] Updated weights for policy 0, policy_version 22736 (0.0040) +[2024-11-08 02:14:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 93138944. Throughput: 0: 1747.1. Samples: 18279482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:22,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 02:14:26,786][42004] Updated weights for policy 0, policy_version 22746 (0.0036) +[2024-11-08 02:14:27,934][41694] Fps is (10 sec: 6962.1, 60 sec: 7031.2, 300 sec: 6831.2). Total num frames: 93171712. Throughput: 0: 1722.1. Samples: 18289800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:27,938][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 02:14:32,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 93192192. Throughput: 0: 1642.5. Samples: 18291868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 02:14:34,702][42004] Updated weights for policy 0, policy_version 22756 (0.0029) +[2024-11-08 02:14:37,934][41694] Fps is (10 sec: 5734.2, 60 sec: 6689.8, 300 sec: 6761.8). Total num frames: 93229056. Throughput: 0: 1631.9. Samples: 18302302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:37,938][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:14:40,104][42004] Updated weights for policy 0, policy_version 22766 (0.0034) +[2024-11-08 02:14:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 93270016. Throughput: 0: 1735.3. Samples: 18313420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:42,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:14:45,728][42004] Updated weights for policy 0, policy_version 22776 (0.0025) +[2024-11-08 02:14:47,932][41694] Fps is (10 sec: 7784.4, 60 sec: 6758.4, 300 sec: 6827.0). Total num frames: 93306880. Throughput: 0: 1732.4. Samples: 18319020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:47,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 02:14:51,025][42004] Updated weights for policy 0, policy_version 22786 (0.0030) +[2024-11-08 02:14:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 93339648. Throughput: 0: 1754.7. Samples: 18330518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:52,934][41694] Avg episode reward: [(0, '4.758')] +[2024-11-08 02:14:56,878][42004] Updated weights for policy 0, policy_version 22796 (0.0023) +[2024-11-08 02:14:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 93376512. Throughput: 0: 1738.0. Samples: 18340960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:57,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:15:05,030][41694] Fps is (10 sec: 5755.5, 60 sec: 6727.9, 300 sec: 6769.3). Total num frames: 93409280. Throughput: 0: 1651.5. Samples: 18345920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:05,032][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:15:05,102][42004] Updated weights for policy 0, policy_version 22806 (0.0029) +[2024-11-08 02:15:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 93433856. Throughput: 0: 1629.3. Samples: 18352800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:07,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 02:15:10,428][42004] Updated weights for policy 0, policy_version 22816 (0.0026) +[2024-11-08 02:15:12,932][41694] Fps is (10 sec: 7775.7, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 93470720. Throughput: 0: 1647.9. Samples: 18363952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:12,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 02:15:16,392][42004] Updated weights for policy 0, policy_version 22826 (0.0026) +[2024-11-08 02:15:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 93503488. Throughput: 0: 1716.3. Samples: 18369100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:17,938][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 02:15:21,946][42004] Updated weights for policy 0, policy_version 22836 (0.0031) +[2024-11-08 02:15:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6821.6). Total num frames: 93540352. Throughput: 0: 1719.8. Samples: 18379686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:22,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:15:27,169][42004] Updated weights for policy 0, policy_version 22846 (0.0026) +[2024-11-08 02:15:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.9, 300 sec: 6845.2). Total num frames: 93581312. Throughput: 0: 1734.8. Samples: 18391486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:27,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 02:15:32,898][42004] Updated weights for policy 0, policy_version 22856 (0.0023) +[2024-11-08 02:15:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 93618176. Throughput: 0: 1734.2. Samples: 18397058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:32,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:15:39,497][41694] Fps is (10 sec: 5666.5, 60 sec: 6786.4, 300 sec: 6795.2). Total num frames: 93646848. Throughput: 0: 1647.2. Samples: 18407222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:39,499][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 02:15:39,515][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022863_93646848.pth... +[2024-11-08 02:15:39,639][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022461_92000256.pth +[2024-11-08 02:15:40,775][42004] Updated weights for policy 0, policy_version 22866 (0.0037) +[2024-11-08 02:15:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 93671424. Throughput: 0: 1635.2. Samples: 18414544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:42,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:15:46,318][42004] Updated weights for policy 0, policy_version 22876 (0.0029) +[2024-11-08 02:15:47,932][41694] Fps is (10 sec: 7283.8, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 93708288. Throughput: 0: 1728.7. Samples: 18420086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:47,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:15:51,644][42004] Updated weights for policy 0, policy_version 22886 (0.0028) +[2024-11-08 02:15:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 93749248. Throughput: 0: 1753.5. Samples: 18431706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:52,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 02:15:56,884][42004] Updated weights for policy 0, policy_version 22896 (0.0029) +[2024-11-08 02:15:57,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6878.2). Total num frames: 93790208. Throughput: 0: 1767.2. Samples: 18443478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:57,935][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:16:02,341][42004] Updated weights for policy 0, policy_version 22906 (0.0030) +[2024-11-08 02:16:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7215.6, 300 sec: 6886.8). Total num frames: 93827072. Throughput: 0: 1779.2. Samples: 18449166. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:02,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:16:07,932][41694] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 6873.0). Total num frames: 93859840. Throughput: 0: 1778.3. Samples: 18459708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:07,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 02:16:08,190][42004] Updated weights for policy 0, policy_version 22916 (0.0035) +[2024-11-08 02:16:13,973][41694] Fps is (10 sec: 5193.5, 60 sec: 6777.3, 300 sec: 6821.1). Total num frames: 93884416. Throughput: 0: 1591.1. Samples: 18464742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:13,974][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:16:16,551][42004] Updated weights for policy 0, policy_version 22926 (0.0044) +[2024-11-08 02:16:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 93913088. Throughput: 0: 1645.7. Samples: 18471114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:17,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:16:21,814][42004] Updated weights for policy 0, policy_version 22936 (0.0033) +[2024-11-08 02:16:22,931][41694] Fps is (10 sec: 7772.8, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 93954048. Throughput: 0: 1733.2. Samples: 18482504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:22,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 02:16:27,015][42004] Updated weights for policy 0, policy_version 22946 (0.0026) +[2024-11-08 02:16:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 93990912. Throughput: 0: 1766.7. Samples: 18494046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 02:16:32,272][42004] Updated weights for policy 0, policy_version 22956 (0.0027) +[2024-11-08 02:16:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6893.7). Total num frames: 94031872. Throughput: 0: 1773.1. Samples: 18499872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:32,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:16:37,577][42004] Updated weights for policy 0, policy_version 22966 (0.0032) +[2024-11-08 02:16:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7219.8, 300 sec: 6900.7). Total num frames: 94068736. Throughput: 0: 1777.5. Samples: 18511696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:37,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:16:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7168.0, 300 sec: 6886.8). Total num frames: 94101504. Throughput: 0: 1742.3. Samples: 18521882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:42,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:16:43,700][42004] Updated weights for policy 0, policy_version 22976 (0.0038) +[2024-11-08 02:16:48,477][41694] Fps is (10 sec: 5438.4, 60 sec: 6900.6, 300 sec: 6832.6). Total num frames: 94126080. Throughput: 0: 1714.9. Samples: 18527270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:48,479][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:16:51,541][42004] Updated weights for policy 0, policy_version 22986 (0.0026) +[2024-11-08 02:16:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 94158848. Throughput: 0: 1654.6. Samples: 18534164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:52,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:16:56,778][42004] Updated weights for policy 0, policy_version 22996 (0.0029) +[2024-11-08 02:16:57,932][41694] Fps is (10 sec: 7797.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 94199808. Throughput: 0: 1844.0. Samples: 18545802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:57,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:17:02,568][42004] Updated weights for policy 0, policy_version 23006 (0.0027) +[2024-11-08 02:17:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 94232576. Throughput: 0: 1787.0. Samples: 18551530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:02,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:17:07,666][42004] Updated weights for policy 0, policy_version 23016 (0.0028) +[2024-11-08 02:17:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 94273536. Throughput: 0: 1782.8. Samples: 18562728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:07,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:17:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7225.1, 300 sec: 6928.5). Total num frames: 94310400. Throughput: 0: 1771.6. Samples: 18573768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:12,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:17:13,512][42004] Updated weights for policy 0, policy_version 23026 (0.0034) +[2024-11-08 02:17:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 94343168. Throughput: 0: 1746.9. Samples: 18578482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:17,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:17:19,495][42004] Updated weights for policy 0, policy_version 23036 (0.0035) +[2024-11-08 02:17:23,101][41694] Fps is (10 sec: 5235.9, 60 sec: 6807.4, 300 sec: 6855.1). Total num frames: 94363648. Throughput: 0: 1597.9. Samples: 18583870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:23,103][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:17:27,242][42004] Updated weights for policy 0, policy_version 23046 (0.0029) +[2024-11-08 02:17:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 94400512. Throughput: 0: 1652.8. Samples: 18596258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:27,936][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 02:17:32,551][42004] Updated weights for policy 0, policy_version 23056 (0.0022) +[2024-11-08 02:17:32,932][41694] Fps is (10 sec: 7500.1, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 94437376. Throughput: 0: 1677.1. Samples: 18601828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:32,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:17:37,761][42004] Updated weights for policy 0, policy_version 23066 (0.0043) +[2024-11-08 02:17:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 94478336. Throughput: 0: 1763.9. Samples: 18613542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:37,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:17:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023066_94478336.pth... +[2024-11-08 02:17:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022663_92827648.pth +[2024-11-08 02:17:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 94515200. Throughput: 0: 1756.5. Samples: 18624844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:42,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:17:43,391][42004] Updated weights for policy 0, policy_version 23076 (0.0026) +[2024-11-08 02:17:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7164.8, 300 sec: 6928.5). Total num frames: 94552064. Throughput: 0: 1755.5. Samples: 18630528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:47,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:17:48,585][42004] Updated weights for policy 0, policy_version 23086 (0.0025) +[2024-11-08 02:17:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 94593024. Throughput: 0: 1769.2. Samples: 18642344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:52,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-08 02:17:53,843][42004] Updated weights for policy 0, policy_version 23096 (0.0026) +[2024-11-08 02:17:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 94613504. Throughput: 0: 1706.8. Samples: 18650572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:57,934][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 02:18:02,054][42004] Updated weights for policy 0, policy_version 23106 (0.0027) +[2024-11-08 02:18:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 94646272. Throughput: 0: 1698.5. Samples: 18654914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:18:02,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:18:07,687][42004] Updated weights for policy 0, policy_version 23116 (0.0029) +[2024-11-08 02:18:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 94683136. Throughput: 0: 1813.0. Samples: 18665150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:07,935][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 02:18:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 94720000. Throughput: 0: 1789.1. Samples: 18676766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:12,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 02:18:12,982][42004] Updated weights for policy 0, policy_version 23126 (0.0024) +[2024-11-08 02:18:17,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 94760960. Throughput: 0: 1793.7. Samples: 18682544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:17,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 02:18:18,121][42004] Updated weights for policy 0, policy_version 23136 (0.0029) +[2024-11-08 02:18:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7256.8, 300 sec: 6942.4). Total num frames: 94797824. Throughput: 0: 1792.0. Samples: 18694182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:22,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 02:18:23,558][42004] Updated weights for policy 0, policy_version 23146 (0.0024) +[2024-11-08 02:18:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6956.3). Total num frames: 94838784. Throughput: 0: 1800.8. Samples: 18705882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:27,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:18:28,823][42004] Updated weights for policy 0, policy_version 23156 (0.0028) +[2024-11-08 02:18:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 94859264. Throughput: 0: 1789.6. Samples: 18711058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:32,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 02:18:37,137][42004] Updated weights for policy 0, policy_version 23166 (0.0034) +[2024-11-08 02:18:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 94892032. Throughput: 0: 1673.0. Samples: 18717630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:37,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 02:18:42,493][42004] Updated weights for policy 0, policy_version 23176 (0.0035) +[2024-11-08 02:18:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 94928896. Throughput: 0: 1741.9. Samples: 18728956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:42,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 02:18:47,766][42004] Updated weights for policy 0, policy_version 23186 (0.0030) +[2024-11-08 02:18:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 94969856. Throughput: 0: 1772.3. Samples: 18734666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:47,933][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 02:18:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 95006720. Throughput: 0: 1805.4. Samples: 18746394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:52,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:18:53,053][42004] Updated weights for policy 0, policy_version 23196 (0.0028) +[2024-11-08 02:18:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 95043584. Throughput: 0: 1800.7. Samples: 18757798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:57,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 02:18:58,526][42004] Updated weights for policy 0, policy_version 23206 (0.0043) +[2024-11-08 02:19:02,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7236.2, 300 sec: 6956.2). Total num frames: 95080448. Throughput: 0: 1795.1. Samples: 18763324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:02,936][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:19:04,134][42004] Updated weights for policy 0, policy_version 23216 (0.0029) +[2024-11-08 02:19:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6900.7). Total num frames: 95100928. Throughput: 0: 1705.1. Samples: 18770910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:07,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 02:19:12,742][42004] Updated weights for policy 0, policy_version 23226 (0.0029) +[2024-11-08 02:19:12,931][41694] Fps is (10 sec: 5325.5, 60 sec: 6894.9, 300 sec: 6886.9). Total num frames: 95133696. Throughput: 0: 1641.4. Samples: 18779744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:12,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 02:19:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 95170560. Throughput: 0: 1647.0. Samples: 18785172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:17,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 02:19:18,071][42004] Updated weights for policy 0, policy_version 23236 (0.0040) +[2024-11-08 02:19:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6914.7). Total num frames: 95211520. Throughput: 0: 1763.6. Samples: 18796994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:22,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 02:19:23,338][42004] Updated weights for policy 0, policy_version 23246 (0.0023) +[2024-11-08 02:19:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 95248384. Throughput: 0: 1767.6. Samples: 18808500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:19:28,628][42004] Updated weights for policy 0, policy_version 23256 (0.0030) +[2024-11-08 02:19:32,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6984.1). Total num frames: 95289344. Throughput: 0: 1769.3. Samples: 18814284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:32,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:19:34,045][42004] Updated weights for policy 0, policy_version 23266 (0.0027) +[2024-11-08 02:19:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6970.1). Total num frames: 95326208. Throughput: 0: 1756.5. Samples: 18825438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:37,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 02:19:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023273_95326208.pth... +[2024-11-08 02:19:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022863_93646848.pth +[2024-11-08 02:19:42,208][42004] Updated weights for policy 0, policy_version 23276 (0.0028) +[2024-11-08 02:19:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 95342592. Throughput: 0: 1645.7. Samples: 18831856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:42,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:19:47,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 95375360. Throughput: 0: 1631.5. Samples: 18836740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:47,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 02:19:48,006][42004] Updated weights for policy 0, policy_version 23286 (0.0023) +[2024-11-08 02:19:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 95416320. Throughput: 0: 1719.1. Samples: 18848268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:52,934][41694] Avg episode reward: [(0, '4.804')] +[2024-11-08 02:19:53,220][42004] Updated weights for policy 0, policy_version 23296 (0.0033) +[2024-11-08 02:19:57,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6894.9, 300 sec: 6992.1). Total num frames: 95457280. Throughput: 0: 1787.2. Samples: 18860166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:19:57,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 02:19:58,385][42004] Updated weights for policy 0, policy_version 23306 (0.0036) +[2024-11-08 02:20:02,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6895.0, 300 sec: 6984.0). Total num frames: 95494144. Throughput: 0: 1795.4. Samples: 18865964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:02,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:20:03,887][42004] Updated weights for policy 0, policy_version 23316 (0.0041) +[2024-11-08 02:20:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 95531008. Throughput: 0: 1776.2. Samples: 18876924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:07,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 02:20:09,413][42004] Updated weights for policy 0, policy_version 23326 (0.0041) +[2024-11-08 02:20:12,934][41694] Fps is (10 sec: 6962.1, 60 sec: 7167.7, 300 sec: 6984.0). Total num frames: 95563776. Throughput: 0: 1764.3. Samples: 18887898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:12,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:20:17,933][41694] Fps is (10 sec: 4914.7, 60 sec: 6826.6, 300 sec: 6914.6). Total num frames: 95580160. Throughput: 0: 1670.3. Samples: 18889450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:17,935][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:20:18,510][42004] Updated weights for policy 0, policy_version 23336 (0.0035) +[2024-11-08 02:20:22,931][41694] Fps is (10 sec: 4916.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 95612928. Throughput: 0: 1603.7. Samples: 18897604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:22,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:20:24,261][42004] Updated weights for policy 0, policy_version 23346 (0.0033) +[2024-11-08 02:20:27,932][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 95649792. Throughput: 0: 1717.7. Samples: 18909154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:27,936][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:20:29,681][42004] Updated weights for policy 0, policy_version 23356 (0.0026) +[2024-11-08 02:20:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6965.5). Total num frames: 95690752. Throughput: 0: 1734.9. Samples: 18914810. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:32,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:20:34,999][42004] Updated weights for policy 0, policy_version 23366 (0.0031) +[2024-11-08 02:20:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6970.1). Total num frames: 95727616. Throughput: 0: 1733.6. Samples: 18926280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:37,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:20:40,304][42004] Updated weights for policy 0, policy_version 23376 (0.0021) +[2024-11-08 02:20:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6970.2). Total num frames: 95764480. Throughput: 0: 1714.1. Samples: 18937302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:42,936][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 02:20:46,119][42004] Updated weights for policy 0, policy_version 23386 (0.0024) +[2024-11-08 02:20:49,870][41694] Fps is (10 sec: 5832.3, 60 sec: 6811.4, 300 sec: 6897.0). Total num frames: 95797248. Throughput: 0: 1633.6. Samples: 18942642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:49,872][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 02:20:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 95817728. Throughput: 0: 1607.2. Samples: 18949248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:52,935][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:20:54,556][42004] Updated weights for policy 0, policy_version 23396 (0.0032) +[2024-11-08 02:20:57,931][41694] Fps is (10 sec: 7113.7, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 95854592. Throughput: 0: 1599.1. Samples: 18959854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:57,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:20:59,846][42004] Updated weights for policy 0, policy_version 23406 (0.0036) +[2024-11-08 02:21:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 95891456. Throughput: 0: 1694.4. Samples: 18965698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:02,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 02:21:05,305][42004] Updated weights for policy 0, policy_version 23416 (0.0023) +[2024-11-08 02:21:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6967.0). Total num frames: 95932416. Throughput: 0: 1762.9. Samples: 18976934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:07,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 02:21:10,905][42004] Updated weights for policy 0, policy_version 23426 (0.0030) +[2024-11-08 02:21:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.4, 300 sec: 6956.3). Total num frames: 95965184. Throughput: 0: 1744.2. Samples: 18987644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:12,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 02:21:16,626][42004] Updated weights for policy 0, policy_version 23436 (0.0031) +[2024-11-08 02:21:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.6, 300 sec: 6942.4). Total num frames: 96002048. Throughput: 0: 1733.8. Samples: 18992830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:17,940][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:21:24,346][41694] Fps is (10 sec: 5741.7, 60 sec: 6802.9, 300 sec: 6881.6). Total num frames: 96030720. Throughput: 0: 1671.6. Samples: 19003866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:24,354][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:21:24,667][42004] Updated weights for policy 0, policy_version 23446 (0.0035) +[2024-11-08 02:21:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 96055296. Throughput: 0: 1622.1. Samples: 19010298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:27,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 02:21:30,514][42004] Updated weights for policy 0, policy_version 23456 (0.0040) +[2024-11-08 02:21:32,932][41694] Fps is (10 sec: 7155.2, 60 sec: 6690.0, 300 sec: 6859.1). Total num frames: 96092160. Throughput: 0: 1692.7. Samples: 19015534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:32,934][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:21:35,991][42004] Updated weights for policy 0, policy_version 23466 (0.0025) +[2024-11-08 02:21:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 96129024. Throughput: 0: 1726.4. Samples: 19026934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:37,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:21:38,016][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023470_96133120.pth... +[2024-11-08 02:21:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023066_94478336.pth +[2024-11-08 02:21:41,400][42004] Updated weights for policy 0, policy_version 23476 (0.0026) +[2024-11-08 02:21:42,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6690.1, 300 sec: 6927.4). Total num frames: 96165888. Throughput: 0: 1743.1. Samples: 19038294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:21:46,701][42004] Updated weights for policy 0, policy_version 23486 (0.0026) +[2024-11-08 02:21:47,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7054.6, 300 sec: 6942.4). Total num frames: 96206848. Throughput: 0: 1738.6. Samples: 19043936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:47,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 02:21:52,289][42004] Updated weights for policy 0, policy_version 23496 (0.0028) +[2024-11-08 02:21:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 96243712. Throughput: 0: 1733.6. Samples: 19054944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:52,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:21:58,789][41694] Fps is (10 sec: 5659.0, 60 sec: 6797.7, 300 sec: 6880.7). Total num frames: 96268288. Throughput: 0: 1588.5. Samples: 19060488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:58,793][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 02:22:00,619][42004] Updated weights for policy 0, policy_version 23506 (0.0028) +[2024-11-08 02:22:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 96292864. Throughput: 0: 1640.3. Samples: 19066642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:22:02,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:22:06,591][42004] Updated weights for policy 0, policy_version 23516 (0.0037) +[2024-11-08 02:22:07,931][41694] Fps is (10 sec: 6720.5, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 96329728. Throughput: 0: 1672.1. Samples: 19076748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:07,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 02:22:11,987][42004] Updated weights for policy 0, policy_version 23526 (0.0029) +[2024-11-08 02:22:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 96366592. Throughput: 0: 1732.7. Samples: 19088268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:12,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 02:22:17,038][42004] Updated weights for policy 0, policy_version 23536 (0.0026) +[2024-11-08 02:22:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6932.5). Total num frames: 96407552. Throughput: 0: 1748.3. Samples: 19094204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:17,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 02:22:22,297][42004] Updated weights for policy 0, policy_version 23546 (0.0023) +[2024-11-08 02:22:22,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7131.3, 300 sec: 6942.4). Total num frames: 96448512. Throughput: 0: 1756.1. Samples: 19105958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:22,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 02:22:27,651][42004] Updated weights for policy 0, policy_version 23556 (0.0029) +[2024-11-08 02:22:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 96485376. Throughput: 0: 1763.8. Samples: 19117664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:27,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:22:33,257][41694] Fps is (10 sec: 5553.5, 60 sec: 6857.8, 300 sec: 6865.4). Total num frames: 96505856. Throughput: 0: 1733.4. Samples: 19122504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:33,260][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:22:35,975][42004] Updated weights for policy 0, policy_version 23566 (0.0033) +[2024-11-08 02:22:37,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 96538624. Throughput: 0: 1649.4. Samples: 19129170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:37,940][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 02:22:41,272][42004] Updated weights for policy 0, policy_version 23576 (0.0023) +[2024-11-08 02:22:42,932][41694] Fps is (10 sec: 7197.1, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 96575488. Throughput: 0: 1818.2. Samples: 19140750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:42,937][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:22:46,718][42004] Updated weights for policy 0, policy_version 23586 (0.0038) +[2024-11-08 02:22:47,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 96616448. Throughput: 0: 1771.9. Samples: 19146378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:47,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:22:52,037][42004] Updated weights for policy 0, policy_version 23596 (0.0026) +[2024-11-08 02:22:52,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 96653312. Throughput: 0: 1805.7. Samples: 19158004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:22:57,153][42004] Updated weights for policy 0, policy_version 23606 (0.0032) +[2024-11-08 02:22:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7202.7, 300 sec: 6942.4). Total num frames: 96694272. Throughput: 0: 1813.7. Samples: 19169884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:22:57,935][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 02:23:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 96727040. Throughput: 0: 1806.4. Samples: 19175490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:02,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 02:23:03,027][42004] Updated weights for policy 0, policy_version 23616 (0.0029) +[2024-11-08 02:23:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 96747520. Throughput: 0: 1745.4. Samples: 19184500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:07,937][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:23:11,029][42004] Updated weights for policy 0, policy_version 23626 (0.0033) +[2024-11-08 02:23:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 96784384. Throughput: 0: 1665.7. Samples: 19192622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:12,934][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 02:23:16,300][42004] Updated weights for policy 0, policy_version 23636 (0.0036) +[2024-11-08 02:23:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.1, 300 sec: 6872.9). Total num frames: 96825344. Throughput: 0: 1696.0. Samples: 19198274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:17,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:23:21,751][42004] Updated weights for policy 0, policy_version 23646 (0.0036) +[2024-11-08 02:23:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 96862208. Throughput: 0: 1790.5. Samples: 19209742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:22,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:23:26,870][42004] Updated weights for policy 0, policy_version 23656 (0.0022) +[2024-11-08 02:23:27,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 96903168. Throughput: 0: 1797.2. Samples: 19221624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:27,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 02:23:32,038][42004] Updated weights for policy 0, policy_version 23666 (0.0033) +[2024-11-08 02:23:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7275.8, 300 sec: 6942.4). Total num frames: 96940032. Throughput: 0: 1802.9. Samples: 19227510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:32,943][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 02:23:37,858][42004] Updated weights for policy 0, policy_version 23676 (0.0049) +[2024-11-08 02:23:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.6, 300 sec: 6942.4). Total num frames: 96976896. Throughput: 0: 1791.8. Samples: 19238634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:37,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:23:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023676_96976896.pth... +[2024-11-08 02:23:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023273_95326208.pth +[2024-11-08 02:23:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.3, 300 sec: 6859.1). Total num frames: 96993280. Throughput: 0: 1674.7. Samples: 19245244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:42,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:23:45,700][42004] Updated weights for policy 0, policy_version 23686 (0.0044) +[2024-11-08 02:23:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 97034240. Throughput: 0: 1671.6. Samples: 19250714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:47,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:23:50,906][42004] Updated weights for policy 0, policy_version 23696 (0.0020) +[2024-11-08 02:23:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 97071104. Throughput: 0: 1731.2. Samples: 19262406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:52,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:23:56,131][42004] Updated weights for policy 0, policy_version 23706 (0.0042) +[2024-11-08 02:23:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.9). Total num frames: 97112064. Throughput: 0: 1815.2. Samples: 19274306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:57,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 02:24:01,805][42004] Updated weights for policy 0, policy_version 23716 (0.0025) +[2024-11-08 02:24:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 97148928. Throughput: 0: 1809.0. Samples: 19279680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:02,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 02:24:06,999][42004] Updated weights for policy 0, policy_version 23726 (0.0031) +[2024-11-08 02:24:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.5, 300 sec: 6956.3). Total num frames: 97185792. Throughput: 0: 1804.4. Samples: 19290938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:07,939][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:24:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 97218560. Throughput: 0: 1774.6. Samples: 19301480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:12,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 02:24:13,026][42004] Updated weights for policy 0, policy_version 23736 (0.0032) +[2024-11-08 02:24:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 97239040. Throughput: 0: 1735.8. Samples: 19305620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:17,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 02:24:20,799][42004] Updated weights for policy 0, policy_version 23746 (0.0034) +[2024-11-08 02:24:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 97280000. Throughput: 0: 1674.9. Samples: 19314004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:22,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:24:26,031][42004] Updated weights for policy 0, policy_version 23756 (0.0035) +[2024-11-08 02:24:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 97316864. Throughput: 0: 1786.3. Samples: 19325628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:27,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 02:24:31,229][42004] Updated weights for policy 0, policy_version 23766 (0.0022) +[2024-11-08 02:24:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 97357824. Throughput: 0: 1795.7. Samples: 19331520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:32,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:24:36,495][42004] Updated weights for policy 0, policy_version 23776 (0.0026) +[2024-11-08 02:24:37,933][41694] Fps is (10 sec: 7780.9, 60 sec: 6963.0, 300 sec: 6956.2). Total num frames: 97394688. Throughput: 0: 1796.1. Samples: 19343232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:37,935][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:24:42,030][42004] Updated weights for policy 0, policy_version 23786 (0.0037) +[2024-11-08 02:24:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7304.5, 300 sec: 6970.1). Total num frames: 97431552. Throughput: 0: 1779.8. Samples: 19354398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:42,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 02:24:47,932][41694] Fps is (10 sec: 6964.5, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 97464320. Throughput: 0: 1760.7. Samples: 19358912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:47,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 02:24:48,142][42004] Updated weights for policy 0, policy_version 23796 (0.0028) +[2024-11-08 02:24:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 97484800. Throughput: 0: 1672.3. Samples: 19366192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:52,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:24:55,884][42004] Updated weights for policy 0, policy_version 23806 (0.0028) +[2024-11-08 02:24:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6873.0). Total num frames: 97521664. Throughput: 0: 1679.9. Samples: 19377074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:57,934][41694] Avg episode reward: [(0, '4.224')] +[2024-11-08 02:25:01,432][42004] Updated weights for policy 0, policy_version 23816 (0.0029) +[2024-11-08 02:25:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 97558528. Throughput: 0: 1704.0. Samples: 19382302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:02,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 02:25:07,298][42004] Updated weights for policy 0, policy_version 23826 (0.0027) +[2024-11-08 02:25:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6886.9). Total num frames: 97595392. Throughput: 0: 1754.1. Samples: 19392940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:07,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 02:25:12,915][42004] Updated weights for policy 0, policy_version 23836 (0.0038) +[2024-11-08 02:25:12,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 97632256. Throughput: 0: 1740.1. Samples: 19403934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:12,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 02:25:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 97660928. Throughput: 0: 1713.3. Samples: 19408620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:17,938][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:25:19,597][42004] Updated weights for policy 0, policy_version 23846 (0.0031) +[2024-11-08 02:25:22,932][41694] Fps is (10 sec: 6144.4, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 97693696. Throughput: 0: 1662.9. Samples: 19418058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:22,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:25:27,482][42004] Updated weights for policy 0, policy_version 23856 (0.0028) +[2024-11-08 02:25:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 97714176. Throughput: 0: 1571.4. Samples: 19425110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:27,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 02:25:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6859.1). Total num frames: 97751040. Throughput: 0: 1589.2. Samples: 19430426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:32,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 02:25:33,186][42004] Updated weights for policy 0, policy_version 23866 (0.0038) +[2024-11-08 02:25:37,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 97792000. Throughput: 0: 1681.0. Samples: 19441840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:37,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:25:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023875_97792000.pth... +[2024-11-08 02:25:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023470_96133120.pth +[2024-11-08 02:25:38,358][42004] Updated weights for policy 0, policy_version 23876 (0.0023) +[2024-11-08 02:25:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6932.4). Total num frames: 97828864. Throughput: 0: 1706.9. Samples: 19453884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:42,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:25:43,528][42004] Updated weights for policy 0, policy_version 23886 (0.0028) +[2024-11-08 02:25:47,932][41694] Fps is (10 sec: 7783.1, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 97869824. Throughput: 0: 1720.6. Samples: 19459730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:47,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:25:48,690][42004] Updated weights for policy 0, policy_version 23896 (0.0032) +[2024-11-08 02:25:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 97902592. Throughput: 0: 1729.4. Samples: 19470764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:52,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:25:54,820][42004] Updated weights for policy 0, policy_version 23906 (0.0041) +[2024-11-08 02:26:00,222][41694] Fps is (10 sec: 5665.7, 60 sec: 6707.2, 300 sec: 6888.9). Total num frames: 97939456. Throughput: 0: 1639.1. Samples: 19481446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:00,225][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 02:26:02,778][42004] Updated weights for policy 0, policy_version 23916 (0.0038) +[2024-11-08 02:26:02,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 97959936. Throughput: 0: 1658.0. Samples: 19483230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:02,934][41694] Avg episode reward: [(0, '4.177')] +[2024-11-08 02:26:07,931][41694] Fps is (10 sec: 7437.9, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 97996800. Throughput: 0: 1676.0. Samples: 19493476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:07,933][41694] Avg episode reward: [(0, '4.208')] +[2024-11-08 02:26:08,334][42004] Updated weights for policy 0, policy_version 23926 (0.0029) +[2024-11-08 02:26:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 98029568. Throughput: 0: 1753.9. Samples: 19504034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:12,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 02:26:14,347][42004] Updated weights for policy 0, policy_version 23936 (0.0029) +[2024-11-08 02:26:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6934.0). Total num frames: 98066432. Throughput: 0: 1749.0. Samples: 19509130. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:17,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:26:19,798][42004] Updated weights for policy 0, policy_version 23946 (0.0030) +[2024-11-08 02:26:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 98103296. Throughput: 0: 1749.9. Samples: 19520582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:22,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:26:25,585][42004] Updated weights for policy 0, policy_version 23956 (0.0026) +[2024-11-08 02:26:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 98136064. Throughput: 0: 1712.3. Samples: 19530936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:26:31,349][42004] Updated weights for policy 0, policy_version 23966 (0.0027) +[2024-11-08 02:26:34,567][41694] Fps is (10 sec: 5632.3, 60 sec: 6778.4, 300 sec: 6876.5). Total num frames: 98168832. Throughput: 0: 1639.6. Samples: 19536192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:34,572][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 02:26:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6872.9). Total num frames: 98193408. Throughput: 0: 1601.6. Samples: 19542838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:37,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:26:39,258][42004] Updated weights for policy 0, policy_version 23976 (0.0028) +[2024-11-08 02:26:42,931][41694] Fps is (10 sec: 7345.6, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 98230272. Throughput: 0: 1691.7. Samples: 19553698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:42,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 02:26:44,798][42004] Updated weights for policy 0, policy_version 23986 (0.0030) +[2024-11-08 02:26:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 98271232. Throughput: 0: 1696.8. Samples: 19559586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:47,932][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:26:49,956][42004] Updated weights for policy 0, policy_version 23996 (0.0024) +[2024-11-08 02:26:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6934.8). Total num frames: 98308096. Throughput: 0: 1737.5. Samples: 19571662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:52,935][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:26:55,119][42004] Updated weights for policy 0, policy_version 24006 (0.0026) +[2024-11-08 02:26:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7026.6, 300 sec: 6956.3). Total num frames: 98344960. Throughput: 0: 1761.3. Samples: 19583292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:27:01,198][42004] Updated weights for policy 0, policy_version 24016 (0.0030) +[2024-11-08 02:27:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 98377728. Throughput: 0: 1753.6. Samples: 19588042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:27:02,934][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 02:27:09,234][41694] Fps is (10 sec: 5436.0, 60 sec: 6681.6, 300 sec: 6884.2). Total num frames: 98406400. Throughput: 0: 1683.7. Samples: 19598542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:27:09,236][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 02:27:09,285][42004] Updated weights for policy 0, policy_version 24026 (0.0029) +[2024-11-08 02:27:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 98435072. Throughput: 0: 1648.1. Samples: 19605100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:12,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:27:15,301][42004] Updated weights for policy 0, policy_version 24036 (0.0042) +[2024-11-08 02:27:17,931][41694] Fps is (10 sec: 7064.4, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98467840. Throughput: 0: 1701.5. Samples: 19609974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:17,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 02:27:20,892][42004] Updated weights for policy 0, policy_version 24046 (0.0042) +[2024-11-08 02:27:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98504704. Throughput: 0: 1739.1. Samples: 19621096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:22,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 02:27:25,947][42004] Updated weights for policy 0, policy_version 24056 (0.0023) +[2024-11-08 02:27:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6922.2). Total num frames: 98545664. Throughput: 0: 1762.4. Samples: 19633004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:27:31,294][42004] Updated weights for policy 0, policy_version 24066 (0.0045) +[2024-11-08 02:27:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7088.2, 300 sec: 6928.5). Total num frames: 98582528. Throughput: 0: 1761.4. Samples: 19638850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:32,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 02:27:37,219][42004] Updated weights for policy 0, policy_version 24076 (0.0033) +[2024-11-08 02:27:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6928.5). Total num frames: 98619392. Throughput: 0: 1717.8. Samples: 19648964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:37,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:27:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024077_98619392.pth... +[2024-11-08 02:27:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023676_96976896.pth +[2024-11-08 02:27:43,891][41694] Fps is (10 sec: 5605.8, 60 sec: 6786.3, 300 sec: 6850.7). Total num frames: 98643968. Throughput: 0: 1554.4. Samples: 19654732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:43,893][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 02:27:45,087][42004] Updated weights for policy 0, policy_version 24086 (0.0023) +[2024-11-08 02:27:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98672640. Throughput: 0: 1634.2. Samples: 19661582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:47,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:27:50,708][42004] Updated weights for policy 0, policy_version 24096 (0.0025) +[2024-11-08 02:27:52,932][41694] Fps is (10 sec: 7249.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 98709504. Throughput: 0: 1689.7. Samples: 19672376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:52,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 02:27:55,943][42004] Updated weights for policy 0, policy_version 24106 (0.0024) +[2024-11-08 02:27:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 98750464. Throughput: 0: 1759.3. Samples: 19684270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:57,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 02:28:01,539][42004] Updated weights for policy 0, policy_version 24116 (0.0033) +[2024-11-08 02:28:02,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 98787328. Throughput: 0: 1773.6. Samples: 19689788. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:28:07,058][42004] Updated weights for policy 0, policy_version 24126 (0.0035) +[2024-11-08 02:28:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7117.8, 300 sec: 6914.6). Total num frames: 98824192. Throughput: 0: 1764.7. Samples: 19700508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:07,932][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:28:12,668][42004] Updated weights for policy 0, policy_version 24136 (0.0028) +[2024-11-08 02:28:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 98861056. Throughput: 0: 1746.0. Samples: 19711572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:12,936][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:28:18,475][41694] Fps is (10 sec: 5827.1, 60 sec: 6900.6, 300 sec: 6846.4). Total num frames: 98885632. Throughput: 0: 1722.2. Samples: 19717284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:18,477][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:28:20,514][42004] Updated weights for policy 0, policy_version 24146 (0.0049) +[2024-11-08 02:28:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 98918400. Throughput: 0: 1660.0. Samples: 19723666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:22,937][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 02:28:26,134][42004] Updated weights for policy 0, policy_version 24156 (0.0026) +[2024-11-08 02:28:27,931][41694] Fps is (10 sec: 7363.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 98955264. Throughput: 0: 1822.5. Samples: 19734994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:27,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:28:31,480][42004] Updated weights for policy 0, policy_version 24166 (0.0027) +[2024-11-08 02:28:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 98992128. Throughput: 0: 1757.6. Samples: 19740674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:32,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:28:36,852][42004] Updated weights for policy 0, policy_version 24176 (0.0028) +[2024-11-08 02:28:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 99033088. Throughput: 0: 1773.9. Samples: 19752200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:37,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:28:42,599][42004] Updated weights for policy 0, policy_version 24186 (0.0031) +[2024-11-08 02:28:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7145.8, 300 sec: 6886.8). Total num frames: 99065856. Throughput: 0: 1751.8. Samples: 19763102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:28:47,893][42004] Updated weights for policy 0, policy_version 24196 (0.0029) +[2024-11-08 02:28:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 99106816. Throughput: 0: 1747.8. Samples: 19768438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:47,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:28:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 99131392. Throughput: 0: 1759.6. Samples: 19779690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:52,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 02:28:55,032][42004] Updated weights for policy 0, policy_version 24206 (0.0029) +[2024-11-08 02:28:57,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99168256. Throughput: 0: 1706.9. Samples: 19788380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:57,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 02:29:00,239][42004] Updated weights for policy 0, policy_version 24216 (0.0028) +[2024-11-08 02:29:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99205120. Throughput: 0: 1731.9. Samples: 19794276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:02,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 02:29:05,589][42004] Updated weights for policy 0, policy_version 24226 (0.0039) +[2024-11-08 02:29:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 99246080. Throughput: 0: 1828.5. Samples: 19805950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:07,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:29:10,786][42004] Updated weights for policy 0, policy_version 24236 (0.0029) +[2024-11-08 02:29:12,934][41694] Fps is (10 sec: 7780.8, 60 sec: 7031.2, 300 sec: 6928.4). Total num frames: 99282944. Throughput: 0: 1830.8. Samples: 19817382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:12,935][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:29:16,850][42004] Updated weights for policy 0, policy_version 24246 (0.0040) +[2024-11-08 02:29:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7302.5, 300 sec: 6914.6). Total num frames: 99319808. Throughput: 0: 1813.8. Samples: 19822294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:17,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 02:29:22,070][42004] Updated weights for policy 0, policy_version 24256 (0.0028) +[2024-11-08 02:29:22,931][41694] Fps is (10 sec: 7374.4, 60 sec: 7304.5, 300 sec: 6914.6). Total num frames: 99356672. Throughput: 0: 1811.3. Samples: 19833710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:22,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 02:29:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 99377152. Throughput: 0: 1723.5. Samples: 19840660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:27,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 02:29:30,043][42004] Updated weights for policy 0, policy_version 24266 (0.0037) +[2024-11-08 02:29:32,932][41694] Fps is (10 sec: 5733.9, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 99414016. Throughput: 0: 1721.5. Samples: 19845906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:32,935][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:29:35,472][42004] Updated weights for policy 0, policy_version 24276 (0.0027) +[2024-11-08 02:29:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99450880. Throughput: 0: 1726.3. Samples: 19857372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:37,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 02:29:38,039][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024281_99454976.pth... +[2024-11-08 02:29:38,176][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023875_97792000.pth +[2024-11-08 02:29:40,782][42004] Updated weights for policy 0, policy_version 24286 (0.0027) +[2024-11-08 02:29:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 99487744. Throughput: 0: 1788.3. Samples: 19868856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:42,942][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:29:46,389][42004] Updated weights for policy 0, policy_version 24296 (0.0029) +[2024-11-08 02:29:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 99524608. Throughput: 0: 1776.8. Samples: 19874230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:47,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 02:29:52,470][42004] Updated weights for policy 0, policy_version 24306 (0.0037) +[2024-11-08 02:29:52,937][41694] Fps is (10 sec: 6960.0, 60 sec: 7099.2, 300 sec: 6900.6). Total num frames: 99557376. Throughput: 0: 1749.9. Samples: 19884706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:52,938][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 02:29:57,626][42004] Updated weights for policy 0, policy_version 24316 (0.0026) +[2024-11-08 02:29:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 99598336. Throughput: 0: 1744.9. Samples: 19895898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:57,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 02:30:02,932][41694] Fps is (10 sec: 6147.0, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 99618816. Throughput: 0: 1734.8. Samples: 19900362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:02,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:30:05,635][42004] Updated weights for policy 0, policy_version 24326 (0.0033) +[2024-11-08 02:30:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 99655680. Throughput: 0: 1649.3. Samples: 19907930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:07,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:30:11,080][42004] Updated weights for policy 0, policy_version 24336 (0.0037) +[2024-11-08 02:30:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.9, 300 sec: 6886.8). Total num frames: 99692544. Throughput: 0: 1749.1. Samples: 19919368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:12,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:30:17,194][42004] Updated weights for policy 0, policy_version 24346 (0.0032) +[2024-11-08 02:30:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 99725312. Throughput: 0: 1740.7. Samples: 19924238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:17,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 02:30:22,912][42004] Updated weights for policy 0, policy_version 24356 (0.0033) +[2024-11-08 02:30:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 99762176. Throughput: 0: 1714.9. Samples: 19934544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:22,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:30:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 99799040. Throughput: 0: 1705.7. Samples: 19945610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:27,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 02:30:28,411][42004] Updated weights for policy 0, policy_version 24366 (0.0030) +[2024-11-08 02:30:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 99835904. Throughput: 0: 1712.9. Samples: 19951310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:32,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:30:33,710][42004] Updated weights for policy 0, policy_version 24376 (0.0027) +[2024-11-08 02:30:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 99856384. Throughput: 0: 1651.8. Samples: 19959030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:37,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 02:30:41,469][42004] Updated weights for policy 0, policy_version 24386 (0.0022) +[2024-11-08 02:30:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6758.4, 300 sec: 6859.0). Total num frames: 99893248. Throughput: 0: 1643.7. Samples: 19969868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:42,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 02:30:46,858][42004] Updated weights for policy 0, policy_version 24396 (0.0058) +[2024-11-08 02:30:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 99934208. Throughput: 0: 1666.6. Samples: 19975358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:47,933][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 02:30:51,895][42004] Updated weights for policy 0, policy_version 24406 (0.0028) +[2024-11-08 02:30:52,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6895.5, 300 sec: 6940.7). Total num frames: 99971072. Throughput: 0: 1767.7. Samples: 19987478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:52,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:30:57,761][42004] Updated weights for policy 0, policy_version 24416 (0.0040) +[2024-11-08 02:30:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 100007936. Throughput: 0: 1753.1. Samples: 19998256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:57,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 02:31:02,932][41694] Fps is (10 sec: 7372.4, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 100044800. Throughput: 0: 1768.3. Samples: 20003814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:02,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:31:03,368][42004] Updated weights for policy 0, policy_version 24426 (0.0021) +[2024-11-08 02:31:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 100081664. Throughput: 0: 1784.1. Samples: 20014828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:07,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 02:31:11,273][42004] Updated weights for policy 0, policy_version 24436 (0.0035) +[2024-11-08 02:31:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.3, 300 sec: 6886.8). Total num frames: 100098048. Throughput: 0: 1672.3. Samples: 20020864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:12,935][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:31:17,359][42004] Updated weights for policy 0, policy_version 24446 (0.0029) +[2024-11-08 02:31:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 100134912. Throughput: 0: 1653.1. Samples: 20025702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:17,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:31:22,739][42004] Updated weights for policy 0, policy_version 24456 (0.0022) +[2024-11-08 02:31:22,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 100171776. Throughput: 0: 1739.8. Samples: 20037320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:22,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:31:27,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6826.7, 300 sec: 6953.2). Total num frames: 100208640. Throughput: 0: 1756.2. Samples: 20048896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:27,935][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:31:28,097][42004] Updated weights for policy 0, policy_version 24466 (0.0027) +[2024-11-08 02:31:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 100245504. Throughput: 0: 1748.4. Samples: 20054038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:32,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 02:31:33,787][42004] Updated weights for policy 0, policy_version 24476 (0.0030) +[2024-11-08 02:31:37,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6942.4). Total num frames: 100278272. Throughput: 0: 1716.2. Samples: 20064708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:37,937][41694] Avg episode reward: [(0, '4.249')] +[2024-11-08 02:31:38,104][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024483_100282368.pth... +[2024-11-08 02:31:38,216][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024077_98619392.pth +[2024-11-08 02:31:39,831][42004] Updated weights for policy 0, policy_version 24486 (0.0033) +[2024-11-08 02:31:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.6, 300 sec: 6928.5). Total num frames: 100315136. Throughput: 0: 1710.3. Samples: 20075220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:42,933][41694] Avg episode reward: [(0, '4.202')] +[2024-11-08 02:31:47,804][42004] Updated weights for policy 0, policy_version 24496 (0.0024) +[2024-11-08 02:31:47,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 100335616. Throughput: 0: 1627.3. Samples: 20077042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:47,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:31:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 100372480. Throughput: 0: 1617.1. Samples: 20087598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:52,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 02:31:53,104][42004] Updated weights for policy 0, policy_version 24506 (0.0029) +[2024-11-08 02:31:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 100413440. Throughput: 0: 1740.9. Samples: 20099204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:31:58,424][42004] Updated weights for policy 0, policy_version 24516 (0.0029) +[2024-11-08 02:32:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6945.3). Total num frames: 100446208. Throughput: 0: 1760.6. Samples: 20104930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:02,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:32:04,635][42004] Updated weights for policy 0, policy_version 24526 (0.0029) +[2024-11-08 02:32:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6621.9, 300 sec: 6928.5). Total num frames: 100478976. Throughput: 0: 1716.4. Samples: 20114560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:07,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:32:10,153][42004] Updated weights for policy 0, policy_version 24536 (0.0028) +[2024-11-08 02:32:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 100519936. Throughput: 0: 1714.3. Samples: 20126038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:12,935][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 02:32:15,477][42004] Updated weights for policy 0, policy_version 24546 (0.0023) +[2024-11-08 02:32:19,978][41694] Fps is (10 sec: 6460.4, 60 sec: 6799.6, 300 sec: 6908.3). Total num frames: 100556800. Throughput: 0: 1651.9. Samples: 20131754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:19,980][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:32:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 100577280. Throughput: 0: 1648.1. Samples: 20138872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:22,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 02:32:23,164][42004] Updated weights for policy 0, policy_version 24556 (0.0028) +[2024-11-08 02:32:27,932][41694] Fps is (10 sec: 7724.7, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 100618240. Throughput: 0: 1673.7. Samples: 20150536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:27,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 02:32:28,404][42004] Updated weights for policy 0, policy_version 24566 (0.0027) +[2024-11-08 02:32:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 100655104. Throughput: 0: 1764.9. Samples: 20156464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:32,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:32:33,599][42004] Updated weights for policy 0, policy_version 24576 (0.0028) +[2024-11-08 02:32:37,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.0, 300 sec: 6965.0). Total num frames: 100691968. Throughput: 0: 1772.4. Samples: 20167354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:37,933][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 02:32:39,595][42004] Updated weights for policy 0, policy_version 24586 (0.0024) +[2024-11-08 02:32:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 100728832. Throughput: 0: 1755.9. Samples: 20178220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:42,933][41694] Avg episode reward: [(0, '4.853')] +[2024-11-08 02:32:45,028][42004] Updated weights for policy 0, policy_version 24596 (0.0031) +[2024-11-08 02:32:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6970.1). Total num frames: 100765696. Throughput: 0: 1753.7. Samples: 20183848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:47,934][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 02:32:50,514][42004] Updated weights for policy 0, policy_version 24606 (0.0030) +[2024-11-08 02:32:54,584][41694] Fps is (10 sec: 5975.6, 60 sec: 6909.4, 300 sec: 6903.7). Total num frames: 100798464. Throughput: 0: 1726.8. Samples: 20195118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:54,588][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 02:32:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 100823040. Throughput: 0: 1687.7. Samples: 20201986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:57,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 02:32:58,151][42004] Updated weights for policy 0, policy_version 24616 (0.0026) +[2024-11-08 02:33:02,931][41694] Fps is (10 sec: 7360.5, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 100859904. Throughput: 0: 1771.3. Samples: 20207836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:02,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 02:33:03,699][42004] Updated weights for policy 0, policy_version 24626 (0.0029) +[2024-11-08 02:33:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 100896768. Throughput: 0: 1776.5. Samples: 20218814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:07,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:33:09,446][42004] Updated weights for policy 0, policy_version 24636 (0.0043) +[2024-11-08 02:33:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6941.3). Total num frames: 100929536. Throughput: 0: 1746.9. Samples: 20229146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:12,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:33:15,201][42004] Updated weights for policy 0, policy_version 24646 (0.0037) +[2024-11-08 02:33:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7138.4, 300 sec: 6956.3). Total num frames: 100970496. Throughput: 0: 1739.3. Samples: 20234734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:33:20,994][42004] Updated weights for policy 0, policy_version 24656 (0.0021) +[2024-11-08 02:33:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 101003264. Throughput: 0: 1732.0. Samples: 20245292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:22,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 02:33:26,506][42004] Updated weights for policy 0, policy_version 24666 (0.0033) +[2024-11-08 02:33:28,835][41694] Fps is (10 sec: 5634.7, 60 sec: 6792.6, 300 sec: 6893.5). Total num frames: 101031936. Throughput: 0: 1580.5. Samples: 20250772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:28,837][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:33:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 101064704. Throughput: 0: 1657.4. Samples: 20258432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:32,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:33:33,762][42004] Updated weights for policy 0, policy_version 24676 (0.0034) +[2024-11-08 02:33:37,939][41694] Fps is (10 sec: 7648.5, 60 sec: 6825.8, 300 sec: 6900.5). Total num frames: 101101568. Throughput: 0: 1726.6. Samples: 20269976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:37,944][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 02:33:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024683_101101568.pth... +[2024-11-08 02:33:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024281_99454976.pth +[2024-11-08 02:33:39,160][42004] Updated weights for policy 0, policy_version 24686 (0.0023) +[2024-11-08 02:33:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 101138432. Throughput: 0: 1756.8. Samples: 20281042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:42,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:33:45,220][42004] Updated weights for policy 0, policy_version 24696 (0.0032) +[2024-11-08 02:33:47,931][41694] Fps is (10 sec: 6968.8, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 101171200. Throughput: 0: 1737.4. Samples: 20286020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:47,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:33:50,796][42004] Updated weights for policy 0, policy_version 24706 (0.0033) +[2024-11-08 02:33:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7020.0, 300 sec: 6914.6). Total num frames: 101208064. Throughput: 0: 1737.3. Samples: 20296994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:52,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 02:33:56,146][42004] Updated weights for policy 0, policy_version 24716 (0.0034) +[2024-11-08 02:33:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 101249024. Throughput: 0: 1759.8. Samples: 20308338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:57,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:34:03,285][41694] Fps is (10 sec: 5934.1, 60 sec: 6786.7, 300 sec: 6850.8). Total num frames: 101269504. Throughput: 0: 1738.1. Samples: 20313562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:34:03,287][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 02:34:04,025][42004] Updated weights for policy 0, policy_version 24726 (0.0035) +[2024-11-08 02:34:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101302272. Throughput: 0: 1665.0. Samples: 20320218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:07,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:34:09,718][42004] Updated weights for policy 0, policy_version 24736 (0.0028) +[2024-11-08 02:34:12,932][41694] Fps is (10 sec: 7218.4, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 101339136. Throughput: 0: 1825.3. Samples: 20331260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:12,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 02:34:15,388][42004] Updated weights for policy 0, policy_version 24746 (0.0036) +[2024-11-08 02:34:17,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101376000. Throughput: 0: 1738.9. Samples: 20336682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:17,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 02:34:21,423][42004] Updated weights for policy 0, policy_version 24756 (0.0028) +[2024-11-08 02:34:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 101408768. Throughput: 0: 1709.0. Samples: 20346870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:22,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 02:34:26,864][42004] Updated weights for policy 0, policy_version 24766 (0.0030) +[2024-11-08 02:34:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7069.7, 300 sec: 6900.7). Total num frames: 101449728. Throughput: 0: 1716.1. Samples: 20358266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:27,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 02:34:32,049][42004] Updated weights for policy 0, policy_version 24776 (0.0025) +[2024-11-08 02:34:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 101486592. Throughput: 0: 1731.7. Samples: 20363948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:32,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 02:34:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6759.3, 300 sec: 6845.2). Total num frames: 101507072. Throughput: 0: 1720.4. Samples: 20374414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:37,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:34:39,831][42004] Updated weights for policy 0, policy_version 24786 (0.0026) +[2024-11-08 02:34:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101543936. Throughput: 0: 1644.2. Samples: 20382328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:42,934][41694] Avg episode reward: [(0, '4.154')] +[2024-11-08 02:34:45,409][42004] Updated weights for policy 0, policy_version 24796 (0.0029) +[2024-11-08 02:34:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6859.2). Total num frames: 101580800. Throughput: 0: 1663.7. Samples: 20387840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:47,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:34:51,590][42004] Updated weights for policy 0, policy_version 24806 (0.0030) +[2024-11-08 02:34:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 101613568. Throughput: 0: 1727.2. Samples: 20397942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:52,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 02:34:57,190][42004] Updated weights for policy 0, policy_version 24816 (0.0040) +[2024-11-08 02:34:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 101650432. Throughput: 0: 1725.5. Samples: 20408908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:34:57,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 02:35:02,877][42004] Updated weights for policy 0, policy_version 24826 (0.0038) +[2024-11-08 02:35:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7004.5, 300 sec: 6886.8). Total num frames: 101687296. Throughput: 0: 1729.8. Samples: 20414524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:02,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 02:35:07,932][41694] Fps is (10 sec: 7372.1, 60 sec: 7031.3, 300 sec: 6886.8). Total num frames: 101724160. Throughput: 0: 1744.7. Samples: 20425384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:07,936][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:35:08,154][42004] Updated weights for policy 0, policy_version 24836 (0.0035) +[2024-11-08 02:35:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101744640. Throughput: 0: 1654.9. Samples: 20432734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:12,932][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 02:35:16,410][42004] Updated weights for policy 0, policy_version 24846 (0.0037) +[2024-11-08 02:35:17,931][41694] Fps is (10 sec: 5325.3, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 101777408. Throughput: 0: 1636.2. Samples: 20437578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:17,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 02:35:22,235][42004] Updated weights for policy 0, policy_version 24856 (0.0030) +[2024-11-08 02:35:22,933][41694] Fps is (10 sec: 6552.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 101810176. Throughput: 0: 1629.4. Samples: 20447740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:22,936][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:35:27,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 101847040. Throughput: 0: 1682.3. Samples: 20458032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:27,935][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:35:28,214][42004] Updated weights for policy 0, policy_version 24866 (0.0025) +[2024-11-08 02:35:32,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 101883904. Throughput: 0: 1687.7. Samples: 20463788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:32,934][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:35:33,442][42004] Updated weights for policy 0, policy_version 24876 (0.0034) +[2024-11-08 02:35:37,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 101924864. Throughput: 0: 1723.0. Samples: 20475478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:37,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:35:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024884_101924864.pth... +[2024-11-08 02:35:38,038][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024483_100282368.pth +[2024-11-08 02:35:38,849][42004] Updated weights for policy 0, policy_version 24886 (0.0026) +[2024-11-08 02:35:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 101961728. Throughput: 0: 1733.6. Samples: 20486920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:42,933][41694] Avg episode reward: [(0, '4.218')] +[2024-11-08 02:35:44,261][42004] Updated weights for policy 0, policy_version 24896 (0.0031) +[2024-11-08 02:35:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 101982208. Throughput: 0: 1708.8. Samples: 20491422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:47,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:35:51,797][42004] Updated weights for policy 0, policy_version 24906 (0.0028) +[2024-11-08 02:35:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 102023168. Throughput: 0: 1652.4. Samples: 20499740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:52,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 02:35:57,376][42004] Updated weights for policy 0, policy_version 24916 (0.0032) +[2024-11-08 02:35:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 102055936. Throughput: 0: 1739.4. Samples: 20511006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:57,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:36:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 102092800. Throughput: 0: 1744.3. Samples: 20516072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:02,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:36:03,288][42004] Updated weights for policy 0, policy_version 24926 (0.0033) +[2024-11-08 02:36:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.5, 300 sec: 6886.8). Total num frames: 102129664. Throughput: 0: 1756.0. Samples: 20526760. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:07,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:36:08,841][42004] Updated weights for policy 0, policy_version 24936 (0.0042) +[2024-11-08 02:36:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 102162432. Throughput: 0: 1757.9. Samples: 20537138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:12,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 02:36:14,887][42004] Updated weights for policy 0, policy_version 24946 (0.0027) +[2024-11-08 02:36:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 102199296. Throughput: 0: 1749.9. Samples: 20542532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:36:22,441][42004] Updated weights for policy 0, policy_version 24956 (0.0037) +[2024-11-08 02:36:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 102223872. Throughput: 0: 1654.4. Samples: 20549926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:22,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 02:36:27,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 102256640. Throughput: 0: 1643.8. Samples: 20560890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:27,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 02:36:27,962][42004] Updated weights for policy 0, policy_version 24966 (0.0045) +[2024-11-08 02:36:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 102293504. Throughput: 0: 1660.1. Samples: 20566126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:32,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 02:36:33,935][42004] Updated weights for policy 0, policy_version 24976 (0.0036) +[2024-11-08 02:36:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 102330368. Throughput: 0: 1710.2. Samples: 20576700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:37,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 02:36:39,376][42004] Updated weights for policy 0, policy_version 24986 (0.0029) +[2024-11-08 02:36:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 102363136. Throughput: 0: 1703.8. Samples: 20587678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:42,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 02:36:45,212][42004] Updated weights for policy 0, policy_version 24996 (0.0028) +[2024-11-08 02:36:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 102404096. Throughput: 0: 1714.2. Samples: 20593212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:47,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:36:50,577][42004] Updated weights for policy 0, policy_version 25006 (0.0030) +[2024-11-08 02:36:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 102440960. Throughput: 0: 1731.5. Samples: 20604676. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:52,933][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 02:36:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 102461440. Throughput: 0: 1660.0. Samples: 20611838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:36:57,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 02:36:58,129][42004] Updated weights for policy 0, policy_version 25016 (0.0028) +[2024-11-08 02:37:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 102498304. Throughput: 0: 1666.7. Samples: 20617532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:02,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:37:03,868][42004] Updated weights for policy 0, policy_version 25026 (0.0031) +[2024-11-08 02:37:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 102531072. Throughput: 0: 1723.1. Samples: 20627464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:07,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 02:37:09,802][42004] Updated weights for policy 0, policy_version 25036 (0.0030) +[2024-11-08 02:37:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6865.0). Total num frames: 102567936. Throughput: 0: 1724.3. Samples: 20638484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:12,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 02:37:15,166][42004] Updated weights for policy 0, policy_version 25046 (0.0033) +[2024-11-08 02:37:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 102604800. Throughput: 0: 1736.1. Samples: 20644252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:17,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 02:37:20,651][42004] Updated weights for policy 0, policy_version 25056 (0.0032) +[2024-11-08 02:37:22,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 102645760. Throughput: 0: 1753.0. Samples: 20655586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:22,935][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:37:25,945][42004] Updated weights for policy 0, policy_version 25066 (0.0024) +[2024-11-08 02:37:30,126][41694] Fps is (10 sec: 6381.9, 60 sec: 6849.2, 300 sec: 6822.2). Total num frames: 102682624. Throughput: 0: 1682.5. Samples: 20667084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:30,127][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:37:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 102703104. Throughput: 0: 1672.5. Samples: 20668476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:32,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:37:33,846][42004] Updated weights for policy 0, policy_version 25076 (0.0028) +[2024-11-08 02:37:37,932][41694] Fps is (10 sec: 6821.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 102735872. Throughput: 0: 1657.2. Samples: 20679252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:37,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 02:37:37,997][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025083_102739968.pth... +[2024-11-08 02:37:38,147][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024683_101101568.pth +[2024-11-08 02:37:40,038][42004] Updated weights for policy 0, policy_version 25086 (0.0035) +[2024-11-08 02:37:42,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 102768640. Throughput: 0: 1709.5. Samples: 20688766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:42,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 02:37:45,752][42004] Updated weights for policy 0, policy_version 25096 (0.0039) +[2024-11-08 02:37:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6855.8). Total num frames: 102809600. Throughput: 0: 1709.2. Samples: 20694448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:47,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 02:37:51,072][42004] Updated weights for policy 0, policy_version 25106 (0.0023) +[2024-11-08 02:37:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 102846464. Throughput: 0: 1745.0. Samples: 20705990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:52,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:37:56,442][42004] Updated weights for policy 0, policy_version 25116 (0.0028) +[2024-11-08 02:37:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 102883328. Throughput: 0: 1755.1. Samples: 20717464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:57,940][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:38:02,188][42004] Updated weights for policy 0, policy_version 25126 (0.0028) +[2024-11-08 02:38:04,608][41694] Fps is (10 sec: 5963.5, 60 sec: 6773.9, 300 sec: 6806.5). Total num frames: 102916096. Throughput: 0: 1679.6. Samples: 20722652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:04,611][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:38:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 102940672. Throughput: 0: 1645.2. Samples: 20729620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:07,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 02:38:09,822][42004] Updated weights for policy 0, policy_version 25136 (0.0026) +[2024-11-08 02:38:12,931][41694] Fps is (10 sec: 6889.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 102973440. Throughput: 0: 1713.8. Samples: 20740446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:12,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:38:15,934][42004] Updated weights for policy 0, policy_version 25146 (0.0027) +[2024-11-08 02:38:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 103010304. Throughput: 0: 1711.1. Samples: 20745472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:17,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 02:38:21,517][42004] Updated weights for policy 0, policy_version 25156 (0.0026) +[2024-11-08 02:38:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6852.3). Total num frames: 103047168. Throughput: 0: 1714.5. Samples: 20756404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:22,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 02:38:27,008][42004] Updated weights for policy 0, policy_version 25166 (0.0025) +[2024-11-08 02:38:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6944.1, 300 sec: 6845.2). Total num frames: 103084032. Throughput: 0: 1752.6. Samples: 20767632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:27,934][41694] Avg episode reward: [(0, '4.694')] +[2024-11-08 02:38:32,156][42004] Updated weights for policy 0, policy_version 25176 (0.0027) +[2024-11-08 02:38:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6859.2). Total num frames: 103124992. Throughput: 0: 1757.7. Samples: 20773546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:32,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:38:39,122][41694] Fps is (10 sec: 6222.4, 60 sec: 6827.7, 300 sec: 6803.8). Total num frames: 103153664. Throughput: 0: 1714.7. Samples: 20785194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:39,124][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:38:39,784][42004] Updated weights for policy 0, policy_version 25186 (0.0024) +[2024-11-08 02:38:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 103182336. Throughput: 0: 1661.1. Samples: 20792212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:42,939][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:38:45,328][42004] Updated weights for policy 0, policy_version 25196 (0.0032) +[2024-11-08 02:38:47,931][41694] Fps is (10 sec: 7439.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 103219200. Throughput: 0: 1733.1. Samples: 20797736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:47,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 02:38:51,225][42004] Updated weights for policy 0, policy_version 25206 (0.0030) +[2024-11-08 02:38:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103256064. Throughput: 0: 1743.4. Samples: 20808074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:52,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:38:56,641][42004] Updated weights for policy 0, policy_version 25216 (0.0027) +[2024-11-08 02:38:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6867.3). Total num frames: 103292928. Throughput: 0: 1760.1. Samples: 20819652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:38:57,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 02:39:02,189][42004] Updated weights for policy 0, policy_version 25226 (0.0025) +[2024-11-08 02:39:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7093.1, 300 sec: 6872.9). Total num frames: 103329792. Throughput: 0: 1768.4. Samples: 20825048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:02,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:39:07,544][42004] Updated weights for policy 0, policy_version 25236 (0.0032) +[2024-11-08 02:39:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7099.8, 300 sec: 6873.0). Total num frames: 103366656. Throughput: 0: 1773.4. Samples: 20836206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:07,935][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 02:39:13,623][41694] Fps is (10 sec: 6129.7, 60 sec: 6951.4, 300 sec: 6829.2). Total num frames: 103395328. Throughput: 0: 1629.2. Samples: 20842072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:13,625][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 02:39:15,327][42004] Updated weights for policy 0, policy_version 25246 (0.0031) +[2024-11-08 02:39:17,933][41694] Fps is (10 sec: 5733.8, 60 sec: 6894.8, 300 sec: 6831.3). Total num frames: 103424000. Throughput: 0: 1675.3. Samples: 20848936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:17,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 02:39:21,248][42004] Updated weights for policy 0, policy_version 25256 (0.0029) +[2024-11-08 02:39:22,932][41694] Fps is (10 sec: 6600.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103456768. Throughput: 0: 1691.7. Samples: 20859308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:22,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 02:39:26,913][42004] Updated weights for policy 0, policy_version 25266 (0.0028) +[2024-11-08 02:39:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103493632. Throughput: 0: 1733.5. Samples: 20870218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:27,933][41694] Avg episode reward: [(0, '4.285')] +[2024-11-08 02:39:32,037][42004] Updated weights for policy 0, policy_version 25276 (0.0022) +[2024-11-08 02:39:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 103534592. Throughput: 0: 1741.0. Samples: 20876082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:32,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 02:39:37,336][42004] Updated weights for policy 0, policy_version 25286 (0.0023) +[2024-11-08 02:39:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7173.8, 300 sec: 6886.8). Total num frames: 103575552. Throughput: 0: 1772.3. Samples: 20887828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:37,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 02:39:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025287_103575552.pth... +[2024-11-08 02:39:38,046][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024884_101924864.pth +[2024-11-08 02:39:42,928][42004] Updated weights for policy 0, policy_version 25296 (0.0043) +[2024-11-08 02:39:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6886.8). Total num frames: 103612416. Throughput: 0: 1763.3. Samples: 20899002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:42,933][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 02:39:48,134][41694] Fps is (10 sec: 5620.4, 60 sec: 6871.7, 300 sec: 6840.5). Total num frames: 103632896. Throughput: 0: 1758.7. Samples: 20904546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:48,136][41694] Avg episode reward: [(0, '4.706')] +[2024-11-08 02:39:50,737][42004] Updated weights for policy 0, policy_version 25306 (0.0029) +[2024-11-08 02:39:52,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 103665664. Throughput: 0: 1672.9. Samples: 20911486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:52,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 02:39:56,751][42004] Updated weights for policy 0, policy_version 25316 (0.0027) +[2024-11-08 02:39:57,932][41694] Fps is (10 sec: 7107.2, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 103702528. Throughput: 0: 1793.3. Samples: 20921532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:57,937][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 02:40:02,289][42004] Updated weights for policy 0, policy_version 25326 (0.0027) +[2024-11-08 02:40:02,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 103739392. Throughput: 0: 1734.3. Samples: 20926976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:40:02,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 02:40:07,644][42004] Updated weights for policy 0, policy_version 25336 (0.0027) +[2024-11-08 02:40:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 103776256. Throughput: 0: 1753.9. Samples: 20938232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:07,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:40:12,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7044.3, 300 sec: 6900.7). Total num frames: 103813120. Throughput: 0: 1763.8. Samples: 20949592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:12,938][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 02:40:13,378][42004] Updated weights for policy 0, policy_version 25346 (0.0029) +[2024-11-08 02:40:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 103845888. Throughput: 0: 1737.9. Samples: 20954288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:17,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 02:40:19,513][42004] Updated weights for policy 0, policy_version 25356 (0.0029) +[2024-11-08 02:40:22,931][41694] Fps is (10 sec: 4915.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 103862272. Throughput: 0: 1683.1. Samples: 20963568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:22,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:40:27,634][42004] Updated weights for policy 0, policy_version 25366 (0.0024) +[2024-11-08 02:40:27,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6758.3, 300 sec: 6831.3). Total num frames: 103899136. Throughput: 0: 1606.1. Samples: 20971276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:27,934][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 02:40:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 103936000. Throughput: 0: 1598.5. Samples: 20976152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:32,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:40:33,377][42004] Updated weights for policy 0, policy_version 25376 (0.0038) +[2024-11-08 02:40:37,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 103972864. Throughput: 0: 1684.6. Samples: 20987290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:37,935][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:40:38,875][42004] Updated weights for policy 0, policy_version 25386 (0.0031) +[2024-11-08 02:40:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 104009728. Throughput: 0: 1713.4. Samples: 20998636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:42,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:40:44,405][42004] Updated weights for policy 0, policy_version 25396 (0.0022) +[2024-11-08 02:40:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6918.3, 300 sec: 6859.1). Total num frames: 104046592. Throughput: 0: 1716.7. Samples: 21004228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:47,935][41694] Avg episode reward: [(0, '4.159')] +[2024-11-08 02:40:49,768][42004] Updated weights for policy 0, policy_version 25406 (0.0026) +[2024-11-08 02:40:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 104083456. Throughput: 0: 1719.6. Samples: 21015616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:52,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:40:57,431][42004] Updated weights for policy 0, policy_version 25416 (0.0025) +[2024-11-08 02:40:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 104103936. Throughput: 0: 1627.5. Samples: 21022830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:57,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:41:02,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 104136704. Throughput: 0: 1639.1. Samples: 21028046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:02,933][41694] Avg episode reward: [(0, '4.285')] +[2024-11-08 02:41:03,516][42004] Updated weights for policy 0, policy_version 25426 (0.0027) +[2024-11-08 02:41:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 104173568. Throughput: 0: 1655.6. Samples: 21038072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:41:09,058][42004] Updated weights for policy 0, policy_version 25436 (0.0031) +[2024-11-08 02:41:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 104210432. Throughput: 0: 1720.1. Samples: 21048680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:41:15,141][42004] Updated weights for policy 0, policy_version 25446 (0.0031) +[2024-11-08 02:41:17,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 104247296. Throughput: 0: 1729.7. Samples: 21053990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:17,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 02:41:20,672][42004] Updated weights for policy 0, policy_version 25456 (0.0031) +[2024-11-08 02:41:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 104284160. Throughput: 0: 1732.4. Samples: 21065248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:22,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:41:25,969][42004] Updated weights for policy 0, policy_version 25466 (0.0025) +[2024-11-08 02:41:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.6, 300 sec: 6872.9). Total num frames: 104321024. Throughput: 0: 1735.8. Samples: 21076748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:27,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 02:41:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 104341504. Throughput: 0: 1704.2. Samples: 21080916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:32,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:41:33,809][42004] Updated weights for policy 0, policy_version 25476 (0.0031) +[2024-11-08 02:41:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 104374272. Throughput: 0: 1613.6. Samples: 21088226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:37,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:41:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025482_104374272.pth... +[2024-11-08 02:41:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025083_102739968.pth +[2024-11-08 02:41:39,964][42004] Updated weights for policy 0, policy_version 25486 (0.0026) +[2024-11-08 02:41:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104411136. Throughput: 0: 1691.5. Samples: 21098948. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:42,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:41:45,574][42004] Updated weights for policy 0, policy_version 25496 (0.0030) +[2024-11-08 02:41:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104448000. Throughput: 0: 1697.9. Samples: 21104450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:47,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 02:41:50,997][42004] Updated weights for policy 0, policy_version 25506 (0.0028) +[2024-11-08 02:41:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 104484864. Throughput: 0: 1724.7. Samples: 21115682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:52,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:41:56,383][42004] Updated weights for policy 0, policy_version 25516 (0.0028) +[2024-11-08 02:41:57,941][41694] Fps is (10 sec: 7365.6, 60 sec: 6962.0, 300 sec: 6858.8). Total num frames: 104521728. Throughput: 0: 1743.3. Samples: 21127146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:57,944][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 02:42:02,086][42004] Updated weights for policy 0, policy_version 25526 (0.0045) +[2024-11-08 02:42:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 104558592. Throughput: 0: 1745.1. Samples: 21132518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:02,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:42:07,932][41694] Fps is (10 sec: 5740.0, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 104579072. Throughput: 0: 1649.2. Samples: 21139462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:07,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 02:42:10,214][42004] Updated weights for policy 0, policy_version 25536 (0.0034) +[2024-11-08 02:42:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104611840. Throughput: 0: 1612.3. Samples: 21149300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:42:15,809][42004] Updated weights for policy 0, policy_version 25546 (0.0030) +[2024-11-08 02:42:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 104648704. Throughput: 0: 1647.2. Samples: 21155042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:17,934][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 02:42:21,517][42004] Updated weights for policy 0, policy_version 25556 (0.0035) +[2024-11-08 02:42:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6840.5). Total num frames: 104685568. Throughput: 0: 1727.2. Samples: 21165950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:22,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:42:26,857][42004] Updated weights for policy 0, policy_version 25566 (0.0029) +[2024-11-08 02:42:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 104722432. Throughput: 0: 1744.0. Samples: 21177428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:27,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:42:32,126][42004] Updated weights for policy 0, policy_version 25576 (0.0028) +[2024-11-08 02:42:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 104763392. Throughput: 0: 1746.7. Samples: 21183050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:32,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:42:37,509][42004] Updated weights for policy 0, policy_version 25586 (0.0033) +[2024-11-08 02:42:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 104800256. Throughput: 0: 1753.8. Samples: 21194602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:37,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:42:42,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104816640. Throughput: 0: 1643.4. Samples: 21201084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:42,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 02:42:45,962][42004] Updated weights for policy 0, policy_version 25596 (0.0027) +[2024-11-08 02:42:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104853504. Throughput: 0: 1635.2. Samples: 21206102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:47,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 02:42:51,364][42004] Updated weights for policy 0, policy_version 25606 (0.0032) +[2024-11-08 02:42:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104890368. Throughput: 0: 1732.0. Samples: 21217402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:52,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:42:56,732][42004] Updated weights for policy 0, policy_version 25616 (0.0030) +[2024-11-08 02:42:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6827.8, 300 sec: 6870.3). Total num frames: 104931328. Throughput: 0: 1769.7. Samples: 21228936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:57,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 02:43:02,259][42004] Updated weights for policy 0, policy_version 25626 (0.0030) +[2024-11-08 02:43:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 104968192. Throughput: 0: 1764.5. Samples: 21234444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:02,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 02:43:07,604][42004] Updated weights for policy 0, policy_version 25636 (0.0025) +[2024-11-08 02:43:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 105005056. Throughput: 0: 1772.5. Samples: 21245714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:07,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 02:43:14,987][41694] Fps is (10 sec: 6115.7, 60 sec: 6930.6, 300 sec: 6839.2). Total num frames: 105041920. Throughput: 0: 1689.5. Samples: 21256930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:14,989][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:43:15,530][42004] Updated weights for policy 0, policy_version 25646 (0.0039) +[2024-11-08 02:43:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 105058304. Throughput: 0: 1669.6. Samples: 21258180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:17,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:43:21,300][42004] Updated weights for policy 0, policy_version 25656 (0.0029) +[2024-11-08 02:43:22,932][41694] Fps is (10 sec: 6702.4, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 105095168. Throughput: 0: 1647.8. Samples: 21268754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:22,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 02:43:26,596][42004] Updated weights for policy 0, policy_version 25666 (0.0040) +[2024-11-08 02:43:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 105136128. Throughput: 0: 1762.2. Samples: 21280382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:27,936][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 02:43:31,807][42004] Updated weights for policy 0, policy_version 25676 (0.0026) +[2024-11-08 02:43:32,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6895.0, 300 sec: 6886.9). Total num frames: 105177088. Throughput: 0: 1779.3. Samples: 21286170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:32,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:43:36,979][42004] Updated weights for policy 0, policy_version 25686 (0.0027) +[2024-11-08 02:43:37,938][41694] Fps is (10 sec: 7777.8, 60 sec: 6894.2, 300 sec: 6886.7). Total num frames: 105213952. Throughput: 0: 1794.2. Samples: 21298152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:37,940][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:43:38,008][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025688_105218048.pth... +[2024-11-08 02:43:38,112][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025287_103575552.pth +[2024-11-08 02:43:42,477][42004] Updated weights for policy 0, policy_version 25696 (0.0026) +[2024-11-08 02:43:42,934][41694] Fps is (10 sec: 7370.8, 60 sec: 7236.0, 300 sec: 6886.8). Total num frames: 105250816. Throughput: 0: 1788.5. Samples: 21309422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:42,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 02:43:49,489][41694] Fps is (10 sec: 6028.3, 60 sec: 6986.7, 300 sec: 6836.9). Total num frames: 105283584. Throughput: 0: 1730.5. Samples: 21315012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:49,490][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:43:50,518][42004] Updated weights for policy 0, policy_version 25706 (0.0037) +[2024-11-08 02:43:52,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 105304064. Throughput: 0: 1678.7. Samples: 21321254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:52,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:43:56,126][42004] Updated weights for policy 0, policy_version 25716 (0.0036) +[2024-11-08 02:43:57,932][41694] Fps is (10 sec: 7277.0, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 105345024. Throughput: 0: 1762.4. Samples: 21332614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:57,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 02:44:01,433][42004] Updated weights for policy 0, policy_version 25726 (0.0023) +[2024-11-08 02:44:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 105381888. Throughput: 0: 1778.9. Samples: 21338232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:02,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:44:06,834][42004] Updated weights for policy 0, policy_version 25736 (0.0034) +[2024-11-08 02:44:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6889.1). Total num frames: 105422848. Throughput: 0: 1794.2. Samples: 21349494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:07,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:44:12,137][42004] Updated weights for policy 0, policy_version 25746 (0.0030) +[2024-11-08 02:44:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7210.2, 300 sec: 6900.7). Total num frames: 105459712. Throughput: 0: 1797.6. Samples: 21361274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:12,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 02:44:17,413][42004] Updated weights for policy 0, policy_version 25756 (0.0027) +[2024-11-08 02:44:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 105500672. Throughput: 0: 1794.7. Samples: 21366930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:17,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 02:44:23,955][41694] Fps is (10 sec: 5573.4, 60 sec: 6980.6, 300 sec: 6849.2). Total num frames: 105521152. Throughput: 0: 1724.7. Samples: 21377520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:23,964][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 02:44:25,865][42004] Updated weights for policy 0, policy_version 25766 (0.0021) +[2024-11-08 02:44:27,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 105549824. Throughput: 0: 1655.8. Samples: 21383930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:27,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 02:44:31,467][42004] Updated weights for policy 0, policy_version 25776 (0.0028) +[2024-11-08 02:44:32,934][41694] Fps is (10 sec: 7299.3, 60 sec: 6826.4, 300 sec: 6817.4). Total num frames: 105586688. Throughput: 0: 1709.8. Samples: 21389296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:32,937][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:44:36,924][42004] Updated weights for policy 0, policy_version 25786 (0.0039) +[2024-11-08 02:44:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6827.4, 300 sec: 6817.4). Total num frames: 105623552. Throughput: 0: 1763.3. Samples: 21400604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:37,935][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 02:44:42,250][42004] Updated weights for policy 0, policy_version 25796 (0.0033) +[2024-11-08 02:44:42,931][41694] Fps is (10 sec: 7784.3, 60 sec: 6895.2, 300 sec: 6891.6). Total num frames: 105664512. Throughput: 0: 1770.1. Samples: 21412268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:44:47,532][42004] Updated weights for policy 0, policy_version 25806 (0.0028) +[2024-11-08 02:44:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7148.7, 300 sec: 6900.7). Total num frames: 105701376. Throughput: 0: 1769.4. Samples: 21417854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:47,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:44:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 105738240. Throughput: 0: 1770.8. Samples: 21429180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:52,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 02:44:53,537][42004] Updated weights for policy 0, policy_version 25816 (0.0035) +[2024-11-08 02:44:58,438][41694] Fps is (10 sec: 5458.0, 60 sec: 6837.3, 300 sec: 6833.5). Total num frames: 105758720. Throughput: 0: 1594.6. Samples: 21433840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:58,439][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:45:01,785][42004] Updated weights for policy 0, policy_version 25826 (0.0026) +[2024-11-08 02:45:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 105787392. Throughput: 0: 1623.6. Samples: 21439994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:02,935][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 02:45:07,291][42004] Updated weights for policy 0, policy_version 25836 (0.0033) +[2024-11-08 02:45:07,932][41694] Fps is (10 sec: 7334.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 105828352. Throughput: 0: 1671.7. Samples: 21451034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:07,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 02:45:12,688][42004] Updated weights for policy 0, policy_version 25846 (0.0028) +[2024-11-08 02:45:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 105865216. Throughput: 0: 1746.6. Samples: 21462528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:12,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:45:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 105897984. Throughput: 0: 1736.1. Samples: 21467418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:17,934][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 02:45:18,744][42004] Updated weights for policy 0, policy_version 25856 (0.0035) +[2024-11-08 02:45:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7014.6, 300 sec: 6900.7). Total num frames: 105934848. Throughput: 0: 1720.6. Samples: 21478030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:22,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:45:24,389][42004] Updated weights for policy 0, policy_version 25866 (0.0028) +[2024-11-08 02:45:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 105967616. Throughput: 0: 1703.6. Samples: 21488930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:27,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:45:30,614][42004] Updated weights for policy 0, policy_version 25876 (0.0026) +[2024-11-08 02:45:32,935][41694] Fps is (10 sec: 5323.1, 60 sec: 6690.0, 300 sec: 6831.2). Total num frames: 105988096. Throughput: 0: 1685.5. Samples: 21493708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:32,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:45:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 106024960. Throughput: 0: 1576.4. Samples: 21500116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:37,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 02:45:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025885_106024960.pth... +[2024-11-08 02:45:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025482_104374272.pth +[2024-11-08 02:45:38,488][42004] Updated weights for policy 0, policy_version 25886 (0.0028) +[2024-11-08 02:45:42,932][41694] Fps is (10 sec: 7375.1, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 106061824. Throughput: 0: 1746.9. Samples: 21511568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:42,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 02:45:43,829][42004] Updated weights for policy 0, policy_version 25896 (0.0024) +[2024-11-08 02:45:47,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 106098688. Throughput: 0: 1712.3. Samples: 21517048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:47,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 02:45:49,325][42004] Updated weights for policy 0, policy_version 25906 (0.0030) +[2024-11-08 02:45:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 106135552. Throughput: 0: 1717.3. Samples: 21528312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:45:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:45:54,839][42004] Updated weights for policy 0, policy_version 25916 (0.0031) +[2024-11-08 02:45:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6953.6, 300 sec: 6900.7). Total num frames: 106172416. Throughput: 0: 1712.3. Samples: 21539580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:45:57,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:46:00,637][42004] Updated weights for policy 0, policy_version 25926 (0.0025) +[2024-11-08 02:46:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 106205184. Throughput: 0: 1723.4. Samples: 21544970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:46:02,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 02:46:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 106221568. Throughput: 0: 1659.8. Samples: 21552722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:46:07,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 02:46:09,136][42004] Updated weights for policy 0, policy_version 25936 (0.0044) +[2024-11-08 02:46:12,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.4, 300 sec: 6803.5). Total num frames: 106254336. Throughput: 0: 1587.4. Samples: 21560364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:12,935][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 02:46:15,491][42004] Updated weights for policy 0, policy_version 25946 (0.0036) +[2024-11-08 02:46:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106291200. Throughput: 0: 1593.8. Samples: 21565426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:17,935][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:46:20,875][42004] Updated weights for policy 0, policy_version 25956 (0.0041) +[2024-11-08 02:46:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106328064. Throughput: 0: 1702.7. Samples: 21576738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:22,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:46:26,269][42004] Updated weights for policy 0, policy_version 25966 (0.0027) +[2024-11-08 02:46:27,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 106369024. Throughput: 0: 1702.2. Samples: 21588168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:27,934][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 02:46:31,770][42004] Updated weights for policy 0, policy_version 25976 (0.0029) +[2024-11-08 02:46:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.6, 300 sec: 6886.8). Total num frames: 106405888. Throughput: 0: 1702.4. Samples: 21593658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:32,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:46:37,697][42004] Updated weights for policy 0, policy_version 25986 (0.0034) +[2024-11-08 02:46:37,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6894.8, 300 sec: 6872.9). Total num frames: 106438656. Throughput: 0: 1687.8. Samples: 21604266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:37,941][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:46:42,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106455040. Throughput: 0: 1572.3. Samples: 21610334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:42,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:46:45,717][42004] Updated weights for policy 0, policy_version 25996 (0.0042) +[2024-11-08 02:46:47,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 106496000. Throughput: 0: 1580.6. Samples: 21616098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:47,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 02:46:51,255][42004] Updated weights for policy 0, policy_version 26006 (0.0027) +[2024-11-08 02:46:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6803.8). Total num frames: 106528768. Throughput: 0: 1656.1. Samples: 21627246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:52,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 02:46:56,777][42004] Updated weights for policy 0, policy_version 26016 (0.0026) +[2024-11-08 02:46:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 106569728. Throughput: 0: 1736.3. Samples: 21638496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:57,941][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 02:47:02,425][42004] Updated weights for policy 0, policy_version 26026 (0.0034) +[2024-11-08 02:47:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6859.1). Total num frames: 106602496. Throughput: 0: 1744.5. Samples: 21643930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:02,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 02:47:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 106639360. Throughput: 0: 1734.3. Samples: 21654782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:07,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 02:47:07,960][42004] Updated weights for policy 0, policy_version 26036 (0.0031) +[2024-11-08 02:47:12,932][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 106676224. Throughput: 0: 1711.9. Samples: 21665202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:12,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:47:13,884][42004] Updated weights for policy 0, policy_version 26046 (0.0033) +[2024-11-08 02:47:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 106692608. Throughput: 0: 1673.6. Samples: 21668970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:17,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 02:47:21,763][42004] Updated weights for policy 0, policy_version 26056 (0.0033) +[2024-11-08 02:47:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 106733568. Throughput: 0: 1627.6. Samples: 21677506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 02:47:27,056][42004] Updated weights for policy 0, policy_version 26066 (0.0031) +[2024-11-08 02:47:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 106770432. Throughput: 0: 1750.4. Samples: 21689100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:27,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 02:47:32,028][42004] Updated weights for policy 0, policy_version 26076 (0.0020) +[2024-11-08 02:47:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 106811392. Throughput: 0: 1758.3. Samples: 21695220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:32,935][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:47:37,226][42004] Updated weights for policy 0, policy_version 26086 (0.0022) +[2024-11-08 02:47:37,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6895.0, 300 sec: 6900.7). Total num frames: 106852352. Throughput: 0: 1778.1. Samples: 21707262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:37,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:47:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026087_106852352.pth... +[2024-11-08 02:47:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025688_105218048.pth +[2024-11-08 02:47:42,906][42004] Updated weights for policy 0, policy_version 26096 (0.0027) +[2024-11-08 02:47:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 106889216. Throughput: 0: 1768.4. Samples: 21718074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 02:47:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.1, 300 sec: 6900.7). Total num frames: 106926080. Throughput: 0: 1772.5. Samples: 21723690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:47,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:47:48,203][42004] Updated weights for policy 0, policy_version 26106 (0.0027) +[2024-11-08 02:47:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 106946560. Throughput: 0: 1699.1. Samples: 21731240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:52,935][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:47:55,846][42004] Updated weights for policy 0, policy_version 26116 (0.0033) +[2024-11-08 02:47:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 106983424. Throughput: 0: 1721.8. Samples: 21742682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:47:57,935][41694] Avg episode reward: [(0, '4.582')] +[2024-11-08 02:48:01,500][42004] Updated weights for policy 0, policy_version 26126 (0.0030) +[2024-11-08 02:48:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107020288. Throughput: 0: 1753.0. Samples: 21747854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:48:02,943][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:48:07,376][42004] Updated weights for policy 0, policy_version 26136 (0.0032) +[2024-11-08 02:48:07,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6879.2). Total num frames: 107057152. Throughput: 0: 1790.7. Samples: 21758086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:48:07,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 02:48:12,771][42004] Updated weights for policy 0, policy_version 26146 (0.0030) +[2024-11-08 02:48:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 107094016. Throughput: 0: 1787.5. Samples: 21769538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:12,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:48:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7304.6, 300 sec: 6900.7). Total num frames: 107130880. Throughput: 0: 1777.6. Samples: 21775214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:17,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:48:18,067][42004] Updated weights for policy 0, policy_version 26156 (0.0028) +[2024-11-08 02:48:22,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6900.7). Total num frames: 107171840. Throughput: 0: 1763.1. Samples: 21786600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:48:25,685][42004] Updated weights for policy 0, policy_version 26166 (0.0028) +[2024-11-08 02:48:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 107192320. Throughput: 0: 1689.9. Samples: 21794120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:27,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 02:48:31,079][42004] Updated weights for policy 0, policy_version 26176 (0.0034) +[2024-11-08 02:48:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6963.2, 300 sec: 6831.4). Total num frames: 107229184. Throughput: 0: 1687.2. Samples: 21799616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:32,937][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:48:36,971][42004] Updated weights for policy 0, policy_version 26186 (0.0032) +[2024-11-08 02:48:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6817.5). Total num frames: 107261952. Throughput: 0: 1749.9. Samples: 21809986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:37,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:48:42,251][42004] Updated weights for policy 0, policy_version 26196 (0.0038) +[2024-11-08 02:48:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6881.5). Total num frames: 107302912. Throughput: 0: 1760.6. Samples: 21821906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:42,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 02:48:47,412][42004] Updated weights for policy 0, policy_version 26206 (0.0035) +[2024-11-08 02:48:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 107339776. Throughput: 0: 1773.3. Samples: 21827652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:47,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 02:48:52,832][42004] Updated weights for policy 0, policy_version 26216 (0.0022) +[2024-11-08 02:48:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 107380736. Throughput: 0: 1801.8. Samples: 21839168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:52,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:48:59,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6936.9, 300 sec: 6840.5). Total num frames: 107413504. Throughput: 0: 1727.5. Samples: 21850730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:59,932][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:49:00,490][42004] Updated weights for policy 0, policy_version 26226 (0.0026) +[2024-11-08 02:49:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107438080. Throughput: 0: 1706.7. Samples: 21852016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:02,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 02:49:06,368][42004] Updated weights for policy 0, policy_version 26236 (0.0028) +[2024-11-08 02:49:07,931][41694] Fps is (10 sec: 7167.2, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 107470848. Throughput: 0: 1687.8. Samples: 21862550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:07,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:49:12,637][42004] Updated weights for policy 0, policy_version 26246 (0.0034) +[2024-11-08 02:49:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 107503616. Throughput: 0: 1738.3. Samples: 21872342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:12,937][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 02:49:17,856][42004] Updated weights for policy 0, policy_version 26256 (0.0035) +[2024-11-08 02:49:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6882.9). Total num frames: 107544576. Throughput: 0: 1741.2. Samples: 21877972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:17,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 02:49:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 107581440. Throughput: 0: 1775.4. Samples: 21889880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:22,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 02:49:22,992][42004] Updated weights for policy 0, policy_version 26266 (0.0034) +[2024-11-08 02:49:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6900.8). Total num frames: 107622400. Throughput: 0: 1770.1. Samples: 21901562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:27,935][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 02:49:28,386][42004] Updated weights for policy 0, policy_version 26276 (0.0027) +[2024-11-08 02:49:34,288][41694] Fps is (10 sec: 6492.4, 60 sec: 6942.8, 300 sec: 6855.3). Total num frames: 107655168. Throughput: 0: 1717.4. Samples: 21907264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:34,289][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:49:35,938][42004] Updated weights for policy 0, policy_version 26286 (0.0028) +[2024-11-08 02:49:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107679744. Throughput: 0: 1676.2. Samples: 21914598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:37,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:49:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026289_107679744.pth... +[2024-11-08 02:49:38,068][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025885_106024960.pth +[2024-11-08 02:49:41,904][42004] Updated weights for policy 0, policy_version 26296 (0.0037) +[2024-11-08 02:49:42,931][41694] Fps is (10 sec: 6634.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 107712512. Throughput: 0: 1723.6. Samples: 21924848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:42,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 02:49:47,571][42004] Updated weights for policy 0, policy_version 26306 (0.0028) +[2024-11-08 02:49:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 107749376. Throughput: 0: 1726.4. Samples: 21929702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:47,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:49:52,667][42004] Updated weights for policy 0, policy_version 26316 (0.0022) +[2024-11-08 02:49:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6898.7). Total num frames: 107790336. Throughput: 0: 1761.4. Samples: 21941814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:52,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 02:49:57,697][42004] Updated weights for policy 0, policy_version 26326 (0.0038) +[2024-11-08 02:49:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7203.2, 300 sec: 6928.5). Total num frames: 107831296. Throughput: 0: 1813.9. Samples: 21953968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:57,934][41694] Avg episode reward: [(0, '4.616')] +[2024-11-08 02:50:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 107864064. Throughput: 0: 1805.7. Samples: 21959228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:50:03,492][42004] Updated weights for policy 0, policy_version 26336 (0.0022) +[2024-11-08 02:50:08,981][41694] Fps is (10 sec: 5560.6, 60 sec: 6910.6, 300 sec: 6848.6). Total num frames: 107892736. Throughput: 0: 1748.1. Samples: 21970380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:08,983][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:50:11,603][42004] Updated weights for policy 0, policy_version 26346 (0.0026) +[2024-11-08 02:50:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 107921408. Throughput: 0: 1664.8. Samples: 21976480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:12,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:50:17,931][41694] Fps is (10 sec: 6406.5, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 107950080. Throughput: 0: 1690.5. Samples: 21981046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:17,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:50:18,342][42004] Updated weights for policy 0, policy_version 26356 (0.0038) +[2024-11-08 02:50:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 107986944. Throughput: 0: 1684.8. Samples: 21990416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:22,933][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 02:50:23,898][42004] Updated weights for policy 0, policy_version 26366 (0.0018) +[2024-11-08 02:50:27,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6690.0, 300 sec: 6900.8). Total num frames: 108023808. Throughput: 0: 1721.3. Samples: 22002308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:27,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:50:29,287][42004] Updated weights for policy 0, policy_version 26376 (0.0025) +[2024-11-08 02:50:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6984.5, 300 sec: 6914.6). Total num frames: 108064768. Throughput: 0: 1744.0. Samples: 22008184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:32,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:50:34,362][42004] Updated weights for policy 0, policy_version 26386 (0.0034) +[2024-11-08 02:50:37,932][41694] Fps is (10 sec: 7783.3, 60 sec: 7031.4, 300 sec: 6914.6). Total num frames: 108101632. Throughput: 0: 1732.3. Samples: 22019770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:37,935][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 02:50:39,934][42004] Updated weights for policy 0, policy_version 26396 (0.0032) +[2024-11-08 02:50:43,601][41694] Fps is (10 sec: 5758.8, 60 sec: 6818.9, 300 sec: 6857.4). Total num frames: 108126208. Throughput: 0: 1562.4. Samples: 22025322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:43,602][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:50:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 108154880. Throughput: 0: 1611.1. Samples: 22031726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:47,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 02:50:48,029][42004] Updated weights for policy 0, policy_version 26406 (0.0041) +[2024-11-08 02:50:52,935][41694] Fps is (10 sec: 6582.0, 60 sec: 6621.5, 300 sec: 6831.2). Total num frames: 108187648. Throughput: 0: 1625.6. Samples: 22041834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:52,937][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:50:54,119][42004] Updated weights for policy 0, policy_version 26416 (0.0041) +[2024-11-08 02:50:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 108228608. Throughput: 0: 1704.0. Samples: 22053158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:57,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 02:50:59,299][42004] Updated weights for policy 0, policy_version 26426 (0.0030) +[2024-11-08 02:51:02,931][41694] Fps is (10 sec: 7785.2, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 108265472. Throughput: 0: 1735.4. Samples: 22059140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:02,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 02:51:04,763][42004] Updated weights for policy 0, policy_version 26436 (0.0021) +[2024-11-08 02:51:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6948.1, 300 sec: 6942.4). Total num frames: 108302336. Throughput: 0: 1773.2. Samples: 22070212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:51:11,112][42004] Updated weights for policy 0, policy_version 26446 (0.0027) +[2024-11-08 02:51:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 108335104. Throughput: 0: 1730.7. Samples: 22080186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:12,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 02:51:18,248][41694] Fps is (10 sec: 5161.8, 60 sec: 6723.0, 300 sec: 6865.6). Total num frames: 108355584. Throughput: 0: 1698.5. Samples: 22085152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:18,251][41694] Avg episode reward: [(0, '4.165')] +[2024-11-08 02:51:19,245][42004] Updated weights for policy 0, policy_version 26456 (0.0039) +[2024-11-08 02:51:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 108384256. Throughput: 0: 1589.3. Samples: 22091290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:22,935][41694] Avg episode reward: [(0, '4.145')] +[2024-11-08 02:51:25,511][42004] Updated weights for policy 0, policy_version 26466 (0.0041) +[2024-11-08 02:51:27,931][41694] Fps is (10 sec: 6344.4, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 108417024. Throughput: 0: 1705.4. Samples: 22100924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:27,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 02:51:31,181][42004] Updated weights for policy 0, policy_version 26476 (0.0037) +[2024-11-08 02:51:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 108457984. Throughput: 0: 1662.8. Samples: 22106550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:32,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:51:36,651][42004] Updated weights for policy 0, policy_version 26486 (0.0025) +[2024-11-08 02:51:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6553.6, 300 sec: 6914.6). Total num frames: 108494848. Throughput: 0: 1693.3. Samples: 22118026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:37,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:51:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026488_108494848.pth... +[2024-11-08 02:51:38,257][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026087_106852352.pth +[2024-11-08 02:51:41,916][42004] Updated weights for policy 0, policy_version 26496 (0.0049) +[2024-11-08 02:51:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6834.6, 300 sec: 6900.7). Total num frames: 108531712. Throughput: 0: 1697.8. Samples: 22129558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 02:51:47,317][42004] Updated weights for policy 0, policy_version 26506 (0.0031) +[2024-11-08 02:51:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.1, 300 sec: 6928.5). Total num frames: 108572672. Throughput: 0: 1686.9. Samples: 22135052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:47,937][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:51:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.8, 300 sec: 6859.1). Total num frames: 108593152. Throughput: 0: 1673.8. Samples: 22145534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:52,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:51:55,245][42004] Updated weights for policy 0, policy_version 26516 (0.0025) +[2024-11-08 02:51:57,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 108621824. Throughput: 0: 1608.4. Samples: 22152562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:57,933][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 02:52:01,535][42004] Updated weights for policy 0, policy_version 26526 (0.0031) +[2024-11-08 02:52:02,933][41694] Fps is (10 sec: 6552.4, 60 sec: 6553.4, 300 sec: 6845.1). Total num frames: 108658688. Throughput: 0: 1614.9. Samples: 22157314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:02,935][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:52:07,027][42004] Updated weights for policy 0, policy_version 26536 (0.0030) +[2024-11-08 02:52:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 108695552. Throughput: 0: 1711.9. Samples: 22168324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:07,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:52:12,238][42004] Updated weights for policy 0, policy_version 26546 (0.0052) +[2024-11-08 02:52:12,931][41694] Fps is (10 sec: 7783.9, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 108736512. Throughput: 0: 1765.2. Samples: 22180356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:12,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:52:17,334][42004] Updated weights for policy 0, policy_version 26556 (0.0027) +[2024-11-08 02:52:17,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6999.9, 300 sec: 6914.6). Total num frames: 108773376. Throughput: 0: 1769.3. Samples: 22186172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:17,936][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 02:52:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 108810240. Throughput: 0: 1755.7. Samples: 22197032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:22,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:52:23,111][42004] Updated weights for policy 0, policy_version 26566 (0.0036) +[2024-11-08 02:52:27,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 108826624. Throughput: 0: 1654.0. Samples: 22203990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:27,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:52:31,959][42004] Updated weights for policy 0, policy_version 26576 (0.0034) +[2024-11-08 02:52:32,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 108859392. Throughput: 0: 1628.4. Samples: 22208328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:32,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 02:52:37,467][42004] Updated weights for policy 0, policy_version 26586 (0.0024) +[2024-11-08 02:52:37,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 108896256. Throughput: 0: 1623.6. Samples: 22218596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:37,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 02:52:42,679][42004] Updated weights for policy 0, policy_version 26596 (0.0036) +[2024-11-08 02:52:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 108937216. Throughput: 0: 1734.9. Samples: 22230632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:42,933][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 02:52:47,787][42004] Updated weights for policy 0, policy_version 26606 (0.0025) +[2024-11-08 02:52:47,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6758.5, 300 sec: 6886.8). Total num frames: 108978176. Throughput: 0: 1762.1. Samples: 22236606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:47,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 02:52:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 109015040. Throughput: 0: 1780.4. Samples: 22248440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:52,933][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:52:52,994][42004] Updated weights for policy 0, policy_version 26616 (0.0028) +[2024-11-08 02:52:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 109056000. Throughput: 0: 1766.6. Samples: 22259852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:57,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 02:52:58,456][42004] Updated weights for policy 0, policy_version 26626 (0.0034) +[2024-11-08 02:53:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6895.1, 300 sec: 6831.3). Total num frames: 109072384. Throughput: 0: 1741.4. Samples: 22264534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:02,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 02:53:07,392][42004] Updated weights for policy 0, policy_version 26636 (0.0038) +[2024-11-08 02:53:07,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 109101056. Throughput: 0: 1629.2. Samples: 22270346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:07,933][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 02:53:12,641][42004] Updated weights for policy 0, policy_version 26646 (0.0050) +[2024-11-08 02:53:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 109142016. Throughput: 0: 1728.7. Samples: 22281780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:12,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:53:17,674][42004] Updated weights for policy 0, policy_version 26656 (0.0031) +[2024-11-08 02:53:17,932][41694] Fps is (10 sec: 8191.9, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 109182976. Throughput: 0: 1762.9. Samples: 22287660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:17,934][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 02:53:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6872.9). Total num frames: 109219840. Throughput: 0: 1802.4. Samples: 22299704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:22,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 02:53:22,946][42004] Updated weights for policy 0, policy_version 26666 (0.0024) +[2024-11-08 02:53:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6886.8). Total num frames: 109260800. Throughput: 0: 1793.1. Samples: 22311322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:27,933][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 02:53:28,374][42004] Updated weights for policy 0, policy_version 26676 (0.0023) +[2024-11-08 02:53:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 6900.7). Total num frames: 109297664. Throughput: 0: 1783.8. Samples: 22316878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:32,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:53:33,600][42004] Updated weights for policy 0, policy_version 26686 (0.0030) +[2024-11-08 02:53:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 109314048. Throughput: 0: 1696.7. Samples: 22324790. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:37,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 02:53:38,067][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026689_109318144.pth... +[2024-11-08 02:53:38,194][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026289_107679744.pth +[2024-11-08 02:53:42,349][42004] Updated weights for policy 0, policy_version 26696 (0.0036) +[2024-11-08 02:53:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 109350912. Throughput: 0: 1638.8. Samples: 22333600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:42,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 02:53:47,572][42004] Updated weights for policy 0, policy_version 26706 (0.0030) +[2024-11-08 02:53:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 109387776. Throughput: 0: 1662.0. Samples: 22339326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:47,935][41694] Avg episode reward: [(0, '4.176')] +[2024-11-08 02:53:52,715][42004] Updated weights for policy 0, policy_version 26716 (0.0023) +[2024-11-08 02:53:52,934][41694] Fps is (10 sec: 7780.3, 60 sec: 6894.6, 300 sec: 6877.8). Total num frames: 109428736. Throughput: 0: 1796.5. Samples: 22351192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:52,937][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:53:57,926][42004] Updated weights for policy 0, policy_version 26726 (0.0030) +[2024-11-08 02:53:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 109469696. Throughput: 0: 1809.2. Samples: 22363194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:57,932][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:54:02,932][41694] Fps is (10 sec: 7784.5, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 109506560. Throughput: 0: 1807.2. Samples: 22368986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:02,934][41694] Avg episode reward: [(0, '4.685')] +[2024-11-08 02:54:03,410][42004] Updated weights for policy 0, policy_version 26736 (0.0021) +[2024-11-08 02:54:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7372.8, 300 sec: 6914.6). Total num frames: 109543424. Throughput: 0: 1786.7. Samples: 22380106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:07,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 02:54:11,374][42004] Updated weights for policy 0, policy_version 26746 (0.0041) +[2024-11-08 02:54:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 109559808. Throughput: 0: 1669.0. Samples: 22386426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:54:17,293][42004] Updated weights for policy 0, policy_version 26756 (0.0030) +[2024-11-08 02:54:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 109596672. Throughput: 0: 1654.8. Samples: 22391342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:54:22,448][42004] Updated weights for policy 0, policy_version 26766 (0.0027) +[2024-11-08 02:54:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 109633536. Throughput: 0: 1738.4. Samples: 22403016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:22,934][41694] Avg episode reward: [(0, '4.759')] +[2024-11-08 02:54:27,452][42004] Updated weights for policy 0, policy_version 26776 (0.0038) +[2024-11-08 02:54:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6876.8). Total num frames: 109674496. Throughput: 0: 1815.2. Samples: 22415282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:27,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 02:54:32,509][42004] Updated weights for policy 0, policy_version 26786 (0.0036) +[2024-11-08 02:54:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 109715456. Throughput: 0: 1822.4. Samples: 22421334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:32,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:54:37,677][42004] Updated weights for policy 0, policy_version 26796 (0.0028) +[2024-11-08 02:54:37,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 109756416. Throughput: 0: 1826.3. Samples: 22433372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:37,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 02:54:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 109793280. Throughput: 0: 1809.4. Samples: 22444618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:42,936][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:54:43,493][42004] Updated weights for policy 0, policy_version 26806 (0.0025) +[2024-11-08 02:54:47,932][41694] Fps is (10 sec: 5324.4, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 109809664. Throughput: 0: 1731.5. Samples: 22446904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:47,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:54:51,830][42004] Updated weights for policy 0, policy_version 26816 (0.0032) +[2024-11-08 02:54:52,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6963.5, 300 sec: 6831.3). Total num frames: 109846528. Throughput: 0: 1678.6. Samples: 22455644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:52,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:54:56,831][42004] Updated weights for policy 0, policy_version 26826 (0.0023) +[2024-11-08 02:54:57,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 109887488. Throughput: 0: 1805.3. Samples: 22467664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:57,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:55:02,422][42004] Updated weights for policy 0, policy_version 26836 (0.0034) +[2024-11-08 02:55:02,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6911.4). Total num frames: 109924352. Throughput: 0: 1821.6. Samples: 22473312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:02,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 02:55:07,645][42004] Updated weights for policy 0, policy_version 26846 (0.0026) +[2024-11-08 02:55:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 109961216. Throughput: 0: 1814.0. Samples: 22484646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:07,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 02:55:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 109993984. Throughput: 0: 1785.9. Samples: 22495648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:12,943][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:55:13,607][42004] Updated weights for policy 0, policy_version 26856 (0.0026) +[2024-11-08 02:55:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 110026752. Throughput: 0: 1756.7. Samples: 22500384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:17,942][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 02:55:22,651][42004] Updated weights for policy 0, policy_version 26866 (0.0032) +[2024-11-08 02:55:22,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 110043136. Throughput: 0: 1605.9. Samples: 22505636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:22,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:55:27,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 110080000. Throughput: 0: 1595.6. Samples: 22516420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:27,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:55:28,114][42004] Updated weights for policy 0, policy_version 26876 (0.0023) +[2024-11-08 02:55:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 110120960. Throughput: 0: 1677.6. Samples: 22522396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:32,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:55:33,131][42004] Updated weights for policy 0, policy_version 26886 (0.0023) +[2024-11-08 02:55:37,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6758.4, 300 sec: 6916.4). Total num frames: 110161920. Throughput: 0: 1752.4. Samples: 22534504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:37,934][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:55:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026895_110161920.pth... +[2024-11-08 02:55:38,056][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026488_108494848.pth +[2024-11-08 02:55:38,385][42004] Updated weights for policy 0, policy_version 26896 (0.0021) +[2024-11-08 02:55:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 110198784. Throughput: 0: 1745.2. Samples: 22546200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:42,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:55:43,632][42004] Updated weights for policy 0, policy_version 26906 (0.0023) +[2024-11-08 02:55:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.1, 300 sec: 6956.3). Total num frames: 110239744. Throughput: 0: 1748.6. Samples: 22552000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:47,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:55:48,832][42004] Updated weights for policy 0, policy_version 26916 (0.0028) +[2024-11-08 02:55:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 110272512. Throughput: 0: 1739.5. Samples: 22562924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:52,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 02:55:54,992][42004] Updated weights for policy 0, policy_version 26926 (0.0031) +[2024-11-08 02:55:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 110309376. Throughput: 0: 1731.4. Samples: 22573560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:57,934][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:56:00,631][42004] Updated weights for policy 0, policy_version 26936 (0.0033) +[2024-11-08 02:56:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 110346240. Throughput: 0: 1743.3. Samples: 22578832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:02,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 02:56:06,051][42004] Updated weights for policy 0, policy_version 26946 (0.0024) +[2024-11-08 02:56:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 110379008. Throughput: 0: 1877.9. Samples: 22590140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:07,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 02:56:12,939][41694] Fps is (10 sec: 4092.8, 60 sec: 6552.7, 300 sec: 6894.0). Total num frames: 110387200. Throughput: 0: 1722.9. Samples: 22593964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:12,954][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 02:56:15,938][42004] Updated weights for policy 0, policy_version 26956 (0.0035) +[2024-11-08 02:56:17,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6621.9, 300 sec: 6914.6). Total num frames: 110424064. Throughput: 0: 1698.1. Samples: 22598808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:17,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 02:56:21,516][42004] Updated weights for policy 0, policy_version 26966 (0.0030) +[2024-11-08 02:56:22,932][41694] Fps is (10 sec: 7378.6, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 110460928. Throughput: 0: 1673.9. Samples: 22609828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:22,942][41694] Avg episode reward: [(0, '4.692')] +[2024-11-08 02:56:27,498][42004] Updated weights for policy 0, policy_version 26976 (0.0031) +[2024-11-08 02:56:27,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 110493696. Throughput: 0: 1643.7. Samples: 22620166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:27,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 02:56:32,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6826.4, 300 sec: 6900.7). Total num frames: 110530560. Throughput: 0: 1621.0. Samples: 22624948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:32,937][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 02:56:33,184][42004] Updated weights for policy 0, policy_version 26986 (0.0026) +[2024-11-08 02:56:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 110567424. Throughput: 0: 1636.0. Samples: 22636546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:37,935][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 02:56:38,660][42004] Updated weights for policy 0, policy_version 26996 (0.0028) +[2024-11-08 02:56:42,931][41694] Fps is (10 sec: 7784.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 110608384. Throughput: 0: 1654.6. Samples: 22648018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:42,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:56:46,311][42004] Updated weights for policy 0, policy_version 27006 (0.0032) +[2024-11-08 02:56:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6886.8). Total num frames: 110624768. Throughput: 0: 1617.6. Samples: 22651624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:47,939][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 02:56:51,804][42004] Updated weights for policy 0, policy_version 27016 (0.0031) +[2024-11-08 02:56:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6928.5). Total num frames: 110665728. Throughput: 0: 1560.8. Samples: 22660374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:56:52,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:56:57,117][42004] Updated weights for policy 0, policy_version 27026 (0.0022) +[2024-11-08 02:56:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6928.5). Total num frames: 110702592. Throughput: 0: 1733.3. Samples: 22671950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:56:57,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 02:57:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6900.7). Total num frames: 110731264. Throughput: 0: 1732.2. Samples: 22676758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:57:03,738][42004] Updated weights for policy 0, policy_version 27036 (0.0056) +[2024-11-08 02:57:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6886.8). Total num frames: 110768128. Throughput: 0: 1699.0. Samples: 22686282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:07,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 02:57:09,386][42004] Updated weights for policy 0, policy_version 27046 (0.0030) +[2024-11-08 02:57:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6964.1, 300 sec: 6886.9). Total num frames: 110804992. Throughput: 0: 1722.7. Samples: 22697686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:12,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 02:57:14,693][42004] Updated weights for policy 0, policy_version 27056 (0.0023) +[2024-11-08 02:57:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 110845952. Throughput: 0: 1743.7. Samples: 22703412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:17,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:57:22,541][42004] Updated weights for policy 0, policy_version 27066 (0.0027) +[2024-11-08 02:57:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 110862336. Throughput: 0: 1636.2. Samples: 22710176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:22,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 02:57:27,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.3, 300 sec: 6914.6). Total num frames: 110899200. Throughput: 0: 1619.1. Samples: 22720880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:27,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 02:57:28,301][42004] Updated weights for policy 0, policy_version 27076 (0.0026) +[2024-11-08 02:57:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.7, 300 sec: 6914.6). Total num frames: 110936064. Throughput: 0: 1664.1. Samples: 22726510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:32,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 02:57:33,918][42004] Updated weights for policy 0, policy_version 27086 (0.0033) +[2024-11-08 02:57:37,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 110968832. Throughput: 0: 1705.5. Samples: 22737124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:37,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:57:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027092_110968832.pth... +[2024-11-08 02:57:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026689_109318144.pth +[2024-11-08 02:57:39,932][42004] Updated weights for policy 0, policy_version 27096 (0.0034) +[2024-11-08 02:57:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6872.9). Total num frames: 111005696. Throughput: 0: 1689.0. Samples: 22747956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:42,933][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 02:57:45,297][42004] Updated weights for policy 0, policy_version 27106 (0.0029) +[2024-11-08 02:57:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 111046656. Throughput: 0: 1711.6. Samples: 22753782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:47,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:57:50,693][42004] Updated weights for policy 0, policy_version 27116 (0.0024) +[2024-11-08 02:57:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 111083520. Throughput: 0: 1748.3. Samples: 22764954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:52,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:57:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 111104000. Throughput: 0: 1646.2. Samples: 22771764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:57,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 02:57:58,375][42004] Updated weights for policy 0, policy_version 27126 (0.0030) +[2024-11-08 02:58:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 111140864. Throughput: 0: 1647.6. Samples: 22777554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:58:02,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:58:03,934][42004] Updated weights for policy 0, policy_version 27136 (0.0039) +[2024-11-08 02:58:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 111177728. Throughput: 0: 1745.5. Samples: 22788724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:07,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 02:58:09,572][42004] Updated weights for policy 0, policy_version 27146 (0.0026) +[2024-11-08 02:58:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 111210496. Throughput: 0: 1741.0. Samples: 22799226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:12,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 02:58:15,135][42004] Updated weights for policy 0, policy_version 27156 (0.0032) +[2024-11-08 02:58:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 111251456. Throughput: 0: 1744.5. Samples: 22805014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:17,935][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:58:20,651][42004] Updated weights for policy 0, policy_version 27166 (0.0029) +[2024-11-08 02:58:22,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 111288320. Throughput: 0: 1756.6. Samples: 22816172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:22,934][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 02:58:26,140][42004] Updated weights for policy 0, policy_version 27176 (0.0031) +[2024-11-08 02:58:30,192][41694] Fps is (10 sec: 6013.7, 60 sec: 6842.1, 300 sec: 6820.7). Total num frames: 111325184. Throughput: 0: 1682.1. Samples: 22827452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:30,193][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 02:58:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 111341568. Throughput: 0: 1665.6. Samples: 22828734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:32,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 02:58:34,151][42004] Updated weights for policy 0, policy_version 27186 (0.0030) +[2024-11-08 02:58:37,931][41694] Fps is (10 sec: 6879.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 111378432. Throughput: 0: 1653.5. Samples: 22839360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:37,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:58:39,742][42004] Updated weights for policy 0, policy_version 27196 (0.0038) +[2024-11-08 02:58:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 111415296. Throughput: 0: 1732.1. Samples: 22849710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:42,933][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:58:45,825][42004] Updated weights for policy 0, policy_version 27206 (0.0031) +[2024-11-08 02:58:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 111448064. Throughput: 0: 1716.6. Samples: 22854802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:47,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 02:58:51,351][42004] Updated weights for policy 0, policy_version 27216 (0.0027) +[2024-11-08 02:58:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 111484928. Throughput: 0: 1717.6. Samples: 22866016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:52,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 02:58:56,788][42004] Updated weights for policy 0, policy_version 27226 (0.0035) +[2024-11-08 02:58:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 111525888. Throughput: 0: 1737.6. Samples: 22877416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:57,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:59:02,308][42004] Updated weights for policy 0, policy_version 27236 (0.0030) +[2024-11-08 02:59:04,581][41694] Fps is (10 sec: 6329.1, 60 sec: 6776.9, 300 sec: 6793.3). Total num frames: 111558656. Throughput: 0: 1670.1. Samples: 22882924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:04,582][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 02:59:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 111583232. Throughput: 0: 1641.0. Samples: 22890018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:07,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 02:59:09,903][42004] Updated weights for policy 0, policy_version 27246 (0.0029) +[2024-11-08 02:59:12,931][41694] Fps is (10 sec: 7357.2, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 111620096. Throughput: 0: 1728.8. Samples: 22901340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:12,933][41694] Avg episode reward: [(0, '4.335')] +[2024-11-08 02:59:15,873][42004] Updated weights for policy 0, policy_version 27256 (0.0034) +[2024-11-08 02:59:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 111652864. Throughput: 0: 1724.5. Samples: 22906338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:17,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:59:21,494][42004] Updated weights for policy 0, policy_version 27266 (0.0028) +[2024-11-08 02:59:22,940][41694] Fps is (10 sec: 6957.4, 60 sec: 6689.2, 300 sec: 6831.1). Total num frames: 111689728. Throughput: 0: 1724.7. Samples: 22916984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:22,942][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 02:59:26,888][42004] Updated weights for policy 0, policy_version 27276 (0.0030) +[2024-11-08 02:59:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6952.0, 300 sec: 6817.4). Total num frames: 111726592. Throughput: 0: 1751.2. Samples: 22928516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:27,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 02:59:32,229][42004] Updated weights for policy 0, policy_version 27286 (0.0034) +[2024-11-08 02:59:32,931][41694] Fps is (10 sec: 7788.9, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 111767552. Throughput: 0: 1763.7. Samples: 22934168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:32,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 02:59:39,141][41694] Fps is (10 sec: 6212.0, 60 sec: 6825.6, 300 sec: 6761.9). Total num frames: 111796224. Throughput: 0: 1724.6. Samples: 22945708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:39,146][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 02:59:39,155][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027294_111796224.pth... +[2024-11-08 02:59:39,324][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026895_110161920.pth +[2024-11-08 02:59:39,915][42004] Updated weights for policy 0, policy_version 27296 (0.0025) +[2024-11-08 02:59:42,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 111824896. Throughput: 0: 1667.6. Samples: 22952460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:42,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:59:45,554][42004] Updated weights for policy 0, policy_version 27306 (0.0023) +[2024-11-08 02:59:47,932][41694] Fps is (10 sec: 6989.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 111857664. Throughput: 0: 1727.7. Samples: 22957820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:47,937][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:59:51,573][42004] Updated weights for policy 0, policy_version 27316 (0.0038) +[2024-11-08 02:59:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 111894528. Throughput: 0: 1736.5. Samples: 22968162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:52,936][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 02:59:57,276][42004] Updated weights for policy 0, policy_version 27326 (0.0049) +[2024-11-08 02:59:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 111931392. Throughput: 0: 1720.7. Samples: 22978770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:57,935][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 03:00:02,825][42004] Updated weights for policy 0, policy_version 27336 (0.0025) +[2024-11-08 03:00:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7019.6, 300 sec: 6803.5). Total num frames: 111968256. Throughput: 0: 1736.6. Samples: 22984486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:02,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 03:00:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 112005120. Throughput: 0: 1748.3. Samples: 22995642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:07,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 03:00:08,181][42004] Updated weights for policy 0, policy_version 27346 (0.0026) +[2024-11-08 03:00:13,713][41694] Fps is (10 sec: 5698.6, 60 sec: 6738.9, 300 sec: 6771.7). Total num frames: 112029696. Throughput: 0: 1596.5. Samples: 23001608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:13,715][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:00:16,484][42004] Updated weights for policy 0, policy_version 27356 (0.0037) +[2024-11-08 03:00:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 112058368. Throughput: 0: 1634.5. Samples: 23007722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:17,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:00:22,607][42004] Updated weights for policy 0, policy_version 27366 (0.0027) +[2024-11-08 03:00:22,931][41694] Fps is (10 sec: 6665.0, 60 sec: 6691.1, 300 sec: 6817.4). Total num frames: 112091136. Throughput: 0: 1643.2. Samples: 23017666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:22,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 03:00:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 112128000. Throughput: 0: 1682.6. Samples: 23028178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:27,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 03:00:28,269][42004] Updated weights for policy 0, policy_version 27376 (0.0024) +[2024-11-08 03:00:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 112164864. Throughput: 0: 1690.8. Samples: 23033906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:32,939][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 03:00:33,555][42004] Updated weights for policy 0, policy_version 27386 (0.0028) +[2024-11-08 03:00:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6967.1, 300 sec: 6803.5). Total num frames: 112205824. Throughput: 0: 1715.9. Samples: 23045378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:37,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:00:38,934][42004] Updated weights for policy 0, policy_version 27396 (0.0024) +[2024-11-08 03:00:42,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 112242688. Throughput: 0: 1736.2. Samples: 23056898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:42,936][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:00:44,330][42004] Updated weights for policy 0, policy_version 27406 (0.0022) +[2024-11-08 03:00:48,284][41694] Fps is (10 sec: 5539.1, 60 sec: 6718.9, 300 sec: 6739.9). Total num frames: 112263168. Throughput: 0: 1719.7. Samples: 23062480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:48,287][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:00:52,314][42004] Updated weights for policy 0, policy_version 27416 (0.0033) +[2024-11-08 03:00:52,931][41694] Fps is (10 sec: 5734.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 112300032. Throughput: 0: 1629.6. Samples: 23068974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:52,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:00:57,932][41694] Fps is (10 sec: 7217.6, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 112332800. Throughput: 0: 1767.7. Samples: 23079774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:57,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:00:58,059][42004] Updated weights for policy 0, policy_version 27426 (0.0030) +[2024-11-08 03:01:02,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6690.0, 300 sec: 6748.0). Total num frames: 112369664. Throughput: 0: 1720.5. Samples: 23085148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:02,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:01:03,671][42004] Updated weights for policy 0, policy_version 27436 (0.0024) +[2024-11-08 03:01:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.4). Total num frames: 112406528. Throughput: 0: 1747.9. Samples: 23096322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:07,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:01:09,064][42004] Updated weights for policy 0, policy_version 27446 (0.0028) +[2024-11-08 03:01:12,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6985.9, 300 sec: 6845.2). Total num frames: 112443392. Throughput: 0: 1752.0. Samples: 23107020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:12,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 03:01:15,056][42004] Updated weights for policy 0, policy_version 27456 (0.0030) +[2024-11-08 03:01:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 112480256. Throughput: 0: 1744.1. Samples: 23112390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:17,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 03:01:22,874][42004] Updated weights for policy 0, policy_version 27466 (0.0032) +[2024-11-08 03:01:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 112500736. Throughput: 0: 1717.3. Samples: 23122654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:22,932][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:01:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 112537600. Throughput: 0: 1634.7. Samples: 23130458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:27,936][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:01:28,490][42004] Updated weights for policy 0, policy_version 27476 (0.0037) +[2024-11-08 03:01:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 112570368. Throughput: 0: 1633.4. Samples: 23135408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:32,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:01:34,448][42004] Updated weights for policy 0, policy_version 27486 (0.0025) +[2024-11-08 03:01:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 112607232. Throughput: 0: 1715.5. Samples: 23146170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:37,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:01:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027492_112607232.pth... +[2024-11-08 03:01:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027092_110968832.pth +[2024-11-08 03:01:39,862][42004] Updated weights for policy 0, policy_version 27496 (0.0031) +[2024-11-08 03:01:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 112644096. Throughput: 0: 1725.8. Samples: 23157436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:42,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:01:45,264][42004] Updated weights for policy 0, policy_version 27506 (0.0027) +[2024-11-08 03:01:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7073.0, 300 sec: 6845.2). Total num frames: 112685056. Throughput: 0: 1734.8. Samples: 23163214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:47,934][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 03:01:50,660][42004] Updated weights for policy 0, policy_version 27516 (0.0032) +[2024-11-08 03:01:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 112721920. Throughput: 0: 1743.5. Samples: 23174780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:52,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:01:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 112742400. Throughput: 0: 1679.2. Samples: 23182582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:57,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:01:58,280][42004] Updated weights for policy 0, policy_version 27526 (0.0031) +[2024-11-08 03:02:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 112775168. Throughput: 0: 1671.3. Samples: 23187600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:02,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:02:04,706][42004] Updated weights for policy 0, policy_version 27536 (0.0033) +[2024-11-08 03:02:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 112807936. Throughput: 0: 1641.2. Samples: 23196506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:07,934][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 03:02:10,590][42004] Updated weights for policy 0, policy_version 27546 (0.0024) +[2024-11-08 03:02:12,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 112844800. Throughput: 0: 1708.2. Samples: 23207328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:12,934][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 03:02:15,954][42004] Updated weights for policy 0, policy_version 27556 (0.0027) +[2024-11-08 03:02:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 112881664. Throughput: 0: 1722.3. Samples: 23212912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:17,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:02:21,512][42004] Updated weights for policy 0, policy_version 27566 (0.0020) +[2024-11-08 03:02:22,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 112918528. Throughput: 0: 1731.6. Samples: 23224094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:22,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 03:02:26,908][42004] Updated weights for policy 0, policy_version 27576 (0.0021) +[2024-11-08 03:02:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 112959488. Throughput: 0: 1738.3. Samples: 23235658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 03:02:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 112979968. Throughput: 0: 1719.9. Samples: 23240610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:32,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:02:34,424][42004] Updated weights for policy 0, policy_version 27586 (0.0026) +[2024-11-08 03:02:37,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 113012736. Throughput: 0: 1636.7. Samples: 23248430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:37,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:02:40,551][42004] Updated weights for policy 0, policy_version 27596 (0.0033) +[2024-11-08 03:02:42,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 113045504. Throughput: 0: 1682.2. Samples: 23258280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:42,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 03:02:46,378][42004] Updated weights for policy 0, policy_version 27606 (0.0030) +[2024-11-08 03:02:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 113082368. Throughput: 0: 1688.2. Samples: 23263568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:47,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:02:51,962][42004] Updated weights for policy 0, policy_version 27616 (0.0024) +[2024-11-08 03:02:52,933][41694] Fps is (10 sec: 7372.4, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 113119232. Throughput: 0: 1731.6. Samples: 23274428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:52,936][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 03:02:57,150][42004] Updated weights for policy 0, policy_version 27626 (0.0029) +[2024-11-08 03:02:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 113160192. Throughput: 0: 1756.5. Samples: 23286370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:57,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 03:03:02,618][42004] Updated weights for policy 0, policy_version 27636 (0.0030) +[2024-11-08 03:03:02,931][41694] Fps is (10 sec: 7783.1, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 113197056. Throughput: 0: 1754.0. Samples: 23291844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:02,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 03:03:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 113217536. Throughput: 0: 1677.7. Samples: 23299590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:03:10,537][42004] Updated weights for policy 0, policy_version 27646 (0.0037) +[2024-11-08 03:03:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 113250304. Throughput: 0: 1635.2. Samples: 23309244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:12,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 03:03:16,519][42004] Updated weights for policy 0, policy_version 27656 (0.0027) +[2024-11-08 03:03:17,936][41694] Fps is (10 sec: 6959.9, 60 sec: 6757.9, 300 sec: 6775.7). Total num frames: 113287168. Throughput: 0: 1639.5. Samples: 23314394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:17,938][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:03:22,007][42004] Updated weights for policy 0, policy_version 27666 (0.0030) +[2024-11-08 03:03:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6828.1). Total num frames: 113324032. Throughput: 0: 1712.8. Samples: 23325506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:22,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:03:27,393][42004] Updated weights for policy 0, policy_version 27676 (0.0032) +[2024-11-08 03:03:27,932][41694] Fps is (10 sec: 7376.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 113360896. Throughput: 0: 1749.6. Samples: 23337014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:27,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:03:32,756][42004] Updated weights for policy 0, policy_version 27686 (0.0031) +[2024-11-08 03:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 113401856. Throughput: 0: 1753.6. Samples: 23342480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:32,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 03:03:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6859.0). Total num frames: 113438720. Throughput: 0: 1773.9. Samples: 23354252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:37,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:03:37,989][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027696_113442816.pth... +[2024-11-08 03:03:37,996][42004] Updated weights for policy 0, policy_version 27696 (0.0026) +[2024-11-08 03:03:38,100][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027294_111796224.pth +[2024-11-08 03:03:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 113459200. Throughput: 0: 1666.1. Samples: 23361346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:42,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 03:03:46,550][42004] Updated weights for policy 0, policy_version 27706 (0.0025) +[2024-11-08 03:03:47,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 113491968. Throughput: 0: 1647.7. Samples: 23365990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:47,934][41694] Avg episode reward: [(0, '4.717')] +[2024-11-08 03:03:52,577][42004] Updated weights for policy 0, policy_version 27716 (0.0029) +[2024-11-08 03:03:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 113524736. Throughput: 0: 1707.9. Samples: 23376444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:52,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:03:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6827.8). Total num frames: 113561600. Throughput: 0: 1709.8. Samples: 23386186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:57,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:03:58,449][42004] Updated weights for policy 0, policy_version 27726 (0.0024) +[2024-11-08 03:04:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 113594368. Throughput: 0: 1721.5. Samples: 23391852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:02,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:04:04,391][42004] Updated weights for policy 0, policy_version 27736 (0.0034) +[2024-11-08 03:04:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 113631232. Throughput: 0: 1698.0. Samples: 23401918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:07,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:04:10,010][42004] Updated weights for policy 0, policy_version 27746 (0.0025) +[2024-11-08 03:04:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 113668096. Throughput: 0: 1692.0. Samples: 23413152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:12,933][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:04:17,918][42004] Updated weights for policy 0, policy_version 27756 (0.0034) +[2024-11-08 03:04:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.7, 300 sec: 6775.9). Total num frames: 113688576. Throughput: 0: 1623.1. Samples: 23415518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:17,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:04:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 113721344. Throughput: 0: 1569.4. Samples: 23424874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:22,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:04:23,723][42004] Updated weights for policy 0, policy_version 27766 (0.0029) +[2024-11-08 03:04:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 113758208. Throughput: 0: 1655.3. Samples: 23435834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:27,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:04:29,330][42004] Updated weights for policy 0, policy_version 27776 (0.0043) +[2024-11-08 03:04:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6803.6). Total num frames: 113795072. Throughput: 0: 1677.3. Samples: 23441470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:32,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:04:34,688][42004] Updated weights for policy 0, policy_version 27786 (0.0029) +[2024-11-08 03:04:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.7, 300 sec: 6803.5). Total num frames: 113831936. Throughput: 0: 1698.0. Samples: 23452856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:37,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:04:40,196][42004] Updated weights for policy 0, policy_version 27796 (0.0025) +[2024-11-08 03:04:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 113868800. Throughput: 0: 1726.0. Samples: 23463858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:42,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:04:45,653][42004] Updated weights for policy 0, policy_version 27806 (0.0036) +[2024-11-08 03:04:49,954][41694] Fps is (10 sec: 6132.7, 60 sec: 6670.1, 300 sec: 6771.0). Total num frames: 113905664. Throughput: 0: 1653.8. Samples: 23469618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:49,956][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:04:52,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 113926144. Throughput: 0: 1663.6. Samples: 23476782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:52,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:04:53,663][42004] Updated weights for policy 0, policy_version 27816 (0.0035) +[2024-11-08 03:04:57,933][41694] Fps is (10 sec: 7187.0, 60 sec: 6690.0, 300 sec: 6761.9). Total num frames: 113963008. Throughput: 0: 1640.4. Samples: 23486974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:57,935][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 03:04:59,302][42004] Updated weights for policy 0, policy_version 27826 (0.0041) +[2024-11-08 03:05:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 113999872. Throughput: 0: 1716.9. Samples: 23492780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:05:02,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:05:04,925][42004] Updated weights for policy 0, policy_version 27836 (0.0036) +[2024-11-08 03:05:07,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.4, 300 sec: 6821.6). Total num frames: 114036736. Throughput: 0: 1752.1. Samples: 23503718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:05:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 03:05:10,405][42004] Updated weights for policy 0, policy_version 27846 (0.0025) +[2024-11-08 03:05:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 114073600. Throughput: 0: 1750.2. Samples: 23514594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 03:05:16,562][42004] Updated weights for policy 0, policy_version 27856 (0.0027) +[2024-11-08 03:05:17,939][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 114106368. Throughput: 0: 1730.1. Samples: 23519326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:17,944][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 03:05:22,090][42004] Updated weights for policy 0, policy_version 27866 (0.0022) +[2024-11-08 03:05:24,515][41694] Fps is (10 sec: 5657.5, 60 sec: 6784.1, 300 sec: 6781.0). Total num frames: 114139136. Throughput: 0: 1659.4. Samples: 23530156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:24,517][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:05:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 114159616. Throughput: 0: 1613.2. Samples: 23536454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:27,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:05:30,539][42004] Updated weights for policy 0, policy_version 27876 (0.0026) +[2024-11-08 03:05:32,932][41694] Fps is (10 sec: 6813.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 114196480. Throughput: 0: 1674.0. Samples: 23541562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:32,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:05:35,923][42004] Updated weights for policy 0, policy_version 27886 (0.0033) +[2024-11-08 03:05:37,934][41694] Fps is (10 sec: 7371.3, 60 sec: 6689.9, 300 sec: 6748.0). Total num frames: 114233344. Throughput: 0: 1695.8. Samples: 23553098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:37,936][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 03:05:38,057][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027890_114237440.pth... +[2024-11-08 03:05:38,161][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027492_112607232.pth +[2024-11-08 03:05:41,263][42004] Updated weights for policy 0, policy_version 27896 (0.0023) +[2024-11-08 03:05:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6811.7). Total num frames: 114270208. Throughput: 0: 1726.0. Samples: 23564642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 03:05:46,627][42004] Updated weights for policy 0, policy_version 27906 (0.0024) +[2024-11-08 03:05:47,931][41694] Fps is (10 sec: 7784.1, 60 sec: 6994.1, 300 sec: 6817.4). Total num frames: 114311168. Throughput: 0: 1717.0. Samples: 23570044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:47,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:05:52,068][42004] Updated weights for policy 0, policy_version 27916 (0.0029) +[2024-11-08 03:05:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 114348032. Throughput: 0: 1725.2. Samples: 23581352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:52,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:05:59,050][41694] Fps is (10 sec: 5894.5, 60 sec: 6768.9, 300 sec: 6777.9). Total num frames: 114376704. Throughput: 0: 1571.8. Samples: 23587082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:59,051][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 03:05:59,991][42004] Updated weights for policy 0, policy_version 27926 (0.0034) +[2024-11-08 03:06:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 114401280. Throughput: 0: 1655.6. Samples: 23593826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:02,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:06:06,260][42004] Updated weights for policy 0, policy_version 27936 (0.0036) +[2024-11-08 03:06:07,938][41694] Fps is (10 sec: 6912.4, 60 sec: 6689.4, 300 sec: 6761.7). Total num frames: 114438144. Throughput: 0: 1691.9. Samples: 23603624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:07,940][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:06:12,310][42004] Updated weights for policy 0, policy_version 27946 (0.0036) +[2024-11-08 03:06:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 114470912. Throughput: 0: 1720.0. Samples: 23613854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:12,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:06:17,561][42004] Updated weights for policy 0, policy_version 27956 (0.0032) +[2024-11-08 03:06:17,931][41694] Fps is (10 sec: 6967.8, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 114507776. Throughput: 0: 1724.0. Samples: 23619140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:17,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:06:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6941.6, 300 sec: 6803.5). Total num frames: 114544640. Throughput: 0: 1731.2. Samples: 23631000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:22,937][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:06:22,944][42004] Updated weights for policy 0, policy_version 27966 (0.0034) +[2024-11-08 03:06:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 114585600. Throughput: 0: 1734.4. Samples: 23642690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:27,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:06:28,301][42004] Updated weights for policy 0, policy_version 27976 (0.0023) +[2024-11-08 03:06:33,595][41694] Fps is (10 sec: 6145.7, 60 sec: 6819.5, 300 sec: 6774.4). Total num frames: 114610176. Throughput: 0: 1711.5. Samples: 23648198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:33,597][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 03:06:36,410][42004] Updated weights for policy 0, policy_version 27986 (0.0032) +[2024-11-08 03:06:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.6, 300 sec: 6761.9). Total num frames: 114638848. Throughput: 0: 1623.3. Samples: 23654400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:37,941][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:06:42,010][42004] Updated weights for policy 0, policy_version 27996 (0.0033) +[2024-11-08 03:06:42,932][41694] Fps is (10 sec: 7019.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 114675712. Throughput: 0: 1786.0. Samples: 23665456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:42,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 03:06:47,219][42004] Updated weights for policy 0, policy_version 28006 (0.0023) +[2024-11-08 03:06:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 114716672. Throughput: 0: 1718.4. Samples: 23671156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:47,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:06:52,488][42004] Updated weights for policy 0, policy_version 28016 (0.0033) +[2024-11-08 03:06:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 114753536. Throughput: 0: 1763.1. Samples: 23682954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:52,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:06:57,773][42004] Updated weights for policy 0, policy_version 28026 (0.0027) +[2024-11-08 03:06:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7095.4, 300 sec: 6845.2). Total num frames: 114794496. Throughput: 0: 1794.8. Samples: 23694622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:57,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 03:07:02,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 114831360. Throughput: 0: 1804.7. Samples: 23700350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:02,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 03:07:03,383][42004] Updated weights for policy 0, policy_version 28036 (0.0022) +[2024-11-08 03:07:08,114][41694] Fps is (10 sec: 5229.4, 60 sec: 6806.7, 300 sec: 6785.5). Total num frames: 114847744. Throughput: 0: 1655.2. Samples: 23705786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:08,116][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:07:12,038][42004] Updated weights for policy 0, policy_version 28046 (0.0043) +[2024-11-08 03:07:12,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 114880512. Throughput: 0: 1642.8. Samples: 23716614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:12,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:07:17,372][42004] Updated weights for policy 0, policy_version 28056 (0.0027) +[2024-11-08 03:07:17,931][41694] Fps is (10 sec: 7092.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 114917376. Throughput: 0: 1666.3. Samples: 23722076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:17,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 03:07:22,778][42004] Updated weights for policy 0, policy_version 28066 (0.0026) +[2024-11-08 03:07:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 114958336. Throughput: 0: 1759.2. Samples: 23733564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:22,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 03:07:27,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 114995200. Throughput: 0: 1774.4. Samples: 23745306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:27,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:07:28,098][42004] Updated weights for policy 0, policy_version 28076 (0.0028) +[2024-11-08 03:07:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7179.2, 300 sec: 6859.1). Total num frames: 115036160. Throughput: 0: 1774.8. Samples: 23751022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 03:07:33,376][42004] Updated weights for policy 0, policy_version 28086 (0.0032) +[2024-11-08 03:07:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7236.3, 300 sec: 6872.9). Total num frames: 115073024. Throughput: 0: 1768.4. Samples: 23762530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:37,938][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:07:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028094_115073024.pth... +[2024-11-08 03:07:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027696_113442816.pth +[2024-11-08 03:07:39,310][42004] Updated weights for policy 0, policy_version 28096 (0.0031) +[2024-11-08 03:07:42,935][41694] Fps is (10 sec: 4913.4, 60 sec: 6826.3, 300 sec: 6789.6). Total num frames: 115085312. Throughput: 0: 1664.0. Samples: 23769506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:42,937][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:07:47,777][42004] Updated weights for policy 0, policy_version 28106 (0.0029) +[2024-11-08 03:07:47,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 115122176. Throughput: 0: 1615.1. Samples: 23773030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:47,933][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 03:07:52,932][41694] Fps is (10 sec: 7375.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 115159040. Throughput: 0: 1757.1. Samples: 23784536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:52,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:07:52,963][42004] Updated weights for policy 0, policy_version 28116 (0.0030) +[2024-11-08 03:07:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 115200000. Throughput: 0: 1774.1. Samples: 23796450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:57,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:07:58,119][42004] Updated weights for policy 0, policy_version 28126 (0.0024) +[2024-11-08 03:08:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 115236864. Throughput: 0: 1783.1. Samples: 23802314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:02,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:08:03,820][42004] Updated weights for policy 0, policy_version 28136 (0.0026) +[2024-11-08 03:08:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7121.4, 300 sec: 6859.1). Total num frames: 115273728. Throughput: 0: 1752.1. Samples: 23812410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:07,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:08:09,753][42004] Updated weights for policy 0, policy_version 28146 (0.0034) +[2024-11-08 03:08:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6845.3). Total num frames: 115306496. Throughput: 0: 1731.7. Samples: 23823232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:12,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:08:17,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 115322880. Throughput: 0: 1707.6. Samples: 23827866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:17,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:08:18,012][42004] Updated weights for policy 0, policy_version 28156 (0.0033) +[2024-11-08 03:08:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 115363840. Throughput: 0: 1605.9. Samples: 23834794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:22,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 03:08:23,395][42004] Updated weights for policy 0, policy_version 28166 (0.0032) +[2024-11-08 03:08:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 115400704. Throughput: 0: 1708.8. Samples: 23846394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:27,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:08:28,742][42004] Updated weights for policy 0, policy_version 28176 (0.0029) +[2024-11-08 03:08:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 115441664. Throughput: 0: 1758.8. Samples: 23852178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:32,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 03:08:33,908][42004] Updated weights for policy 0, policy_version 28186 (0.0040) +[2024-11-08 03:08:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 115478528. Throughput: 0: 1764.8. Samples: 23863952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:37,934][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 03:08:39,215][42004] Updated weights for policy 0, policy_version 28196 (0.0034) +[2024-11-08 03:08:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.4, 300 sec: 6859.1). Total num frames: 115515392. Throughput: 0: 1755.2. Samples: 23875436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:42,933][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 03:08:44,630][42004] Updated weights for policy 0, policy_version 28206 (0.0026) +[2024-11-08 03:08:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 115552256. Throughput: 0: 1754.0. Samples: 23881246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:08:52,932][42004] Updated weights for policy 0, policy_version 28216 (0.0032) +[2024-11-08 03:08:52,933][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 115572736. Throughput: 0: 1693.2. Samples: 23888606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:52,935][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:08:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 115609600. Throughput: 0: 1678.1. Samples: 23898748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:57,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 03:08:58,174][42004] Updated weights for policy 0, policy_version 28226 (0.0023) +[2024-11-08 03:09:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 115646464. Throughput: 0: 1704.3. Samples: 23904558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:02,936][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:09:03,604][42004] Updated weights for policy 0, policy_version 28236 (0.0030) +[2024-11-08 03:09:07,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 115687424. Throughput: 0: 1803.9. Samples: 23915968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:07,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:09:08,762][42004] Updated weights for policy 0, policy_version 28246 (0.0032) +[2024-11-08 03:09:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 115724288. Throughput: 0: 1804.6. Samples: 23927600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:12,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:09:14,075][42004] Updated weights for policy 0, policy_version 28256 (0.0026) +[2024-11-08 03:09:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 115765248. Throughput: 0: 1805.1. Samples: 23933406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:17,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 03:09:19,459][42004] Updated weights for policy 0, policy_version 28266 (0.0024) +[2024-11-08 03:09:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.2, 300 sec: 6914.6). Total num frames: 115798016. Throughput: 0: 1786.3. Samples: 23944334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:22,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:09:27,875][42004] Updated weights for policy 0, policy_version 28276 (0.0021) +[2024-11-08 03:09:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 115818496. Throughput: 0: 1667.1. Samples: 23950454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:27,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:09:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 115855360. Throughput: 0: 1665.0. Samples: 23956172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:32,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:09:33,084][42004] Updated weights for policy 0, policy_version 28286 (0.0047) +[2024-11-08 03:09:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 115896320. Throughput: 0: 1767.0. Samples: 23968122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:37,936][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:09:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028295_115896320.pth... +[2024-11-08 03:09:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027890_114237440.pth +[2024-11-08 03:09:38,278][42004] Updated weights for policy 0, policy_version 28296 (0.0025) +[2024-11-08 03:09:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6920.4). Total num frames: 115933184. Throughput: 0: 1803.6. Samples: 23979910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:42,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 03:09:43,625][42004] Updated weights for policy 0, policy_version 28306 (0.0031) +[2024-11-08 03:09:47,932][41694] Fps is (10 sec: 7782.8, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 115974144. Throughput: 0: 1800.2. Samples: 23985566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:47,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:09:48,913][42004] Updated weights for policy 0, policy_version 28316 (0.0026) +[2024-11-08 03:09:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 116011008. Throughput: 0: 1800.7. Samples: 23996998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:52,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 03:09:54,532][42004] Updated weights for policy 0, policy_version 28326 (0.0032) +[2024-11-08 03:09:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 116043776. Throughput: 0: 1777.6. Samples: 24007592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:57,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:10:02,924][42004] Updated weights for policy 0, policy_version 28336 (0.0021) +[2024-11-08 03:10:02,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 116064256. Throughput: 0: 1704.9. Samples: 24010128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:02,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:10:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 116101120. Throughput: 0: 1675.0. Samples: 24019708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:07,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 03:10:08,165][42004] Updated weights for policy 0, policy_version 28346 (0.0024) +[2024-11-08 03:10:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 116137984. Throughput: 0: 1787.7. Samples: 24030900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:12,935][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:10:13,862][42004] Updated weights for policy 0, policy_version 28356 (0.0028) +[2024-11-08 03:10:17,937][41694] Fps is (10 sec: 6959.5, 60 sec: 6757.8, 300 sec: 6923.9). Total num frames: 116170752. Throughput: 0: 1769.8. Samples: 24035824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:17,939][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 03:10:19,818][42004] Updated weights for policy 0, policy_version 28366 (0.0049) +[2024-11-08 03:10:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 116207616. Throughput: 0: 1743.8. Samples: 24046594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:22,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:10:25,199][42004] Updated weights for policy 0, policy_version 28376 (0.0026) +[2024-11-08 03:10:27,931][41694] Fps is (10 sec: 7376.7, 60 sec: 7099.8, 300 sec: 6942.4). Total num frames: 116244480. Throughput: 0: 1730.8. Samples: 24057796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:27,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:10:31,404][42004] Updated weights for policy 0, policy_version 28386 (0.0029) +[2024-11-08 03:10:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 116277248. Throughput: 0: 1705.0. Samples: 24062292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:32,935][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 03:10:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6873.0). Total num frames: 116297728. Throughput: 0: 1599.9. Samples: 24068994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:37,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:10:39,194][42004] Updated weights for policy 0, policy_version 28396 (0.0041) +[2024-11-08 03:10:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 116334592. Throughput: 0: 1609.7. Samples: 24080030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:42,936][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:10:44,806][42004] Updated weights for policy 0, policy_version 28406 (0.0026) +[2024-11-08 03:10:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 116375552. Throughput: 0: 1680.4. Samples: 24085748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:47,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:10:50,015][42004] Updated weights for policy 0, policy_version 28416 (0.0031) +[2024-11-08 03:10:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6927.0). Total num frames: 116412416. Throughput: 0: 1728.8. Samples: 24097502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:52,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 03:10:55,261][42004] Updated weights for policy 0, policy_version 28426 (0.0024) +[2024-11-08 03:10:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 116449280. Throughput: 0: 1736.2. Samples: 24109028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:57,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 03:11:00,943][42004] Updated weights for policy 0, policy_version 28436 (0.0029) +[2024-11-08 03:11:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6942.5). Total num frames: 116486144. Throughput: 0: 1750.6. Samples: 24114594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:11:02,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 03:11:07,099][42004] Updated weights for policy 0, policy_version 28446 (0.0028) +[2024-11-08 03:11:09,829][41694] Fps is (10 sec: 5508.2, 60 sec: 6683.5, 300 sec: 6884.2). Total num frames: 116514816. Throughput: 0: 1651.3. Samples: 24124038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:09,832][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 03:11:12,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 116535296. Throughput: 0: 1614.0. Samples: 24130426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:12,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:11:15,560][42004] Updated weights for policy 0, policy_version 28456 (0.0036) +[2024-11-08 03:11:17,931][41694] Fps is (10 sec: 7077.6, 60 sec: 6690.7, 300 sec: 6873.0). Total num frames: 116572160. Throughput: 0: 1623.8. Samples: 24135364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:11:21,060][42004] Updated weights for policy 0, policy_version 28466 (0.0029) +[2024-11-08 03:11:22,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 116609024. Throughput: 0: 1727.6. Samples: 24146738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:22,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:11:26,548][42004] Updated weights for policy 0, policy_version 28476 (0.0032) +[2024-11-08 03:11:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6916.3). Total num frames: 116645888. Throughput: 0: 1731.1. Samples: 24157928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:27,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:11:31,902][42004] Updated weights for policy 0, policy_version 28486 (0.0051) +[2024-11-08 03:11:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 116682752. Throughput: 0: 1726.7. Samples: 24163450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:32,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:11:37,892][42004] Updated weights for policy 0, policy_version 28496 (0.0034) +[2024-11-08 03:11:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 116719616. Throughput: 0: 1705.1. Samples: 24174230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:37,933][41694] Avg episode reward: [(0, '4.794')] +[2024-11-08 03:11:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028496_116719616.pth... +[2024-11-08 03:11:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028094_115073024.pth +[2024-11-08 03:11:44,357][41694] Fps is (10 sec: 5736.1, 60 sec: 6735.0, 300 sec: 6853.7). Total num frames: 116748288. Throughput: 0: 1629.2. Samples: 24184664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:44,359][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 03:11:45,837][42004] Updated weights for policy 0, policy_version 28506 (0.0025) +[2024-11-08 03:11:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 116772864. Throughput: 0: 1588.4. Samples: 24186072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:47,934][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 03:11:51,386][42004] Updated weights for policy 0, policy_version 28516 (0.0032) +[2024-11-08 03:11:52,933][41694] Fps is (10 sec: 7163.7, 60 sec: 6621.7, 300 sec: 6831.3). Total num frames: 116809728. Throughput: 0: 1697.4. Samples: 24197204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:52,936][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:11:56,822][42004] Updated weights for policy 0, policy_version 28526 (0.0033) +[2024-11-08 03:11:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 116850688. Throughput: 0: 1736.5. Samples: 24208570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:57,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 03:12:02,426][42004] Updated weights for policy 0, policy_version 28536 (0.0023) +[2024-11-08 03:12:02,933][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.7, 300 sec: 6905.0). Total num frames: 116883456. Throughput: 0: 1755.5. Samples: 24214366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:12:02,936][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 03:12:07,862][42004] Updated weights for policy 0, policy_version 28546 (0.0037) +[2024-11-08 03:12:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7049.6, 300 sec: 6928.5). Total num frames: 116924416. Throughput: 0: 1737.9. Samples: 24224942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:07,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:12:12,935][41694] Fps is (10 sec: 6961.6, 60 sec: 6962.8, 300 sec: 6900.6). Total num frames: 116953088. Throughput: 0: 1715.5. Samples: 24235130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:12,937][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 03:12:14,430][42004] Updated weights for policy 0, policy_version 28556 (0.0035) +[2024-11-08 03:12:18,900][41694] Fps is (10 sec: 4854.8, 60 sec: 6651.1, 300 sec: 6822.8). Total num frames: 116977664. Throughput: 0: 1659.2. Samples: 24239720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:18,902][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:12:22,526][42004] Updated weights for policy 0, policy_version 28566 (0.0033) +[2024-11-08 03:12:22,932][41694] Fps is (10 sec: 5326.6, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 117006336. Throughput: 0: 1601.1. Samples: 24246280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:22,934][41694] Avg episode reward: [(0, '4.779')] +[2024-11-08 03:12:27,869][42004] Updated weights for policy 0, policy_version 28576 (0.0027) +[2024-11-08 03:12:27,932][41694] Fps is (10 sec: 7709.1, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 117047296. Throughput: 0: 1677.4. Samples: 24257758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:27,934][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 03:12:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6803.5). Total num frames: 117080064. Throughput: 0: 1706.6. Samples: 24262870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:32,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:12:33,610][42004] Updated weights for policy 0, policy_version 28586 (0.0032) +[2024-11-08 03:12:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6886.9). Total num frames: 117116928. Throughput: 0: 1707.8. Samples: 24274054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:37,936][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 03:12:39,325][42004] Updated weights for policy 0, policy_version 28596 (0.0031) +[2024-11-08 03:12:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6922.8, 300 sec: 6886.8). Total num frames: 117153792. Throughput: 0: 1692.2. Samples: 24284718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:42,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:12:45,473][42004] Updated weights for policy 0, policy_version 28606 (0.0024) +[2024-11-08 03:12:47,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 117182464. Throughput: 0: 1665.5. Samples: 24289312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:47,935][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 03:12:51,144][42004] Updated weights for policy 0, policy_version 28616 (0.0034) +[2024-11-08 03:12:53,427][41694] Fps is (10 sec: 5463.9, 60 sec: 6635.6, 300 sec: 6806.0). Total num frames: 117211136. Throughput: 0: 1652.8. Samples: 24300138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:53,428][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 03:12:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 117239808. Throughput: 0: 1584.6. Samples: 24306434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:57,936][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:12:59,416][42004] Updated weights for policy 0, policy_version 28626 (0.0041) +[2024-11-08 03:13:02,931][41694] Fps is (10 sec: 6464.0, 60 sec: 6485.5, 300 sec: 6775.8). Total num frames: 117272576. Throughput: 0: 1634.3. Samples: 24311682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:02,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 03:13:05,120][42004] Updated weights for policy 0, policy_version 28636 (0.0030) +[2024-11-08 03:13:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 117313536. Throughput: 0: 1695.5. Samples: 24322578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:07,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:13:10,626][42004] Updated weights for policy 0, policy_version 28646 (0.0027) +[2024-11-08 03:13:12,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6622.2, 300 sec: 6872.9). Total num frames: 117350400. Throughput: 0: 1687.3. Samples: 24333688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:12,934][41694] Avg episode reward: [(0, '4.100')] +[2024-11-08 03:13:16,119][42004] Updated weights for policy 0, policy_version 28656 (0.0032) +[2024-11-08 03:13:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6869.2, 300 sec: 6845.2). Total num frames: 117383168. Throughput: 0: 1697.7. Samples: 24339268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:17,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 03:13:22,751][42004] Updated weights for policy 0, policy_version 28666 (0.0050) +[2024-11-08 03:13:22,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 117415936. Throughput: 0: 1656.7. Samples: 24348606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:22,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:13:27,940][41694] Fps is (10 sec: 5320.1, 60 sec: 6484.4, 300 sec: 6761.7). Total num frames: 117436416. Throughput: 0: 1537.5. Samples: 24353920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:27,943][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 03:13:30,458][42004] Updated weights for policy 0, policy_version 28676 (0.0027) +[2024-11-08 03:13:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 117473280. Throughput: 0: 1587.6. Samples: 24360754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:32,932][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:13:36,118][42004] Updated weights for policy 0, policy_version 28686 (0.0035) +[2024-11-08 03:13:37,932][41694] Fps is (10 sec: 7379.0, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 117510144. Throughput: 0: 1611.0. Samples: 24371834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:37,934][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 03:13:37,941][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028689_117510144.pth... +[2024-11-08 03:13:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028295_115896320.pth +[2024-11-08 03:13:41,587][42004] Updated weights for policy 0, policy_version 28696 (0.0027) +[2024-11-08 03:13:42,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6553.5, 300 sec: 6761.8). Total num frames: 117547008. Throughput: 0: 1708.4. Samples: 24383314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:42,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 03:13:46,927][42004] Updated weights for policy 0, policy_version 28706 (0.0029) +[2024-11-08 03:13:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 117583872. Throughput: 0: 1717.9. Samples: 24388986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:47,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:13:52,810][42004] Updated weights for policy 0, policy_version 28716 (0.0047) +[2024-11-08 03:13:52,931][41694] Fps is (10 sec: 7373.9, 60 sec: 6883.5, 300 sec: 6817.4). Total num frames: 117620736. Throughput: 0: 1715.3. Samples: 24399766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 03:13:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 117657600. Throughput: 0: 1704.8. Samples: 24410404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:57,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:13:58,391][42004] Updated weights for policy 0, policy_version 28726 (0.0033) +[2024-11-08 03:14:02,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 117673984. Throughput: 0: 1700.7. Samples: 24415798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:02,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:14:06,387][42004] Updated weights for policy 0, policy_version 28736 (0.0028) +[2024-11-08 03:14:07,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 117710848. Throughput: 0: 1640.7. Samples: 24422438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:07,935][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 03:14:12,048][42004] Updated weights for policy 0, policy_version 28746 (0.0031) +[2024-11-08 03:14:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 117747712. Throughput: 0: 1769.1. Samples: 24433516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:12,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:14:17,138][42004] Updated weights for policy 0, policy_version 28756 (0.0027) +[2024-11-08 03:14:17,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 117788672. Throughput: 0: 1745.8. Samples: 24439316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:14:22,554][42004] Updated weights for policy 0, policy_version 28766 (0.0031) +[2024-11-08 03:14:22,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 117825536. Throughput: 0: 1756.2. Samples: 24450862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:22,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:14:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7032.5, 300 sec: 6789.6). Total num frames: 117858304. Throughput: 0: 1735.7. Samples: 24461416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:27,934][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:14:28,648][42004] Updated weights for policy 0, policy_version 28776 (0.0046) +[2024-11-08 03:14:32,931][41694] Fps is (10 sec: 6963.5, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 117895168. Throughput: 0: 1729.0. Samples: 24466792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:32,933][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 03:14:34,041][42004] Updated weights for policy 0, policy_version 28786 (0.0029) +[2024-11-08 03:14:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 117915648. Throughput: 0: 1681.0. Samples: 24475412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:37,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 03:14:42,230][42004] Updated weights for policy 0, policy_version 28796 (0.0027) +[2024-11-08 03:14:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.6, 300 sec: 6706.3). Total num frames: 117952512. Throughput: 0: 1643.2. Samples: 24484350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:14:47,548][42004] Updated weights for policy 0, policy_version 28806 (0.0026) +[2024-11-08 03:14:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 117989376. Throughput: 0: 1640.6. Samples: 24489624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:47,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:14:52,897][42004] Updated weights for policy 0, policy_version 28816 (0.0029) +[2024-11-08 03:14:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 118030336. Throughput: 0: 1760.3. Samples: 24501650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:52,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:14:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 118067200. Throughput: 0: 1760.1. Samples: 24512720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:14:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 03:14:58,559][42004] Updated weights for policy 0, policy_version 28826 (0.0027) +[2024-11-08 03:15:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.4, 300 sec: 6761.9). Total num frames: 118095872. Throughput: 0: 1737.0. Samples: 24517482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:02,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:15:04,743][42004] Updated weights for policy 0, policy_version 28836 (0.0025) +[2024-11-08 03:15:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 118132736. Throughput: 0: 1716.6. Samples: 24528108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:07,933][41694] Avg episode reward: [(0, '4.610')] +[2024-11-08 03:15:12,676][42004] Updated weights for policy 0, policy_version 28846 (0.0034) +[2024-11-08 03:15:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6720.3). Total num frames: 118153216. Throughput: 0: 1626.8. Samples: 24534622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:12,937][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:15:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 118185984. Throughput: 0: 1617.3. Samples: 24539572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:17,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:15:19,216][42004] Updated weights for policy 0, policy_version 28856 (0.0029) +[2024-11-08 03:15:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 118218752. Throughput: 0: 1638.0. Samples: 24549120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:22,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:15:24,719][42004] Updated weights for policy 0, policy_version 28866 (0.0030) +[2024-11-08 03:15:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 118259712. Throughput: 0: 1698.2. Samples: 24560770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:15:29,992][42004] Updated weights for policy 0, policy_version 28876 (0.0027) +[2024-11-08 03:15:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 118292480. Throughput: 0: 1710.2. Samples: 24566582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:32,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:15:35,897][42004] Updated weights for policy 0, policy_version 28886 (0.0042) +[2024-11-08 03:15:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 118329344. Throughput: 0: 1675.5. Samples: 24577046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:37,934][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:15:38,066][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028890_118333440.pth... +[2024-11-08 03:15:38,156][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028496_116719616.pth +[2024-11-08 03:15:41,394][42004] Updated weights for policy 0, policy_version 28896 (0.0049) +[2024-11-08 03:15:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118366208. Throughput: 0: 1676.7. Samples: 24588170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:42,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:15:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 118386688. Throughput: 0: 1647.0. Samples: 24591598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:47,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:15:49,646][42004] Updated weights for policy 0, policy_version 28906 (0.0049) +[2024-11-08 03:15:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 118419456. Throughput: 0: 1588.9. Samples: 24599610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:52,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 03:15:55,387][42004] Updated weights for policy 0, policy_version 28916 (0.0030) +[2024-11-08 03:15:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 118456320. Throughput: 0: 1689.7. Samples: 24610658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:57,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 03:16:00,623][42004] Updated weights for policy 0, policy_version 28926 (0.0028) +[2024-11-08 03:16:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6749.8). Total num frames: 118493184. Throughput: 0: 1710.8. Samples: 24616556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:02,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:16:06,640][42004] Updated weights for policy 0, policy_version 28936 (0.0035) +[2024-11-08 03:16:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 118530048. Throughput: 0: 1730.3. Samples: 24626982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:07,936][41694] Avg episode reward: [(0, '4.735')] +[2024-11-08 03:16:12,814][42004] Updated weights for policy 0, policy_version 28946 (0.0026) +[2024-11-08 03:16:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 118562816. Throughput: 0: 1692.1. Samples: 24636916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:12,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 03:16:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118599680. Throughput: 0: 1679.2. Samples: 24642144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:17,936][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 03:16:18,233][42004] Updated weights for policy 0, policy_version 28956 (0.0030) +[2024-11-08 03:16:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 118620160. Throughput: 0: 1604.5. Samples: 24649248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:22,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 03:16:26,566][42004] Updated weights for policy 0, policy_version 28966 (0.0024) +[2024-11-08 03:16:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 118652928. Throughput: 0: 1587.1. Samples: 24659590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:27,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:16:32,204][42004] Updated weights for policy 0, policy_version 28976 (0.0024) +[2024-11-08 03:16:32,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 118689792. Throughput: 0: 1622.8. Samples: 24664624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:32,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:16:37,633][42004] Updated weights for policy 0, policy_version 28986 (0.0033) +[2024-11-08 03:16:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6738.9). Total num frames: 118726656. Throughput: 0: 1696.2. Samples: 24675938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:37,934][41694] Avg episode reward: [(0, '4.692')] +[2024-11-08 03:16:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 118763520. Throughput: 0: 1691.0. Samples: 24686754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:42,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:16:43,485][42004] Updated weights for policy 0, policy_version 28996 (0.0026) +[2024-11-08 03:16:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118800384. Throughput: 0: 1683.5. Samples: 24692312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:47,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:16:48,885][42004] Updated weights for policy 0, policy_version 29006 (0.0026) +[2024-11-08 03:16:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 118837248. Throughput: 0: 1702.5. Samples: 24703596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:52,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 03:16:56,458][42004] Updated weights for policy 0, policy_version 29016 (0.0026) +[2024-11-08 03:16:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 118857728. Throughput: 0: 1646.0. Samples: 24710988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:57,935][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 03:17:02,222][42004] Updated weights for policy 0, policy_version 29026 (0.0026) +[2024-11-08 03:17:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 118890496. Throughput: 0: 1648.0. Samples: 24716304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:02,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:17:07,903][42004] Updated weights for policy 0, policy_version 29036 (0.0025) +[2024-11-08 03:17:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6706.4). Total num frames: 118931456. Throughput: 0: 1718.9. Samples: 24726598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:07,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 03:17:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6770.2). Total num frames: 118968320. Throughput: 0: 1750.6. Samples: 24738368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:12,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:17:13,343][42004] Updated weights for policy 0, policy_version 29046 (0.0038) +[2024-11-08 03:17:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119001088. Throughput: 0: 1754.2. Samples: 24743562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:17,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 03:17:19,144][42004] Updated weights for policy 0, policy_version 29056 (0.0059) +[2024-11-08 03:17:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 119037952. Throughput: 0: 1738.1. Samples: 24754150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:22,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:17:24,890][42004] Updated weights for policy 0, policy_version 29066 (0.0030) +[2024-11-08 03:17:30,118][41694] Fps is (10 sec: 6050.2, 60 sec: 6784.3, 300 sec: 6712.1). Total num frames: 119074816. Throughput: 0: 1663.6. Samples: 24765252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:30,119][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 03:17:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 119091200. Throughput: 0: 1646.2. Samples: 24766390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:32,933][41694] Avg episode reward: [(0, '4.194')] +[2024-11-08 03:17:32,983][42004] Updated weights for policy 0, policy_version 29076 (0.0025) +[2024-11-08 03:17:37,932][41694] Fps is (10 sec: 7338.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 119132160. Throughput: 0: 1634.3. Samples: 24777138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:37,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 03:17:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029085_119132160.pth... +[2024-11-08 03:17:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028689_117510144.pth +[2024-11-08 03:17:38,375][42004] Updated weights for policy 0, policy_version 29086 (0.0026) +[2024-11-08 03:17:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 119169024. Throughput: 0: 1726.6. Samples: 24788684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:42,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 03:17:43,604][42004] Updated weights for policy 0, policy_version 29096 (0.0021) +[2024-11-08 03:17:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6787.1). Total num frames: 119209984. Throughput: 0: 1742.2. Samples: 24794704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:47,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 03:17:48,671][42004] Updated weights for policy 0, policy_version 29106 (0.0026) +[2024-11-08 03:17:52,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 119250944. Throughput: 0: 1777.8. Samples: 24806598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:17:52,933][41694] Avg episode reward: [(0, '4.736')] +[2024-11-08 03:17:53,934][42004] Updated weights for policy 0, policy_version 29116 (0.0038) +[2024-11-08 03:17:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 119287808. Throughput: 0: 1773.6. Samples: 24818182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:17:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:17:59,703][42004] Updated weights for policy 0, policy_version 29126 (0.0028) +[2024-11-08 03:18:04,837][41694] Fps is (10 sec: 5504.9, 60 sec: 6881.3, 300 sec: 6746.1). Total num frames: 119316480. Throughput: 0: 1693.3. Samples: 24822986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:04,839][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:18:07,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 119336960. Throughput: 0: 1651.2. Samples: 24828454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:07,933][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 03:18:08,469][42004] Updated weights for policy 0, policy_version 29136 (0.0030) +[2024-11-08 03:18:12,932][41694] Fps is (10 sec: 7083.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 119373824. Throughput: 0: 1732.1. Samples: 24839412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:12,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:18:13,852][42004] Updated weights for policy 0, policy_version 29146 (0.0021) +[2024-11-08 03:18:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 119410688. Throughput: 0: 1751.0. Samples: 24845186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:17,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:18:19,184][42004] Updated weights for policy 0, policy_version 29156 (0.0026) +[2024-11-08 03:18:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6831.5). Total num frames: 119451648. Throughput: 0: 1765.5. Samples: 24856584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:22,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:18:24,751][42004] Updated weights for policy 0, policy_version 29166 (0.0029) +[2024-11-08 03:18:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7155.6, 300 sec: 6831.3). Total num frames: 119488512. Throughput: 0: 1766.3. Samples: 24868166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:27,934][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:18:30,045][42004] Updated weights for policy 0, policy_version 29176 (0.0028) +[2024-11-08 03:18:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6831.3). Total num frames: 119525376. Throughput: 0: 1757.7. Samples: 24873800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:32,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 03:18:35,352][42004] Updated weights for policy 0, policy_version 29186 (0.0031) +[2024-11-08 03:18:39,438][41694] Fps is (10 sec: 5695.5, 60 sec: 6859.2, 300 sec: 6769.0). Total num frames: 119554048. Throughput: 0: 1683.9. Samples: 24884910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:39,440][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 03:18:42,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 119578624. Throughput: 0: 1616.4. Samples: 24890922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:42,935][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:18:43,756][42004] Updated weights for policy 0, policy_version 29196 (0.0037) +[2024-11-08 03:18:47,931][41694] Fps is (10 sec: 7234.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 119615488. Throughput: 0: 1699.4. Samples: 24896220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:47,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:18:49,234][42004] Updated weights for policy 0, policy_version 29206 (0.0029) +[2024-11-08 03:18:52,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 119656448. Throughput: 0: 1762.8. Samples: 24907780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:52,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:18:54,575][42004] Updated weights for policy 0, policy_version 29216 (0.0025) +[2024-11-08 03:18:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 119693312. Throughput: 0: 1780.4. Samples: 24919532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:57,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 03:19:00,019][42004] Updated weights for policy 0, policy_version 29226 (0.0031) +[2024-11-08 03:19:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7121.0, 300 sec: 6845.2). Total num frames: 119730176. Throughput: 0: 1771.7. Samples: 24924912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:02,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 03:19:05,717][42004] Updated weights for policy 0, policy_version 29236 (0.0030) +[2024-11-08 03:19:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 119762944. Throughput: 0: 1757.6. Samples: 24935678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 03:19:14,121][41694] Fps is (10 sec: 5125.0, 60 sec: 6760.9, 300 sec: 6748.6). Total num frames: 119787520. Throughput: 0: 1580.1. Samples: 24941150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:14,123][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:19:14,196][42004] Updated weights for policy 0, policy_version 29246 (0.0041) +[2024-11-08 03:19:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 119816192. Throughput: 0: 1624.4. Samples: 24946900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:17,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 03:19:20,099][42004] Updated weights for policy 0, policy_version 29256 (0.0029) +[2024-11-08 03:19:22,932][41694] Fps is (10 sec: 7438.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119853056. Throughput: 0: 1671.8. Samples: 24957622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:22,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 03:19:25,538][42004] Updated weights for policy 0, policy_version 29266 (0.0025) +[2024-11-08 03:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119889920. Throughput: 0: 1727.4. Samples: 24968656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 03:19:30,943][42004] Updated weights for policy 0, policy_version 29276 (0.0029) +[2024-11-08 03:19:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 119926784. Throughput: 0: 1733.9. Samples: 24974244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:32,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 03:19:36,609][42004] Updated weights for policy 0, policy_version 29286 (0.0027) +[2024-11-08 03:19:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7002.5, 300 sec: 6817.4). Total num frames: 119963648. Throughput: 0: 1722.7. Samples: 24985304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:37,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 03:19:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029288_119963648.pth... +[2024-11-08 03:19:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028890_118333440.pth +[2024-11-08 03:19:42,070][42004] Updated weights for policy 0, policy_version 29296 (0.0034) +[2024-11-08 03:19:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 120000512. Throughput: 0: 1709.7. Samples: 24996468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:42,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 03:19:48,356][41694] Fps is (10 sec: 5500.9, 60 sec: 6710.9, 300 sec: 6738.3). Total num frames: 120020992. Throughput: 0: 1679.4. Samples: 25001200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:48,362][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:19:50,480][42004] Updated weights for policy 0, policy_version 29306 (0.0030) +[2024-11-08 03:19:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 120053760. Throughput: 0: 1599.5. Samples: 25007656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:52,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:19:56,140][42004] Updated weights for policy 0, policy_version 29316 (0.0043) +[2024-11-08 03:19:57,932][41694] Fps is (10 sec: 7272.0, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 120090624. Throughput: 0: 1776.0. Samples: 25018960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:57,935][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 03:20:01,692][42004] Updated weights for policy 0, policy_version 29326 (0.0033) +[2024-11-08 03:20:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 120127488. Throughput: 0: 1722.0. Samples: 25024388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:02,937][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:20:07,234][42004] Updated weights for policy 0, policy_version 29336 (0.0032) +[2024-11-08 03:20:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 120164352. Throughput: 0: 1724.0. Samples: 25035202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:07,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:20:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6964.7, 300 sec: 6817.4). Total num frames: 120197120. Throughput: 0: 1726.3. Samples: 25046340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:12,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:20:12,995][42004] Updated weights for policy 0, policy_version 29346 (0.0034) +[2024-11-08 03:20:17,934][41694] Fps is (10 sec: 6551.9, 60 sec: 6894.6, 300 sec: 6817.4). Total num frames: 120229888. Throughput: 0: 1702.5. Samples: 25050860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:17,937][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 03:20:19,499][42004] Updated weights for policy 0, policy_version 29356 (0.0025) +[2024-11-08 03:20:22,963][41694] Fps is (10 sec: 4900.0, 60 sec: 6550.2, 300 sec: 6733.4). Total num frames: 120246272. Throughput: 0: 1564.0. Samples: 25055732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:22,965][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 03:20:27,630][42004] Updated weights for policy 0, policy_version 29366 (0.0028) +[2024-11-08 03:20:27,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120283136. Throughput: 0: 1570.0. Samples: 25067118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:27,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 03:20:32,932][41694] Fps is (10 sec: 7395.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120320000. Throughput: 0: 1603.8. Samples: 25072688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 03:20:32,940][42004] Updated weights for policy 0, policy_version 29376 (0.0044) +[2024-11-08 03:20:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 120360960. Throughput: 0: 1702.4. Samples: 25084266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:37,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 03:20:38,316][42004] Updated weights for policy 0, policy_version 29386 (0.0033) +[2024-11-08 03:20:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 120397824. Throughput: 0: 1702.3. Samples: 25095564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:42,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 03:20:43,962][42004] Updated weights for policy 0, policy_version 29396 (0.0033) +[2024-11-08 03:20:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6944.1, 300 sec: 6831.3). Total num frames: 120434688. Throughput: 0: 1701.2. Samples: 25100942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:47,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 03:20:49,581][42004] Updated weights for policy 0, policy_version 29406 (0.0027) +[2024-11-08 03:20:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 120467456. Throughput: 0: 1698.7. Samples: 25111644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:52,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 03:20:57,932][41694] Fps is (10 sec: 4914.9, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120483840. Throughput: 0: 1601.0. Samples: 25118386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:57,934][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:20:58,343][42004] Updated weights for policy 0, policy_version 29416 (0.0032) +[2024-11-08 03:21:02,933][41694] Fps is (10 sec: 5324.5, 60 sec: 6553.5, 300 sec: 6748.0). Total num frames: 120520704. Throughput: 0: 1593.5. Samples: 25122566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:02,940][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 03:21:04,000][42004] Updated weights for policy 0, policy_version 29426 (0.0023) +[2024-11-08 03:21:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6748.0). Total num frames: 120553472. Throughput: 0: 1719.3. Samples: 25133048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:07,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 03:21:09,645][42004] Updated weights for policy 0, policy_version 29436 (0.0024) +[2024-11-08 03:21:12,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120590336. Throughput: 0: 1701.8. Samples: 25143698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:12,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:21:15,582][42004] Updated weights for policy 0, policy_version 29446 (0.0031) +[2024-11-08 03:21:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.1, 300 sec: 6803.5). Total num frames: 120627200. Throughput: 0: 1698.0. Samples: 25149098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:17,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 03:21:21,015][42004] Updated weights for policy 0, policy_version 29456 (0.0048) +[2024-11-08 03:21:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6898.5, 300 sec: 6803.5). Total num frames: 120659968. Throughput: 0: 1691.5. Samples: 25160386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:22,936][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 03:21:27,141][42004] Updated weights for policy 0, policy_version 29466 (0.0038) +[2024-11-08 03:21:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 120696832. Throughput: 0: 1663.4. Samples: 25170416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:21:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 120713216. Throughput: 0: 1647.6. Samples: 25175084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:32,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 03:21:35,586][42004] Updated weights for policy 0, policy_version 29476 (0.0032) +[2024-11-08 03:21:37,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 120750080. Throughput: 0: 1558.1. Samples: 25181758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:37,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:21:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029480_120750080.pth... +[2024-11-08 03:21:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029085_119132160.pth +[2024-11-08 03:21:41,328][42004] Updated weights for policy 0, policy_version 29486 (0.0025) +[2024-11-08 03:21:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6720.2). Total num frames: 120782848. Throughput: 0: 1649.7. Samples: 25192622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:42,937][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 03:21:46,592][42004] Updated weights for policy 0, policy_version 29496 (0.0031) +[2024-11-08 03:21:47,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 120823808. Throughput: 0: 1680.9. Samples: 25198204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:47,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:21:51,920][42004] Updated weights for policy 0, policy_version 29506 (0.0039) +[2024-11-08 03:21:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 120860672. Throughput: 0: 1704.8. Samples: 25209762. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:52,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 03:21:57,526][42004] Updated weights for policy 0, policy_version 29516 (0.0035) +[2024-11-08 03:21:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 120897536. Throughput: 0: 1720.3. Samples: 25221110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:57,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:22:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 120930304. Throughput: 0: 1704.2. Samples: 25225786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:02,936][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:22:04,007][42004] Updated weights for policy 0, policy_version 29526 (0.0040) +[2024-11-08 03:22:07,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 120946688. Throughput: 0: 1613.9. Samples: 25233010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:07,935][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 03:22:12,071][42004] Updated weights for policy 0, policy_version 29536 (0.0041) +[2024-11-08 03:22:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 120983552. Throughput: 0: 1597.6. Samples: 25242310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:12,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 03:22:17,334][42004] Updated weights for policy 0, policy_version 29546 (0.0026) +[2024-11-08 03:22:17,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 121024512. Throughput: 0: 1621.8. Samples: 25248064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:17,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:22:22,892][42004] Updated weights for policy 0, policy_version 29556 (0.0023) +[2024-11-08 03:22:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6784.4). Total num frames: 121061376. Throughput: 0: 1731.0. Samples: 25259654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:22,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:22:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 121098240. Throughput: 0: 1732.0. Samples: 25270562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:27,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 03:22:28,402][42004] Updated weights for policy 0, policy_version 29566 (0.0028) +[2024-11-08 03:22:32,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7031.3, 300 sec: 6789.6). Total num frames: 121135104. Throughput: 0: 1729.5. Samples: 25276034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:32,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:22:34,221][42004] Updated weights for policy 0, policy_version 29576 (0.0031) +[2024-11-08 03:22:37,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 121163776. Throughput: 0: 1693.7. Samples: 25285980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:37,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 03:22:42,555][42004] Updated weights for policy 0, policy_version 29586 (0.2268) +[2024-11-08 03:22:42,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 121184256. Throughput: 0: 1586.8. Samples: 25292518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:42,934][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 03:22:47,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 121221120. Throughput: 0: 1603.9. Samples: 25297962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:47,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:22:48,098][42004] Updated weights for policy 0, policy_version 29596 (0.0030) +[2024-11-08 03:22:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 121262080. Throughput: 0: 1699.4. Samples: 25309484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:22:53,415][42004] Updated weights for policy 0, policy_version 29606 (0.0025) +[2024-11-08 03:22:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6763.9). Total num frames: 121298944. Throughput: 0: 1751.6. Samples: 25321132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:57,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:22:58,710][42004] Updated weights for policy 0, policy_version 29616 (0.0032) +[2024-11-08 03:23:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 121335808. Throughput: 0: 1746.3. Samples: 25326648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:02,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 03:23:04,600][42004] Updated weights for policy 0, policy_version 29626 (0.0034) +[2024-11-08 03:23:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 121368576. Throughput: 0: 1721.2. Samples: 25337110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:07,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 03:23:10,842][42004] Updated weights for policy 0, policy_version 29636 (0.0032) +[2024-11-08 03:23:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 121401344. Throughput: 0: 1699.0. Samples: 25347016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:12,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 03:23:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6621.8, 300 sec: 6678.5). Total num frames: 121421824. Throughput: 0: 1636.2. Samples: 25349660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:23:18,910][42004] Updated weights for policy 0, policy_version 29646 (0.0029) +[2024-11-08 03:23:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 121458688. Throughput: 0: 1617.3. Samples: 25358756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:22,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 03:23:24,555][42004] Updated weights for policy 0, policy_version 29656 (0.0037) +[2024-11-08 03:23:27,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 121495552. Throughput: 0: 1722.8. Samples: 25370046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:27,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 03:23:30,061][42004] Updated weights for policy 0, policy_version 29666 (0.0028) +[2024-11-08 03:23:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.0, 300 sec: 6740.8). Total num frames: 121532416. Throughput: 0: 1724.0. Samples: 25375540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:32,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:23:35,549][42004] Updated weights for policy 0, policy_version 29676 (0.0029) +[2024-11-08 03:23:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.5, 300 sec: 6748.0). Total num frames: 121569280. Throughput: 0: 1716.1. Samples: 25386710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:37,933][41694] Avg episode reward: [(0, '4.695')] +[2024-11-08 03:23:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029680_121569280.pth... +[2024-11-08 03:23:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029288_119963648.pth +[2024-11-08 03:23:41,495][42004] Updated weights for policy 0, policy_version 29686 (0.0034) +[2024-11-08 03:23:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 121602048. Throughput: 0: 1680.2. Samples: 25396742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:42,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:23:47,430][42004] Updated weights for policy 0, policy_version 29696 (0.0026) +[2024-11-08 03:23:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 121634816. Throughput: 0: 1671.3. Samples: 25401856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:47,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:23:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 121655296. Throughput: 0: 1586.0. Samples: 25408478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:52,936][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 03:23:55,317][42004] Updated weights for policy 0, policy_version 29706 (0.0027) +[2024-11-08 03:23:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 121692160. Throughput: 0: 1608.9. Samples: 25419416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:57,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:24:00,695][42004] Updated weights for policy 0, policy_version 29716 (0.0025) +[2024-11-08 03:24:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 121729024. Throughput: 0: 1683.9. Samples: 25425436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:02,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 03:24:06,218][42004] Updated weights for policy 0, policy_version 29726 (0.0022) +[2024-11-08 03:24:07,933][41694] Fps is (10 sec: 7781.9, 60 sec: 6690.0, 300 sec: 6747.4). Total num frames: 121769984. Throughput: 0: 1729.7. Samples: 25436594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:07,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:24:11,530][42004] Updated weights for policy 0, policy_version 29736 (0.0031) +[2024-11-08 03:24:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 121806848. Throughput: 0: 1735.5. Samples: 25448142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:12,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:24:17,800][42004] Updated weights for policy 0, policy_version 29746 (0.0036) +[2024-11-08 03:24:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 121839616. Throughput: 0: 1711.9. Samples: 25452578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:17,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:24:25,082][41694] Fps is (10 sec: 5393.8, 60 sec: 6656.4, 300 sec: 6671.6). Total num frames: 121872384. Throughput: 0: 1620.1. Samples: 25463096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:25,087][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:24:25,833][42004] Updated weights for policy 0, policy_version 29756 (0.0026) +[2024-11-08 03:24:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 121892864. Throughput: 0: 1627.3. Samples: 25469970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:27,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:24:31,237][42004] Updated weights for policy 0, policy_version 29766 (0.0023) +[2024-11-08 03:24:32,931][41694] Fps is (10 sec: 7827.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 121933824. Throughput: 0: 1637.1. Samples: 25475524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:32,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:24:36,482][42004] Updated weights for policy 0, policy_version 29776 (0.0032) +[2024-11-08 03:24:37,933][41694] Fps is (10 sec: 7781.0, 60 sec: 6689.9, 300 sec: 6678.5). Total num frames: 121970688. Throughput: 0: 1749.0. Samples: 25487184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:37,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:24:41,906][42004] Updated weights for policy 0, policy_version 29786 (0.0028) +[2024-11-08 03:24:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6743.8). Total num frames: 122007552. Throughput: 0: 1758.8. Samples: 25498562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:42,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 03:24:47,461][42004] Updated weights for policy 0, policy_version 29796 (0.0028) +[2024-11-08 03:24:47,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 122044416. Throughput: 0: 1752.7. Samples: 25504308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:47,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:24:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 122081280. Throughput: 0: 1725.5. Samples: 25514240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:52,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 03:24:53,473][42004] Updated weights for policy 0, policy_version 29806 (0.0035) +[2024-11-08 03:24:59,585][41694] Fps is (10 sec: 5623.9, 60 sec: 6776.5, 300 sec: 6682.8). Total num frames: 122109952. Throughput: 0: 1651.9. Samples: 25525208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:59,586][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:25:01,609][42004] Updated weights for policy 0, policy_version 29816 (0.0028) +[2024-11-08 03:25:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 122134528. Throughput: 0: 1643.7. Samples: 25526546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:02,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:25:07,204][42004] Updated weights for policy 0, policy_version 29826 (0.0042) +[2024-11-08 03:25:07,931][41694] Fps is (10 sec: 7361.1, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 122171392. Throughput: 0: 1724.2. Samples: 25536976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:07,934][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 03:25:12,374][42004] Updated weights for policy 0, policy_version 29836 (0.0026) +[2024-11-08 03:25:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6720.3). Total num frames: 122212352. Throughput: 0: 1753.0. Samples: 25548856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:12,937][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 03:25:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6776.5). Total num frames: 122245120. Throughput: 0: 1740.0. Samples: 25553824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:17,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:25:18,402][42004] Updated weights for policy 0, policy_version 29846 (0.0035) +[2024-11-08 03:25:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7009.6, 300 sec: 6761.9). Total num frames: 122277888. Throughput: 0: 1714.3. Samples: 25564326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:22,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:25:24,533][42004] Updated weights for policy 0, policy_version 29856 (0.0026) +[2024-11-08 03:25:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 122314752. Throughput: 0: 1690.1. Samples: 25574616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:27,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 03:25:30,258][42004] Updated weights for policy 0, policy_version 29866 (0.0026) +[2024-11-08 03:25:34,083][41694] Fps is (10 sec: 5510.0, 60 sec: 6631.2, 300 sec: 6680.3). Total num frames: 122339328. Throughput: 0: 1639.7. Samples: 25579982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:34,096][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 03:25:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6622.0, 300 sec: 6678.6). Total num frames: 122368000. Throughput: 0: 1591.0. Samples: 25585834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:37,938][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 03:25:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029875_122368000.pth... +[2024-11-08 03:25:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029480_120750080.pth +[2024-11-08 03:25:38,474][42004] Updated weights for policy 0, policy_version 29876 (0.0027) +[2024-11-08 03:25:42,931][41694] Fps is (10 sec: 7405.6, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 122404864. Throughput: 0: 1663.1. Samples: 25597296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:42,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 03:25:43,774][42004] Updated weights for policy 0, policy_version 29886 (0.0024) +[2024-11-08 03:25:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 122441728. Throughput: 0: 1697.5. Samples: 25602932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:47,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:25:49,167][42004] Updated weights for policy 0, policy_version 29896 (0.0025) +[2024-11-08 03:25:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 122482688. Throughput: 0: 1718.3. Samples: 25614302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:52,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 03:25:54,884][42004] Updated weights for policy 0, policy_version 29906 (0.0041) +[2024-11-08 03:25:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6879.7, 300 sec: 6748.0). Total num frames: 122511360. Throughput: 0: 1683.6. Samples: 25624616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:57,935][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 03:26:01,146][42004] Updated weights for policy 0, policy_version 29916 (0.0039) +[2024-11-08 03:26:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 122548224. Throughput: 0: 1685.1. Samples: 25629652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:02,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:26:08,583][41694] Fps is (10 sec: 5383.9, 60 sec: 6550.8, 300 sec: 6691.6). Total num frames: 122568704. Throughput: 0: 1652.7. Samples: 25639772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:08,586][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 03:26:09,380][42004] Updated weights for policy 0, policy_version 29926 (0.0028) +[2024-11-08 03:26:12,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 122597376. Throughput: 0: 1577.6. Samples: 25645606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:12,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:26:15,686][42004] Updated weights for policy 0, policy_version 29936 (0.0042) +[2024-11-08 03:26:17,932][41694] Fps is (10 sec: 7010.0, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 122634240. Throughput: 0: 1612.1. Samples: 25650674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:17,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:26:20,870][42004] Updated weights for policy 0, policy_version 29946 (0.0022) +[2024-11-08 03:26:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 122671104. Throughput: 0: 1704.3. Samples: 25662526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:22,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 03:26:26,207][42004] Updated weights for policy 0, policy_version 29956 (0.0033) +[2024-11-08 03:26:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 122707968. Throughput: 0: 1702.4. Samples: 25673904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:27,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:26:32,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6751.3, 300 sec: 6734.1). Total num frames: 122736640. Throughput: 0: 1669.3. Samples: 25678052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:32,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 03:26:33,015][42004] Updated weights for policy 0, policy_version 29966 (0.0043) +[2024-11-08 03:26:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 122773504. Throughput: 0: 1646.7. Samples: 25688402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:37,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 03:26:38,537][42004] Updated weights for policy 0, policy_version 29976 (0.0028) +[2024-11-08 03:26:43,005][41694] Fps is (10 sec: 6099.6, 60 sec: 6545.6, 300 sec: 6690.8). Total num frames: 122798080. Throughput: 0: 1534.8. Samples: 25693796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:43,008][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 03:26:46,584][42004] Updated weights for policy 0, policy_version 29986 (0.0030) +[2024-11-08 03:26:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 122830848. Throughput: 0: 1569.4. Samples: 25700274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:47,939][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 03:26:51,867][42004] Updated weights for policy 0, policy_version 29996 (0.0026) +[2024-11-08 03:26:52,931][41694] Fps is (10 sec: 7427.3, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 122871808. Throughput: 0: 1622.0. Samples: 25711704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:52,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 03:26:56,954][42004] Updated weights for policy 0, policy_version 30006 (0.0026) +[2024-11-08 03:26:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 122908672. Throughput: 0: 1737.1. Samples: 25723778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:57,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:27:02,529][42004] Updated weights for policy 0, policy_version 30016 (0.0028) +[2024-11-08 03:27:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 122945536. Throughput: 0: 1752.5. Samples: 25729538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:02,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:27:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6901.6, 300 sec: 6761.9). Total num frames: 122978304. Throughput: 0: 1713.4. Samples: 25739628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:07,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:27:08,483][42004] Updated weights for policy 0, policy_version 30026 (0.0041) +[2024-11-08 03:27:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 123015168. Throughput: 0: 1699.5. Samples: 25750380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:12,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:27:14,283][42004] Updated weights for policy 0, policy_version 30036 (0.0032) +[2024-11-08 03:27:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 123035648. Throughput: 0: 1721.9. Samples: 25755538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:27:22,573][42004] Updated weights for policy 0, policy_version 30046 (0.0036) +[2024-11-08 03:27:22,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123068416. Throughput: 0: 1630.0. Samples: 25761752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:22,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 03:27:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123105280. Throughput: 0: 1753.1. Samples: 25772556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:27,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:27:28,264][42004] Updated weights for policy 0, policy_version 30056 (0.0028) +[2024-11-08 03:27:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 123142144. Throughput: 0: 1723.2. Samples: 25777818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:32,935][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:27:34,010][42004] Updated weights for policy 0, policy_version 30066 (0.0023) +[2024-11-08 03:27:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 123170816. Throughput: 0: 1697.9. Samples: 25788110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:37,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 03:27:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030071_123170816.pth... +[2024-11-08 03:27:38,117][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029680_121569280.pth +[2024-11-08 03:27:40,722][42004] Updated weights for policy 0, policy_version 30076 (0.0035) +[2024-11-08 03:27:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6766.7, 300 sec: 6720.2). Total num frames: 123203584. Throughput: 0: 1642.9. Samples: 25797706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:42,933][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:27:46,315][42004] Updated weights for policy 0, policy_version 30086 (0.0021) +[2024-11-08 03:27:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123240448. Throughput: 0: 1635.3. Samples: 25803126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:47,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 03:27:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 123265024. Throughput: 0: 1599.2. Samples: 25811590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:52,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 03:27:53,486][42004] Updated weights for policy 0, policy_version 30096 (0.0032) +[2024-11-08 03:27:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123305984. Throughput: 0: 1606.2. Samples: 25822658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:57,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 03:27:58,806][42004] Updated weights for policy 0, policy_version 30106 (0.0020) +[2024-11-08 03:28:02,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 123342848. Throughput: 0: 1620.9. Samples: 25828478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:02,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:28:04,570][42004] Updated weights for policy 0, policy_version 30116 (0.0030) +[2024-11-08 03:28:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 123375616. Throughput: 0: 1715.5. Samples: 25838952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:07,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:28:10,513][42004] Updated weights for policy 0, policy_version 30126 (0.0050) +[2024-11-08 03:28:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 123408384. Throughput: 0: 1695.6. Samples: 25848860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:12,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:28:16,417][42004] Updated weights for policy 0, policy_version 30136 (0.0039) +[2024-11-08 03:28:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 123445248. Throughput: 0: 1691.2. Samples: 25853920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:17,933][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 03:28:22,230][42004] Updated weights for policy 0, policy_version 30146 (0.0032) +[2024-11-08 03:28:22,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 123482112. Throughput: 0: 1706.2. Samples: 25864890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:22,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 03:28:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.8, 300 sec: 6678.5). Total num frames: 123502592. Throughput: 0: 1640.5. Samples: 25871528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:27,935][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 03:28:30,053][42004] Updated weights for policy 0, policy_version 30156 (0.0036) +[2024-11-08 03:28:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123539456. Throughput: 0: 1646.9. Samples: 25877236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:32,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 03:28:35,445][42004] Updated weights for policy 0, policy_version 30166 (0.0027) +[2024-11-08 03:28:37,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 123576320. Throughput: 0: 1708.7. Samples: 25888480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:37,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:28:40,879][42004] Updated weights for policy 0, policy_version 30176 (0.0029) +[2024-11-08 03:28:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123613184. Throughput: 0: 1718.3. Samples: 25899980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 03:28:46,664][42004] Updated weights for policy 0, policy_version 30186 (0.0030) +[2024-11-08 03:28:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 123650048. Throughput: 0: 1698.8. Samples: 25904924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:47,933][41694] Avg episode reward: [(0, '4.833')] +[2024-11-08 03:28:52,268][42004] Updated weights for policy 0, policy_version 30196 (0.0025) +[2024-11-08 03:28:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 123686912. Throughput: 0: 1705.2. Samples: 25915684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:52,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 03:28:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 123719680. Throughput: 0: 1729.2. Samples: 25926674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:28:57,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:28:57,992][42004] Updated weights for policy 0, policy_version 30206 (0.0027) +[2024-11-08 03:29:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123740160. Throughput: 0: 1657.1. Samples: 25928490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:02,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:29:05,953][42004] Updated weights for policy 0, policy_version 30216 (0.0023) +[2024-11-08 03:29:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6690.1, 300 sec: 6678.5). Total num frames: 123777024. Throughput: 0: 1645.0. Samples: 25938918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:07,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 03:29:11,277][42004] Updated weights for policy 0, policy_version 30226 (0.0030) +[2024-11-08 03:29:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123817984. Throughput: 0: 1755.0. Samples: 25950500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:12,934][41694] Avg episode reward: [(0, '4.730')] +[2024-11-08 03:29:16,801][42004] Updated weights for policy 0, policy_version 30236 (0.0029) +[2024-11-08 03:29:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6755.6). Total num frames: 123850752. Throughput: 0: 1747.2. Samples: 25955862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:17,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 03:29:22,599][42004] Updated weights for policy 0, policy_version 30246 (0.0026) +[2024-11-08 03:29:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 123887616. Throughput: 0: 1733.0. Samples: 25966466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:22,934][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 03:29:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 123924480. Throughput: 0: 1731.3. Samples: 25977888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:27,933][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 03:29:27,949][42004] Updated weights for policy 0, policy_version 30256 (0.0033) +[2024-11-08 03:29:34,805][41694] Fps is (10 sec: 6209.6, 60 sec: 6818.6, 300 sec: 6705.4). Total num frames: 123961344. Throughput: 0: 1670.4. Samples: 25983220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:34,808][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:29:35,875][42004] Updated weights for policy 0, policy_version 30266 (0.0024) +[2024-11-08 03:29:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 123981824. Throughput: 0: 1655.7. Samples: 25990192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:37,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:29:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030269_123981824.pth... +[2024-11-08 03:29:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029875_122368000.pth +[2024-11-08 03:29:41,416][42004] Updated weights for policy 0, policy_version 30276 (0.0022) +[2024-11-08 03:29:42,932][41694] Fps is (10 sec: 7056.1, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 124018688. Throughput: 0: 1658.0. Samples: 26001286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:42,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:29:46,879][42004] Updated weights for policy 0, policy_version 30286 (0.0029) +[2024-11-08 03:29:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 124059648. Throughput: 0: 1737.2. Samples: 26006664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:29:52,581][42004] Updated weights for policy 0, policy_version 30296 (0.0026) +[2024-11-08 03:29:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6758.1). Total num frames: 124092416. Throughput: 0: 1765.4. Samples: 26018360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:52,934][41694] Avg episode reward: [(0, '4.740')] +[2024-11-08 03:29:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 124129280. Throughput: 0: 1729.0. Samples: 26028304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:57,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:29:58,415][42004] Updated weights for policy 0, policy_version 30306 (0.0029) +[2024-11-08 03:30:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 124162048. Throughput: 0: 1729.7. Samples: 26033696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:02,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:30:04,126][42004] Updated weights for policy 0, policy_version 30316 (0.0022) +[2024-11-08 03:30:09,542][41694] Fps is (10 sec: 5644.4, 60 sec: 6781.3, 300 sec: 6683.7). Total num frames: 124194816. Throughput: 0: 1668.3. Samples: 26044228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:09,544][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:30:12,206][42004] Updated weights for policy 0, policy_version 30326 (0.0031) +[2024-11-08 03:30:12,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 124219392. Throughput: 0: 1628.1. Samples: 26051152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:12,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:30:17,931][41694] Fps is (10 sec: 6835.4, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 124252160. Throughput: 0: 1692.1. Samples: 26056194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:17,934][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 03:30:18,092][42004] Updated weights for policy 0, policy_version 30336 (0.0026) +[2024-11-08 03:30:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 124293120. Throughput: 0: 1707.6. Samples: 26067032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:22,932][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 03:30:23,364][42004] Updated weights for policy 0, policy_version 30346 (0.0027) +[2024-11-08 03:30:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6774.4). Total num frames: 124329984. Throughput: 0: 1712.6. Samples: 26078354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:27,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 03:30:28,950][42004] Updated weights for policy 0, policy_version 30356 (0.0030) +[2024-11-08 03:30:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6976.2, 300 sec: 6775.8). Total num frames: 124366848. Throughput: 0: 1715.0. Samples: 26083838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:32,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:30:34,492][42004] Updated weights for policy 0, policy_version 30366 (0.0026) +[2024-11-08 03:30:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 124403712. Throughput: 0: 1705.7. Samples: 26095116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:37,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 03:30:40,136][42004] Updated weights for policy 0, policy_version 30376 (0.0031) +[2024-11-08 03:30:44,344][41694] Fps is (10 sec: 5383.3, 60 sec: 6669.6, 300 sec: 6702.0). Total num frames: 124428288. Throughput: 0: 1558.6. Samples: 26100642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:44,346][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:30:47,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 124456960. Throughput: 0: 1625.1. Samples: 26106824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:47,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 03:30:48,089][42004] Updated weights for policy 0, policy_version 30386 (0.0026) +[2024-11-08 03:30:52,932][41694] Fps is (10 sec: 8108.4, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 124497920. Throughput: 0: 1710.9. Samples: 26118462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:52,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:30:53,309][42004] Updated weights for policy 0, policy_version 30396 (0.0026) +[2024-11-08 03:30:57,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 124534784. Throughput: 0: 1760.7. Samples: 26130386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:57,935][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 03:30:58,562][42004] Updated weights for policy 0, policy_version 30406 (0.0036) +[2024-11-08 03:31:02,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6758.2, 300 sec: 6790.7). Total num frames: 124567552. Throughput: 0: 1760.8. Samples: 26135432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:02,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:31:04,830][42004] Updated weights for policy 0, policy_version 30416 (0.0031) +[2024-11-08 03:31:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 7014.9, 300 sec: 6803.5). Total num frames: 124604416. Throughput: 0: 1750.7. Samples: 26145814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:07,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 03:31:11,088][42004] Updated weights for policy 0, policy_version 30426 (0.0028) +[2024-11-08 03:31:12,933][41694] Fps is (10 sec: 6553.4, 60 sec: 6894.8, 300 sec: 6775.7). Total num frames: 124633088. Throughput: 0: 1716.0. Samples: 26155576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:12,937][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 03:31:18,926][41694] Fps is (10 sec: 5215.6, 60 sec: 6715.3, 300 sec: 6725.3). Total num frames: 124661760. Throughput: 0: 1663.7. Samples: 26160360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:18,929][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 03:31:19,339][42004] Updated weights for policy 0, policy_version 30436 (0.0031) +[2024-11-08 03:31:22,931][41694] Fps is (10 sec: 5325.8, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 124686336. Throughput: 0: 1594.0. Samples: 26166844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:22,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:31:25,311][42004] Updated weights for policy 0, policy_version 30446 (0.0032) +[2024-11-08 03:31:27,931][41694] Fps is (10 sec: 6822.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 124723200. Throughput: 0: 1759.8. Samples: 26177348. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:27,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 03:31:30,750][42004] Updated weights for policy 0, policy_version 30456 (0.0025) +[2024-11-08 03:31:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 124760064. Throughput: 0: 1694.5. Samples: 26183078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:32,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 03:31:36,702][42004] Updated weights for policy 0, policy_version 30466 (0.0046) +[2024-11-08 03:31:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6777.4). Total num frames: 124796928. Throughput: 0: 1669.3. Samples: 26193578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:37,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:31:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030468_124796928.pth... +[2024-11-08 03:31:38,275][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030071_123170816.pth +[2024-11-08 03:31:42,204][42004] Updated weights for policy 0, policy_version 30476 (0.0039) +[2024-11-08 03:31:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6921.4, 300 sec: 6789.6). Total num frames: 124833792. Throughput: 0: 1648.6. Samples: 26204572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:42,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 03:31:47,677][42004] Updated weights for policy 0, policy_version 30486 (0.0030) +[2024-11-08 03:31:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6775.7). Total num frames: 124870656. Throughput: 0: 1662.4. Samples: 26210238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:47,935][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:31:53,548][41694] Fps is (10 sec: 5787.3, 60 sec: 6554.6, 300 sec: 6720.1). Total num frames: 124895232. Throughput: 0: 1656.0. Samples: 26221356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:53,550][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 03:31:55,570][42004] Updated weights for policy 0, policy_version 30496 (0.0025) +[2024-11-08 03:31:57,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.7, 300 sec: 6720.2). Total num frames: 124928000. Throughput: 0: 1612.2. Samples: 26228122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:57,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:32:00,923][42004] Updated weights for policy 0, policy_version 30506 (0.0039) +[2024-11-08 03:32:02,932][41694] Fps is (10 sec: 7420.6, 60 sec: 6622.0, 300 sec: 6734.1). Total num frames: 124964864. Throughput: 0: 1669.3. Samples: 26233820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:02,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 03:32:07,127][42004] Updated weights for policy 0, policy_version 30516 (0.0036) +[2024-11-08 03:32:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 124997632. Throughput: 0: 1718.2. Samples: 26244164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:07,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:32:12,627][42004] Updated weights for policy 0, policy_version 30526 (0.0022) +[2024-11-08 03:32:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 125034496. Throughput: 0: 1724.3. Samples: 26254942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:12,932][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:32:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6941.7, 300 sec: 6789.6). Total num frames: 125071360. Throughput: 0: 1719.8. Samples: 26260470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:17,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 03:32:18,055][42004] Updated weights for policy 0, policy_version 30536 (0.0024) +[2024-11-08 03:32:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7031.4, 300 sec: 6789.6). Total num frames: 125108224. Throughput: 0: 1735.5. Samples: 26271676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:22,934][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:32:23,673][42004] Updated weights for policy 0, policy_version 30546 (0.0023) +[2024-11-08 03:32:28,185][41694] Fps is (10 sec: 5592.8, 60 sec: 6730.0, 300 sec: 6728.3). Total num frames: 125128704. Throughput: 0: 1605.4. Samples: 26277220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:28,190][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 03:32:31,807][42004] Updated weights for policy 0, policy_version 30556 (0.0027) +[2024-11-08 03:32:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 125161472. Throughput: 0: 1631.1. Samples: 26283636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:32,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 03:32:37,169][42004] Updated weights for policy 0, policy_version 30566 (0.0020) +[2024-11-08 03:32:37,931][41694] Fps is (10 sec: 7564.5, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 125202432. Throughput: 0: 1653.9. Samples: 26294764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:37,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:32:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 125235200. Throughput: 0: 1720.1. Samples: 26305524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:42,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 03:32:42,980][42004] Updated weights for policy 0, policy_version 30576 (0.0031) +[2024-11-08 03:32:47,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 125276160. Throughput: 0: 1715.9. Samples: 26311036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:47,935][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 03:32:48,347][42004] Updated weights for policy 0, policy_version 30586 (0.0023) +[2024-11-08 03:32:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7035.5, 300 sec: 6803.5). Total num frames: 125313024. Throughput: 0: 1745.2. Samples: 26322698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:52,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 03:32:53,587][42004] Updated weights for policy 0, policy_version 30596 (0.0034) +[2024-11-08 03:32:57,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 125349888. Throughput: 0: 1762.3. Samples: 26334246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:57,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 03:32:59,072][42004] Updated weights for policy 0, policy_version 30606 (0.0025) +[2024-11-08 03:33:02,943][41694] Fps is (10 sec: 5727.8, 60 sec: 6757.1, 300 sec: 6761.6). Total num frames: 125370368. Throughput: 0: 1755.0. Samples: 26339466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:02,945][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:33:07,711][42004] Updated weights for policy 0, policy_version 30616 (0.0039) +[2024-11-08 03:33:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 125403136. Throughput: 0: 1639.2. Samples: 26345442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:07,935][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 03:33:12,933][41694] Fps is (10 sec: 6970.0, 60 sec: 6758.2, 300 sec: 6761.8). Total num frames: 125440000. Throughput: 0: 1762.0. Samples: 26356068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:12,938][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:33:13,352][42004] Updated weights for policy 0, policy_version 30626 (0.0035) +[2024-11-08 03:33:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 125472768. Throughput: 0: 1727.7. Samples: 26361382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:17,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 03:33:19,067][42004] Updated weights for policy 0, policy_version 30636 (0.0042) +[2024-11-08 03:33:22,931][41694] Fps is (10 sec: 7374.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 125513728. Throughput: 0: 1729.2. Samples: 26372578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:22,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:33:24,259][42004] Updated weights for policy 0, policy_version 30646 (0.0019) +[2024-11-08 03:33:27,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7129.8, 300 sec: 6831.3). Total num frames: 125554688. Throughput: 0: 1748.6. Samples: 26384212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:27,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:33:29,497][42004] Updated weights for policy 0, policy_version 30656 (0.0037) +[2024-11-08 03:33:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 125591552. Throughput: 0: 1755.3. Samples: 26390022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:32,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:33:34,961][42004] Updated weights for policy 0, policy_version 30666 (0.0025) +[2024-11-08 03:33:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 6775.8). Total num frames: 125612032. Throughput: 0: 1713.1. Samples: 26399788. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:37,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:33:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030667_125612032.pth... +[2024-11-08 03:33:38,081][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030269_123981824.pth +[2024-11-08 03:33:42,589][42004] Updated weights for policy 0, policy_version 30676 (0.0030) +[2024-11-08 03:33:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 125648896. Throughput: 0: 1654.5. Samples: 26408698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:42,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:33:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 125685760. Throughput: 0: 1657.2. Samples: 26414022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:47,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 03:33:48,353][42004] Updated weights for policy 0, policy_version 30686 (0.0035) +[2024-11-08 03:33:52,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6758.2, 300 sec: 6775.7). Total num frames: 125718528. Throughput: 0: 1751.3. Samples: 26424254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:52,935][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:33:54,084][42004] Updated weights for policy 0, policy_version 30696 (0.0032) +[2024-11-08 03:33:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 125759488. Throughput: 0: 1778.9. Samples: 26436116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:33:57,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:33:59,178][42004] Updated weights for policy 0, policy_version 30706 (0.0026) +[2024-11-08 03:34:02,931][41694] Fps is (10 sec: 7783.7, 60 sec: 7101.1, 300 sec: 6845.2). Total num frames: 125796352. Throughput: 0: 1792.4. Samples: 26442038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:02,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 03:34:04,852][42004] Updated weights for policy 0, policy_version 30716 (0.0026) +[2024-11-08 03:34:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 125833216. Throughput: 0: 1781.7. Samples: 26452754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:07,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 03:34:12,914][42004] Updated weights for policy 0, policy_version 30726 (0.0024) +[2024-11-08 03:34:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.1, 300 sec: 6789.6). Total num frames: 125853696. Throughput: 0: 1669.5. Samples: 26459338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:12,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:34:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6894.9, 300 sec: 6775.7). Total num frames: 125886464. Throughput: 0: 1664.1. Samples: 26464906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:17,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:34:18,478][42004] Updated weights for policy 0, policy_version 30736 (0.0023) +[2024-11-08 03:34:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 125923328. Throughput: 0: 1684.3. Samples: 26475582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:22,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:34:24,481][42004] Updated weights for policy 0, policy_version 30746 (0.0032) +[2024-11-08 03:34:27,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6819.1). Total num frames: 125960192. Throughput: 0: 1720.6. Samples: 26486126. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:27,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:34:29,999][42004] Updated weights for policy 0, policy_version 30756 (0.0026) +[2024-11-08 03:34:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 125997056. Throughput: 0: 1727.4. Samples: 26491754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:32,933][41694] Avg episode reward: [(0, '4.121')] +[2024-11-08 03:34:35,263][42004] Updated weights for policy 0, policy_version 30766 (0.0024) +[2024-11-08 03:34:37,941][41694] Fps is (10 sec: 7775.4, 60 sec: 7098.7, 300 sec: 6845.0). Total num frames: 126038016. Throughput: 0: 1761.0. Samples: 26503512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:37,944][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:34:40,857][42004] Updated weights for policy 0, policy_version 30776 (0.0045) +[2024-11-08 03:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 126070784. Throughput: 0: 1735.9. Samples: 26514232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 03:34:47,931][41694] Fps is (10 sec: 5329.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 126091264. Throughput: 0: 1688.3. Samples: 26518010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:47,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 03:34:49,039][42004] Updated weights for policy 0, policy_version 30786 (0.0033) +[2024-11-08 03:34:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.6, 300 sec: 6761.9). Total num frames: 126124032. Throughput: 0: 1618.6. Samples: 26525592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:52,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:34:55,202][42004] Updated weights for policy 0, policy_version 30796 (0.0028) +[2024-11-08 03:34:57,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 126156800. Throughput: 0: 1694.0. Samples: 26535568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:57,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:35:01,011][42004] Updated weights for policy 0, policy_version 30806 (0.0033) +[2024-11-08 03:35:02,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6813.0). Total num frames: 126193664. Throughput: 0: 1691.1. Samples: 26541006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:02,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 03:35:06,467][42004] Updated weights for policy 0, policy_version 30816 (0.0027) +[2024-11-08 03:35:07,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 126230528. Throughput: 0: 1705.3. Samples: 26552320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:07,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:35:11,920][42004] Updated weights for policy 0, policy_version 30826 (0.0026) +[2024-11-08 03:35:12,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 126267392. Throughput: 0: 1720.6. Samples: 26563552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:12,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:35:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 126300160. Throughput: 0: 1697.8. Samples: 26568154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:17,933][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 03:35:18,166][42004] Updated weights for policy 0, policy_version 30836 (0.0034) +[2024-11-08 03:35:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 126320640. Throughput: 0: 1581.7. Samples: 26574674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:22,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:35:26,391][42004] Updated weights for policy 0, policy_version 30846 (0.0026) +[2024-11-08 03:35:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 126353408. Throughput: 0: 1567.6. Samples: 26584774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:27,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 03:35:32,695][42004] Updated weights for policy 0, policy_version 30856 (0.0039) +[2024-11-08 03:35:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 126386176. Throughput: 0: 1582.2. Samples: 26589208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:32,937][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:35:37,780][42004] Updated weights for policy 0, policy_version 30866 (0.0027) +[2024-11-08 03:35:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6486.3, 300 sec: 6808.4). Total num frames: 126427136. Throughput: 0: 1669.7. Samples: 26600728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:37,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 03:35:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030866_126427136.pth... +[2024-11-08 03:35:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030468_124796928.pth +[2024-11-08 03:35:42,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 126464000. Throughput: 0: 1708.3. Samples: 26612442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:42,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:35:42,996][42004] Updated weights for policy 0, policy_version 30876 (0.0025) +[2024-11-08 03:35:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 126500864. Throughput: 0: 1716.7. Samples: 26618256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:47,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:35:48,729][42004] Updated weights for policy 0, policy_version 30886 (0.0035) +[2024-11-08 03:35:52,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 126537728. Throughput: 0: 1695.3. Samples: 26628610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:52,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:35:56,712][42004] Updated weights for policy 0, policy_version 30896 (0.0026) +[2024-11-08 03:35:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 126558208. Throughput: 0: 1598.6. Samples: 26635488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:57,933][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 03:36:02,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 126586880. Throughput: 0: 1609.7. Samples: 26640592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:02,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:36:03,199][42004] Updated weights for policy 0, policy_version 30906 (0.0027) +[2024-11-08 03:36:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 126619648. Throughput: 0: 1666.0. Samples: 26649646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:07,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 03:36:09,358][42004] Updated weights for policy 0, policy_version 30916 (0.0034) +[2024-11-08 03:36:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6770.8). Total num frames: 126652416. Throughput: 0: 1668.0. Samples: 26659832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:12,934][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 03:36:15,287][42004] Updated weights for policy 0, policy_version 30926 (0.0039) +[2024-11-08 03:36:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 126693376. Throughput: 0: 1689.8. Samples: 26665250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:17,936][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:36:20,564][42004] Updated weights for policy 0, policy_version 30936 (0.0024) +[2024-11-08 03:36:22,933][41694] Fps is (10 sec: 7781.6, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 126730240. Throughput: 0: 1691.4. Samples: 26676840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:22,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:36:26,130][42004] Updated weights for policy 0, policy_version 30946 (0.0036) +[2024-11-08 03:36:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 126767104. Throughput: 0: 1679.0. Samples: 26687998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:27,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 03:36:32,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 126783488. Throughput: 0: 1587.2. Samples: 26689682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:32,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:36:34,523][42004] Updated weights for policy 0, policy_version 30956 (0.0034) +[2024-11-08 03:36:37,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 126812160. Throughput: 0: 1558.2. Samples: 26698728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:37,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:36:40,990][42004] Updated weights for policy 0, policy_version 30966 (0.0021) +[2024-11-08 03:36:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 126849024. Throughput: 0: 1632.1. Samples: 26708934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:36:46,205][42004] Updated weights for policy 0, policy_version 30976 (0.0032) +[2024-11-08 03:36:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6485.3, 300 sec: 6776.0). Total num frames: 126889984. Throughput: 0: 1644.6. Samples: 26714598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:47,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:36:51,398][42004] Updated weights for policy 0, policy_version 30986 (0.0028) +[2024-11-08 03:36:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6485.3, 300 sec: 6775.8). Total num frames: 126926848. Throughput: 0: 1706.8. Samples: 26726454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:52,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:36:56,787][42004] Updated weights for policy 0, policy_version 30996 (0.0031) +[2024-11-08 03:36:57,936][41694] Fps is (10 sec: 7779.2, 60 sec: 6826.2, 300 sec: 6789.5). Total num frames: 126967808. Throughput: 0: 1736.2. Samples: 26737966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:57,939][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:37:02,509][42004] Updated weights for policy 0, policy_version 31006 (0.0029) +[2024-11-08 03:37:05,084][41694] Fps is (10 sec: 6066.7, 60 sec: 6656.1, 300 sec: 6740.5). Total num frames: 127000576. Throughput: 0: 1650.2. Samples: 26743060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:05,086][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 03:37:07,932][41694] Fps is (10 sec: 5327.0, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 127021056. Throughput: 0: 1618.1. Samples: 26749652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:07,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:37:11,206][42004] Updated weights for policy 0, policy_version 31016 (0.0038) +[2024-11-08 03:37:12,936][41694] Fps is (10 sec: 6259.9, 60 sec: 6621.3, 300 sec: 6706.2). Total num frames: 127049728. Throughput: 0: 1580.4. Samples: 26759122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:12,938][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 03:37:17,059][42004] Updated weights for policy 0, policy_version 31026 (0.0034) +[2024-11-08 03:37:17,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 127086592. Throughput: 0: 1654.3. Samples: 26764126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:17,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:37:22,235][42004] Updated weights for policy 0, policy_version 31036 (0.0054) +[2024-11-08 03:37:22,932][41694] Fps is (10 sec: 7786.1, 60 sec: 6622.0, 300 sec: 6781.6). Total num frames: 127127552. Throughput: 0: 1714.8. Samples: 26775894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:22,933][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 03:37:27,434][42004] Updated weights for policy 0, policy_version 31046 (0.0026) +[2024-11-08 03:37:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 127164416. Throughput: 0: 1752.5. Samples: 26787796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:37:32,841][42004] Updated weights for policy 0, policy_version 31056 (0.0026) +[2024-11-08 03:37:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 127205376. Throughput: 0: 1755.1. Samples: 26793578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:32,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:37:39,693][41694] Fps is (10 sec: 5920.3, 60 sec: 6830.9, 300 sec: 6735.5). Total num frames: 127234048. Throughput: 0: 1672.4. Samples: 26804658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:39,695][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 03:37:39,726][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031064_127238144.pth... +[2024-11-08 03:37:39,832][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030667_125612032.pth +[2024-11-08 03:37:40,964][42004] Updated weights for policy 0, policy_version 31066 (0.0063) +[2024-11-08 03:37:42,933][41694] Fps is (10 sec: 4914.5, 60 sec: 6758.2, 300 sec: 6706.3). Total num frames: 127254528. Throughput: 0: 1622.4. Samples: 26810970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:42,936][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 03:37:47,543][42004] Updated weights for policy 0, policy_version 31076 (0.0040) +[2024-11-08 03:37:47,932][41694] Fps is (10 sec: 6463.4, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 127287296. Throughput: 0: 1681.3. Samples: 26815098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 03:37:52,732][42004] Updated weights for policy 0, policy_version 31086 (0.0034) +[2024-11-08 03:37:52,932][41694] Fps is (10 sec: 7373.9, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 127328256. Throughput: 0: 1699.0. Samples: 26826106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:52,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 03:37:57,863][42004] Updated weights for policy 0, policy_version 31096 (0.0023) +[2024-11-08 03:37:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6690.6, 300 sec: 6776.0). Total num frames: 127369216. Throughput: 0: 1755.8. Samples: 26838124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:57,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:38:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7009.9, 300 sec: 6789.7). Total num frames: 127406080. Throughput: 0: 1772.7. Samples: 26843896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:02,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:38:03,309][42004] Updated weights for policy 0, policy_version 31106 (0.0028) +[2024-11-08 03:38:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 127442944. Throughput: 0: 1766.7. Samples: 26855394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:07,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:38:08,509][42004] Updated weights for policy 0, policy_version 31116 (0.0025) +[2024-11-08 03:38:14,222][41694] Fps is (10 sec: 6167.0, 60 sec: 6950.7, 300 sec: 6760.1). Total num frames: 127475712. Throughput: 0: 1587.0. Samples: 26861262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:14,224][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:38:16,556][42004] Updated weights for policy 0, policy_version 31126 (0.0035) +[2024-11-08 03:38:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 127500288. Throughput: 0: 1658.4. Samples: 26868206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:17,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:38:22,893][42004] Updated weights for policy 0, policy_version 31136 (0.0036) +[2024-11-08 03:38:22,932][41694] Fps is (10 sec: 6584.4, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127533056. Throughput: 0: 1674.3. Samples: 26877052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:22,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:38:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127569920. Throughput: 0: 1726.1. Samples: 26888642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:27,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 03:38:28,076][42004] Updated weights for policy 0, policy_version 31146 (0.0031) +[2024-11-08 03:38:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 127610880. Throughput: 0: 1771.6. Samples: 26894818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:32,932][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:38:33,221][42004] Updated weights for policy 0, policy_version 31156 (0.0030) +[2024-11-08 03:38:37,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7173.8, 300 sec: 6789.6). Total num frames: 127651840. Throughput: 0: 1792.7. Samples: 26906776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:37,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:38:38,352][42004] Updated weights for policy 0, policy_version 31166 (0.0028) +[2024-11-08 03:38:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.4, 300 sec: 6789.6). Total num frames: 127688704. Throughput: 0: 1784.2. Samples: 26918412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:42,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:38:43,696][42004] Updated weights for policy 0, policy_version 31176 (0.0031) +[2024-11-08 03:38:48,731][41694] Fps is (10 sec: 6068.1, 60 sec: 7073.7, 300 sec: 6757.5). Total num frames: 127717376. Throughput: 0: 1749.5. Samples: 26924022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:48,733][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:38:52,000][42004] Updated weights for policy 0, policy_version 31186 (0.0025) +[2024-11-08 03:38:52,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6720.2). Total num frames: 127741952. Throughput: 0: 1666.5. Samples: 26930390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:52,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 03:38:57,860][42004] Updated weights for policy 0, policy_version 31196 (0.0040) +[2024-11-08 03:38:57,932][41694] Fps is (10 sec: 6678.2, 60 sec: 6826.6, 300 sec: 6720.2). Total num frames: 127778816. Throughput: 0: 1812.1. Samples: 26940468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:57,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 03:39:02,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127811584. Throughput: 0: 1728.1. Samples: 26945970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:02,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:39:03,648][42004] Updated weights for policy 0, policy_version 31206 (0.0027) +[2024-11-08 03:39:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 127852544. Throughput: 0: 1773.3. Samples: 26956848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:07,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 03:39:08,932][42004] Updated weights for policy 0, policy_version 31216 (0.0029) +[2024-11-08 03:39:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7046.5, 300 sec: 6789.7). Total num frames: 127889408. Throughput: 0: 1768.1. Samples: 26968208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:12,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:39:14,549][42004] Updated weights for policy 0, policy_version 31226 (0.0030) +[2024-11-08 03:39:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6789.6). Total num frames: 127926272. Throughput: 0: 1756.2. Samples: 26973848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:17,933][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 03:39:19,973][42004] Updated weights for policy 0, policy_version 31236 (0.0028) +[2024-11-08 03:39:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 127950848. Throughput: 0: 1742.4. Samples: 26985184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:22,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 03:39:27,570][42004] Updated weights for policy 0, policy_version 31246 (0.0043) +[2024-11-08 03:39:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 127983616. Throughput: 0: 1648.2. Samples: 26992580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 03:39:32,824][42004] Updated weights for policy 0, policy_version 31256 (0.0024) +[2024-11-08 03:39:32,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6894.9, 300 sec: 6734.3). Total num frames: 128024576. Throughput: 0: 1666.7. Samples: 26997692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:32,932][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:39:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 128061440. Throughput: 0: 1766.5. Samples: 27009880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:37,935][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:39:38,032][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031266_128065536.pth... +[2024-11-08 03:39:38,038][42004] Updated weights for policy 0, policy_version 31266 (0.0024) +[2024-11-08 03:39:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030866_126427136.pth +[2024-11-08 03:39:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 128102400. Throughput: 0: 1806.9. Samples: 27021780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:42,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 03:39:43,205][42004] Updated weights for policy 0, policy_version 31276 (0.0031) +[2024-11-08 03:39:47,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7195.6, 300 sec: 6845.2). Total num frames: 128143360. Throughput: 0: 1815.0. Samples: 27027646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:47,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 03:39:48,448][42004] Updated weights for policy 0, policy_version 31286 (0.0025) +[2024-11-08 03:39:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.7, 300 sec: 6859.1). Total num frames: 128180224. Throughput: 0: 1831.5. Samples: 27039266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:52,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:39:53,822][42004] Updated weights for policy 0, policy_version 31296 (0.0032) +[2024-11-08 03:39:57,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6963.0, 300 sec: 6789.6). Total num frames: 128196608. Throughput: 0: 1736.0. Samples: 27046332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:57,936][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 03:40:02,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 128225280. Throughput: 0: 1689.0. Samples: 27049854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:02,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:40:03,151][42004] Updated weights for policy 0, policy_version 31306 (0.0029) +[2024-11-08 03:40:07,931][41694] Fps is (10 sec: 6964.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 128266240. Throughput: 0: 1668.2. Samples: 27060254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:07,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:40:08,405][42004] Updated weights for policy 0, policy_version 31316 (0.0023) +[2024-11-08 03:40:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 128303104. Throughput: 0: 1766.8. Samples: 27072088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:12,936][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 03:40:13,801][42004] Updated weights for policy 0, policy_version 31326 (0.0027) +[2024-11-08 03:40:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 128335872. Throughput: 0: 1767.9. Samples: 27077248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:17,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 03:40:19,832][42004] Updated weights for policy 0, policy_version 31336 (0.0041) +[2024-11-08 03:40:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 128372736. Throughput: 0: 1735.9. Samples: 27087998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:22,935][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 03:40:25,308][42004] Updated weights for policy 0, policy_version 31346 (0.0034) +[2024-11-08 03:40:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 128409600. Throughput: 0: 1713.5. Samples: 27098890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:27,933][41694] Avg episode reward: [(0, '4.205')] +[2024-11-08 03:40:32,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 128425984. Throughput: 0: 1686.7. Samples: 27103546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:32,936][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 03:40:33,680][42004] Updated weights for policy 0, policy_version 31356 (0.0033) +[2024-11-08 03:40:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 128462848. Throughput: 0: 1569.6. Samples: 27109900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:37,935][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 03:40:39,429][42004] Updated weights for policy 0, policy_version 31366 (0.0025) +[2024-11-08 03:40:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 128499712. Throughput: 0: 1667.4. Samples: 27121360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:42,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:40:44,744][42004] Updated weights for policy 0, policy_version 31376 (0.0037) +[2024-11-08 03:40:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 128540672. Throughput: 0: 1716.4. Samples: 27127092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:47,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 03:40:50,000][42004] Updated weights for policy 0, policy_version 31386 (0.0037) +[2024-11-08 03:40:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 128577536. Throughput: 0: 1745.7. Samples: 27138812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:52,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:40:55,381][42004] Updated weights for policy 0, policy_version 31396 (0.0024) +[2024-11-08 03:40:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 128614400. Throughput: 0: 1734.6. Samples: 27150144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:57,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:41:01,034][42004] Updated weights for policy 0, policy_version 31406 (0.0027) +[2024-11-08 03:41:02,933][41694] Fps is (10 sec: 7371.7, 60 sec: 7099.5, 300 sec: 6886.8). Total num frames: 128651264. Throughput: 0: 1739.1. Samples: 27155512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:41:02,936][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 03:41:07,934][41694] Fps is (10 sec: 4914.2, 60 sec: 6621.6, 300 sec: 6817.4). Total num frames: 128663552. Throughput: 0: 1648.8. Samples: 27162198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:07,936][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 03:41:10,058][42004] Updated weights for policy 0, policy_version 31416 (0.0033) +[2024-11-08 03:41:12,932][41694] Fps is (10 sec: 4506.3, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 128696320. Throughput: 0: 1583.2. Samples: 27170132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:12,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:41:16,392][42004] Updated weights for policy 0, policy_version 31426 (0.0037) +[2024-11-08 03:41:17,932][41694] Fps is (10 sec: 6555.0, 60 sec: 6553.6, 300 sec: 6775.8). Total num frames: 128729088. Throughput: 0: 1591.7. Samples: 27175172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:17,933][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 03:41:22,183][42004] Updated weights for policy 0, policy_version 31436 (0.0031) +[2024-11-08 03:41:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.7, 300 sec: 6775.8). Total num frames: 128765952. Throughput: 0: 1686.3. Samples: 27185784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:22,932][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:41:27,392][42004] Updated weights for policy 0, policy_version 31446 (0.0031) +[2024-11-08 03:41:27,934][41694] Fps is (10 sec: 7780.7, 60 sec: 6621.6, 300 sec: 6859.0). Total num frames: 128806912. Throughput: 0: 1694.3. Samples: 27197606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:27,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:41:32,717][42004] Updated weights for policy 0, policy_version 31456 (0.0027) +[2024-11-08 03:41:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 128843776. Throughput: 0: 1690.0. Samples: 27203140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:32,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 03:41:37,932][41694] Fps is (10 sec: 6964.6, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 128876544. Throughput: 0: 1682.4. Samples: 27214522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:37,934][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 03:41:37,971][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031465_128880640.pth... +[2024-11-08 03:41:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031064_127238144.pth +[2024-11-08 03:41:38,717][42004] Updated weights for policy 0, policy_version 31466 (0.0032) +[2024-11-08 03:41:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 128892928. Throughput: 0: 1560.7. Samples: 27220376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:42,934][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 03:41:47,088][42004] Updated weights for policy 0, policy_version 31476 (0.0030) +[2024-11-08 03:41:47,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 128929792. Throughput: 0: 1544.7. Samples: 27225020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:47,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:41:52,506][42004] Updated weights for policy 0, policy_version 31486 (0.0024) +[2024-11-08 03:41:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6775.9). Total num frames: 128966656. Throughput: 0: 1647.4. Samples: 27236326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:52,934][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 03:41:57,851][42004] Updated weights for policy 0, policy_version 31496 (0.0036) +[2024-11-08 03:41:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6853.5). Total num frames: 129007616. Throughput: 0: 1727.2. Samples: 27247858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:57,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:42:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.8, 300 sec: 6859.1). Total num frames: 129044480. Throughput: 0: 1743.8. Samples: 27253642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:42:02,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 03:42:03,359][42004] Updated weights for policy 0, policy_version 31506 (0.0026) +[2024-11-08 03:42:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.5, 300 sec: 6886.9). Total num frames: 129081344. Throughput: 0: 1748.7. Samples: 27264476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:42:07,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:42:08,837][42004] Updated weights for policy 0, policy_version 31516 (0.0032) +[2024-11-08 03:42:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 129114112. Throughput: 0: 1718.1. Samples: 27274916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:12,934][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 03:42:17,719][42004] Updated weights for policy 0, policy_version 31526 (0.0041) +[2024-11-08 03:42:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 129130496. Throughput: 0: 1642.4. Samples: 27277046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:17,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:42:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 129167360. Throughput: 0: 1593.2. Samples: 27286214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:22,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:42:23,169][42004] Updated weights for policy 0, policy_version 31536 (0.0034) +[2024-11-08 03:42:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.4, 300 sec: 6789.6). Total num frames: 129208320. Throughput: 0: 1722.2. Samples: 27297876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:27,933][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 03:42:28,420][42004] Updated weights for policy 0, policy_version 31546 (0.0032) +[2024-11-08 03:42:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6858.4). Total num frames: 129245184. Throughput: 0: 1747.2. Samples: 27303642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:32,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 03:42:33,493][42004] Updated weights for policy 0, policy_version 31556 (0.0022) +[2024-11-08 03:42:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6826.6, 300 sec: 6886.9). Total num frames: 129286144. Throughput: 0: 1758.8. Samples: 27315472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:37,935][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:42:39,113][42004] Updated weights for policy 0, policy_version 31566 (0.0030) +[2024-11-08 03:42:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 129314816. Throughput: 0: 1731.8. Samples: 27325788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:42,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:42:45,406][42004] Updated weights for policy 0, policy_version 31576 (0.0034) +[2024-11-08 03:42:50,145][41694] Fps is (10 sec: 5030.7, 60 sec: 6715.4, 300 sec: 6794.2). Total num frames: 129347584. Throughput: 0: 1633.3. Samples: 27330754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:50,148][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:42:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 129368064. Throughput: 0: 1602.2. Samples: 27336576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:42:52,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:42:53,828][42004] Updated weights for policy 0, policy_version 31586 (0.0033) +[2024-11-08 03:42:57,932][41694] Fps is (10 sec: 7364.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 129404928. Throughput: 0: 1619.8. Samples: 27347806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:42:57,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:42:59,026][42004] Updated weights for policy 0, policy_version 31596 (0.0027) +[2024-11-08 03:43:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 129445888. Throughput: 0: 1704.8. Samples: 27353764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:43:02,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:43:04,536][42004] Updated weights for policy 0, policy_version 31606 (0.0033) +[2024-11-08 03:43:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6833.4). Total num frames: 129482752. Throughput: 0: 1751.5. Samples: 27365030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:43:07,933][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 03:43:09,670][42004] Updated weights for policy 0, policy_version 31616 (0.0034) +[2024-11-08 03:43:12,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 129523712. Throughput: 0: 1754.7. Samples: 27376838. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:12,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:43:15,097][42004] Updated weights for policy 0, policy_version 31626 (0.0028) +[2024-11-08 03:43:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 129556480. Throughput: 0: 1751.6. Samples: 27382466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:17,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:43:21,113][42004] Updated weights for policy 0, policy_version 31636 (0.0042) +[2024-11-08 03:43:24,814][41694] Fps is (10 sec: 5171.0, 60 sec: 6751.5, 300 sec: 6788.0). Total num frames: 129585152. Throughput: 0: 1644.0. Samples: 27392546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:24,827][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 03:43:27,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 129605632. Throughput: 0: 1607.9. Samples: 27398144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:27,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 03:43:29,698][42004] Updated weights for policy 0, policy_version 31646 (0.0042) +[2024-11-08 03:43:32,932][41694] Fps is (10 sec: 7568.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 129646592. Throughput: 0: 1701.9. Samples: 27403570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:32,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 03:43:35,065][42004] Updated weights for policy 0, policy_version 31656 (0.0028) +[2024-11-08 03:43:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 129683456. Throughput: 0: 1748.9. Samples: 27415278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:37,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 03:43:38,063][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031662_129687552.pth... +[2024-11-08 03:43:38,146][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031266_128065536.pth +[2024-11-08 03:43:40,369][42004] Updated weights for policy 0, policy_version 31666 (0.0027) +[2024-11-08 03:43:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6808.1). Total num frames: 129720320. Throughput: 0: 1754.5. Samples: 27426760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:42,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 03:43:45,487][42004] Updated weights for policy 0, policy_version 31676 (0.0020) +[2024-11-08 03:43:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7159.1, 300 sec: 6845.2). Total num frames: 129761280. Throughput: 0: 1756.0. Samples: 27432782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:47,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:43:51,083][42004] Updated weights for policy 0, policy_version 31686 (0.0023) +[2024-11-08 03:43:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 129794048. Throughput: 0: 1752.6. Samples: 27443896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:52,935][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:43:59,321][41694] Fps is (10 sec: 5394.5, 60 sec: 6805.6, 300 sec: 6785.5). Total num frames: 129822720. Throughput: 0: 1653.6. Samples: 27453546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:59,322][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:43:59,656][42004] Updated weights for policy 0, policy_version 31696 (0.0027) +[2024-11-08 03:44:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 129847296. Throughput: 0: 1609.9. Samples: 27454910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:02,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 03:44:05,269][42004] Updated weights for policy 0, policy_version 31706 (0.0034) +[2024-11-08 03:44:07,931][41694] Fps is (10 sec: 7611.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 129888256. Throughput: 0: 1702.5. Samples: 27465952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:07,934][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 03:44:10,547][42004] Updated weights for policy 0, policy_version 31716 (0.0022) +[2024-11-08 03:44:12,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 129925120. Throughput: 0: 1763.5. Samples: 27477500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:12,933][41694] Avg episode reward: [(0, '4.150')] +[2024-11-08 03:44:15,830][42004] Updated weights for policy 0, policy_version 31726 (0.0024) +[2024-11-08 03:44:17,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 129966080. Throughput: 0: 1772.7. Samples: 27483342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:17,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 03:44:21,104][42004] Updated weights for policy 0, policy_version 31736 (0.0026) +[2024-11-08 03:44:22,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7118.2, 300 sec: 6831.3). Total num frames: 129998848. Throughput: 0: 1769.1. Samples: 27494888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:22,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:44:27,340][42004] Updated weights for policy 0, policy_version 31746 (0.0032) +[2024-11-08 03:44:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 130035712. Throughput: 0: 1738.4. Samples: 27504988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:27,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:44:33,917][41694] Fps is (10 sec: 5593.2, 60 sec: 6783.5, 300 sec: 6753.2). Total num frames: 130060288. Throughput: 0: 1682.4. Samples: 27510148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:33,919][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 03:44:35,441][42004] Updated weights for policy 0, policy_version 31756 (0.0025) +[2024-11-08 03:44:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 130088960. Throughput: 0: 1623.1. Samples: 27516936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:37,934][41694] Avg episode reward: [(0, '4.892')] +[2024-11-08 03:44:40,571][42004] Updated weights for policy 0, policy_version 31766 (0.0031) +[2024-11-08 03:44:42,932][41694] Fps is (10 sec: 7724.2, 60 sec: 6826.6, 300 sec: 6734.1). Total num frames: 130129920. Throughput: 0: 1721.8. Samples: 27528634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:42,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 03:44:45,959][42004] Updated weights for policy 0, policy_version 31776 (0.0028) +[2024-11-08 03:44:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 130166784. Throughput: 0: 1763.2. Samples: 27534252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:47,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 03:44:51,026][42004] Updated weights for policy 0, policy_version 31786 (0.0022) +[2024-11-08 03:44:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 130203648. Throughput: 0: 1785.6. Samples: 27546306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:44:56,754][42004] Updated weights for policy 0, policy_version 31796 (0.0022) +[2024-11-08 03:44:57,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7198.1, 300 sec: 6845.2). Total num frames: 130244608. Throughput: 0: 1770.5. Samples: 27557176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:44:57,935][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:45:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.8, 300 sec: 6803.5). Total num frames: 130273280. Throughput: 0: 1745.5. Samples: 27561888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:02,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 03:45:03,195][42004] Updated weights for policy 0, policy_version 31806 (0.0045) +[2024-11-08 03:45:08,406][41694] Fps is (10 sec: 5084.0, 60 sec: 6773.1, 300 sec: 6751.0). Total num frames: 130297856. Throughput: 0: 1693.8. Samples: 27571912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:08,407][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 03:45:11,125][42004] Updated weights for policy 0, policy_version 31816 (0.0026) +[2024-11-08 03:45:12,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130330624. Throughput: 0: 1639.5. Samples: 27578766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:12,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:45:16,952][42004] Updated weights for policy 0, policy_version 31826 (0.0042) +[2024-11-08 03:45:17,932][41694] Fps is (10 sec: 6879.6, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 130363392. Throughput: 0: 1674.7. Samples: 27583858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:17,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 03:45:22,329][42004] Updated weights for policy 0, policy_version 31836 (0.0023) +[2024-11-08 03:45:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130404352. Throughput: 0: 1735.0. Samples: 27595012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:22,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:45:27,473][42004] Updated weights for policy 0, policy_version 31846 (0.0030) +[2024-11-08 03:45:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 130441216. Throughput: 0: 1740.3. Samples: 27606948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:27,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:45:32,780][42004] Updated weights for policy 0, policy_version 31856 (0.0026) +[2024-11-08 03:45:32,931][41694] Fps is (10 sec: 7782.8, 60 sec: 7148.9, 300 sec: 6845.2). Total num frames: 130482176. Throughput: 0: 1744.8. Samples: 27612770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:32,933][41694] Avg episode reward: [(0, '4.162')] +[2024-11-08 03:45:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 130514944. Throughput: 0: 1711.5. Samples: 27623322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:37,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 03:45:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031864_130514944.pth... +[2024-11-08 03:45:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031465_128880640.pth +[2024-11-08 03:45:38,939][42004] Updated weights for policy 0, policy_version 31866 (0.0033) +[2024-11-08 03:45:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130535424. Throughput: 0: 1585.7. Samples: 27628532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:42,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 03:45:47,049][42004] Updated weights for policy 0, policy_version 31876 (0.0029) +[2024-11-08 03:45:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 130568192. Throughput: 0: 1621.8. Samples: 27634870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:47,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 03:45:52,384][42004] Updated weights for policy 0, policy_version 31886 (0.0031) +[2024-11-08 03:45:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130609152. Throughput: 0: 1670.3. Samples: 27646282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:45:57,691][42004] Updated weights for policy 0, policy_version 31896 (0.0036) +[2024-11-08 03:45:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 130646016. Throughput: 0: 1755.2. Samples: 27657748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:57,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 03:46:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 130682880. Throughput: 0: 1768.4. Samples: 27663436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:02,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:46:03,383][42004] Updated weights for policy 0, policy_version 31906 (0.0031) +[2024-11-08 03:46:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7087.4, 300 sec: 6859.0). Total num frames: 130719744. Throughput: 0: 1761.1. Samples: 27674260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:07,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 03:46:09,217][42004] Updated weights for policy 0, policy_version 31916 (0.0031) +[2024-11-08 03:46:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 130748416. Throughput: 0: 1699.1. Samples: 27683406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:12,937][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 03:46:17,223][42004] Updated weights for policy 0, policy_version 31926 (0.0036) +[2024-11-08 03:46:17,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 130772992. Throughput: 0: 1681.6. Samples: 27688440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:17,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 03:46:22,396][42004] Updated weights for policy 0, policy_version 31936 (0.0034) +[2024-11-08 03:46:22,933][41694] Fps is (10 sec: 6552.6, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 130813952. Throughput: 0: 1638.4. Samples: 27697054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:22,937][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 03:46:27,569][42004] Updated weights for policy 0, policy_version 31946 (0.0033) +[2024-11-08 03:46:27,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 130850816. Throughput: 0: 1790.6. Samples: 27709112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:27,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 03:46:32,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 130887680. Throughput: 0: 1780.3. Samples: 27714984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:32,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 03:46:33,229][42004] Updated weights for policy 0, policy_version 31956 (0.0029) +[2024-11-08 03:46:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 130924544. Throughput: 0: 1756.2. Samples: 27725310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:37,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 03:46:39,031][42004] Updated weights for policy 0, policy_version 31966 (0.0030) +[2024-11-08 03:46:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 130957312. Throughput: 0: 1724.9. Samples: 27735368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:42,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 03:46:45,628][42004] Updated weights for policy 0, policy_version 31976 (0.0029) +[2024-11-08 03:46:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 130985984. Throughput: 0: 1700.1. Samples: 27739942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:47,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 03:46:52,361][42004] Updated weights for policy 0, policy_version 31986 (0.0039) +[2024-11-08 03:46:52,933][41694] Fps is (10 sec: 5733.5, 60 sec: 6758.2, 300 sec: 6803.5). Total num frames: 131014656. Throughput: 0: 1655.4. Samples: 27748756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:52,942][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:46:57,744][42004] Updated weights for policy 0, policy_version 31996 (0.0030) +[2024-11-08 03:46:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 131055616. Throughput: 0: 1698.4. Samples: 27759836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:57,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:47:02,931][41694] Fps is (10 sec: 7783.6, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 131092480. Throughput: 0: 1718.6. Samples: 27765776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:47:03,176][42004] Updated weights for policy 0, policy_version 32006 (0.0031) +[2024-11-08 03:47:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 131133440. Throughput: 0: 1783.8. Samples: 27777322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:07,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:47:08,268][42004] Updated weights for policy 0, policy_version 32016 (0.0027) +[2024-11-08 03:47:12,934][41694] Fps is (10 sec: 7780.9, 60 sec: 7031.2, 300 sec: 6914.6). Total num frames: 131170304. Throughput: 0: 1779.1. Samples: 27789172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:12,940][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:47:13,619][42004] Updated weights for policy 0, policy_version 32026 (0.0028) +[2024-11-08 03:47:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7236.2, 300 sec: 6914.6). Total num frames: 131207168. Throughput: 0: 1763.4. Samples: 27794338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:17,934][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:47:19,780][42004] Updated weights for policy 0, policy_version 32036 (0.0041) +[2024-11-08 03:47:24,360][41694] Fps is (10 sec: 5376.8, 60 sec: 6801.4, 300 sec: 6826.0). Total num frames: 131231744. Throughput: 0: 1705.5. Samples: 27804492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:24,366][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:47:27,923][42004] Updated weights for policy 0, policy_version 32046 (0.0028) +[2024-11-08 03:47:27,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.8, 300 sec: 6831.3). Total num frames: 131260416. Throughput: 0: 1678.8. Samples: 27810916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:27,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 03:47:32,931][41694] Fps is (10 sec: 7168.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 131293184. Throughput: 0: 1696.9. Samples: 27816300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:32,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 03:47:33,592][42004] Updated weights for policy 0, policy_version 32056 (0.0024) +[2024-11-08 03:47:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 131334144. Throughput: 0: 1748.1. Samples: 27827420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:37,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:47:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032064_131334144.pth... +[2024-11-08 03:47:38,097][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031662_129687552.pth +[2024-11-08 03:47:38,971][42004] Updated weights for policy 0, policy_version 32066 (0.0024) +[2024-11-08 03:47:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6896.9). Total num frames: 131366912. Throughput: 0: 1742.4. Samples: 27838246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:42,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:47:44,880][42004] Updated weights for policy 0, policy_version 32076 (0.0026) +[2024-11-08 03:47:47,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 131407872. Throughput: 0: 1733.8. Samples: 27843796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:47,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:47:49,979][42004] Updated weights for policy 0, policy_version 32086 (0.0030) +[2024-11-08 03:47:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6914.6). Total num frames: 131444736. Throughput: 0: 1741.5. Samples: 27855688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:52,934][41694] Avg episode reward: [(0, '4.706')] +[2024-11-08 03:47:55,183][42004] Updated weights for policy 0, policy_version 32096 (0.0028) +[2024-11-08 03:47:58,364][41694] Fps is (10 sec: 6281.9, 60 sec: 6913.4, 300 sec: 6862.9). Total num frames: 131473408. Throughput: 0: 1594.7. Samples: 27861618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:58,365][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 03:48:02,882][42004] Updated weights for policy 0, policy_version 32106 (0.0025) +[2024-11-08 03:48:02,933][41694] Fps is (10 sec: 6142.8, 60 sec: 6894.7, 300 sec: 6859.0). Total num frames: 131506176. Throughput: 0: 1674.3. Samples: 27869686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:02,936][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:48:07,931][41694] Fps is (10 sec: 6849.9, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 131538944. Throughput: 0: 1719.9. Samples: 27879428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:07,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:48:08,529][42004] Updated weights for policy 0, policy_version 32116 (0.0045) +[2024-11-08 03:48:12,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6826.9, 300 sec: 6859.1). Total num frames: 131579904. Throughput: 0: 1777.1. Samples: 27890886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:12,935][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 03:48:13,900][42004] Updated weights for policy 0, policy_version 32126 (0.0030) +[2024-11-08 03:48:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6931.1). Total num frames: 131616768. Throughput: 0: 1785.0. Samples: 27896624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:17,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 03:48:19,271][42004] Updated weights for policy 0, policy_version 32136 (0.0027) +[2024-11-08 03:48:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7203.0, 300 sec: 6942.4). Total num frames: 131653632. Throughput: 0: 1785.3. Samples: 27907758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:22,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:48:24,798][42004] Updated weights for policy 0, policy_version 32146 (0.0024) +[2024-11-08 03:48:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 131690496. Throughput: 0: 1794.4. Samples: 27918992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:27,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:48:30,567][42004] Updated weights for policy 0, policy_version 32156 (0.0033) +[2024-11-08 03:48:32,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6963.1, 300 sec: 6872.9). Total num frames: 131710976. Throughput: 0: 1788.3. Samples: 27924272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:32,937][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 03:48:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 131743744. Throughput: 0: 1668.0. Samples: 27930746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:37,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 03:48:38,638][42004] Updated weights for policy 0, policy_version 32166 (0.0029) +[2024-11-08 03:48:42,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 131780608. Throughput: 0: 1797.4. Samples: 27941722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:42,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 03:48:44,168][42004] Updated weights for policy 0, policy_version 32176 (0.0030) +[2024-11-08 03:48:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 131821568. Throughput: 0: 1726.8. Samples: 27947390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:47,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:48:49,420][42004] Updated weights for policy 0, policy_version 32186 (0.0027) +[2024-11-08 03:48:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6933.4). Total num frames: 131858432. Throughput: 0: 1769.3. Samples: 27959046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:52,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 03:48:54,775][42004] Updated weights for policy 0, policy_version 32196 (0.0026) +[2024-11-08 03:48:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7082.5, 300 sec: 6942.4). Total num frames: 131895296. Throughput: 0: 1772.0. Samples: 27970624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:57,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:49:00,307][42004] Updated weights for policy 0, policy_version 32206 (0.0026) +[2024-11-08 03:49:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.9, 300 sec: 6928.5). Total num frames: 131932160. Throughput: 0: 1768.1. Samples: 27976190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:02,934][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 03:49:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6894.8, 300 sec: 6872.9). Total num frames: 131952640. Throughput: 0: 1708.8. Samples: 27984656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:07,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:49:08,509][42004] Updated weights for policy 0, policy_version 32216 (0.0027) +[2024-11-08 03:49:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 131985408. Throughput: 0: 1634.1. Samples: 27992528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:12,938][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 03:49:14,547][42004] Updated weights for policy 0, policy_version 32226 (0.0035) +[2024-11-08 03:49:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 132022272. Throughput: 0: 1635.1. Samples: 27997852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:17,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 03:49:19,934][42004] Updated weights for policy 0, policy_version 32236 (0.0026) +[2024-11-08 03:49:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 132059136. Throughput: 0: 1751.8. Samples: 28009578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:22,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 03:49:25,072][42004] Updated weights for policy 0, policy_version 32246 (0.0024) +[2024-11-08 03:49:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6937.8). Total num frames: 132100096. Throughput: 0: 1765.6. Samples: 28021176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:27,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:49:30,618][42004] Updated weights for policy 0, policy_version 32256 (0.0028) +[2024-11-08 03:49:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6942.4). Total num frames: 132136960. Throughput: 0: 1760.8. Samples: 28026628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:32,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:49:36,221][42004] Updated weights for policy 0, policy_version 32266 (0.0025) +[2024-11-08 03:49:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 132169728. Throughput: 0: 1745.9. Samples: 28037612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:37,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 03:49:37,989][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032269_132173824.pth... +[2024-11-08 03:49:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031864_130514944.pth +[2024-11-08 03:49:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 132186112. Throughput: 0: 1623.2. Samples: 28043670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:42,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 03:49:45,359][42004] Updated weights for policy 0, policy_version 32276 (0.0052) +[2024-11-08 03:49:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 132218880. Throughput: 0: 1589.2. Samples: 28047704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:47,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:49:51,092][42004] Updated weights for policy 0, policy_version 32286 (0.0031) +[2024-11-08 03:49:52,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6553.4, 300 sec: 6803.5). Total num frames: 132251648. Throughput: 0: 1634.7. Samples: 28058218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:49:52,935][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:49:56,815][42004] Updated weights for policy 0, policy_version 32296 (0.0047) +[2024-11-08 03:49:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132292608. Throughput: 0: 1701.1. Samples: 28069076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:49:57,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:50:02,931][41694] Fps is (10 sec: 6554.9, 60 sec: 6417.1, 300 sec: 6856.2). Total num frames: 132317184. Throughput: 0: 1689.3. Samples: 28073868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:50:02,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:50:03,953][42004] Updated weights for policy 0, policy_version 32306 (0.0021) +[2024-11-08 03:50:07,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132349952. Throughput: 0: 1603.6. Samples: 28081742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:50:07,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:50:10,107][42004] Updated weights for policy 0, policy_version 32316 (0.0027) +[2024-11-08 03:50:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132382720. Throughput: 0: 1578.5. Samples: 28092206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:12,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 03:50:17,933][41694] Fps is (10 sec: 4505.0, 60 sec: 6212.1, 300 sec: 6748.0). Total num frames: 132395008. Throughput: 0: 1523.2. Samples: 28095174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:17,937][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:50:19,763][42004] Updated weights for policy 0, policy_version 32326 (0.0038) +[2024-11-08 03:50:22,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6144.0, 300 sec: 6734.1). Total num frames: 132427776. Throughput: 0: 1415.4. Samples: 28101306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 03:50:25,739][42004] Updated weights for policy 0, policy_version 32336 (0.0025) +[2024-11-08 03:50:27,932][41694] Fps is (10 sec: 6554.8, 60 sec: 6007.5, 300 sec: 6706.3). Total num frames: 132460544. Throughput: 0: 1520.2. Samples: 28112080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:27,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:50:31,249][42004] Updated weights for policy 0, policy_version 32346 (0.0052) +[2024-11-08 03:50:32,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6075.5, 300 sec: 6734.1). Total num frames: 132501504. Throughput: 0: 1551.5. Samples: 28117526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:32,936][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 03:50:36,726][42004] Updated weights for policy 0, policy_version 32356 (0.0031) +[2024-11-08 03:50:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6144.0, 300 sec: 6789.6). Total num frames: 132538368. Throughput: 0: 1566.3. Samples: 28128698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:37,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 03:50:42,293][42004] Updated weights for policy 0, policy_version 32366 (0.0025) +[2024-11-08 03:50:42,933][41694] Fps is (10 sec: 7373.1, 60 sec: 6485.2, 300 sec: 6803.5). Total num frames: 132575232. Throughput: 0: 1572.3. Samples: 28139832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:42,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:50:47,671][42004] Updated weights for policy 0, policy_version 32376 (0.0025) +[2024-11-08 03:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 132612096. Throughput: 0: 1588.7. Samples: 28145360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:50:52,931][41694] Fps is (10 sec: 5735.3, 60 sec: 6349.0, 300 sec: 6734.1). Total num frames: 132632576. Throughput: 0: 1583.0. Samples: 28152978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:52,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:50:55,824][42004] Updated weights for policy 0, policy_version 32386 (0.0032) +[2024-11-08 03:50:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6212.3, 300 sec: 6720.2). Total num frames: 132665344. Throughput: 0: 1577.9. Samples: 28163210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:57,933][41694] Avg episode reward: [(0, '4.663')] +[2024-11-08 03:51:01,362][42004] Updated weights for policy 0, policy_version 32396 (0.0038) +[2024-11-08 03:51:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.0, 300 sec: 6720.2). Total num frames: 132702208. Throughput: 0: 1630.8. Samples: 28168558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:02,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 03:51:07,094][42004] Updated weights for policy 0, policy_version 32406 (0.0035) +[2024-11-08 03:51:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6485.4, 300 sec: 6748.0). Total num frames: 132739072. Throughput: 0: 1731.4. Samples: 28179220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:07,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:51:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6485.3, 300 sec: 6775.8). Total num frames: 132771840. Throughput: 0: 1724.3. Samples: 28189674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:12,933][41694] Avg episode reward: [(0, '4.778')] +[2024-11-08 03:51:13,092][42004] Updated weights for policy 0, policy_version 32416 (0.0039) +[2024-11-08 03:51:17,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6895.0, 300 sec: 6761.9). Total num frames: 132808704. Throughput: 0: 1720.0. Samples: 28194926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:17,935][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 03:51:18,507][42004] Updated weights for policy 0, policy_version 32426 (0.0040) +[2024-11-08 03:51:24,945][41694] Fps is (10 sec: 6136.8, 60 sec: 6737.1, 300 sec: 6716.0). Total num frames: 132845568. Throughput: 0: 1653.7. Samples: 28206446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:24,947][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:51:26,297][42004] Updated weights for policy 0, policy_version 32436 (0.0027) +[2024-11-08 03:51:27,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 132866048. Throughput: 0: 1627.3. Samples: 28213060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:27,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 03:51:32,121][42004] Updated weights for policy 0, policy_version 32446 (0.0030) +[2024-11-08 03:51:32,932][41694] Fps is (10 sec: 7180.4, 60 sec: 6690.3, 300 sec: 6706.3). Total num frames: 132902912. Throughput: 0: 1616.6. Samples: 28218106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:32,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:51:37,548][42004] Updated weights for policy 0, policy_version 32456 (0.0030) +[2024-11-08 03:51:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 132939776. Throughput: 0: 1699.5. Samples: 28229454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:37,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 03:51:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032456_132939776.pth... +[2024-11-08 03:51:38,043][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032064_131334144.pth +[2024-11-08 03:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.3, 300 sec: 6748.0). Total num frames: 132976640. Throughput: 0: 1728.1. Samples: 28240974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:42,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:51:42,993][42004] Updated weights for policy 0, policy_version 32466 (0.0027) +[2024-11-08 03:51:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.3, 300 sec: 6789.7). Total num frames: 133017600. Throughput: 0: 1732.8. Samples: 28246536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:47,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 03:51:48,228][42004] Updated weights for policy 0, policy_version 32476 (0.0030) +[2024-11-08 03:51:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 133054464. Throughput: 0: 1741.4. Samples: 28257582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:52,933][41694] Avg episode reward: [(0, '4.126')] +[2024-11-08 03:51:53,978][42004] Updated weights for policy 0, policy_version 32486 (0.0037) +[2024-11-08 03:51:59,274][41694] Fps is (10 sec: 5777.9, 60 sec: 6810.8, 300 sec: 6717.4). Total num frames: 133083136. Throughput: 0: 1694.9. Samples: 28268220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:59,276][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 03:52:02,414][42004] Updated weights for policy 0, policy_version 32496 (0.0036) +[2024-11-08 03:52:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 133103616. Throughput: 0: 1662.5. Samples: 28269738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:02,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:52:07,932][41694] Fps is (10 sec: 6623.6, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 133140480. Throughput: 0: 1707.3. Samples: 28279836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:07,935][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:52:07,997][42004] Updated weights for policy 0, policy_version 32506 (0.0038) +[2024-11-08 03:52:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.6, 300 sec: 6692.5). Total num frames: 133181440. Throughput: 0: 1736.5. Samples: 28291202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:12,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:52:13,428][42004] Updated weights for policy 0, policy_version 32516 (0.0030) +[2024-11-08 03:52:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.8, 300 sec: 6766.9). Total num frames: 133218304. Throughput: 0: 1748.4. Samples: 28296782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:17,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:52:18,851][42004] Updated weights for policy 0, policy_version 32526 (0.0028) +[2024-11-08 03:52:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7063.7, 300 sec: 6761.9). Total num frames: 133255168. Throughput: 0: 1747.1. Samples: 28308074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:22,933][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 03:52:24,406][42004] Updated weights for policy 0, policy_version 32536 (0.0026) +[2024-11-08 03:52:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 133292032. Throughput: 0: 1742.8. Samples: 28319398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:27,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:52:29,890][42004] Updated weights for policy 0, policy_version 32546 (0.0024) +[2024-11-08 03:52:33,611][41694] Fps is (10 sec: 5753.4, 60 sec: 6817.8, 300 sec: 6704.8). Total num frames: 133316608. Throughput: 0: 1714.7. Samples: 28324860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:33,612][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:52:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 133345280. Throughput: 0: 1634.9. Samples: 28331152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:37,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:52:37,986][42004] Updated weights for policy 0, policy_version 32556 (0.0041) +[2024-11-08 03:52:42,932][41694] Fps is (10 sec: 7470.4, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 133386240. Throughput: 0: 1702.7. Samples: 28342556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:42,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:52:43,412][42004] Updated weights for policy 0, policy_version 32566 (0.0027) +[2024-11-08 03:52:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 133423104. Throughput: 0: 1736.8. Samples: 28347894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:47,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:52:48,977][42004] Updated weights for policy 0, policy_version 32576 (0.0038) +[2024-11-08 03:52:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6744.0). Total num frames: 133459968. Throughput: 0: 1758.5. Samples: 28358968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:52,936][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 03:52:54,502][42004] Updated weights for policy 0, policy_version 32586 (0.0028) +[2024-11-08 03:52:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6982.9, 300 sec: 6734.1). Total num frames: 133492736. Throughput: 0: 1749.1. Samples: 28369912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:57,939][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 03:53:00,189][42004] Updated weights for policy 0, policy_version 32596 (0.0045) +[2024-11-08 03:53:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 133529600. Throughput: 0: 1752.5. Samples: 28375644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:53:02,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 03:53:07,965][41694] Fps is (10 sec: 5307.3, 60 sec: 6754.6, 300 sec: 6663.9). Total num frames: 133545984. Throughput: 0: 1613.3. Samples: 28380728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:07,967][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:53:08,608][42004] Updated weights for policy 0, policy_version 32606 (0.0037) +[2024-11-08 03:53:12,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133578752. Throughput: 0: 1605.2. Samples: 28391630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:12,939][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 03:53:14,945][42004] Updated weights for policy 0, policy_version 32616 (0.0036) +[2024-11-08 03:53:17,932][41694] Fps is (10 sec: 6986.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133615616. Throughput: 0: 1615.8. Samples: 28396476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:17,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:53:20,267][42004] Updated weights for policy 0, policy_version 32626 (0.0022) +[2024-11-08 03:53:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133652480. Throughput: 0: 1706.9. Samples: 28407964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:22,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:53:25,816][42004] Updated weights for policy 0, policy_version 32636 (0.0026) +[2024-11-08 03:53:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 133689344. Throughput: 0: 1694.8. Samples: 28418824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:27,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:53:31,322][42004] Updated weights for policy 0, policy_version 32646 (0.0033) +[2024-11-08 03:53:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6973.8, 300 sec: 6734.1). Total num frames: 133730304. Throughput: 0: 1699.9. Samples: 28424390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:32,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 03:53:36,778][42004] Updated weights for policy 0, policy_version 32656 (0.0027) +[2024-11-08 03:53:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 133763072. Throughput: 0: 1708.1. Samples: 28435834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:37,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:53:37,958][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032657_133763072.pth... +[2024-11-08 03:53:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032269_132173824.pth +[2024-11-08 03:53:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 133783552. Throughput: 0: 1615.0. Samples: 28442588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:42,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:53:45,157][42004] Updated weights for policy 0, policy_version 32666 (0.0033) +[2024-11-08 03:53:47,933][41694] Fps is (10 sec: 5324.4, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 133816320. Throughput: 0: 1592.2. Samples: 28447296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:47,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:53:50,788][42004] Updated weights for policy 0, policy_version 32676 (0.0029) +[2024-11-08 03:53:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 133853184. Throughput: 0: 1721.2. Samples: 28458122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:52,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:53:56,410][42004] Updated weights for policy 0, policy_version 32686 (0.0029) +[2024-11-08 03:53:57,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 133890048. Throughput: 0: 1724.8. Samples: 28469246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:57,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:54:02,079][42004] Updated weights for policy 0, policy_version 32696 (0.0028) +[2024-11-08 03:54:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 133926912. Throughput: 0: 1741.0. Samples: 28474820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:02,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 03:54:07,370][42004] Updated weights for policy 0, policy_version 32706 (0.0030) +[2024-11-08 03:54:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7035.4, 300 sec: 6720.2). Total num frames: 133967872. Throughput: 0: 1734.0. Samples: 28485992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:54:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 134000640. Throughput: 0: 1731.6. Samples: 28496748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:12,936][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 03:54:13,334][42004] Updated weights for policy 0, policy_version 32716 (0.0038) +[2024-11-08 03:54:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 134021120. Throughput: 0: 1703.9. Samples: 28501066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:17,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:54:21,101][42004] Updated weights for policy 0, policy_version 32726 (0.0022) +[2024-11-08 03:54:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 134057984. Throughput: 0: 1623.6. Samples: 28508896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:22,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 03:54:26,488][42004] Updated weights for policy 0, policy_version 32736 (0.0023) +[2024-11-08 03:54:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 134094848. Throughput: 0: 1726.8. Samples: 28520292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:27,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:54:31,765][42004] Updated weights for policy 0, policy_version 32746 (0.0027) +[2024-11-08 03:54:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 134135808. Throughput: 0: 1748.1. Samples: 28525958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:32,934][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 03:54:37,188][42004] Updated weights for policy 0, policy_version 32756 (0.0034) +[2024-11-08 03:54:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 134172672. Throughput: 0: 1763.6. Samples: 28537486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:37,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:54:42,652][42004] Updated weights for policy 0, policy_version 32766 (0.0029) +[2024-11-08 03:54:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6748.0). Total num frames: 134209536. Throughput: 0: 1769.1. Samples: 28548856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:42,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:54:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.1, 300 sec: 6761.9). Total num frames: 134246400. Throughput: 0: 1762.2. Samples: 28554120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:47,933][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 03:54:48,462][42004] Updated weights for policy 0, policy_version 32776 (0.0034) +[2024-11-08 03:54:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6692.5). Total num frames: 134266880. Throughput: 0: 1663.4. Samples: 28560846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:54:52,933][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 03:54:56,185][42004] Updated weights for policy 0, policy_version 32786 (0.0028) +[2024-11-08 03:54:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 134303744. Throughput: 0: 1674.2. Samples: 28572086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:54:57,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 03:55:01,634][42004] Updated weights for policy 0, policy_version 32796 (0.0041) +[2024-11-08 03:55:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 134336512. Throughput: 0: 1700.3. Samples: 28577580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:02,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 03:55:07,488][42004] Updated weights for policy 0, policy_version 32806 (0.0021) +[2024-11-08 03:55:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 134373376. Throughput: 0: 1759.2. Samples: 28588058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:07,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:55:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 134410240. Throughput: 0: 1762.2. Samples: 28599590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:12,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:55:12,987][42004] Updated weights for policy 0, policy_version 32816 (0.0044) +[2024-11-08 03:55:17,933][41694] Fps is (10 sec: 6962.1, 60 sec: 7031.3, 300 sec: 6831.3). Total num frames: 134443008. Throughput: 0: 1737.8. Samples: 28604160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:17,935][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:55:19,587][42004] Updated weights for policy 0, policy_version 32826 (0.0028) +[2024-11-08 03:55:22,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 134475776. Throughput: 0: 1695.5. Samples: 28613782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:22,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:55:27,626][42004] Updated weights for policy 0, policy_version 32836 (0.0026) +[2024-11-08 03:55:27,931][41694] Fps is (10 sec: 5325.6, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 134496256. Throughput: 0: 1593.4. Samples: 28620560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:27,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 03:55:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 134533120. Throughput: 0: 1602.1. Samples: 28626216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:32,934][41694] Avg episode reward: [(0, '4.756')] +[2024-11-08 03:55:32,944][42004] Updated weights for policy 0, policy_version 32846 (0.0025) +[2024-11-08 03:55:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 134574080. Throughput: 0: 1711.0. Samples: 28637840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:37,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 03:55:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032855_134574080.pth... +[2024-11-08 03:55:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032456_132939776.pth +[2024-11-08 03:55:38,202][42004] Updated weights for policy 0, policy_version 32856 (0.0029) +[2024-11-08 03:55:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 134610944. Throughput: 0: 1719.9. Samples: 28649480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:42,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 03:55:43,493][42004] Updated weights for policy 0, policy_version 32866 (0.0029) +[2024-11-08 03:55:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 134651904. Throughput: 0: 1725.0. Samples: 28655204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:47,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:55:48,886][42004] Updated weights for policy 0, policy_version 32876 (0.0026) +[2024-11-08 03:55:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 134684672. Throughput: 0: 1740.2. Samples: 28666366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:52,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:55:54,972][42004] Updated weights for policy 0, policy_version 32886 (0.0027) +[2024-11-08 03:55:59,985][41694] Fps is (10 sec: 5437.0, 60 sec: 6666.7, 300 sec: 6784.1). Total num frames: 134717440. Throughput: 0: 1630.1. Samples: 28676294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:55:59,988][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 03:56:02,937][41694] Fps is (10 sec: 5323.4, 60 sec: 6689.9, 300 sec: 6775.7). Total num frames: 134737920. Throughput: 0: 1635.7. Samples: 28677770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:02,938][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:56:03,211][42004] Updated weights for policy 0, policy_version 32896 (0.0032) +[2024-11-08 03:56:07,932][41694] Fps is (10 sec: 7216.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 134774784. Throughput: 0: 1650.0. Samples: 28688030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:07,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:56:08,735][42004] Updated weights for policy 0, policy_version 32906 (0.0031) +[2024-11-08 03:56:12,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 134807552. Throughput: 0: 1731.9. Samples: 28698496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:12,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:56:14,825][42004] Updated weights for policy 0, policy_version 32916 (0.0028) +[2024-11-08 03:56:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.3, 300 sec: 6822.3). Total num frames: 134844416. Throughput: 0: 1725.7. Samples: 28703872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:17,933][41694] Avg episode reward: [(0, '4.799')] +[2024-11-08 03:56:20,394][42004] Updated weights for policy 0, policy_version 32926 (0.0029) +[2024-11-08 03:56:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 134881280. Throughput: 0: 1713.9. Samples: 28714966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:22,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:56:26,402][42004] Updated weights for policy 0, policy_version 32936 (0.0037) +[2024-11-08 03:56:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 134914048. Throughput: 0: 1678.1. Samples: 28724996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:27,935][41694] Avg episode reward: [(0, '4.722')] +[2024-11-08 03:56:34,437][41694] Fps is (10 sec: 5340.2, 60 sec: 6659.6, 300 sec: 6755.2). Total num frames: 134942720. Throughput: 0: 1606.5. Samples: 28729916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:34,438][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:56:34,483][42004] Updated weights for policy 0, policy_version 32946 (0.0025) +[2024-11-08 03:56:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 134967296. Throughput: 0: 1562.9. Samples: 28736696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:37,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:56:40,215][42004] Updated weights for policy 0, policy_version 32956 (0.0037) +[2024-11-08 03:56:42,931][41694] Fps is (10 sec: 7232.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 135004160. Throughput: 0: 1663.5. Samples: 28747736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:42,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:56:45,591][42004] Updated weights for policy 0, policy_version 32966 (0.0032) +[2024-11-08 03:56:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 135045120. Throughput: 0: 1682.8. Samples: 28753494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:47,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:56:51,226][42004] Updated weights for policy 0, policy_version 32976 (0.0031) +[2024-11-08 03:56:52,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6553.5, 300 sec: 6792.8). Total num frames: 135077888. Throughput: 0: 1705.9. Samples: 28764796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:52,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:56:56,795][42004] Updated weights for policy 0, policy_version 32986 (0.0025) +[2024-11-08 03:56:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6927.2, 300 sec: 6831.3). Total num frames: 135118848. Throughput: 0: 1715.1. Samples: 28775676. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:57,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:57:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.9, 300 sec: 6803.5). Total num frames: 135147520. Throughput: 0: 1702.5. Samples: 28780486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:02,938][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 03:57:03,195][42004] Updated weights for policy 0, policy_version 32996 (0.0037) +[2024-11-08 03:57:08,902][41694] Fps is (10 sec: 4853.6, 60 sec: 6516.5, 300 sec: 6725.9). Total num frames: 135172096. Throughput: 0: 1635.1. Samples: 28790134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:08,908][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 03:57:11,229][42004] Updated weights for policy 0, policy_version 33006 (0.0038) +[2024-11-08 03:57:12,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 135200768. Throughput: 0: 1600.4. Samples: 28797016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:12,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 03:57:16,910][42004] Updated weights for policy 0, policy_version 33016 (0.0030) +[2024-11-08 03:57:17,931][41694] Fps is (10 sec: 7258.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 135237632. Throughput: 0: 1666.4. Samples: 28802396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:17,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:57:22,099][42004] Updated weights for policy 0, policy_version 33026 (0.0020) +[2024-11-08 03:57:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 135278592. Throughput: 0: 1720.6. Samples: 28814122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:22,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:57:27,245][42004] Updated weights for policy 0, policy_version 33036 (0.0040) +[2024-11-08 03:57:27,931][41694] Fps is (10 sec: 8191.9, 60 sec: 6758.4, 300 sec: 6805.3). Total num frames: 135319552. Throughput: 0: 1739.6. Samples: 28826016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 03:57:32,574][42004] Updated weights for policy 0, policy_version 33046 (0.0028) +[2024-11-08 03:57:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7072.3, 300 sec: 6817.4). Total num frames: 135356416. Throughput: 0: 1736.0. Samples: 28831612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:32,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 03:57:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 135389184. Throughput: 0: 1717.4. Samples: 28842078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:37,935][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 03:57:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033054_135389184.pth... +[2024-11-08 03:57:38,103][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032657_133763072.pth +[2024-11-08 03:57:38,832][42004] Updated weights for policy 0, policy_version 33056 (0.0032) +[2024-11-08 03:57:43,466][41694] Fps is (10 sec: 5443.4, 60 sec: 6766.4, 300 sec: 6735.8). Total num frames: 135413760. Throughput: 0: 1570.9. Samples: 28847208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:43,469][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:57:46,987][42004] Updated weights for policy 0, policy_version 33066 (0.0040) +[2024-11-08 03:57:47,934][41694] Fps is (10 sec: 5323.7, 60 sec: 6621.6, 300 sec: 6720.2). Total num frames: 135442432. Throughput: 0: 1629.7. Samples: 28853824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:47,936][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:57:52,932][41694] Fps is (10 sec: 6491.1, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 135475200. Throughput: 0: 1672.2. Samples: 28863760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:52,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 03:57:52,948][42004] Updated weights for policy 0, policy_version 33076 (0.0033) +[2024-11-08 03:57:57,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 135516160. Throughput: 0: 1738.6. Samples: 28875254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:57,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 03:57:58,238][42004] Updated weights for policy 0, policy_version 33086 (0.0036) +[2024-11-08 03:58:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.5, 300 sec: 6804.3). Total num frames: 135553024. Throughput: 0: 1743.7. Samples: 28880864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:02,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:58:03,715][42004] Updated weights for policy 0, policy_version 33096 (0.0030) +[2024-11-08 03:58:07,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7077.6, 300 sec: 6817.4). Total num frames: 135589888. Throughput: 0: 1728.2. Samples: 28891894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:07,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 03:58:09,759][42004] Updated weights for policy 0, policy_version 33106 (0.0028) +[2024-11-08 03:58:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 135622656. Throughput: 0: 1686.8. Samples: 28901922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:12,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:58:17,852][42004] Updated weights for policy 0, policy_version 33116 (0.0029) +[2024-11-08 03:58:17,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 135643136. Throughput: 0: 1680.0. Samples: 28907212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:17,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 03:58:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 135680000. Throughput: 0: 1603.7. Samples: 28914244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:22,933][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 03:58:23,364][42004] Updated weights for policy 0, policy_version 33126 (0.0024) +[2024-11-08 03:58:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 135716864. Throughput: 0: 1763.6. Samples: 28925628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:27,934][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 03:58:28,850][42004] Updated weights for policy 0, policy_version 33136 (0.0031) +[2024-11-08 03:58:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 135757824. Throughput: 0: 1716.4. Samples: 28931060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:32,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:58:33,980][42004] Updated weights for policy 0, policy_version 33146 (0.0024) +[2024-11-08 03:58:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 135794688. Throughput: 0: 1757.1. Samples: 28942828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:37,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:58:39,516][42004] Updated weights for policy 0, policy_version 33156 (0.0026) +[2024-11-08 03:58:42,935][41694] Fps is (10 sec: 6960.9, 60 sec: 6956.6, 300 sec: 6817.4). Total num frames: 135827456. Throughput: 0: 1736.4. Samples: 28953398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:42,937][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:58:45,427][42004] Updated weights for policy 0, policy_version 33166 (0.0034) +[2024-11-08 03:58:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.7, 300 sec: 6817.4). Total num frames: 135864320. Throughput: 0: 1729.1. Samples: 28958674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:47,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 03:58:52,932][41694] Fps is (10 sec: 5326.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 135880704. Throughput: 0: 1692.0. Samples: 28968032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:52,933][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 03:58:53,688][42004] Updated weights for policy 0, policy_version 33176 (0.0029) +[2024-11-08 03:58:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 135917568. Throughput: 0: 1637.7. Samples: 28975618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:57,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:58:59,340][42004] Updated weights for policy 0, policy_version 33186 (0.0046) +[2024-11-08 03:59:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 135954432. Throughput: 0: 1642.6. Samples: 28981130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:02,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 03:59:04,878][42004] Updated weights for policy 0, policy_version 33196 (0.0029) +[2024-11-08 03:59:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 135991296. Throughput: 0: 1735.5. Samples: 28992342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:07,935][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 03:59:10,344][42004] Updated weights for policy 0, policy_version 33206 (0.0028) +[2024-11-08 03:59:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 136032256. Throughput: 0: 1738.2. Samples: 29003846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:12,936][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:59:16,079][42004] Updated weights for policy 0, policy_version 33216 (0.0047) +[2024-11-08 03:59:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 136065024. Throughput: 0: 1731.1. Samples: 29008960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:17,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:59:21,844][42004] Updated weights for policy 0, policy_version 33226 (0.0032) +[2024-11-08 03:59:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 136101888. Throughput: 0: 1704.7. Samples: 29019542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:22,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:59:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 136118272. Throughput: 0: 1634.7. Samples: 29026956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:27,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 03:59:29,842][42004] Updated weights for policy 0, policy_version 33236 (0.0026) +[2024-11-08 03:59:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 136155136. Throughput: 0: 1619.8. Samples: 29031566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:32,934][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 03:59:35,372][42004] Updated weights for policy 0, policy_version 33246 (0.0032) +[2024-11-08 03:59:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 136192000. Throughput: 0: 1663.1. Samples: 29042870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:59:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033251_136196096.pth... +[2024-11-08 03:59:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032855_134574080.pth +[2024-11-08 03:59:40,688][42004] Updated weights for policy 0, policy_version 33256 (0.0026) +[2024-11-08 03:59:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.8, 300 sec: 6734.1). Total num frames: 136232960. Throughput: 0: 1750.3. Samples: 29054380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:42,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 03:59:46,093][42004] Updated weights for policy 0, policy_version 33266 (0.0034) +[2024-11-08 03:59:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 136269824. Throughput: 0: 1747.1. Samples: 29059748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:47,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 03:59:52,367][42004] Updated weights for policy 0, policy_version 33276 (0.0030) +[2024-11-08 03:59:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 136298496. Throughput: 0: 1740.3. Samples: 29070654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:52,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:59:57,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 136335360. Throughput: 0: 1716.4. Samples: 29081086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:57,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 03:59:58,056][42004] Updated weights for policy 0, policy_version 33286 (0.0027) +[2024-11-08 04:00:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 136364032. Throughput: 0: 1697.6. Samples: 29085354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:02,941][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 04:00:04,824][42004] Updated weights for policy 0, policy_version 33296 (0.0029) +[2024-11-08 04:00:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 136404992. Throughput: 0: 1677.5. Samples: 29095028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:07,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:00:10,295][42004] Updated weights for policy 0, policy_version 33306 (0.0025) +[2024-11-08 04:00:12,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 136437760. Throughput: 0: 1743.7. Samples: 29105422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:12,935][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 04:00:18,107][41694] Fps is (10 sec: 4025.5, 60 sec: 6330.3, 300 sec: 6674.6). Total num frames: 136445952. Throughput: 0: 1634.9. Samples: 29105422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:18,110][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:00:20,434][42004] Updated weights for policy 0, policy_version 33316 (0.0058) +[2024-11-08 04:00:22,932][41694] Fps is (10 sec: 3686.4, 60 sec: 6212.2, 300 sec: 6706.3). Total num frames: 136474624. Throughput: 0: 1581.0. Samples: 29114018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:22,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:00:27,272][42004] Updated weights for policy 0, policy_version 33326 (0.0040) +[2024-11-08 04:00:27,932][41694] Fps is (10 sec: 5836.4, 60 sec: 6417.0, 300 sec: 6678.5). Total num frames: 136503296. Throughput: 0: 1521.6. Samples: 29122854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:27,937][41694] Avg episode reward: [(0, '4.196')] +[2024-11-08 04:00:32,933][41694] Fps is (10 sec: 6143.7, 60 sec: 6348.7, 300 sec: 6650.8). Total num frames: 136536064. Throughput: 0: 1501.5. Samples: 29127316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:32,935][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 04:00:33,679][42004] Updated weights for policy 0, policy_version 33336 (0.0037) +[2024-11-08 04:00:37,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 136572928. Throughput: 0: 1492.4. Samples: 29137812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:37,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:00:39,021][42004] Updated weights for policy 0, policy_version 33346 (0.0032) +[2024-11-08 04:00:42,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6348.7, 300 sec: 6650.8). Total num frames: 136613888. Throughput: 0: 1519.5. Samples: 29149466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:42,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 04:00:44,431][42004] Updated weights for policy 0, policy_version 33356 (0.0023) +[2024-11-08 04:00:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6348.8, 300 sec: 6664.7). Total num frames: 136650752. Throughput: 0: 1554.5. Samples: 29155306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:47,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:00:49,659][42004] Updated weights for policy 0, policy_version 33366 (0.0022) +[2024-11-08 04:00:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6725.4). Total num frames: 136687616. Throughput: 0: 1596.1. Samples: 29166852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:52,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:00:54,981][42004] Updated weights for policy 0, policy_version 33376 (0.0031) +[2024-11-08 04:00:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6734.2). Total num frames: 136724480. Throughput: 0: 1606.2. Samples: 29177698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:57,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 04:01:01,086][42004] Updated weights for policy 0, policy_version 33386 (0.0030) +[2024-11-08 04:01:02,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 136761344. Throughput: 0: 1729.7. Samples: 29182954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:02,934][41694] Avg episode reward: [(0, '4.668')] +[2024-11-08 04:01:06,633][42004] Updated weights for policy 0, policy_version 33396 (0.0048) +[2024-11-08 04:01:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 136798208. Throughput: 0: 1773.6. Samples: 29193828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:07,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:01:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.4, 300 sec: 6720.2). Total num frames: 136826880. Throughput: 0: 1784.9. Samples: 29203174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:12,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 04:01:13,271][42004] Updated weights for policy 0, policy_version 33406 (0.0034) +[2024-11-08 04:01:17,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6983.6, 300 sec: 6720.2). Total num frames: 136863744. Throughput: 0: 1794.8. Samples: 29208078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:17,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 04:01:18,998][42004] Updated weights for policy 0, policy_version 33416 (0.0041) +[2024-11-08 04:01:23,949][41694] Fps is (10 sec: 6320.5, 60 sec: 6914.3, 300 sec: 6697.1). Total num frames: 136896512. Throughput: 0: 1761.8. Samples: 29218886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:23,951][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 04:01:26,190][42004] Updated weights for policy 0, policy_version 33426 (0.0033) +[2024-11-08 04:01:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6740.7). Total num frames: 136921088. Throughput: 0: 1728.2. Samples: 29227236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:27,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:01:32,366][42004] Updated weights for policy 0, policy_version 33436 (0.0031) +[2024-11-08 04:01:32,931][41694] Fps is (10 sec: 6383.6, 60 sec: 6963.3, 300 sec: 6734.1). Total num frames: 136953856. Throughput: 0: 1710.8. Samples: 29232294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:32,934][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 04:01:37,866][42004] Updated weights for policy 0, policy_version 33446 (0.0031) +[2024-11-08 04:01:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.4, 300 sec: 6748.0). Total num frames: 136994816. Throughput: 0: 1684.4. Samples: 29242648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:37,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:01:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033446_136994816.pth... +[2024-11-08 04:01:38,062][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033054_135389184.pth +[2024-11-08 04:01:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.3, 300 sec: 6734.1). Total num frames: 137031680. Throughput: 0: 1704.3. Samples: 29254392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:42,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:01:43,056][42004] Updated weights for policy 0, policy_version 33456 (0.0028) +[2024-11-08 04:01:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 137072640. Throughput: 0: 1714.1. Samples: 29260090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:47,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 04:01:48,331][42004] Updated weights for policy 0, policy_version 33466 (0.0040) +[2024-11-08 04:01:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 137109504. Throughput: 0: 1729.8. Samples: 29271668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:52,935][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 04:01:53,795][42004] Updated weights for policy 0, policy_version 33476 (0.0022) +[2024-11-08 04:01:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 137134080. Throughput: 0: 1727.3. Samples: 29280902. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:57,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 04:02:01,232][42004] Updated weights for policy 0, policy_version 33486 (0.0023) +[2024-11-08 04:02:02,938][41694] Fps is (10 sec: 5730.7, 60 sec: 6757.7, 300 sec: 6784.0). Total num frames: 137166848. Throughput: 0: 1709.8. Samples: 29285030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:02:02,940][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 04:02:07,629][42004] Updated weights for policy 0, policy_version 33496 (0.0038) +[2024-11-08 04:02:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 137199616. Throughput: 0: 1718.3. Samples: 29294462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:02:07,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:02:12,932][41694] Fps is (10 sec: 6967.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 137236480. Throughput: 0: 1739.0. Samples: 29305490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:12,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 04:02:13,059][42004] Updated weights for policy 0, policy_version 33506 (0.0024) +[2024-11-08 04:02:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 137277440. Throughput: 0: 1755.6. Samples: 29311294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:17,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:02:18,341][42004] Updated weights for policy 0, policy_version 33516 (0.0027) +[2024-11-08 04:02:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7083.3, 300 sec: 6761.9). Total num frames: 137314304. Throughput: 0: 1787.5. Samples: 29323086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 04:02:23,616][42004] Updated weights for policy 0, policy_version 33526 (0.0026) +[2024-11-08 04:02:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 137351168. Throughput: 0: 1772.3. Samples: 29334144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:02:29,422][42004] Updated weights for policy 0, policy_version 33536 (0.0033) +[2024-11-08 04:02:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 137375744. Throughput: 0: 1763.0. Samples: 29339426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:32,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 04:02:36,698][42004] Updated weights for policy 0, policy_version 33546 (0.0032) +[2024-11-08 04:02:37,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6774.1). Total num frames: 137408512. Throughput: 0: 1682.8. Samples: 29347394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:37,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 04:02:42,891][42004] Updated weights for policy 0, policy_version 33556 (0.0028) +[2024-11-08 04:02:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 137445376. Throughput: 0: 1697.1. Samples: 29357272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:42,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 04:02:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 137482240. Throughput: 0: 1729.0. Samples: 29362826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:47,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:02:48,312][42004] Updated weights for policy 0, policy_version 33566 (0.0026) +[2024-11-08 04:02:52,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6826.5, 300 sec: 6789.6). Total num frames: 137519104. Throughput: 0: 1775.0. Samples: 29374340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:52,935][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:02:53,698][42004] Updated weights for policy 0, policy_version 33576 (0.0022) +[2024-11-08 04:02:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 137555968. Throughput: 0: 1771.9. Samples: 29385226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:57,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 04:02:59,610][42004] Updated weights for policy 0, policy_version 33586 (0.0028) +[2024-11-08 04:03:02,932][41694] Fps is (10 sec: 6964.0, 60 sec: 7032.2, 300 sec: 6775.8). Total num frames: 137588736. Throughput: 0: 1753.3. Samples: 29390192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:02,933][41694] Avg episode reward: [(0, '4.720')] +[2024-11-08 04:03:07,475][42004] Updated weights for policy 0, policy_version 33596 (0.0034) +[2024-11-08 04:03:07,944][41694] Fps is (10 sec: 5318.2, 60 sec: 6825.2, 300 sec: 6733.8). Total num frames: 137609216. Throughput: 0: 1645.5. Samples: 29397152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:07,946][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 04:03:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 137641984. Throughput: 0: 1614.4. Samples: 29406792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:12,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:03:14,085][42004] Updated weights for policy 0, policy_version 33606 (0.0034) +[2024-11-08 04:03:17,931][41694] Fps is (10 sec: 6561.8, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 137674752. Throughput: 0: 1607.0. Samples: 29411740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:03:19,757][42004] Updated weights for policy 0, policy_version 33616 (0.0026) +[2024-11-08 04:03:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 137715712. Throughput: 0: 1677.0. Samples: 29422858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:22,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:03:25,178][42004] Updated weights for policy 0, policy_version 33626 (0.0027) +[2024-11-08 04:03:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 137748480. Throughput: 0: 1706.5. Samples: 29434066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:27,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:03:30,725][42004] Updated weights for policy 0, policy_version 33636 (0.0030) +[2024-11-08 04:03:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 137789440. Throughput: 0: 1702.3. Samples: 29439428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:32,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 04:03:36,510][42004] Updated weights for policy 0, policy_version 33646 (0.0025) +[2024-11-08 04:03:39,442][41694] Fps is (10 sec: 6405.4, 60 sec: 6725.7, 300 sec: 6727.5). Total num frames: 137822208. Throughput: 0: 1629.9. Samples: 29450144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:39,445][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:03:39,461][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033648_137822208.pth... +[2024-11-08 04:03:39,604][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033251_136196096.pth +[2024-11-08 04:03:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 137842688. Throughput: 0: 1600.0. Samples: 29457226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:42,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:03:44,440][42004] Updated weights for policy 0, policy_version 33656 (0.0031) +[2024-11-08 04:03:47,932][41694] Fps is (10 sec: 6272.0, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 137875456. Throughput: 0: 1599.2. Samples: 29462154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:47,936][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 04:03:50,738][42004] Updated weights for policy 0, policy_version 33666 (0.0026) +[2024-11-08 04:03:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.5, 300 sec: 6748.0). Total num frames: 137908224. Throughput: 0: 1665.8. Samples: 29472092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:52,936][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:03:56,904][42004] Updated weights for policy 0, policy_version 33676 (0.0029) +[2024-11-08 04:03:57,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 137940992. Throughput: 0: 1673.2. Samples: 29482086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:57,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 04:04:02,448][42004] Updated weights for policy 0, policy_version 33686 (0.0051) +[2024-11-08 04:04:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 137977856. Throughput: 0: 1690.2. Samples: 29487800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:02,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:04:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6759.8, 300 sec: 6720.2). Total num frames: 138014720. Throughput: 0: 1667.2. Samples: 29497882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:07,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:04:08,388][42004] Updated weights for policy 0, policy_version 33696 (0.0041) +[2024-11-08 04:04:13,416][41694] Fps is (10 sec: 5859.9, 60 sec: 6568.8, 300 sec: 6681.5). Total num frames: 138039296. Throughput: 0: 1521.3. Samples: 29503262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:13,418][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:04:16,400][42004] Updated weights for policy 0, policy_version 33706 (0.0029) +[2024-11-08 04:04:17,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6553.5, 300 sec: 6664.7). Total num frames: 138067968. Throughput: 0: 1574.2. Samples: 29510270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:17,935][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 04:04:22,534][42004] Updated weights for policy 0, policy_version 33716 (0.0030) +[2024-11-08 04:04:22,932][41694] Fps is (10 sec: 6456.8, 60 sec: 6417.0, 300 sec: 6720.2). Total num frames: 138100736. Throughput: 0: 1603.7. Samples: 29519890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 04:04:27,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 138137600. Throughput: 0: 1633.2. Samples: 29530720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:27,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:04:28,142][42004] Updated weights for policy 0, policy_version 33726 (0.0026) +[2024-11-08 04:04:32,933][41694] Fps is (10 sec: 7781.9, 60 sec: 6485.2, 300 sec: 6734.1). Total num frames: 138178560. Throughput: 0: 1648.1. Samples: 29536318. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:32,937][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:04:33,420][42004] Updated weights for policy 0, policy_version 33736 (0.0025) +[2024-11-08 04:04:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6722.8, 300 sec: 6720.2). Total num frames: 138215424. Throughput: 0: 1691.8. Samples: 29548224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:37,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:04:38,699][42004] Updated weights for policy 0, policy_version 33746 (0.0024) +[2024-11-08 04:04:42,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 138252288. Throughput: 0: 1715.4. Samples: 29559278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:42,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 04:04:44,363][42004] Updated weights for policy 0, policy_version 33756 (0.0031) +[2024-11-08 04:04:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 138276864. Throughput: 0: 1708.6. Samples: 29564686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:47,935][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:04:52,092][42004] Updated weights for policy 0, policy_version 33766 (0.0049) +[2024-11-08 04:04:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 138309632. Throughput: 0: 1649.5. Samples: 29572110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:52,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 04:04:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 138342400. Throughput: 0: 1770.5. Samples: 29582076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:57,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:04:58,113][42004] Updated weights for policy 0, policy_version 33776 (0.0031) +[2024-11-08 04:05:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 138379264. Throughput: 0: 1725.5. Samples: 29587916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:02,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:05:03,512][42004] Updated weights for policy 0, policy_version 33786 (0.0038) +[2024-11-08 04:05:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 138420224. Throughput: 0: 1764.0. Samples: 29599270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:07,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 04:05:08,826][42004] Updated weights for policy 0, policy_version 33796 (0.0034) +[2024-11-08 04:05:12,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6951.0, 300 sec: 6807.6). Total num frames: 138452992. Throughput: 0: 1767.2. Samples: 29610244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:12,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:05:15,151][42004] Updated weights for policy 0, policy_version 33806 (0.0032) +[2024-11-08 04:05:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6963.3, 300 sec: 6817.4). Total num frames: 138485760. Throughput: 0: 1744.8. Samples: 29614834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:17,935][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:05:22,790][42004] Updated weights for policy 0, policy_version 33816 (0.0032) +[2024-11-08 04:05:22,931][41694] Fps is (10 sec: 5735.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 138510336. Throughput: 0: 1650.9. Samples: 29622516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:22,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:05:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 138543104. Throughput: 0: 1616.3. Samples: 29632010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:27,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:05:29,137][42004] Updated weights for policy 0, policy_version 33826 (0.0025) +[2024-11-08 04:05:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6622.0, 300 sec: 6789.6). Total num frames: 138575872. Throughput: 0: 1603.3. Samples: 29636836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:32,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 04:05:34,711][42004] Updated weights for policy 0, policy_version 33836 (0.0026) +[2024-11-08 04:05:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 138616832. Throughput: 0: 1693.5. Samples: 29648316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:37,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 04:05:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033842_138616832.pth... +[2024-11-08 04:05:38,042][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033446_136994816.pth +[2024-11-08 04:05:40,121][42004] Updated weights for policy 0, policy_version 33846 (0.0030) +[2024-11-08 04:05:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 138653696. Throughput: 0: 1717.9. Samples: 29659380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:42,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:05:45,558][42004] Updated weights for policy 0, policy_version 33856 (0.0022) +[2024-11-08 04:05:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 138690560. Throughput: 0: 1712.1. Samples: 29664960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:47,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:05:51,273][42004] Updated weights for policy 0, policy_version 33866 (0.0040) +[2024-11-08 04:05:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6789.6). Total num frames: 138727424. Throughput: 0: 1706.8. Samples: 29676078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:05:52,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 04:05:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 138747904. Throughput: 0: 1631.5. Samples: 29683658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:05:57,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:05:58,541][42004] Updated weights for policy 0, policy_version 33876 (0.0032) +[2024-11-08 04:06:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 138780672. Throughput: 0: 1643.7. Samples: 29688802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:06:02,935][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 04:06:05,039][42004] Updated weights for policy 0, policy_version 33886 (0.0034) +[2024-11-08 04:06:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 138817536. Throughput: 0: 1693.1. Samples: 29698704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:06:07,934][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 04:06:10,694][42004] Updated weights for policy 0, policy_version 33896 (0.0029) +[2024-11-08 04:06:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6622.0, 300 sec: 6734.1). Total num frames: 138850304. Throughput: 0: 1714.5. Samples: 29709164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:12,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:06:16,329][42004] Updated weights for policy 0, policy_version 33906 (0.0030) +[2024-11-08 04:06:17,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6785.3). Total num frames: 138891264. Throughput: 0: 1724.7. Samples: 29714446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:17,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 04:06:21,815][42004] Updated weights for policy 0, policy_version 33916 (0.0026) +[2024-11-08 04:06:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 138928128. Throughput: 0: 1726.1. Samples: 29725992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:22,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 04:06:27,350][42004] Updated weights for policy 0, policy_version 33926 (0.0027) +[2024-11-08 04:06:29,106][41694] Fps is (10 sec: 6231.3, 60 sec: 6829.5, 300 sec: 6776.5). Total num frames: 138960896. Throughput: 0: 1684.1. Samples: 29737144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:29,113][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 04:06:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 138985472. Throughput: 0: 1657.7. Samples: 29739556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:32,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:06:35,210][42004] Updated weights for policy 0, policy_version 33936 (0.0035) +[2024-11-08 04:06:37,932][41694] Fps is (10 sec: 6033.4, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 139014144. Throughput: 0: 1624.2. Samples: 29749166. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:37,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 04:06:41,461][42004] Updated weights for policy 0, policy_version 33946 (0.0040) +[2024-11-08 04:06:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 139051008. Throughput: 0: 1682.5. Samples: 29759372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:42,944][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 04:06:47,029][42004] Updated weights for policy 0, policy_version 33956 (0.0022) +[2024-11-08 04:06:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 139087872. Throughput: 0: 1680.3. Samples: 29764416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:47,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 04:06:52,472][42004] Updated weights for policy 0, policy_version 33966 (0.0025) +[2024-11-08 04:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 139124736. Throughput: 0: 1715.3. Samples: 29775894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:52,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 04:06:57,857][42004] Updated weights for policy 0, policy_version 33976 (0.0035) +[2024-11-08 04:06:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6775.9). Total num frames: 139165696. Throughput: 0: 1739.6. Samples: 29787448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:57,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 04:07:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 139194368. Throughput: 0: 1745.9. Samples: 29793010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:02,933][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 04:07:04,237][42004] Updated weights for policy 0, policy_version 33986 (0.0028) +[2024-11-08 04:07:07,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6895.0, 300 sec: 6761.9). Total num frames: 139231232. Throughput: 0: 1692.4. Samples: 29802152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:07,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 04:07:10,265][42004] Updated weights for policy 0, policy_version 33996 (0.0020) +[2024-11-08 04:07:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 139264000. Throughput: 0: 1716.5. Samples: 29812372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:12,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:07:15,646][42004] Updated weights for policy 0, policy_version 34006 (0.0028) +[2024-11-08 04:07:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 139304960. Throughput: 0: 1752.0. Samples: 29818394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:17,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 04:07:20,916][42004] Updated weights for policy 0, policy_version 34016 (0.0026) +[2024-11-08 04:07:22,942][41694] Fps is (10 sec: 7774.1, 60 sec: 6893.7, 300 sec: 6747.7). Total num frames: 139341824. Throughput: 0: 1800.0. Samples: 29830186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:22,944][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:07:26,397][42004] Updated weights for policy 0, policy_version 34026 (0.0025) +[2024-11-08 04:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7102.2, 300 sec: 6789.6). Total num frames: 139378688. Throughput: 0: 1818.1. Samples: 29841188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:27,933][41694] Avg episode reward: [(0, '4.209')] +[2024-11-08 04:07:32,121][42004] Updated weights for policy 0, policy_version 34036 (0.0027) +[2024-11-08 04:07:32,931][41694] Fps is (10 sec: 7380.6, 60 sec: 7168.0, 300 sec: 6803.5). Total num frames: 139415552. Throughput: 0: 1820.5. Samples: 29846340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:32,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:07:37,931][41694] Fps is (10 sec: 6144.2, 60 sec: 7099.8, 300 sec: 6761.9). Total num frames: 139440128. Throughput: 0: 1745.2. Samples: 29854430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:37,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:07:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034043_139440128.pth... +[2024-11-08 04:07:38,063][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033648_137822208.pth +[2024-11-08 04:07:39,507][42004] Updated weights for policy 0, policy_version 34046 (0.0035) +[2024-11-08 04:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 139472896. Throughput: 0: 1712.1. Samples: 29864492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:42,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 04:07:45,953][42004] Updated weights for policy 0, policy_version 34056 (0.0042) +[2024-11-08 04:07:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 139505664. Throughput: 0: 1697.6. Samples: 29869402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:47,934][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 04:07:51,696][42004] Updated weights for policy 0, policy_version 34066 (0.0024) +[2024-11-08 04:07:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 139542528. Throughput: 0: 1728.3. Samples: 29879928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:52,937][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:07:57,527][42004] Updated weights for policy 0, policy_version 34076 (0.0031) +[2024-11-08 04:07:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 139575296. Throughput: 0: 1733.6. Samples: 29890386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 04:08:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6789.9). Total num frames: 139612160. Throughput: 0: 1716.5. Samples: 29895634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:02,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:08:03,423][42004] Updated weights for policy 0, policy_version 34086 (0.0035) +[2024-11-08 04:08:07,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 139649024. Throughput: 0: 1692.6. Samples: 29906336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:08:09,617][42004] Updated weights for policy 0, policy_version 34096 (0.0030) +[2024-11-08 04:08:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 139677696. Throughput: 0: 1664.6. Samples: 29916094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:08:15,764][42004] Updated weights for policy 0, policy_version 34106 (0.0025) +[2024-11-08 04:08:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 139710464. Throughput: 0: 1658.9. Samples: 29920990. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:17,933][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 04:08:21,562][42004] Updated weights for policy 0, policy_version 34116 (0.0028) +[2024-11-08 04:08:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6759.6, 300 sec: 6775.8). Total num frames: 139747328. Throughput: 0: 1711.2. Samples: 29931436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:22,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:08:27,023][42004] Updated weights for policy 0, policy_version 34126 (0.0023) +[2024-11-08 04:08:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 139784192. Throughput: 0: 1737.1. Samples: 29942662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:27,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 04:08:32,679][42004] Updated weights for policy 0, policy_version 34136 (0.0039) +[2024-11-08 04:08:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6810.6). Total num frames: 139821056. Throughput: 0: 1741.3. Samples: 29947760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:32,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 04:08:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 139857920. Throughput: 0: 1750.6. Samples: 29958706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:37,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 04:08:38,275][42004] Updated weights for policy 0, policy_version 34146 (0.0026) +[2024-11-08 04:08:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 139882496. Throughput: 0: 1727.8. Samples: 29968136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 04:08:45,676][42004] Updated weights for policy 0, policy_version 34156 (0.0047) +[2024-11-08 04:08:47,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 139915264. Throughput: 0: 1697.3. Samples: 29972012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:47,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:08:51,933][42004] Updated weights for policy 0, policy_version 34166 (0.0031) +[2024-11-08 04:08:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 139948032. Throughput: 0: 1680.4. Samples: 29981954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:52,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:08:57,345][42004] Updated weights for policy 0, policy_version 34176 (0.0026) +[2024-11-08 04:08:57,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 139988992. Throughput: 0: 1709.9. Samples: 29993040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:57,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 04:09:02,925][42004] Updated weights for policy 0, policy_version 34186 (0.0030) +[2024-11-08 04:09:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 140025856. Throughput: 0: 1726.0. Samples: 29998658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:09:02,936][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:09:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6870.4). Total num frames: 140062720. Throughput: 0: 1743.6. Samples: 30009900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:07,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 04:09:08,226][42004] Updated weights for policy 0, policy_version 34196 (0.0031) +[2024-11-08 04:09:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.9). Total num frames: 140099584. Throughput: 0: 1744.4. Samples: 30021160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:12,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 04:09:13,621][42004] Updated weights for policy 0, policy_version 34206 (0.0038) +[2024-11-08 04:09:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 140124160. Throughput: 0: 1760.7. Samples: 30026990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:17,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 04:09:20,899][42004] Updated weights for policy 0, policy_version 34216 (0.0036) +[2024-11-08 04:09:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 140161024. Throughput: 0: 1692.5. Samples: 30034870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:22,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:09:26,957][42004] Updated weights for policy 0, policy_version 34226 (0.0031) +[2024-11-08 04:09:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 140193792. Throughput: 0: 1709.5. Samples: 30045064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:27,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 04:09:32,169][42004] Updated weights for policy 0, policy_version 34236 (0.0028) +[2024-11-08 04:09:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 140234752. Throughput: 0: 1752.1. Samples: 30050856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:32,935][41694] Avg episode reward: [(0, '4.181')] +[2024-11-08 04:09:37,560][42004] Updated weights for policy 0, policy_version 34246 (0.0027) +[2024-11-08 04:09:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 140271616. Throughput: 0: 1791.1. Samples: 30062556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:37,935][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:09:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034246_140271616.pth... +[2024-11-08 04:09:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033842_138616832.pth +[2024-11-08 04:09:42,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 140308480. Throughput: 0: 1789.4. Samples: 30073564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:42,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 04:09:43,161][42004] Updated weights for policy 0, policy_version 34256 (0.0041) +[2024-11-08 04:09:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 140345344. Throughput: 0: 1780.3. Samples: 30078770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:47,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 04:09:48,872][42004] Updated weights for policy 0, policy_version 34266 (0.0031) +[2024-11-08 04:09:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 140365824. Throughput: 0: 1697.2. Samples: 30086272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:52,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 04:09:57,060][42004] Updated weights for policy 0, policy_version 34276 (0.0025) +[2024-11-08 04:09:57,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 140398592. Throughput: 0: 1666.4. Samples: 30096146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:57,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:10:02,833][42004] Updated weights for policy 0, policy_version 34286 (0.0024) +[2024-11-08 04:10:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 140435456. Throughput: 0: 1651.2. Samples: 30101296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:02,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:10:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 140472320. Throughput: 0: 1717.3. Samples: 30112150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:07,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 04:10:08,261][42004] Updated weights for policy 0, policy_version 34296 (0.0033) +[2024-11-08 04:10:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 140509184. Throughput: 0: 1742.4. Samples: 30123470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:12,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:10:14,043][42004] Updated weights for policy 0, policy_version 34306 (0.0029) +[2024-11-08 04:10:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 140541952. Throughput: 0: 1718.6. Samples: 30128190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:17,934][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 04:10:20,348][42004] Updated weights for policy 0, policy_version 34316 (0.0036) +[2024-11-08 04:10:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 140574720. Throughput: 0: 1685.6. Samples: 30138410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:22,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:10:27,722][42004] Updated weights for policy 0, policy_version 34326 (0.0033) +[2024-11-08 04:10:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 140599296. Throughput: 0: 1612.6. Samples: 30146132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:27,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:10:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 140632064. Throughput: 0: 1601.3. Samples: 30150828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:32,936][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:10:33,663][42004] Updated weights for policy 0, policy_version 34336 (0.0027) +[2024-11-08 04:10:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 140668928. Throughput: 0: 1680.9. Samples: 30161912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:37,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 04:10:39,075][42004] Updated weights for policy 0, policy_version 34346 (0.0019) +[2024-11-08 04:10:42,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6690.0, 300 sec: 6845.2). Total num frames: 140709888. Throughput: 0: 1719.2. Samples: 30173512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:42,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 04:10:44,479][42004] Updated weights for policy 0, policy_version 34356 (0.0025) +[2024-11-08 04:10:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 140746752. Throughput: 0: 1725.5. Samples: 30178944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:47,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:10:49,986][42004] Updated weights for policy 0, policy_version 34366 (0.0023) +[2024-11-08 04:10:52,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 140783616. Throughput: 0: 1735.8. Samples: 30190260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:10:55,666][42004] Updated weights for policy 0, policy_version 34376 (0.0037) +[2024-11-08 04:10:59,060][41694] Fps is (10 sec: 5889.0, 60 sec: 6767.6, 300 sec: 6860.6). Total num frames: 140812288. Throughput: 0: 1679.6. Samples: 30200950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:59,062][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 04:11:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 140836864. Throughput: 0: 1658.6. Samples: 30202828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:02,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:11:03,849][42004] Updated weights for policy 0, policy_version 34386 (0.0038) +[2024-11-08 04:11:07,931][41694] Fps is (10 sec: 6464.3, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 140869632. Throughput: 0: 1636.9. Samples: 30212070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:07,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:11:09,924][42004] Updated weights for policy 0, policy_version 34396 (0.0032) +[2024-11-08 04:11:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 140902400. Throughput: 0: 1684.4. Samples: 30221930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:12,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:11:16,161][42004] Updated weights for policy 0, policy_version 34406 (0.0028) +[2024-11-08 04:11:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 140939264. Throughput: 0: 1691.5. Samples: 30226944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:17,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:11:21,785][42004] Updated weights for policy 0, policy_version 34416 (0.0042) +[2024-11-08 04:11:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6858.6). Total num frames: 140976128. Throughput: 0: 1692.3. Samples: 30238066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:22,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 04:11:27,180][42004] Updated weights for policy 0, policy_version 34426 (0.0032) +[2024-11-08 04:11:27,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 141012992. Throughput: 0: 1689.0. Samples: 30249514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:27,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:11:33,054][41694] Fps is (10 sec: 6069.4, 60 sec: 6744.6, 300 sec: 6856.2). Total num frames: 141037568. Throughput: 0: 1687.7. Samples: 30255098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:33,056][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 04:11:34,442][42004] Updated weights for policy 0, policy_version 34436 (0.0035) +[2024-11-08 04:11:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 141070336. Throughput: 0: 1611.3. Samples: 30262768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:37,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 04:11:37,957][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034441_141070336.pth... +[2024-11-08 04:11:38,105][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034043_139440128.pth +[2024-11-08 04:11:40,600][42004] Updated weights for policy 0, policy_version 34446 (0.0044) +[2024-11-08 04:11:42,931][41694] Fps is (10 sec: 6635.2, 60 sec: 6553.7, 300 sec: 6831.3). Total num frames: 141103104. Throughput: 0: 1634.0. Samples: 30272634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:11:46,137][42004] Updated weights for policy 0, policy_version 34456 (0.0031) +[2024-11-08 04:11:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 141144064. Throughput: 0: 1676.7. Samples: 30278282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:47,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 04:11:51,510][42004] Updated weights for policy 0, policy_version 34466 (0.0030) +[2024-11-08 04:11:52,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 141180928. Throughput: 0: 1725.1. Samples: 30289698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:52,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 04:11:57,039][42004] Updated weights for policy 0, policy_version 34476 (0.0031) +[2024-11-08 04:11:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6888.0, 300 sec: 6859.1). Total num frames: 141217792. Throughput: 0: 1756.8. Samples: 30300988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:12:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 141250560. Throughput: 0: 1765.3. Samples: 30306382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:02,936][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:12:03,023][42004] Updated weights for policy 0, policy_version 34486 (0.0033) +[2024-11-08 04:12:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 141271040. Throughput: 0: 1705.9. Samples: 30314830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:07,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:12:11,313][42004] Updated weights for policy 0, policy_version 34496 (0.0028) +[2024-11-08 04:12:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 141303808. Throughput: 0: 1617.3. Samples: 30322292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:12,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:12:17,558][42004] Updated weights for policy 0, policy_version 34506 (0.0027) +[2024-11-08 04:12:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6762.1). Total num frames: 141336576. Throughput: 0: 1594.7. Samples: 30326664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:17,933][41694] Avg episode reward: [(0, '4.740')] +[2024-11-08 04:12:22,712][42004] Updated weights for policy 0, policy_version 34516 (0.0025) +[2024-11-08 04:12:22,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 141377536. Throughput: 0: 1684.2. Samples: 30338558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:22,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:12:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 141414400. Throughput: 0: 1727.6. Samples: 30350378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:27,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:12:28,002][42004] Updated weights for policy 0, policy_version 34526 (0.0025) +[2024-11-08 04:12:32,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6977.5, 300 sec: 6831.3). Total num frames: 141455360. Throughput: 0: 1728.7. Samples: 30356072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:32,935][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:12:33,264][42004] Updated weights for policy 0, policy_version 34536 (0.0028) +[2024-11-08 04:12:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 141492224. Throughput: 0: 1728.0. Samples: 30367456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:37,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:12:38,777][42004] Updated weights for policy 0, policy_version 34546 (0.0035) +[2024-11-08 04:12:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 141512704. Throughput: 0: 1636.6. Samples: 30374636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:42,936][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:12:47,174][42004] Updated weights for policy 0, policy_version 34556 (0.0026) +[2024-11-08 04:12:47,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 141545472. Throughput: 0: 1617.6. Samples: 30379176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:47,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 04:12:52,933][41694] Fps is (10 sec: 6552.7, 60 sec: 6621.7, 300 sec: 6789.6). Total num frames: 141578240. Throughput: 0: 1660.6. Samples: 30389560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:12:52,937][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 04:12:52,948][42004] Updated weights for policy 0, policy_version 34566 (0.0028) +[2024-11-08 04:12:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 141619200. Throughput: 0: 1741.1. Samples: 30400642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:12:57,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 04:12:58,397][42004] Updated weights for policy 0, policy_version 34576 (0.0030) +[2024-11-08 04:13:02,931][41694] Fps is (10 sec: 7374.0, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 141651968. Throughput: 0: 1765.5. Samples: 30406112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:02,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:13:04,729][42004] Updated weights for policy 0, policy_version 34586 (0.0033) +[2024-11-08 04:13:07,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 141684736. Throughput: 0: 1716.0. Samples: 30415776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:07,934][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 04:13:10,877][42004] Updated weights for policy 0, policy_version 34596 (0.0029) +[2024-11-08 04:13:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 141717504. Throughput: 0: 1680.9. Samples: 30426020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:12,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 04:13:17,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 141733888. Throughput: 0: 1619.2. Samples: 30428936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:17,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 04:13:19,541][42004] Updated weights for policy 0, policy_version 34606 (0.0031) +[2024-11-08 04:13:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.4, 300 sec: 6720.2). Total num frames: 141766656. Throughput: 0: 1524.4. Samples: 30436054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:22,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 04:13:25,771][42004] Updated weights for policy 0, policy_version 34616 (0.0029) +[2024-11-08 04:13:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 141799424. Throughput: 0: 1595.3. Samples: 30446424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:27,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 04:13:32,013][42004] Updated weights for policy 0, policy_version 34626 (0.0033) +[2024-11-08 04:13:32,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6692.4). Total num frames: 141832192. Throughput: 0: 1595.4. Samples: 30450968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:32,935][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:13:37,541][42004] Updated weights for policy 0, policy_version 34636 (0.0022) +[2024-11-08 04:13:37,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6280.4, 300 sec: 6734.1). Total num frames: 141869056. Throughput: 0: 1608.6. Samples: 30461946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:37,936][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 04:13:38,056][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034637_141873152.pth... +[2024-11-08 04:13:38,152][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034246_140271616.pth +[2024-11-08 04:13:42,919][42004] Updated weights for policy 0, policy_version 34646 (0.0032) +[2024-11-08 04:13:42,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 141910016. Throughput: 0: 1619.6. Samples: 30473526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:42,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:13:47,932][41694] Fps is (10 sec: 7783.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 141946880. Throughput: 0: 1618.4. Samples: 30478942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:47,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:13:50,333][42004] Updated weights for policy 0, policy_version 34656 (0.0025) +[2024-11-08 04:13:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.5, 300 sec: 6706.3). Total num frames: 141967360. Throughput: 0: 1568.5. Samples: 30486360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:52,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:13:56,673][42004] Updated weights for policy 0, policy_version 34666 (0.0026) +[2024-11-08 04:13:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6348.8, 300 sec: 6692.4). Total num frames: 142000128. Throughput: 0: 1556.3. Samples: 30496052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:57,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:14:02,267][42004] Updated weights for policy 0, policy_version 34676 (0.0025) +[2024-11-08 04:14:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 142036992. Throughput: 0: 1615.1. Samples: 30501616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:02,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 04:14:07,517][42004] Updated weights for policy 0, policy_version 34686 (0.0026) +[2024-11-08 04:14:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 142073856. Throughput: 0: 1708.8. Samples: 30512952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:14:12,762][42004] Updated weights for policy 0, policy_version 34696 (0.0038) +[2024-11-08 04:14:12,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 142114816. Throughput: 0: 1739.3. Samples: 30524694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:12,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 04:14:17,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 142151680. Throughput: 0: 1765.0. Samples: 30530392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:17,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 04:14:18,282][42004] Updated weights for policy 0, policy_version 34706 (0.0030) +[2024-11-08 04:14:24,551][41694] Fps is (10 sec: 5992.6, 60 sec: 6780.2, 300 sec: 6711.1). Total num frames: 142184448. Throughput: 0: 1707.4. Samples: 30541540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:24,553][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:14:25,962][42004] Updated weights for policy 0, policy_version 34716 (0.0040) +[2024-11-08 04:14:27,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6758.3, 300 sec: 6678.6). Total num frames: 142204928. Throughput: 0: 1658.6. Samples: 30548164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:27,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:14:32,149][42004] Updated weights for policy 0, policy_version 34726 (0.0022) +[2024-11-08 04:14:32,932][41694] Fps is (10 sec: 6842.5, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 142241792. Throughput: 0: 1645.4. Samples: 30552984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:32,935][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 04:14:37,286][42004] Updated weights for policy 0, policy_version 34736 (0.0032) +[2024-11-08 04:14:37,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6895.1, 300 sec: 6692.4). Total num frames: 142282752. Throughput: 0: 1741.1. Samples: 30564708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:37,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:14:42,487][42004] Updated weights for policy 0, policy_version 34746 (0.0024) +[2024-11-08 04:14:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 142319616. Throughput: 0: 1788.9. Samples: 30576550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:42,933][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 04:14:47,696][42004] Updated weights for policy 0, policy_version 34756 (0.0029) +[2024-11-08 04:14:47,934][41694] Fps is (10 sec: 7780.3, 60 sec: 6894.6, 300 sec: 6761.8). Total num frames: 142360576. Throughput: 0: 1792.7. Samples: 30582294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:47,937][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 04:14:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6775.8). Total num frames: 142397440. Throughput: 0: 1797.6. Samples: 30593844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:52,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:14:53,431][42004] Updated weights for policy 0, policy_version 34766 (0.0036) +[2024-11-08 04:14:58,936][41694] Fps is (10 sec: 5956.9, 60 sec: 6982.9, 300 sec: 6725.1). Total num frames: 142426112. Throughput: 0: 1616.6. Samples: 30599064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:58,938][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 04:15:01,728][42004] Updated weights for policy 0, policy_version 34776 (0.0028) +[2024-11-08 04:15:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 142446592. Throughput: 0: 1673.7. Samples: 30605708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:02,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:15:07,726][42004] Updated weights for policy 0, policy_version 34786 (0.0030) +[2024-11-08 04:15:07,934][41694] Fps is (10 sec: 6372.9, 60 sec: 6826.4, 300 sec: 6692.4). Total num frames: 142483456. Throughput: 0: 1697.4. Samples: 30615176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:07,936][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:15:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 142520320. Throughput: 0: 1743.7. Samples: 30626628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:12,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:15:12,990][42004] Updated weights for policy 0, policy_version 34796 (0.0029) +[2024-11-08 04:15:17,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 142553088. Throughput: 0: 1746.9. Samples: 30631596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:17,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 04:15:19,232][42004] Updated weights for policy 0, policy_version 34806 (0.0022) +[2024-11-08 04:15:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6945.9, 300 sec: 6748.0). Total num frames: 142589952. Throughput: 0: 1721.1. Samples: 30642156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:22,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 04:15:24,505][42004] Updated weights for policy 0, policy_version 34816 (0.0027) +[2024-11-08 04:15:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.6, 300 sec: 6761.9). Total num frames: 142626816. Throughput: 0: 1706.5. Samples: 30653342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:27,936][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 04:15:30,193][42004] Updated weights for policy 0, policy_version 34826 (0.0031) +[2024-11-08 04:15:33,395][41694] Fps is (10 sec: 6263.4, 60 sec: 6842.1, 300 sec: 6723.5). Total num frames: 142655488. Throughput: 0: 1687.5. Samples: 30659010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:33,396][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 04:15:37,934][41694] Fps is (10 sec: 5732.8, 60 sec: 6689.8, 300 sec: 6692.4). Total num frames: 142684160. Throughput: 0: 1588.1. Samples: 30665312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:37,939][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:15:37,971][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034835_142684160.pth... +[2024-11-08 04:15:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034441_141070336.pth +[2024-11-08 04:15:38,432][42004] Updated weights for policy 0, policy_version 34836 (0.0030) +[2024-11-08 04:15:42,932][41694] Fps is (10 sec: 6871.9, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 142721024. Throughput: 0: 1763.6. Samples: 30676654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:42,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 04:15:43,737][42004] Updated weights for policy 0, policy_version 34846 (0.0031) +[2024-11-08 04:15:47,932][41694] Fps is (10 sec: 7374.9, 60 sec: 6622.2, 300 sec: 6692.4). Total num frames: 142757888. Throughput: 0: 1700.4. Samples: 30682226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:47,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 04:15:48,984][42004] Updated weights for policy 0, policy_version 34856 (0.0021) +[2024-11-08 04:15:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6760.0). Total num frames: 142798848. Throughput: 0: 1742.8. Samples: 30693598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:52,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:15:54,543][42004] Updated weights for policy 0, policy_version 34866 (0.0030) +[2024-11-08 04:15:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6942.9, 300 sec: 6775.8). Total num frames: 142835712. Throughput: 0: 1746.5. Samples: 30705220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:15:57,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 04:15:59,979][42004] Updated weights for policy 0, policy_version 34876 (0.0026) +[2024-11-08 04:16:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6789.6). Total num frames: 142872576. Throughput: 0: 1756.7. Samples: 30710646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:02,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 04:16:07,550][42004] Updated weights for policy 0, policy_version 34886 (0.0032) +[2024-11-08 04:16:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.9, 300 sec: 6748.0). Total num frames: 142893056. Throughput: 0: 1745.9. Samples: 30720722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:07,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:16:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 142921728. Throughput: 0: 1646.1. Samples: 30727416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:12,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:16:14,048][42004] Updated weights for policy 0, policy_version 34896 (0.0027) +[2024-11-08 04:16:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 142958592. Throughput: 0: 1651.4. Samples: 30732560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:17,934][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 04:16:19,891][42004] Updated weights for policy 0, policy_version 34906 (0.0037) +[2024-11-08 04:16:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 142995456. Throughput: 0: 1737.2. Samples: 30743480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:22,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:16:25,311][42004] Updated weights for policy 0, policy_version 34916 (0.0026) +[2024-11-08 04:16:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6764.7). Total num frames: 143032320. Throughput: 0: 1734.3. Samples: 30754698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:27,933][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 04:16:30,603][42004] Updated weights for policy 0, policy_version 34926 (0.0032) +[2024-11-08 04:16:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7017.4, 300 sec: 6789.6). Total num frames: 143073280. Throughput: 0: 1741.4. Samples: 30760588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:32,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:16:36,011][42004] Updated weights for policy 0, policy_version 34936 (0.0027) +[2024-11-08 04:16:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7100.1, 300 sec: 6803.5). Total num frames: 143110144. Throughput: 0: 1740.5. Samples: 30771920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:37,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 04:16:42,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6826.4, 300 sec: 6734.1). Total num frames: 143130624. Throughput: 0: 1637.9. Samples: 30778930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:42,936][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:16:44,059][42004] Updated weights for policy 0, policy_version 34946 (0.0023) +[2024-11-08 04:16:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 143163392. Throughput: 0: 1625.5. Samples: 30783796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:16:49,602][42004] Updated weights for policy 0, policy_version 34956 (0.0028) +[2024-11-08 04:16:52,931][41694] Fps is (10 sec: 7374.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 143204352. Throughput: 0: 1654.8. Samples: 30795186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:52,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:16:55,036][42004] Updated weights for policy 0, policy_version 34966 (0.0025) +[2024-11-08 04:16:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 143241216. Throughput: 0: 1763.0. Samples: 30806752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:57,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:17:00,502][42004] Updated weights for policy 0, policy_version 34976 (0.0022) +[2024-11-08 04:17:02,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.3, 300 sec: 6803.5). Total num frames: 143278080. Throughput: 0: 1768.1. Samples: 30812126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:02,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:17:05,905][42004] Updated weights for policy 0, policy_version 34986 (0.0031) +[2024-11-08 04:17:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 143314944. Throughput: 0: 1776.3. Samples: 30823412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:07,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 04:17:11,491][42004] Updated weights for policy 0, policy_version 34996 (0.0031) +[2024-11-08 04:17:12,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 143351808. Throughput: 0: 1773.9. Samples: 30834524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:12,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 04:17:17,935][41694] Fps is (10 sec: 5323.1, 60 sec: 6826.3, 300 sec: 6747.9). Total num frames: 143368192. Throughput: 0: 1714.5. Samples: 30837748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:17,937][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 04:17:19,805][42004] Updated weights for policy 0, policy_version 35006 (0.0028) +[2024-11-08 04:17:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 143405056. Throughput: 0: 1645.9. Samples: 30845986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:22,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 04:17:25,294][42004] Updated weights for policy 0, policy_version 35016 (0.0021) +[2024-11-08 04:17:27,931][41694] Fps is (10 sec: 7375.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 143441920. Throughput: 0: 1730.9. Samples: 30856816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:27,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:17:30,822][42004] Updated weights for policy 0, policy_version 35026 (0.0030) +[2024-11-08 04:17:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 143478784. Throughput: 0: 1752.2. Samples: 30862646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:32,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 04:17:36,321][42004] Updated weights for policy 0, policy_version 35036 (0.0024) +[2024-11-08 04:17:37,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6758.3, 300 sec: 6789.6). Total num frames: 143515648. Throughput: 0: 1746.5. Samples: 30873782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:37,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:17:37,993][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035039_143519744.pth... +[2024-11-08 04:17:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034637_141873152.pth +[2024-11-08 04:17:41,911][42004] Updated weights for policy 0, policy_version 35046 (0.0032) +[2024-11-08 04:17:42,933][41694] Fps is (10 sec: 7371.8, 60 sec: 7031.6, 300 sec: 6803.5). Total num frames: 143552512. Throughput: 0: 1735.1. Samples: 30884832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:42,935][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:17:47,261][42004] Updated weights for policy 0, policy_version 35056 (0.0045) +[2024-11-08 04:17:47,932][41694] Fps is (10 sec: 7783.2, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 143593472. Throughput: 0: 1741.3. Samples: 30890484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:47,933][41694] Avg episode reward: [(0, '4.100')] +[2024-11-08 04:17:52,932][41694] Fps is (10 sec: 6144.8, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 143613952. Throughput: 0: 1660.0. Samples: 30898110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:52,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 04:17:54,849][42004] Updated weights for policy 0, policy_version 35066 (0.0029) +[2024-11-08 04:17:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 143650816. Throughput: 0: 1661.3. Samples: 30909282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:57,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 04:18:00,169][42004] Updated weights for policy 0, policy_version 35076 (0.0036) +[2024-11-08 04:18:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 143683584. Throughput: 0: 1719.8. Samples: 30915134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:02,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:18:06,610][42004] Updated weights for policy 0, policy_version 35086 (0.0026) +[2024-11-08 04:18:07,933][41694] Fps is (10 sec: 6552.5, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 143716352. Throughput: 0: 1744.7. Samples: 30924502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:07,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 04:18:12,471][42004] Updated weights for policy 0, policy_version 35096 (0.0038) +[2024-11-08 04:18:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 143753216. Throughput: 0: 1735.4. Samples: 30934908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:12,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 04:18:17,932][41694] Fps is (10 sec: 7373.8, 60 sec: 7031.8, 300 sec: 6859.1). Total num frames: 143790080. Throughput: 0: 1726.7. Samples: 30940350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:17,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:18:18,054][42004] Updated weights for policy 0, policy_version 35106 (0.0025) +[2024-11-08 04:18:24,612][41694] Fps is (10 sec: 6312.2, 60 sec: 6839.9, 300 sec: 6834.0). Total num frames: 143826944. Throughput: 0: 1664.8. Samples: 30951494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:24,615][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:18:25,449][42004] Updated weights for policy 0, policy_version 35116 (0.0038) +[2024-11-08 04:18:27,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 143851520. Throughput: 0: 1638.9. Samples: 30958580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:27,934][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 04:18:31,412][42004] Updated weights for policy 0, policy_version 35126 (0.0031) +[2024-11-08 04:18:32,932][41694] Fps is (10 sec: 6891.9, 60 sec: 6758.3, 300 sec: 6831.3). Total num frames: 143884288. Throughput: 0: 1635.8. Samples: 30964096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:32,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:18:37,206][42004] Updated weights for policy 0, policy_version 35136 (0.0034) +[2024-11-08 04:18:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 143921152. Throughput: 0: 1704.4. Samples: 30974808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:18:42,796][42004] Updated weights for policy 0, policy_version 35146 (0.0025) +[2024-11-08 04:18:42,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 143958016. Throughput: 0: 1696.6. Samples: 30985630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:42,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 04:18:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 143994880. Throughput: 0: 1686.8. Samples: 30991038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:47,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:18:48,165][42004] Updated weights for policy 0, policy_version 35156 (0.0033) +[2024-11-08 04:18:52,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6963.1, 300 sec: 6886.8). Total num frames: 144031744. Throughput: 0: 1729.2. Samples: 31002316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:52,935][41694] Avg episode reward: [(0, '4.786')] +[2024-11-08 04:18:53,646][42004] Updated weights for policy 0, policy_version 35166 (0.0034) +[2024-11-08 04:18:58,741][41694] Fps is (10 sec: 6062.8, 60 sec: 6735.8, 300 sec: 6840.3). Total num frames: 144060416. Throughput: 0: 1600.5. Samples: 31008226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:58,743][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:19:01,234][42004] Updated weights for policy 0, policy_version 35176 (0.0033) +[2024-11-08 04:19:02,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 144089088. Throughput: 0: 1669.6. Samples: 31015482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:02,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:19:06,892][42004] Updated weights for policy 0, policy_version 35186 (0.0027) +[2024-11-08 04:19:07,931][41694] Fps is (10 sec: 7130.8, 60 sec: 6826.9, 300 sec: 6817.4). Total num frames: 144125952. Throughput: 0: 1727.1. Samples: 31026312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:07,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:19:12,868][42004] Updated weights for policy 0, policy_version 35196 (0.0035) +[2024-11-08 04:19:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 144162816. Throughput: 0: 1737.9. Samples: 31036786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:12,932][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 04:19:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6869.0). Total num frames: 144199680. Throughput: 0: 1731.3. Samples: 31042004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:17,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:19:18,448][42004] Updated weights for policy 0, policy_version 35206 (0.0034) +[2024-11-08 04:19:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7023.4, 300 sec: 6886.9). Total num frames: 144236544. Throughput: 0: 1739.5. Samples: 31053086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:22,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:19:23,945][42004] Updated weights for policy 0, policy_version 35216 (0.0031) +[2024-11-08 04:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 144273408. Throughput: 0: 1747.8. Samples: 31064280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:27,933][41694] Avg episode reward: [(0, '4.675')] +[2024-11-08 04:19:29,628][42004] Updated weights for policy 0, policy_version 35226 (0.0026) +[2024-11-08 04:19:32,998][41694] Fps is (10 sec: 5696.4, 60 sec: 6819.2, 300 sec: 6815.9). Total num frames: 144293888. Throughput: 0: 1744.4. Samples: 31069654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:33,000][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 04:19:37,174][42004] Updated weights for policy 0, policy_version 35236 (0.0025) +[2024-11-08 04:19:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 144330752. Throughput: 0: 1657.0. Samples: 31076878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:37,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 04:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035237_144330752.pth... +[2024-11-08 04:19:38,041][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034835_142684160.pth +[2024-11-08 04:19:42,838][42004] Updated weights for policy 0, policy_version 35246 (0.0029) +[2024-11-08 04:19:42,932][41694] Fps is (10 sec: 7422.2, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 144367616. Throughput: 0: 1809.9. Samples: 31088206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:42,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:19:47,933][41694] Fps is (10 sec: 6552.9, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 144396288. Throughput: 0: 1708.8. Samples: 31092380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:47,934][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 04:19:49,031][42004] Updated weights for policy 0, policy_version 35256 (0.0041) +[2024-11-08 04:19:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.2, 300 sec: 6826.8). Total num frames: 144433152. Throughput: 0: 1703.3. Samples: 31102962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:52,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 04:19:54,760][42004] Updated weights for policy 0, policy_version 35266 (0.0034) +[2024-11-08 04:19:57,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6920.0, 300 sec: 6859.1). Total num frames: 144470016. Throughput: 0: 1722.3. Samples: 31114288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:57,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:20:00,517][42004] Updated weights for policy 0, policy_version 35276 (0.0022) +[2024-11-08 04:20:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 144506880. Throughput: 0: 1717.6. Samples: 31119296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:02,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:20:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 144527360. Throughput: 0: 1675.4. Samples: 31128478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:07,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:20:08,272][42004] Updated weights for policy 0, policy_version 35286 (0.0034) +[2024-11-08 04:20:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 144564224. Throughput: 0: 1623.4. Samples: 31137332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:12,934][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 04:20:13,922][42004] Updated weights for policy 0, policy_version 35296 (0.0036) +[2024-11-08 04:20:17,933][41694] Fps is (10 sec: 6553.2, 60 sec: 6553.5, 300 sec: 6789.6). Total num frames: 144592896. Throughput: 0: 1615.1. Samples: 31142228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:17,934][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 04:20:20,947][42004] Updated weights for policy 0, policy_version 35306 (0.0032) +[2024-11-08 04:20:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6775.8). Total num frames: 144625664. Throughput: 0: 1647.7. Samples: 31151024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:22,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:20:26,583][42004] Updated weights for policy 0, policy_version 35316 (0.0037) +[2024-11-08 04:20:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6485.3, 300 sec: 6814.2). Total num frames: 144662528. Throughput: 0: 1639.4. Samples: 31161980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:27,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 04:20:32,472][42004] Updated weights for policy 0, policy_version 35326 (0.0025) +[2024-11-08 04:20:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6697.6, 300 sec: 6817.5). Total num frames: 144695296. Throughput: 0: 1661.0. Samples: 31167124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:32,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 04:20:37,884][42004] Updated weights for policy 0, policy_version 35336 (0.0032) +[2024-11-08 04:20:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 144736256. Throughput: 0: 1667.4. Samples: 31177996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:37,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:20:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 144752640. Throughput: 0: 1575.2. Samples: 31185170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:42,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 04:20:45,913][42004] Updated weights for policy 0, policy_version 35346 (0.0025) +[2024-11-08 04:20:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.7, 300 sec: 6748.0). Total num frames: 144789504. Throughput: 0: 1576.4. Samples: 31190234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:47,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 04:20:51,844][42004] Updated weights for policy 0, policy_version 35356 (0.0028) +[2024-11-08 04:20:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 144822272. Throughput: 0: 1604.9. Samples: 31200700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:52,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:20:57,691][42004] Updated weights for policy 0, policy_version 35366 (0.0036) +[2024-11-08 04:20:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 144859136. Throughput: 0: 1637.9. Samples: 31211038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:57,938][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:21:02,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6417.0, 300 sec: 6775.7). Total num frames: 144891904. Throughput: 0: 1646.8. Samples: 31216336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:02,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 04:21:03,614][42004] Updated weights for policy 0, policy_version 35376 (0.0043) +[2024-11-08 04:21:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 144928768. Throughput: 0: 1685.7. Samples: 31226882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:07,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:21:09,634][42004] Updated weights for policy 0, policy_version 35386 (0.0025) +[2024-11-08 04:21:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.8, 300 sec: 6789.6). Total num frames: 144961536. Throughput: 0: 1657.1. Samples: 31236548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:12,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:21:17,845][42004] Updated weights for policy 0, policy_version 35396 (0.0029) +[2024-11-08 04:21:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6734.1). Total num frames: 144982016. Throughput: 0: 1601.8. Samples: 31239204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:17,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 04:21:22,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 145018880. Throughput: 0: 1575.8. Samples: 31248908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:22,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 04:21:23,361][42004] Updated weights for policy 0, policy_version 35406 (0.0038) +[2024-11-08 04:21:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 145051648. Throughput: 0: 1647.1. Samples: 31259288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:27,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 04:21:29,562][42004] Updated weights for policy 0, policy_version 35416 (0.0034) +[2024-11-08 04:21:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 145084416. Throughput: 0: 1648.2. Samples: 31264404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:32,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 04:21:35,262][42004] Updated weights for policy 0, policy_version 35426 (0.0029) +[2024-11-08 04:21:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6748.0). Total num frames: 145121280. Throughput: 0: 1656.8. Samples: 31275258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:37,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 04:21:38,022][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035431_145125376.pth... +[2024-11-08 04:21:38,141][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035039_143519744.pth +[2024-11-08 04:21:40,990][42004] Updated weights for policy 0, policy_version 35436 (0.0027) +[2024-11-08 04:21:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 145158144. Throughput: 0: 1667.2. Samples: 31286060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:42,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:21:46,535][42004] Updated weights for policy 0, policy_version 35446 (0.0027) +[2024-11-08 04:21:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 145195008. Throughput: 0: 1669.0. Samples: 31291438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:47,935][41694] Avg episode reward: [(0, '4.758')] +[2024-11-08 04:21:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 145215488. Throughput: 0: 1592.1. Samples: 31298526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:52,935][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:21:54,487][42004] Updated weights for policy 0, policy_version 35456 (0.0040) +[2024-11-08 04:21:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 145248256. Throughput: 0: 1611.7. Samples: 31309072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:57,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:22:00,455][42004] Updated weights for policy 0, policy_version 35466 (0.0035) +[2024-11-08 04:22:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 145281024. Throughput: 0: 1658.0. Samples: 31313814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:22:02,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:22:06,557][42004] Updated weights for policy 0, policy_version 35476 (0.0036) +[2024-11-08 04:22:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 145317888. Throughput: 0: 1669.8. Samples: 31324050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:07,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 04:22:12,024][42004] Updated weights for policy 0, policy_version 35486 (0.0033) +[2024-11-08 04:22:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6734.2). Total num frames: 145354752. Throughput: 0: 1689.2. Samples: 31335304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:12,933][41694] Avg episode reward: [(0, '4.725')] +[2024-11-08 04:22:17,470][42004] Updated weights for policy 0, policy_version 35496 (0.0027) +[2024-11-08 04:22:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 145391616. Throughput: 0: 1693.8. Samples: 31340624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:17,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 04:22:24,478][41694] Fps is (10 sec: 6030.6, 60 sec: 6588.6, 300 sec: 6685.2). Total num frames: 145424384. Throughput: 0: 1642.1. Samples: 31351690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:24,480][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 04:22:25,223][42004] Updated weights for policy 0, policy_version 35506 (0.0024) +[2024-11-08 04:22:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 145448960. Throughput: 0: 1624.6. Samples: 31359168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:27,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 04:22:30,654][42004] Updated weights for policy 0, policy_version 35516 (0.0045) +[2024-11-08 04:22:32,931][41694] Fps is (10 sec: 7267.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 145485824. Throughput: 0: 1631.1. Samples: 31364838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:32,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 04:22:36,802][42004] Updated weights for policy 0, policy_version 35526 (0.0044) +[2024-11-08 04:22:37,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 145518592. Throughput: 0: 1690.9. Samples: 31374618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:37,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 04:22:42,550][42004] Updated weights for policy 0, policy_version 35536 (0.0038) +[2024-11-08 04:22:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 145555456. Throughput: 0: 1696.9. Samples: 31385432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:42,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 04:22:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 145592320. Throughput: 0: 1719.2. Samples: 31391180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:47,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:22:47,971][42004] Updated weights for policy 0, policy_version 35546 (0.0024) +[2024-11-08 04:22:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 145633280. Throughput: 0: 1740.7. Samples: 31402380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:52,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 04:22:53,323][42004] Updated weights for policy 0, policy_version 35556 (0.0032) +[2024-11-08 04:22:58,730][41694] Fps is (10 sec: 6448.5, 60 sec: 6804.4, 300 sec: 6688.2). Total num frames: 145661952. Throughput: 0: 1592.5. Samples: 31408240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:58,732][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:23:00,948][42004] Updated weights for policy 0, policy_version 35566 (0.0026) +[2024-11-08 04:23:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 145690624. Throughput: 0: 1660.2. Samples: 31415332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:02,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 04:23:07,143][42004] Updated weights for policy 0, policy_version 35576 (0.0036) +[2024-11-08 04:23:07,931][41694] Fps is (10 sec: 6677.2, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 145723392. Throughput: 0: 1701.5. Samples: 31425628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:07,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:23:12,839][42004] Updated weights for policy 0, policy_version 35586 (0.0031) +[2024-11-08 04:23:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 145760256. Throughput: 0: 1709.1. Samples: 31436076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:12,937][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 04:23:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6716.8). Total num frames: 145797120. Throughput: 0: 1701.1. Samples: 31441386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:17,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 04:23:18,252][42004] Updated weights for policy 0, policy_version 35596 (0.0026) +[2024-11-08 04:23:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7007.2, 300 sec: 6720.2). Total num frames: 145833984. Throughput: 0: 1741.8. Samples: 31453000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:22,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:23:23,639][42004] Updated weights for policy 0, policy_version 35606 (0.0032) +[2024-11-08 04:23:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 145874944. Throughput: 0: 1757.0. Samples: 31464496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:27,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:23:29,086][42004] Updated weights for policy 0, policy_version 35616 (0.0031) +[2024-11-08 04:23:32,968][41694] Fps is (10 sec: 5713.7, 60 sec: 6754.3, 300 sec: 6677.7). Total num frames: 145891328. Throughput: 0: 1744.2. Samples: 31469732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:32,969][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 04:23:36,698][42004] Updated weights for policy 0, policy_version 35626 (0.0021) +[2024-11-08 04:23:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6692.4). Total num frames: 145932288. Throughput: 0: 1666.3. Samples: 31477366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:37,934][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 04:23:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035628_145932288.pth... +[2024-11-08 04:23:38,205][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035237_144330752.pth +[2024-11-08 04:23:42,932][41694] Fps is (10 sec: 6988.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 145960960. Throughput: 0: 1781.2. Samples: 31486974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:42,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 04:23:43,087][42004] Updated weights for policy 0, policy_version 35636 (0.0045) +[2024-11-08 04:23:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 145997824. Throughput: 0: 1705.2. Samples: 31492068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:23:48,560][42004] Updated weights for policy 0, policy_version 35646 (0.0038) +[2024-11-08 04:23:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6724.8). Total num frames: 146038784. Throughput: 0: 1736.0. Samples: 31503748. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:52,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 04:23:53,876][42004] Updated weights for policy 0, policy_version 35656 (0.0033) +[2024-11-08 04:23:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6918.7, 300 sec: 6720.2). Total num frames: 146071552. Throughput: 0: 1738.6. Samples: 31514312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:57,933][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 04:23:59,729][42004] Updated weights for policy 0, policy_version 35666 (0.0029) +[2024-11-08 04:24:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6720.2). Total num frames: 146108416. Throughput: 0: 1746.4. Samples: 31519976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:02,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 04:24:07,378][42004] Updated weights for policy 0, policy_version 35676 (0.0027) +[2024-11-08 04:24:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 146132992. Throughput: 0: 1686.8. Samples: 31528906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:07,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 04:24:12,906][42004] Updated weights for policy 0, policy_version 35686 (0.0026) +[2024-11-08 04:24:12,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 146169856. Throughput: 0: 1638.4. Samples: 31538226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:12,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:24:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 146202624. Throughput: 0: 1644.7. Samples: 31543684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:17,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:24:18,714][42004] Updated weights for policy 0, policy_version 35696 (0.0025) +[2024-11-08 04:24:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 146239488. Throughput: 0: 1702.2. Samples: 31553966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:22,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:24:24,506][42004] Updated weights for policy 0, policy_version 35706 (0.0035) +[2024-11-08 04:24:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6707.8). Total num frames: 146272256. Throughput: 0: 1732.8. Samples: 31564952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:27,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 04:24:30,628][42004] Updated weights for policy 0, policy_version 35716 (0.0045) +[2024-11-08 04:24:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6967.4, 300 sec: 6706.3). Total num frames: 146309120. Throughput: 0: 1725.5. Samples: 31569716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:32,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 04:24:35,954][42004] Updated weights for policy 0, policy_version 35726 (0.0030) +[2024-11-08 04:24:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 146345984. Throughput: 0: 1719.2. Samples: 31581110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:37,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 04:24:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 146370560. Throughput: 0: 1651.6. Samples: 31588636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:42,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:24:43,373][42004] Updated weights for policy 0, policy_version 35736 (0.0031) +[2024-11-08 04:24:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 146403328. Throughput: 0: 1645.7. Samples: 31594032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:47,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 04:24:49,180][42004] Updated weights for policy 0, policy_version 35746 (0.0029) +[2024-11-08 04:24:52,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 146440192. Throughput: 0: 1677.3. Samples: 31604386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:52,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:24:54,971][42004] Updated weights for policy 0, policy_version 35756 (0.0024) +[2024-11-08 04:24:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 146477056. Throughput: 0: 1718.3. Samples: 31615550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:57,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:25:00,368][42004] Updated weights for policy 0, policy_version 35766 (0.0039) +[2024-11-08 04:25:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 146513920. Throughput: 0: 1723.9. Samples: 31621258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:02,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 04:25:06,204][42004] Updated weights for policy 0, policy_version 35776 (0.0026) +[2024-11-08 04:25:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 146550784. Throughput: 0: 1731.7. Samples: 31631890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:07,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:25:11,860][42004] Updated weights for policy 0, policy_version 35786 (0.0028) +[2024-11-08 04:25:12,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 146583552. Throughput: 0: 1728.8. Samples: 31642748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:12,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 04:25:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 146604032. Throughput: 0: 1659.4. Samples: 31644388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:17,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:25:20,265][42004] Updated weights for policy 0, policy_version 35796 (0.0031) +[2024-11-08 04:25:22,934][41694] Fps is (10 sec: 5323.9, 60 sec: 6621.7, 300 sec: 6692.4). Total num frames: 146636800. Throughput: 0: 1619.7. Samples: 31654002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:22,935][41694] Avg episode reward: [(0, '4.187')] +[2024-11-08 04:25:26,369][42004] Updated weights for policy 0, policy_version 35806 (0.0026) +[2024-11-08 04:25:27,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 146669568. Throughput: 0: 1674.3. Samples: 31663978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:27,934][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 04:25:31,920][42004] Updated weights for policy 0, policy_version 35816 (0.0025) +[2024-11-08 04:25:32,931][41694] Fps is (10 sec: 7374.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 146710528. Throughput: 0: 1675.7. Samples: 31669438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:32,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:25:37,167][42004] Updated weights for policy 0, policy_version 35826 (0.0035) +[2024-11-08 04:25:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 146747392. Throughput: 0: 1707.1. Samples: 31681208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:37,936][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:25:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035827_146747392.pth... +[2024-11-08 04:25:38,066][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035431_145125376.pth +[2024-11-08 04:25:42,465][42004] Updated weights for policy 0, policy_version 35836 (0.0025) +[2024-11-08 04:25:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 146784256. Throughput: 0: 1714.5. Samples: 31692700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:42,933][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 04:25:49,238][41694] Fps is (10 sec: 6159.0, 60 sec: 6748.0, 300 sec: 6732.1). Total num frames: 146817024. Throughput: 0: 1658.7. Samples: 31698068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:49,241][41694] Avg episode reward: [(0, '4.250')] +[2024-11-08 04:25:49,979][42004] Updated weights for policy 0, policy_version 35846 (0.0038) +[2024-11-08 04:25:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 146845696. Throughput: 0: 1642.3. Samples: 31705794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:52,937][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:25:55,785][42004] Updated weights for policy 0, policy_version 35856 (0.0030) +[2024-11-08 04:25:57,932][41694] Fps is (10 sec: 7067.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 146878464. Throughput: 0: 1630.9. Samples: 31716140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:57,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:26:02,085][42004] Updated weights for policy 0, policy_version 35866 (0.0026) +[2024-11-08 04:26:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 146911232. Throughput: 0: 1702.8. Samples: 31721012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:02,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 04:26:07,599][42004] Updated weights for policy 0, policy_version 35876 (0.0030) +[2024-11-08 04:26:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 146948096. Throughput: 0: 1725.4. Samples: 31731640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:07,935][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 04:26:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 146984960. Throughput: 0: 1741.8. Samples: 31742358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:12,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:26:13,371][42004] Updated weights for policy 0, policy_version 35886 (0.0030) +[2024-11-08 04:26:17,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 147021824. Throughput: 0: 1738.0. Samples: 31747650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:17,935][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:26:18,886][42004] Updated weights for policy 0, policy_version 35896 (0.0029) +[2024-11-08 04:26:23,495][41694] Fps is (10 sec: 5816.2, 60 sec: 6763.4, 300 sec: 6749.0). Total num frames: 147046400. Throughput: 0: 1701.8. Samples: 31758746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:23,497][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:26:26,873][42004] Updated weights for policy 0, policy_version 35906 (0.0026) +[2024-11-08 04:26:27,932][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 147075072. Throughput: 0: 1622.1. Samples: 31765696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:27,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 04:26:32,858][42004] Updated weights for policy 0, policy_version 35916 (0.0038) +[2024-11-08 04:26:32,932][41694] Fps is (10 sec: 6945.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 147111936. Throughput: 0: 1664.5. Samples: 31770796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:32,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 04:26:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 147148800. Throughput: 0: 1677.8. Samples: 31781294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:37,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:26:38,463][42004] Updated weights for policy 0, policy_version 35926 (0.0028) +[2024-11-08 04:26:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 147185664. Throughput: 0: 1695.3. Samples: 31792430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:42,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 04:26:43,840][42004] Updated weights for policy 0, policy_version 35936 (0.0026) +[2024-11-08 04:26:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6908.8, 300 sec: 6803.5). Total num frames: 147222528. Throughput: 0: 1713.6. Samples: 31798124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:47,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:26:49,399][42004] Updated weights for policy 0, policy_version 35946 (0.0026) +[2024-11-08 04:26:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 147259392. Throughput: 0: 1722.9. Samples: 31809172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:52,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:26:55,091][42004] Updated weights for policy 0, policy_version 35956 (0.0034) +[2024-11-08 04:26:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 147279872. Throughput: 0: 1675.6. Samples: 31817762. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:57,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:27:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147312640. Throughput: 0: 1650.3. Samples: 31821912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:02,938][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 04:27:02,994][42004] Updated weights for policy 0, policy_version 35966 (0.0043) +[2024-11-08 04:27:07,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147349504. Throughput: 0: 1636.6. Samples: 31831472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:07,937][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 04:27:09,043][42004] Updated weights for policy 0, policy_version 35976 (0.0040) +[2024-11-08 04:27:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147386368. Throughput: 0: 1703.5. Samples: 31842354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:12,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 04:27:14,589][42004] Updated weights for policy 0, policy_version 35986 (0.0032) +[2024-11-08 04:27:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.3, 300 sec: 6811.5). Total num frames: 147423232. Throughput: 0: 1718.1. Samples: 31848110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:17,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 04:27:19,991][42004] Updated weights for policy 0, policy_version 35996 (0.0030) +[2024-11-08 04:27:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6960.3, 300 sec: 6817.4). Total num frames: 147460096. Throughput: 0: 1736.4. Samples: 31859432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:22,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:27:25,621][42004] Updated weights for policy 0, policy_version 36006 (0.0036) +[2024-11-08 04:27:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 147492864. Throughput: 0: 1716.8. Samples: 31869688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:27,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 04:27:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147513344. Throughput: 0: 1686.1. Samples: 31873998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:32,936][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 04:27:34,181][42004] Updated weights for policy 0, policy_version 36016 (0.0033) +[2024-11-08 04:27:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 147542016. Throughput: 0: 1589.4. Samples: 31880696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:37,935][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 04:27:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036021_147542016.pth... +[2024-11-08 04:27:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035628_145932288.pth +[2024-11-08 04:27:40,582][42004] Updated weights for policy 0, policy_version 36026 (0.0033) +[2024-11-08 04:27:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 147574784. Throughput: 0: 1615.2. Samples: 31890446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:42,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:27:46,162][42004] Updated weights for policy 0, policy_version 36036 (0.0023) +[2024-11-08 04:27:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 147615744. Throughput: 0: 1650.2. Samples: 31896172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:47,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 04:27:51,537][42004] Updated weights for policy 0, policy_version 36046 (0.0036) +[2024-11-08 04:27:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6766.3). Total num frames: 147652608. Throughput: 0: 1692.5. Samples: 31907632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:52,935][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:27:57,381][42004] Updated weights for policy 0, policy_version 36056 (0.0040) +[2024-11-08 04:27:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 147685376. Throughput: 0: 1687.7. Samples: 31918300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:57,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 04:28:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 147722240. Throughput: 0: 1673.2. Samples: 31923402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:02,938][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:28:03,359][42004] Updated weights for policy 0, policy_version 36066 (0.0026) +[2024-11-08 04:28:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.7, 300 sec: 6720.2). Total num frames: 147742720. Throughput: 0: 1579.9. Samples: 31930528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:07,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 04:28:11,364][42004] Updated weights for policy 0, policy_version 36076 (0.0038) +[2024-11-08 04:28:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 147775488. Throughput: 0: 1570.6. Samples: 31940366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:12,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 04:28:17,506][42004] Updated weights for policy 0, policy_version 36086 (0.0037) +[2024-11-08 04:28:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6692.5). Total num frames: 147808256. Throughput: 0: 1583.8. Samples: 31945270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:17,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 04:28:22,719][42004] Updated weights for policy 0, policy_version 36096 (0.0025) +[2024-11-08 04:28:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 147849216. Throughput: 0: 1681.4. Samples: 31956358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:22,934][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 04:28:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6762.7). Total num frames: 147886080. Throughput: 0: 1726.1. Samples: 31968122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:27,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:28:27,980][42004] Updated weights for policy 0, policy_version 36106 (0.0033) +[2024-11-08 04:28:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 147922944. Throughput: 0: 1717.4. Samples: 31973454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:32,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 04:28:33,476][42004] Updated weights for policy 0, policy_version 36116 (0.0028) +[2024-11-08 04:28:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 147959808. Throughput: 0: 1710.8. Samples: 31984618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 04:28:41,155][42004] Updated weights for policy 0, policy_version 36126 (0.0031) +[2024-11-08 04:28:42,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6758.2, 300 sec: 6720.2). Total num frames: 147980288. Throughput: 0: 1634.6. Samples: 31991860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:42,936][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 04:28:47,549][42004] Updated weights for policy 0, policy_version 36136 (0.0040) +[2024-11-08 04:28:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 148013056. Throughput: 0: 1628.8. Samples: 31996698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:47,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 04:28:52,933][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.7, 300 sec: 6706.3). Total num frames: 148049920. Throughput: 0: 1693.9. Samples: 32006758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:52,935][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 04:28:53,218][42004] Updated weights for policy 0, policy_version 36146 (0.0036) +[2024-11-08 04:28:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 148086784. Throughput: 0: 1728.0. Samples: 32018126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:57,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:28:58,592][42004] Updated weights for policy 0, policy_version 36156 (0.0026) +[2024-11-08 04:29:02,931][41694] Fps is (10 sec: 7374.1, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 148123648. Throughput: 0: 1743.5. Samples: 32023726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:02,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:29:04,531][42004] Updated weights for policy 0, policy_version 36166 (0.0026) +[2024-11-08 04:29:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.1, 300 sec: 6748.0). Total num frames: 148160512. Throughput: 0: 1729.9. Samples: 32034204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:07,936][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:29:10,098][42004] Updated weights for policy 0, policy_version 36176 (0.0027) +[2024-11-08 04:29:13,330][41694] Fps is (10 sec: 6696.4, 60 sec: 6917.3, 300 sec: 6738.9). Total num frames: 148193280. Throughput: 0: 1699.4. Samples: 32045274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:13,332][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 04:29:16,631][42004] Updated weights for policy 0, policy_version 36186 (0.0029) +[2024-11-08 04:29:17,935][41694] Fps is (10 sec: 6551.4, 60 sec: 6962.7, 300 sec: 6734.0). Total num frames: 148226048. Throughput: 0: 1675.8. Samples: 32048870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:17,940][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:29:22,548][42004] Updated weights for policy 0, policy_version 36196 (0.0032) +[2024-11-08 04:29:22,931][41694] Fps is (10 sec: 6825.6, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 148258816. Throughput: 0: 1656.5. Samples: 32059162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:29:27,847][42004] Updated weights for policy 0, policy_version 36206 (0.0036) +[2024-11-08 04:29:27,932][41694] Fps is (10 sec: 7375.6, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 148299776. Throughput: 0: 1750.9. Samples: 32070648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:27,935][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:29:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 148336640. Throughput: 0: 1774.6. Samples: 32076554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:32,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:29:33,086][42004] Updated weights for policy 0, policy_version 36216 (0.0031) +[2024-11-08 04:29:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 148377600. Throughput: 0: 1817.2. Samples: 32088530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:37,935][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 04:29:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036225_148377600.pth... +[2024-11-08 04:29:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035827_146747392.pth +[2024-11-08 04:29:38,451][42004] Updated weights for policy 0, policy_version 36226 (0.0031) +[2024-11-08 04:29:42,933][41694] Fps is (10 sec: 7371.5, 60 sec: 7168.1, 300 sec: 6803.5). Total num frames: 148410368. Throughput: 0: 1800.9. Samples: 32099168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:42,937][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:29:44,436][42004] Updated weights for policy 0, policy_version 36236 (0.0033) +[2024-11-08 04:29:47,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 148434944. Throughput: 0: 1786.9. Samples: 32104138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:47,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:29:51,907][42004] Updated weights for policy 0, policy_version 36246 (0.0035) +[2024-11-08 04:29:52,932][41694] Fps is (10 sec: 5735.2, 60 sec: 6963.3, 300 sec: 6748.0). Total num frames: 148467712. Throughput: 0: 1726.9. Samples: 32111916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:52,933][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 04:29:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 148496384. Throughput: 0: 1698.1. Samples: 32121012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:57,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:29:58,512][42004] Updated weights for policy 0, policy_version 36256 (0.0048) +[2024-11-08 04:30:02,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 148533248. Throughput: 0: 1723.1. Samples: 32126402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:30:04,222][42004] Updated weights for policy 0, policy_version 36266 (0.0031) +[2024-11-08 04:30:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 148570112. Throughput: 0: 1734.7. Samples: 32137226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 04:30:09,898][42004] Updated weights for policy 0, policy_version 36276 (0.0040) +[2024-11-08 04:30:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6941.0, 300 sec: 6789.6). Total num frames: 148606976. Throughput: 0: 1714.1. Samples: 32147782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:12,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 04:30:16,146][42004] Updated weights for policy 0, policy_version 36286 (0.0039) +[2024-11-08 04:30:17,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6827.1, 300 sec: 6775.8). Total num frames: 148635648. Throughput: 0: 1688.7. Samples: 32152544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:17,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:30:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 148664320. Throughput: 0: 1607.0. Samples: 32160846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:22,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:30:23,027][42004] Updated weights for policy 0, policy_version 36296 (0.0029) +[2024-11-08 04:30:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 148701184. Throughput: 0: 1602.4. Samples: 32171272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:27,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:30:29,098][42004] Updated weights for policy 0, policy_version 36306 (0.0029) +[2024-11-08 04:30:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 148733952. Throughput: 0: 1603.1. Samples: 32176276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:32,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:30:34,857][42004] Updated weights for policy 0, policy_version 36316 (0.0027) +[2024-11-08 04:30:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 148770816. Throughput: 0: 1673.4. Samples: 32187218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:37,936][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 04:30:40,561][42004] Updated weights for policy 0, policy_version 36326 (0.0023) +[2024-11-08 04:30:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6622.0, 300 sec: 6778.0). Total num frames: 148807680. Throughput: 0: 1709.5. Samples: 32197942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:42,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 04:30:46,130][42004] Updated weights for policy 0, policy_version 36336 (0.0034) +[2024-11-08 04:30:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 148844544. Throughput: 0: 1711.3. Samples: 32203410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:47,934][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 04:30:51,835][42004] Updated weights for policy 0, policy_version 36346 (0.0033) +[2024-11-08 04:30:54,352][41694] Fps is (10 sec: 6097.2, 60 sec: 6668.8, 300 sec: 6743.3). Total num frames: 148877312. Throughput: 0: 1662.9. Samples: 32214418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:54,353][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 04:30:57,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 148905984. Throughput: 0: 1660.2. Samples: 32222492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:57,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 04:30:59,029][42004] Updated weights for policy 0, policy_version 36356 (0.0045) +[2024-11-08 04:31:02,932][41694] Fps is (10 sec: 6683.7, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 148934656. Throughput: 0: 1665.4. Samples: 32227486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:02,936][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:31:05,939][42004] Updated weights for policy 0, policy_version 36366 (0.0038) +[2024-11-08 04:31:07,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 148967424. Throughput: 0: 1680.2. Samples: 32236456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 04:31:12,100][42004] Updated weights for policy 0, policy_version 36376 (0.0026) +[2024-11-08 04:31:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6706.4). Total num frames: 149000192. Throughput: 0: 1669.7. Samples: 32246408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:12,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:31:17,899][42004] Updated weights for policy 0, policy_version 36386 (0.0024) +[2024-11-08 04:31:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6760.9). Total num frames: 149037056. Throughput: 0: 1669.6. Samples: 32251408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:17,933][41694] Avg episode reward: [(0, '4.154')] +[2024-11-08 04:31:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 149073920. Throughput: 0: 1673.9. Samples: 32262544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:22,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:31:23,419][42004] Updated weights for policy 0, policy_version 36396 (0.0028) +[2024-11-08 04:31:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 149098496. Throughput: 0: 1558.0. Samples: 32268052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 04:31:30,459][42004] Updated weights for policy 0, policy_version 36406 (0.0032) +[2024-11-08 04:31:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 149135360. Throughput: 0: 1623.6. Samples: 32276474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:32,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:31:36,586][42004] Updated weights for policy 0, policy_version 36416 (0.0022) +[2024-11-08 04:31:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 149168128. Throughput: 0: 1654.5. Samples: 32286520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:37,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:31:38,003][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036418_149168128.pth... +[2024-11-08 04:31:38,212][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036021_147542016.pth +[2024-11-08 04:31:42,457][42004] Updated weights for policy 0, policy_version 36426 (0.0038) +[2024-11-08 04:31:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 149200896. Throughput: 0: 1651.0. Samples: 32296788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:42,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 04:31:47,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6553.4, 300 sec: 6706.3). Total num frames: 149237760. Throughput: 0: 1662.2. Samples: 32302288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:47,935][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:31:47,983][42004] Updated weights for policy 0, policy_version 36436 (0.0040) +[2024-11-08 04:31:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6782.4, 300 sec: 6761.9). Total num frames: 149274624. Throughput: 0: 1711.6. Samples: 32313478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:52,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 04:31:53,523][42004] Updated weights for policy 0, policy_version 36446 (0.0029) +[2024-11-08 04:31:57,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 149315584. Throughput: 0: 1742.0. Samples: 32324800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:57,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:31:58,964][42004] Updated weights for policy 0, policy_version 36456 (0.0029) +[2024-11-08 04:32:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 149340160. Throughput: 0: 1753.6. Samples: 32330320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:32:02,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:32:06,468][42004] Updated weights for policy 0, policy_version 36466 (0.0037) +[2024-11-08 04:32:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 149372928. Throughput: 0: 1674.8. Samples: 32337910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:32:07,933][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 04:32:12,534][42004] Updated weights for policy 0, policy_version 36476 (0.0029) +[2024-11-08 04:32:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 149405696. Throughput: 0: 1777.7. Samples: 32348050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:12,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:32:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 149442560. Throughput: 0: 1709.6. Samples: 32353408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:17,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:32:18,031][42004] Updated weights for policy 0, policy_version 36486 (0.0027) +[2024-11-08 04:32:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 149479424. Throughput: 0: 1738.3. Samples: 32364742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:22,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 04:32:23,521][42004] Updated weights for policy 0, policy_version 36496 (0.0031) +[2024-11-08 04:32:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 149520384. Throughput: 0: 1760.3. Samples: 32376002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:27,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:32:28,943][42004] Updated weights for policy 0, policy_version 36506 (0.0027) +[2024-11-08 04:32:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 149557248. Throughput: 0: 1760.7. Samples: 32381518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:32,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 04:32:36,137][42004] Updated weights for policy 0, policy_version 36516 (0.0026) +[2024-11-08 04:32:37,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 149581824. Throughput: 0: 1693.8. Samples: 32389700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:37,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 04:32:42,172][42004] Updated weights for policy 0, policy_version 36526 (0.0036) +[2024-11-08 04:32:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 149614592. Throughput: 0: 1667.9. Samples: 32399854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:42,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:32:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.8, 300 sec: 6761.9). Total num frames: 149647360. Throughput: 0: 1659.1. Samples: 32404978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:47,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 04:32:48,006][42004] Updated weights for policy 0, policy_version 36536 (0.0034) +[2024-11-08 04:32:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 149688320. Throughput: 0: 1740.7. Samples: 32416240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:52,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:32:53,171][42004] Updated weights for policy 0, policy_version 36546 (0.0024) +[2024-11-08 04:32:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 149725184. Throughput: 0: 1771.6. Samples: 32427772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:32:57,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:32:58,574][42004] Updated weights for policy 0, policy_version 36556 (0.0036) +[2024-11-08 04:33:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 149762048. Throughput: 0: 1784.1. Samples: 32433692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:02,934][41694] Avg episode reward: [(0, '4.831')] +[2024-11-08 04:33:04,427][42004] Updated weights for policy 0, policy_version 36566 (0.0029) +[2024-11-08 04:33:09,211][41694] Fps is (10 sec: 6173.5, 60 sec: 6884.7, 300 sec: 6815.6). Total num frames: 149794816. Throughput: 0: 1713.1. Samples: 32444022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:09,213][41694] Avg episode reward: [(0, '4.812')] +[2024-11-08 04:33:11,607][42004] Updated weights for policy 0, policy_version 36576 (0.0033) +[2024-11-08 04:33:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 149823488. Throughput: 0: 1693.2. Samples: 32452198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:12,934][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 04:33:17,492][42004] Updated weights for policy 0, policy_version 36586 (0.0026) +[2024-11-08 04:33:17,932][41694] Fps is (10 sec: 7045.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 149856256. Throughput: 0: 1683.6. Samples: 32457282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:17,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 04:33:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6894.8, 300 sec: 6803.5). Total num frames: 149893120. Throughput: 0: 1729.7. Samples: 32467536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:22,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:33:23,356][42004] Updated weights for policy 0, policy_version 36596 (0.0030) +[2024-11-08 04:33:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 149929984. Throughput: 0: 1757.3. Samples: 32478934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:27,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:33:28,737][42004] Updated weights for policy 0, policy_version 36606 (0.0023) +[2024-11-08 04:33:32,934][41694] Fps is (10 sec: 7371.9, 60 sec: 6826.4, 300 sec: 6803.5). Total num frames: 149966848. Throughput: 0: 1768.3. Samples: 32484554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:32,936][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 04:33:34,141][42004] Updated weights for policy 0, policy_version 36616 (0.0027) +[2024-11-08 04:33:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 150003712. Throughput: 0: 1759.9. Samples: 32495434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:37,936][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 04:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036622_150003712.pth... +[2024-11-08 04:33:38,155][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036225_148377600.pth +[2024-11-08 04:33:40,036][42004] Updated weights for policy 0, policy_version 36626 (0.0030) +[2024-11-08 04:33:43,069][41694] Fps is (10 sec: 6062.0, 60 sec: 6879.2, 300 sec: 6828.1). Total num frames: 150028288. Throughput: 0: 1616.7. Samples: 32500744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:43,071][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 04:33:47,421][42004] Updated weights for policy 0, policy_version 36636 (0.0029) +[2024-11-08 04:33:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 150061056. Throughput: 0: 1659.3. Samples: 32508360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:47,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 04:33:52,931][41694] Fps is (10 sec: 7060.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 150097920. Throughput: 0: 1706.0. Samples: 32518610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:52,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 04:33:53,500][42004] Updated weights for policy 0, policy_version 36646 (0.0027) +[2024-11-08 04:33:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 150134784. Throughput: 0: 1724.8. Samples: 32529816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:57,934][41694] Avg episode reward: [(0, '4.708')] +[2024-11-08 04:33:58,678][42004] Updated weights for policy 0, policy_version 36656 (0.0028) +[2024-11-08 04:34:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 150167552. Throughput: 0: 1738.0. Samples: 32535490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:34:02,935][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 04:34:04,990][42004] Updated weights for policy 0, policy_version 36666 (0.0026) +[2024-11-08 04:34:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6975.3, 300 sec: 6826.6). Total num frames: 150204416. Throughput: 0: 1733.3. Samples: 32545536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:07,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:34:10,673][42004] Updated weights for policy 0, policy_version 36676 (0.0030) +[2024-11-08 04:34:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6895.0, 300 sec: 6817.5). Total num frames: 150237184. Throughput: 0: 1714.9. Samples: 32556106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:12,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:34:17,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 150261760. Throughput: 0: 1715.8. Samples: 32561762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:34:17,962][42004] Updated weights for policy 0, policy_version 36686 (0.0033) +[2024-11-08 04:34:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 150298624. Throughput: 0: 1640.7. Samples: 32569264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:22,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:34:23,893][42004] Updated weights for policy 0, policy_version 36696 (0.0020) +[2024-11-08 04:34:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 150331392. Throughput: 0: 1751.3. Samples: 32579310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:27,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:34:29,970][42004] Updated weights for policy 0, policy_version 36706 (0.0032) +[2024-11-08 04:34:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.4, 300 sec: 6748.0). Total num frames: 150368256. Throughput: 0: 1698.0. Samples: 32584770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:32,937][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 04:34:35,203][42004] Updated weights for policy 0, policy_version 36716 (0.0024) +[2024-11-08 04:34:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 150405120. Throughput: 0: 1719.8. Samples: 32596002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:37,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 04:34:40,943][42004] Updated weights for policy 0, policy_version 36726 (0.0026) +[2024-11-08 04:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6910.8, 300 sec: 6803.5). Total num frames: 150441984. Throughput: 0: 1717.6. Samples: 32607106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:42,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:34:46,678][42004] Updated weights for policy 0, policy_version 36736 (0.0027) +[2024-11-08 04:34:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 150478848. Throughput: 0: 1701.4. Samples: 32612054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:47,936][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:34:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 150499328. Throughput: 0: 1654.4. Samples: 32619984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:52,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 04:34:54,209][42004] Updated weights for policy 0, policy_version 36746 (0.0044) +[2024-11-08 04:34:57,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 150536192. Throughput: 0: 1648.4. Samples: 32630286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:57,938][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 04:35:00,162][42004] Updated weights for policy 0, policy_version 36756 (0.0028) +[2024-11-08 04:35:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150568960. Throughput: 0: 1635.6. Samples: 32635362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:02,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:35:06,004][42004] Updated weights for policy 0, policy_version 36766 (0.0027) +[2024-11-08 04:35:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150605824. Throughput: 0: 1703.2. Samples: 32645910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:07,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:35:11,613][42004] Updated weights for policy 0, policy_version 36776 (0.0031) +[2024-11-08 04:35:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 150642688. Throughput: 0: 1724.3. Samples: 32656904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:12,938][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:35:17,816][42004] Updated weights for policy 0, policy_version 36786 (0.0047) +[2024-11-08 04:35:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 150675456. Throughput: 0: 1709.4. Samples: 32661694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:17,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 04:35:22,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6894.8, 300 sec: 6817.4). Total num frames: 150712320. Throughput: 0: 1693.2. Samples: 32672198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:22,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 04:35:25,127][42004] Updated weights for policy 0, policy_version 36796 (0.0033) +[2024-11-08 04:35:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150732800. Throughput: 0: 1619.0. Samples: 32679962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:27,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 04:35:31,046][42004] Updated weights for policy 0, policy_version 36806 (0.0026) +[2024-11-08 04:35:32,932][41694] Fps is (10 sec: 5734.7, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 150769664. Throughput: 0: 1623.0. Samples: 32685090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:32,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:35:36,821][42004] Updated weights for policy 0, policy_version 36816 (0.0031) +[2024-11-08 04:35:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150806528. Throughput: 0: 1681.6. Samples: 32695654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:37,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:35:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036818_150806528.pth... +[2024-11-08 04:35:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036418_149168128.pth +[2024-11-08 04:35:42,129][42004] Updated weights for policy 0, policy_version 36826 (0.0026) +[2024-11-08 04:35:42,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150843392. Throughput: 0: 1711.1. Samples: 32707286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:42,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 04:35:47,351][42004] Updated weights for policy 0, policy_version 36836 (0.0026) +[2024-11-08 04:35:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6836.4). Total num frames: 150884352. Throughput: 0: 1718.7. Samples: 32712702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:47,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:35:52,886][42004] Updated weights for policy 0, policy_version 36846 (0.0023) +[2024-11-08 04:35:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 150921216. Throughput: 0: 1743.8. Samples: 32724382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:52,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:35:59,049][41694] Fps is (10 sec: 6263.0, 60 sec: 6835.8, 300 sec: 6819.3). Total num frames: 150953984. Throughput: 0: 1711.8. Samples: 32735848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:59,051][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 04:36:00,077][42004] Updated weights for policy 0, policy_version 36856 (0.0024) +[2024-11-08 04:36:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 150978560. Throughput: 0: 1695.3. Samples: 32737982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:36:02,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 04:36:06,511][42004] Updated weights for policy 0, policy_version 36866 (0.0034) +[2024-11-08 04:36:07,932][41694] Fps is (10 sec: 6456.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151011328. Throughput: 0: 1673.0. Samples: 32747482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:07,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 04:36:12,637][42004] Updated weights for policy 0, policy_version 36876 (0.0043) +[2024-11-08 04:36:12,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 151044096. Throughput: 0: 1719.4. Samples: 32757336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:12,935][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:36:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 151080960. Throughput: 0: 1724.6. Samples: 32762696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:17,935][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 04:36:18,226][42004] Updated weights for policy 0, policy_version 36886 (0.0033) +[2024-11-08 04:36:22,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.6, 300 sec: 6845.2). Total num frames: 151117824. Throughput: 0: 1741.7. Samples: 32774032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:22,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 04:36:23,627][42004] Updated weights for policy 0, policy_version 36896 (0.0030) +[2024-11-08 04:36:27,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 151154688. Throughput: 0: 1734.1. Samples: 32785320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:27,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:36:29,116][42004] Updated weights for policy 0, policy_version 36906 (0.0063) +[2024-11-08 04:36:33,228][41694] Fps is (10 sec: 5967.2, 60 sec: 6793.2, 300 sec: 6810.6). Total num frames: 151179264. Throughput: 0: 1727.3. Samples: 32790944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:33,230][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 04:36:36,953][42004] Updated weights for policy 0, policy_version 36916 (0.0030) +[2024-11-08 04:36:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151212032. Throughput: 0: 1632.2. Samples: 32797832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:37,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:36:42,931][41694] Fps is (10 sec: 6753.8, 60 sec: 6690.2, 300 sec: 6803.6). Total num frames: 151244800. Throughput: 0: 1642.0. Samples: 32807904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:42,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:36:43,030][42004] Updated weights for policy 0, policy_version 36926 (0.0046) +[2024-11-08 04:36:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 151285760. Throughput: 0: 1670.8. Samples: 32813168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:36:48,404][42004] Updated weights for policy 0, policy_version 36936 (0.0026) +[2024-11-08 04:36:52,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6690.0, 300 sec: 6803.5). Total num frames: 151322624. Throughput: 0: 1716.7. Samples: 32824736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:52,935][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 04:36:53,897][42004] Updated weights for policy 0, policy_version 36946 (0.0031) +[2024-11-08 04:36:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6886.7, 300 sec: 6845.2). Total num frames: 151359488. Throughput: 0: 1751.2. Samples: 32836140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:57,935][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 04:36:59,526][42004] Updated weights for policy 0, policy_version 36956 (0.0025) +[2024-11-08 04:37:02,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 151392256. Throughput: 0: 1748.9. Samples: 32841396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:02,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:37:05,370][42004] Updated weights for policy 0, policy_version 36966 (0.0049) +[2024-11-08 04:37:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151416832. Throughput: 0: 1709.1. Samples: 32850942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:07,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:37:12,925][42004] Updated weights for policy 0, policy_version 36976 (0.0031) +[2024-11-08 04:37:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 151453696. Throughput: 0: 1646.5. Samples: 32859412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:12,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:37:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 151486464. Throughput: 0: 1644.6. Samples: 32864462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:17,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:37:18,628][42004] Updated weights for policy 0, policy_version 36986 (0.0026) +[2024-11-08 04:37:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 151527424. Throughput: 0: 1727.5. Samples: 32875568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:22,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 04:37:23,970][42004] Updated weights for policy 0, policy_version 36996 (0.0029) +[2024-11-08 04:37:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 151560192. Throughput: 0: 1743.1. Samples: 32886344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:27,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:37:29,908][42004] Updated weights for policy 0, policy_version 37006 (0.0035) +[2024-11-08 04:37:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6997.8, 300 sec: 6831.3). Total num frames: 151597056. Throughput: 0: 1752.9. Samples: 32892048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:32,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 04:37:35,470][42004] Updated weights for policy 0, policy_version 37016 (0.0035) +[2024-11-08 04:37:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 151633920. Throughput: 0: 1740.6. Samples: 32903062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:37,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:37:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037020_151633920.pth... +[2024-11-08 04:37:38,068][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036622_150003712.pth +[2024-11-08 04:37:42,777][42004] Updated weights for policy 0, policy_version 37026 (0.0029) +[2024-11-08 04:37:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 151658496. Throughput: 0: 1659.5. Samples: 32910818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:42,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 04:37:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 151691264. Throughput: 0: 1652.2. Samples: 32915746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:47,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:37:48,862][42004] Updated weights for policy 0, policy_version 37036 (0.0026) +[2024-11-08 04:37:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6789.6). Total num frames: 151728128. Throughput: 0: 1667.3. Samples: 32925972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:52,932][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 04:37:54,502][42004] Updated weights for policy 0, policy_version 37046 (0.0026) +[2024-11-08 04:37:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 151760896. Throughput: 0: 1730.0. Samples: 32937264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:57,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:38:00,274][42004] Updated weights for policy 0, policy_version 37056 (0.0034) +[2024-11-08 04:38:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6819.2). Total num frames: 151797760. Throughput: 0: 1732.8. Samples: 32942436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:02,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 04:38:05,996][42004] Updated weights for policy 0, policy_version 37066 (0.0024) +[2024-11-08 04:38:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 151834624. Throughput: 0: 1723.5. Samples: 32953124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:07,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 04:38:11,557][42004] Updated weights for policy 0, policy_version 37076 (0.0034) +[2024-11-08 04:38:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 151871488. Throughput: 0: 1732.0. Samples: 32964282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:12,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:38:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 151891968. Throughput: 0: 1651.8. Samples: 32966378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:17,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:38:19,440][42004] Updated weights for policy 0, policy_version 37086 (0.0043) +[2024-11-08 04:38:22,934][41694] Fps is (10 sec: 5323.6, 60 sec: 6621.6, 300 sec: 6761.8). Total num frames: 151924736. Throughput: 0: 1620.6. Samples: 32975992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:22,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 04:38:25,500][42004] Updated weights for policy 0, policy_version 37096 (0.0025) +[2024-11-08 04:38:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 151961600. Throughput: 0: 1684.5. Samples: 32986620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:27,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 04:38:31,446][42004] Updated weights for policy 0, policy_version 37106 (0.0027) +[2024-11-08 04:38:32,931][41694] Fps is (10 sec: 6964.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 151994368. Throughput: 0: 1687.6. Samples: 32991690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:32,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:38:37,059][42004] Updated weights for policy 0, policy_version 37116 (0.0032) +[2024-11-08 04:38:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6792.8). Total num frames: 152031232. Throughput: 0: 1694.7. Samples: 33002236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:37,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 04:38:42,593][42004] Updated weights for policy 0, policy_version 37126 (0.0024) +[2024-11-08 04:38:42,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 152068096. Throughput: 0: 1693.6. Samples: 33013478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:42,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 04:38:48,934][41694] Fps is (10 sec: 6328.9, 60 sec: 6714.5, 300 sec: 6766.6). Total num frames: 152100864. Throughput: 0: 1663.3. Samples: 33018952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:48,935][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 04:38:49,774][42004] Updated weights for policy 0, policy_version 37136 (0.0030) +[2024-11-08 04:38:52,931][41694] Fps is (10 sec: 5735.2, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152125440. Throughput: 0: 1636.5. Samples: 33026764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:52,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:38:56,108][42004] Updated weights for policy 0, policy_version 37146 (0.0036) +[2024-11-08 04:38:57,932][41694] Fps is (10 sec: 6828.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 152162304. Throughput: 0: 1608.8. Samples: 33036678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:57,934][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 04:39:01,808][42004] Updated weights for policy 0, policy_version 37156 (0.0024) +[2024-11-08 04:39:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152195072. Throughput: 0: 1681.3. Samples: 33042036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:02,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:39:07,388][42004] Updated weights for policy 0, policy_version 37166 (0.0029) +[2024-11-08 04:39:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 152231936. Throughput: 0: 1711.3. Samples: 33052996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:07,935][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:39:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 152268800. Throughput: 0: 1711.3. Samples: 33063630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:12,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:39:13,136][42004] Updated weights for policy 0, policy_version 37176 (0.0030) +[2024-11-08 04:39:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 152305664. Throughput: 0: 1718.6. Samples: 33069028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:17,936][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 04:39:18,807][42004] Updated weights for policy 0, policy_version 37186 (0.0026) +[2024-11-08 04:39:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.4, 300 sec: 6761.9). Total num frames: 152326144. Throughput: 0: 1722.2. Samples: 33079734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:22,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 04:39:26,644][42004] Updated weights for policy 0, policy_version 37196 (0.0031) +[2024-11-08 04:39:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152358912. Throughput: 0: 1631.8. Samples: 33086908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:27,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:39:32,583][42004] Updated weights for policy 0, policy_version 37206 (0.0032) +[2024-11-08 04:39:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 152395776. Throughput: 0: 1648.6. Samples: 33091486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:32,936][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 04:39:37,912][42004] Updated weights for policy 0, policy_version 37216 (0.0026) +[2024-11-08 04:39:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 152436736. Throughput: 0: 1694.6. Samples: 33103020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:37,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:39:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037216_152436736.pth... +[2024-11-08 04:39:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036818_150806528.pth +[2024-11-08 04:39:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.6, 300 sec: 6761.9). Total num frames: 152473600. Throughput: 0: 1725.5. Samples: 33114326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:42,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:39:43,555][42004] Updated weights for policy 0, policy_version 37226 (0.0031) +[2024-11-08 04:39:47,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6873.2, 300 sec: 6803.5). Total num frames: 152506368. Throughput: 0: 1721.8. Samples: 33119518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:47,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 04:39:49,269][42004] Updated weights for policy 0, policy_version 37236 (0.0028) +[2024-11-08 04:39:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 152543232. Throughput: 0: 1720.1. Samples: 33130398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:52,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:39:54,759][42004] Updated weights for policy 0, policy_version 37246 (0.0022) +[2024-11-08 04:39:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 152567808. Throughput: 0: 1660.6. Samples: 33138356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:57,941][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:40:02,935][41694] Fps is (10 sec: 5322.8, 60 sec: 6689.7, 300 sec: 6747.9). Total num frames: 152596480. Throughput: 0: 1636.2. Samples: 33142664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:40:02,940][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 04:40:03,152][42004] Updated weights for policy 0, policy_version 37256 (0.0029) +[2024-11-08 04:40:07,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 152633344. Throughput: 0: 1624.9. Samples: 33152856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:07,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:40:08,630][42004] Updated weights for policy 0, policy_version 37266 (0.0031) +[2024-11-08 04:40:12,932][41694] Fps is (10 sec: 7375.2, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 152670208. Throughput: 0: 1714.9. Samples: 33164080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:12,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:40:14,589][42004] Updated weights for policy 0, policy_version 37276 (0.0032) +[2024-11-08 04:40:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152702976. Throughput: 0: 1722.2. Samples: 33168986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:17,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:40:20,598][42004] Updated weights for policy 0, policy_version 37286 (0.0035) +[2024-11-08 04:40:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 152739840. Throughput: 0: 1694.3. Samples: 33179262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:22,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 04:40:26,133][42004] Updated weights for policy 0, policy_version 37296 (0.0024) +[2024-11-08 04:40:27,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6963.0, 300 sec: 6803.5). Total num frames: 152776704. Throughput: 0: 1689.4. Samples: 33190352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:27,937][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:40:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 152797184. Throughput: 0: 1652.9. Samples: 33193898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:32,936][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:40:33,717][42004] Updated weights for policy 0, policy_version 37306 (0.0029) +[2024-11-08 04:40:37,932][41694] Fps is (10 sec: 5325.8, 60 sec: 6553.7, 300 sec: 6734.1). Total num frames: 152829952. Throughput: 0: 1600.0. Samples: 33202398. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:37,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 04:40:39,761][42004] Updated weights for policy 0, policy_version 37316 (0.0051) +[2024-11-08 04:40:42,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 152866816. Throughput: 0: 1660.9. Samples: 33213094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 04:40:45,354][42004] Updated weights for policy 0, policy_version 37326 (0.0025) +[2024-11-08 04:40:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 152903680. Throughput: 0: 1688.3. Samples: 33218630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:47,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:40:50,954][42004] Updated weights for policy 0, policy_version 37336 (0.0026) +[2024-11-08 04:40:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6759.7). Total num frames: 152940544. Throughput: 0: 1708.7. Samples: 33229748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:52,935][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:40:56,390][42004] Updated weights for policy 0, policy_version 37346 (0.0027) +[2024-11-08 04:40:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 152977408. Throughput: 0: 1706.2. Samples: 33240858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:40:57,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:41:02,364][42004] Updated weights for policy 0, policy_version 37356 (0.0034) +[2024-11-08 04:41:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.6, 300 sec: 6789.6). Total num frames: 153014272. Throughput: 0: 1715.2. Samples: 33246172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:02,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:41:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 153034752. Throughput: 0: 1644.3. Samples: 33253254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:07,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 04:41:10,453][42004] Updated weights for policy 0, policy_version 37366 (0.0050) +[2024-11-08 04:41:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 153063424. Throughput: 0: 1603.5. Samples: 33262508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:12,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 04:41:16,743][42004] Updated weights for policy 0, policy_version 37376 (0.0034) +[2024-11-08 04:41:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 153100288. Throughput: 0: 1631.8. Samples: 33267328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:17,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:41:22,523][42004] Updated weights for policy 0, policy_version 37386 (0.0034) +[2024-11-08 04:41:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 153133056. Throughput: 0: 1679.4. Samples: 33277972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:22,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:41:27,913][42004] Updated weights for policy 0, policy_version 37396 (0.0046) +[2024-11-08 04:41:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6622.1, 300 sec: 6768.7). Total num frames: 153174016. Throughput: 0: 1693.8. Samples: 33289314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:27,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 04:41:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 153210880. Throughput: 0: 1693.2. Samples: 33294824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:32,933][41694] Avg episode reward: [(0, '4.712')] +[2024-11-08 04:41:33,451][42004] Updated weights for policy 0, policy_version 37406 (0.0025) +[2024-11-08 04:41:38,813][41694] Fps is (10 sec: 6022.4, 60 sec: 6727.8, 300 sec: 6741.7). Total num frames: 153239552. Throughput: 0: 1662.0. Samples: 33306002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:38,815][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 04:41:38,833][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037412_153239552.pth... +[2024-11-08 04:41:38,966][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037020_151633920.pth +[2024-11-08 04:41:41,104][42004] Updated weights for policy 0, policy_version 37416 (0.0028) +[2024-11-08 04:41:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 153264128. Throughput: 0: 1602.8. Samples: 33312986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:42,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:41:47,145][42004] Updated weights for policy 0, policy_version 37426 (0.0030) +[2024-11-08 04:41:47,932][41694] Fps is (10 sec: 6738.2, 60 sec: 6621.8, 300 sec: 6706.4). Total num frames: 153300992. Throughput: 0: 1593.0. Samples: 33317856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:47,935][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:41:52,565][42004] Updated weights for policy 0, policy_version 37436 (0.0028) +[2024-11-08 04:41:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 153337856. Throughput: 0: 1683.7. Samples: 33329022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:52,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:41:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 153374720. Throughput: 0: 1725.6. Samples: 33340160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:57,934][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 04:41:58,096][42004] Updated weights for policy 0, policy_version 37446 (0.0022) +[2024-11-08 04:42:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 153411584. Throughput: 0: 1745.6. Samples: 33345882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:42:02,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:42:04,677][42004] Updated weights for policy 0, policy_version 37456 (0.0023) +[2024-11-08 04:42:07,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 153436160. Throughput: 0: 1695.2. Samples: 33354256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:07,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 04:42:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 153456640. Throughput: 0: 1611.8. Samples: 33361844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:12,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:42:13,347][42004] Updated weights for policy 0, policy_version 37466 (0.0042) +[2024-11-08 04:42:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 153489408. Throughput: 0: 1558.6. Samples: 33364960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:17,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 04:42:19,601][42004] Updated weights for policy 0, policy_version 37476 (0.0038) +[2024-11-08 04:42:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 153522176. Throughput: 0: 1571.5. Samples: 33375332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:22,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:42:25,231][42004] Updated weights for policy 0, policy_version 37486 (0.0031) +[2024-11-08 04:42:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6417.0, 300 sec: 6650.8). Total num frames: 153559040. Throughput: 0: 1631.5. Samples: 33386406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:27,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 04:42:30,948][42004] Updated weights for policy 0, policy_version 37496 (0.0041) +[2024-11-08 04:42:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 153595904. Throughput: 0: 1641.7. Samples: 33391732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:32,933][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 04:42:36,503][42004] Updated weights for policy 0, policy_version 37506 (0.0027) +[2024-11-08 04:42:37,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6651.2, 300 sec: 6692.4). Total num frames: 153632768. Throughput: 0: 1643.0. Samples: 33402960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:37,938][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 04:42:41,982][42004] Updated weights for policy 0, policy_version 37516 (0.0031) +[2024-11-08 04:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 153669632. Throughput: 0: 1642.0. Samples: 33414050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:42,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:42:47,932][41694] Fps is (10 sec: 6144.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 153694208. Throughput: 0: 1634.8. Samples: 33419450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:47,939][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 04:42:49,771][42004] Updated weights for policy 0, policy_version 37526 (0.0050) +[2024-11-08 04:42:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 153726976. Throughput: 0: 1602.0. Samples: 33426344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:52,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 04:42:55,675][42004] Updated weights for policy 0, policy_version 37536 (0.0034) +[2024-11-08 04:42:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 153763840. Throughput: 0: 1669.4. Samples: 33436966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:57,935][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:43:01,332][42004] Updated weights for policy 0, policy_version 37546 (0.0034) +[2024-11-08 04:43:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 153796608. Throughput: 0: 1724.1. Samples: 33442544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:02,939][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 04:43:07,636][42004] Updated weights for policy 0, policy_version 37556 (0.0032) +[2024-11-08 04:43:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 153829376. Throughput: 0: 1700.8. Samples: 33451870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:07,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:43:12,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 153866240. Throughput: 0: 1691.8. Samples: 33462538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:12,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 04:43:13,370][42004] Updated weights for policy 0, policy_version 37566 (0.0029) +[2024-11-08 04:43:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6706.4). Total num frames: 153903104. Throughput: 0: 1701.4. Samples: 33468294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:17,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:43:18,847][42004] Updated weights for policy 0, policy_version 37576 (0.0037) +[2024-11-08 04:43:22,934][41694] Fps is (10 sec: 5733.7, 60 sec: 6690.0, 300 sec: 6650.8). Total num frames: 153923584. Throughput: 0: 1615.9. Samples: 33475676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:22,937][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 04:43:27,050][42004] Updated weights for policy 0, policy_version 37586 (0.0063) +[2024-11-08 04:43:27,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 153956352. Throughput: 0: 1591.3. Samples: 33485660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:27,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:43:32,841][42004] Updated weights for policy 0, policy_version 37596 (0.0038) +[2024-11-08 04:43:32,932][41694] Fps is (10 sec: 6964.0, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 153993216. Throughput: 0: 1581.4. Samples: 33490612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:32,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:43:37,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6622.0, 300 sec: 6650.8). Total num frames: 154030080. Throughput: 0: 1676.1. Samples: 33501768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:37,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:43:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037605_154030080.pth... +[2024-11-08 04:43:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037216_152436736.pth +[2024-11-08 04:43:38,324][42004] Updated weights for policy 0, policy_version 37606 (0.0027) +[2024-11-08 04:43:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.8, 300 sec: 6687.4). Total num frames: 154066944. Throughput: 0: 1691.0. Samples: 33513060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:42,937][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 04:43:43,802][42004] Updated weights for policy 0, policy_version 37616 (0.0022) +[2024-11-08 04:43:47,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 154103808. Throughput: 0: 1683.9. Samples: 33518322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:47,934][41694] Avg episode reward: [(0, '4.130')] +[2024-11-08 04:43:49,529][42004] Updated weights for policy 0, policy_version 37626 (0.0026) +[2024-11-08 04:43:52,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 154140672. Throughput: 0: 1718.1. Samples: 33529184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:52,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 04:43:57,268][42004] Updated weights for policy 0, policy_version 37636 (0.0023) +[2024-11-08 04:43:57,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 154157056. Throughput: 0: 1637.4. Samples: 33536222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:57,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:44:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 154193920. Throughput: 0: 1622.1. Samples: 33541290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:44:02,933][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 04:44:03,295][42004] Updated weights for policy 0, policy_version 37646 (0.0028) +[2024-11-08 04:44:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 154230784. Throughput: 0: 1700.8. Samples: 33552208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:44:07,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 04:44:08,555][42004] Updated weights for policy 0, policy_version 37656 (0.0028) +[2024-11-08 04:44:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 154267648. Throughput: 0: 1728.8. Samples: 33563456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:12,934][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 04:44:13,986][42004] Updated weights for policy 0, policy_version 37666 (0.0026) +[2024-11-08 04:44:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 154304512. Throughput: 0: 1748.4. Samples: 33569292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:17,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:44:19,726][42004] Updated weights for policy 0, policy_version 37676 (0.0027) +[2024-11-08 04:44:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.6, 300 sec: 6734.1). Total num frames: 154345472. Throughput: 0: 1745.2. Samples: 33580300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:22,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:44:25,181][42004] Updated weights for policy 0, policy_version 37686 (0.0032) +[2024-11-08 04:44:28,855][41694] Fps is (10 sec: 6374.6, 60 sec: 6857.8, 300 sec: 6685.4). Total num frames: 154374144. Throughput: 0: 1697.2. Samples: 33590998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:28,857][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 04:44:32,935][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 154398720. Throughput: 0: 1658.3. Samples: 33592942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:32,938][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:44:32,982][42004] Updated weights for policy 0, policy_version 37696 (0.0037) +[2024-11-08 04:44:37,932][41694] Fps is (10 sec: 6769.0, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 154435584. Throughput: 0: 1654.8. Samples: 33603648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:37,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 04:44:38,610][42004] Updated weights for policy 0, policy_version 37706 (0.0039) +[2024-11-08 04:44:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 154476544. Throughput: 0: 1751.3. Samples: 33615030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:42,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:44:44,016][42004] Updated weights for policy 0, policy_version 37716 (0.0022) +[2024-11-08 04:44:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 154509312. Throughput: 0: 1755.6. Samples: 33620292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:47,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 04:44:49,647][42004] Updated weights for policy 0, policy_version 37726 (0.0030) +[2024-11-08 04:44:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 154542080. Throughput: 0: 1755.8. Samples: 33631220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:52,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 04:44:55,817][42004] Updated weights for policy 0, policy_version 37736 (0.0032) +[2024-11-08 04:44:57,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6720.3). Total num frames: 154578944. Throughput: 0: 1731.3. Samples: 33641366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:57,936][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:45:02,943][41694] Fps is (10 sec: 5728.0, 60 sec: 6757.1, 300 sec: 6664.4). Total num frames: 154599424. Throughput: 0: 1716.6. Samples: 33646560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:02,949][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:45:03,781][42004] Updated weights for policy 0, policy_version 37746 (0.0032) +[2024-11-08 04:45:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 154632192. Throughput: 0: 1609.9. Samples: 33652746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:07,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:45:09,772][42004] Updated weights for policy 0, policy_version 37756 (0.0035) +[2024-11-08 04:45:12,931][41694] Fps is (10 sec: 6971.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 154669056. Throughput: 0: 1642.6. Samples: 33663396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:12,938][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:45:15,864][42004] Updated weights for policy 0, policy_version 37766 (0.0040) +[2024-11-08 04:45:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 154701824. Throughput: 0: 1678.7. Samples: 33668482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:17,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 04:45:21,883][42004] Updated weights for policy 0, policy_version 37776 (0.0041) +[2024-11-08 04:45:22,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6553.3, 300 sec: 6650.8). Total num frames: 154738688. Throughput: 0: 1660.9. Samples: 33678392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:22,939][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 04:45:27,538][42004] Updated weights for policy 0, policy_version 37786 (0.0030) +[2024-11-08 04:45:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6725.3, 300 sec: 6692.4). Total num frames: 154771456. Throughput: 0: 1655.0. Samples: 33689508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:27,936][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:45:32,931][41694] Fps is (10 sec: 6555.2, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 154804224. Throughput: 0: 1653.9. Samples: 33694718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:32,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:45:33,586][42004] Updated weights for policy 0, policy_version 37796 (0.0027) +[2024-11-08 04:45:37,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 154824704. Throughput: 0: 1593.9. Samples: 33702944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:45:38,043][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037800_154828800.pth... +[2024-11-08 04:45:38,185][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037412_153239552.pth +[2024-11-08 04:45:41,842][42004] Updated weights for policy 0, policy_version 37806 (0.0030) +[2024-11-08 04:45:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6636.9). Total num frames: 154861568. Throughput: 0: 1552.5. Samples: 33711228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:42,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:45:47,369][42004] Updated weights for policy 0, policy_version 37816 (0.0029) +[2024-11-08 04:45:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 154898432. Throughput: 0: 1555.4. Samples: 33716534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:47,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:45:52,822][42004] Updated weights for policy 0, policy_version 37826 (0.0037) +[2024-11-08 04:45:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 154935296. Throughput: 0: 1672.8. Samples: 33728024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:52,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:45:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 154972160. Throughput: 0: 1685.2. Samples: 33739232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:45:58,206][42004] Updated weights for policy 0, policy_version 37836 (0.0026) +[2024-11-08 04:46:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6759.7, 300 sec: 6678.6). Total num frames: 155004928. Throughput: 0: 1691.4. Samples: 33744596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:46:02,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:46:04,001][42004] Updated weights for policy 0, policy_version 37846 (0.0032) +[2024-11-08 04:46:07,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 155045888. Throughput: 0: 1712.4. Samples: 33755446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:46:07,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:46:11,931][42004] Updated weights for policy 0, policy_version 37856 (0.0031) +[2024-11-08 04:46:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155062272. Throughput: 0: 1615.5. Samples: 33762204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:12,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:46:17,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155095040. Throughput: 0: 1601.7. Samples: 33766794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:17,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:46:18,333][42004] Updated weights for policy 0, policy_version 37866 (0.0029) +[2024-11-08 04:46:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.6, 300 sec: 6623.0). Total num frames: 155127808. Throughput: 0: 1648.0. Samples: 33777104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:22,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:46:24,378][42004] Updated weights for policy 0, policy_version 37876 (0.0044) +[2024-11-08 04:46:27,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 155164672. Throughput: 0: 1684.0. Samples: 33787010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:27,934][41694] Avg episode reward: [(0, '4.114')] +[2024-11-08 04:46:30,013][42004] Updated weights for policy 0, policy_version 37886 (0.0025) +[2024-11-08 04:46:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6670.7). Total num frames: 155201536. Throughput: 0: 1693.4. Samples: 33792738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:32,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:46:35,686][42004] Updated weights for policy 0, policy_version 37896 (0.0027) +[2024-11-08 04:46:37,932][41694] Fps is (10 sec: 7373.6, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 155238400. Throughput: 0: 1684.5. Samples: 33803828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:37,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:46:41,252][42004] Updated weights for policy 0, policy_version 37906 (0.0026) +[2024-11-08 04:46:42,939][41694] Fps is (10 sec: 6958.3, 60 sec: 6825.8, 300 sec: 6678.4). Total num frames: 155271168. Throughput: 0: 1676.3. Samples: 33814678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:42,942][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 04:46:47,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 155291648. Throughput: 0: 1596.7. Samples: 33816446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:47,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:46:49,494][42004] Updated weights for policy 0, policy_version 37916 (0.0030) +[2024-11-08 04:46:52,931][41694] Fps is (10 sec: 5328.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 155324416. Throughput: 0: 1562.4. Samples: 33825756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:52,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:46:55,285][42004] Updated weights for policy 0, policy_version 37926 (0.0030) +[2024-11-08 04:46:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 155361280. Throughput: 0: 1662.5. Samples: 33837018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:57,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:47:00,562][42004] Updated weights for policy 0, policy_version 37936 (0.0040) +[2024-11-08 04:47:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155398144. Throughput: 0: 1691.7. Samples: 33842920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:02,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:47:06,419][42004] Updated weights for policy 0, policy_version 37946 (0.0040) +[2024-11-08 04:47:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 155435008. Throughput: 0: 1698.7. Samples: 33853546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:07,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 04:47:11,889][42004] Updated weights for policy 0, policy_version 37956 (0.0030) +[2024-11-08 04:47:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 155471872. Throughput: 0: 1729.4. Samples: 33864830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:12,933][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:47:18,823][41694] Fps is (10 sec: 6393.6, 60 sec: 6726.8, 300 sec: 6700.0). Total num frames: 155504640. Throughput: 0: 1687.2. Samples: 33870164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:18,825][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 04:47:19,308][42004] Updated weights for policy 0, policy_version 37966 (0.0023) +[2024-11-08 04:47:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 155529216. Throughput: 0: 1632.3. Samples: 33877280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:22,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 04:47:25,764][42004] Updated weights for policy 0, policy_version 37976 (0.0045) +[2024-11-08 04:47:27,931][41694] Fps is (10 sec: 6745.0, 60 sec: 6690.3, 300 sec: 6678.6). Total num frames: 155566080. Throughput: 0: 1618.0. Samples: 33887476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:27,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 04:47:31,110][42004] Updated weights for policy 0, policy_version 37986 (0.0026) +[2024-11-08 04:47:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 155602944. Throughput: 0: 1704.0. Samples: 33893124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:32,933][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:47:36,487][42004] Updated weights for policy 0, policy_version 37996 (0.0027) +[2024-11-08 04:47:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 155639808. Throughput: 0: 1746.9. Samples: 33904368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:37,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 04:47:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037998_155639808.pth... +[2024-11-08 04:47:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037605_154030080.pth +[2024-11-08 04:47:42,148][42004] Updated weights for policy 0, policy_version 38006 (0.0030) +[2024-11-08 04:47:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6759.2, 300 sec: 6720.2). Total num frames: 155676672. Throughput: 0: 1745.7. Samples: 33915574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:42,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:47:47,393][42004] Updated weights for policy 0, policy_version 38016 (0.0031) +[2024-11-08 04:47:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 155713536. Throughput: 0: 1742.8. Samples: 33921346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:47,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:47:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 155738112. Throughput: 0: 1752.9. Samples: 33932426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:52,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 04:47:54,893][42004] Updated weights for policy 0, policy_version 38026 (0.0027) +[2024-11-08 04:47:57,933][41694] Fps is (10 sec: 6143.1, 60 sec: 6894.7, 300 sec: 6706.3). Total num frames: 155774976. Throughput: 0: 1663.6. Samples: 33939694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:57,936][41694] Avg episode reward: [(0, '4.689')] +[2024-11-08 04:48:00,695][42004] Updated weights for policy 0, policy_version 38036 (0.0025) +[2024-11-08 04:48:02,932][41694] Fps is (10 sec: 6962.6, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 155807744. Throughput: 0: 1700.1. Samples: 33945154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:02,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:48:06,968][42004] Updated weights for policy 0, policy_version 38046 (0.0031) +[2024-11-08 04:48:07,932][41694] Fps is (10 sec: 6554.6, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 155840512. Throughput: 0: 1727.4. Samples: 33955014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:07,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:48:12,247][42004] Updated weights for policy 0, policy_version 38056 (0.0023) +[2024-11-08 04:48:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 155881472. Throughput: 0: 1756.5. Samples: 33966518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:12,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:48:17,531][42004] Updated weights for policy 0, policy_version 38066 (0.0025) +[2024-11-08 04:48:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6998.9, 300 sec: 6761.9). Total num frames: 155918336. Throughput: 0: 1759.2. Samples: 33972290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:17,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 04:48:22,742][42004] Updated weights for policy 0, policy_version 38076 (0.0033) +[2024-11-08 04:48:22,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7168.0, 300 sec: 6789.7). Total num frames: 155959296. Throughput: 0: 1772.5. Samples: 33984128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:22,936][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 04:48:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 155983872. Throughput: 0: 1700.0. Samples: 33992076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:27,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:48:29,971][42004] Updated weights for policy 0, policy_version 38086 (0.0025) +[2024-11-08 04:48:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156016640. Throughput: 0: 1695.2. Samples: 33997632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:32,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:48:36,011][42004] Updated weights for policy 0, policy_version 38096 (0.0039) +[2024-11-08 04:48:37,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 156049408. Throughput: 0: 1673.3. Samples: 34007724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:37,937][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 04:48:41,852][42004] Updated weights for policy 0, policy_version 38106 (0.0026) +[2024-11-08 04:48:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156090368. Throughput: 0: 1750.3. Samples: 34018454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:42,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 04:48:47,038][42004] Updated weights for policy 0, policy_version 38116 (0.0025) +[2024-11-08 04:48:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156127232. Throughput: 0: 1757.7. Samples: 34024250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:47,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:48:52,344][42004] Updated weights for policy 0, policy_version 38126 (0.0040) +[2024-11-08 04:48:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 156168192. Throughput: 0: 1799.2. Samples: 34035978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:52,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:48:57,711][42004] Updated weights for policy 0, policy_version 38136 (0.0030) +[2024-11-08 04:48:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6817.4). Total num frames: 156205056. Throughput: 0: 1799.4. Samples: 34047490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:57,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 04:49:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6761.9). Total num frames: 156225536. Throughput: 0: 1756.2. Samples: 34051320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:02,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:49:05,634][42004] Updated weights for policy 0, policy_version 38146 (0.0030) +[2024-11-08 04:49:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 156258304. Throughput: 0: 1679.7. Samples: 34059714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:07,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:49:11,518][42004] Updated weights for policy 0, policy_version 38156 (0.0028) +[2024-11-08 04:49:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6748.0). Total num frames: 156295168. Throughput: 0: 1732.0. Samples: 34070014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:12,934][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 04:49:17,180][42004] Updated weights for policy 0, policy_version 38166 (0.0024) +[2024-11-08 04:49:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156332032. Throughput: 0: 1723.9. Samples: 34075206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:17,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:49:22,259][42004] Updated weights for policy 0, policy_version 38176 (0.0023) +[2024-11-08 04:49:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6797.0). Total num frames: 156372992. Throughput: 0: 1767.0. Samples: 34087240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:22,935][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 04:49:27,670][42004] Updated weights for policy 0, policy_version 38186 (0.0027) +[2024-11-08 04:49:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 156409856. Throughput: 0: 1789.4. Samples: 34098978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:27,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:49:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 156446720. Throughput: 0: 1783.3. Samples: 34104498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:49:33,119][42004] Updated weights for policy 0, policy_version 38196 (0.0025) +[2024-11-08 04:49:37,932][41694] Fps is (10 sec: 6144.3, 60 sec: 7031.4, 300 sec: 6761.9). Total num frames: 156471296. Throughput: 0: 1693.5. Samples: 34112186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:37,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 04:49:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038201_156471296.pth... +[2024-11-08 04:49:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037800_154828800.pth +[2024-11-08 04:49:40,642][42004] Updated weights for policy 0, policy_version 38206 (0.0027) +[2024-11-08 04:49:42,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 156508160. Throughput: 0: 1676.3. Samples: 34122922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:42,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 04:49:46,699][42004] Updated weights for policy 0, policy_version 38216 (0.0031) +[2024-11-08 04:49:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 156540928. Throughput: 0: 1699.2. Samples: 34127786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:47,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:49:52,204][42004] Updated weights for policy 0, policy_version 38226 (0.0033) +[2024-11-08 04:49:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6775.8). Total num frames: 156577792. Throughput: 0: 1753.9. Samples: 34138642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:49:52,935][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:49:57,374][42004] Updated weights for policy 0, policy_version 38236 (0.0028) +[2024-11-08 04:49:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6894.9, 300 sec: 6845.4). Total num frames: 156618752. Throughput: 0: 1788.4. Samples: 34150492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:49:57,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:50:02,868][42004] Updated weights for policy 0, policy_version 38246 (0.0030) +[2024-11-08 04:50:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 156655616. Throughput: 0: 1801.2. Samples: 34156258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:02,934][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 04:50:09,338][41694] Fps is (10 sec: 6104.7, 60 sec: 7003.9, 300 sec: 6812.7). Total num frames: 156688384. Throughput: 0: 1722.4. Samples: 34167168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:09,340][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:50:10,198][42004] Updated weights for policy 0, policy_version 38256 (0.0022) +[2024-11-08 04:50:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 156712960. Throughput: 0: 1675.2. Samples: 34174360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:12,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:50:16,504][42004] Updated weights for policy 0, policy_version 38266 (0.0038) +[2024-11-08 04:50:17,932][41694] Fps is (10 sec: 6196.0, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 156741632. Throughput: 0: 1665.5. Samples: 34179444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:17,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 04:50:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 156774400. Throughput: 0: 1693.3. Samples: 34188384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:22,937][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 04:50:23,129][42004] Updated weights for policy 0, policy_version 38276 (0.0035) +[2024-11-08 04:50:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156815360. Throughput: 0: 1711.6. Samples: 34199946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:27,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 04:50:28,310][42004] Updated weights for policy 0, policy_version 38286 (0.0026) +[2024-11-08 04:50:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 156852224. Throughput: 0: 1721.6. Samples: 34205260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:32,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:50:33,954][42004] Updated weights for policy 0, policy_version 38296 (0.0034) +[2024-11-08 04:50:37,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 156889088. Throughput: 0: 1732.6. Samples: 34216608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:37,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:50:39,566][42004] Updated weights for policy 0, policy_version 38306 (0.0028) +[2024-11-08 04:50:43,356][41694] Fps is (10 sec: 5893.7, 60 sec: 6710.9, 300 sec: 6821.5). Total num frames: 156913664. Throughput: 0: 1578.9. Samples: 34222214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:43,358][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:50:47,022][42004] Updated weights for policy 0, policy_version 38316 (0.0034) +[2024-11-08 04:50:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156946432. Throughput: 0: 1624.4. Samples: 34229354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:47,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 04:50:52,652][42004] Updated weights for policy 0, policy_version 38326 (0.0027) +[2024-11-08 04:50:52,932][41694] Fps is (10 sec: 7272.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156983296. Throughput: 0: 1681.8. Samples: 34240486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:52,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:50:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 157020160. Throughput: 0: 1707.6. Samples: 34251204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:57,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:50:58,163][42004] Updated weights for policy 0, policy_version 38336 (0.0029) +[2024-11-08 04:51:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 157057024. Throughput: 0: 1724.8. Samples: 34257058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:51:02,934][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 04:51:03,612][42004] Updated weights for policy 0, policy_version 38346 (0.0027) +[2024-11-08 04:51:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6990.5, 300 sec: 6900.7). Total num frames: 157097984. Throughput: 0: 1783.9. Samples: 34268660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:51:07,933][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 04:51:08,878][42004] Updated weights for policy 0, policy_version 38356 (0.0030) +[2024-11-08 04:51:12,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6963.1, 300 sec: 6900.7). Total num frames: 157130752. Throughput: 0: 1752.6. Samples: 34278814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:12,936][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:51:15,261][42004] Updated weights for policy 0, policy_version 38366 (0.0030) +[2024-11-08 04:51:17,934][41694] Fps is (10 sec: 5324.4, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 157151232. Throughput: 0: 1744.6. Samples: 34283768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:17,937][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 04:51:22,693][42004] Updated weights for policy 0, policy_version 38376 (0.0024) +[2024-11-08 04:51:22,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 157188096. Throughput: 0: 1659.8. Samples: 34291300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:22,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:51:27,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 157220864. Throughput: 0: 1790.8. Samples: 34302038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:27,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:51:28,503][42004] Updated weights for policy 0, policy_version 38386 (0.0022) +[2024-11-08 04:51:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 157261824. Throughput: 0: 1734.4. Samples: 34307400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:32,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 04:51:33,817][42004] Updated weights for policy 0, policy_version 38396 (0.0024) +[2024-11-08 04:51:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6873.1). Total num frames: 157298688. Throughput: 0: 1747.2. Samples: 34319110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:37,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 04:51:38,017][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038404_157302784.pth... +[2024-11-08 04:51:38,108][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037998_155639808.pth +[2024-11-08 04:51:39,102][42004] Updated weights for policy 0, policy_version 38406 (0.0034) +[2024-11-08 04:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7081.6, 300 sec: 6928.5). Total num frames: 157335552. Throughput: 0: 1762.1. Samples: 34330498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:42,933][41694] Avg episode reward: [(0, '4.164')] +[2024-11-08 04:51:44,704][42004] Updated weights for policy 0, policy_version 38416 (0.0029) +[2024-11-08 04:51:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 157372416. Throughput: 0: 1755.8. Samples: 34336068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:47,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:51:52,451][42004] Updated weights for policy 0, policy_version 38426 (0.0028) +[2024-11-08 04:51:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157392896. Throughput: 0: 1679.6. Samples: 34344240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:52,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 04:51:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157429760. Throughput: 0: 1671.7. Samples: 34354040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:57,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 04:51:58,118][42004] Updated weights for policy 0, policy_version 38436 (0.0030) +[2024-11-08 04:52:02,937][41694] Fps is (10 sec: 6959.1, 60 sec: 6757.7, 300 sec: 6872.8). Total num frames: 157462528. Throughput: 0: 1667.5. Samples: 34358816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:02,940][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 04:52:04,952][42004] Updated weights for policy 0, policy_version 38446 (0.0045) +[2024-11-08 04:52:07,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6621.8, 300 sec: 6859.0). Total num frames: 157495296. Throughput: 0: 1711.0. Samples: 34368296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:07,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:52:11,109][42004] Updated weights for policy 0, policy_version 38456 (0.0031) +[2024-11-08 04:52:12,931][41694] Fps is (10 sec: 6147.7, 60 sec: 6553.7, 300 sec: 6865.9). Total num frames: 157523968. Throughput: 0: 1688.2. Samples: 34378006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:12,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:52:16,986][42004] Updated weights for policy 0, policy_version 38466 (0.0029) +[2024-11-08 04:52:17,931][41694] Fps is (10 sec: 6554.1, 60 sec: 6826.8, 300 sec: 6886.8). Total num frames: 157560832. Throughput: 0: 1682.4. Samples: 34383110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:17,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 04:52:22,721][42004] Updated weights for policy 0, policy_version 38476 (0.0031) +[2024-11-08 04:52:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157597696. Throughput: 0: 1659.5. Samples: 34393786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:22,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:52:27,814][42004] Updated weights for policy 0, policy_version 38486 (0.0029) +[2024-11-08 04:52:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 157638656. Throughput: 0: 1671.0. Samples: 34405694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:27,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:52:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 157675520. Throughput: 0: 1665.3. Samples: 34411008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:32,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 04:52:33,452][42004] Updated weights for policy 0, policy_version 38496 (0.0024) +[2024-11-08 04:52:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157708288. Throughput: 0: 1721.2. Samples: 34421694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:37,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:52:39,106][42004] Updated weights for policy 0, policy_version 38506 (0.0037) +[2024-11-08 04:52:42,934][41694] Fps is (10 sec: 7370.6, 60 sec: 6894.6, 300 sec: 6900.7). Total num frames: 157749248. Throughput: 0: 1755.9. Samples: 34433062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:42,938][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 04:52:44,578][42004] Updated weights for policy 0, policy_version 38516 (0.0026) +[2024-11-08 04:52:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6826.6, 300 sec: 6928.5). Total num frames: 157782016. Throughput: 0: 1779.1. Samples: 34438868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:47,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 04:52:50,520][42004] Updated weights for policy 0, policy_version 38526 (0.0033) +[2024-11-08 04:52:52,932][41694] Fps is (10 sec: 6964.9, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 157818880. Throughput: 0: 1795.7. Samples: 34449102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:52,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:52:57,489][42004] Updated weights for policy 0, policy_version 38536 (0.0028) +[2024-11-08 04:52:57,934][41694] Fps is (10 sec: 6142.7, 60 sec: 6894.6, 300 sec: 6900.7). Total num frames: 157843456. Throughput: 0: 1766.6. Samples: 34457506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:57,939][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:53:02,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6963.9, 300 sec: 6914.6). Total num frames: 157880320. Throughput: 0: 1769.2. Samples: 34462722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:02,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:53:03,512][42004] Updated weights for policy 0, policy_version 38546 (0.0034) +[2024-11-08 04:53:07,932][41694] Fps is (10 sec: 6965.0, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 157913088. Throughput: 0: 1761.4. Samples: 34473048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:07,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:53:09,488][42004] Updated weights for policy 0, policy_version 38556 (0.0032) +[2024-11-08 04:53:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 157949952. Throughput: 0: 1730.6. Samples: 34483572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:12,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 04:53:15,032][42004] Updated weights for policy 0, policy_version 38566 (0.0026) +[2024-11-08 04:53:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 157986816. Throughput: 0: 1735.2. Samples: 34489092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:17,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:53:20,785][42004] Updated weights for policy 0, policy_version 38576 (0.0021) +[2024-11-08 04:53:22,932][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 158019584. Throughput: 0: 1737.6. Samples: 34499886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:22,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:53:26,472][42004] Updated weights for policy 0, policy_version 38586 (0.0038) +[2024-11-08 04:53:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 158056448. Throughput: 0: 1724.5. Samples: 34510658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:27,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:53:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 158081024. Throughput: 0: 1677.4. Samples: 34514352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:32,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 04:53:33,883][42004] Updated weights for policy 0, policy_version 38596 (0.0029) +[2024-11-08 04:53:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 158113792. Throughput: 0: 1653.5. Samples: 34523508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:37,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 04:53:38,075][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038603_158117888.pth... +[2024-11-08 04:53:38,226][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038201_156471296.pth +[2024-11-08 04:53:40,224][42004] Updated weights for policy 0, policy_version 38606 (0.0034) +[2024-11-08 04:53:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6622.2, 300 sec: 6845.2). Total num frames: 158146560. Throughput: 0: 1673.0. Samples: 34532786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:53:46,242][42004] Updated weights for policy 0, policy_version 38616 (0.0030) +[2024-11-08 04:53:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6622.0, 300 sec: 6817.4). Total num frames: 158179328. Throughput: 0: 1671.3. Samples: 34537930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:47,934][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 04:53:51,793][42004] Updated weights for policy 0, policy_version 38626 (0.0031) +[2024-11-08 04:53:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 158220288. Throughput: 0: 1692.9. Samples: 34549230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:52,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:53:57,287][42004] Updated weights for policy 0, policy_version 38636 (0.0026) +[2024-11-08 04:53:57,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.3, 300 sec: 6886.8). Total num frames: 158257152. Throughput: 0: 1704.9. Samples: 34560294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:57,933][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 04:54:03,896][41694] Fps is (10 sec: 6350.9, 60 sec: 6718.7, 300 sec: 6864.4). Total num frames: 158289920. Throughput: 0: 1668.7. Samples: 34565792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:03,898][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 04:54:04,064][42004] Updated weights for policy 0, policy_version 38646 (0.0024) +[2024-11-08 04:54:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 158322688. Throughput: 0: 1663.7. Samples: 34574754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:07,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:54:09,474][42004] Updated weights for policy 0, policy_version 38656 (0.0023) +[2024-11-08 04:54:12,932][41694] Fps is (10 sec: 7706.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 158359552. Throughput: 0: 1677.4. Samples: 34586140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:12,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 04:54:15,310][42004] Updated weights for policy 0, policy_version 38666 (0.0030) +[2024-11-08 04:54:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 158392320. Throughput: 0: 1702.2. Samples: 34590952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:17,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:54:20,844][42004] Updated weights for policy 0, policy_version 38676 (0.0028) +[2024-11-08 04:54:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 158429184. Throughput: 0: 1748.8. Samples: 34602204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:22,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:54:26,271][42004] Updated weights for policy 0, policy_version 38686 (0.0027) +[2024-11-08 04:54:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 158466048. Throughput: 0: 1790.0. Samples: 34613336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:54:31,843][42004] Updated weights for policy 0, policy_version 38696 (0.0023) +[2024-11-08 04:54:32,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 158507008. Throughput: 0: 1793.0. Samples: 34618616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:32,935][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 04:54:38,017][41694] Fps is (10 sec: 6091.9, 60 sec: 6885.1, 300 sec: 6843.2). Total num frames: 158527488. Throughput: 0: 1787.9. Samples: 34629840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:38,019][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 04:54:39,448][42004] Updated weights for policy 0, policy_version 38706 (0.0034) +[2024-11-08 04:54:42,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 158560256. Throughput: 0: 1704.0. Samples: 34636972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:42,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:54:45,316][42004] Updated weights for policy 0, policy_version 38716 (0.0039) +[2024-11-08 04:54:47,934][41694] Fps is (10 sec: 7021.9, 60 sec: 6962.9, 300 sec: 6845.1). Total num frames: 158597120. Throughput: 0: 1733.7. Samples: 34642142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:47,936][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 04:54:51,537][42004] Updated weights for policy 0, policy_version 38726 (0.0024) +[2024-11-08 04:54:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 158629888. Throughput: 0: 1720.1. Samples: 34652158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:52,959][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 04:54:56,885][42004] Updated weights for policy 0, policy_version 38736 (0.0030) +[2024-11-08 04:54:57,932][41694] Fps is (10 sec: 6964.5, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 158666752. Throughput: 0: 1716.4. Samples: 34663380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:57,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:55:02,754][42004] Updated weights for policy 0, policy_version 38746 (0.0025) +[2024-11-08 04:55:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7007.5, 300 sec: 6864.0). Total num frames: 158703616. Throughput: 0: 1731.5. Samples: 34668870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:02,933][41694] Avg episode reward: [(0, '4.648')] +[2024-11-08 04:55:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 158736384. Throughput: 0: 1701.1. Samples: 34678754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:07,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:55:08,529][42004] Updated weights for policy 0, policy_version 38756 (0.0031) +[2024-11-08 04:55:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 158765056. Throughput: 0: 1650.9. Samples: 34687628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:12,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 04:55:15,914][42004] Updated weights for policy 0, policy_version 38766 (0.0020) +[2024-11-08 04:55:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 158793728. Throughput: 0: 1645.9. Samples: 34692680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:17,934][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 04:55:22,933][41694] Fps is (10 sec: 4914.8, 60 sec: 6416.9, 300 sec: 6775.7). Total num frames: 158814208. Throughput: 0: 1554.3. Samples: 34699654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:22,936][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:55:24,503][42004] Updated weights for policy 0, policy_version 38776 (0.0041) +[2024-11-08 04:55:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 158851072. Throughput: 0: 1592.5. Samples: 34708636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:27,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:55:30,148][42004] Updated weights for policy 0, policy_version 38786 (0.0022) +[2024-11-08 04:55:32,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6348.8, 300 sec: 6775.7). Total num frames: 158887936. Throughput: 0: 1600.0. Samples: 34714140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:32,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:55:35,814][42004] Updated weights for policy 0, policy_version 38796 (0.0030) +[2024-11-08 04:55:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6563.0, 300 sec: 6813.3). Total num frames: 158920704. Throughput: 0: 1617.0. Samples: 34724924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:37,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:55:38,077][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038800_158924800.pth... +[2024-11-08 04:55:38,197][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038404_157302784.pth +[2024-11-08 04:55:41,546][42004] Updated weights for policy 0, policy_version 38806 (0.0038) +[2024-11-08 04:55:42,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 158957568. Throughput: 0: 1604.4. Samples: 34735576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:42,933][41694] Avg episode reward: [(0, '4.777')] +[2024-11-08 04:55:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6349.0, 300 sec: 6761.9). Total num frames: 158978048. Throughput: 0: 1536.2. Samples: 34738000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:47,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:55:49,077][42004] Updated weights for policy 0, policy_version 38816 (0.0041) +[2024-11-08 04:55:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 159014912. Throughput: 0: 1543.0. Samples: 34748188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:52,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:55:54,736][42004] Updated weights for policy 0, policy_version 38826 (0.0031) +[2024-11-08 04:55:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6348.8, 300 sec: 6748.0). Total num frames: 159047680. Throughput: 0: 1582.2. Samples: 34758826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:57,936][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 04:56:01,427][42004] Updated weights for policy 0, policy_version 38836 (0.0035) +[2024-11-08 04:56:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6720.2). Total num frames: 159080448. Throughput: 0: 1566.4. Samples: 34763166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:56:02,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:56:07,399][42004] Updated weights for policy 0, policy_version 38846 (0.0030) +[2024-11-08 04:56:07,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6720.2). Total num frames: 159113216. Throughput: 0: 1634.4. Samples: 34773200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:07,934][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 04:56:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 159150080. Throughput: 0: 1677.6. Samples: 34784128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:12,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 04:56:13,088][42004] Updated weights for policy 0, policy_version 38856 (0.0044) +[2024-11-08 04:56:18,817][41694] Fps is (10 sec: 6396.7, 60 sec: 6391.0, 300 sec: 6741.6). Total num frames: 159182848. Throughput: 0: 1641.0. Samples: 34789436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:18,819][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 04:56:20,286][42004] Updated weights for policy 0, policy_version 38866 (0.0027) +[2024-11-08 04:56:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6622.0, 300 sec: 6748.0). Total num frames: 159211520. Throughput: 0: 1618.4. Samples: 34797752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:22,935][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 04:56:25,868][42004] Updated weights for policy 0, policy_version 38876 (0.0026) +[2024-11-08 04:56:27,931][41694] Fps is (10 sec: 7190.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 159248384. Throughput: 0: 1627.0. Samples: 34808792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:27,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 04:56:31,875][42004] Updated weights for policy 0, policy_version 38886 (0.0033) +[2024-11-08 04:56:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 159281152. Throughput: 0: 1681.8. Samples: 34813680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:32,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:56:37,399][42004] Updated weights for policy 0, policy_version 38896 (0.0023) +[2024-11-08 04:56:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 159318016. Throughput: 0: 1691.6. Samples: 34824310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:37,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:56:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 159354880. Throughput: 0: 1705.7. Samples: 34835580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:42,934][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 04:56:42,940][42004] Updated weights for policy 0, policy_version 38906 (0.0031) +[2024-11-08 04:56:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 159391744. Throughput: 0: 1726.4. Samples: 34840854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:47,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 04:56:48,500][42004] Updated weights for policy 0, policy_version 38916 (0.0037) +[2024-11-08 04:56:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 159416320. Throughput: 0: 1751.0. Samples: 34851994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:52,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:56:55,729][42004] Updated weights for policy 0, policy_version 38926 (0.0037) +[2024-11-08 04:56:57,935][41694] Fps is (10 sec: 6141.8, 60 sec: 6758.0, 300 sec: 6748.0). Total num frames: 159453184. Throughput: 0: 1690.4. Samples: 34860200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:57,939][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:57:01,395][42004] Updated weights for policy 0, policy_version 38936 (0.0033) +[2024-11-08 04:57:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 159490048. Throughput: 0: 1723.8. Samples: 34865480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:02,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:57:07,810][42004] Updated weights for policy 0, policy_version 38946 (0.0023) +[2024-11-08 04:57:07,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 159522816. Throughput: 0: 1717.2. Samples: 34875024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:07,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:57:12,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 159559680. Throughput: 0: 1720.2. Samples: 34886202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:12,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:57:13,281][42004] Updated weights for policy 0, policy_version 38956 (0.0032) +[2024-11-08 04:57:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6998.2, 300 sec: 6775.8). Total num frames: 159596544. Throughput: 0: 1731.8. Samples: 34891612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:17,933][41694] Avg episode reward: [(0, '4.616')] +[2024-11-08 04:57:18,547][42004] Updated weights for policy 0, policy_version 38966 (0.0028) +[2024-11-08 04:57:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 159633408. Throughput: 0: 1748.1. Samples: 34902976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:22,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 04:57:24,099][42004] Updated weights for policy 0, policy_version 38976 (0.0029) +[2024-11-08 04:57:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 159657984. Throughput: 0: 1675.8. Samples: 34910992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 04:57:31,582][42004] Updated weights for policy 0, policy_version 38986 (0.0028) +[2024-11-08 04:57:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6734.1). Total num frames: 159694848. Throughput: 0: 1674.9. Samples: 34916224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:32,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:57:37,094][42004] Updated weights for policy 0, policy_version 38996 (0.0043) +[2024-11-08 04:57:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6720.3). Total num frames: 159731712. Throughput: 0: 1681.9. Samples: 34927678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:37,933][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 04:57:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038997_159731712.pth... +[2024-11-08 04:57:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038603_158117888.pth +[2024-11-08 04:57:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 159764480. Throughput: 0: 1726.9. Samples: 34937902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:42,932][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:57:42,977][42004] Updated weights for policy 0, policy_version 39006 (0.0035) +[2024-11-08 04:57:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 159805440. Throughput: 0: 1733.3. Samples: 34943478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:47,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 04:57:48,338][42004] Updated weights for policy 0, policy_version 39016 (0.0025) +[2024-11-08 04:57:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 159842304. Throughput: 0: 1774.9. Samples: 34954896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:52,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 04:57:53,753][42004] Updated weights for policy 0, policy_version 39026 (0.0032) +[2024-11-08 04:57:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7100.2, 300 sec: 6775.8). Total num frames: 159879168. Throughput: 0: 1772.2. Samples: 34965950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:57,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 04:58:01,045][42004] Updated weights for policy 0, policy_version 39036 (0.0026) +[2024-11-08 04:58:02,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6826.6, 300 sec: 6734.1). Total num frames: 159899648. Throughput: 0: 1724.3. Samples: 34969208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:02,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:58:06,839][42004] Updated weights for policy 0, policy_version 39046 (0.0039) +[2024-11-08 04:58:07,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 159936512. Throughput: 0: 1688.8. Samples: 34978974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:07,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:58:12,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 159965184. Throughput: 0: 1721.1. Samples: 34988442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:12,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 04:58:13,467][42004] Updated weights for policy 0, policy_version 39056 (0.0033) +[2024-11-08 04:58:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 160006144. Throughput: 0: 1713.9. Samples: 34993350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:17,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:58:18,822][42004] Updated weights for policy 0, policy_version 39066 (0.0031) +[2024-11-08 04:58:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 160043008. Throughput: 0: 1712.5. Samples: 35004740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:22,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 04:58:24,390][42004] Updated weights for policy 0, policy_version 39076 (0.0026) +[2024-11-08 04:58:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 160079872. Throughput: 0: 1740.4. Samples: 35016222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:27,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:58:29,998][42004] Updated weights for policy 0, policy_version 39086 (0.0029) +[2024-11-08 04:58:34,327][41694] Fps is (10 sec: 6110.7, 60 sec: 6805.0, 300 sec: 6743.9). Total num frames: 160112640. Throughput: 0: 1686.1. Samples: 35021704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:34,328][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:58:37,581][42004] Updated weights for policy 0, policy_version 39096 (0.0033) +[2024-11-08 04:58:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160137216. Throughput: 0: 1648.5. Samples: 35029080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:37,939][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 04:58:42,931][41694] Fps is (10 sec: 7140.2, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 160174080. Throughput: 0: 1648.0. Samples: 35040110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:42,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:58:42,998][42004] Updated weights for policy 0, policy_version 39106 (0.0034) +[2024-11-08 04:58:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160210944. Throughput: 0: 1692.2. Samples: 35045354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:47,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:58:49,009][42004] Updated weights for policy 0, policy_version 39116 (0.0025) +[2024-11-08 04:58:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160247808. Throughput: 0: 1713.3. Samples: 35056074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:52,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:58:54,396][42004] Updated weights for policy 0, policy_version 39126 (0.0036) +[2024-11-08 04:58:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6784.0). Total num frames: 160284672. Throughput: 0: 1755.2. Samples: 35067426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:57,937][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:59:00,006][42004] Updated weights for policy 0, policy_version 39136 (0.0029) +[2024-11-08 04:59:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.3, 300 sec: 6761.9). Total num frames: 160317440. Throughput: 0: 1764.8. Samples: 35072768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:02,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:59:06,312][42004] Updated weights for policy 0, policy_version 39146 (0.0026) +[2024-11-08 04:59:08,475][41694] Fps is (10 sec: 5438.9, 60 sec: 6697.8, 300 sec: 6707.9). Total num frames: 160342016. Throughput: 0: 1708.9. Samples: 35082568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:08,478][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 04:59:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 160374784. Throughput: 0: 1642.6. Samples: 35090140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:12,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 04:59:13,737][42004] Updated weights for policy 0, policy_version 39156 (0.0037) +[2024-11-08 04:59:17,931][41694] Fps is (10 sec: 7363.2, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 160411648. Throughput: 0: 1694.6. Samples: 35095598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:17,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:59:19,618][42004] Updated weights for policy 0, policy_version 39166 (0.0024) +[2024-11-08 04:59:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 160444416. Throughput: 0: 1708.6. Samples: 35105966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:22,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 04:59:25,379][42004] Updated weights for policy 0, policy_version 39176 (0.0027) +[2024-11-08 04:59:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 160481280. Throughput: 0: 1703.2. Samples: 35116754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:27,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:59:30,866][42004] Updated weights for policy 0, policy_version 39186 (0.0020) +[2024-11-08 04:59:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6919.3, 300 sec: 6749.9). Total num frames: 160518144. Throughput: 0: 1707.7. Samples: 35122202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:32,934][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 04:59:36,407][42004] Updated weights for policy 0, policy_version 39196 (0.0026) +[2024-11-08 04:59:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 160555008. Throughput: 0: 1721.4. Samples: 35133538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:37,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:59:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039198_160555008.pth... +[2024-11-08 04:59:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038800_158924800.pth +[2024-11-08 04:59:42,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.3, 300 sec: 6720.3). Total num frames: 160579584. Throughput: 0: 1656.1. Samples: 35141950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:42,935][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:59:43,871][42004] Updated weights for policy 0, policy_version 39206 (0.0033) +[2024-11-08 04:59:47,936][41694] Fps is (10 sec: 6141.1, 60 sec: 6757.9, 300 sec: 6734.0). Total num frames: 160616448. Throughput: 0: 1638.8. Samples: 35146524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:47,939][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:59:49,482][42004] Updated weights for policy 0, policy_version 39216 (0.0030) +[2024-11-08 04:59:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 160649216. Throughput: 0: 1681.8. Samples: 35157338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:52,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:59:55,892][42004] Updated weights for policy 0, policy_version 39226 (0.0035) +[2024-11-08 04:59:57,932][41694] Fps is (10 sec: 6556.2, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 160681984. Throughput: 0: 1710.9. Samples: 35167132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:57,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 05:00:01,426][42004] Updated weights for policy 0, policy_version 39236 (0.0026) +[2024-11-08 05:00:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 160718848. Throughput: 0: 1713.5. Samples: 35172708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:02,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:00:07,119][42004] Updated weights for policy 0, policy_version 39246 (0.0040) +[2024-11-08 05:00:07,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6957.9, 300 sec: 6748.0). Total num frames: 160755712. Throughput: 0: 1719.0. Samples: 35183322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:07,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:00:12,864][42004] Updated weights for policy 0, policy_version 39256 (0.0031) +[2024-11-08 05:00:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 160792576. Throughput: 0: 1722.1. Samples: 35194248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:12,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:00:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 160808960. Throughput: 0: 1690.4. Samples: 35198272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:17,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 05:00:21,180][42004] Updated weights for policy 0, policy_version 39266 (0.0035) +[2024-11-08 05:00:22,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 160845824. Throughput: 0: 1603.4. Samples: 35205692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:22,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:00:26,862][42004] Updated weights for policy 0, policy_version 39276 (0.0029) +[2024-11-08 05:00:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 160878592. Throughput: 0: 1651.9. Samples: 35216284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:27,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:00:32,383][42004] Updated weights for policy 0, policy_version 39286 (0.0027) +[2024-11-08 05:00:32,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 160915456. Throughput: 0: 1666.8. Samples: 35221524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:32,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 05:00:37,811][42004] Updated weights for policy 0, policy_version 39296 (0.0033) +[2024-11-08 05:00:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 160956416. Throughput: 0: 1685.0. Samples: 35233162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:37,932][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:00:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 160993280. Throughput: 0: 1723.0. Samples: 35244664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:42,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:00:43,226][42004] Updated weights for policy 0, policy_version 39306 (0.0031) +[2024-11-08 05:00:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6895.4, 300 sec: 6831.3). Total num frames: 161030144. Throughput: 0: 1714.4. Samples: 35249856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:47,936][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 05:00:48,772][42004] Updated weights for policy 0, policy_version 39316 (0.0031) +[2024-11-08 05:00:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 161050624. Throughput: 0: 1652.5. Samples: 35257686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:00:56,350][42004] Updated weights for policy 0, policy_version 39326 (0.0023) +[2024-11-08 05:00:57,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 161087488. Throughput: 0: 1652.4. Samples: 35268606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:57,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:01:02,580][42004] Updated weights for policy 0, policy_version 39336 (0.0038) +[2024-11-08 05:01:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 161120256. Throughput: 0: 1672.8. Samples: 35273548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:02,933][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 05:01:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 161157120. Throughput: 0: 1738.9. Samples: 35283942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:07,937][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 05:01:08,020][42004] Updated weights for policy 0, policy_version 39346 (0.0037) +[2024-11-08 05:01:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6824.0). Total num frames: 161189888. Throughput: 0: 1728.8. Samples: 35294080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:12,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:01:14,334][42004] Updated weights for policy 0, policy_version 39356 (0.0037) +[2024-11-08 05:01:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 161226752. Throughput: 0: 1728.9. Samples: 35299326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:17,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 05:01:20,200][42004] Updated weights for policy 0, policy_version 39366 (0.0024) +[2024-11-08 05:01:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 161263616. Throughput: 0: 1708.4. Samples: 35310040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:22,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:01:27,480][42004] Updated weights for policy 0, policy_version 39376 (0.0043) +[2024-11-08 05:01:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161284096. Throughput: 0: 1620.3. Samples: 35317578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:27,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:01:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161320960. Throughput: 0: 1626.5. Samples: 35323050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:32,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 05:01:33,131][42004] Updated weights for policy 0, policy_version 39386 (0.0031) +[2024-11-08 05:01:37,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6690.0, 300 sec: 6789.6). Total num frames: 161357824. Throughput: 0: 1686.9. Samples: 35333596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:37,934][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 05:01:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039394_161357824.pth... +[2024-11-08 05:01:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038997_159731712.pth +[2024-11-08 05:01:38,869][42004] Updated weights for policy 0, policy_version 39396 (0.0042) +[2024-11-08 05:01:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 161394688. Throughput: 0: 1693.8. Samples: 35344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:42,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 05:01:44,330][42004] Updated weights for policy 0, policy_version 39406 (0.0031) +[2024-11-08 05:01:47,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 161431552. Throughput: 0: 1712.1. Samples: 35350592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:47,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:01:49,836][42004] Updated weights for policy 0, policy_version 39416 (0.0026) +[2024-11-08 05:01:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6831.4). Total num frames: 161468416. Throughput: 0: 1727.8. Samples: 35361692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:52,935][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 05:01:55,472][42004] Updated weights for policy 0, policy_version 39426 (0.0026) +[2024-11-08 05:01:59,041][41694] Fps is (10 sec: 5899.2, 60 sec: 6702.7, 300 sec: 6778.0). Total num frames: 161497088. Throughput: 0: 1700.4. Samples: 35372482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:59,042][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:02:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161525760. Throughput: 0: 1667.4. Samples: 35374360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:02,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:02:03,411][42004] Updated weights for policy 0, policy_version 39436 (0.0027) +[2024-11-08 05:02:07,932][41694] Fps is (10 sec: 6910.4, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 161558528. Throughput: 0: 1656.6. Samples: 35384588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:07,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:02:09,428][42004] Updated weights for policy 0, policy_version 39446 (0.0035) +[2024-11-08 05:02:12,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6758.4, 300 sec: 6775.7). Total num frames: 161595392. Throughput: 0: 1713.2. Samples: 35394674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:12,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 05:02:15,052][42004] Updated weights for policy 0, policy_version 39456 (0.0045) +[2024-11-08 05:02:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 161632256. Throughput: 0: 1719.3. Samples: 35400418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:17,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:02:20,611][42004] Updated weights for policy 0, policy_version 39466 (0.0028) +[2024-11-08 05:02:22,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 161669120. Throughput: 0: 1731.7. Samples: 35411522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:02:26,153][42004] Updated weights for policy 0, policy_version 39476 (0.0028) +[2024-11-08 05:02:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 161701888. Throughput: 0: 1726.7. Samples: 35422526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 05:02:33,215][41694] Fps is (10 sec: 5576.1, 60 sec: 6726.6, 300 sec: 6755.4). Total num frames: 161726464. Throughput: 0: 1701.6. Samples: 35427646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:33,218][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 05:02:33,881][42004] Updated weights for policy 0, policy_version 39486 (0.0035) +[2024-11-08 05:02:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 161763328. Throughput: 0: 1629.4. Samples: 35435014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:37,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:02:39,538][42004] Updated weights for policy 0, policy_version 39496 (0.0032) +[2024-11-08 05:02:42,932][41694] Fps is (10 sec: 7166.5, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 161796096. Throughput: 0: 1663.2. Samples: 35445480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:42,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 05:02:45,588][42004] Updated weights for policy 0, policy_version 39506 (0.0033) +[2024-11-08 05:02:47,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 161832960. Throughput: 0: 1690.9. Samples: 35450450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:47,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 05:02:51,368][42004] Updated weights for policy 0, policy_version 39516 (0.0037) +[2024-11-08 05:02:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 161865728. Throughput: 0: 1703.9. Samples: 35461262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:52,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:02:56,689][42004] Updated weights for policy 0, policy_version 39526 (0.0023) +[2024-11-08 05:02:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6955.2, 300 sec: 6803.5). Total num frames: 161906688. Throughput: 0: 1737.7. Samples: 35472868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:57,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:03:02,714][42004] Updated weights for policy 0, policy_version 39536 (0.0031) +[2024-11-08 05:03:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 161939456. Throughput: 0: 1721.0. Samples: 35477864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:02,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:03:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 161964032. Throughput: 0: 1687.6. Samples: 35487464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:07,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 05:03:10,241][42004] Updated weights for policy 0, policy_version 39546 (0.0033) +[2024-11-08 05:03:12,939][41694] Fps is (10 sec: 5730.8, 60 sec: 6689.5, 300 sec: 6747.8). Total num frames: 161996800. Throughput: 0: 1632.3. Samples: 35495988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:12,941][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:03:16,118][42004] Updated weights for policy 0, policy_version 39556 (0.0037) +[2024-11-08 05:03:17,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.7, 300 sec: 6734.1). Total num frames: 162029568. Throughput: 0: 1640.7. Samples: 35501014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:17,937][41694] Avg episode reward: [(0, '4.687')] +[2024-11-08 05:03:22,033][42004] Updated weights for policy 0, policy_version 39566 (0.0030) +[2024-11-08 05:03:22,931][41694] Fps is (10 sec: 6967.6, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 162066432. Throughput: 0: 1693.6. Samples: 35511228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:22,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:03:27,369][42004] Updated weights for policy 0, policy_version 39576 (0.0025) +[2024-11-08 05:03:27,931][41694] Fps is (10 sec: 7783.3, 60 sec: 6758.4, 300 sec: 6794.0). Total num frames: 162107392. Throughput: 0: 1715.4. Samples: 35522672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:27,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 05:03:32,858][42004] Updated weights for policy 0, policy_version 39586 (0.0030) +[2024-11-08 05:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6996.3, 300 sec: 6803.5). Total num frames: 162144256. Throughput: 0: 1729.7. Samples: 35528284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:32,961][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 05:03:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 162181120. Throughput: 0: 1737.2. Samples: 35539436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:37,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 05:03:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039595_162181120.pth... +[2024-11-08 05:03:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039198_160555008.pth +[2024-11-08 05:03:38,338][42004] Updated weights for policy 0, policy_version 39596 (0.0022) +[2024-11-08 05:03:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 162201600. Throughput: 0: 1648.3. Samples: 35547042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:42,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:03:46,120][42004] Updated weights for policy 0, policy_version 39606 (0.0031) +[2024-11-08 05:03:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.2, 300 sec: 6734.1). Total num frames: 162234368. Throughput: 0: 1648.3. Samples: 35552036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:47,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:03:52,387][42004] Updated weights for policy 0, policy_version 39616 (0.0027) +[2024-11-08 05:03:52,933][41694] Fps is (10 sec: 6552.5, 60 sec: 6690.0, 300 sec: 6720.2). Total num frames: 162267136. Throughput: 0: 1654.9. Samples: 35561938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:52,936][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 05:03:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 162299904. Throughput: 0: 1672.7. Samples: 35571250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:57,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:03:58,790][42004] Updated weights for policy 0, policy_version 39626 (0.0038) +[2024-11-08 05:04:02,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6485.3, 300 sec: 6746.5). Total num frames: 162328576. Throughput: 0: 1675.0. Samples: 35576390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 05:04:05,673][42004] Updated weights for policy 0, policy_version 39636 (0.0036) +[2024-11-08 05:04:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 162361344. Throughput: 0: 1646.3. Samples: 35585310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:07,936][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:04:11,542][42004] Updated weights for policy 0, policy_version 39646 (0.0027) +[2024-11-08 05:04:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.8, 300 sec: 6734.1). Total num frames: 162398208. Throughput: 0: 1630.0. Samples: 35596022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:12,938][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 05:04:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 162418688. Throughput: 0: 1562.5. Samples: 35598598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:17,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 05:04:19,184][42004] Updated weights for policy 0, policy_version 39656 (0.0027) +[2024-11-08 05:04:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 162455552. Throughput: 0: 1534.5. Samples: 35608488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:22,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 05:04:25,335][42004] Updated weights for policy 0, policy_version 39666 (0.0036) +[2024-11-08 05:04:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6348.8, 300 sec: 6678.6). Total num frames: 162488320. Throughput: 0: 1585.6. Samples: 35618396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:27,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:04:30,895][42004] Updated weights for policy 0, policy_version 39676 (0.0030) +[2024-11-08 05:04:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6348.8, 300 sec: 6678.6). Total num frames: 162525184. Throughput: 0: 1600.0. Samples: 35624034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:32,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:04:36,514][42004] Updated weights for policy 0, policy_version 39686 (0.0020) +[2024-11-08 05:04:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6720.2). Total num frames: 162562048. Throughput: 0: 1626.2. Samples: 35635116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:37,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 05:04:42,276][42004] Updated weights for policy 0, policy_version 39696 (0.0039) +[2024-11-08 05:04:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6720.3). Total num frames: 162598912. Throughput: 0: 1658.1. Samples: 35645864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:42,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 05:04:49,275][41694] Fps is (10 sec: 6138.4, 60 sec: 6476.8, 300 sec: 6689.8). Total num frames: 162631680. Throughput: 0: 1610.7. Samples: 35651036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:49,278][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:04:49,671][42004] Updated weights for policy 0, policy_version 39706 (0.0030) +[2024-11-08 05:04:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.5, 300 sec: 6692.5). Total num frames: 162656256. Throughput: 0: 1635.1. Samples: 35658888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:52,933][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 05:04:55,523][42004] Updated weights for policy 0, policy_version 39716 (0.0026) +[2024-11-08 05:04:57,932][41694] Fps is (10 sec: 6624.5, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 162689024. Throughput: 0: 1617.1. Samples: 35668792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:57,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:05:02,162][42004] Updated weights for policy 0, policy_version 39726 (0.0048) +[2024-11-08 05:05:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 162721792. Throughput: 0: 1656.6. Samples: 35673144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:02,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:05:07,693][42004] Updated weights for policy 0, policy_version 39736 (0.0034) +[2024-11-08 05:05:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 162758656. Throughput: 0: 1678.7. Samples: 35684028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:07,936][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 05:05:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 162795520. Throughput: 0: 1699.5. Samples: 35694872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:12,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 05:05:13,638][42004] Updated weights for policy 0, policy_version 39746 (0.0030) +[2024-11-08 05:05:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 162824192. Throughput: 0: 1677.5. Samples: 35699520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:17,936][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 05:05:19,835][42004] Updated weights for policy 0, policy_version 39756 (0.0038) +[2024-11-08 05:05:23,336][41694] Fps is (10 sec: 5118.0, 60 sec: 6509.7, 300 sec: 6669.4). Total num frames: 162848768. Throughput: 0: 1648.5. Samples: 35709964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:23,337][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:05:27,438][42004] Updated weights for policy 0, policy_version 39766 (0.0044) +[2024-11-08 05:05:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 162881536. Throughput: 0: 1585.5. Samples: 35717210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:27,935][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 05:05:32,932][41694] Fps is (10 sec: 7256.4, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 162918400. Throughput: 0: 1629.5. Samples: 35722174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:32,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:05:33,466][42004] Updated weights for policy 0, policy_version 39776 (0.0032) +[2024-11-08 05:05:37,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 162955264. Throughput: 0: 1648.2. Samples: 35733056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:37,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:05:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039784_162955264.pth... +[2024-11-08 05:05:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039394_161357824.pth +[2024-11-08 05:05:38,736][42004] Updated weights for policy 0, policy_version 39786 (0.0027) +[2024-11-08 05:05:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 162992128. Throughput: 0: 1675.9. Samples: 35744206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:42,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:05:44,718][42004] Updated weights for policy 0, policy_version 39796 (0.0031) +[2024-11-08 05:05:47,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6703.7, 300 sec: 6692.4). Total num frames: 163024896. Throughput: 0: 1690.1. Samples: 35749200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:47,934][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 05:05:50,322][42004] Updated weights for policy 0, policy_version 39806 (0.0034) +[2024-11-08 05:05:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 163061760. Throughput: 0: 1690.8. Samples: 35760114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:05:52,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 05:05:57,762][42004] Updated weights for policy 0, policy_version 39816 (0.0025) +[2024-11-08 05:05:57,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 163086336. Throughput: 0: 1631.0. Samples: 35768268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:05:57,934][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:06:02,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 163119104. Throughput: 0: 1634.3. Samples: 35773064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:02,938][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:06:03,754][42004] Updated weights for policy 0, policy_version 39826 (0.0038) +[2024-11-08 05:06:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 163151872. Throughput: 0: 1633.4. Samples: 35782806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:07,936][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:06:10,089][42004] Updated weights for policy 0, policy_version 39836 (0.0031) +[2024-11-08 05:06:12,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 163184640. Throughput: 0: 1671.7. Samples: 35792438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:12,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:06:16,330][42004] Updated weights for policy 0, policy_version 39846 (0.0027) +[2024-11-08 05:06:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 163217408. Throughput: 0: 1670.8. Samples: 35797362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:17,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:06:21,984][42004] Updated weights for policy 0, policy_version 39856 (0.0026) +[2024-11-08 05:06:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6804.2, 300 sec: 6678.6). Total num frames: 163254272. Throughput: 0: 1673.0. Samples: 35808340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:22,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:06:27,553][42004] Updated weights for policy 0, policy_version 39866 (0.0033) +[2024-11-08 05:06:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 163291136. Throughput: 0: 1669.7. Samples: 35819344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:27,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:06:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 163315712. Throughput: 0: 1665.6. Samples: 35824150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:32,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:06:35,092][42004] Updated weights for policy 0, policy_version 39876 (0.0035) +[2024-11-08 05:06:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.7, 300 sec: 6623.0). Total num frames: 163348480. Throughput: 0: 1603.7. Samples: 35832280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:37,935][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:06:41,350][42004] Updated weights for policy 0, policy_version 39886 (0.0035) +[2024-11-08 05:06:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 163381248. Throughput: 0: 1640.1. Samples: 35842072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:42,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:06:47,050][42004] Updated weights for policy 0, policy_version 39896 (0.0027) +[2024-11-08 05:06:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 163418112. Throughput: 0: 1648.6. Samples: 35847248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:47,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:06:52,736][42004] Updated weights for policy 0, policy_version 39906 (0.0038) +[2024-11-08 05:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6662.0). Total num frames: 163454976. Throughput: 0: 1667.5. Samples: 35857844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:52,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:06:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 163491840. Throughput: 0: 1706.0. Samples: 35869208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:57,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 05:06:58,175][42004] Updated weights for policy 0, policy_version 39916 (0.0029) +[2024-11-08 05:07:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 163528704. Throughput: 0: 1720.5. Samples: 35874782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:02,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 05:07:05,643][42004] Updated weights for policy 0, policy_version 39926 (0.0032) +[2024-11-08 05:07:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 163549184. Throughput: 0: 1642.1. Samples: 35882236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:07,933][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 05:07:11,799][42004] Updated weights for policy 0, policy_version 39936 (0.0045) +[2024-11-08 05:07:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 163581952. Throughput: 0: 1618.3. Samples: 35892168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:12,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 05:07:17,577][42004] Updated weights for policy 0, policy_version 39946 (0.0022) +[2024-11-08 05:07:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 163618816. Throughput: 0: 1625.3. Samples: 35897290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:17,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 05:07:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 163655680. Throughput: 0: 1696.6. Samples: 35908626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:22,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:07:22,991][42004] Updated weights for policy 0, policy_version 39956 (0.0032) +[2024-11-08 05:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6671.1). Total num frames: 163692544. Throughput: 0: 1728.6. Samples: 35919860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:27,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 05:07:28,483][42004] Updated weights for policy 0, policy_version 39966 (0.0025) +[2024-11-08 05:07:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 163733504. Throughput: 0: 1738.3. Samples: 35925472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:32,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:07:33,927][42004] Updated weights for policy 0, policy_version 39976 (0.0025) +[2024-11-08 05:07:39,574][41694] Fps is (10 sec: 6332.8, 60 sec: 6777.7, 300 sec: 6641.6). Total num frames: 163766272. Throughput: 0: 1687.9. Samples: 35936570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:39,576][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:07:39,588][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039982_163766272.pth... +[2024-11-08 05:07:39,729][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039595_162181120.pth +[2024-11-08 05:07:41,589][42004] Updated weights for policy 0, policy_version 39986 (0.0050) +[2024-11-08 05:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 163790848. Throughput: 0: 1664.2. Samples: 35944096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:42,933][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 05:07:47,749][42004] Updated weights for policy 0, policy_version 39996 (0.0033) +[2024-11-08 05:07:47,937][41694] Fps is (10 sec: 6856.8, 60 sec: 6757.8, 300 sec: 6636.8). Total num frames: 163823616. Throughput: 0: 1644.2. Samples: 35948782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:47,946][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 05:07:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 163860480. Throughput: 0: 1719.0. Samples: 35959590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:07:53,123][42004] Updated weights for policy 0, policy_version 40006 (0.0029) +[2024-11-08 05:07:57,932][41694] Fps is (10 sec: 7376.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 163897344. Throughput: 0: 1751.7. Samples: 35970996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:07:57,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 05:07:58,620][42004] Updated weights for policy 0, policy_version 40016 (0.0028) +[2024-11-08 05:08:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 163934208. Throughput: 0: 1757.3. Samples: 35976366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:02,933][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 05:08:04,514][42004] Updated weights for policy 0, policy_version 40026 (0.0031) +[2024-11-08 05:08:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6678.7). Total num frames: 163966976. Throughput: 0: 1725.2. Samples: 35986262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:07,934][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 05:08:10,644][42004] Updated weights for policy 0, policy_version 40036 (0.0033) +[2024-11-08 05:08:13,629][41694] Fps is (10 sec: 5743.2, 60 sec: 6815.7, 300 sec: 6649.0). Total num frames: 163995648. Throughput: 0: 1570.8. Samples: 35991644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:13,631][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:08:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 164024320. Throughput: 0: 1622.5. Samples: 35998484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:08:18,625][42004] Updated weights for policy 0, policy_version 40046 (0.0029) +[2024-11-08 05:08:22,931][41694] Fps is (10 sec: 6605.0, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 164057088. Throughput: 0: 1660.8. Samples: 36008580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:22,935][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:08:24,531][42004] Updated weights for policy 0, policy_version 40056 (0.0041) +[2024-11-08 05:08:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 164093952. Throughput: 0: 1679.9. Samples: 36019692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:27,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 05:08:30,011][42004] Updated weights for policy 0, policy_version 40066 (0.0034) +[2024-11-08 05:08:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 164130816. Throughput: 0: 1695.4. Samples: 36025064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:32,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 05:08:35,869][42004] Updated weights for policy 0, policy_version 40076 (0.0037) +[2024-11-08 05:08:37,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6808.2, 300 sec: 6650.8). Total num frames: 164163584. Throughput: 0: 1688.4. Samples: 36035570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:37,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:08:41,547][42004] Updated weights for policy 0, policy_version 40086 (0.0027) +[2024-11-08 05:08:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 164200448. Throughput: 0: 1675.8. Samples: 36046408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:42,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 05:08:47,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6622.4, 300 sec: 6623.0). Total num frames: 164220928. Throughput: 0: 1671.4. Samples: 36051580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:47,937][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 05:08:49,477][42004] Updated weights for policy 0, policy_version 40096 (0.0042) +[2024-11-08 05:08:52,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 164253696. Throughput: 0: 1595.2. Samples: 36058046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:52,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:08:56,106][42004] Updated weights for policy 0, policy_version 40106 (0.0063) +[2024-11-08 05:08:57,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 164286464. Throughput: 0: 1722.2. Samples: 36067942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:57,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:09:01,634][42004] Updated weights for policy 0, policy_version 40116 (0.0033) +[2024-11-08 05:09:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 164323328. Throughput: 0: 1665.6. Samples: 36073438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:02,934][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:09:07,271][42004] Updated weights for policy 0, policy_version 40126 (0.0029) +[2024-11-08 05:09:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164360192. Throughput: 0: 1675.1. Samples: 36083962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:07,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 05:09:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6699.8, 300 sec: 6692.4). Total num frames: 164392960. Throughput: 0: 1672.8. Samples: 36094968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:12,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:09:13,051][42004] Updated weights for policy 0, policy_version 40136 (0.0035) +[2024-11-08 05:09:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 164433920. Throughput: 0: 1674.1. Samples: 36100398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:17,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 05:09:18,511][42004] Updated weights for policy 0, policy_version 40146 (0.0024) +[2024-11-08 05:09:22,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 164454400. Throughput: 0: 1639.9. Samples: 36109364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:22,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:09:26,558][42004] Updated weights for policy 0, policy_version 40156 (0.0034) +[2024-11-08 05:09:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164487168. Throughput: 0: 1591.9. Samples: 36118044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:27,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:09:32,510][42004] Updated weights for policy 0, policy_version 40166 (0.0031) +[2024-11-08 05:09:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 164519936. Throughput: 0: 1580.9. Samples: 36122720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:32,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:09:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164556800. Throughput: 0: 1686.6. Samples: 36133944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:37,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:09:38,029][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040176_164560896.pth... +[2024-11-08 05:09:38,033][42004] Updated weights for policy 0, policy_version 40176 (0.0027) +[2024-11-08 05:09:38,231][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039784_162955264.pth +[2024-11-08 05:09:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6553.6, 300 sec: 6681.2). Total num frames: 164593664. Throughput: 0: 1707.5. Samples: 36144782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:42,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:09:43,727][42004] Updated weights for policy 0, policy_version 40186 (0.0031) +[2024-11-08 05:09:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.5, 300 sec: 6678.6). Total num frames: 164626432. Throughput: 0: 1701.1. Samples: 36149988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:47,933][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 05:09:49,679][42004] Updated weights for policy 0, policy_version 40196 (0.0024) +[2024-11-08 05:09:52,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 164663296. Throughput: 0: 1704.3. Samples: 36160656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:52,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:09:57,230][42004] Updated weights for policy 0, policy_version 40206 (0.0032) +[2024-11-08 05:09:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 164687872. Throughput: 0: 1623.8. Samples: 36168040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:57,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:10:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164716544. Throughput: 0: 1604.0. Samples: 36172576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:10:02,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 05:10:03,775][42004] Updated weights for policy 0, policy_version 40216 (0.0032) +[2024-11-08 05:10:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164753408. Throughput: 0: 1626.4. Samples: 36182552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:07,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 05:10:09,447][42004] Updated weights for policy 0, policy_version 40226 (0.0029) +[2024-11-08 05:10:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164786176. Throughput: 0: 1668.9. Samples: 36193144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:12,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:10:15,756][42004] Updated weights for policy 0, policy_version 40236 (0.0023) +[2024-11-08 05:10:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6687.7). Total num frames: 164818944. Throughput: 0: 1673.0. Samples: 36198006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:17,937][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 05:10:21,799][42004] Updated weights for policy 0, policy_version 40246 (0.0031) +[2024-11-08 05:10:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 164855808. Throughput: 0: 1646.4. Samples: 36208032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:22,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 05:10:27,433][42004] Updated weights for policy 0, policy_version 40256 (0.0041) +[2024-11-08 05:10:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 164888576. Throughput: 0: 1648.1. Samples: 36218946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:27,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:10:32,936][41694] Fps is (10 sec: 5322.6, 60 sec: 6484.9, 300 sec: 6622.9). Total num frames: 164909056. Throughput: 0: 1574.4. Samples: 36220844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:32,939][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:10:35,622][42004] Updated weights for policy 0, policy_version 40266 (0.0023) +[2024-11-08 05:10:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 164941824. Throughput: 0: 1552.5. Samples: 36230518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:37,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 05:10:41,286][42004] Updated weights for policy 0, policy_version 40276 (0.0051) +[2024-11-08 05:10:42,931][41694] Fps is (10 sec: 6966.1, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 164978688. Throughput: 0: 1634.3. Samples: 36241584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:42,933][41694] Avg episode reward: [(0, '4.784')] +[2024-11-08 05:10:46,861][42004] Updated weights for policy 0, policy_version 40286 (0.0029) +[2024-11-08 05:10:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 165019648. Throughput: 0: 1652.4. Samples: 36246936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:47,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:10:52,232][42004] Updated weights for policy 0, policy_version 40296 (0.0023) +[2024-11-08 05:10:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 165056512. Throughput: 0: 1685.3. Samples: 36258390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:52,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 05:10:57,751][42004] Updated weights for policy 0, policy_version 40306 (0.0022) +[2024-11-08 05:10:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 165093376. Throughput: 0: 1698.3. Samples: 36269570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:57,935][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 05:11:04,128][41694] Fps is (10 sec: 5853.2, 60 sec: 6626.3, 300 sec: 6651.6). Total num frames: 165122048. Throughput: 0: 1667.8. Samples: 36275054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:04,131][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 05:11:05,482][42004] Updated weights for policy 0, policy_version 40316 (0.0029) +[2024-11-08 05:11:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 165146624. Throughput: 0: 1642.3. Samples: 36281934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:07,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 05:11:12,003][42004] Updated weights for policy 0, policy_version 40326 (0.0031) +[2024-11-08 05:11:12,931][41694] Fps is (10 sec: 6513.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 165179392. Throughput: 0: 1612.0. Samples: 36291484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:12,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 05:11:17,612][42004] Updated weights for policy 0, policy_version 40336 (0.0019) +[2024-11-08 05:11:17,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 165216256. Throughput: 0: 1683.0. Samples: 36296574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:17,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 05:11:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 165253120. Throughput: 0: 1724.5. Samples: 36308120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:22,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 05:11:22,991][42004] Updated weights for policy 0, policy_version 40346 (0.0032) +[2024-11-08 05:11:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 165289984. Throughput: 0: 1723.2. Samples: 36319128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:27,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:11:28,611][42004] Updated weights for policy 0, policy_version 40356 (0.0037) +[2024-11-08 05:11:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.9, 300 sec: 6720.2). Total num frames: 165330944. Throughput: 0: 1727.1. Samples: 36324656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:32,934][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 05:11:33,943][42004] Updated weights for policy 0, policy_version 40366 (0.0029) +[2024-11-08 05:11:38,336][41694] Fps is (10 sec: 5905.3, 60 sec: 6781.0, 300 sec: 6669.4). Total num frames: 165351424. Throughput: 0: 1700.5. Samples: 36335600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:38,341][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:11:38,456][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040370_165355520.pth... +[2024-11-08 05:11:38,553][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039982_163766272.pth +[2024-11-08 05:11:42,399][42004] Updated weights for policy 0, policy_version 40376 (0.0025) +[2024-11-08 05:11:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 165380096. Throughput: 0: 1609.9. Samples: 36342014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:42,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 05:11:47,933][41694] Fps is (10 sec: 6828.7, 60 sec: 6621.7, 300 sec: 6650.8). Total num frames: 165416960. Throughput: 0: 1636.0. Samples: 36346720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:47,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 05:11:48,383][42004] Updated weights for policy 0, policy_version 40386 (0.0034) +[2024-11-08 05:11:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 165453824. Throughput: 0: 1688.0. Samples: 36357894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:52,933][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 05:11:53,532][42004] Updated weights for policy 0, policy_version 40396 (0.0027) +[2024-11-08 05:11:57,931][41694] Fps is (10 sec: 7783.5, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 165494784. Throughput: 0: 1739.6. Samples: 36369764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:57,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:11:58,803][42004] Updated weights for policy 0, policy_version 40406 (0.0028) +[2024-11-08 05:12:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6895.9, 300 sec: 6706.3). Total num frames: 165527552. Throughput: 0: 1751.9. Samples: 36375408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:02,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 05:12:05,333][42004] Updated weights for policy 0, policy_version 40416 (0.0030) +[2024-11-08 05:12:07,934][41694] Fps is (10 sec: 6142.7, 60 sec: 6826.5, 300 sec: 6692.4). Total num frames: 165556224. Throughput: 0: 1695.9. Samples: 36384440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:07,938][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 05:12:12,933][41694] Fps is (10 sec: 4915.0, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 165576704. Throughput: 0: 1608.4. Samples: 36391506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:12,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 05:12:13,674][42004] Updated weights for policy 0, policy_version 40426 (0.0030) +[2024-11-08 05:12:17,932][41694] Fps is (10 sec: 5325.5, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 165609472. Throughput: 0: 1577.1. Samples: 36395626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:17,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 05:12:20,449][42004] Updated weights for policy 0, policy_version 40436 (0.0039) +[2024-11-08 05:12:22,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 165642240. Throughput: 0: 1561.9. Samples: 36405256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:22,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 05:12:25,678][42004] Updated weights for policy 0, policy_version 40446 (0.0025) +[2024-11-08 05:12:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 165679104. Throughput: 0: 1651.1. Samples: 36416312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:27,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:12:31,391][42004] Updated weights for policy 0, policy_version 40456 (0.0033) +[2024-11-08 05:12:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6646.1). Total num frames: 165715968. Throughput: 0: 1671.3. Samples: 36421928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:32,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:12:36,826][42004] Updated weights for policy 0, policy_version 40466 (0.0041) +[2024-11-08 05:12:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6735.5, 300 sec: 6650.8). Total num frames: 165752832. Throughput: 0: 1671.5. Samples: 36433112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:37,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 05:12:42,385][42004] Updated weights for policy 0, policy_version 40476 (0.0028) +[2024-11-08 05:12:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6664.8). Total num frames: 165789696. Throughput: 0: 1659.5. Samples: 36444440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:42,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:12:47,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 165814272. Throughput: 0: 1633.4. Samples: 36448910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:47,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 05:12:50,071][42004] Updated weights for policy 0, policy_version 40486 (0.0041) +[2024-11-08 05:12:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 165847040. Throughput: 0: 1606.8. Samples: 36456742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:52,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 05:12:56,391][42004] Updated weights for policy 0, policy_version 40496 (0.0028) +[2024-11-08 05:12:57,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6417.0, 300 sec: 6595.3). Total num frames: 165879808. Throughput: 0: 1673.1. Samples: 36466796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:57,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 05:13:02,358][42004] Updated weights for policy 0, policy_version 40506 (0.0024) +[2024-11-08 05:13:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 165916672. Throughput: 0: 1695.8. Samples: 36471934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:02,935][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:13:07,685][42004] Updated weights for policy 0, policy_version 40516 (0.0032) +[2024-11-08 05:13:07,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6621.9, 300 sec: 6652.6). Total num frames: 165953536. Throughput: 0: 1719.8. Samples: 36482650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:07,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:13:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6664.7). Total num frames: 165990400. Throughput: 0: 1725.2. Samples: 36493944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:12,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:13:13,283][42004] Updated weights for policy 0, policy_version 40526 (0.0030) +[2024-11-08 05:13:17,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6895.0, 300 sec: 6664.7). Total num frames: 166023168. Throughput: 0: 1713.7. Samples: 36499046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:17,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:13:21,040][42004] Updated weights for policy 0, policy_version 40536 (0.0032) +[2024-11-08 05:13:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 166047744. Throughput: 0: 1626.7. Samples: 36506314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:22,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 05:13:27,381][42004] Updated weights for policy 0, policy_version 40546 (0.0046) +[2024-11-08 05:13:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 166076416. Throughput: 0: 1591.8. Samples: 36516070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:13:32,800][42004] Updated weights for policy 0, policy_version 40556 (0.0030) +[2024-11-08 05:13:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 166117376. Throughput: 0: 1609.5. Samples: 36521338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:32,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 05:13:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 166154240. Throughput: 0: 1687.5. Samples: 36532678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:37,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:13:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040565_166154240.pth... +[2024-11-08 05:13:38,037][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040176_164560896.pth +[2024-11-08 05:13:38,347][42004] Updated weights for policy 0, policy_version 40566 (0.0026) +[2024-11-08 05:13:42,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6690.0, 300 sec: 6678.6). Total num frames: 166191104. Throughput: 0: 1721.1. Samples: 36544248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:42,935][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 05:13:43,753][42004] Updated weights for policy 0, policy_version 40576 (0.0034) +[2024-11-08 05:13:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.3, 300 sec: 6706.3). Total num frames: 166232064. Throughput: 0: 1731.2. Samples: 36549838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:47,935][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 05:13:48,948][42004] Updated weights for policy 0, policy_version 40586 (0.0032) +[2024-11-08 05:13:54,815][41694] Fps is (10 sec: 6204.8, 60 sec: 6751.3, 300 sec: 6663.8). Total num frames: 166264832. Throughput: 0: 1663.7. Samples: 36560648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:54,818][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 05:13:57,462][42004] Updated weights for policy 0, policy_version 40596 (0.0034) +[2024-11-08 05:13:57,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 166281216. Throughput: 0: 1630.5. Samples: 36567316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:14:02,931][41694] Fps is (10 sec: 6055.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 166313984. Throughput: 0: 1614.3. Samples: 36571690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:02,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 05:14:03,872][42004] Updated weights for policy 0, policy_version 40606 (0.0029) +[2024-11-08 05:14:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 166350848. Throughput: 0: 1673.9. Samples: 36581640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:07,934][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 05:14:09,409][42004] Updated weights for policy 0, policy_version 40616 (0.0020) +[2024-11-08 05:14:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 166387712. Throughput: 0: 1702.1. Samples: 36592664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:14:15,133][42004] Updated weights for policy 0, policy_version 40626 (0.0035) +[2024-11-08 05:14:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 166424576. Throughput: 0: 1710.5. Samples: 36598312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:17,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 05:14:20,531][42004] Updated weights for policy 0, policy_version 40636 (0.0025) +[2024-11-08 05:14:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6692.5). Total num frames: 166461440. Throughput: 0: 1715.7. Samples: 36609882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:22,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:14:26,284][42004] Updated weights for policy 0, policy_version 40646 (0.0027) +[2024-11-08 05:14:28,998][41694] Fps is (10 sec: 5922.0, 60 sec: 6774.5, 300 sec: 6654.5). Total num frames: 166490112. Throughput: 0: 1656.4. Samples: 36620552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:29,000][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:14:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 166514688. Throughput: 0: 1614.4. Samples: 36622486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:32,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:14:34,322][42004] Updated weights for policy 0, policy_version 40656 (0.0027) +[2024-11-08 05:14:37,931][41694] Fps is (10 sec: 6877.5, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 166551552. Throughput: 0: 1647.9. Samples: 36631702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:37,936][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:14:40,071][42004] Updated weights for policy 0, policy_version 40666 (0.0039) +[2024-11-08 05:14:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 166588416. Throughput: 0: 1687.2. Samples: 36643240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:42,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 05:14:45,481][42004] Updated weights for policy 0, policy_version 40676 (0.0031) +[2024-11-08 05:14:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 166625280. Throughput: 0: 1712.3. Samples: 36648746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:47,934][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 05:14:50,973][42004] Updated weights for policy 0, policy_version 40686 (0.0036) +[2024-11-08 05:14:52,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6766.0, 300 sec: 6678.6). Total num frames: 166658048. Throughput: 0: 1739.8. Samples: 36659930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:14:52,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 05:14:56,876][42004] Updated weights for policy 0, policy_version 40696 (0.0027) +[2024-11-08 05:14:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 166699008. Throughput: 0: 1731.5. Samples: 36670580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:14:57,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:15:03,218][41694] Fps is (10 sec: 5972.8, 60 sec: 6726.3, 300 sec: 6658.2). Total num frames: 166719488. Throughput: 0: 1707.9. Samples: 36675658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:03,221][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:15:05,053][42004] Updated weights for policy 0, policy_version 40706 (0.0031) +[2024-11-08 05:15:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 166748160. Throughput: 0: 1602.9. Samples: 36682012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:07,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 05:15:11,336][42004] Updated weights for policy 0, policy_version 40716 (0.0034) +[2024-11-08 05:15:12,931][41694] Fps is (10 sec: 6325.4, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 166780928. Throughput: 0: 1630.2. Samples: 36692172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:12,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 05:15:17,366][42004] Updated weights for policy 0, policy_version 40726 (0.0035) +[2024-11-08 05:15:17,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 166817792. Throughput: 0: 1658.6. Samples: 36697126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:17,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:15:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 166850560. Throughput: 0: 1676.9. Samples: 36707162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:22,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 05:15:23,019][42004] Updated weights for policy 0, policy_version 40736 (0.0023) +[2024-11-08 05:15:27,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6811.2, 300 sec: 6720.3). Total num frames: 166891520. Throughput: 0: 1674.5. Samples: 36718592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:27,937][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:15:28,473][42004] Updated weights for policy 0, policy_version 40746 (0.0022) +[2024-11-08 05:15:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 166928384. Throughput: 0: 1676.4. Samples: 36724184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:32,936][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:15:33,930][42004] Updated weights for policy 0, policy_version 40756 (0.0029) +[2024-11-08 05:15:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 166948864. Throughput: 0: 1645.5. Samples: 36733976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:37,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 05:15:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040759_166948864.pth... +[2024-11-08 05:15:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040370_165355520.pth +[2024-11-08 05:15:42,240][42004] Updated weights for policy 0, policy_version 40766 (0.0032) +[2024-11-08 05:15:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 166977536. Throughput: 0: 1577.8. Samples: 36741582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:42,934][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 05:15:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 167014400. Throughput: 0: 1577.7. Samples: 36746204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:47,934][41694] Avg episode reward: [(0, '4.859')] +[2024-11-08 05:15:48,000][42004] Updated weights for policy 0, policy_version 40776 (0.0043) +[2024-11-08 05:15:52,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 167055360. Throughput: 0: 1690.9. Samples: 36758102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:52,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 05:15:53,222][42004] Updated weights for policy 0, policy_version 40786 (0.0027) +[2024-11-08 05:15:57,931][41694] Fps is (10 sec: 8192.4, 60 sec: 6621.9, 300 sec: 6719.7). Total num frames: 167096320. Throughput: 0: 1728.0. Samples: 36769932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:57,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 05:15:58,407][42004] Updated weights for policy 0, policy_version 40796 (0.0028) +[2024-11-08 05:16:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6928.0, 300 sec: 6734.1). Total num frames: 167133184. Throughput: 0: 1747.2. Samples: 36775748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:02,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:16:03,844][42004] Updated weights for policy 0, policy_version 40806 (0.0027) +[2024-11-08 05:16:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 167165952. Throughput: 0: 1763.8. Samples: 36786532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:07,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 05:16:12,211][42004] Updated weights for policy 0, policy_version 40816 (0.0030) +[2024-11-08 05:16:12,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 167182336. Throughput: 0: 1649.2. Samples: 36792806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:12,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:16:17,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 167211008. Throughput: 0: 1611.5. Samples: 36796700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:17,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:16:19,521][42004] Updated weights for policy 0, policy_version 40826 (0.0030) +[2024-11-08 05:16:22,935][41694] Fps is (10 sec: 6551.6, 60 sec: 6621.5, 300 sec: 6636.8). Total num frames: 167247872. Throughput: 0: 1604.9. Samples: 36806200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:22,937][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:16:25,066][42004] Updated weights for policy 0, policy_version 40836 (0.0022) +[2024-11-08 05:16:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 167284736. Throughput: 0: 1677.6. Samples: 36817072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:27,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:16:30,484][42004] Updated weights for policy 0, policy_version 40846 (0.0023) +[2024-11-08 05:16:32,932][41694] Fps is (10 sec: 7375.0, 60 sec: 6553.6, 300 sec: 6687.7). Total num frames: 167321600. Throughput: 0: 1703.7. Samples: 36822870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:32,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 05:16:36,026][42004] Updated weights for policy 0, policy_version 40856 (0.0028) +[2024-11-08 05:16:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 167358464. Throughput: 0: 1686.0. Samples: 36833972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:37,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:16:41,970][42004] Updated weights for policy 0, policy_version 40866 (0.0026) +[2024-11-08 05:16:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6692.5). Total num frames: 167391232. Throughput: 0: 1653.7. Samples: 36844348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:42,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 05:16:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 167415808. Throughput: 0: 1586.7. Samples: 36847150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:47,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:16:49,769][42004] Updated weights for policy 0, policy_version 40876 (0.0039) +[2024-11-08 05:16:52,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 167448576. Throughput: 0: 1553.5. Samples: 36856440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:52,934][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 05:16:55,452][42004] Updated weights for policy 0, policy_version 40886 (0.0026) +[2024-11-08 05:16:57,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 167485440. Throughput: 0: 1661.1. Samples: 36867554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:57,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 05:17:00,929][42004] Updated weights for policy 0, policy_version 40896 (0.0030) +[2024-11-08 05:17:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 167522304. Throughput: 0: 1700.9. Samples: 36873240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:17:02,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 05:17:06,634][42004] Updated weights for policy 0, policy_version 40906 (0.0036) +[2024-11-08 05:17:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 167559168. Throughput: 0: 1733.2. Samples: 36884190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:07,934][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 05:17:12,159][42004] Updated weights for policy 0, policy_version 40916 (0.0033) +[2024-11-08 05:17:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 167596032. Throughput: 0: 1741.1. Samples: 36895420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:12,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 05:17:19,370][41694] Fps is (10 sec: 6087.3, 60 sec: 6800.1, 300 sec: 6701.4). Total num frames: 167628800. Throughput: 0: 1671.5. Samples: 36900492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:19,373][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:17:19,837][42004] Updated weights for policy 0, policy_version 40926 (0.0026) +[2024-11-08 05:17:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.5, 300 sec: 6678.6). Total num frames: 167649280. Throughput: 0: 1634.6. Samples: 36907528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:22,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:17:26,238][42004] Updated weights for policy 0, policy_version 40936 (0.0033) +[2024-11-08 05:17:27,931][41694] Fps is (10 sec: 6698.4, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167686144. Throughput: 0: 1627.4. Samples: 36917582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:27,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:17:31,640][42004] Updated weights for policy 0, policy_version 40946 (0.0028) +[2024-11-08 05:17:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167723008. Throughput: 0: 1686.7. Samples: 36923054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:32,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:17:37,108][42004] Updated weights for policy 0, policy_version 40956 (0.0023) +[2024-11-08 05:17:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167759872. Throughput: 0: 1729.2. Samples: 36934252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:37,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:17:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040957_167759872.pth... +[2024-11-08 05:17:38,091][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040565_166154240.pth +[2024-11-08 05:17:42,705][42004] Updated weights for policy 0, policy_version 40966 (0.0023) +[2024-11-08 05:17:42,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6758.3, 300 sec: 6720.2). Total num frames: 167796736. Throughput: 0: 1728.1. Samples: 36945318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:42,936][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:17:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 167833600. Throughput: 0: 1726.4. Samples: 36950926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:47,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 05:17:48,149][42004] Updated weights for policy 0, policy_version 40976 (0.0029) +[2024-11-08 05:17:53,248][41694] Fps is (10 sec: 6353.3, 60 sec: 6858.8, 300 sec: 6713.0). Total num frames: 167862272. Throughput: 0: 1717.5. Samples: 36962020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:53,250][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 05:17:55,469][42004] Updated weights for policy 0, policy_version 40986 (0.0046) +[2024-11-08 05:17:57,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 167895040. Throughput: 0: 1655.6. Samples: 36969922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:17:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 05:18:01,033][42004] Updated weights for policy 0, policy_version 40996 (0.0024) +[2024-11-08 05:18:02,932][41694] Fps is (10 sec: 6767.4, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 167927808. Throughput: 0: 1720.4. Samples: 36975436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:02,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 05:18:07,567][42004] Updated weights for policy 0, policy_version 41006 (0.0030) +[2024-11-08 05:18:07,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167960576. Throughput: 0: 1725.0. Samples: 36985154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:07,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:18:12,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 167997440. Throughput: 0: 1741.6. Samples: 36995952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:12,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 05:18:13,043][42004] Updated weights for policy 0, policy_version 41016 (0.0027) +[2024-11-08 05:18:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6924.5, 300 sec: 6734.1). Total num frames: 168034304. Throughput: 0: 1738.5. Samples: 37001286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:17,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:18:18,542][42004] Updated weights for policy 0, policy_version 41026 (0.0023) +[2024-11-08 05:18:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 168071168. Throughput: 0: 1744.5. Samples: 37012756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:22,936][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 05:18:24,011][42004] Updated weights for policy 0, policy_version 41036 (0.0039) +[2024-11-08 05:18:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 168095744. Throughput: 0: 1673.2. Samples: 37020612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:27,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 05:18:31,516][42004] Updated weights for policy 0, policy_version 41046 (0.0030) +[2024-11-08 05:18:32,936][41694] Fps is (10 sec: 6141.4, 60 sec: 6826.2, 300 sec: 6706.2). Total num frames: 168132608. Throughput: 0: 1668.1. Samples: 37025996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:32,952][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:18:37,697][42004] Updated weights for policy 0, policy_version 41056 (0.0030) +[2024-11-08 05:18:37,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6758.3, 300 sec: 6692.5). Total num frames: 168165376. Throughput: 0: 1656.6. Samples: 37036046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:37,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:18:42,931][41694] Fps is (10 sec: 6556.5, 60 sec: 6690.3, 300 sec: 6664.7). Total num frames: 168198144. Throughput: 0: 1695.6. Samples: 37046222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:42,933][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 05:18:43,490][42004] Updated weights for policy 0, policy_version 41066 (0.0024) +[2024-11-08 05:18:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6735.4). Total num frames: 168239104. Throughput: 0: 1697.7. Samples: 37051832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:47,933][41694] Avg episode reward: [(0, '4.235')] +[2024-11-08 05:18:48,885][42004] Updated weights for policy 0, policy_version 41076 (0.0030) +[2024-11-08 05:18:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6931.5, 300 sec: 6761.9). Total num frames: 168275968. Throughput: 0: 1731.1. Samples: 37063052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:52,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:18:54,582][42004] Updated weights for policy 0, policy_version 41086 (0.0037) +[2024-11-08 05:18:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 168308736. Throughput: 0: 1728.7. Samples: 37073744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:57,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:19:02,454][42004] Updated weights for policy 0, policy_version 41096 (0.0034) +[2024-11-08 05:19:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 168329216. Throughput: 0: 1696.1. Samples: 37077612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:02,935][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 05:19:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 168366080. Throughput: 0: 1626.6. Samples: 37085952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:07,935][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:19:08,167][42004] Updated weights for policy 0, policy_version 41106 (0.0024) +[2024-11-08 05:19:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 168402944. Throughput: 0: 1686.4. Samples: 37096500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:12,933][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 05:19:14,188][42004] Updated weights for policy 0, policy_version 41116 (0.0022) +[2024-11-08 05:19:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 168435712. Throughput: 0: 1680.5. Samples: 37101612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:17,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 05:19:19,848][42004] Updated weights for policy 0, policy_version 41126 (0.0035) +[2024-11-08 05:19:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6744.6). Total num frames: 168472576. Throughput: 0: 1708.0. Samples: 37112906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:22,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:19:25,316][42004] Updated weights for policy 0, policy_version 41136 (0.0035) +[2024-11-08 05:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 168509440. Throughput: 0: 1720.1. Samples: 37123626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:27,935][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:19:31,077][42004] Updated weights for policy 0, policy_version 41146 (0.0033) +[2024-11-08 05:19:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.4, 300 sec: 6761.9). Total num frames: 168546304. Throughput: 0: 1715.7. Samples: 37129038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:32,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:19:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 168570880. Throughput: 0: 1641.9. Samples: 37136938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:37,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041155_168570880.pth... +[2024-11-08 05:19:38,051][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040759_166948864.pth +[2024-11-08 05:19:38,219][42004] Updated weights for policy 0, policy_version 41156 (0.0031) +[2024-11-08 05:19:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 168607744. Throughput: 0: 1651.4. Samples: 37148056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:42,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:19:44,044][42004] Updated weights for policy 0, policy_version 41166 (0.0026) +[2024-11-08 05:19:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 168640512. Throughput: 0: 1672.6. Samples: 37152878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:47,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 05:19:50,253][42004] Updated weights for policy 0, policy_version 41176 (0.0023) +[2024-11-08 05:19:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 168673280. Throughput: 0: 1718.2. Samples: 37163272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:52,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 05:19:55,675][42004] Updated weights for policy 0, policy_version 41186 (0.0028) +[2024-11-08 05:19:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6754.5). Total num frames: 168710144. Throughput: 0: 1732.6. Samples: 37174466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:57,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 05:20:01,404][42004] Updated weights for policy 0, policy_version 41196 (0.0034) +[2024-11-08 05:20:02,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6963.1, 300 sec: 6775.7). Total num frames: 168747008. Throughput: 0: 1740.6. Samples: 37179940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:20:02,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 05:20:07,138][42004] Updated weights for policy 0, policy_version 41206 (0.0028) +[2024-11-08 05:20:08,895][41694] Fps is (10 sec: 6351.1, 60 sec: 6785.9, 300 sec: 6753.7). Total num frames: 168779776. Throughput: 0: 1685.9. Samples: 37190396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:08,897][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:20:12,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 168808448. Throughput: 0: 1665.6. Samples: 37198576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:12,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:20:14,519][42004] Updated weights for policy 0, policy_version 41216 (0.0033) +[2024-11-08 05:20:17,932][41694] Fps is (10 sec: 6345.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 168837120. Throughput: 0: 1655.2. Samples: 37203522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:17,934][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:20:21,419][42004] Updated weights for policy 0, policy_version 41226 (0.0051) +[2024-11-08 05:20:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 168869888. Throughput: 0: 1670.1. Samples: 37212092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:22,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 05:20:27,455][42004] Updated weights for policy 0, policy_version 41236 (0.0026) +[2024-11-08 05:20:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 168902656. Throughput: 0: 1649.7. Samples: 37222292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:27,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 05:20:32,856][42004] Updated weights for policy 0, policy_version 41246 (0.0027) +[2024-11-08 05:20:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 168943616. Throughput: 0: 1669.9. Samples: 37228024. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:32,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 05:20:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 168980480. Throughput: 0: 1695.0. Samples: 37239546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:37,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 05:20:38,153][42004] Updated weights for policy 0, policy_version 41256 (0.0029) +[2024-11-08 05:20:42,934][41694] Fps is (10 sec: 6142.8, 60 sec: 6621.7, 300 sec: 6747.9). Total num frames: 169005056. Throughput: 0: 1657.5. Samples: 37249056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:42,936][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 05:20:45,755][42004] Updated weights for policy 0, policy_version 41266 (0.0026) +[2024-11-08 05:20:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 169037824. Throughput: 0: 1613.3. Samples: 37252538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:47,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 05:20:51,383][42004] Updated weights for policy 0, policy_version 41276 (0.0027) +[2024-11-08 05:20:52,932][41694] Fps is (10 sec: 6964.2, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 169074688. Throughput: 0: 1662.2. Samples: 37263594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:52,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:20:57,019][42004] Updated weights for policy 0, policy_version 41286 (0.0024) +[2024-11-08 05:20:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 169111552. Throughput: 0: 1684.2. Samples: 37274366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:57,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 05:21:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6622.0, 300 sec: 6706.3). Total num frames: 169144320. Throughput: 0: 1683.6. Samples: 37279286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:02,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 05:21:03,088][42004] Updated weights for policy 0, policy_version 41296 (0.0039) +[2024-11-08 05:21:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6799.4, 300 sec: 6775.8). Total num frames: 169181184. Throughput: 0: 1732.6. Samples: 37290058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:07,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 05:21:08,510][42004] Updated weights for policy 0, policy_version 41306 (0.0033) +[2024-11-08 05:21:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 169213952. Throughput: 0: 1725.5. Samples: 37299940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:12,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:21:15,011][42004] Updated weights for policy 0, policy_version 41316 (0.0039) +[2024-11-08 05:21:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6748.1). Total num frames: 169238528. Throughput: 0: 1714.2. Samples: 37305162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:17,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 05:21:22,205][42004] Updated weights for policy 0, policy_version 41326 (0.0027) +[2024-11-08 05:21:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 169275392. Throughput: 0: 1632.4. Samples: 37313004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:22,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 05:21:27,642][42004] Updated weights for policy 0, policy_version 41336 (0.0033) +[2024-11-08 05:21:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 169312256. Throughput: 0: 1671.2. Samples: 37324256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:27,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 05:21:32,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6758.3, 300 sec: 6748.0). Total num frames: 169349120. Throughput: 0: 1714.1. Samples: 37329674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:32,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 05:21:33,455][42004] Updated weights for policy 0, policy_version 41346 (0.0036) +[2024-11-08 05:21:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 169385984. Throughput: 0: 1709.8. Samples: 37340534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:37,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 05:21:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041354_169385984.pth... +[2024-11-08 05:21:38,072][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040957_167759872.pth +[2024-11-08 05:21:38,954][42004] Updated weights for policy 0, policy_version 41356 (0.0029) +[2024-11-08 05:21:42,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6895.2, 300 sec: 6789.6). Total num frames: 169418752. Throughput: 0: 1713.7. Samples: 37351484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 05:21:44,699][42004] Updated weights for policy 0, policy_version 41366 (0.0031) +[2024-11-08 05:21:47,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169455616. Throughput: 0: 1727.1. Samples: 37357004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:47,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 05:21:51,949][42004] Updated weights for policy 0, policy_version 41376 (0.0034) +[2024-11-08 05:21:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 169480192. Throughput: 0: 1660.9. Samples: 37364798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:21:52,936][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 05:21:57,440][42004] Updated weights for policy 0, policy_version 41386 (0.0028) +[2024-11-08 05:21:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169517056. Throughput: 0: 1692.4. Samples: 37376098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:21:57,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:22:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 169553920. Throughput: 0: 1698.5. Samples: 37381594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:02,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 05:22:03,524][42004] Updated weights for policy 0, policy_version 41396 (0.0030) +[2024-11-08 05:22:07,935][41694] Fps is (10 sec: 6551.9, 60 sec: 6689.8, 300 sec: 6734.0). Total num frames: 169582592. Throughput: 0: 1724.1. Samples: 37390594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:07,943][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:22:09,835][42004] Updated weights for policy 0, policy_version 41406 (0.0038) +[2024-11-08 05:22:12,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6758.4, 300 sec: 6781.1). Total num frames: 169619456. Throughput: 0: 1712.7. Samples: 37401328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:12,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 05:22:15,399][42004] Updated weights for policy 0, policy_version 41416 (0.0028) +[2024-11-08 05:22:17,931][41694] Fps is (10 sec: 7374.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169656320. Throughput: 0: 1713.1. Samples: 37406764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:17,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 05:22:20,910][42004] Updated weights for policy 0, policy_version 41426 (0.0024) +[2024-11-08 05:22:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169693184. Throughput: 0: 1721.1. Samples: 37417982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:22,934][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 05:22:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169717760. Throughput: 0: 1652.0. Samples: 37425824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:27,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 05:22:28,223][42004] Updated weights for policy 0, policy_version 41436 (0.0040) +[2024-11-08 05:22:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 169754624. Throughput: 0: 1646.7. Samples: 37431108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:32,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 05:22:33,774][42004] Updated weights for policy 0, policy_version 41446 (0.0032) +[2024-11-08 05:22:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169791488. Throughput: 0: 1720.6. Samples: 37442224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:37,939][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 05:22:39,921][42004] Updated weights for policy 0, policy_version 41456 (0.0049) +[2024-11-08 05:22:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 169824256. Throughput: 0: 1690.2. Samples: 37452158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:42,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:22:45,613][42004] Updated weights for policy 0, policy_version 41466 (0.0023) +[2024-11-08 05:22:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.3, 300 sec: 6783.0). Total num frames: 169861120. Throughput: 0: 1690.6. Samples: 37457670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:47,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 05:22:51,140][42004] Updated weights for policy 0, policy_version 41476 (0.0030) +[2024-11-08 05:22:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6789.6). Total num frames: 169897984. Throughput: 0: 1739.6. Samples: 37468870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:52,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:22:56,526][42004] Updated weights for policy 0, policy_version 41486 (0.0026) +[2024-11-08 05:22:58,602][41694] Fps is (10 sec: 6142.0, 60 sec: 6751.2, 300 sec: 6760.4). Total num frames: 169926656. Throughput: 0: 1597.6. Samples: 37474292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:58,607][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:23:02,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 169959424. Throughput: 0: 1679.2. Samples: 37482330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:23:02,935][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 05:23:04,031][42004] Updated weights for policy 0, policy_version 41496 (0.0036) +[2024-11-08 05:23:07,932][41694] Fps is (10 sec: 7463.7, 60 sec: 6895.2, 300 sec: 6775.8). Total num frames: 169996288. Throughput: 0: 1666.9. Samples: 37492994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:23:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:23:09,575][42004] Updated weights for policy 0, policy_version 41506 (0.0031) +[2024-11-08 05:23:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 170029056. Throughput: 0: 1729.1. Samples: 37503634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:23:15,764][42004] Updated weights for policy 0, policy_version 41516 (0.0026) +[2024-11-08 05:23:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 170061824. Throughput: 0: 1723.5. Samples: 37508666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:17,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:23:21,716][42004] Updated weights for policy 0, policy_version 41526 (0.0023) +[2024-11-08 05:23:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 170098688. Throughput: 0: 1706.1. Samples: 37519000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:22,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 05:23:27,278][42004] Updated weights for policy 0, policy_version 41536 (0.0025) +[2024-11-08 05:23:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6963.2, 300 sec: 6789.7). Total num frames: 170135552. Throughput: 0: 1729.8. Samples: 37530000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:27,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:23:32,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170160128. Throughput: 0: 1728.6. Samples: 37535458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:32,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 05:23:34,360][42004] Updated weights for policy 0, policy_version 41546 (0.0037) +[2024-11-08 05:23:37,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170196992. Throughput: 0: 1656.7. Samples: 37543420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:37,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 05:23:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041552_170196992.pth... +[2024-11-08 05:23:38,162][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041155_168570880.pth +[2024-11-08 05:23:40,095][42004] Updated weights for policy 0, policy_version 41556 (0.0030) +[2024-11-08 05:23:42,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 170233856. Throughput: 0: 1806.1. Samples: 37554358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:42,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 05:23:45,858][42004] Updated weights for policy 0, policy_version 41566 (0.0023) +[2024-11-08 05:23:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 170266624. Throughput: 0: 1718.2. Samples: 37559648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:47,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 05:23:51,840][42004] Updated weights for policy 0, policy_version 41576 (0.0034) +[2024-11-08 05:23:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170303488. Throughput: 0: 1706.2. Samples: 37569772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:52,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 05:23:57,231][42004] Updated weights for policy 0, policy_version 41586 (0.0026) +[2024-11-08 05:23:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6972.9, 300 sec: 6817.4). Total num frames: 170340352. Throughput: 0: 1723.2. Samples: 37581176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:57,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:24:02,936][41694] Fps is (10 sec: 6960.1, 60 sec: 6894.5, 300 sec: 6803.4). Total num frames: 170373120. Throughput: 0: 1732.7. Samples: 37586644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:24:02,938][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 05:24:03,294][42004] Updated weights for policy 0, policy_version 41596 (0.0022) +[2024-11-08 05:24:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 170397696. Throughput: 0: 1667.8. Samples: 37594050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:07,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 05:24:10,595][42004] Updated weights for policy 0, policy_version 41606 (0.0028) +[2024-11-08 05:24:12,931][41694] Fps is (10 sec: 6146.8, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170434560. Throughput: 0: 1658.5. Samples: 37604630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:12,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:24:16,098][42004] Updated weights for policy 0, policy_version 41616 (0.0023) +[2024-11-08 05:24:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 170471424. Throughput: 0: 1660.6. Samples: 37610186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 05:24:22,070][42004] Updated weights for policy 0, policy_version 41626 (0.0032) +[2024-11-08 05:24:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 170504192. Throughput: 0: 1711.7. Samples: 37620448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:24:27,823][42004] Updated weights for policy 0, policy_version 41636 (0.0031) +[2024-11-08 05:24:27,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170541056. Throughput: 0: 1713.8. Samples: 37631480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:27,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 05:24:32,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6963.1, 300 sec: 6803.5). Total num frames: 170577920. Throughput: 0: 1721.7. Samples: 37637128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:32,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 05:24:33,069][42004] Updated weights for policy 0, policy_version 41646 (0.0034) +[2024-11-08 05:24:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 170614784. Throughput: 0: 1749.3. Samples: 37648490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:37,938][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:24:40,169][42004] Updated weights for policy 0, policy_version 41656 (0.0031) +[2024-11-08 05:24:42,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170639360. Throughput: 0: 1674.2. Samples: 37656514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:42,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:24:45,842][42004] Updated weights for policy 0, policy_version 41666 (0.0023) +[2024-11-08 05:24:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 170676224. Throughput: 0: 1673.7. Samples: 37661954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:47,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:24:51,242][42004] Updated weights for policy 0, policy_version 41676 (0.0025) +[2024-11-08 05:24:52,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 170713088. Throughput: 0: 1762.1. Samples: 37673344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:52,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:24:57,213][42004] Updated weights for policy 0, policy_version 41686 (0.0036) +[2024-11-08 05:24:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 170749952. Throughput: 0: 1756.2. Samples: 37683658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:57,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 05:25:02,773][42004] Updated weights for policy 0, policy_version 41696 (0.0028) +[2024-11-08 05:25:02,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6895.5, 300 sec: 6825.8). Total num frames: 170786816. Throughput: 0: 1761.0. Samples: 37689432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:02,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:25:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 170823680. Throughput: 0: 1775.2. Samples: 37700334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:07,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 05:25:08,149][42004] Updated weights for policy 0, policy_version 41706 (0.0044) +[2024-11-08 05:25:14,126][41694] Fps is (10 sec: 6220.3, 60 sec: 6894.2, 300 sec: 6817.6). Total num frames: 170856448. Throughput: 0: 1732.1. Samples: 37711494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:14,130][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 05:25:16,076][42004] Updated weights for policy 0, policy_version 41716 (0.0032) +[2024-11-08 05:25:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 170876928. Throughput: 0: 1690.3. Samples: 37713190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:17,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 05:25:21,936][42004] Updated weights for policy 0, policy_version 41726 (0.0036) +[2024-11-08 05:25:22,931][41694] Fps is (10 sec: 6512.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 170913792. Throughput: 0: 1657.6. Samples: 37723082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:22,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:25:27,505][42004] Updated weights for policy 0, policy_version 41736 (0.0023) +[2024-11-08 05:25:27,933][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 170950656. Throughput: 0: 1731.0. Samples: 37734410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:27,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:25:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6803.5). Total num frames: 170987520. Throughput: 0: 1717.2. Samples: 37739230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:32,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 05:25:33,212][42004] Updated weights for policy 0, policy_version 41746 (0.0022) +[2024-11-08 05:25:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 171024384. Throughput: 0: 1720.3. Samples: 37750756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:37,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 05:25:38,059][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041755_171028480.pth... +[2024-11-08 05:25:38,157][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041354_169385984.pth +[2024-11-08 05:25:38,613][42004] Updated weights for policy 0, policy_version 41756 (0.0028) +[2024-11-08 05:25:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 171061248. Throughput: 0: 1739.7. Samples: 37761946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:42,937][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 05:25:44,102][42004] Updated weights for policy 0, policy_version 41766 (0.0028) +[2024-11-08 05:25:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 171085824. Throughput: 0: 1732.9. Samples: 37767412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:47,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:25:51,703][42004] Updated weights for policy 0, policy_version 41776 (0.0035) +[2024-11-08 05:25:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 171122688. Throughput: 0: 1656.7. Samples: 37774884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:52,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 05:25:57,101][42004] Updated weights for policy 0, policy_version 41786 (0.0026) +[2024-11-08 05:25:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 171159552. Throughput: 0: 1708.4. Samples: 37786332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:57,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:26:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 171192320. Throughput: 0: 1739.5. Samples: 37791466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:02,934][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 05:26:03,184][42004] Updated weights for policy 0, policy_version 41796 (0.0022) +[2024-11-08 05:26:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 171229184. Throughput: 0: 1742.0. Samples: 37801472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:07,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:26:08,993][42004] Updated weights for policy 0, policy_version 41806 (0.0034) +[2024-11-08 05:26:12,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6895.4, 300 sec: 6859.0). Total num frames: 171261952. Throughput: 0: 1708.4. Samples: 37811292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:12,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:26:15,334][42004] Updated weights for policy 0, policy_version 41816 (0.0036) +[2024-11-08 05:26:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.3, 300 sec: 6845.2). Total num frames: 171294720. Throughput: 0: 1715.2. Samples: 37816412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:17,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 05:26:22,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 171315200. Throughput: 0: 1666.9. Samples: 37825766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:22,935][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 05:26:22,995][42004] Updated weights for policy 0, policy_version 41826 (0.0035) +[2024-11-08 05:26:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 171356160. Throughput: 0: 1618.6. Samples: 37834784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:27,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 05:26:28,420][42004] Updated weights for policy 0, policy_version 41836 (0.0027) +[2024-11-08 05:26:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 171393024. Throughput: 0: 1622.5. Samples: 37840424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:32,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:26:34,147][42004] Updated weights for policy 0, policy_version 41846 (0.0030) +[2024-11-08 05:26:37,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171421696. Throughput: 0: 1682.1. Samples: 37850580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:37,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:26:40,308][42004] Updated weights for policy 0, policy_version 41856 (0.0046) +[2024-11-08 05:26:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171458560. Throughput: 0: 1662.9. Samples: 37861164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:42,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:26:45,838][42004] Updated weights for policy 0, policy_version 41866 (0.0021) +[2024-11-08 05:26:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 171495424. Throughput: 0: 1673.5. Samples: 37866774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:47,932][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 05:26:51,585][42004] Updated weights for policy 0, policy_version 41876 (0.0047) +[2024-11-08 05:26:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 171532288. Throughput: 0: 1692.2. Samples: 37877620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:52,936][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 05:26:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171556864. Throughput: 0: 1649.2. Samples: 37885502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:57,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 05:26:58,957][42004] Updated weights for policy 0, policy_version 41886 (0.0027) +[2024-11-08 05:27:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6803.6). Total num frames: 171589632. Throughput: 0: 1651.3. Samples: 37890722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:02,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 05:27:04,877][42004] Updated weights for policy 0, policy_version 41896 (0.0027) +[2024-11-08 05:27:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6803.5). Total num frames: 171626496. Throughput: 0: 1677.1. Samples: 37901234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:07,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 05:27:11,131][42004] Updated weights for policy 0, policy_version 41906 (0.0028) +[2024-11-08 05:27:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.9, 300 sec: 6775.8). Total num frames: 171655168. Throughput: 0: 1684.8. Samples: 37910598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:12,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:27:17,018][42004] Updated weights for policy 0, policy_version 41916 (0.0034) +[2024-11-08 05:27:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 171692032. Throughput: 0: 1669.3. Samples: 37915542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:17,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:27:22,493][42004] Updated weights for policy 0, policy_version 41926 (0.0030) +[2024-11-08 05:27:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 171728896. Throughput: 0: 1697.0. Samples: 37926944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:22,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 05:27:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 171765760. Throughput: 0: 1708.8. Samples: 37938058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:27,932][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:27:28,026][42004] Updated weights for policy 0, policy_version 41936 (0.0029) +[2024-11-08 05:27:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 171790336. Throughput: 0: 1639.6. Samples: 37940558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:32,935][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 05:27:35,265][42004] Updated weights for policy 0, policy_version 41946 (0.0031) +[2024-11-08 05:27:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 171827200. Throughput: 0: 1647.6. Samples: 37951760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:37,933][41694] Avg episode reward: [(0, '4.171')] +[2024-11-08 05:27:37,976][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041951_171831296.pth... +[2024-11-08 05:27:38,072][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041552_170196992.pth +[2024-11-08 05:27:40,841][42004] Updated weights for policy 0, policy_version 41956 (0.0025) +[2024-11-08 05:27:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 171864064. Throughput: 0: 1708.5. Samples: 37962386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:42,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:27:46,958][42004] Updated weights for policy 0, policy_version 41966 (0.0027) +[2024-11-08 05:27:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 171896832. Throughput: 0: 1697.5. Samples: 37967112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:47,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 05:27:52,470][42004] Updated weights for policy 0, policy_version 41976 (0.0023) +[2024-11-08 05:27:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6819.0). Total num frames: 171933696. Throughput: 0: 1711.7. Samples: 37978258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:52,932][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:27:57,815][42004] Updated weights for policy 0, policy_version 41986 (0.0032) +[2024-11-08 05:27:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 171974656. Throughput: 0: 1757.5. Samples: 37989688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:57,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:28:03,983][41694] Fps is (10 sec: 6300.4, 60 sec: 6776.1, 300 sec: 6779.4). Total num frames: 172003328. Throughput: 0: 1729.0. Samples: 37995166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:03,985][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 05:28:05,685][42004] Updated weights for policy 0, policy_version 41996 (0.0027) +[2024-11-08 05:28:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 172032000. Throughput: 0: 1669.6. Samples: 38002078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:07,934][41694] Avg episode reward: [(0, '4.876')] +[2024-11-08 05:28:11,342][42004] Updated weights for policy 0, policy_version 42006 (0.0033) +[2024-11-08 05:28:12,931][41694] Fps is (10 sec: 6866.3, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 172064768. Throughput: 0: 1665.8. Samples: 38013018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:12,934][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:28:17,429][42004] Updated weights for policy 0, policy_version 42016 (0.0046) +[2024-11-08 05:28:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 172101632. Throughput: 0: 1720.9. Samples: 38017998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:17,932][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 05:28:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172130304. Throughput: 0: 1686.8. Samples: 38027664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:22,948][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:28:23,593][42004] Updated weights for policy 0, policy_version 42026 (0.0029) +[2024-11-08 05:28:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 172167168. Throughput: 0: 1688.3. Samples: 38038360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:27,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:28:29,099][42004] Updated weights for policy 0, policy_version 42036 (0.0029) +[2024-11-08 05:28:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 172204032. Throughput: 0: 1709.0. Samples: 38044018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:32,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 05:28:34,844][42004] Updated weights for policy 0, policy_version 42046 (0.0045) +[2024-11-08 05:28:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172228608. Throughput: 0: 1698.4. Samples: 38054686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:37,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 05:28:41,992][42004] Updated weights for policy 0, policy_version 42056 (0.0026) +[2024-11-08 05:28:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 172265472. Throughput: 0: 1626.5. Samples: 38062880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:42,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 05:28:47,338][42004] Updated weights for policy 0, policy_version 42066 (0.0036) +[2024-11-08 05:28:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 172306432. Throughput: 0: 1671.6. Samples: 38068628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:47,934][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:28:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172335104. Throughput: 0: 1694.4. Samples: 38078324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:52,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 05:28:54,024][42004] Updated weights for policy 0, policy_version 42076 (0.0028) +[2024-11-08 05:28:57,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6762.0). Total num frames: 172367872. Throughput: 0: 1681.3. Samples: 38088678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:57,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 05:28:59,622][42004] Updated weights for policy 0, policy_version 42086 (0.0025) +[2024-11-08 05:29:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6809.5, 300 sec: 6803.5). Total num frames: 172404736. Throughput: 0: 1697.1. Samples: 38094368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:02,934][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:29:05,624][42004] Updated weights for policy 0, policy_version 42096 (0.0031) +[2024-11-08 05:29:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 172437504. Throughput: 0: 1710.8. Samples: 38104652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:07,935][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 05:29:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 172462080. Throughput: 0: 1637.6. Samples: 38112054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:12,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 05:29:13,096][42004] Updated weights for policy 0, policy_version 42106 (0.0032) +[2024-11-08 05:29:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 172503040. Throughput: 0: 1634.2. Samples: 38117558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:17,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:29:18,359][42004] Updated weights for policy 0, policy_version 42116 (0.0031) +[2024-11-08 05:29:22,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 172539904. Throughput: 0: 1654.0. Samples: 38129114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:22,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 05:29:24,281][42004] Updated weights for policy 0, policy_version 42126 (0.0032) +[2024-11-08 05:29:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 172572672. Throughput: 0: 1695.5. Samples: 38139176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:27,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:29:30,229][42004] Updated weights for policy 0, policy_version 42136 (0.0038) +[2024-11-08 05:29:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 172605440. Throughput: 0: 1684.1. Samples: 38144410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:32,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 05:29:35,866][42004] Updated weights for policy 0, policy_version 42146 (0.0028) +[2024-11-08 05:29:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6789.6). Total num frames: 172642304. Throughput: 0: 1713.9. Samples: 38155448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:37,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:29:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042149_172642304.pth... +[2024-11-08 05:29:38,136][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041755_171028480.pth +[2024-11-08 05:29:41,724][42004] Updated weights for policy 0, policy_version 42156 (0.0024) +[2024-11-08 05:29:42,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 172675072. Throughput: 0: 1710.0. Samples: 38165630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:42,935][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 05:29:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 172699648. Throughput: 0: 1662.3. Samples: 38169170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:47,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 05:29:49,482][42004] Updated weights for policy 0, policy_version 42166 (0.0060) +[2024-11-08 05:29:52,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 172736512. Throughput: 0: 1641.0. Samples: 38178496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:29:55,474][42004] Updated weights for policy 0, policy_version 42176 (0.0026) +[2024-11-08 05:29:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 172765184. Throughput: 0: 1689.4. Samples: 38188078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:57,937][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 05:30:01,793][42004] Updated weights for policy 0, policy_version 42186 (0.0024) +[2024-11-08 05:30:02,933][41694] Fps is (10 sec: 6143.1, 60 sec: 6553.4, 300 sec: 6692.4). Total num frames: 172797952. Throughput: 0: 1676.1. Samples: 38192986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:02,938][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 05:30:07,337][42004] Updated weights for policy 0, policy_version 42196 (0.0025) +[2024-11-08 05:30:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6747.5). Total num frames: 172838912. Throughput: 0: 1653.0. Samples: 38203498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:07,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 05:30:12,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 172871680. Throughput: 0: 1669.7. Samples: 38214314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:12,937][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 05:30:13,277][42004] Updated weights for policy 0, policy_version 42206 (0.0036) +[2024-11-08 05:30:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 172904448. Throughput: 0: 1650.8. Samples: 38218696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:30:21,623][42004] Updated weights for policy 0, policy_version 42216 (0.0042) +[2024-11-08 05:30:22,935][41694] Fps is (10 sec: 5322.9, 60 sec: 6416.6, 300 sec: 6692.4). Total num frames: 172924928. Throughput: 0: 1562.8. Samples: 38225778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:22,938][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:30:27,259][42004] Updated weights for policy 0, policy_version 42226 (0.0040) +[2024-11-08 05:30:27,933][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 172961792. Throughput: 0: 1578.1. Samples: 38236642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:27,936][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:30:32,932][41694] Fps is (10 sec: 6965.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 172994560. Throughput: 0: 1611.6. Samples: 38241694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:32,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:30:33,063][42004] Updated weights for policy 0, policy_version 42236 (0.0026) +[2024-11-08 05:30:37,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 173035520. Throughput: 0: 1646.2. Samples: 38252576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:37,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:30:38,447][42004] Updated weights for policy 0, policy_version 42246 (0.0025) +[2024-11-08 05:30:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 173072384. Throughput: 0: 1695.4. Samples: 38264372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:42,935][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 05:30:43,794][42004] Updated weights for policy 0, policy_version 42256 (0.0029) +[2024-11-08 05:30:47,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 173109248. Throughput: 0: 1708.5. Samples: 38269868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:47,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 05:30:49,522][42004] Updated weights for policy 0, policy_version 42266 (0.0029) +[2024-11-08 05:30:54,404][41694] Fps is (10 sec: 5712.3, 60 sec: 6529.8, 300 sec: 6673.0). Total num frames: 173137920. Throughput: 0: 1660.1. Samples: 38280648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:54,406][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 05:30:57,287][42004] Updated weights for policy 0, policy_version 42276 (0.0030) +[2024-11-08 05:30:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 173166592. Throughput: 0: 1633.1. Samples: 38287802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:57,936][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 05:31:02,907][42004] Updated weights for policy 0, policy_version 42286 (0.0024) +[2024-11-08 05:31:02,931][41694] Fps is (10 sec: 7685.7, 60 sec: 6758.6, 300 sec: 6692.4). Total num frames: 173203456. Throughput: 0: 1658.6. Samples: 38293334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:02,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 05:31:07,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 173236224. Throughput: 0: 1726.2. Samples: 38303448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:07,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 05:31:08,629][42004] Updated weights for policy 0, policy_version 42296 (0.0028) +[2024-11-08 05:31:12,934][41694] Fps is (10 sec: 6961.7, 60 sec: 6689.9, 300 sec: 6706.3). Total num frames: 173273088. Throughput: 0: 1718.6. Samples: 38313980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:12,935][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 05:31:14,809][42004] Updated weights for policy 0, policy_version 42306 (0.0043) +[2024-11-08 05:31:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 173305856. Throughput: 0: 1723.5. Samples: 38319250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:17,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 05:31:20,454][42004] Updated weights for policy 0, policy_version 42316 (0.0029) +[2024-11-08 05:31:22,931][41694] Fps is (10 sec: 6964.7, 60 sec: 6963.7, 300 sec: 6734.1). Total num frames: 173342720. Throughput: 0: 1728.4. Samples: 38330352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:22,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:31:25,895][42004] Updated weights for policy 0, policy_version 42326 (0.0030) +[2024-11-08 05:31:28,768][41694] Fps is (10 sec: 6048.1, 60 sec: 6732.9, 300 sec: 6687.4). Total num frames: 173371392. Throughput: 0: 1560.4. Samples: 38335896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:28,770][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 05:31:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 173404160. Throughput: 0: 1625.4. Samples: 38343012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:32,933][41694] Avg episode reward: [(0, '4.179')] +[2024-11-08 05:31:33,471][42004] Updated weights for policy 0, policy_version 42336 (0.0037) +[2024-11-08 05:31:37,932][41694] Fps is (10 sec: 7151.4, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 173436928. Throughput: 0: 1692.3. Samples: 38354308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:31:38,054][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042344_173441024.pth... +[2024-11-08 05:31:38,174][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041951_171831296.pth +[2024-11-08 05:31:39,310][42004] Updated weights for policy 0, policy_version 42346 (0.0029) +[2024-11-08 05:31:42,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 173469696. Throughput: 0: 1699.0. Samples: 38364258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:31:45,100][42004] Updated weights for policy 0, policy_version 42356 (0.0032) +[2024-11-08 05:31:47,935][41694] Fps is (10 sec: 7369.9, 60 sec: 6689.7, 300 sec: 6706.2). Total num frames: 173510656. Throughput: 0: 1702.5. Samples: 38369952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:47,938][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 05:31:50,207][42004] Updated weights for policy 0, policy_version 42366 (0.0031) +[2024-11-08 05:31:52,931][41694] Fps is (10 sec: 8192.4, 60 sec: 7068.5, 300 sec: 6761.9). Total num frames: 173551616. Throughput: 0: 1742.8. Samples: 38381874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 05:31:55,592][42004] Updated weights for policy 0, policy_version 42376 (0.0029) +[2024-11-08 05:31:57,931][41694] Fps is (10 sec: 7785.5, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 173588480. Throughput: 0: 1758.2. Samples: 38393096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:57,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:32:03,089][41694] Fps is (10 sec: 5645.2, 60 sec: 6740.7, 300 sec: 6716.6). Total num frames: 173608960. Throughput: 0: 1757.2. Samples: 38398600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:03,092][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 05:32:03,427][42004] Updated weights for policy 0, policy_version 42386 (0.0035) +[2024-11-08 05:32:07,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 173641728. Throughput: 0: 1663.7. Samples: 38405220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:07,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:32:09,080][42004] Updated weights for policy 0, policy_version 42396 (0.0047) +[2024-11-08 05:32:12,931][41694] Fps is (10 sec: 7075.0, 60 sec: 6758.6, 300 sec: 6734.1). Total num frames: 173678592. Throughput: 0: 1807.1. Samples: 38415704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:12,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 05:32:15,133][42004] Updated weights for policy 0, policy_version 42406 (0.0030) +[2024-11-08 05:32:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 173711360. Throughput: 0: 1731.2. Samples: 38420916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:17,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 05:32:20,592][42004] Updated weights for policy 0, policy_version 42416 (0.0025) +[2024-11-08 05:32:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 173752320. Throughput: 0: 1731.1. Samples: 38432206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:22,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 05:32:26,114][42004] Updated weights for policy 0, policy_version 42426 (0.0027) +[2024-11-08 05:32:27,932][41694] Fps is (10 sec: 7781.9, 60 sec: 7061.5, 300 sec: 6775.7). Total num frames: 173789184. Throughput: 0: 1765.9. Samples: 38443724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:27,935][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:32:31,541][42004] Updated weights for policy 0, policy_version 42436 (0.0038) +[2024-11-08 05:32:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 173826048. Throughput: 0: 1757.2. Samples: 38449018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:32,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 05:32:37,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 173842432. Throughput: 0: 1708.8. Samples: 38458770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:37,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:32:39,940][42004] Updated weights for policy 0, policy_version 42446 (0.0036) +[2024-11-08 05:32:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 173875200. Throughput: 0: 1610.5. Samples: 38465570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:42,936][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:32:46,077][42004] Updated weights for policy 0, policy_version 42456 (0.0033) +[2024-11-08 05:32:47,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6622.3, 300 sec: 6692.4). Total num frames: 173907968. Throughput: 0: 1609.6. Samples: 38470780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:47,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:32:51,808][42004] Updated weights for policy 0, policy_version 42466 (0.0041) +[2024-11-08 05:32:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 173948928. Throughput: 0: 1687.6. Samples: 38481160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:52,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:32:57,055][42004] Updated weights for policy 0, policy_version 42476 (0.0020) +[2024-11-08 05:32:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6744.3). Total num frames: 173985792. Throughput: 0: 1712.5. Samples: 38492768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:57,933][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 05:33:02,743][42004] Updated weights for policy 0, policy_version 42486 (0.0031) +[2024-11-08 05:33:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6913.1, 300 sec: 6748.0). Total num frames: 174022656. Throughput: 0: 1714.8. Samples: 38498082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:33:02,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:33:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174059520. Throughput: 0: 1709.1. Samples: 38509114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:07,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:33:08,239][42004] Updated weights for policy 0, policy_version 42496 (0.0028) +[2024-11-08 05:33:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 174080000. Throughput: 0: 1616.7. Samples: 38516476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:12,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 05:33:16,304][42004] Updated weights for policy 0, policy_version 42506 (0.0029) +[2024-11-08 05:33:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 174116864. Throughput: 0: 1604.9. Samples: 38521240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:17,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:33:22,245][42004] Updated weights for policy 0, policy_version 42516 (0.0029) +[2024-11-08 05:33:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174149632. Throughput: 0: 1615.4. Samples: 38531462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:22,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:33:27,556][42004] Updated weights for policy 0, policy_version 42526 (0.0024) +[2024-11-08 05:33:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174186496. Throughput: 0: 1722.3. Samples: 38543072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:27,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:33:32,633][42004] Updated weights for policy 0, policy_version 42536 (0.0036) +[2024-11-08 05:33:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 174227456. Throughput: 0: 1738.0. Samples: 38548990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:32,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 05:33:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 174264320. Throughput: 0: 1767.8. Samples: 38560712. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:37,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:33:38,087][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042546_174268416.pth... +[2024-11-08 05:33:38,091][42004] Updated weights for policy 0, policy_version 42546 (0.0030) +[2024-11-08 05:33:38,205][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042149_172642304.pth +[2024-11-08 05:33:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6761.9). Total num frames: 174301184. Throughput: 0: 1764.7. Samples: 38572180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:42,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:33:43,489][42004] Updated weights for policy 0, policy_version 42556 (0.0030) +[2024-11-08 05:33:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 174321664. Throughput: 0: 1748.1. Samples: 38576748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:47,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 05:33:51,887][42004] Updated weights for policy 0, policy_version 42566 (0.0032) +[2024-11-08 05:33:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 174354432. Throughput: 0: 1652.8. Samples: 38583492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:52,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 05:33:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 174387200. Throughput: 0: 1709.6. Samples: 38593410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:57,934][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 05:33:57,973][42004] Updated weights for policy 0, policy_version 42576 (0.0034) +[2024-11-08 05:34:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 174428160. Throughput: 0: 1732.2. Samples: 38599188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:02,932][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:34:03,420][42004] Updated weights for policy 0, policy_version 42586 (0.0027) +[2024-11-08 05:34:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 174465024. Throughput: 0: 1748.2. Samples: 38610132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:07,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:34:08,863][42004] Updated weights for policy 0, policy_version 42596 (0.0028) +[2024-11-08 05:34:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174497792. Throughput: 0: 1727.1. Samples: 38620790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:12,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 05:34:14,909][42004] Updated weights for policy 0, policy_version 42606 (0.0030) +[2024-11-08 05:34:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174534656. Throughput: 0: 1718.4. Samples: 38626318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:17,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 05:34:22,778][42004] Updated weights for policy 0, policy_version 42616 (0.0027) +[2024-11-08 05:34:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 174555136. Throughput: 0: 1629.4. Samples: 38634034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:22,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:34:27,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 174587904. Throughput: 0: 1584.4. Samples: 38643478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:27,934][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 05:34:28,975][42004] Updated weights for policy 0, policy_version 42626 (0.0035) +[2024-11-08 05:34:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174624768. Throughput: 0: 1588.1. Samples: 38648214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:32,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 05:34:34,553][42004] Updated weights for policy 0, policy_version 42636 (0.0038) +[2024-11-08 05:34:37,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 174661632. Throughput: 0: 1691.4. Samples: 38659604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:34:39,776][42004] Updated weights for policy 0, policy_version 42646 (0.0028) +[2024-11-08 05:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 174698496. Throughput: 0: 1723.6. Samples: 38670974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:42,933][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 05:34:45,351][42004] Updated weights for policy 0, policy_version 42656 (0.0027) +[2024-11-08 05:34:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 174735360. Throughput: 0: 1724.3. Samples: 38676780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:47,934][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 05:34:50,962][42004] Updated weights for policy 0, policy_version 42666 (0.0035) +[2024-11-08 05:34:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 174772224. Throughput: 0: 1723.8. Samples: 38687704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:52,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 05:34:57,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6690.0, 300 sec: 6748.0). Total num frames: 174788608. Throughput: 0: 1622.2. Samples: 38693790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:34:57,935][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 05:34:59,399][42004] Updated weights for policy 0, policy_version 42676 (0.0030) +[2024-11-08 05:35:02,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 174817280. Throughput: 0: 1607.2. Samples: 38698644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:02,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:35:06,228][42004] Updated weights for policy 0, policy_version 42686 (0.0032) +[2024-11-08 05:35:07,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6417.0, 300 sec: 6706.3). Total num frames: 174850048. Throughput: 0: 1635.8. Samples: 38707644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:07,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 05:35:12,031][42004] Updated weights for policy 0, policy_version 42696 (0.0036) +[2024-11-08 05:35:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 174886912. Throughput: 0: 1659.6. Samples: 38718158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:12,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 05:35:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6417.0, 300 sec: 6762.0). Total num frames: 174919680. Throughput: 0: 1663.5. Samples: 38723072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:17,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:35:18,229][42004] Updated weights for policy 0, policy_version 42706 (0.0033) +[2024-11-08 05:35:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 174956544. Throughput: 0: 1636.3. Samples: 38733236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:22,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 05:35:23,784][42004] Updated weights for policy 0, policy_version 42716 (0.0031) +[2024-11-08 05:35:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 174993408. Throughput: 0: 1627.0. Samples: 38744188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:27,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:35:30,944][42004] Updated weights for policy 0, policy_version 42726 (0.0038) +[2024-11-08 05:35:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 175017984. Throughput: 0: 1568.7. Samples: 38747372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:32,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 05:35:37,023][42004] Updated weights for policy 0, policy_version 42736 (0.0045) +[2024-11-08 05:35:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 175050752. Throughput: 0: 1547.9. Samples: 38757360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:37,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:35:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042737_175050752.pth... +[2024-11-08 05:35:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042344_173441024.pth +[2024-11-08 05:35:42,661][42004] Updated weights for policy 0, policy_version 42746 (0.0026) +[2024-11-08 05:35:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 175087616. Throughput: 0: 1652.7. Samples: 38768158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:42,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:35:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6767.9). Total num frames: 175124480. Throughput: 0: 1661.6. Samples: 38773414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:47,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 05:35:48,192][42004] Updated weights for policy 0, policy_version 42756 (0.0028) +[2024-11-08 05:35:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 175161344. Throughput: 0: 1716.9. Samples: 38784902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:52,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:35:53,503][42004] Updated weights for policy 0, policy_version 42766 (0.0029) +[2024-11-08 05:35:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6895.0, 300 sec: 6775.7). Total num frames: 175202304. Throughput: 0: 1739.6. Samples: 38796440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:57,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:35:58,952][42004] Updated weights for policy 0, policy_version 42776 (0.0033) +[2024-11-08 05:36:04,242][41694] Fps is (10 sec: 5794.1, 60 sec: 6680.7, 300 sec: 6718.1). Total num frames: 175226880. Throughput: 0: 1702.8. Samples: 38801930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:04,244][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 05:36:07,549][42004] Updated weights for policy 0, policy_version 42786 (0.0024) +[2024-11-08 05:36:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6706.4). Total num frames: 175251456. Throughput: 0: 1667.8. Samples: 38808286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:07,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 05:36:12,932][41694] Fps is (10 sec: 6128.0, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 175280128. Throughput: 0: 1617.6. Samples: 38816980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:12,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 05:36:14,191][42004] Updated weights for policy 0, policy_version 42796 (0.0029) +[2024-11-08 05:36:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 175316992. Throughput: 0: 1657.8. Samples: 38821972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:17,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 05:36:19,918][42004] Updated weights for policy 0, policy_version 42806 (0.0027) +[2024-11-08 05:36:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6739.3). Total num frames: 175353856. Throughput: 0: 1678.8. Samples: 38832906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:22,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:36:25,393][42004] Updated weights for policy 0, policy_version 42816 (0.0022) +[2024-11-08 05:36:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 175390720. Throughput: 0: 1687.4. Samples: 38844092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:27,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 05:36:30,909][42004] Updated weights for policy 0, policy_version 42826 (0.0026) +[2024-11-08 05:36:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 175427584. Throughput: 0: 1693.8. Samples: 38849636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:32,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:36:38,923][41694] Fps is (10 sec: 5589.7, 60 sec: 6581.4, 300 sec: 6697.7). Total num frames: 175452160. Throughput: 0: 1643.3. Samples: 38860478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:38,928][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:36:39,060][42004] Updated weights for policy 0, policy_version 42836 (0.0041) +[2024-11-08 05:36:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6664.8). Total num frames: 175476736. Throughput: 0: 1551.8. Samples: 38866272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:42,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:36:45,509][42004] Updated weights for policy 0, policy_version 42846 (0.0041) +[2024-11-08 05:36:47,932][41694] Fps is (10 sec: 6820.4, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 175513600. Throughput: 0: 1580.6. Samples: 38870986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:47,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:36:51,132][42004] Updated weights for policy 0, policy_version 42856 (0.0030) +[2024-11-08 05:36:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 175550464. Throughput: 0: 1635.2. Samples: 38881870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:52,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:36:56,497][42004] Updated weights for policy 0, policy_version 42866 (0.0021) +[2024-11-08 05:36:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6709.9). Total num frames: 175587328. Throughput: 0: 1697.4. Samples: 38893362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:57,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 05:37:01,983][42004] Updated weights for policy 0, policy_version 42876 (0.0039) +[2024-11-08 05:37:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6769.8, 300 sec: 6720.2). Total num frames: 175624192. Throughput: 0: 1713.8. Samples: 38899094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 05:37:07,451][42004] Updated weights for policy 0, policy_version 42886 (0.0024) +[2024-11-08 05:37:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 175661056. Throughput: 0: 1716.1. Samples: 38910130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:07,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:37:13,310][41694] Fps is (10 sec: 5920.0, 60 sec: 6716.1, 300 sec: 6683.9). Total num frames: 175685632. Throughput: 0: 1577.6. Samples: 38915680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:13,313][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 05:37:15,519][42004] Updated weights for policy 0, policy_version 42896 (0.0036) +[2024-11-08 05:37:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 175714304. Throughput: 0: 1614.3. Samples: 38922280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:17,935][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:37:21,543][42004] Updated weights for policy 0, policy_version 42906 (0.0019) +[2024-11-08 05:37:22,932][41694] Fps is (10 sec: 6810.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 175751168. Throughput: 0: 1628.2. Samples: 38932132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:22,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 05:37:26,901][42004] Updated weights for policy 0, policy_version 42916 (0.0019) +[2024-11-08 05:37:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 175788032. Throughput: 0: 1722.6. Samples: 38943790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:27,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:37:32,130][42004] Updated weights for policy 0, policy_version 42926 (0.0029) +[2024-11-08 05:37:32,932][41694] Fps is (10 sec: 7782.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 175828992. Throughput: 0: 1746.3. Samples: 38949568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:32,934][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 05:37:37,531][42004] Updated weights for policy 0, policy_version 42936 (0.0028) +[2024-11-08 05:37:37,933][41694] Fps is (10 sec: 7781.4, 60 sec: 7010.7, 300 sec: 6748.0). Total num frames: 175865856. Throughput: 0: 1768.8. Samples: 38961466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:37,936][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:37:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042936_175865856.pth... +[2024-11-08 05:37:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042546_174268416.pth +[2024-11-08 05:37:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6761.9). Total num frames: 175902720. Throughput: 0: 1752.8. Samples: 38972240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 05:37:43,085][42004] Updated weights for policy 0, policy_version 42946 (0.0031) +[2024-11-08 05:37:47,982][41694] Fps is (10 sec: 5706.4, 60 sec: 6820.9, 300 sec: 6691.3). Total num frames: 175923200. Throughput: 0: 1741.7. Samples: 38977560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:47,989][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 05:37:51,778][42004] Updated weights for policy 0, policy_version 42956 (0.0041) +[2024-11-08 05:37:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 175955968. Throughput: 0: 1623.1. Samples: 38983168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:52,940][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:37:57,308][42004] Updated weights for policy 0, policy_version 42966 (0.0025) +[2024-11-08 05:37:57,931][41694] Fps is (10 sec: 6998.5, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 175992832. Throughput: 0: 1760.9. Samples: 38994256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:57,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:38:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 176025600. Throughput: 0: 1724.7. Samples: 38999892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:02,933][41694] Avg episode reward: [(0, '4.742')] +[2024-11-08 05:38:02,952][42004] Updated weights for policy 0, policy_version 42976 (0.0025) +[2024-11-08 05:38:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 176066560. Throughput: 0: 1755.2. Samples: 39011116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:07,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:38:08,066][42004] Updated weights for policy 0, policy_version 42986 (0.0032) +[2024-11-08 05:38:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7076.1, 300 sec: 6748.0). Total num frames: 176107520. Throughput: 0: 1761.0. Samples: 39023034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:12,937][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:38:13,347][42004] Updated weights for policy 0, policy_version 42996 (0.0036) +[2024-11-08 05:38:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 176144384. Throughput: 0: 1755.2. Samples: 39028552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:17,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:38:18,832][42004] Updated weights for policy 0, policy_version 43006 (0.0028) +[2024-11-08 05:38:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 176160768. Throughput: 0: 1695.2. Samples: 39037746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:22,934][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 05:38:27,635][42004] Updated weights for policy 0, policy_version 43016 (0.0035) +[2024-11-08 05:38:27,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6758.3, 300 sec: 6664.7). Total num frames: 176193536. Throughput: 0: 1614.6. Samples: 39044896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:27,935][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 05:38:32,929][42004] Updated weights for policy 0, policy_version 43026 (0.0040) +[2024-11-08 05:38:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 176234496. Throughput: 0: 1618.5. Samples: 39050312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:32,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 05:38:37,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.5, 300 sec: 6678.6). Total num frames: 176271360. Throughput: 0: 1755.7. Samples: 39062176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:38:38,251][42004] Updated weights for policy 0, policy_version 43036 (0.0032) +[2024-11-08 05:38:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 176312320. Throughput: 0: 1771.3. Samples: 39073966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:42,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 05:38:43,388][42004] Updated weights for policy 0, policy_version 43046 (0.0027) +[2024-11-08 05:38:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7105.7, 300 sec: 6761.9). Total num frames: 176349184. Throughput: 0: 1771.7. Samples: 39079620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:47,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 05:38:48,762][42004] Updated weights for policy 0, policy_version 43056 (0.0028) +[2024-11-08 05:38:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6761.9). Total num frames: 176381952. Throughput: 0: 1759.6. Samples: 39090300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 05:38:57,494][42004] Updated weights for policy 0, policy_version 43066 (0.0035) +[2024-11-08 05:38:57,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6758.3, 300 sec: 6678.5). Total num frames: 176398336. Throughput: 0: 1634.4. Samples: 39096584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:57,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:39:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 176431104. Throughput: 0: 1607.6. Samples: 39100894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:39:03,869][42004] Updated weights for policy 0, policy_version 43076 (0.0048) +[2024-11-08 05:39:07,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 176467968. Throughput: 0: 1630.2. Samples: 39111106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:39:09,428][42004] Updated weights for policy 0, policy_version 43086 (0.0037) +[2024-11-08 05:39:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 176504832. Throughput: 0: 1720.1. Samples: 39122300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:12,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 05:39:15,147][42004] Updated weights for policy 0, policy_version 43096 (0.0034) +[2024-11-08 05:39:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 176541696. Throughput: 0: 1719.7. Samples: 39127698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:17,937][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 05:39:20,457][42004] Updated weights for policy 0, policy_version 43106 (0.0040) +[2024-11-08 05:39:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 176578560. Throughput: 0: 1706.0. Samples: 39138948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:22,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 05:39:26,177][42004] Updated weights for policy 0, policy_version 43116 (0.0034) +[2024-11-08 05:39:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 176611328. Throughput: 0: 1686.0. Samples: 39149836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:27,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:39:32,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 176627712. Throughput: 0: 1633.9. Samples: 39153144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:32,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 05:39:35,021][42004] Updated weights for policy 0, policy_version 43126 (0.0028) +[2024-11-08 05:39:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 176664576. Throughput: 0: 1556.0. Samples: 39160320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:39:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043131_176664576.pth... +[2024-11-08 05:39:38,043][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042737_175050752.pth +[2024-11-08 05:39:40,775][42004] Updated weights for policy 0, policy_version 43136 (0.0042) +[2024-11-08 05:39:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 176697344. Throughput: 0: 1652.4. Samples: 39170942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:42,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 05:39:46,546][42004] Updated weights for policy 0, policy_version 43146 (0.0032) +[2024-11-08 05:39:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 176734208. Throughput: 0: 1670.2. Samples: 39176054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:47,934][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 05:39:52,175][42004] Updated weights for policy 0, policy_version 43156 (0.0021) +[2024-11-08 05:39:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.4, 300 sec: 6720.2). Total num frames: 176771072. Throughput: 0: 1686.3. Samples: 39186988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:52,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:39:57,686][42004] Updated weights for policy 0, policy_version 43166 (0.0036) +[2024-11-08 05:39:57,935][41694] Fps is (10 sec: 7370.4, 60 sec: 6826.4, 300 sec: 6747.9). Total num frames: 176807936. Throughput: 0: 1687.5. Samples: 39198244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:57,937][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:40:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 176840704. Throughput: 0: 1687.3. Samples: 39203628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:02,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 05:40:03,879][42004] Updated weights for policy 0, policy_version 43176 (0.0033) +[2024-11-08 05:40:07,932][41694] Fps is (10 sec: 5326.3, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 176861184. Throughput: 0: 1589.5. Samples: 39210476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:07,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 05:40:11,492][42004] Updated weights for policy 0, policy_version 43186 (0.0038) +[2024-11-08 05:40:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 176898048. Throughput: 0: 1578.3. Samples: 39220858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:12,933][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 05:40:17,662][42004] Updated weights for policy 0, policy_version 43196 (0.0028) +[2024-11-08 05:40:17,933][41694] Fps is (10 sec: 6962.8, 60 sec: 6485.2, 300 sec: 6692.4). Total num frames: 176930816. Throughput: 0: 1610.1. Samples: 39225600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:17,935][41694] Avg episode reward: [(0, '4.712')] +[2024-11-08 05:40:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 176967680. Throughput: 0: 1681.9. Samples: 39236004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:22,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:40:23,502][42004] Updated weights for policy 0, policy_version 43206 (0.0037) +[2024-11-08 05:40:27,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.5, 300 sec: 6734.1). Total num frames: 177004544. Throughput: 0: 1687.6. Samples: 39246884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:27,934][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 05:40:28,975][42004] Updated weights for policy 0, policy_version 43216 (0.0032) +[2024-11-08 05:40:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 177041408. Throughput: 0: 1698.0. Samples: 39252462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:32,933][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 05:40:34,645][42004] Updated weights for policy 0, policy_version 43226 (0.0038) +[2024-11-08 05:40:39,810][41694] Fps is (10 sec: 5862.5, 60 sec: 6619.4, 300 sec: 6691.5). Total num frames: 177074176. Throughput: 0: 1623.9. Samples: 39263112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:39,813][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 05:40:42,913][42004] Updated weights for policy 0, policy_version 43236 (0.0036) +[2024-11-08 05:40:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 177094656. Throughput: 0: 1581.1. Samples: 39269390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:42,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:40:47,932][41694] Fps is (10 sec: 7060.3, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 177131520. Throughput: 0: 1583.6. Samples: 39274892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:47,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:40:48,377][42004] Updated weights for policy 0, policy_version 43246 (0.0024) +[2024-11-08 05:40:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 177168384. Throughput: 0: 1682.2. Samples: 39286174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:40:52,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 05:40:53,870][42004] Updated weights for policy 0, policy_version 43256 (0.0038) +[2024-11-08 05:40:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.2, 300 sec: 6736.3). Total num frames: 177205248. Throughput: 0: 1695.0. Samples: 39297134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:40:57,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:40:59,668][42004] Updated weights for policy 0, policy_version 43266 (0.0029) +[2024-11-08 05:41:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 177238016. Throughput: 0: 1713.0. Samples: 39302682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:02,933][41694] Avg episode reward: [(0, '4.118')] +[2024-11-08 05:41:05,411][42004] Updated weights for policy 0, policy_version 43276 (0.0049) +[2024-11-08 05:41:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 177274880. Throughput: 0: 1717.7. Samples: 39313300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:07,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:41:11,878][42004] Updated weights for policy 0, policy_version 43286 (0.0038) +[2024-11-08 05:41:14,031][41694] Fps is (10 sec: 5535.1, 60 sec: 6569.7, 300 sec: 6695.3). Total num frames: 177299456. Throughput: 0: 1554.5. Samples: 39318544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:14,034][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:41:17,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 177324032. Throughput: 0: 1598.0. Samples: 39324374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:17,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 05:41:19,850][42004] Updated weights for policy 0, policy_version 43296 (0.0042) +[2024-11-08 05:41:22,932][41694] Fps is (10 sec: 6903.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 177360896. Throughput: 0: 1662.6. Samples: 39334808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:22,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 05:41:25,823][42004] Updated weights for policy 0, policy_version 43306 (0.0033) +[2024-11-08 05:41:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 177393664. Throughput: 0: 1684.2. Samples: 39345180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:27,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:41:31,337][42004] Updated weights for policy 0, policy_version 43316 (0.0031) +[2024-11-08 05:41:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6729.0). Total num frames: 177430528. Throughput: 0: 1682.8. Samples: 39350616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:32,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 05:41:36,779][42004] Updated weights for policy 0, policy_version 43326 (0.0029) +[2024-11-08 05:41:37,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6835.8, 300 sec: 6761.9). Total num frames: 177471488. Throughput: 0: 1684.0. Samples: 39361956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:37,934][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 05:41:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043328_177471488.pth... +[2024-11-08 05:41:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042936_175865856.pth +[2024-11-08 05:41:42,373][42004] Updated weights for policy 0, policy_version 43336 (0.0031) +[2024-11-08 05:41:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 177504256. Throughput: 0: 1684.8. Samples: 39372950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:42,936][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:41:48,281][41694] Fps is (10 sec: 5540.8, 60 sec: 6583.5, 300 sec: 6698.4). Total num frames: 177528832. Throughput: 0: 1658.1. Samples: 39377876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:48,284][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 05:41:50,504][42004] Updated weights for policy 0, policy_version 43346 (0.0024) +[2024-11-08 05:41:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 177561600. Throughput: 0: 1592.1. Samples: 39384942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:52,933][41694] Avg episode reward: [(0, '4.220')] +[2024-11-08 05:41:55,954][42004] Updated weights for policy 0, policy_version 43356 (0.0026) +[2024-11-08 05:41:57,931][41694] Fps is (10 sec: 7215.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 177598464. Throughput: 0: 1770.8. Samples: 39396284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:57,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 05:42:01,223][42004] Updated weights for policy 0, policy_version 43366 (0.0026) +[2024-11-08 05:42:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 177635328. Throughput: 0: 1721.9. Samples: 39401858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:02,933][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:42:06,929][42004] Updated weights for policy 0, policy_version 43376 (0.0033) +[2024-11-08 05:42:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6742.7). Total num frames: 177672192. Throughput: 0: 1730.9. Samples: 39412700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:07,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:42:12,270][42004] Updated weights for policy 0, policy_version 43386 (0.0037) +[2024-11-08 05:42:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7023.7, 300 sec: 6775.8). Total num frames: 177713152. Throughput: 0: 1759.6. Samples: 39424362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:12,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 05:42:17,827][42004] Updated weights for policy 0, policy_version 43396 (0.0026) +[2024-11-08 05:42:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 177750016. Throughput: 0: 1761.7. Samples: 39429892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:17,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 05:42:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 177766400. Throughput: 0: 1719.9. Samples: 39439350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:22,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:42:25,800][42004] Updated weights for policy 0, policy_version 43406 (0.0026) +[2024-11-08 05:42:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 177803264. Throughput: 0: 1663.2. Samples: 39447792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:27,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 05:42:31,170][42004] Updated weights for policy 0, policy_version 43416 (0.0032) +[2024-11-08 05:42:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6894.9, 300 sec: 6706.4). Total num frames: 177844224. Throughput: 0: 1686.6. Samples: 39453184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:32,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:42:36,780][42004] Updated weights for policy 0, policy_version 43426 (0.0027) +[2024-11-08 05:42:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 177876992. Throughput: 0: 1764.2. Samples: 39464332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:37,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:42:42,237][42004] Updated weights for policy 0, policy_version 43436 (0.0028) +[2024-11-08 05:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6763.0). Total num frames: 177917952. Throughput: 0: 1763.6. Samples: 39475646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:42,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 05:42:47,573][42004] Updated weights for policy 0, policy_version 43446 (0.0028) +[2024-11-08 05:42:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7141.3, 300 sec: 6775.8). Total num frames: 177954816. Throughput: 0: 1763.3. Samples: 39481208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:47,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:42:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6775.8). Total num frames: 177991680. Throughput: 0: 1765.7. Samples: 39492158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:52,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 05:42:53,465][42004] Updated weights for policy 0, policy_version 43456 (0.0035) +[2024-11-08 05:42:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 178008064. Throughput: 0: 1649.6. Samples: 39498596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:57,933][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 05:43:01,274][42004] Updated weights for policy 0, policy_version 43466 (0.0048) +[2024-11-08 05:43:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 178044928. Throughput: 0: 1650.2. Samples: 39504152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:02,934][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 05:43:07,114][42004] Updated weights for policy 0, policy_version 43476 (0.0021) +[2024-11-08 05:43:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 178081792. Throughput: 0: 1677.7. Samples: 39514844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:07,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:43:12,408][42004] Updated weights for policy 0, policy_version 43486 (0.0025) +[2024-11-08 05:43:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 178118656. Throughput: 0: 1748.9. Samples: 39526490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:12,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:43:17,689][42004] Updated weights for policy 0, policy_version 43496 (0.0039) +[2024-11-08 05:43:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 178159616. Throughput: 0: 1752.5. Samples: 39532048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:17,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:43:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6789.7). Total num frames: 178196480. Throughput: 0: 1758.1. Samples: 39543448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:22,933][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:43:23,260][42004] Updated weights for policy 0, policy_version 43506 (0.0026) +[2024-11-08 05:43:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 6761.8). Total num frames: 178229248. Throughput: 0: 1726.9. Samples: 39553360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:27,935][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 05:43:31,830][42004] Updated weights for policy 0, policy_version 43516 (0.0040) +[2024-11-08 05:43:32,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6758.3, 300 sec: 6706.3). Total num frames: 178249728. Throughput: 0: 1675.7. Samples: 39556614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:32,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:43:37,249][42004] Updated weights for policy 0, policy_version 43526 (0.0032) +[2024-11-08 05:43:37,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 178286592. Throughput: 0: 1638.0. Samples: 39565870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:37,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:43:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043527_178286592.pth... +[2024-11-08 05:43:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043131_176664576.pth +[2024-11-08 05:43:42,776][42004] Updated weights for policy 0, policy_version 43536 (0.0026) +[2024-11-08 05:43:42,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 178323456. Throughput: 0: 1740.4. Samples: 39576912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:42,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:43:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 178360320. Throughput: 0: 1741.6. Samples: 39582524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:47,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 05:43:48,123][42004] Updated weights for policy 0, policy_version 43546 (0.0022) +[2024-11-08 05:43:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 178397184. Throughput: 0: 1757.4. Samples: 39593926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:52,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:43:53,566][42004] Updated weights for policy 0, policy_version 43556 (0.0030) +[2024-11-08 05:43:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6789.6). Total num frames: 178434048. Throughput: 0: 1748.0. Samples: 39605152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:57,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 05:43:59,482][42004] Updated weights for policy 0, policy_version 43566 (0.0032) +[2024-11-08 05:44:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 178462720. Throughput: 0: 1728.1. Samples: 39609814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:02,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:44:07,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 178483200. Throughput: 0: 1616.9. Samples: 39616208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:07,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:44:08,093][42004] Updated weights for policy 0, policy_version 43576 (0.0032) +[2024-11-08 05:44:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 178520064. Throughput: 0: 1632.3. Samples: 39626810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:12,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 05:44:13,468][42004] Updated weights for policy 0, policy_version 43586 (0.0026) +[2024-11-08 05:44:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 178561024. Throughput: 0: 1681.3. Samples: 39632270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:44:19,021][42004] Updated weights for policy 0, policy_version 43596 (0.0032) +[2024-11-08 05:44:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 178593792. Throughput: 0: 1715.0. Samples: 39643046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:22,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 05:44:24,836][42004] Updated weights for policy 0, policy_version 43606 (0.0025) +[2024-11-08 05:44:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.3, 300 sec: 6789.6). Total num frames: 178630656. Throughput: 0: 1715.2. Samples: 39654096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:27,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 05:44:30,367][42004] Updated weights for policy 0, policy_version 43616 (0.0037) +[2024-11-08 05:44:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 178663424. Throughput: 0: 1714.8. Samples: 39659690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:32,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:44:36,648][42004] Updated weights for policy 0, policy_version 43626 (0.0037) +[2024-11-08 05:44:39,647][41694] Fps is (10 sec: 5593.8, 60 sec: 6636.9, 300 sec: 6736.6). Total num frames: 178696192. Throughput: 0: 1620.4. Samples: 39669626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:39,650][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 05:44:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 178720768. Throughput: 0: 1593.1. Samples: 39676840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:42,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:44:44,341][42004] Updated weights for policy 0, policy_version 43636 (0.0044) +[2024-11-08 05:44:47,932][41694] Fps is (10 sec: 7416.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 178757632. Throughput: 0: 1607.0. Samples: 39682128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:47,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:44:49,887][42004] Updated weights for policy 0, policy_version 43646 (0.0036) +[2024-11-08 05:44:52,934][41694] Fps is (10 sec: 7370.9, 60 sec: 6621.6, 300 sec: 6734.1). Total num frames: 178794496. Throughput: 0: 1709.6. Samples: 39693144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:52,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:44:55,773][42004] Updated weights for policy 0, policy_version 43656 (0.0037) +[2024-11-08 05:44:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 178827264. Throughput: 0: 1707.0. Samples: 39703624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:44:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:45:01,426][42004] Updated weights for policy 0, policy_version 43666 (0.0031) +[2024-11-08 05:45:02,932][41694] Fps is (10 sec: 6964.4, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 178864128. Throughput: 0: 1708.1. Samples: 39709136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:02,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:45:07,549][42004] Updated weights for policy 0, policy_version 43676 (0.0027) +[2024-11-08 05:45:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 178896896. Throughput: 0: 1694.9. Samples: 39719314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:07,936][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 05:45:13,947][41694] Fps is (10 sec: 5578.1, 60 sec: 6645.9, 300 sec: 6738.7). Total num frames: 178925568. Throughput: 0: 1527.4. Samples: 39724380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:13,949][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:45:15,989][42004] Updated weights for policy 0, policy_version 43686 (0.0043) +[2024-11-08 05:45:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 178946048. Throughput: 0: 1579.6. Samples: 39730772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:17,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 05:45:21,976][42004] Updated weights for policy 0, policy_version 43696 (0.0031) +[2024-11-08 05:45:22,933][41694] Fps is (10 sec: 6381.5, 60 sec: 6485.2, 300 sec: 6706.3). Total num frames: 178982912. Throughput: 0: 1630.6. Samples: 39740206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:22,935][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 05:45:27,535][42004] Updated weights for policy 0, policy_version 43706 (0.0025) +[2024-11-08 05:45:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 179019776. Throughput: 0: 1657.2. Samples: 39751416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:27,932][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:45:32,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6553.5, 300 sec: 6763.3). Total num frames: 179056640. Throughput: 0: 1667.0. Samples: 39757144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:32,935][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:45:33,059][42004] Updated weights for policy 0, policy_version 43716 (0.0026) +[2024-11-08 05:45:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6816.8, 300 sec: 6775.8). Total num frames: 179093504. Throughput: 0: 1666.8. Samples: 39768148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:37,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 05:45:38,024][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043725_179097600.pth... +[2024-11-08 05:45:38,118][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043328_177471488.pth +[2024-11-08 05:45:38,688][42004] Updated weights for policy 0, policy_version 43726 (0.0034) +[2024-11-08 05:45:42,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 179126272. Throughput: 0: 1665.3. Samples: 39778562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:42,936][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 05:45:44,792][42004] Updated weights for policy 0, policy_version 43736 (0.0022) +[2024-11-08 05:45:48,273][41694] Fps is (10 sec: 5545.2, 60 sec: 6516.6, 300 sec: 6712.5). Total num frames: 179150848. Throughput: 0: 1643.5. Samples: 39783654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:48,279][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 05:45:52,420][42004] Updated weights for policy 0, policy_version 43746 (0.0033) +[2024-11-08 05:45:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.6, 300 sec: 6706.3). Total num frames: 179183616. Throughput: 0: 1587.1. Samples: 39790736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:52,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 05:45:57,919][42004] Updated weights for policy 0, policy_version 43756 (0.0036) +[2024-11-08 05:45:57,932][41694] Fps is (10 sec: 7633.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 179224576. Throughput: 0: 1766.3. Samples: 39802070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:57,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 05:46:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179257344. Throughput: 0: 1708.3. Samples: 39807648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:02,934][41694] Avg episode reward: [(0, '4.673')] +[2024-11-08 05:46:03,875][42004] Updated weights for policy 0, policy_version 43766 (0.0041) +[2024-11-08 05:46:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6787.2). Total num frames: 179294208. Throughput: 0: 1725.4. Samples: 39817846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:07,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:46:09,668][42004] Updated weights for policy 0, policy_version 43776 (0.0025) +[2024-11-08 05:46:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6805.3, 300 sec: 6789.6). Total num frames: 179326976. Throughput: 0: 1697.5. Samples: 39827806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:12,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:46:16,283][42004] Updated weights for policy 0, policy_version 43786 (0.0032) +[2024-11-08 05:46:17,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 179355648. Throughput: 0: 1664.7. Samples: 39832056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:17,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 05:46:22,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.7, 300 sec: 6720.2). Total num frames: 179376128. Throughput: 0: 1624.2. Samples: 39841236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:22,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:46:24,606][42004] Updated weights for policy 0, policy_version 43796 (0.0048) +[2024-11-08 05:46:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179412992. Throughput: 0: 1569.7. Samples: 39849200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:27,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 05:46:29,970][42004] Updated weights for policy 0, policy_version 43806 (0.0031) +[2024-11-08 05:46:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.7, 300 sec: 6706.3). Total num frames: 179449856. Throughput: 0: 1594.3. Samples: 39854854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:32,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:46:35,544][42004] Updated weights for policy 0, policy_version 43816 (0.0033) +[2024-11-08 05:46:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179486720. Throughput: 0: 1672.2. Samples: 39865984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:37,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 05:46:41,210][42004] Updated weights for policy 0, policy_version 43826 (0.0033) +[2024-11-08 05:46:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6756.0). Total num frames: 179519488. Throughput: 0: 1663.1. Samples: 39876908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 05:46:46,931][42004] Updated weights for policy 0, policy_version 43836 (0.0039) +[2024-11-08 05:46:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6797.1, 300 sec: 6761.9). Total num frames: 179556352. Throughput: 0: 1656.6. Samples: 39882194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:47,933][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 05:46:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 179589120. Throughput: 0: 1653.4. Samples: 39892250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:46:52,935][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:46:52,990][42004] Updated weights for policy 0, policy_version 43846 (0.0039) +[2024-11-08 05:46:57,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 179609600. Throughput: 0: 1587.8. Samples: 39899256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:46:57,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 05:47:00,626][42004] Updated weights for policy 0, policy_version 43856 (0.0051) +[2024-11-08 05:47:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 179646464. Throughput: 0: 1617.1. Samples: 39904826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:02,936][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 05:47:06,798][42004] Updated weights for policy 0, policy_version 43866 (0.0027) +[2024-11-08 05:47:07,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 179683328. Throughput: 0: 1638.9. Samples: 39914988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:07,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 05:47:12,138][42004] Updated weights for policy 0, policy_version 43876 (0.0035) +[2024-11-08 05:47:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 179720192. Throughput: 0: 1712.2. Samples: 39926250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:12,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 05:47:17,529][42004] Updated weights for policy 0, policy_version 43886 (0.0023) +[2024-11-08 05:47:17,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 179757056. Throughput: 0: 1710.0. Samples: 39931806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:17,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 05:47:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 179793920. Throughput: 0: 1709.7. Samples: 39942920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:22,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:47:23,266][42004] Updated weights for policy 0, policy_version 43896 (0.0023) +[2024-11-08 05:47:27,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 179830784. Throughput: 0: 1713.2. Samples: 39954002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:27,935][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 05:47:28,724][42004] Updated weights for policy 0, policy_version 43906 (0.0028) +[2024-11-08 05:47:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 179851264. Throughput: 0: 1680.5. Samples: 39957818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:32,933][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 05:47:36,610][42004] Updated weights for policy 0, policy_version 43916 (0.0024) +[2024-11-08 05:47:37,934][41694] Fps is (10 sec: 5732.9, 60 sec: 6689.8, 300 sec: 6678.5). Total num frames: 179888128. Throughput: 0: 1649.1. Samples: 39966464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:37,945][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:47:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043918_179888128.pth... +[2024-11-08 05:47:38,196][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043527_178286592.pth +[2024-11-08 05:47:42,077][42004] Updated weights for policy 0, policy_version 43926 (0.0025) +[2024-11-08 05:47:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 179924992. Throughput: 0: 1741.8. Samples: 39977636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:42,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:47:47,295][42004] Updated weights for policy 0, policy_version 43936 (0.0030) +[2024-11-08 05:47:47,931][41694] Fps is (10 sec: 7784.5, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 179965952. Throughput: 0: 1744.9. Samples: 39983344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:47,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:47:52,565][42004] Updated weights for policy 0, policy_version 43946 (0.0036) +[2024-11-08 05:47:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 180002816. Throughput: 0: 1779.6. Samples: 39995072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:52,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:47:57,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 180039680. Throughput: 0: 1786.0. Samples: 40006620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:57,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:47:57,965][42004] Updated weights for policy 0, policy_version 43956 (0.0028) +[2024-11-08 05:48:02,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7167.9, 300 sec: 6761.9). Total num frames: 180076544. Throughput: 0: 1784.4. Samples: 40012106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:02,937][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 05:48:06,405][42004] Updated weights for policy 0, policy_version 43966 (0.0031) +[2024-11-08 05:48:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 180092928. Throughput: 0: 1671.7. Samples: 40018148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:07,934][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 05:48:12,047][42004] Updated weights for policy 0, policy_version 43976 (0.0028) +[2024-11-08 05:48:12,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 180129792. Throughput: 0: 1661.9. Samples: 40028788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:12,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 05:48:17,449][42004] Updated weights for policy 0, policy_version 43986 (0.0031) +[2024-11-08 05:48:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 180166656. Throughput: 0: 1699.5. Samples: 40034294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:17,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:48:22,792][42004] Updated weights for policy 0, policy_version 43996 (0.0038) +[2024-11-08 05:48:22,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6895.0, 300 sec: 6706.4). Total num frames: 180207616. Throughput: 0: 1767.3. Samples: 40045988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:22,932][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 05:48:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 180244480. Throughput: 0: 1777.5. Samples: 40057624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:27,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:48:28,048][42004] Updated weights for policy 0, policy_version 44006 (0.0025) +[2024-11-08 05:48:32,932][41694] Fps is (10 sec: 7781.6, 60 sec: 7236.2, 300 sec: 6775.7). Total num frames: 180285440. Throughput: 0: 1776.3. Samples: 40063278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:32,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 05:48:33,301][42004] Updated weights for policy 0, policy_version 44016 (0.0025) +[2024-11-08 05:48:37,933][41694] Fps is (10 sec: 7371.3, 60 sec: 7168.1, 300 sec: 6761.8). Total num frames: 180318208. Throughput: 0: 1756.7. Samples: 40074126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:37,936][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 05:48:41,683][42004] Updated weights for policy 0, policy_version 44026 (0.0030) +[2024-11-08 05:48:42,931][41694] Fps is (10 sec: 5325.3, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 180338688. Throughput: 0: 1643.6. Samples: 40080582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:42,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:48:47,365][42004] Updated weights for policy 0, policy_version 44036 (0.0034) +[2024-11-08 05:48:47,931][41694] Fps is (10 sec: 5735.6, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 180375552. Throughput: 0: 1638.6. Samples: 40085840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:47,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 05:48:52,827][42004] Updated weights for policy 0, policy_version 44046 (0.0035) +[2024-11-08 05:48:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 180412416. Throughput: 0: 1755.5. Samples: 40097146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:52,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 05:48:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 180449280. Throughput: 0: 1772.4. Samples: 40108544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:57,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 05:48:58,325][42004] Updated weights for policy 0, policy_version 44056 (0.0037) +[2024-11-08 05:49:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 180486144. Throughput: 0: 1769.9. Samples: 40113938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:02,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:49:04,107][42004] Updated weights for policy 0, policy_version 44066 (0.0028) +[2024-11-08 05:49:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7099.7, 300 sec: 6775.7). Total num frames: 180518912. Throughput: 0: 1742.2. Samples: 40124390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:07,936][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 05:49:10,031][42004] Updated weights for policy 0, policy_version 44076 (0.0031) +[2024-11-08 05:49:14,469][41694] Fps is (10 sec: 5325.4, 60 sec: 6789.3, 300 sec: 6699.2). Total num frames: 180547584. Throughput: 0: 1649.3. Samples: 40134378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:14,470][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:49:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 180572160. Throughput: 0: 1614.0. Samples: 40135906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:17,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 05:49:18,164][42004] Updated weights for policy 0, policy_version 44086 (0.0030) +[2024-11-08 05:49:22,932][41694] Fps is (10 sec: 7259.8, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 180609024. Throughput: 0: 1620.8. Samples: 40147060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:22,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 05:49:23,604][42004] Updated weights for policy 0, policy_version 44096 (0.0031) +[2024-11-08 05:49:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 180645888. Throughput: 0: 1721.4. Samples: 40158046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:27,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 05:49:29,157][42004] Updated weights for policy 0, policy_version 44106 (0.0040) +[2024-11-08 05:49:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6787.5). Total num frames: 180686848. Throughput: 0: 1731.4. Samples: 40163752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:32,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 05:49:34,597][42004] Updated weights for policy 0, policy_version 44116 (0.0026) +[2024-11-08 05:49:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.6, 300 sec: 6789.6). Total num frames: 180723712. Throughput: 0: 1729.1. Samples: 40174954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:37,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 05:49:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044122_180723712.pth... +[2024-11-08 05:49:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043725_179097600.pth +[2024-11-08 05:49:40,113][42004] Updated weights for policy 0, policy_version 44126 (0.0029) +[2024-11-08 05:49:42,934][41694] Fps is (10 sec: 6961.3, 60 sec: 6962.9, 300 sec: 6775.7). Total num frames: 180756480. Throughput: 0: 1719.6. Samples: 40185932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:42,940][41694] Avg episode reward: [(0, '4.694')] +[2024-11-08 05:49:46,186][42004] Updated weights for policy 0, policy_version 44136 (0.0029) +[2024-11-08 05:49:48,886][41694] Fps is (10 sec: 5235.2, 60 sec: 6652.7, 300 sec: 6712.5). Total num frames: 180781056. Throughput: 0: 1667.6. Samples: 40190570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:48,888][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 05:49:52,931][41694] Fps is (10 sec: 5736.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 180813824. Throughput: 0: 1624.9. Samples: 40197508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:52,932][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:49:53,774][42004] Updated weights for policy 0, policy_version 44146 (0.0022) +[2024-11-08 05:49:57,932][41694] Fps is (10 sec: 7697.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 180850688. Throughput: 0: 1714.4. Samples: 40208890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:57,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:49:59,459][42004] Updated weights for policy 0, policy_version 44156 (0.0028) +[2024-11-08 05:50:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 180883456. Throughput: 0: 1745.3. Samples: 40214444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:02,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 05:50:05,170][42004] Updated weights for policy 0, policy_version 44166 (0.0053) +[2024-11-08 05:50:07,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6758.3, 300 sec: 6799.1). Total num frames: 180924416. Throughput: 0: 1739.4. Samples: 40225336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:07,936][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 05:50:10,708][42004] Updated weights for policy 0, policy_version 44176 (0.0028) +[2024-11-08 05:50:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7006.2, 300 sec: 6817.4). Total num frames: 180957184. Throughput: 0: 1741.7. Samples: 40236422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:12,935][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:50:16,660][42004] Updated weights for policy 0, policy_version 44186 (0.0022) +[2024-11-08 05:50:17,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 180994048. Throughput: 0: 1712.8. Samples: 40240830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:17,935][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 05:50:23,280][41694] Fps is (10 sec: 5541.4, 60 sec: 6719.4, 300 sec: 6753.9). Total num frames: 181014528. Throughput: 0: 1681.6. Samples: 40251212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:23,283][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:50:24,830][42004] Updated weights for policy 0, policy_version 44196 (0.0031) +[2024-11-08 05:50:27,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 181047296. Throughput: 0: 1609.0. Samples: 40258334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:27,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 05:50:30,351][42004] Updated weights for policy 0, policy_version 44206 (0.0023) +[2024-11-08 05:50:32,932][41694] Fps is (10 sec: 7214.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 181084160. Throughput: 0: 1666.8. Samples: 40263988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:32,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:50:35,641][42004] Updated weights for policy 0, policy_version 44216 (0.0020) +[2024-11-08 05:50:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 181121024. Throughput: 0: 1730.2. Samples: 40275366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:37,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:50:41,789][42004] Updated weights for policy 0, policy_version 44226 (0.0031) +[2024-11-08 05:50:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.5, 300 sec: 6811.4). Total num frames: 181157888. Throughput: 0: 1700.1. Samples: 40285394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:42,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 05:50:47,161][42004] Updated weights for policy 0, policy_version 44236 (0.0026) +[2024-11-08 05:50:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7006.3, 300 sec: 6817.4). Total num frames: 181194752. Throughput: 0: 1698.3. Samples: 40290866. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:47,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:50:52,597][42004] Updated weights for policy 0, policy_version 44246 (0.0032) +[2024-11-08 05:50:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 181231616. Throughput: 0: 1709.4. Samples: 40302256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:52,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 05:50:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 181252096. Throughput: 0: 1655.9. Samples: 40310940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:57,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:51:00,365][42004] Updated weights for policy 0, policy_version 44256 (0.0027) +[2024-11-08 05:51:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 181288960. Throughput: 0: 1648.5. Samples: 40315014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:02,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 05:51:06,099][42004] Updated weights for policy 0, policy_version 44266 (0.0029) +[2024-11-08 05:51:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 181325824. Throughput: 0: 1670.1. Samples: 40325784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:07,933][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 05:51:12,536][42004] Updated weights for policy 0, policy_version 44276 (0.0035) +[2024-11-08 05:51:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181354496. Throughput: 0: 1709.4. Samples: 40335256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:12,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:51:17,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 181387264. Throughput: 0: 1680.9. Samples: 40339630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:17,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:51:18,730][42004] Updated weights for policy 0, policy_version 44286 (0.0033) +[2024-11-08 05:51:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6866.5, 300 sec: 6817.4). Total num frames: 181424128. Throughput: 0: 1666.1. Samples: 40350342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:22,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:51:24,676][42004] Updated weights for policy 0, policy_version 44296 (0.0052) +[2024-11-08 05:51:27,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 181460992. Throughput: 0: 1681.1. Samples: 40361044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:51:32,296][42004] Updated weights for policy 0, policy_version 44306 (0.0026) +[2024-11-08 05:51:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 181481472. Throughput: 0: 1681.9. Samples: 40366550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 05:51:37,897][42004] Updated weights for policy 0, policy_version 44316 (0.0030) +[2024-11-08 05:51:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181518336. Throughput: 0: 1583.7. Samples: 40373520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:37,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:51:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044316_181518336.pth... +[2024-11-08 05:51:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043918_179888128.pth +[2024-11-08 05:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181555200. Throughput: 0: 1636.8. Samples: 40384596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:42,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 05:51:43,461][42004] Updated weights for policy 0, policy_version 44326 (0.0035) +[2024-11-08 05:51:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6775.8). Total num frames: 181587968. Throughput: 0: 1662.6. Samples: 40389832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:47,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 05:51:49,655][42004] Updated weights for policy 0, policy_version 44336 (0.0026) +[2024-11-08 05:51:52,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 181624832. Throughput: 0: 1647.7. Samples: 40399932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:52,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:51:55,217][42004] Updated weights for policy 0, policy_version 44346 (0.0036) +[2024-11-08 05:51:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 181657600. Throughput: 0: 1684.8. Samples: 40411072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:51:57,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 05:52:01,049][42004] Updated weights for policy 0, policy_version 44356 (0.0025) +[2024-11-08 05:52:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 181694464. Throughput: 0: 1708.3. Samples: 40416504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:02,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:52:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 181714944. Throughput: 0: 1636.0. Samples: 40423962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:07,933][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 05:52:08,971][42004] Updated weights for policy 0, policy_version 44366 (0.0031) +[2024-11-08 05:52:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 181751808. Throughput: 0: 1616.9. Samples: 40433806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:52:14,669][42004] Updated weights for policy 0, policy_version 44376 (0.0031) +[2024-11-08 05:52:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 181784576. Throughput: 0: 1612.9. Samples: 40439130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:17,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:52:20,686][42004] Updated weights for policy 0, policy_version 44386 (0.0035) +[2024-11-08 05:52:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 181817344. Throughput: 0: 1684.2. Samples: 40449308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:22,936][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:52:26,521][42004] Updated weights for policy 0, policy_version 44396 (0.0031) +[2024-11-08 05:52:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 181854208. Throughput: 0: 1673.0. Samples: 40459880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:27,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:52:32,087][42004] Updated weights for policy 0, policy_version 44406 (0.0031) +[2024-11-08 05:52:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6789.7). Total num frames: 181891072. Throughput: 0: 1674.1. Samples: 40465168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:32,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 05:52:37,700][42004] Updated weights for policy 0, policy_version 44416 (0.0034) +[2024-11-08 05:52:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 181927936. Throughput: 0: 1699.7. Samples: 40476420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:37,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 05:52:42,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 181948416. Throughput: 0: 1604.1. Samples: 40483258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:42,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 05:52:45,552][42004] Updated weights for policy 0, policy_version 44426 (0.0027) +[2024-11-08 05:52:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 181985280. Throughput: 0: 1606.8. Samples: 40488808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:47,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:52:51,101][42004] Updated weights for policy 0, policy_version 44436 (0.0035) +[2024-11-08 05:52:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 182018048. Throughput: 0: 1686.0. Samples: 40499830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:52,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:52:57,385][42004] Updated weights for policy 0, policy_version 44446 (0.0031) +[2024-11-08 05:52:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 182054912. Throughput: 0: 1683.6. Samples: 40509568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:57,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 05:53:02,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 182087680. Throughput: 0: 1686.4. Samples: 40515016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:02,932][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:53:03,060][42004] Updated weights for policy 0, policy_version 44456 (0.0031) +[2024-11-08 05:53:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 182128640. Throughput: 0: 1703.7. Samples: 40525976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:07,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 05:53:08,510][42004] Updated weights for policy 0, policy_version 44466 (0.0030) +[2024-11-08 05:53:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 182161408. Throughput: 0: 1716.3. Samples: 40537112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:12,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:53:16,213][42004] Updated weights for policy 0, policy_version 44476 (0.0036) +[2024-11-08 05:53:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 182181888. Throughput: 0: 1659.3. Samples: 40539834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:17,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 05:53:21,874][42004] Updated weights for policy 0, policy_version 44486 (0.0042) +[2024-11-08 05:53:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6692.4). Total num frames: 182218752. Throughput: 0: 1627.5. Samples: 40549656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:22,936][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:53:27,742][42004] Updated weights for policy 0, policy_version 44496 (0.0025) +[2024-11-08 05:53:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 182255616. Throughput: 0: 1710.0. Samples: 40560208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:27,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 05:53:32,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 182292480. Throughput: 0: 1699.2. Samples: 40565274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:32,935][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 05:53:33,323][42004] Updated weights for policy 0, policy_version 44506 (0.0019) +[2024-11-08 05:53:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 182325248. Throughput: 0: 1703.8. Samples: 40576500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:37,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:53:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044514_182329344.pth... +[2024-11-08 05:53:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044122_180723712.pth +[2024-11-08 05:53:39,011][42004] Updated weights for policy 0, policy_version 44516 (0.0026) +[2024-11-08 05:53:42,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 182366208. Throughput: 0: 1729.0. Samples: 40587374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 05:53:44,627][42004] Updated weights for policy 0, policy_version 44526 (0.0030) +[2024-11-08 05:53:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 182398976. Throughput: 0: 1730.9. Samples: 40592906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:47,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 05:53:52,496][42004] Updated weights for policy 0, policy_version 44536 (0.0032) +[2024-11-08 05:53:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 182419456. Throughput: 0: 1638.4. Samples: 40599706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:53:52,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 05:53:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 182456320. Throughput: 0: 1638.6. Samples: 40610850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:53:57,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:53:57,941][42004] Updated weights for policy 0, policy_version 44546 (0.0027) +[2024-11-08 05:54:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 182493184. Throughput: 0: 1697.7. Samples: 40616230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:02,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:54:03,988][42004] Updated weights for policy 0, policy_version 44556 (0.0026) +[2024-11-08 05:54:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6755.4). Total num frames: 182530048. Throughput: 0: 1704.6. Samples: 40626364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:07,934][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 05:54:09,671][42004] Updated weights for policy 0, policy_version 44566 (0.0034) +[2024-11-08 05:54:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 182562816. Throughput: 0: 1714.7. Samples: 40637370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:12,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:54:15,109][42004] Updated weights for policy 0, policy_version 44576 (0.0025) +[2024-11-08 05:54:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 182599680. Throughput: 0: 1727.6. Samples: 40643014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:17,935][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 05:54:20,725][42004] Updated weights for policy 0, policy_version 44586 (0.0032) +[2024-11-08 05:54:25,105][41694] Fps is (10 sec: 6056.5, 60 sec: 6719.8, 300 sec: 6698.6). Total num frames: 182636544. Throughput: 0: 1646.8. Samples: 40654186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:25,106][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:54:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 182657024. Throughput: 0: 1621.6. Samples: 40660348. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:27,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:54:28,861][42004] Updated weights for policy 0, policy_version 44596 (0.0023) +[2024-11-08 05:54:32,931][41694] Fps is (10 sec: 6803.5, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 182689792. Throughput: 0: 1614.9. Samples: 40665578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:32,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:54:35,036][42004] Updated weights for policy 0, policy_version 44606 (0.0037) +[2024-11-08 05:54:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 182722560. Throughput: 0: 1689.3. Samples: 40675726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:37,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 05:54:40,995][42004] Updated weights for policy 0, policy_version 44616 (0.0028) +[2024-11-08 05:54:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6728.1). Total num frames: 182759424. Throughput: 0: 1676.6. Samples: 40686296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:42,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 05:54:46,373][42004] Updated weights for policy 0, policy_version 44626 (0.0029) +[2024-11-08 05:54:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 182796288. Throughput: 0: 1676.9. Samples: 40691690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:47,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 05:54:51,636][42004] Updated weights for policy 0, policy_version 44636 (0.0021) +[2024-11-08 05:54:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 182837248. Throughput: 0: 1714.8. Samples: 40703528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:52,938][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 05:54:57,130][42004] Updated weights for policy 0, policy_version 44646 (0.0024) +[2024-11-08 05:54:59,894][41694] Fps is (10 sec: 6163.1, 60 sec: 6676.5, 300 sec: 6689.6). Total num frames: 182870016. Throughput: 0: 1652.7. Samples: 40714986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:59,896][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 05:55:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 182890496. Throughput: 0: 1617.9. Samples: 40715818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:02,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 05:55:05,887][42004] Updated weights for policy 0, policy_version 44656 (0.0036) +[2024-11-08 05:55:07,932][41694] Fps is (10 sec: 6625.1, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 182923264. Throughput: 0: 1666.2. Samples: 40725546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:07,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 05:55:12,045][42004] Updated weights for policy 0, policy_version 44666 (0.0052) +[2024-11-08 05:55:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 182956032. Throughput: 0: 1668.8. Samples: 40735444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:12,932][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:55:17,760][42004] Updated weights for policy 0, policy_version 44676 (0.0026) +[2024-11-08 05:55:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6714.3). Total num frames: 182992896. Throughput: 0: 1665.0. Samples: 40740502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:17,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 05:55:22,881][42004] Updated weights for policy 0, policy_version 44686 (0.0030) +[2024-11-08 05:55:22,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6870.7, 300 sec: 6734.1). Total num frames: 183033856. Throughput: 0: 1693.6. Samples: 40751940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:22,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:55:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 183070720. Throughput: 0: 1726.9. Samples: 40764006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:27,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 05:55:28,212][42004] Updated weights for policy 0, policy_version 44696 (0.0037) +[2024-11-08 05:55:34,587][41694] Fps is (10 sec: 5974.3, 60 sec: 6709.8, 300 sec: 6682.7). Total num frames: 183103488. Throughput: 0: 1672.9. Samples: 40769738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:34,590][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 05:55:35,911][42004] Updated weights for policy 0, policy_version 44706 (0.0028) +[2024-11-08 05:55:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 183128064. Throughput: 0: 1622.9. Samples: 40776560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:37,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:55:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044709_183128064.pth... +[2024-11-08 05:55:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044316_181518336.pth +[2024-11-08 05:55:41,790][42004] Updated weights for policy 0, policy_version 44716 (0.0026) +[2024-11-08 05:55:42,935][41694] Fps is (10 sec: 6869.1, 60 sec: 6689.8, 300 sec: 6664.6). Total num frames: 183160832. Throughput: 0: 1671.0. Samples: 40786908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:42,937][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:55:47,589][42004] Updated weights for policy 0, policy_version 44726 (0.0033) +[2024-11-08 05:55:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 183197696. Throughput: 0: 1689.9. Samples: 40791864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:47,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:55:52,706][42004] Updated weights for policy 0, policy_version 44736 (0.0029) +[2024-11-08 05:55:52,932][41694] Fps is (10 sec: 7784.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 183238656. Throughput: 0: 1737.4. Samples: 40803730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:52,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:55:57,872][42004] Updated weights for policy 0, policy_version 44746 (0.0025) +[2024-11-08 05:55:57,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7057.5, 300 sec: 6748.0). Total num frames: 183279616. Throughput: 0: 1785.6. Samples: 40815796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:57,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 05:56:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 183316480. Throughput: 0: 1797.9. Samples: 40821406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:02,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:56:03,366][42004] Updated weights for policy 0, policy_version 44756 (0.0030) +[2024-11-08 05:56:09,415][41694] Fps is (10 sec: 5707.2, 60 sec: 6861.9, 300 sec: 6714.2). Total num frames: 183345152. Throughput: 0: 1728.5. Samples: 40832288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:09,417][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:56:12,370][42004] Updated weights for policy 0, policy_version 44766 (0.0040) +[2024-11-08 05:56:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 183361536. Throughput: 0: 1633.4. Samples: 40837510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:12,934][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 05:56:17,932][41694] Fps is (10 sec: 5290.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 183390208. Throughput: 0: 1657.7. Samples: 40841590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:17,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:56:19,433][42004] Updated weights for policy 0, policy_version 44776 (0.0029) +[2024-11-08 05:56:22,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6553.5, 300 sec: 6664.7). Total num frames: 183427072. Throughput: 0: 1654.6. Samples: 40851016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:22,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:56:24,878][42004] Updated weights for policy 0, policy_version 44786 (0.0029) +[2024-11-08 05:56:27,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 183468032. Throughput: 0: 1680.8. Samples: 40862538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:27,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 05:56:30,192][42004] Updated weights for policy 0, policy_version 44796 (0.0031) +[2024-11-08 05:56:32,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6879.9, 300 sec: 6734.1). Total num frames: 183504896. Throughput: 0: 1700.8. Samples: 40868400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:32,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 05:56:35,516][42004] Updated weights for policy 0, policy_version 44806 (0.0025) +[2024-11-08 05:56:37,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 183541760. Throughput: 0: 1690.9. Samples: 40879820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:37,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 05:56:41,078][42004] Updated weights for policy 0, policy_version 44816 (0.0030) +[2024-11-08 05:56:44,128][41694] Fps is (10 sec: 5853.5, 60 sec: 6693.6, 300 sec: 6693.1). Total num frames: 183570432. Throughput: 0: 1506.4. Samples: 40885388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:44,129][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:56:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 183595008. Throughput: 0: 1568.8. Samples: 40892002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:47,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:56:49,357][42004] Updated weights for policy 0, policy_version 44826 (0.0023) +[2024-11-08 05:56:52,931][41694] Fps is (10 sec: 6513.6, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 183627776. Throughput: 0: 1597.2. Samples: 40901792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:52,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:56:55,357][42004] Updated weights for policy 0, policy_version 44836 (0.0036) +[2024-11-08 05:56:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 183668736. Throughput: 0: 1672.4. Samples: 40912770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:57,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:57:00,584][42004] Updated weights for policy 0, policy_version 44846 (0.0024) +[2024-11-08 05:57:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 183701504. Throughput: 0: 1711.6. Samples: 40918614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:02,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:57:06,232][42004] Updated weights for policy 0, policy_version 44856 (0.0023) +[2024-11-08 05:57:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6789.7, 300 sec: 6748.0). Total num frames: 183742464. Throughput: 0: 1749.7. Samples: 40929752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:57:11,712][42004] Updated weights for policy 0, policy_version 44866 (0.0032) +[2024-11-08 05:57:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 183779328. Throughput: 0: 1744.4. Samples: 40941034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:12,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:57:18,968][41694] Fps is (10 sec: 5567.0, 60 sec: 6777.9, 300 sec: 6710.5). Total num frames: 183803904. Throughput: 0: 1695.6. Samples: 40946460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:18,969][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:57:19,810][42004] Updated weights for policy 0, policy_version 44876 (0.0019) +[2024-11-08 05:57:22,933][41694] Fps is (10 sec: 4914.5, 60 sec: 6690.0, 300 sec: 6692.4). Total num frames: 183828480. Throughput: 0: 1612.0. Samples: 40952360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:22,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:57:26,230][42004] Updated weights for policy 0, policy_version 44886 (0.0046) +[2024-11-08 05:57:27,932][41694] Fps is (10 sec: 6396.8, 60 sec: 6553.6, 300 sec: 6678.5). Total num frames: 183861248. Throughput: 0: 1750.7. Samples: 40962076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:27,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:57:31,924][42004] Updated weights for policy 0, policy_version 44896 (0.0032) +[2024-11-08 05:57:32,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 183898112. Throughput: 0: 1673.2. Samples: 40967298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:32,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 05:57:37,136][42004] Updated weights for policy 0, policy_version 44906 (0.0024) +[2024-11-08 05:57:37,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 183939072. Throughput: 0: 1716.8. Samples: 40979048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:37,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 05:57:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044907_183939072.pth... +[2024-11-08 05:57:38,099][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044514_182329344.pth +[2024-11-08 05:57:42,331][42004] Updated weights for policy 0, policy_version 44916 (0.0026) +[2024-11-08 05:57:42,932][41694] Fps is (10 sec: 8191.9, 60 sec: 6965.5, 300 sec: 6761.9). Total num frames: 183980032. Throughput: 0: 1737.9. Samples: 40990976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:42,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:57:47,594][42004] Updated weights for policy 0, policy_version 44926 (0.0034) +[2024-11-08 05:57:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 184016896. Throughput: 0: 1735.1. Samples: 40996692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:47,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:57:53,540][41694] Fps is (10 sec: 5791.4, 60 sec: 6825.7, 300 sec: 6720.2). Total num frames: 184041472. Throughput: 0: 1720.4. Samples: 41008218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:53,543][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 05:57:55,517][42004] Updated weights for policy 0, policy_version 44936 (0.0033) +[2024-11-08 05:57:57,936][41694] Fps is (10 sec: 5322.4, 60 sec: 6689.7, 300 sec: 6720.1). Total num frames: 184070144. Throughput: 0: 1628.2. Samples: 41014312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:57,938][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 05:58:01,782][42004] Updated weights for policy 0, policy_version 44946 (0.0035) +[2024-11-08 05:58:02,931][41694] Fps is (10 sec: 6542.4, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 184102912. Throughput: 0: 1653.1. Samples: 41019134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:58:02,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:58:07,254][42004] Updated weights for policy 0, policy_version 44956 (0.0023) +[2024-11-08 05:58:07,932][41694] Fps is (10 sec: 7376.0, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 184143872. Throughput: 0: 1725.5. Samples: 41030006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:07,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 05:58:12,329][42004] Updated weights for policy 0, policy_version 44966 (0.0035) +[2024-11-08 05:58:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 184184832. Throughput: 0: 1777.8. Samples: 41042074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:12,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:58:17,470][42004] Updated weights for policy 0, policy_version 44976 (0.0026) +[2024-11-08 05:58:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7085.6, 300 sec: 6789.6). Total num frames: 184221696. Throughput: 0: 1793.2. Samples: 41047994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:17,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:58:22,811][42004] Updated weights for policy 0, policy_version 44986 (0.0023) +[2024-11-08 05:58:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7236.4, 300 sec: 6803.5). Total num frames: 184262656. Throughput: 0: 1791.5. Samples: 41059664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:22,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:58:28,371][41694] Fps is (10 sec: 5885.1, 60 sec: 6980.4, 300 sec: 6738.0). Total num frames: 184283136. Throughput: 0: 1638.9. Samples: 41065446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:28,373][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 05:58:31,499][42004] Updated weights for policy 0, policy_version 44996 (0.0028) +[2024-11-08 05:58:32,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 184307712. Throughput: 0: 1657.4. Samples: 41071276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:32,933][41694] Avg episode reward: [(0, '4.747')] +[2024-11-08 05:58:37,774][42004] Updated weights for policy 0, policy_version 45006 (0.0029) +[2024-11-08 05:58:37,931][41694] Fps is (10 sec: 6426.8, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 184344576. Throughput: 0: 1620.2. Samples: 41080140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:37,932][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 05:58:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 184381440. Throughput: 0: 1719.9. Samples: 41091700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:42,934][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:58:43,043][42004] Updated weights for policy 0, policy_version 45016 (0.0026) +[2024-11-08 05:58:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 184422400. Throughput: 0: 1744.4. Samples: 41097634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:47,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:58:48,382][42004] Updated weights for policy 0, policy_version 45026 (0.0023) +[2024-11-08 05:58:52,932][41694] Fps is (10 sec: 7782.8, 60 sec: 7034.6, 300 sec: 6789.6). Total num frames: 184459264. Throughput: 0: 1765.0. Samples: 41109432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:52,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 05:58:53,518][42004] Updated weights for policy 0, policy_version 45036 (0.0030) +[2024-11-08 05:58:57,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7168.5, 300 sec: 6803.5). Total num frames: 184500224. Throughput: 0: 1760.1. Samples: 41121278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:57,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 05:58:58,788][42004] Updated weights for policy 0, policy_version 45046 (0.0026) +[2024-11-08 05:59:03,090][41694] Fps is (10 sec: 5644.7, 60 sec: 6876.7, 300 sec: 6730.5). Total num frames: 184516608. Throughput: 0: 1739.2. Samples: 41126536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:59:03,093][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:59:07,709][42004] Updated weights for policy 0, policy_version 45056 (0.0033) +[2024-11-08 05:59:07,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 184549376. Throughput: 0: 1601.4. Samples: 41131728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:07,938][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 05:59:12,932][41694] Fps is (10 sec: 6659.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 184582144. Throughput: 0: 1727.5. Samples: 41142424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:12,933][41694] Avg episode reward: [(0, '4.200')] +[2024-11-08 05:59:13,583][42004] Updated weights for policy 0, policy_version 45066 (0.0031) +[2024-11-08 05:59:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6784.1). Total num frames: 184623104. Throughput: 0: 1703.4. Samples: 41147928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:17,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 05:59:18,759][42004] Updated weights for policy 0, policy_version 45076 (0.0034) +[2024-11-08 05:59:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 184659968. Throughput: 0: 1762.8. Samples: 41159464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:22,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 05:59:24,167][42004] Updated weights for policy 0, policy_version 45086 (0.0027) +[2024-11-08 05:59:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6945.9, 300 sec: 6803.5). Total num frames: 184696832. Throughput: 0: 1760.6. Samples: 41170928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:27,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:59:29,917][42004] Updated weights for policy 0, policy_version 45096 (0.0026) +[2024-11-08 05:59:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 184733696. Throughput: 0: 1745.0. Samples: 41176158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:32,935][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 05:59:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 184750080. Throughput: 0: 1694.5. Samples: 41185684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:37,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 05:59:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045105_184750080.pth... +[2024-11-08 05:59:38,158][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044709_183128064.pth +[2024-11-08 05:59:38,331][42004] Updated weights for policy 0, policy_version 45106 (0.0029) +[2024-11-08 05:59:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 184782848. Throughput: 0: 1572.7. Samples: 41192048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:42,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:59:44,610][42004] Updated weights for policy 0, policy_version 45116 (0.0020) +[2024-11-08 05:59:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 184819712. Throughput: 0: 1575.3. Samples: 41197176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:47,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 05:59:49,871][42004] Updated weights for policy 0, policy_version 45126 (0.0022) +[2024-11-08 05:59:52,933][41694] Fps is (10 sec: 7372.4, 60 sec: 6621.8, 300 sec: 6779.2). Total num frames: 184856576. Throughput: 0: 1714.8. Samples: 41208894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:52,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:59:55,635][42004] Updated weights for policy 0, policy_version 45136 (0.0022) +[2024-11-08 05:59:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 184893440. Throughput: 0: 1714.4. Samples: 41219574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:57,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 06:00:01,047][42004] Updated weights for policy 0, policy_version 45146 (0.0031) +[2024-11-08 06:00:02,932][41694] Fps is (10 sec: 6963.9, 60 sec: 6844.8, 300 sec: 6789.6). Total num frames: 184926208. Throughput: 0: 1720.5. Samples: 41225352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:02,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 06:00:06,840][42004] Updated weights for policy 0, policy_version 45156 (0.0031) +[2024-11-08 06:00:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 184963072. Throughput: 0: 1697.9. Samples: 41235868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:07,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 06:00:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 184979456. Throughput: 0: 1586.9. Samples: 41242338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:12,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:00:16,352][42004] Updated weights for policy 0, policy_version 45166 (0.0031) +[2024-11-08 06:00:17,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6417.1, 300 sec: 6692.4). Total num frames: 185008128. Throughput: 0: 1542.4. Samples: 41245564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:17,936][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 06:00:22,308][42004] Updated weights for policy 0, policy_version 45176 (0.0029) +[2024-11-08 06:00:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6692.5). Total num frames: 185044992. Throughput: 0: 1540.3. Samples: 41254998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:00:27,475][42004] Updated weights for policy 0, policy_version 45186 (0.0027) +[2024-11-08 06:00:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6744.2). Total num frames: 185081856. Throughput: 0: 1665.5. Samples: 41266994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:27,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 06:00:32,571][42004] Updated weights for policy 0, policy_version 45196 (0.0027) +[2024-11-08 06:00:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6485.4, 300 sec: 6761.9). Total num frames: 185122816. Throughput: 0: 1684.4. Samples: 41272976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:32,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:00:37,709][42004] Updated weights for policy 0, policy_version 45206 (0.0024) +[2024-11-08 06:00:37,933][41694] Fps is (10 sec: 8190.5, 60 sec: 6894.7, 300 sec: 6789.7). Total num frames: 185163776. Throughput: 0: 1694.8. Samples: 41285162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:37,937][41694] Avg episode reward: [(0, '4.764')] +[2024-11-08 06:00:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.3, 300 sec: 6789.6). Total num frames: 185200640. Throughput: 0: 1709.8. Samples: 41296514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:42,934][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 06:00:43,287][42004] Updated weights for policy 0, policy_version 45216 (0.0029) +[2024-11-08 06:00:47,932][41694] Fps is (10 sec: 5325.7, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 185217024. Throughput: 0: 1682.1. Samples: 41301046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:47,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:00:52,140][42004] Updated weights for policy 0, policy_version 45226 (0.0032) +[2024-11-08 06:00:52,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 185249792. Throughput: 0: 1577.8. Samples: 41306868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:00:57,339][42004] Updated weights for policy 0, policy_version 45236 (0.0033) +[2024-11-08 06:00:57,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 185290752. Throughput: 0: 1694.8. Samples: 41318604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:57,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 06:01:02,679][42004] Updated weights for policy 0, policy_version 45246 (0.0028) +[2024-11-08 06:01:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6754.2). Total num frames: 185327616. Throughput: 0: 1754.8. Samples: 41324532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:02,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 06:01:07,784][42004] Updated weights for policy 0, policy_version 45256 (0.0028) +[2024-11-08 06:01:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 185368576. Throughput: 0: 1801.4. Samples: 41336062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:07,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:01:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 185401344. Throughput: 0: 1778.4. Samples: 41347024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:12,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:01:13,771][42004] Updated weights for policy 0, policy_version 45266 (0.0029) +[2024-11-08 06:01:17,932][41694] Fps is (10 sec: 6962.7, 60 sec: 7167.9, 300 sec: 6817.4). Total num frames: 185438208. Throughput: 0: 1756.3. Samples: 41352010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:17,935][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 06:01:22,340][42004] Updated weights for policy 0, policy_version 45276 (0.0032) +[2024-11-08 06:01:22,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 185450496. Throughput: 0: 1662.1. Samples: 41359954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:22,934][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 06:01:27,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 185487360. Throughput: 0: 1588.6. Samples: 41368000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:27,934][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 06:01:28,256][42004] Updated weights for policy 0, policy_version 45286 (0.0041) +[2024-11-08 06:01:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 185528320. Throughput: 0: 1615.3. Samples: 41373736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:32,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 06:01:33,381][42004] Updated weights for policy 0, policy_version 45296 (0.0030) +[2024-11-08 06:01:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.3, 300 sec: 6789.4). Total num frames: 185565184. Throughput: 0: 1755.4. Samples: 41385862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:37,934][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 06:01:37,969][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045305_185569280.pth... +[2024-11-08 06:01:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044907_183939072.pth +[2024-11-08 06:01:38,484][42004] Updated weights for policy 0, policy_version 45306 (0.0022) +[2024-11-08 06:01:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 185606144. Throughput: 0: 1756.3. Samples: 41397636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:42,933][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 06:01:43,776][42004] Updated weights for policy 0, policy_version 45316 (0.0026) +[2024-11-08 06:01:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 185643008. Throughput: 0: 1753.2. Samples: 41403426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:47,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:01:49,350][42004] Updated weights for policy 0, policy_version 45326 (0.0024) +[2024-11-08 06:01:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6803.5). Total num frames: 185675776. Throughput: 0: 1742.0. Samples: 41414452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:52,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:01:57,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 185692160. Throughput: 0: 1621.2. Samples: 41419978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:01:57,934][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 06:01:58,016][42004] Updated weights for policy 0, policy_version 45336 (0.0025) +[2024-11-08 06:02:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 185729024. Throughput: 0: 1620.8. Samples: 41424944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:02,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 06:02:03,730][42004] Updated weights for policy 0, policy_version 45346 (0.0037) +[2024-11-08 06:02:07,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 185765888. Throughput: 0: 1685.3. Samples: 41435790. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:07,933][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 06:02:09,056][42004] Updated weights for policy 0, policy_version 45356 (0.0031) +[2024-11-08 06:02:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6813.6). Total num frames: 185806848. Throughput: 0: 1770.9. Samples: 41447690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:12,934][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:02:14,391][42004] Updated weights for policy 0, policy_version 45366 (0.0027) +[2024-11-08 06:02:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.5, 300 sec: 6831.3). Total num frames: 185843712. Throughput: 0: 1774.2. Samples: 41453576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:17,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:02:19,655][42004] Updated weights for policy 0, policy_version 45376 (0.0029) +[2024-11-08 06:02:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6859.1). Total num frames: 185884672. Throughput: 0: 1760.3. Samples: 41465076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:22,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 06:02:25,299][42004] Updated weights for policy 0, policy_version 45386 (0.0034) +[2024-11-08 06:02:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 185913344. Throughput: 0: 1726.9. Samples: 41475348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:27,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 06:02:32,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 185929728. Throughput: 0: 1673.7. Samples: 41478742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:32,936][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 06:02:34,404][42004] Updated weights for policy 0, policy_version 45396 (0.0033) +[2024-11-08 06:02:37,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6690.0, 300 sec: 6734.1). Total num frames: 185966592. Throughput: 0: 1582.6. Samples: 41485670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:37,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 06:02:40,162][42004] Updated weights for policy 0, policy_version 45406 (0.0037) +[2024-11-08 06:02:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 186003456. Throughput: 0: 1707.8. Samples: 41496830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:42,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:02:45,410][42004] Updated weights for policy 0, policy_version 45416 (0.0023) +[2024-11-08 06:02:47,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6621.9, 300 sec: 6789.8). Total num frames: 186040320. Throughput: 0: 1728.4. Samples: 41502722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:47,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 06:02:50,714][42004] Updated weights for policy 0, policy_version 45426 (0.0020) +[2024-11-08 06:02:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.5). Total num frames: 186081280. Throughput: 0: 1745.0. Samples: 41514316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:52,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 06:02:56,329][42004] Updated weights for policy 0, policy_version 45436 (0.0024) +[2024-11-08 06:02:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 186114048. Throughput: 0: 1728.4. Samples: 41525468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:57,933][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 06:03:02,908][42004] Updated weights for policy 0, policy_version 45446 (0.0039) +[2024-11-08 06:03:02,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 186146816. Throughput: 0: 1696.5. Samples: 41529920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:02,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 06:03:07,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 186163200. Throughput: 0: 1578.0. Samples: 41536084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:07,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:03:11,325][42004] Updated weights for policy 0, policy_version 45456 (0.0039) +[2024-11-08 06:03:12,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 186195968. Throughput: 0: 1566.0. Samples: 41545818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:12,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:03:16,570][42004] Updated weights for policy 0, policy_version 45466 (0.0043) +[2024-11-08 06:03:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 186236928. Throughput: 0: 1616.8. Samples: 41551496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:17,938][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 06:03:21,797][42004] Updated weights for policy 0, policy_version 45476 (0.0030) +[2024-11-08 06:03:22,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6553.6, 300 sec: 6772.0). Total num frames: 186277888. Throughput: 0: 1726.9. Samples: 41563380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:22,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 06:03:27,015][42004] Updated weights for policy 0, policy_version 45486 (0.0027) +[2024-11-08 06:03:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 186314752. Throughput: 0: 1740.9. Samples: 41575170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:27,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 06:03:32,471][42004] Updated weights for policy 0, policy_version 45496 (0.0029) +[2024-11-08 06:03:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 186351616. Throughput: 0: 1729.2. Samples: 41580536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:32,934][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:03:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.3, 300 sec: 6789.6). Total num frames: 186384384. Throughput: 0: 1700.2. Samples: 41590828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:37,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:03:37,955][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045504_186384384.pth... +[2024-11-08 06:03:38,091][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045105_184750080.pth +[2024-11-08 06:03:41,656][42004] Updated weights for policy 0, policy_version 45506 (0.0030) +[2024-11-08 06:03:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 186400768. Throughput: 0: 1573.1. Samples: 41596260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:42,935][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:03:47,522][42004] Updated weights for policy 0, policy_version 45516 (0.0034) +[2024-11-08 06:03:47,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 186433536. Throughput: 0: 1582.6. Samples: 41601136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:47,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 06:03:52,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 186470400. Throughput: 0: 1696.3. Samples: 41612416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:52,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:03:53,085][42004] Updated weights for policy 0, policy_version 45526 (0.0026) +[2024-11-08 06:03:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6751.6). Total num frames: 186507264. Throughput: 0: 1709.6. Samples: 41622750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:03:58,925][42004] Updated weights for policy 0, policy_version 45536 (0.0030) +[2024-11-08 06:04:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.7, 300 sec: 6748.0). Total num frames: 186540032. Throughput: 0: 1706.3. Samples: 41628278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:02,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:04:05,170][42004] Updated weights for policy 0, policy_version 45546 (0.0033) +[2024-11-08 06:04:07,934][41694] Fps is (10 sec: 6552.7, 60 sec: 6826.5, 300 sec: 6747.9). Total num frames: 186572800. Throughput: 0: 1661.8. Samples: 41638162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:07,941][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:04:11,465][42004] Updated weights for policy 0, policy_version 45556 (0.0031) +[2024-11-08 06:04:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 186605568. Throughput: 0: 1613.1. Samples: 41647758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:12,934][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:04:17,932][41694] Fps is (10 sec: 5735.5, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186630144. Throughput: 0: 1545.6. Samples: 41650086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:17,936][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:04:18,776][42004] Updated weights for policy 0, policy_version 45566 (0.0022) +[2024-11-08 06:04:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 186667008. Throughput: 0: 1559.3. Samples: 41660994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:22,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:04:24,066][42004] Updated weights for policy 0, policy_version 45576 (0.0024) +[2024-11-08 06:04:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.7, 300 sec: 6692.5). Total num frames: 186707968. Throughput: 0: 1700.4. Samples: 41672776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:27,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:04:29,590][42004] Updated weights for policy 0, policy_version 45586 (0.0031) +[2024-11-08 06:04:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 186744832. Throughput: 0: 1720.4. Samples: 41678552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:32,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 06:04:34,945][42004] Updated weights for policy 0, policy_version 45596 (0.0030) +[2024-11-08 06:04:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 186781696. Throughput: 0: 1726.0. Samples: 41690088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:37,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 06:04:40,659][42004] Updated weights for policy 0, policy_version 45606 (0.0035) +[2024-11-08 06:04:42,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 186814464. Throughput: 0: 1716.5. Samples: 41699992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:04:49,618][41694] Fps is (10 sec: 4906.8, 60 sec: 6573.6, 300 sec: 6682.0). Total num frames: 186839040. Throughput: 0: 1635.4. Samples: 41704628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:49,622][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 06:04:49,659][42004] Updated weights for policy 0, policy_version 45616 (0.0041) +[2024-11-08 06:04:52,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186863616. Throughput: 0: 1612.4. Samples: 41710716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:52,935][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:04:55,744][42004] Updated weights for policy 0, policy_version 45626 (0.0031) +[2024-11-08 06:04:57,932][41694] Fps is (10 sec: 6897.7, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 186896384. Throughput: 0: 1628.9. Samples: 41721060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:57,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:05:01,189][42004] Updated weights for policy 0, policy_version 45636 (0.0029) +[2024-11-08 06:05:02,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186933248. Throughput: 0: 1696.9. Samples: 41726448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:02,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 06:05:06,910][42004] Updated weights for policy 0, policy_version 45646 (0.0036) +[2024-11-08 06:05:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6622.1, 300 sec: 6748.0). Total num frames: 186970112. Throughput: 0: 1692.7. Samples: 41737166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:07,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:05:12,539][42004] Updated weights for policy 0, policy_version 45656 (0.0044) +[2024-11-08 06:05:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 187006976. Throughput: 0: 1680.6. Samples: 41748402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:12,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 06:05:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 187039744. Throughput: 0: 1660.8. Samples: 41753290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:05:19,177][42004] Updated weights for policy 0, policy_version 45666 (0.0034) +[2024-11-08 06:05:23,740][41694] Fps is (10 sec: 5305.3, 60 sec: 6533.8, 300 sec: 6701.8). Total num frames: 187064320. Throughput: 0: 1580.0. Samples: 41762464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:23,743][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:05:26,747][42004] Updated weights for policy 0, policy_version 45676 (0.0021) +[2024-11-08 06:05:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 187097088. Throughput: 0: 1557.1. Samples: 41770062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:27,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 06:05:32,693][42004] Updated weights for policy 0, policy_version 45686 (0.0024) +[2024-11-08 06:05:32,939][41694] Fps is (10 sec: 7124.3, 60 sec: 6416.2, 300 sec: 6664.5). Total num frames: 187129856. Throughput: 0: 1623.2. Samples: 41774946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:32,945][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 06:05:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6664.7). Total num frames: 187166720. Throughput: 0: 1677.9. Samples: 41786220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:37,944][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 06:05:38,072][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045696_187170816.pth... +[2024-11-08 06:05:38,074][42004] Updated weights for policy 0, policy_version 45696 (0.0028) +[2024-11-08 06:05:38,208][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045305_185569280.pth +[2024-11-08 06:05:42,932][41694] Fps is (10 sec: 7378.5, 60 sec: 6485.4, 300 sec: 6734.1). Total num frames: 187203584. Throughput: 0: 1696.8. Samples: 41797418. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:42,935][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:05:43,650][42004] Updated weights for policy 0, policy_version 45706 (0.0029) +[2024-11-08 06:05:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6883.6, 300 sec: 6748.0). Total num frames: 187240448. Throughput: 0: 1700.2. Samples: 41802958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:47,933][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 06:05:49,404][42004] Updated weights for policy 0, policy_version 45716 (0.0024) +[2024-11-08 06:05:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 187273216. Throughput: 0: 1676.2. Samples: 41812594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:52,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:05:55,736][42004] Updated weights for policy 0, policy_version 45726 (0.0029) +[2024-11-08 06:05:58,195][41694] Fps is (10 sec: 5188.0, 60 sec: 6592.9, 300 sec: 6658.7). Total num frames: 187293696. Throughput: 0: 1529.5. Samples: 41817634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:58,199][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:06:02,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 187326464. Throughput: 0: 1581.4. Samples: 41824456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:02,935][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 06:06:03,956][42004] Updated weights for policy 0, policy_version 45736 (0.0034) +[2024-11-08 06:06:07,931][41694] Fps is (10 sec: 6731.3, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 187359232. Throughput: 0: 1627.7. Samples: 41834394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:07,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 06:06:10,077][42004] Updated weights for policy 0, policy_version 45746 (0.0025) +[2024-11-08 06:06:12,931][41694] Fps is (10 sec: 6554.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 187392000. Throughput: 0: 1653.9. Samples: 41844488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:12,933][41694] Avg episode reward: [(0, '4.754')] +[2024-11-08 06:06:16,096][42004] Updated weights for policy 0, policy_version 45756 (0.0028) +[2024-11-08 06:06:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6706.3). Total num frames: 187428864. Throughput: 0: 1656.2. Samples: 41849460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:17,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 06:06:21,817][42004] Updated weights for policy 0, policy_version 45766 (0.0036) +[2024-11-08 06:06:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6781.6, 300 sec: 6706.3). Total num frames: 187465728. Throughput: 0: 1651.9. Samples: 41860556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:22,933][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:06:27,931][42004] Updated weights for policy 0, policy_version 45776 (0.0038) +[2024-11-08 06:06:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 187498496. Throughput: 0: 1627.4. Samples: 41870652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:27,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 06:06:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6486.2, 300 sec: 6623.0). Total num frames: 187518976. Throughput: 0: 1624.8. Samples: 41876074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:32,934][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 06:06:35,498][42004] Updated weights for policy 0, policy_version 45786 (0.0028) +[2024-11-08 06:06:37,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 187555840. Throughput: 0: 1573.2. Samples: 41883390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:37,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 06:06:41,125][42004] Updated weights for policy 0, policy_version 45796 (0.0027) +[2024-11-08 06:06:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 187592704. Throughput: 0: 1714.5. Samples: 41894336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:06:46,785][42004] Updated weights for policy 0, policy_version 45806 (0.0026) +[2024-11-08 06:06:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 187629568. Throughput: 0: 1668.2. Samples: 41899522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:47,933][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 06:06:52,158][42004] Updated weights for policy 0, policy_version 45816 (0.0034) +[2024-11-08 06:06:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 187666432. Throughput: 0: 1701.0. Samples: 41910938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:52,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 06:06:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6788.3, 300 sec: 6678.6). Total num frames: 187699200. Throughput: 0: 1709.0. Samples: 41921392. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:57,934][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 06:06:58,202][42004] Updated weights for policy 0, policy_version 45826 (0.0033) +[2024-11-08 06:07:02,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 187731968. Throughput: 0: 1706.3. Samples: 41926246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:02,934][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 06:07:04,417][42004] Updated weights for policy 0, policy_version 45836 (0.0043) +[2024-11-08 06:07:07,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 187752448. Throughput: 0: 1640.6. Samples: 41934382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:07,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:07:12,326][42004] Updated weights for policy 0, policy_version 45846 (0.0035) +[2024-11-08 06:07:12,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 187789312. Throughput: 0: 1617.7. Samples: 41943446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:12,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:07:17,729][42004] Updated weights for policy 0, policy_version 45856 (0.0033) +[2024-11-08 06:07:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6581.4). Total num frames: 187826176. Throughput: 0: 1618.3. Samples: 41948898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:17,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:07:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 187863040. Throughput: 0: 1707.4. Samples: 41960222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:22,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:07:23,090][42004] Updated weights for policy 0, policy_version 45866 (0.0030) +[2024-11-08 06:07:27,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6690.1, 300 sec: 6678.5). Total num frames: 187899904. Throughput: 0: 1717.0. Samples: 41971602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:27,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 06:07:28,522][42004] Updated weights for policy 0, policy_version 45876 (0.0031) +[2024-11-08 06:07:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 187936768. Throughput: 0: 1722.2. Samples: 41977022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:32,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:07:34,671][42004] Updated weights for policy 0, policy_version 45886 (0.0034) +[2024-11-08 06:07:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.8, 300 sec: 6664.7). Total num frames: 187969536. Throughput: 0: 1693.7. Samples: 41987154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:37,937][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:07:37,991][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045892_187973632.pth... +[2024-11-08 06:07:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045504_186384384.pth +[2024-11-08 06:07:42,471][42004] Updated weights for policy 0, policy_version 45896 (0.0028) +[2024-11-08 06:07:42,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 187990016. Throughput: 0: 1619.1. Samples: 41994254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:42,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:07:47,931][41694] Fps is (10 sec: 5734.9, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 188026880. Throughput: 0: 1622.3. Samples: 41999248. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:47,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 06:07:48,498][42004] Updated weights for policy 0, policy_version 45906 (0.0039) +[2024-11-08 06:07:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 188063744. Throughput: 0: 1680.8. Samples: 42010018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:52,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 06:07:53,797][42004] Updated weights for policy 0, policy_version 45916 (0.0029) +[2024-11-08 06:07:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 188100608. Throughput: 0: 1723.3. Samples: 42020994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:57,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 06:07:59,525][42004] Updated weights for policy 0, policy_version 45926 (0.0026) +[2024-11-08 06:08:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 188133376. Throughput: 0: 1725.7. Samples: 42026556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:02,936][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:08:05,858][42004] Updated weights for policy 0, policy_version 45936 (0.0029) +[2024-11-08 06:08:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6895.0, 300 sec: 6678.6). Total num frames: 188166144. Throughput: 0: 1690.9. Samples: 42036314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:07,936][41694] Avg episode reward: [(0, '4.782')] +[2024-11-08 06:08:11,738][42004] Updated weights for policy 0, policy_version 45946 (0.0028) +[2024-11-08 06:08:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 188203008. Throughput: 0: 1669.1. Samples: 42046710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:12,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:08:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 188219392. Throughput: 0: 1608.5. Samples: 42049404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:17,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:08:19,715][42004] Updated weights for policy 0, policy_version 45956 (0.0022) +[2024-11-08 06:08:22,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6553.5, 300 sec: 6581.4). Total num frames: 188256256. Throughput: 0: 1595.7. Samples: 42058962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:22,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 06:08:25,405][42004] Updated weights for policy 0, policy_version 45966 (0.0032) +[2024-11-08 06:08:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 6581.4). Total num frames: 188293120. Throughput: 0: 1676.4. Samples: 42069690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:27,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 06:08:30,898][42004] Updated weights for policy 0, policy_version 45976 (0.0023) +[2024-11-08 06:08:32,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 188329984. Throughput: 0: 1689.9. Samples: 42075294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:32,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:08:36,388][42004] Updated weights for policy 0, policy_version 45986 (0.0025) +[2024-11-08 06:08:37,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 188366848. Throughput: 0: 1699.1. Samples: 42086480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:37,937][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:08:42,742][42004] Updated weights for policy 0, policy_version 45996 (0.0036) +[2024-11-08 06:08:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 188399616. Throughput: 0: 1673.6. Samples: 42096304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:42,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 06:08:47,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 188436480. Throughput: 0: 1670.7. Samples: 42101738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:08:50,362][42004] Updated weights for policy 0, policy_version 46006 (0.0032) +[2024-11-08 06:08:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.2). Total num frames: 188456960. Throughput: 0: 1618.3. Samples: 42109136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:52,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:08:55,849][42004] Updated weights for policy 0, policy_version 46016 (0.0054) +[2024-11-08 06:08:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188493824. Throughput: 0: 1633.3. Samples: 42120210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:57,936][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 06:09:01,392][42004] Updated weights for policy 0, policy_version 46026 (0.0030) +[2024-11-08 06:09:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 188530688. Throughput: 0: 1692.6. Samples: 42125572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:09:02,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:09:07,077][42004] Updated weights for policy 0, policy_version 46036 (0.0031) +[2024-11-08 06:09:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 188567552. Throughput: 0: 1720.2. Samples: 42136372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:09:07,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 06:09:12,781][42004] Updated weights for policy 0, policy_version 46046 (0.0030) +[2024-11-08 06:09:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 188604416. Throughput: 0: 1725.6. Samples: 42147340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:12,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 06:09:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 188637184. Throughput: 0: 1710.2. Samples: 42152254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:09:18,701][42004] Updated weights for policy 0, policy_version 46056 (0.0027) +[2024-11-08 06:09:24,522][41694] Fps is (10 sec: 5654.3, 60 sec: 6717.0, 300 sec: 6615.1). Total num frames: 188669952. Throughput: 0: 1645.2. Samples: 42163132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:24,526][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 06:09:26,462][42004] Updated weights for policy 0, policy_version 46066 (0.0026) +[2024-11-08 06:09:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 188694528. Throughput: 0: 1643.1. Samples: 42170242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:27,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:09:32,187][42004] Updated weights for policy 0, policy_version 46076 (0.0026) +[2024-11-08 06:09:32,932][41694] Fps is (10 sec: 7306.2, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 188731392. Throughput: 0: 1631.4. Samples: 42175150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:32,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:09:37,749][42004] Updated weights for policy 0, policy_version 46086 (0.0028) +[2024-11-08 06:09:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 188768256. Throughput: 0: 1714.3. Samples: 42186282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:37,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 06:09:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046086_188768256.pth... +[2024-11-08 06:09:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045696_187170816.pth +[2024-11-08 06:09:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6689.0). Total num frames: 188801024. Throughput: 0: 1700.1. Samples: 42196716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:42,933][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:09:43,713][42004] Updated weights for policy 0, policy_version 46096 (0.0038) +[2024-11-08 06:09:47,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 188833792. Throughput: 0: 1696.9. Samples: 42201930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:47,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:09:49,998][42004] Updated weights for policy 0, policy_version 46106 (0.0035) +[2024-11-08 06:09:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6692.5). Total num frames: 188870656. Throughput: 0: 1676.6. Samples: 42211818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:52,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 06:09:55,744][42004] Updated weights for policy 0, policy_version 46116 (0.0032) +[2024-11-08 06:09:58,922][41694] Fps is (10 sec: 5590.2, 60 sec: 6581.5, 300 sec: 6628.5). Total num frames: 188895232. Throughput: 0: 1516.8. Samples: 42217100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:58,925][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 06:10:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188923904. Throughput: 0: 1599.5. Samples: 42224232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:02,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:10:03,669][42004] Updated weights for policy 0, policy_version 46126 (0.0040) +[2024-11-08 06:10:07,932][41694] Fps is (10 sec: 7273.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188960768. Throughput: 0: 1647.9. Samples: 42234666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:07,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 06:10:09,389][42004] Updated weights for policy 0, policy_version 46136 (0.0033) +[2024-11-08 06:10:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 188993536. Throughput: 0: 1663.1. Samples: 42245080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:10:15,650][42004] Updated weights for policy 0, policy_version 46146 (0.0025) +[2024-11-08 06:10:17,933][41694] Fps is (10 sec: 6553.2, 60 sec: 6485.2, 300 sec: 6669.1). Total num frames: 189026304. Throughput: 0: 1662.3. Samples: 42249956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:17,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 06:10:22,207][42004] Updated weights for policy 0, policy_version 46156 (0.0039) +[2024-11-08 06:10:22,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6662.0, 300 sec: 6650.8). Total num frames: 189059072. Throughput: 0: 1620.9. Samples: 42259220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:22,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 06:10:27,777][42004] Updated weights for policy 0, policy_version 46166 (0.0035) +[2024-11-08 06:10:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.2, 300 sec: 6664.9). Total num frames: 189095936. Throughput: 0: 1628.5. Samples: 42269998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:27,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:10:33,356][41694] Fps is (10 sec: 5500.9, 60 sec: 6372.0, 300 sec: 6599.6). Total num frames: 189116416. Throughput: 0: 1619.7. Samples: 42275502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:33,358][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 06:10:35,581][42004] Updated weights for policy 0, policy_version 46176 (0.0020) +[2024-11-08 06:10:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 189153280. Throughput: 0: 1572.9. Samples: 42282600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 06:10:41,200][42004] Updated weights for policy 0, policy_version 46186 (0.0023) +[2024-11-08 06:10:42,932][41694] Fps is (10 sec: 7699.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 189190144. Throughput: 0: 1740.4. Samples: 42293694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:42,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:10:46,706][42004] Updated weights for policy 0, policy_version 46196 (0.0026) +[2024-11-08 06:10:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 189227008. Throughput: 0: 1659.7. Samples: 42298918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:47,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:10:52,388][42004] Updated weights for policy 0, policy_version 46206 (0.0044) +[2024-11-08 06:10:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6670.6). Total num frames: 189259776. Throughput: 0: 1675.0. Samples: 42310042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:52,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:10:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6802.5, 300 sec: 6678.6). Total num frames: 189296640. Throughput: 0: 1671.4. Samples: 42320294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:57,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:10:58,292][42004] Updated weights for policy 0, policy_version 46216 (0.0027) +[2024-11-08 06:11:02,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6758.3, 300 sec: 6678.5). Total num frames: 189329408. Throughput: 0: 1680.8. Samples: 42325592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:02,934][41694] Avg episode reward: [(0, '4.190')] +[2024-11-08 06:11:04,104][42004] Updated weights for policy 0, policy_version 46226 (0.0021) +[2024-11-08 06:11:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 189349888. Throughput: 0: 1680.3. Samples: 42334832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:07,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:11:12,738][42004] Updated weights for policy 0, policy_version 46236 (0.0027) +[2024-11-08 06:11:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.2, 300 sec: 6623.0). Total num frames: 189382656. Throughput: 0: 1601.4. Samples: 42342062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:12,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:11:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.7, 300 sec: 6623.0). Total num frames: 189419520. Throughput: 0: 1603.3. Samples: 42346970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:17,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 06:11:18,316][42004] Updated weights for policy 0, policy_version 46246 (0.0038) +[2024-11-08 06:11:22,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 189456384. Throughput: 0: 1683.1. Samples: 42358338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:22,933][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 06:11:23,849][42004] Updated weights for policy 0, policy_version 46256 (0.0032) +[2024-11-08 06:11:27,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6553.4, 300 sec: 6678.5). Total num frames: 189489152. Throughput: 0: 1674.8. Samples: 42369062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:27,936][41694] Avg episode reward: [(0, '4.192')] +[2024-11-08 06:11:29,972][42004] Updated weights for policy 0, policy_version 46266 (0.0032) +[2024-11-08 06:11:32,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6806.5, 300 sec: 6664.7). Total num frames: 189521920. Throughput: 0: 1666.0. Samples: 42373888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:32,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 06:11:35,891][42004] Updated weights for policy 0, policy_version 46276 (0.0031) +[2024-11-08 06:11:37,932][41694] Fps is (10 sec: 6964.3, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 189558784. Throughput: 0: 1652.9. Samples: 42384422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:37,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 06:11:37,941][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046279_189558784.pth... +[2024-11-08 06:11:38,083][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045892_187973632.pth +[2024-11-08 06:11:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 189579264. Throughput: 0: 1578.1. Samples: 42391310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:42,936][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 06:11:43,923][42004] Updated weights for policy 0, policy_version 46286 (0.0026) +[2024-11-08 06:11:47,934][41694] Fps is (10 sec: 5732.8, 60 sec: 6485.0, 300 sec: 6609.1). Total num frames: 189616128. Throughput: 0: 1570.0. Samples: 42396244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:47,937][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 06:11:49,681][42004] Updated weights for policy 0, policy_version 46296 (0.0026) +[2024-11-08 06:11:52,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 189652992. Throughput: 0: 1609.3. Samples: 42407252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:52,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:11:55,183][42004] Updated weights for policy 0, policy_version 46306 (0.0032) +[2024-11-08 06:11:57,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 189685760. Throughput: 0: 1690.3. Samples: 42418124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:57,934][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 06:12:00,795][42004] Updated weights for policy 0, policy_version 46316 (0.0025) +[2024-11-08 06:12:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 189722624. Throughput: 0: 1706.3. Samples: 42423752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:12:02,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 06:12:07,406][42004] Updated weights for policy 0, policy_version 46326 (0.0036) +[2024-11-08 06:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 189751296. Throughput: 0: 1656.3. Samples: 42432872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:07,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 06:12:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.5, 300 sec: 6650.8). Total num frames: 189788160. Throughput: 0: 1651.3. Samples: 42443366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:12,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:12:13,123][42004] Updated weights for policy 0, policy_version 46336 (0.0028) +[2024-11-08 06:12:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 189808640. Throughput: 0: 1642.2. Samples: 42447786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 06:12:20,917][42004] Updated weights for policy 0, policy_version 46346 (0.0026) +[2024-11-08 06:12:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 189845504. Throughput: 0: 1590.8. Samples: 42456008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:22,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:12:26,584][42004] Updated weights for policy 0, policy_version 46356 (0.0032) +[2024-11-08 06:12:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.8, 300 sec: 6595.3). Total num frames: 189882368. Throughput: 0: 1678.5. Samples: 42466842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:27,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:12:31,984][42004] Updated weights for policy 0, policy_version 46366 (0.0030) +[2024-11-08 06:12:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.2). Total num frames: 189919232. Throughput: 0: 1688.5. Samples: 42472222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:32,933][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 06:12:37,714][42004] Updated weights for policy 0, policy_version 46376 (0.0037) +[2024-11-08 06:12:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 189956096. Throughput: 0: 1695.1. Samples: 42483530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:37,935][41694] Avg episode reward: [(0, '4.793')] +[2024-11-08 06:12:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 189988864. Throughput: 0: 1672.3. Samples: 42493378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:42,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 06:12:43,771][42004] Updated weights for policy 0, policy_version 46386 (0.0045) +[2024-11-08 06:12:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6827.0, 300 sec: 6650.8). Total num frames: 190025728. Throughput: 0: 1668.1. Samples: 42498818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:47,933][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 06:12:51,728][42004] Updated weights for policy 0, policy_version 46396 (0.0032) +[2024-11-08 06:12:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 190046208. Throughput: 0: 1621.7. Samples: 42505846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:52,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 06:12:57,341][42004] Updated weights for policy 0, policy_version 46406 (0.0024) +[2024-11-08 06:12:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190083072. Throughput: 0: 1628.5. Samples: 42516648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:57,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:13:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 190115840. Throughput: 0: 1653.0. Samples: 42522172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:02,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:13:03,099][42004] Updated weights for policy 0, policy_version 46416 (0.0037) +[2024-11-08 06:13:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.2, 300 sec: 6609.1). Total num frames: 190152704. Throughput: 0: 1706.9. Samples: 42532820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:07,934][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 06:13:08,622][42004] Updated weights for policy 0, policy_version 46426 (0.0032) +[2024-11-08 06:13:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 190189568. Throughput: 0: 1704.7. Samples: 42543554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:13:14,751][42004] Updated weights for policy 0, policy_version 46436 (0.0033) +[2024-11-08 06:13:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6895.0, 300 sec: 6664.7). Total num frames: 190222336. Throughput: 0: 1692.6. Samples: 42548388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:17,933][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 06:13:20,504][42004] Updated weights for policy 0, policy_version 46446 (0.0040) +[2024-11-08 06:13:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 190259200. Throughput: 0: 1682.1. Samples: 42559224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:22,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:13:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190279680. Throughput: 0: 1614.3. Samples: 42566022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:27,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:13:28,409][42004] Updated weights for policy 0, policy_version 46456 (0.0021) +[2024-11-08 06:13:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6609.2). Total num frames: 190316544. Throughput: 0: 1614.1. Samples: 42571454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:32,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:13:33,886][42004] Updated weights for policy 0, policy_version 46466 (0.0038) +[2024-11-08 06:13:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 190353408. Throughput: 0: 1702.8. Samples: 42582470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:37,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 06:13:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046473_190353408.pth... +[2024-11-08 06:13:38,085][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046086_188768256.pth +[2024-11-08 06:13:39,981][42004] Updated weights for policy 0, policy_version 46476 (0.0038) +[2024-11-08 06:13:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190386176. Throughput: 0: 1685.3. Samples: 42592488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:42,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 06:13:46,017][42004] Updated weights for policy 0, policy_version 46486 (0.0032) +[2024-11-08 06:13:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 190414848. Throughput: 0: 1677.1. Samples: 42597642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:47,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:13:51,849][42004] Updated weights for policy 0, policy_version 46496 (0.0025) +[2024-11-08 06:13:52,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 190451712. Throughput: 0: 1664.8. Samples: 42607734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:52,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:13:57,382][42004] Updated weights for policy 0, policy_version 46506 (0.0034) +[2024-11-08 06:13:59,989][41694] Fps is (10 sec: 6114.7, 60 sec: 6534.3, 300 sec: 6590.9). Total num frames: 190488576. Throughput: 0: 1605.8. Samples: 42619120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:59,992][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:14:02,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 190509056. Throughput: 0: 1603.5. Samples: 42620546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:02,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 06:14:05,510][42004] Updated weights for policy 0, policy_version 46516 (0.0023) +[2024-11-08 06:14:07,932][41694] Fps is (10 sec: 7219.9, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 190545920. Throughput: 0: 1595.2. Samples: 42631006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:07,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:14:11,012][42004] Updated weights for policy 0, policy_version 46526 (0.0031) +[2024-11-08 06:14:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 190582784. Throughput: 0: 1691.3. Samples: 42642130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:12,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 06:14:16,734][42004] Updated weights for policy 0, policy_version 46536 (0.0025) +[2024-11-08 06:14:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6645.0). Total num frames: 190619648. Throughput: 0: 1681.8. Samples: 42647134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:17,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:14:22,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 190648320. Throughput: 0: 1662.3. Samples: 42657276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:22,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 06:14:23,038][42004] Updated weights for policy 0, policy_version 46546 (0.0040) +[2024-11-08 06:14:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 190685184. Throughput: 0: 1682.1. Samples: 42668182. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:27,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:14:28,469][42004] Updated weights for policy 0, policy_version 46556 (0.0029) +[2024-11-08 06:14:34,449][41694] Fps is (10 sec: 6046.0, 60 sec: 6525.1, 300 sec: 6575.3). Total num frames: 190717952. Throughput: 0: 1635.8. Samples: 42673736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:34,452][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 06:14:36,233][42004] Updated weights for policy 0, policy_version 46566 (0.0037) +[2024-11-08 06:14:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 190742528. Throughput: 0: 1623.8. Samples: 42680804. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:37,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 06:14:41,881][42004] Updated weights for policy 0, policy_version 46576 (0.0041) +[2024-11-08 06:14:42,933][41694] Fps is (10 sec: 7241.7, 60 sec: 6553.4, 300 sec: 6595.2). Total num frames: 190779392. Throughput: 0: 1689.2. Samples: 42691660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:42,935][41694] Avg episode reward: [(0, '4.249')] +[2024-11-08 06:14:47,843][42004] Updated weights for policy 0, policy_version 46586 (0.0037) +[2024-11-08 06:14:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6595.3). Total num frames: 190816256. Throughput: 0: 1687.2. Samples: 42696470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:47,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:14:52,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6621.8, 300 sec: 6645.3). Total num frames: 190849024. Throughput: 0: 1696.8. Samples: 42707364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:52,940][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:14:53,955][42004] Updated weights for policy 0, policy_version 46596 (0.0050) +[2024-11-08 06:14:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6786.3, 300 sec: 6636.9). Total num frames: 190881792. Throughput: 0: 1656.0. Samples: 42716652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:57,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 06:14:59,970][42004] Updated weights for policy 0, policy_version 46606 (0.0038) +[2024-11-08 06:15:02,936][41694] Fps is (10 sec: 6550.5, 60 sec: 6757.8, 300 sec: 6622.9). Total num frames: 190914560. Throughput: 0: 1666.6. Samples: 42722140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:02,940][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:15:05,883][42004] Updated weights for policy 0, policy_version 46616 (0.0056) +[2024-11-08 06:15:08,948][41694] Fps is (10 sec: 5577.3, 60 sec: 6511.6, 300 sec: 6586.4). Total num frames: 190943232. Throughput: 0: 1636.1. Samples: 42732564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:08,951][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 06:15:12,931][41694] Fps is (10 sec: 5737.2, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 190971904. Throughput: 0: 1586.7. Samples: 42739584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:12,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:15:13,611][42004] Updated weights for policy 0, policy_version 46626 (0.0022) +[2024-11-08 06:15:17,932][41694] Fps is (10 sec: 6838.9, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 191004672. Throughput: 0: 1631.6. Samples: 42744684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:17,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 06:15:19,871][42004] Updated weights for policy 0, policy_version 46636 (0.0053) +[2024-11-08 06:15:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 191041536. Throughput: 0: 1644.8. Samples: 42754820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:22,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:15:25,393][42004] Updated weights for policy 0, policy_version 46646 (0.0030) +[2024-11-08 06:15:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6646.5). Total num frames: 191074304. Throughput: 0: 1643.3. Samples: 42765606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:27,937][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:15:31,428][42004] Updated weights for policy 0, policy_version 46656 (0.0042) +[2024-11-08 06:15:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6723.6, 300 sec: 6636.9). Total num frames: 191111168. Throughput: 0: 1646.3. Samples: 42770554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:32,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 06:15:36,974][42004] Updated weights for policy 0, policy_version 46666 (0.0021) +[2024-11-08 06:15:37,935][41694] Fps is (10 sec: 7371.2, 60 sec: 6758.1, 300 sec: 6636.9). Total num frames: 191148032. Throughput: 0: 1651.7. Samples: 42781692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:37,940][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 06:15:38,005][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046668_191152128.pth... +[2024-11-08 06:15:38,153][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046279_189558784.pth +[2024-11-08 06:15:43,509][41694] Fps is (10 sec: 5808.8, 60 sec: 6491.3, 300 sec: 6582.4). Total num frames: 191172608. Throughput: 0: 1552.1. Samples: 42787394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:43,512][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:15:44,954][42004] Updated weights for policy 0, policy_version 46676 (0.0033) +[2024-11-08 06:15:47,932][41694] Fps is (10 sec: 5735.6, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 191205376. Throughput: 0: 1595.4. Samples: 42793926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:47,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:15:50,605][42004] Updated weights for policy 0, policy_version 46686 (0.0021) +[2024-11-08 06:15:52,932][41694] Fps is (10 sec: 7389.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 191242240. Throughput: 0: 1642.7. Samples: 42804816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:52,933][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 06:15:56,080][42004] Updated weights for policy 0, policy_version 46696 (0.0028) +[2024-11-08 06:15:57,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 191279104. Throughput: 0: 1700.0. Samples: 42816086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:57,934][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:16:02,185][42004] Updated weights for policy 0, policy_version 46706 (0.0034) +[2024-11-08 06:16:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6622.4, 300 sec: 6650.8). Total num frames: 191311872. Throughput: 0: 1701.7. Samples: 42821258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:16:02,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 06:16:07,829][42004] Updated weights for policy 0, policy_version 46716 (0.0027) +[2024-11-08 06:16:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6874.8, 300 sec: 6664.7). Total num frames: 191348736. Throughput: 0: 1695.1. Samples: 42831100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:07,935][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:16:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6650.8). Total num frames: 191381504. Throughput: 0: 1681.0. Samples: 42841250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:12,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:16:14,032][42004] Updated weights for policy 0, policy_version 46726 (0.0031) +[2024-11-08 06:16:18,054][41694] Fps is (10 sec: 4855.6, 60 sec: 6540.2, 300 sec: 6578.6). Total num frames: 191397888. Throughput: 0: 1680.5. Samples: 42846382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:18,056][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 06:16:22,200][42004] Updated weights for policy 0, policy_version 46736 (0.0022) +[2024-11-08 06:16:22,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 191434752. Throughput: 0: 1587.5. Samples: 42853124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:22,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:16:27,471][42004] Updated weights for policy 0, policy_version 46746 (0.0028) +[2024-11-08 06:16:27,931][41694] Fps is (10 sec: 7464.5, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 191471616. Throughput: 0: 1734.9. Samples: 42864464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:27,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:16:32,493][42004] Updated weights for policy 0, policy_version 46756 (0.0022) +[2024-11-08 06:16:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 191512576. Throughput: 0: 1701.4. Samples: 42870490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:32,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:16:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.4, 300 sec: 6678.6). Total num frames: 191549440. Throughput: 0: 1705.7. Samples: 42881574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:16:38,306][42004] Updated weights for policy 0, policy_version 46766 (0.0031) +[2024-11-08 06:16:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7030.9, 300 sec: 6692.5). Total num frames: 191590400. Throughput: 0: 1715.1. Samples: 42893266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:42,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 06:16:43,372][42004] Updated weights for policy 0, policy_version 46776 (0.0024) +[2024-11-08 06:16:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.4, 300 sec: 6692.4). Total num frames: 191627264. Throughput: 0: 1729.1. Samples: 42899070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:47,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 06:16:48,720][42004] Updated weights for policy 0, policy_version 46786 (0.0032) +[2024-11-08 06:16:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 191643648. Throughput: 0: 1725.6. Samples: 42908752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:52,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:16:57,234][42004] Updated weights for policy 0, policy_version 46796 (0.0034) +[2024-11-08 06:16:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 191680512. Throughput: 0: 1666.4. Samples: 42916240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:57,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 06:17:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 191713280. Throughput: 0: 1673.9. Samples: 42921500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:17:02,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 06:17:03,195][42004] Updated weights for policy 0, policy_version 46806 (0.0028) +[2024-11-08 06:17:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 191750144. Throughput: 0: 1757.0. Samples: 42932188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:07,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 06:17:08,942][42004] Updated weights for policy 0, policy_version 46816 (0.0029) +[2024-11-08 06:17:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 191782912. Throughput: 0: 1724.8. Samples: 42942078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:12,932][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 06:17:14,755][42004] Updated weights for policy 0, policy_version 46826 (0.0028) +[2024-11-08 06:17:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7114.3, 300 sec: 6706.3). Total num frames: 191823872. Throughput: 0: 1722.0. Samples: 42947978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:17,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 06:17:20,015][42004] Updated weights for policy 0, policy_version 46836 (0.0026) +[2024-11-08 06:17:22,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 191860736. Throughput: 0: 1733.8. Samples: 42959594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:22,934][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 06:17:27,870][42004] Updated weights for policy 0, policy_version 46846 (0.0031) +[2024-11-08 06:17:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 191881216. Throughput: 0: 1642.3. Samples: 42967170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:27,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:17:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 191918080. Throughput: 0: 1615.0. Samples: 42971744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:32,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:17:33,292][42004] Updated weights for policy 0, policy_version 46856 (0.0031) +[2024-11-08 06:17:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 191954944. Throughput: 0: 1643.9. Samples: 42982726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:37,944][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:17:37,978][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046864_191954944.pth... +[2024-11-08 06:17:38,081][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046473_190353408.pth +[2024-11-08 06:17:38,868][42004] Updated weights for policy 0, policy_version 46866 (0.0025) +[2024-11-08 06:17:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 191991808. Throughput: 0: 1737.7. Samples: 42994438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:42,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 06:17:44,004][42004] Updated weights for policy 0, policy_version 46876 (0.0032) +[2024-11-08 06:17:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 192032768. Throughput: 0: 1754.8. Samples: 43000468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:47,935][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 06:17:49,321][42004] Updated weights for policy 0, policy_version 46886 (0.0021) +[2024-11-08 06:17:52,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 6748.0). Total num frames: 192073728. Throughput: 0: 1777.8. Samples: 43012188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:52,935][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 06:17:54,531][42004] Updated weights for policy 0, policy_version 46896 (0.0019) +[2024-11-08 06:17:57,931][41694] Fps is (10 sec: 7783.0, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 192110592. Throughput: 0: 1814.8. Samples: 43023742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:57,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:18:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 192122880. Throughput: 0: 1777.9. Samples: 43027984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:02,934][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 06:18:03,168][42004] Updated weights for policy 0, policy_version 46906 (0.0025) +[2024-11-08 06:18:07,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 192155648. Throughput: 0: 1640.2. Samples: 43033402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:07,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:18:09,907][42004] Updated weights for policy 0, policy_version 46916 (0.0039) +[2024-11-08 06:18:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 192188416. Throughput: 0: 1689.7. Samples: 43043208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:12,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:18:15,332][42004] Updated weights for policy 0, policy_version 46926 (0.0029) +[2024-11-08 06:18:17,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 192225280. Throughput: 0: 1719.9. Samples: 43049138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:18:20,506][42004] Updated weights for policy 0, policy_version 46936 (0.0029) +[2024-11-08 06:18:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 192266240. Throughput: 0: 1739.5. Samples: 43061002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:22,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 06:18:25,857][42004] Updated weights for policy 0, policy_version 46946 (0.0028) +[2024-11-08 06:18:27,931][41694] Fps is (10 sec: 8192.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 192307200. Throughput: 0: 1741.0. Samples: 43072784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:27,932][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:18:31,108][42004] Updated weights for policy 0, policy_version 46956 (0.0031) +[2024-11-08 06:18:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 192344064. Throughput: 0: 1728.9. Samples: 43078268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 06:18:37,933][41694] Fps is (10 sec: 5323.8, 60 sec: 6758.2, 300 sec: 6692.4). Total num frames: 192360448. Throughput: 0: 1655.9. Samples: 43086706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:37,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 06:18:39,743][42004] Updated weights for policy 0, policy_version 46966 (0.0039) +[2024-11-08 06:18:42,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 192389120. Throughput: 0: 1564.3. Samples: 43094136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:42,938][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 06:18:45,866][42004] Updated weights for policy 0, policy_version 46976 (0.0037) +[2024-11-08 06:18:47,932][41694] Fps is (10 sec: 6554.7, 60 sec: 6553.7, 300 sec: 6692.4). Total num frames: 192425984. Throughput: 0: 1584.8. Samples: 43099298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:47,934][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:18:50,995][42004] Updated weights for policy 0, policy_version 46986 (0.0018) +[2024-11-08 06:18:52,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6553.6, 300 sec: 6753.4). Total num frames: 192466944. Throughput: 0: 1728.4. Samples: 43111182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:52,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:18:56,323][42004] Updated weights for policy 0, policy_version 46996 (0.0028) +[2024-11-08 06:18:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 192503808. Throughput: 0: 1769.7. Samples: 43122844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:57,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:19:01,901][42004] Updated weights for policy 0, policy_version 47006 (0.0034) +[2024-11-08 06:19:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 192544768. Throughput: 0: 1759.7. Samples: 43128326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:02,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:19:07,188][42004] Updated weights for policy 0, policy_version 47016 (0.0027) +[2024-11-08 06:19:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 192581632. Throughput: 0: 1745.8. Samples: 43139564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:07,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 06:19:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 192598016. Throughput: 0: 1620.1. Samples: 43145690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:12,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 06:19:16,110][42004] Updated weights for policy 0, policy_version 47026 (0.0035) +[2024-11-08 06:19:17,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 192626688. Throughput: 0: 1600.1. Samples: 43150272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:17,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 06:19:21,696][42004] Updated weights for policy 0, policy_version 47036 (0.0028) +[2024-11-08 06:19:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 192667648. Throughput: 0: 1649.6. Samples: 43160934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:22,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:19:27,051][42004] Updated weights for policy 0, policy_version 47046 (0.0036) +[2024-11-08 06:19:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.8, 300 sec: 6768.9). Total num frames: 192704512. Throughput: 0: 1744.7. Samples: 43172646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:27,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:19:32,117][42004] Updated weights for policy 0, policy_version 47056 (0.0034) +[2024-11-08 06:19:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 192745472. Throughput: 0: 1759.7. Samples: 43178484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:32,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 06:19:37,261][42004] Updated weights for policy 0, policy_version 47066 (0.0039) +[2024-11-08 06:19:37,933][41694] Fps is (10 sec: 8191.2, 60 sec: 7099.8, 300 sec: 6803.5). Total num frames: 192786432. Throughput: 0: 1769.4. Samples: 43190806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:37,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:19:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047067_192786432.pth... +[2024-11-08 06:19:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046668_191152128.pth +[2024-11-08 06:19:42,771][42004] Updated weights for policy 0, policy_version 47076 (0.0046) +[2024-11-08 06:19:42,933][41694] Fps is (10 sec: 7781.5, 60 sec: 7236.2, 300 sec: 6803.5). Total num frames: 192823296. Throughput: 0: 1758.1. Samples: 43201958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:19:47,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 192839680. Throughput: 0: 1713.1. Samples: 43205414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:47,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 06:19:51,476][42004] Updated weights for policy 0, policy_version 47086 (0.0028) +[2024-11-08 06:19:52,932][41694] Fps is (10 sec: 4915.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 192872448. Throughput: 0: 1627.1. Samples: 43212786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:52,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:19:56,968][42004] Updated weights for policy 0, policy_version 47096 (0.0026) +[2024-11-08 06:19:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6762.0). Total num frames: 192909312. Throughput: 0: 1740.0. Samples: 43223992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:57,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 06:20:02,328][42004] Updated weights for policy 0, policy_version 47106 (0.0038) +[2024-11-08 06:20:02,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6827.0). Total num frames: 192950272. Throughput: 0: 1766.5. Samples: 43229766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:02,933][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 06:20:07,408][42004] Updated weights for policy 0, policy_version 47116 (0.0032) +[2024-11-08 06:20:07,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 192991232. Throughput: 0: 1788.4. Samples: 43241414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:07,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:20:12,933][41694] Fps is (10 sec: 7371.5, 60 sec: 7099.6, 300 sec: 6845.1). Total num frames: 193024000. Throughput: 0: 1785.9. Samples: 43253016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:12,935][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 06:20:13,006][42004] Updated weights for policy 0, policy_version 47126 (0.0027) +[2024-11-08 06:20:17,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7167.9, 300 sec: 6831.3). Total num frames: 193056768. Throughput: 0: 1758.9. Samples: 43257636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:17,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 06:20:22,300][42004] Updated weights for policy 0, policy_version 47136 (0.0049) +[2024-11-08 06:20:22,932][41694] Fps is (10 sec: 4916.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 193073152. Throughput: 0: 1615.9. Samples: 43263518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:22,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 06:20:27,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 193105920. Throughput: 0: 1575.4. Samples: 43272850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:27,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:20:28,153][42004] Updated weights for policy 0, policy_version 47146 (0.0028) +[2024-11-08 06:20:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 193142784. Throughput: 0: 1623.8. Samples: 43278486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:32,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 06:20:33,627][42004] Updated weights for policy 0, policy_version 47156 (0.0029) +[2024-11-08 06:20:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6622.0, 300 sec: 6830.8). Total num frames: 193183744. Throughput: 0: 1705.7. Samples: 43289544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:20:38,921][42004] Updated weights for policy 0, policy_version 47166 (0.0016) +[2024-11-08 06:20:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 193216512. Throughput: 0: 1702.9. Samples: 43300622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:42,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:20:44,811][42004] Updated weights for policy 0, policy_version 47176 (0.0026) +[2024-11-08 06:20:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 193253376. Throughput: 0: 1699.3. Samples: 43306234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:47,933][41694] Avg episode reward: [(0, '4.749')] +[2024-11-08 06:20:50,361][42004] Updated weights for policy 0, policy_version 47186 (0.0030) +[2024-11-08 06:20:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 193286144. Throughput: 0: 1679.2. Samples: 43316980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:52,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 06:20:57,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 193302528. Throughput: 0: 1547.8. Samples: 43322664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:57,934][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:20:59,067][42004] Updated weights for policy 0, policy_version 47196 (0.0041) +[2024-11-08 06:21:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6748.0). Total num frames: 193339392. Throughput: 0: 1558.3. Samples: 43327758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:21:04,985][42004] Updated weights for policy 0, policy_version 47206 (0.0033) +[2024-11-08 06:21:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 193376256. Throughput: 0: 1667.0. Samples: 43338532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:07,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:21:10,684][42004] Updated weights for policy 0, policy_version 47216 (0.0025) +[2024-11-08 06:21:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.2, 300 sec: 6820.2). Total num frames: 193409024. Throughput: 0: 1681.4. Samples: 43348514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:12,933][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 06:21:16,605][42004] Updated weights for policy 0, policy_version 47226 (0.0023) +[2024-11-08 06:21:17,935][41694] Fps is (10 sec: 6960.8, 60 sec: 6485.0, 300 sec: 6817.3). Total num frames: 193445888. Throughput: 0: 1670.5. Samples: 43353664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:17,937][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 06:21:22,015][42004] Updated weights for policy 0, policy_version 47236 (0.0026) +[2024-11-08 06:21:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 193482752. Throughput: 0: 1681.9. Samples: 43365230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:22,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:21:27,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 193515520. Throughput: 0: 1668.5. Samples: 43375706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:27,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:21:27,947][42004] Updated weights for policy 0, policy_version 47246 (0.0040) +[2024-11-08 06:21:32,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 193536000. Throughput: 0: 1587.2. Samples: 43377660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:32,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 06:21:36,187][42004] Updated weights for policy 0, policy_version 47256 (0.0023) +[2024-11-08 06:21:37,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 193572864. Throughput: 0: 1563.6. Samples: 43387342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:37,935][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 06:21:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047259_193572864.pth... +[2024-11-08 06:21:38,103][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046864_191954944.pth +[2024-11-08 06:21:41,752][42004] Updated weights for policy 0, policy_version 47266 (0.0025) +[2024-11-08 06:21:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 193609728. Throughput: 0: 1684.1. Samples: 43398450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:42,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 06:21:47,036][42004] Updated weights for policy 0, policy_version 47276 (0.0028) +[2024-11-08 06:21:47,934][41694] Fps is (10 sec: 7371.3, 60 sec: 6553.3, 300 sec: 6789.6). Total num frames: 193646592. Throughput: 0: 1694.5. Samples: 43404014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:47,937][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:21:52,317][42004] Updated weights for policy 0, policy_version 47286 (0.0026) +[2024-11-08 06:21:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 193687552. Throughput: 0: 1719.0. Samples: 43415888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:52,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 06:21:57,815][42004] Updated weights for policy 0, policy_version 47296 (0.0031) +[2024-11-08 06:21:57,932][41694] Fps is (10 sec: 7784.0, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 193724416. Throughput: 0: 1750.5. Samples: 43427286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:57,935][41694] Avg episode reward: [(0, '4.658')] +[2024-11-08 06:22:05,216][41694] Fps is (10 sec: 5334.8, 60 sec: 6642.0, 300 sec: 6737.5). Total num frames: 193753088. Throughput: 0: 1651.8. Samples: 43431764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:05,218][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 06:22:07,129][42004] Updated weights for policy 0, policy_version 47306 (0.0037) +[2024-11-08 06:22:07,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 193769472. Throughput: 0: 1602.4. Samples: 43437340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:07,933][41694] Avg episode reward: [(0, '4.247')] +[2024-11-08 06:22:12,931][41694] Fps is (10 sec: 5839.8, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 193798144. Throughput: 0: 1570.1. Samples: 43446360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:12,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:22:13,592][42004] Updated weights for policy 0, policy_version 47316 (0.0027) +[2024-11-08 06:22:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.7, 300 sec: 6692.4). Total num frames: 193835008. Throughput: 0: 1645.2. Samples: 43451694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:17,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 06:22:19,620][42004] Updated weights for policy 0, policy_version 47326 (0.0027) +[2024-11-08 06:22:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 193867776. Throughput: 0: 1651.3. Samples: 43461650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:22,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 06:22:25,284][42004] Updated weights for policy 0, policy_version 47336 (0.0027) +[2024-11-08 06:22:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 193904640. Throughput: 0: 1656.6. Samples: 43472998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:22:30,874][42004] Updated weights for policy 0, policy_version 47346 (0.0028) +[2024-11-08 06:22:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 193937408. Throughput: 0: 1649.2. Samples: 43478226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:32,935][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 06:22:37,558][42004] Updated weights for policy 0, policy_version 47356 (0.0041) +[2024-11-08 06:22:40,153][41694] Fps is (10 sec: 5362.3, 60 sec: 6385.5, 300 sec: 6656.2). Total num frames: 193970176. Throughput: 0: 1518.5. Samples: 43487594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:40,155][41694] Avg episode reward: [(0, '4.704')] +[2024-11-08 06:22:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6348.8, 300 sec: 6636.9). Total num frames: 193990656. Throughput: 0: 1471.1. Samples: 43493486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:42,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:22:45,699][42004] Updated weights for policy 0, policy_version 47366 (0.0032) +[2024-11-08 06:22:47,931][41694] Fps is (10 sec: 6845.8, 60 sec: 6280.8, 300 sec: 6609.1). Total num frames: 194023424. Throughput: 0: 1575.1. Samples: 43499044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:47,933][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 06:22:51,433][42004] Updated weights for policy 0, policy_version 47376 (0.0030) +[2024-11-08 06:22:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6212.2, 300 sec: 6609.1). Total num frames: 194060288. Throughput: 0: 1612.0. Samples: 43509878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:52,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 06:22:56,606][42004] Updated weights for policy 0, policy_version 47386 (0.0030) +[2024-11-08 06:22:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6280.6, 300 sec: 6706.3). Total num frames: 194101248. Throughput: 0: 1672.7. Samples: 43521632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:57,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:23:02,248][42004] Updated weights for policy 0, policy_version 47396 (0.0035) +[2024-11-08 06:23:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6671.1, 300 sec: 6720.2). Total num frames: 194138112. Throughput: 0: 1677.7. Samples: 43527192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:02,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 06:23:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 194170880. Throughput: 0: 1687.2. Samples: 43537574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:07,936][41694] Avg episode reward: [(0, '4.803')] +[2024-11-08 06:23:08,216][42004] Updated weights for policy 0, policy_version 47406 (0.0029) +[2024-11-08 06:23:14,912][41694] Fps is (10 sec: 5128.3, 60 sec: 6476.3, 300 sec: 6647.8). Total num frames: 194199552. Throughput: 0: 1586.9. Samples: 43547550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:14,914][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 06:23:16,899][42004] Updated weights for policy 0, policy_version 47416 (0.0023) +[2024-11-08 06:23:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 194220032. Throughput: 0: 1561.2. Samples: 43548478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:17,933][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 06:23:22,328][42004] Updated weights for policy 0, policy_version 47426 (0.0023) +[2024-11-08 06:23:22,933][41694] Fps is (10 sec: 7659.8, 60 sec: 6553.4, 300 sec: 6623.0). Total num frames: 194260992. Throughput: 0: 1675.4. Samples: 43559268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:22,936][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:23:27,692][42004] Updated weights for policy 0, policy_version 47436 (0.0029) +[2024-11-08 06:23:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 194297856. Throughput: 0: 1716.0. Samples: 43570704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:27,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 06:23:32,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 194334720. Throughput: 0: 1721.4. Samples: 43576506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:32,940][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:23:32,996][42004] Updated weights for policy 0, policy_version 47446 (0.0035) +[2024-11-08 06:23:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7018.3, 300 sec: 6734.1). Total num frames: 194375680. Throughput: 0: 1740.9. Samples: 43588220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:37,932][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:23:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047455_194375680.pth... +[2024-11-08 06:23:38,156][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047067_192786432.pth +[2024-11-08 06:23:38,301][42004] Updated weights for policy 0, policy_version 47456 (0.0025) +[2024-11-08 06:23:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 194408448. Throughput: 0: 1708.4. Samples: 43598512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:42,935][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:23:44,511][42004] Updated weights for policy 0, policy_version 47466 (0.0037) +[2024-11-08 06:23:49,816][41694] Fps is (10 sec: 5169.8, 60 sec: 6685.0, 300 sec: 6636.2). Total num frames: 194437120. Throughput: 0: 1632.1. Samples: 43603710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:49,819][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:23:52,774][42004] Updated weights for policy 0, policy_version 47476 (0.0026) +[2024-11-08 06:23:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 194461696. Throughput: 0: 1605.3. Samples: 43609812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:52,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 06:23:57,931][41694] Fps is (10 sec: 7570.5, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 194498560. Throughput: 0: 1708.3. Samples: 43621040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:57,933][41694] Avg episode reward: [(0, '4.214')] +[2024-11-08 06:23:58,017][42004] Updated weights for policy 0, policy_version 47486 (0.0027) +[2024-11-08 06:24:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 194539520. Throughput: 0: 1743.7. Samples: 43626944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:02,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:24:03,422][42004] Updated weights for policy 0, policy_version 47496 (0.0041) +[2024-11-08 06:24:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 194576384. Throughput: 0: 1761.4. Samples: 43638530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:07,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:24:08,498][42004] Updated weights for policy 0, policy_version 47506 (0.0027) +[2024-11-08 06:24:12,933][41694] Fps is (10 sec: 7782.5, 60 sec: 7200.9, 300 sec: 6748.0). Total num frames: 194617344. Throughput: 0: 1776.7. Samples: 43650654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:12,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:24:13,948][42004] Updated weights for policy 0, policy_version 47516 (0.0040) +[2024-11-08 06:24:17,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 194646016. Throughput: 0: 1748.3. Samples: 43655180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:17,936][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 06:24:20,292][42004] Updated weights for policy 0, policy_version 47526 (0.0027) +[2024-11-08 06:24:24,505][41694] Fps is (10 sec: 4954.7, 60 sec: 6718.9, 300 sec: 6643.1). Total num frames: 194674688. Throughput: 0: 1660.6. Samples: 43665558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:24,507][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 06:24:27,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 194699264. Throughput: 0: 1617.5. Samples: 43671298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:27,935][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:24:28,677][42004] Updated weights for policy 0, policy_version 47536 (0.0032) +[2024-11-08 06:24:32,932][41694] Fps is (10 sec: 7777.5, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 194740224. Throughput: 0: 1696.3. Samples: 43676848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:32,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 06:24:33,904][42004] Updated weights for policy 0, policy_version 47546 (0.0023) +[2024-11-08 06:24:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 194777088. Throughput: 0: 1747.8. Samples: 43688464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:37,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:24:39,304][42004] Updated weights for policy 0, policy_version 47556 (0.0029) +[2024-11-08 06:24:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 194813952. Throughput: 0: 1751.7. Samples: 43699866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:42,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 06:24:44,778][42004] Updated weights for policy 0, policy_version 47566 (0.0035) +[2024-11-08 06:24:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7118.5, 300 sec: 6706.3). Total num frames: 194850816. Throughput: 0: 1747.6. Samples: 43705584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:47,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:24:50,897][42004] Updated weights for policy 0, policy_version 47576 (0.0042) +[2024-11-08 06:24:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 194883584. Throughput: 0: 1709.8. Samples: 43715470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:52,934][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 06:24:59,075][41694] Fps is (10 sec: 5145.9, 60 sec: 6699.0, 300 sec: 6611.3). Total num frames: 194908160. Throughput: 0: 1513.7. Samples: 43720500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:59,078][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 06:24:59,441][42004] Updated weights for policy 0, policy_version 47586 (0.0023) +[2024-11-08 06:25:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 194932736. Throughput: 0: 1590.5. Samples: 43726752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:02,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 06:25:05,439][42004] Updated weights for policy 0, policy_version 47596 (0.0029) +[2024-11-08 06:25:07,932][41694] Fps is (10 sec: 6937.5, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 194969600. Throughput: 0: 1645.2. Samples: 43737004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:07,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:25:10,838][42004] Updated weights for policy 0, policy_version 47606 (0.0021) +[2024-11-08 06:25:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195006464. Throughput: 0: 1711.7. Samples: 43748322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:12,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:25:16,675][42004] Updated weights for policy 0, policy_version 47616 (0.0023) +[2024-11-08 06:25:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 195043328. Throughput: 0: 1696.7. Samples: 43753200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:17,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 06:25:22,796][42004] Updated weights for policy 0, policy_version 47626 (0.0034) +[2024-11-08 06:25:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6870.3, 300 sec: 6678.6). Total num frames: 195076096. Throughput: 0: 1669.3. Samples: 43763582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:22,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:25:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 195108864. Throughput: 0: 1644.6. Samples: 43773872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:27,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 06:25:28,488][42004] Updated weights for policy 0, policy_version 47636 (0.0033) +[2024-11-08 06:25:33,711][41694] Fps is (10 sec: 5319.7, 60 sec: 6469.5, 300 sec: 6591.7). Total num frames: 195133440. Throughput: 0: 1608.3. Samples: 43779210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:33,716][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 06:25:36,941][42004] Updated weights for policy 0, policy_version 47646 (0.0029) +[2024-11-08 06:25:37,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6595.2). Total num frames: 195162112. Throughput: 0: 1550.7. Samples: 43785250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:37,937][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:25:38,091][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047648_195166208.pth... +[2024-11-08 06:25:38,189][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047259_193572864.pth +[2024-11-08 06:25:42,358][42004] Updated weights for policy 0, policy_version 47656 (0.0032) +[2024-11-08 06:25:42,931][41694] Fps is (10 sec: 7552.1, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195203072. Throughput: 0: 1735.0. Samples: 43796590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:42,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:25:47,671][42004] Updated weights for policy 0, policy_version 47666 (0.0030) +[2024-11-08 06:25:47,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 195239936. Throughput: 0: 1675.1. Samples: 43802130. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:47,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 06:25:52,838][42004] Updated weights for policy 0, policy_version 47676 (0.0031) +[2024-11-08 06:25:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 195280896. Throughput: 0: 1713.2. Samples: 43814096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:52,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:25:57,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6889.5, 300 sec: 6692.4). Total num frames: 195313664. Throughput: 0: 1703.7. Samples: 43824990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:57,935][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:25:58,797][42004] Updated weights for policy 0, policy_version 47686 (0.0021) +[2024-11-08 06:26:02,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6894.7, 300 sec: 6678.5). Total num frames: 195346432. Throughput: 0: 1703.9. Samples: 43829878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:02,936][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:26:05,074][42004] Updated weights for policy 0, policy_version 47696 (0.0033) +[2024-11-08 06:26:08,282][41694] Fps is (10 sec: 5145.5, 60 sec: 6583.4, 300 sec: 6629.0). Total num frames: 195366912. Throughput: 0: 1575.2. Samples: 43835018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:08,290][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 06:26:12,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6485.2, 300 sec: 6609.2). Total num frames: 195395584. Throughput: 0: 1593.1. Samples: 43845564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:12,935][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 06:26:13,647][42004] Updated weights for policy 0, policy_version 47706 (0.0039) +[2024-11-08 06:26:17,931][41694] Fps is (10 sec: 6791.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195432448. Throughput: 0: 1617.3. Samples: 43850728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:17,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:26:19,195][42004] Updated weights for policy 0, policy_version 47716 (0.0035) +[2024-11-08 06:26:22,931][41694] Fps is (10 sec: 7783.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 195473408. Throughput: 0: 1707.6. Samples: 43862090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:22,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 06:26:24,571][42004] Updated weights for policy 0, policy_version 47726 (0.0028) +[2024-11-08 06:26:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 195510272. Throughput: 0: 1713.8. Samples: 43873714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:27,935][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:26:30,560][42004] Updated weights for policy 0, policy_version 47736 (0.0052) +[2024-11-08 06:26:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6847.4, 300 sec: 6664.7). Total num frames: 195538944. Throughput: 0: 1692.4. Samples: 43878288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:32,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 06:26:36,221][42004] Updated weights for policy 0, policy_version 47746 (0.0023) +[2024-11-08 06:26:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 195575808. Throughput: 0: 1664.7. Samples: 43889008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:37,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:26:42,944][41694] Fps is (10 sec: 5318.1, 60 sec: 6484.0, 300 sec: 6595.0). Total num frames: 195592192. Throughput: 0: 1534.7. Samples: 43894068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:42,946][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 06:26:44,850][42004] Updated weights for policy 0, policy_version 47756 (0.0032) +[2024-11-08 06:26:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 195629056. Throughput: 0: 1558.4. Samples: 43900002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:47,933][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:26:50,569][42004] Updated weights for policy 0, policy_version 47766 (0.0037) +[2024-11-08 06:26:52,931][41694] Fps is (10 sec: 7382.2, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 195665920. Throughput: 0: 1696.6. Samples: 43910770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:52,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 06:26:56,079][42004] Updated weights for policy 0, policy_version 47776 (0.0029) +[2024-11-08 06:26:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.6, 300 sec: 6660.7). Total num frames: 195702784. Throughput: 0: 1697.9. Samples: 43921966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:57,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 06:27:02,076][42004] Updated weights for policy 0, policy_version 47786 (0.0026) +[2024-11-08 06:27:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.5, 300 sec: 6664.7). Total num frames: 195735552. Throughput: 0: 1695.8. Samples: 43927040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 06:27:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6729.4, 300 sec: 6678.6). Total num frames: 195768320. Throughput: 0: 1657.1. Samples: 43936660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:07,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:27:08,359][42004] Updated weights for policy 0, policy_version 47796 (0.0039) +[2024-11-08 06:27:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 195805184. Throughput: 0: 1643.6. Samples: 43947676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:12,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 06:27:13,808][42004] Updated weights for policy 0, policy_version 47806 (0.0022) +[2024-11-08 06:27:17,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 195829760. Throughput: 0: 1661.7. Samples: 43953064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:17,933][41694] Avg episode reward: [(0, '4.197')] +[2024-11-08 06:27:21,249][42004] Updated weights for policy 0, policy_version 47816 (0.0023) +[2024-11-08 06:27:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 195862528. Throughput: 0: 1592.3. Samples: 43960660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:22,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 06:27:26,685][42004] Updated weights for policy 0, policy_version 47826 (0.0022) +[2024-11-08 06:27:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 195903488. Throughput: 0: 1734.3. Samples: 43972092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:27,936][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 06:27:32,040][42004] Updated weights for policy 0, policy_version 47836 (0.0024) +[2024-11-08 06:27:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6729.2). Total num frames: 195940352. Throughput: 0: 1724.6. Samples: 43977608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:32,935][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 06:27:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 195973120. Throughput: 0: 1729.8. Samples: 43988610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:37,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 06:27:38,048][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047846_195977216.pth... +[2024-11-08 06:27:38,053][42004] Updated weights for policy 0, policy_version 47846 (0.0029) +[2024-11-08 06:27:38,253][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047455_194375680.pth +[2024-11-08 06:27:42,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6964.7, 300 sec: 6734.1). Total num frames: 196009984. Throughput: 0: 1701.1. Samples: 43998514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:42,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 06:27:43,936][42004] Updated weights for policy 0, policy_version 47856 (0.0023) +[2024-11-08 06:27:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 196042752. Throughput: 0: 1709.2. Samples: 44003952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:47,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 06:27:51,989][42004] Updated weights for policy 0, policy_version 47866 (0.0032) +[2024-11-08 06:27:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 196063232. Throughput: 0: 1659.2. Samples: 44011326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:52,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:27:57,388][42004] Updated weights for policy 0, policy_version 47876 (0.0023) +[2024-11-08 06:27:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 196104192. Throughput: 0: 1650.3. Samples: 44021940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:57,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 06:28:02,852][42004] Updated weights for policy 0, policy_version 47886 (0.0041) +[2024-11-08 06:28:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 196141056. Throughput: 0: 1657.1. Samples: 44027634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:02,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:28:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6751.7). Total num frames: 196177920. Throughput: 0: 1742.6. Samples: 44039076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:07,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 06:28:08,004][42004] Updated weights for policy 0, policy_version 47896 (0.0023) +[2024-11-08 06:28:12,934][41694] Fps is (10 sec: 7370.8, 60 sec: 6826.4, 300 sec: 6761.8). Total num frames: 196214784. Throughput: 0: 1727.8. Samples: 44049846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:12,936][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:28:13,968][42004] Updated weights for policy 0, policy_version 47906 (0.0043) +[2024-11-08 06:28:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 196251648. Throughput: 0: 1720.8. Samples: 44055044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:17,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:28:19,642][42004] Updated weights for policy 0, policy_version 47916 (0.0030) +[2024-11-08 06:28:22,931][41694] Fps is (10 sec: 6965.1, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 196284416. Throughput: 0: 1718.5. Samples: 44065944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:22,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:28:27,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 196300800. Throughput: 0: 1632.4. Samples: 44071974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:27,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 06:28:28,007][42004] Updated weights for policy 0, policy_version 47926 (0.0028) +[2024-11-08 06:28:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 196341760. Throughput: 0: 1637.6. Samples: 44077646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:32,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 06:28:33,171][42004] Updated weights for policy 0, policy_version 47936 (0.0027) +[2024-11-08 06:28:37,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 196382720. Throughput: 0: 1735.4. Samples: 44089420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:37,937][41694] Avg episode reward: [(0, '4.858')] +[2024-11-08 06:28:38,333][42004] Updated weights for policy 0, policy_version 47946 (0.0021) +[2024-11-08 06:28:42,933][41694] Fps is (10 sec: 7371.5, 60 sec: 6758.2, 300 sec: 6749.4). Total num frames: 196415488. Throughput: 0: 1754.7. Samples: 44100906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:42,936][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:28:44,438][42004] Updated weights for policy 0, policy_version 47956 (0.0028) +[2024-11-08 06:28:47,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 196448256. Throughput: 0: 1726.2. Samples: 44105314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:47,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:28:50,676][42004] Updated weights for policy 0, policy_version 47966 (0.0027) +[2024-11-08 06:28:52,931][41694] Fps is (10 sec: 6964.5, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 196485120. Throughput: 0: 1693.7. Samples: 44115292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:52,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 06:28:56,369][42004] Updated weights for policy 0, policy_version 47976 (0.0028) +[2024-11-08 06:28:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 196517888. Throughput: 0: 1693.5. Samples: 44126050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:57,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:29:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 196534272. Throughput: 0: 1625.0. Samples: 44128170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:02,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 06:29:05,017][42004] Updated weights for policy 0, policy_version 47986 (0.0035) +[2024-11-08 06:29:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 196571136. Throughput: 0: 1580.2. Samples: 44137052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:07,933][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 06:29:10,699][42004] Updated weights for policy 0, policy_version 47996 (0.0028) +[2024-11-08 06:29:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.9, 300 sec: 6650.8). Total num frames: 196608000. Throughput: 0: 1687.0. Samples: 44147890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:29:16,383][42004] Updated weights for policy 0, policy_version 48006 (0.0032) +[2024-11-08 06:29:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6700.4). Total num frames: 196640768. Throughput: 0: 1673.1. Samples: 44152934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:17,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 06:29:22,677][42004] Updated weights for policy 0, policy_version 48016 (0.0029) +[2024-11-08 06:29:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 196673536. Throughput: 0: 1636.4. Samples: 44163060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:22,934][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 06:29:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 196710400. Throughput: 0: 1626.2. Samples: 44174082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:27,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:29:28,095][42004] Updated weights for policy 0, policy_version 48026 (0.0031) +[2024-11-08 06:29:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 196751360. Throughput: 0: 1653.1. Samples: 44179706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:32,938][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:29:35,418][42004] Updated weights for policy 0, policy_version 48036 (0.0031) +[2024-11-08 06:29:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 196771840. Throughput: 0: 1605.5. Samples: 44187540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:37,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 06:29:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048040_196771840.pth... +[2024-11-08 06:29:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047648_195166208.pth +[2024-11-08 06:29:41,049][42004] Updated weights for policy 0, policy_version 48046 (0.0033) +[2024-11-08 06:29:42,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.8, 300 sec: 6636.9). Total num frames: 196808704. Throughput: 0: 1609.4. Samples: 44198472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:42,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:29:46,724][42004] Updated weights for policy 0, policy_version 48056 (0.0026) +[2024-11-08 06:29:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 196845568. Throughput: 0: 1675.5. Samples: 44203568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:47,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:29:52,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6690.6). Total num frames: 196874240. Throughput: 0: 1716.9. Samples: 44214314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:52,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:29:53,335][42004] Updated weights for policy 0, policy_version 48066 (0.0036) +[2024-11-08 06:29:57,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 196907008. Throughput: 0: 1671.5. Samples: 44223110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:57,935][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 06:29:59,492][42004] Updated weights for policy 0, policy_version 48076 (0.0030) +[2024-11-08 06:30:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 196939776. Throughput: 0: 1669.5. Samples: 44228060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:02,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:30:05,564][42004] Updated weights for policy 0, policy_version 48086 (0.0032) +[2024-11-08 06:30:09,459][41694] Fps is (10 sec: 5330.2, 60 sec: 6457.5, 300 sec: 6616.5). Total num frames: 196968448. Throughput: 0: 1620.6. Samples: 44238464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:09,461][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:30:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 196997120. Throughput: 0: 1577.4. Samples: 44245066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:12,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 06:30:13,513][42004] Updated weights for policy 0, policy_version 48096 (0.0035) +[2024-11-08 06:30:17,931][41694] Fps is (10 sec: 6768.1, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 197025792. Throughput: 0: 1561.2. Samples: 44249960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:17,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 06:30:20,198][42004] Updated weights for policy 0, policy_version 48106 (0.0031) +[2024-11-08 06:30:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 197058560. Throughput: 0: 1603.1. Samples: 44259678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:22,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 06:30:26,371][42004] Updated weights for policy 0, policy_version 48116 (0.0033) +[2024-11-08 06:30:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6348.8, 300 sec: 6654.5). Total num frames: 197091328. Throughput: 0: 1577.1. Samples: 44269440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:27,934][41694] Avg episode reward: [(0, '4.807')] +[2024-11-08 06:30:32,353][42004] Updated weights for policy 0, policy_version 48126 (0.0034) +[2024-11-08 06:30:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6280.6, 300 sec: 6664.7). Total num frames: 197128192. Throughput: 0: 1569.5. Samples: 44274198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:32,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:30:37,684][42004] Updated weights for policy 0, policy_version 48136 (0.0036) +[2024-11-08 06:30:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 197165056. Throughput: 0: 1584.5. Samples: 44285618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:37,939][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:30:43,910][41694] Fps is (10 sec: 5969.9, 60 sec: 6314.2, 300 sec: 6601.1). Total num frames: 197193728. Throughput: 0: 1481.5. Samples: 44291228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:43,911][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 06:30:45,481][42004] Updated weights for policy 0, policy_version 48146 (0.0040) +[2024-11-08 06:30:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6581.4). Total num frames: 197222400. Throughput: 0: 1563.3. Samples: 44298408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:47,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 06:30:51,067][42004] Updated weights for policy 0, policy_version 48156 (0.0024) +[2024-11-08 06:30:52,931][41694] Fps is (10 sec: 7264.1, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 197259264. Throughput: 0: 1627.4. Samples: 44309210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:52,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 06:30:56,691][42004] Updated weights for policy 0, policy_version 48166 (0.0037) +[2024-11-08 06:30:57,935][41694] Fps is (10 sec: 7370.5, 60 sec: 6485.0, 300 sec: 6609.1). Total num frames: 197296128. Throughput: 0: 1668.6. Samples: 44320160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:57,937][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:31:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6644.8). Total num frames: 197324800. Throughput: 0: 1670.6. Samples: 44325138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:02,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 06:31:02,990][42004] Updated weights for policy 0, policy_version 48176 (0.0040) +[2024-11-08 06:31:07,931][41694] Fps is (10 sec: 6555.7, 60 sec: 6724.8, 300 sec: 6664.7). Total num frames: 197361664. Throughput: 0: 1670.4. Samples: 44334846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:07,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:31:08,711][42004] Updated weights for policy 0, policy_version 48186 (0.0039) +[2024-11-08 06:31:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 197394432. Throughput: 0: 1680.3. Samples: 44345054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:12,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:31:15,182][42004] Updated weights for policy 0, policy_version 48196 (0.0029) +[2024-11-08 06:31:18,386][41694] Fps is (10 sec: 5093.1, 60 sec: 6436.5, 300 sec: 6571.2). Total num frames: 197414912. Throughput: 0: 1667.2. Samples: 44349978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:18,389][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 06:31:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6567.5). Total num frames: 197447680. Throughput: 0: 1586.2. Samples: 44356996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:22,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 06:31:22,976][42004] Updated weights for policy 0, policy_version 48206 (0.0040) +[2024-11-08 06:31:27,931][41694] Fps is (10 sec: 7724.1, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197488640. Throughput: 0: 1749.2. Samples: 44368230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:27,933][41694] Avg episode reward: [(0, '4.829')] +[2024-11-08 06:31:28,309][42004] Updated weights for policy 0, policy_version 48216 (0.0029) +[2024-11-08 06:31:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197525504. Throughput: 0: 1674.2. Samples: 44373748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:32,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:31:33,940][42004] Updated weights for policy 0, policy_version 48226 (0.0023) +[2024-11-08 06:31:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6665.0). Total num frames: 197558272. Throughput: 0: 1663.6. Samples: 44384072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:37,936][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:31:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048232_197558272.pth... +[2024-11-08 06:31:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047846_195977216.pth +[2024-11-08 06:31:40,093][42004] Updated weights for policy 0, policy_version 48236 (0.0039) +[2024-11-08 06:31:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6801.0, 300 sec: 6664.7). Total num frames: 197595136. Throughput: 0: 1660.6. Samples: 44394880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:42,935][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:31:45,677][42004] Updated weights for policy 0, policy_version 48246 (0.0033) +[2024-11-08 06:31:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 197632000. Throughput: 0: 1669.0. Samples: 44400244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:47,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 06:31:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 197652480. Throughput: 0: 1687.1. Samples: 44410764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:52,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:31:53,318][42004] Updated weights for policy 0, policy_version 48256 (0.0033) +[2024-11-08 06:31:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.9, 300 sec: 6623.0). Total num frames: 197689344. Throughput: 0: 1637.3. Samples: 44418734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:57,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:31:58,665][42004] Updated weights for policy 0, policy_version 48266 (0.0028) +[2024-11-08 06:32:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 197726208. Throughput: 0: 1673.7. Samples: 44424534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:32:02,935][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 06:32:04,447][42004] Updated weights for policy 0, policy_version 48276 (0.0034) +[2024-11-08 06:32:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6623.0). Total num frames: 197758976. Throughput: 0: 1731.6. Samples: 44434918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:07,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:32:10,687][42004] Updated weights for policy 0, policy_version 48286 (0.0023) +[2024-11-08 06:32:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 197791744. Throughput: 0: 1700.6. Samples: 44444758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:12,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 06:32:16,615][42004] Updated weights for policy 0, policy_version 48296 (0.0022) +[2024-11-08 06:32:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6947.6, 300 sec: 6664.7). Total num frames: 197828608. Throughput: 0: 1685.2. Samples: 44449580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:17,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 06:32:22,235][42004] Updated weights for policy 0, policy_version 48306 (0.0025) +[2024-11-08 06:32:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6650.8). Total num frames: 197865472. Throughput: 0: 1699.3. Samples: 44460542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:22,934][41694] Avg episode reward: [(0, '4.746')] +[2024-11-08 06:32:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 197885952. Throughput: 0: 1625.1. Samples: 44468010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:32:30,058][42004] Updated weights for policy 0, policy_version 48316 (0.2258) +[2024-11-08 06:32:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197922816. Throughput: 0: 1618.0. Samples: 44473052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:32,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 06:32:35,472][42004] Updated weights for policy 0, policy_version 48326 (0.0029) +[2024-11-08 06:32:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 197959680. Throughput: 0: 1633.1. Samples: 44484254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:37,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 06:32:41,167][42004] Updated weights for policy 0, policy_version 48336 (0.0032) +[2024-11-08 06:32:42,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6621.6, 300 sec: 6609.1). Total num frames: 197992448. Throughput: 0: 1690.4. Samples: 44494806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:42,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:32:47,484][42004] Updated weights for policy 0, policy_version 48346 (0.0053) +[2024-11-08 06:32:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 198025216. Throughput: 0: 1659.0. Samples: 44499188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:47,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:32:52,931][41694] Fps is (10 sec: 6964.8, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 198062080. Throughput: 0: 1675.8. Samples: 44510330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:52,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 06:32:52,951][42004] Updated weights for policy 0, policy_version 48356 (0.0023) +[2024-11-08 06:32:57,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6894.8, 300 sec: 6650.8). Total num frames: 198103040. Throughput: 0: 1711.4. Samples: 44521770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:57,935][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 06:32:58,316][42004] Updated weights for policy 0, policy_version 48366 (0.0031) +[2024-11-08 06:33:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 198119424. Throughput: 0: 1712.9. Samples: 44526660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:33:02,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 06:33:06,488][42004] Updated weights for policy 0, policy_version 48376 (0.0030) +[2024-11-08 06:33:07,944][41694] Fps is (10 sec: 5318.9, 60 sec: 6620.6, 300 sec: 6581.2). Total num frames: 198156288. Throughput: 0: 1625.4. Samples: 44533704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:07,946][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:33:12,084][42004] Updated weights for policy 0, policy_version 48386 (0.0026) +[2024-11-08 06:33:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6581.4). Total num frames: 198193152. Throughput: 0: 1700.7. Samples: 44544540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:12,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 06:33:17,932][41694] Fps is (10 sec: 6971.1, 60 sec: 6621.8, 300 sec: 6581.4). Total num frames: 198225920. Throughput: 0: 1707.2. Samples: 44549878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:17,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:33:18,003][42004] Updated weights for policy 0, policy_version 48396 (0.0024) +[2024-11-08 06:33:22,932][41694] Fps is (10 sec: 6962.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 198262784. Throughput: 0: 1688.2. Samples: 44560224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:22,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:33:23,676][42004] Updated weights for policy 0, policy_version 48406 (0.0028) +[2024-11-08 06:33:27,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6636.9). Total num frames: 198299648. Throughput: 0: 1705.5. Samples: 44571550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:27,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:33:29,033][42004] Updated weights for policy 0, policy_version 48416 (0.0039) +[2024-11-08 06:33:32,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6623.0). Total num frames: 198336512. Throughput: 0: 1727.2. Samples: 44576910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:32,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:33:37,133][42004] Updated weights for policy 0, policy_version 48426 (0.0047) +[2024-11-08 06:33:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6581.4). Total num frames: 198356992. Throughput: 0: 1642.8. Samples: 44584258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:37,934][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 06:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048427_198356992.pth... +[2024-11-08 06:33:38,074][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048040_196771840.pth +[2024-11-08 06:33:42,877][42004] Updated weights for policy 0, policy_version 48436 (0.0032) +[2024-11-08 06:33:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.3, 300 sec: 6595.2). Total num frames: 198393856. Throughput: 0: 1611.8. Samples: 44594300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:42,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:33:47,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6758.3, 300 sec: 6595.2). Total num frames: 198430720. Throughput: 0: 1625.5. Samples: 44599812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:47,937][41694] Avg episode reward: [(0, '4.767')] +[2024-11-08 06:33:48,424][42004] Updated weights for policy 0, policy_version 48446 (0.0031) +[2024-11-08 06:33:52,937][41694] Fps is (10 sec: 6959.6, 60 sec: 6689.5, 300 sec: 6595.1). Total num frames: 198463488. Throughput: 0: 1697.8. Samples: 44610092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:52,943][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:33:54,633][42004] Updated weights for policy 0, policy_version 48456 (0.0020) +[2024-11-08 06:33:57,931][41694] Fps is (10 sec: 6964.2, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 198500352. Throughput: 0: 1697.2. Samples: 44620914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:57,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:34:00,234][42004] Updated weights for policy 0, policy_version 48466 (0.0035) +[2024-11-08 06:34:02,932][41694] Fps is (10 sec: 6967.0, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 198533120. Throughput: 0: 1699.7. Samples: 44626364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:34:02,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:34:06,147][42004] Updated weights for policy 0, policy_version 48476 (0.0033) +[2024-11-08 06:34:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6828.0, 300 sec: 6636.9). Total num frames: 198565888. Throughput: 0: 1697.9. Samples: 44636630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:07,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:34:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 198586368. Throughput: 0: 1591.1. Samples: 44643148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:12,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:34:14,162][42004] Updated weights for policy 0, policy_version 48486 (0.0040) +[2024-11-08 06:34:17,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 198623232. Throughput: 0: 1594.2. Samples: 44648648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:17,939][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:34:19,792][42004] Updated weights for policy 0, policy_version 48496 (0.0036) +[2024-11-08 06:34:22,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 198660096. Throughput: 0: 1679.6. Samples: 44659840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:22,935][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:34:25,781][42004] Updated weights for policy 0, policy_version 48506 (0.0028) +[2024-11-08 06:34:27,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 198692864. Throughput: 0: 1673.8. Samples: 44669622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:27,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:34:31,699][42004] Updated weights for policy 0, policy_version 48516 (0.0027) +[2024-11-08 06:34:32,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 198729728. Throughput: 0: 1671.3. Samples: 44675016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:32,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:34:37,154][42004] Updated weights for policy 0, policy_version 48526 (0.0030) +[2024-11-08 06:34:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 198766592. Throughput: 0: 1685.8. Samples: 44685944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:37,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 06:34:42,678][42004] Updated weights for policy 0, policy_version 48536 (0.0029) +[2024-11-08 06:34:42,938][41694] Fps is (10 sec: 7367.9, 60 sec: 6826.0, 300 sec: 6636.8). Total num frames: 198803456. Throughput: 0: 1694.7. Samples: 44697188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:42,946][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:34:47,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6553.7, 300 sec: 6609.1). Total num frames: 198823936. Throughput: 0: 1615.0. Samples: 44699038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:47,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 06:34:50,524][42004] Updated weights for policy 0, policy_version 48546 (0.0032) +[2024-11-08 06:34:52,932][41694] Fps is (10 sec: 5738.2, 60 sec: 6622.5, 300 sec: 6623.0). Total num frames: 198860800. Throughput: 0: 1618.7. Samples: 44709472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:52,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 06:34:56,392][42004] Updated weights for policy 0, policy_version 48556 (0.0027) +[2024-11-08 06:34:57,933][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 198893568. Throughput: 0: 1706.9. Samples: 44719962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:57,935][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 06:35:02,852][42004] Updated weights for policy 0, policy_version 48566 (0.0035) +[2024-11-08 06:35:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6671.4). Total num frames: 198926336. Throughput: 0: 1691.1. Samples: 44724748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:02,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:35:07,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 198963200. Throughput: 0: 1671.2. Samples: 44735042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:07,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:35:08,212][42004] Updated weights for policy 0, policy_version 48576 (0.0038) +[2024-11-08 06:35:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 199000064. Throughput: 0: 1700.7. Samples: 44746156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:12,938][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:35:14,126][42004] Updated weights for policy 0, policy_version 48586 (0.0033) +[2024-11-08 06:35:19,784][41694] Fps is (10 sec: 5529.2, 60 sec: 6556.0, 300 sec: 6636.9). Total num frames: 199028736. Throughput: 0: 1616.9. Samples: 44750774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:19,790][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:35:22,339][42004] Updated weights for policy 0, policy_version 48596 (0.0027) +[2024-11-08 06:35:22,933][41694] Fps is (10 sec: 5324.0, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 199053312. Throughput: 0: 1583.7. Samples: 44757214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:22,936][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 06:35:27,926][42004] Updated weights for policy 0, policy_version 48606 (0.0031) +[2024-11-08 06:35:27,932][41694] Fps is (10 sec: 7540.7, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 199090176. Throughput: 0: 1581.3. Samples: 44768338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:27,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 06:35:32,932][41694] Fps is (10 sec: 6964.3, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199122944. Throughput: 0: 1661.4. Samples: 44773802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:32,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 06:35:33,859][42004] Updated weights for policy 0, policy_version 48616 (0.0034) +[2024-11-08 06:35:37,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6686.8). Total num frames: 199159808. Throughput: 0: 1656.4. Samples: 44784008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:37,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 06:35:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048623_199159808.pth... +[2024-11-08 06:35:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048232_197558272.pth +[2024-11-08 06:35:39,473][42004] Updated weights for policy 0, policy_version 48626 (0.0030) +[2024-11-08 06:35:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6486.0, 300 sec: 6678.6). Total num frames: 199192576. Throughput: 0: 1658.8. Samples: 44794608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:35:45,424][42004] Updated weights for policy 0, policy_version 48636 (0.0026) +[2024-11-08 06:35:47,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6758.4, 300 sec: 6678.5). Total num frames: 199229440. Throughput: 0: 1675.4. Samples: 44800142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:47,937][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 06:35:50,912][42004] Updated weights for policy 0, policy_version 48646 (0.0023) +[2024-11-08 06:35:54,274][41694] Fps is (10 sec: 6139.2, 60 sec: 6543.7, 300 sec: 6634.6). Total num frames: 199262208. Throughput: 0: 1645.1. Samples: 44811278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:54,277][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 06:35:57,931][41694] Fps is (10 sec: 6144.7, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 199290880. Throughput: 0: 1603.4. Samples: 44818310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:57,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:35:58,505][42004] Updated weights for policy 0, policy_version 48656 (0.0025) +[2024-11-08 06:36:02,932][41694] Fps is (10 sec: 7096.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 199323648. Throughput: 0: 1696.9. Samples: 44823992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:36:02,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-08 06:36:04,556][42004] Updated weights for policy 0, policy_version 48666 (0.0033) +[2024-11-08 06:36:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 199356416. Throughput: 0: 1698.5. Samples: 44833644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:36:07,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:36:11,053][42004] Updated weights for policy 0, policy_version 48676 (0.0031) +[2024-11-08 06:36:12,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6417.0, 300 sec: 6688.9). Total num frames: 199385088. Throughput: 0: 1667.4. Samples: 44843372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:12,934][41694] Avg episode reward: [(0, '4.710')] +[2024-11-08 06:36:17,006][42004] Updated weights for policy 0, policy_version 48686 (0.0035) +[2024-11-08 06:36:17,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6762.4, 300 sec: 6692.4). Total num frames: 199421952. Throughput: 0: 1651.9. Samples: 44848140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:17,936][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:36:22,548][42004] Updated weights for policy 0, policy_version 48696 (0.0024) +[2024-11-08 06:36:22,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.6, 300 sec: 6678.6). Total num frames: 199458816. Throughput: 0: 1672.9. Samples: 44859290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:22,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 06:36:28,776][41694] Fps is (10 sec: 6043.2, 60 sec: 6529.9, 300 sec: 6631.8). Total num frames: 199487488. Throughput: 0: 1535.1. Samples: 44864986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:28,778][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:36:30,326][42004] Updated weights for policy 0, policy_version 48706 (0.0033) +[2024-11-08 06:36:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199516160. Throughput: 0: 1594.5. Samples: 44871892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:32,933][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 06:36:35,807][42004] Updated weights for policy 0, policy_version 48716 (0.0024) +[2024-11-08 06:36:37,932][41694] Fps is (10 sec: 7158.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199553024. Throughput: 0: 1641.3. Samples: 44882934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:37,936][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 06:36:42,144][42004] Updated weights for policy 0, policy_version 48726 (0.0039) +[2024-11-08 06:36:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 199585792. Throughput: 0: 1654.4. Samples: 44892756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:42,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:36:47,663][42004] Updated weights for policy 0, policy_version 48736 (0.0025) +[2024-11-08 06:36:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 199622656. Throughput: 0: 1645.7. Samples: 44898048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:47,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:36:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6773.4, 300 sec: 6678.6). Total num frames: 199659520. Throughput: 0: 1680.2. Samples: 44909254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:52,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 06:36:53,181][42004] Updated weights for policy 0, policy_version 48746 (0.0029) +[2024-11-08 06:36:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 199696384. Throughput: 0: 1713.2. Samples: 44920464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:57,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:36:58,605][42004] Updated weights for policy 0, policy_version 48756 (0.0025) +[2024-11-08 06:37:03,327][41694] Fps is (10 sec: 5516.3, 60 sec: 6510.7, 300 sec: 6628.0). Total num frames: 199716864. Throughput: 0: 1713.2. Samples: 44925910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:37:03,338][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 06:37:06,886][42004] Updated weights for policy 0, policy_version 48766 (0.0025) +[2024-11-08 06:37:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199749632. Throughput: 0: 1622.1. Samples: 44932284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:37:07,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:37:12,651][42004] Updated weights for policy 0, policy_version 48776 (0.0034) +[2024-11-08 06:37:12,931][41694] Fps is (10 sec: 7249.9, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 199786496. Throughput: 0: 1769.4. Samples: 44943112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:12,934][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:37:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 199823360. Throughput: 0: 1692.4. Samples: 44948050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:17,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:37:18,365][42004] Updated weights for policy 0, policy_version 48786 (0.0030) +[2024-11-08 06:37:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 199860224. Throughput: 0: 1697.9. Samples: 44959340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:22,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:37:23,827][42004] Updated weights for policy 0, policy_version 48796 (0.0029) +[2024-11-08 06:37:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6924.2, 300 sec: 6692.4). Total num frames: 199897088. Throughput: 0: 1723.6. Samples: 44970316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:27,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 06:37:29,565][42004] Updated weights for policy 0, policy_version 48806 (0.0026) +[2024-11-08 06:37:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 199933952. Throughput: 0: 1725.7. Samples: 44975704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:32,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 06:37:35,175][42004] Updated weights for policy 0, policy_version 48816 (0.0041) +[2024-11-08 06:37:37,944][41694] Fps is (10 sec: 5318.0, 60 sec: 6620.5, 300 sec: 6636.7). Total num frames: 199950336. Throughput: 0: 1596.5. Samples: 44981116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:37,948][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 06:37:37,959][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048816_199950336.pth... +[2024-11-08 06:37:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048427_198356992.pth +[2024-11-08 06:37:42,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 199987200. Throughput: 0: 1614.5. Samples: 44993118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:42,934][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:37:43,251][42004] Updated weights for policy 0, policy_version 48826 (0.0038) +[2024-11-08 06:37:47,932][41694] Fps is (10 sec: 6972.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200019968. Throughput: 0: 1626.2. Samples: 44998444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:47,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 06:37:49,156][42004] Updated weights for policy 0, policy_version 48836 (0.0045) +[2024-11-08 06:37:52,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 200056832. Throughput: 0: 1695.2. Samples: 45008570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:52,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:37:55,010][42004] Updated weights for policy 0, policy_version 48846 (0.0038) +[2024-11-08 06:37:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 200093696. Throughput: 0: 1699.4. Samples: 45019586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:57,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 06:38:00,600][42004] Updated weights for policy 0, policy_version 48856 (0.0028) +[2024-11-08 06:38:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6940.7, 300 sec: 6692.7). Total num frames: 200130560. Throughput: 0: 1711.6. Samples: 45025070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:02,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 06:38:06,289][42004] Updated weights for policy 0, policy_version 48866 (0.0035) +[2024-11-08 06:38:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 200163328. Throughput: 0: 1701.6. Samples: 45035912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:07,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:38:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200183808. Throughput: 0: 1624.4. Samples: 45043412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:12,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 06:38:14,480][42004] Updated weights for policy 0, policy_version 48876 (0.0028) +[2024-11-08 06:38:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200220672. Throughput: 0: 1596.9. Samples: 45047566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:17,935][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 06:38:20,476][42004] Updated weights for policy 0, policy_version 48886 (0.0028) +[2024-11-08 06:38:22,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 200249344. Throughput: 0: 1700.7. Samples: 45057626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:22,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:38:26,389][42004] Updated weights for policy 0, policy_version 48896 (0.0025) +[2024-11-08 06:38:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 200286208. Throughput: 0: 1670.9. Samples: 45068308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:27,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:38:31,849][42004] Updated weights for policy 0, policy_version 48906 (0.0032) +[2024-11-08 06:38:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 200327168. Throughput: 0: 1666.8. Samples: 45073450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:32,935][41694] Avg episode reward: [(0, '4.658')] +[2024-11-08 06:38:37,159][42004] Updated weights for policy 0, policy_version 48916 (0.0026) +[2024-11-08 06:38:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6896.3, 300 sec: 6678.6). Total num frames: 200364032. Throughput: 0: 1704.9. Samples: 45085292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:37,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:38:42,887][42004] Updated weights for policy 0, policy_version 48926 (0.0037) +[2024-11-08 06:38:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6678.6). Total num frames: 200400896. Throughput: 0: 1697.9. Samples: 45095990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:42,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:38:47,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6637.0). Total num frames: 200421376. Throughput: 0: 1697.5. Samples: 45101456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:47,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 06:38:50,788][42004] Updated weights for policy 0, policy_version 48936 (0.0022) +[2024-11-08 06:38:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 200454144. Throughput: 0: 1610.6. Samples: 45108388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:52,936][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 06:38:57,049][42004] Updated weights for policy 0, policy_version 48946 (0.0024) +[2024-11-08 06:38:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 200486912. Throughput: 0: 1662.6. Samples: 45118228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:57,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 06:39:02,661][42004] Updated weights for policy 0, policy_version 48956 (0.0024) +[2024-11-08 06:39:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 200523776. Throughput: 0: 1692.1. Samples: 45123712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:02,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:39:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 200560640. Throughput: 0: 1716.6. Samples: 45134874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:07,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 06:39:07,948][42004] Updated weights for policy 0, policy_version 48966 (0.0020) +[2024-11-08 06:39:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 200601600. Throughput: 0: 1742.3. Samples: 45146710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:12,935][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 06:39:13,239][42004] Updated weights for policy 0, policy_version 48976 (0.0021) +[2024-11-08 06:39:17,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6706.4). Total num frames: 200638464. Throughput: 0: 1752.9. Samples: 45152330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:17,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 06:39:18,552][42004] Updated weights for policy 0, policy_version 48986 (0.0029) +[2024-11-08 06:39:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 200654848. Throughput: 0: 1680.5. Samples: 45160914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:22,933][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:39:27,357][42004] Updated weights for policy 0, policy_version 48996 (0.0030) +[2024-11-08 06:39:27,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6690.0, 300 sec: 6636.9). Total num frames: 200687616. Throughput: 0: 1621.7. Samples: 45168970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:27,935][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 06:39:32,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 200724480. Throughput: 0: 1600.8. Samples: 45173494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:32,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 06:39:33,486][42004] Updated weights for policy 0, policy_version 49006 (0.0043) +[2024-11-08 06:39:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6621.9, 300 sec: 6637.1). Total num frames: 200761344. Throughput: 0: 1691.9. Samples: 45184524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:37,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 06:39:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049014_200761344.pth... +[2024-11-08 06:39:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048623_199159808.pth +[2024-11-08 06:39:38,794][42004] Updated weights for policy 0, policy_version 49016 (0.0030) +[2024-11-08 06:39:42,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 200798208. Throughput: 0: 1730.1. Samples: 45196082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:42,935][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 06:39:44,282][42004] Updated weights for policy 0, policy_version 49026 (0.0039) +[2024-11-08 06:39:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 200839168. Throughput: 0: 1733.4. Samples: 45201714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:47,932][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 06:39:49,593][42004] Updated weights for policy 0, policy_version 49036 (0.0032) +[2024-11-08 06:39:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.4). Total num frames: 200871936. Throughput: 0: 1732.5. Samples: 45212834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:52,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 06:39:57,879][42004] Updated weights for policy 0, policy_version 49046 (0.0045) +[2024-11-08 06:39:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 200892416. Throughput: 0: 1608.1. Samples: 45219074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:57,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:40:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200921088. Throughput: 0: 1598.5. Samples: 45224264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:02,935][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:40:04,596][42004] Updated weights for policy 0, policy_version 49056 (0.0043) +[2024-11-08 06:40:07,936][41694] Fps is (10 sec: 6141.5, 60 sec: 6553.2, 300 sec: 6622.9). Total num frames: 200953856. Throughput: 0: 1610.8. Samples: 45233408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:07,938][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 06:40:10,273][42004] Updated weights for policy 0, policy_version 49066 (0.0026) +[2024-11-08 06:40:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6692.8). Total num frames: 200990720. Throughput: 0: 1674.5. Samples: 45244320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:12,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 06:40:16,160][42004] Updated weights for policy 0, policy_version 49076 (0.0026) +[2024-11-08 06:40:17,931][41694] Fps is (10 sec: 6966.0, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 201023488. Throughput: 0: 1685.8. Samples: 45249356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:17,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:40:21,757][42004] Updated weights for policy 0, policy_version 49086 (0.0035) +[2024-11-08 06:40:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 201060352. Throughput: 0: 1683.5. Samples: 45260282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:22,936][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:40:27,462][42004] Updated weights for policy 0, policy_version 49096 (0.0023) +[2024-11-08 06:40:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6692.4). Total num frames: 201097216. Throughput: 0: 1665.5. Samples: 45271030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:27,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 06:40:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 201117696. Throughput: 0: 1621.6. Samples: 45274686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:32,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 06:40:36,223][42004] Updated weights for policy 0, policy_version 49106 (0.0038) +[2024-11-08 06:40:37,932][41694] Fps is (10 sec: 4914.9, 60 sec: 6417.0, 300 sec: 6623.0). Total num frames: 201146368. Throughput: 0: 1533.5. Samples: 45281844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:37,943][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 06:40:42,095][42004] Updated weights for policy 0, policy_version 49116 (0.0051) +[2024-11-08 06:40:42,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6623.1). Total num frames: 201183232. Throughput: 0: 1626.1. Samples: 45292248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:42,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:40:47,623][42004] Updated weights for policy 0, policy_version 49126 (0.0019) +[2024-11-08 06:40:47,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6348.8, 300 sec: 6667.2). Total num frames: 201220096. Throughput: 0: 1628.5. Samples: 45297546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:47,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 06:40:52,778][42004] Updated weights for policy 0, policy_version 49136 (0.0025) +[2024-11-08 06:40:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 201261056. Throughput: 0: 1685.4. Samples: 45309246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 06:40:57,876][42004] Updated weights for policy 0, policy_version 49146 (0.0027) +[2024-11-08 06:40:57,931][41694] Fps is (10 sec: 8191.9, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 201302016. Throughput: 0: 1708.3. Samples: 45321192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:57,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:41:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 201334784. Throughput: 0: 1718.5. Samples: 45326690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:02,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 06:41:03,601][42004] Updated weights for policy 0, policy_version 49156 (0.0029) +[2024-11-08 06:41:07,933][41694] Fps is (10 sec: 4914.4, 60 sec: 6622.1, 300 sec: 6664.7). Total num frames: 201351168. Throughput: 0: 1634.0. Samples: 45333816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:07,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:41:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 201379840. Throughput: 0: 1581.6. Samples: 45342200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:12,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 06:41:13,067][42004] Updated weights for policy 0, policy_version 49166 (0.0055) +[2024-11-08 06:41:17,932][41694] Fps is (10 sec: 6554.2, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 201416704. Throughput: 0: 1600.1. Samples: 45346690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:17,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:41:18,726][42004] Updated weights for policy 0, policy_version 49176 (0.0027) +[2024-11-08 06:41:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6697.7). Total num frames: 201457664. Throughput: 0: 1694.4. Samples: 45358092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:22,933][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 06:41:23,961][42004] Updated weights for policy 0, policy_version 49186 (0.0022) +[2024-11-08 06:41:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 201490432. Throughput: 0: 1710.5. Samples: 45369220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:27,935][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:41:29,730][42004] Updated weights for policy 0, policy_version 49196 (0.0028) +[2024-11-08 06:41:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 201531392. Throughput: 0: 1717.8. Samples: 45374846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:32,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:41:35,137][42004] Updated weights for policy 0, policy_version 49206 (0.0027) +[2024-11-08 06:41:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.3, 300 sec: 6706.3). Total num frames: 201564160. Throughput: 0: 1707.3. Samples: 45386074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:37,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 06:41:38,031][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049211_201568256.pth... +[2024-11-08 06:41:38,148][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048816_199950336.pth +[2024-11-08 06:41:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 201580544. Throughput: 0: 1568.2. Samples: 45391762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:42,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:41:44,178][42004] Updated weights for policy 0, policy_version 49216 (0.0027) +[2024-11-08 06:41:47,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 201609216. Throughput: 0: 1537.7. Samples: 45395886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:47,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:41:50,403][42004] Updated weights for policy 0, policy_version 49226 (0.0026) +[2024-11-08 06:41:52,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 201646080. Throughput: 0: 1601.3. Samples: 45405870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:41:52,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 06:41:56,162][42004] Updated weights for policy 0, policy_version 49236 (0.0029) +[2024-11-08 06:41:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6673.6). Total num frames: 201682944. Throughput: 0: 1661.3. Samples: 45416956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:41:57,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 06:42:01,748][42004] Updated weights for policy 0, policy_version 49246 (0.0027) +[2024-11-08 06:42:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 201719808. Throughput: 0: 1679.2. Samples: 45422254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:02,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 06:42:07,512][42004] Updated weights for policy 0, policy_version 49256 (0.0030) +[2024-11-08 06:42:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.3, 300 sec: 6664.7). Total num frames: 201752576. Throughput: 0: 1660.7. Samples: 45432824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:07,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:42:12,764][42004] Updated weights for policy 0, policy_version 49266 (0.0021) +[2024-11-08 06:42:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 201793536. Throughput: 0: 1669.0. Samples: 45444326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:12,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:42:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 201814016. Throughput: 0: 1600.3. Samples: 45446858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:17,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:42:20,778][42004] Updated weights for policy 0, policy_version 49276 (0.0043) +[2024-11-08 06:42:22,933][41694] Fps is (10 sec: 5324.4, 60 sec: 6485.2, 300 sec: 6609.1). Total num frames: 201846784. Throughput: 0: 1566.7. Samples: 45456576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:22,935][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:42:26,256][42004] Updated weights for policy 0, policy_version 49286 (0.0030) +[2024-11-08 06:42:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 201887744. Throughput: 0: 1692.8. Samples: 45467938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:27,933][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:42:31,739][42004] Updated weights for policy 0, policy_version 49296 (0.0027) +[2024-11-08 06:42:32,931][41694] Fps is (10 sec: 7783.4, 60 sec: 6553.6, 300 sec: 6692.7). Total num frames: 201924608. Throughput: 0: 1715.6. Samples: 45473086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:32,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 06:42:37,160][42004] Updated weights for policy 0, policy_version 49306 (0.0030) +[2024-11-08 06:42:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 201961472. Throughput: 0: 1750.9. Samples: 45484662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:37,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:42:42,463][42004] Updated weights for policy 0, policy_version 49316 (0.0025) +[2024-11-08 06:42:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 201998336. Throughput: 0: 1761.3. Samples: 45496214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:42,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:42:49,743][41694] Fps is (10 sec: 5895.3, 60 sec: 6825.4, 300 sec: 6651.6). Total num frames: 202031104. Throughput: 0: 1696.7. Samples: 45501680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:49,745][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:42:50,488][42004] Updated weights for policy 0, policy_version 49326 (0.0031) +[2024-11-08 06:42:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 202051584. Throughput: 0: 1673.2. Samples: 45508116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:42:52,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 06:42:56,624][42004] Updated weights for policy 0, policy_version 49336 (0.0027) +[2024-11-08 06:42:57,932][41694] Fps is (10 sec: 7003.0, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 202088448. Throughput: 0: 1646.6. Samples: 45518422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:42:57,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 06:43:02,125][42004] Updated weights for policy 0, policy_version 49346 (0.0024) +[2024-11-08 06:43:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 202125312. Throughput: 0: 1711.1. Samples: 45523858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:02,934][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 06:43:07,410][42004] Updated weights for policy 0, policy_version 49356 (0.0026) +[2024-11-08 06:43:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 202162176. Throughput: 0: 1749.7. Samples: 45535310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:07,934][41694] Avg episode reward: [(0, '4.769')] +[2024-11-08 06:43:12,659][42004] Updated weights for policy 0, policy_version 49366 (0.0029) +[2024-11-08 06:43:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 202203136. Throughput: 0: 1759.3. Samples: 45547106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:12,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:43:17,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 202240000. Throughput: 0: 1769.9. Samples: 45552732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:17,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:43:18,075][42004] Updated weights for policy 0, policy_version 49376 (0.0027) +[2024-11-08 06:43:24,527][41694] Fps is (10 sec: 5651.7, 60 sec: 6849.4, 300 sec: 6684.1). Total num frames: 202268672. Throughput: 0: 1698.8. Samples: 45563820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:24,530][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:43:26,686][42004] Updated weights for policy 0, policy_version 49386 (0.0034) +[2024-11-08 06:43:27,935][41694] Fps is (10 sec: 4914.9, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 202289152. Throughput: 0: 1623.2. Samples: 45569258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:27,942][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 06:43:32,708][42004] Updated weights for policy 0, policy_version 49396 (0.0028) +[2024-11-08 06:43:32,932][41694] Fps is (10 sec: 6823.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 202326016. Throughput: 0: 1673.4. Samples: 45573952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:32,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 06:43:37,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 202362880. Throughput: 0: 1717.5. Samples: 45585406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:37,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:43:38,023][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049406_202366976.pth... +[2024-11-08 06:43:38,037][42004] Updated weights for policy 0, policy_version 49406 (0.0026) +[2024-11-08 06:43:38,146][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049014_200761344.pth +[2024-11-08 06:43:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 202403840. Throughput: 0: 1745.6. Samples: 45596972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:42,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:43:43,336][42004] Updated weights for policy 0, policy_version 49416 (0.0022) +[2024-11-08 06:43:47,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7109.6, 300 sec: 6748.0). Total num frames: 202444800. Throughput: 0: 1754.7. Samples: 45602818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:47,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 06:43:48,511][42004] Updated weights for policy 0, policy_version 49426 (0.0027) +[2024-11-08 06:43:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 202477568. Throughput: 0: 1745.0. Samples: 45613836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:52,936][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:43:54,469][42004] Updated weights for policy 0, policy_version 49436 (0.0026) +[2024-11-08 06:43:59,067][41694] Fps is (10 sec: 5149.4, 60 sec: 6766.8, 300 sec: 6680.6). Total num frames: 202502144. Throughput: 0: 1564.9. Samples: 45619306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:43:59,069][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 06:44:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 202526720. Throughput: 0: 1604.4. Samples: 45624932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:02,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:44:03,364][42004] Updated weights for policy 0, policy_version 49446 (0.0035) +[2024-11-08 06:44:07,931][41694] Fps is (10 sec: 6931.4, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 202563584. Throughput: 0: 1641.6. Samples: 45635070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:07,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:44:08,636][42004] Updated weights for policy 0, policy_version 49456 (0.0029) +[2024-11-08 06:44:12,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 202600448. Throughput: 0: 1716.0. Samples: 45646480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:12,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 06:44:14,134][42004] Updated weights for policy 0, policy_version 49466 (0.0018) +[2024-11-08 06:44:17,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 202637312. Throughput: 0: 1741.4. Samples: 45652316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:17,933][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 06:44:19,655][42004] Updated weights for policy 0, policy_version 49476 (0.0035) +[2024-11-08 06:44:22,932][41694] Fps is (10 sec: 7783.0, 60 sec: 7013.2, 300 sec: 6748.0). Total num frames: 202678272. Throughput: 0: 1738.9. Samples: 45663656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:22,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:44:25,103][42004] Updated weights for policy 0, policy_version 49486 (0.0025) +[2024-11-08 06:44:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6748.0). Total num frames: 202715136. Throughput: 0: 1726.9. Samples: 45674684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:27,933][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 06:44:31,236][42004] Updated weights for policy 0, policy_version 49496 (0.0029) +[2024-11-08 06:44:34,040][41694] Fps is (10 sec: 5162.3, 60 sec: 6702.9, 300 sec: 6667.4). Total num frames: 202735616. Throughput: 0: 1659.0. Samples: 45679312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:34,044][41694] Avg episode reward: [(0, '4.795')] +[2024-11-08 06:44:37,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 202764288. Throughput: 0: 1579.1. Samples: 45684894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:37,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 06:44:39,531][42004] Updated weights for policy 0, policy_version 49506 (0.0038) +[2024-11-08 06:44:42,931][41694] Fps is (10 sec: 7370.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 202801152. Throughput: 0: 1747.9. Samples: 45695974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:42,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:44:45,350][42004] Updated weights for policy 0, policy_version 49516 (0.0020) +[2024-11-08 06:44:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 202833920. Throughput: 0: 1699.2. Samples: 45701396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:47,937][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:44:50,662][42004] Updated weights for policy 0, policy_version 49526 (0.0026) +[2024-11-08 06:44:52,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6553.5, 300 sec: 6706.3). Total num frames: 202870784. Throughput: 0: 1727.1. Samples: 45712790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:52,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:44:56,548][42004] Updated weights for policy 0, policy_version 49536 (0.0036) +[2024-11-08 06:44:57,942][41694] Fps is (10 sec: 7364.9, 60 sec: 6887.5, 300 sec: 6733.9). Total num frames: 202907648. Throughput: 0: 1706.5. Samples: 45723288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:57,944][41694] Avg episode reward: [(0, '4.743')] +[2024-11-08 06:45:02,614][42004] Updated weights for policy 0, policy_version 49546 (0.0035) +[2024-11-08 06:45:02,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6894.9, 300 sec: 6734.2). Total num frames: 202940416. Throughput: 0: 1685.4. Samples: 45728158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:02,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:45:08,646][41694] Fps is (10 sec: 4974.8, 60 sec: 6543.9, 300 sec: 6662.4). Total num frames: 202960896. Throughput: 0: 1627.5. Samples: 45738054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:08,648][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 06:45:11,288][42004] Updated weights for policy 0, policy_version 49556 (0.0032) +[2024-11-08 06:45:12,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 202989568. Throughput: 0: 1540.8. Samples: 45744020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:12,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:45:17,702][42004] Updated weights for policy 0, policy_version 49566 (0.0027) +[2024-11-08 06:45:17,932][41694] Fps is (10 sec: 6616.5, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 203022336. Throughput: 0: 1578.0. Samples: 45748574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:17,934][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 06:45:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 203059200. Throughput: 0: 1642.2. Samples: 45758794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:22,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 06:45:23,336][42004] Updated weights for policy 0, policy_version 49576 (0.0026) +[2024-11-08 06:45:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 203096064. Throughput: 0: 1648.9. Samples: 45770176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:27,933][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 06:45:28,931][42004] Updated weights for policy 0, policy_version 49586 (0.0029) +[2024-11-08 06:45:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6746.4, 300 sec: 6734.1). Total num frames: 203132928. Throughput: 0: 1650.5. Samples: 45775668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:32,934][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 06:45:34,664][42004] Updated weights for policy 0, policy_version 49596 (0.0032) +[2024-11-08 06:45:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 203165696. Throughput: 0: 1627.3. Samples: 45786018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:37,936][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:45:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049601_203165696.pth... +[2024-11-08 06:45:38,310][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049211_201568256.pth +[2024-11-08 06:45:43,374][41694] Fps is (10 sec: 4707.2, 60 sec: 6302.3, 300 sec: 6640.8). Total num frames: 203182080. Throughput: 0: 1483.2. Samples: 45790670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:43,377][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 06:45:43,754][42004] Updated weights for policy 0, policy_version 49606 (0.0032) +[2024-11-08 06:45:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6348.8, 300 sec: 6623.0). Total num frames: 203214848. Throughput: 0: 1519.0. Samples: 45796512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:47,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:45:49,285][42004] Updated weights for policy 0, policy_version 49616 (0.0027) +[2024-11-08 06:45:52,932][41694] Fps is (10 sec: 7285.4, 60 sec: 6348.9, 300 sec: 6609.1). Total num frames: 203251712. Throughput: 0: 1566.6. Samples: 45807432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:52,936][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:45:54,707][42004] Updated weights for policy 0, policy_version 49626 (0.0024) +[2024-11-08 06:45:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6418.2, 300 sec: 6636.9). Total num frames: 203292672. Throughput: 0: 1667.4. Samples: 45819054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:57,935][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:46:00,220][42004] Updated weights for policy 0, policy_version 49636 (0.0033) +[2024-11-08 06:46:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6692.5). Total num frames: 203325440. Throughput: 0: 1687.0. Samples: 45824488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:02,935][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:46:05,963][42004] Updated weights for policy 0, policy_version 49646 (0.0025) +[2024-11-08 06:46:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6770.8, 300 sec: 6720.2). Total num frames: 203362304. Throughput: 0: 1697.0. Samples: 45835158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:07,934][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 06:46:12,663][42004] Updated weights for policy 0, policy_version 49656 (0.0039) +[2024-11-08 06:46:12,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 203390976. Throughput: 0: 1648.2. Samples: 45844346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:12,942][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:46:18,189][41694] Fps is (10 sec: 4392.3, 60 sec: 6389.6, 300 sec: 6603.4). Total num frames: 203407360. Throughput: 0: 1619.1. Samples: 45848944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:18,192][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 06:46:21,234][42004] Updated weights for policy 0, policy_version 49666 (0.0030) +[2024-11-08 06:46:22,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6348.8, 300 sec: 6609.1). Total num frames: 203440128. Throughput: 0: 1538.0. Samples: 45855226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:22,935][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 06:46:27,025][42004] Updated weights for policy 0, policy_version 49676 (0.0033) +[2024-11-08 06:46:27,932][41694] Fps is (10 sec: 7147.5, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 203476992. Throughput: 0: 1685.0. Samples: 45865750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 06:46:32,357][42004] Updated weights for policy 0, policy_version 49686 (0.0031) +[2024-11-08 06:46:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 203517952. Throughput: 0: 1662.6. Samples: 45871328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:32,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 06:46:37,510][42004] Updated weights for policy 0, policy_version 49696 (0.0027) +[2024-11-08 06:46:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 203554816. Throughput: 0: 1683.9. Samples: 45883208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:37,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 06:46:42,848][42004] Updated weights for policy 0, policy_version 49706 (0.0024) +[2024-11-08 06:46:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6946.2, 300 sec: 6734.1). Total num frames: 203595776. Throughput: 0: 1684.8. Samples: 45894868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:42,933][41694] Avg episode reward: [(0, '4.667')] +[2024-11-08 06:46:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 203624448. Throughput: 0: 1672.6. Samples: 45899756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:47,936][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:46:49,322][42004] Updated weights for policy 0, policy_version 49716 (0.0029) +[2024-11-08 06:46:52,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 203640832. Throughput: 0: 1632.6. Samples: 45908624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:46:57,333][42004] Updated weights for policy 0, policy_version 49726 (0.0024) +[2024-11-08 06:46:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 203681792. Throughput: 0: 1599.7. Samples: 45916332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:57,934][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 06:47:02,890][42004] Updated weights for policy 0, policy_version 49736 (0.0039) +[2024-11-08 06:47:02,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 203718656. Throughput: 0: 1630.6. Samples: 45921900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:02,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 06:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 203755520. Throughput: 0: 1731.2. Samples: 45933132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:07,935][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 06:47:08,276][42004] Updated weights for policy 0, policy_version 49746 (0.0034) +[2024-11-08 06:47:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.5, 300 sec: 6720.2). Total num frames: 203796480. Throughput: 0: 1756.3. Samples: 45944784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:12,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:47:13,447][42004] Updated weights for policy 0, policy_version 49756 (0.0035) +[2024-11-08 06:47:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7061.8, 300 sec: 6720.2). Total num frames: 203829248. Throughput: 0: 1758.1. Samples: 45950444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:17,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 06:47:19,375][42004] Updated weights for policy 0, policy_version 49766 (0.0031) +[2024-11-08 06:47:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 203862016. Throughput: 0: 1719.3. Samples: 45960578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:22,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 06:47:27,777][42004] Updated weights for policy 0, policy_version 49776 (0.0027) +[2024-11-08 06:47:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 203882496. Throughput: 0: 1621.2. Samples: 45967822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:27,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:47:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 203919360. Throughput: 0: 1607.4. Samples: 45972088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:32,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 06:47:33,166][42004] Updated weights for policy 0, policy_version 49786 (0.0034) +[2024-11-08 06:47:37,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 203956224. Throughput: 0: 1662.9. Samples: 45983456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:37,934][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 06:47:37,990][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049795_203960320.pth... +[2024-11-08 06:47:38,120][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049406_202366976.pth +[2024-11-08 06:47:38,588][42004] Updated weights for policy 0, policy_version 49796 (0.0024) +[2024-11-08 06:47:42,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6690.1, 300 sec: 6705.9). Total num frames: 203997184. Throughput: 0: 1753.0. Samples: 45995216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:42,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:47:43,848][42004] Updated weights for policy 0, policy_version 49806 (0.0025) +[2024-11-08 06:47:47,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 204034048. Throughput: 0: 1758.8. Samples: 46001048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:47,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:47:49,258][42004] Updated weights for policy 0, policy_version 49816 (0.0024) +[2024-11-08 06:47:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6720.2). Total num frames: 204070912. Throughput: 0: 1756.8. Samples: 46012186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:52,937][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 06:47:54,859][42004] Updated weights for policy 0, policy_version 49826 (0.0029) +[2024-11-08 06:47:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6720.2). Total num frames: 204107776. Throughput: 0: 1744.2. Samples: 46023274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:57,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 06:48:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 204124160. Throughput: 0: 1722.9. Samples: 46027976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:02,935][41694] Avg episode reward: [(0, '4.765')] +[2024-11-08 06:48:03,570][42004] Updated weights for policy 0, policy_version 49836 (0.0035) +[2024-11-08 06:48:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 204156928. Throughput: 0: 1623.9. Samples: 46033652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:07,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:48:09,789][42004] Updated weights for policy 0, policy_version 49846 (0.0032) +[2024-11-08 06:48:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 204189696. Throughput: 0: 1688.9. Samples: 46043824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:12,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 06:48:15,516][42004] Updated weights for policy 0, policy_version 49856 (0.0027) +[2024-11-08 06:48:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6673.0). Total num frames: 204226560. Throughput: 0: 1717.4. Samples: 46049372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:17,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 06:48:20,883][42004] Updated weights for policy 0, policy_version 49866 (0.0028) +[2024-11-08 06:48:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 204263424. Throughput: 0: 1713.4. Samples: 46060556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:22,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:48:26,309][42004] Updated weights for policy 0, policy_version 49876 (0.0024) +[2024-11-08 06:48:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 204300288. Throughput: 0: 1703.4. Samples: 46071870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:27,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 06:48:31,920][42004] Updated weights for policy 0, policy_version 49886 (0.0025) +[2024-11-08 06:48:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 204337152. Throughput: 0: 1687.9. Samples: 46077006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:32,935][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:48:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 204353536. Throughput: 0: 1626.7. Samples: 46085386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:37,935][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:48:40,816][42004] Updated weights for policy 0, policy_version 49896 (0.0033) +[2024-11-08 06:48:42,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 204386304. Throughput: 0: 1547.1. Samples: 46092894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:42,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:48:46,670][42004] Updated weights for policy 0, policy_version 49906 (0.0034) +[2024-11-08 06:48:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 204423168. Throughput: 0: 1554.6. Samples: 46097934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:47,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:48:52,131][42004] Updated weights for policy 0, policy_version 49916 (0.0028) +[2024-11-08 06:48:52,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6485.4, 300 sec: 6662.6). Total num frames: 204460032. Throughput: 0: 1684.2. Samples: 46109442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:52,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 06:48:57,320][42004] Updated weights for policy 0, policy_version 49926 (0.0023) +[2024-11-08 06:48:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 204500992. Throughput: 0: 1718.0. Samples: 46121136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:57,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:49:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 204533760. Throughput: 0: 1715.6. Samples: 46126574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:02,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 06:49:02,990][42004] Updated weights for policy 0, policy_version 49936 (0.0041) +[2024-11-08 06:49:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 204570624. Throughput: 0: 1706.3. Samples: 46137342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:07,933][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 06:49:08,710][42004] Updated weights for policy 0, policy_version 49946 (0.0044) +[2024-11-08 06:49:12,934][41694] Fps is (10 sec: 5323.4, 60 sec: 6621.6, 300 sec: 6609.1). Total num frames: 204587008. Throughput: 0: 1590.2. Samples: 46143434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:12,941][41694] Avg episode reward: [(0, '4.136')] +[2024-11-08 06:49:17,668][42004] Updated weights for policy 0, policy_version 49956 (0.0032) +[2024-11-08 06:49:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 204619776. Throughput: 0: 1572.8. Samples: 46147782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:17,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:49:22,857][42004] Updated weights for policy 0, policy_version 49966 (0.0023) +[2024-11-08 06:49:22,931][41694] Fps is (10 sec: 7375.0, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 204660736. Throughput: 0: 1637.1. Samples: 46159054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:22,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 06:49:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6675.9). Total num frames: 204697600. Throughput: 0: 1731.8. Samples: 46170826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:27,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 06:49:28,162][42004] Updated weights for policy 0, policy_version 49976 (0.0025) +[2024-11-08 06:49:32,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 204738560. Throughput: 0: 1747.5. Samples: 46176574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:32,934][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 06:49:33,219][42004] Updated weights for policy 0, policy_version 49986 (0.0027) +[2024-11-08 06:49:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 204775424. Throughput: 0: 1752.2. Samples: 46188292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:37,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:49:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049994_204775424.pth... +[2024-11-08 06:49:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049601_203165696.pth +[2024-11-08 06:49:38,671][42004] Updated weights for policy 0, policy_version 49996 (0.0026) +[2024-11-08 06:49:42,932][41694] Fps is (10 sec: 7373.3, 60 sec: 7099.8, 300 sec: 6706.3). Total num frames: 204812288. Throughput: 0: 1733.0. Samples: 46199122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:42,934][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:49:47,597][42004] Updated weights for policy 0, policy_version 50006 (0.0033) +[2024-11-08 06:49:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 204824576. Throughput: 0: 1688.1. Samples: 46202538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:47,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:49:52,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6621.8, 300 sec: 6609.4). Total num frames: 204857344. Throughput: 0: 1588.0. Samples: 46208800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:49:53,521][42004] Updated weights for policy 0, policy_version 50016 (0.0027) +[2024-11-08 06:49:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 204898304. Throughput: 0: 1718.4. Samples: 46220756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:57,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:49:58,852][42004] Updated weights for policy 0, policy_version 50026 (0.0031) +[2024-11-08 06:50:02,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.2, 300 sec: 6708.7). Total num frames: 204935168. Throughput: 0: 1747.6. Samples: 46226422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:50:02,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:50:04,519][42004] Updated weights for policy 0, policy_version 50036 (0.0036) +[2024-11-08 06:50:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6720.2). Total num frames: 204972032. Throughput: 0: 1740.0. Samples: 46237356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:50:07,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 06:50:09,925][42004] Updated weights for policy 0, policy_version 50046 (0.0033) +[2024-11-08 06:50:12,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 205004800. Throughput: 0: 1717.8. Samples: 46248132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:12,937][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:50:16,746][42004] Updated weights for policy 0, policy_version 50056 (0.0035) +[2024-11-08 06:50:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 205033472. Throughput: 0: 1680.7. Samples: 46252204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:50:22,932][41694] Fps is (10 sec: 4097.1, 60 sec: 6417.0, 300 sec: 6609.1). Total num frames: 205045760. Throughput: 0: 1530.5. Samples: 46257164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:22,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 06:50:26,411][42004] Updated weights for policy 0, policy_version 50066 (0.0033) +[2024-11-08 06:50:27,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 205078528. Throughput: 0: 1490.8. Samples: 46266208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:27,935][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:50:32,023][42004] Updated weights for policy 0, policy_version 50076 (0.0027) +[2024-11-08 06:50:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6280.6, 300 sec: 6609.1). Total num frames: 205115392. Throughput: 0: 1533.8. Samples: 46271560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:32,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 06:50:37,370][42004] Updated weights for policy 0, policy_version 50086 (0.0028) +[2024-11-08 06:50:37,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6348.8, 300 sec: 6702.5). Total num frames: 205156352. Throughput: 0: 1648.4. Samples: 46282976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:37,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 06:50:42,861][42004] Updated weights for policy 0, policy_version 50096 (0.0035) +[2024-11-08 06:50:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 205193216. Throughput: 0: 1634.8. Samples: 46294324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:42,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 06:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 205230080. Throughput: 0: 1629.6. Samples: 46299754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:47,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:50:48,456][42004] Updated weights for policy 0, policy_version 50106 (0.0040) +[2024-11-08 06:50:52,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 205258752. Throughput: 0: 1622.7. Samples: 46310378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:52,939][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 06:50:57,529][42004] Updated weights for policy 0, policy_version 50116 (0.0035) +[2024-11-08 06:50:57,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6280.5, 300 sec: 6609.1). Total num frames: 205275136. Throughput: 0: 1499.2. Samples: 46315594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:57,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 06:51:02,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6212.3, 300 sec: 6595.3). Total num frames: 205307904. Throughput: 0: 1510.4. Samples: 46320174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:02,933][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 06:51:03,736][42004] Updated weights for policy 0, policy_version 50126 (0.0030) +[2024-11-08 06:51:07,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6212.1, 300 sec: 6623.0). Total num frames: 205344768. Throughput: 0: 1628.2. Samples: 46330436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:07,936][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 06:51:09,685][42004] Updated weights for policy 0, policy_version 50136 (0.0034) +[2024-11-08 06:51:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6212.5, 300 sec: 6684.4). Total num frames: 205377536. Throughput: 0: 1649.9. Samples: 46340454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:12,934][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 06:51:16,152][42004] Updated weights for policy 0, policy_version 50146 (0.0028) +[2024-11-08 06:51:17,931][41694] Fps is (10 sec: 6554.7, 60 sec: 6280.6, 300 sec: 6678.6). Total num frames: 205410304. Throughput: 0: 1633.7. Samples: 46345076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:17,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:51:21,760][42004] Updated weights for policy 0, policy_version 50156 (0.0034) +[2024-11-08 06:51:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 205447168. Throughput: 0: 1621.9. Samples: 46355960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:22,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:51:27,865][42004] Updated weights for policy 0, policy_version 50166 (0.0035) +[2024-11-08 06:51:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 205479936. Throughput: 0: 1595.5. Samples: 46366120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:27,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:51:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6595.2). Total num frames: 205500416. Throughput: 0: 1521.9. Samples: 46368240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:32,935][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 06:51:35,285][42004] Updated weights for policy 0, policy_version 50176 (0.0031) +[2024-11-08 06:51:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 205541376. Throughput: 0: 1527.9. Samples: 46379134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:37,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 06:51:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050181_205541376.pth... +[2024-11-08 06:51:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049795_203960320.pth +[2024-11-08 06:51:40,704][42004] Updated weights for policy 0, policy_version 50186 (0.0034) +[2024-11-08 06:51:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 205578240. Throughput: 0: 1664.9. Samples: 46390514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:42,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 06:51:46,210][42004] Updated weights for policy 0, policy_version 50196 (0.0048) +[2024-11-08 06:51:47,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 205615104. Throughput: 0: 1687.6. Samples: 46396118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:47,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:51:51,720][42004] Updated weights for policy 0, policy_version 50206 (0.0025) +[2024-11-08 06:51:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 205651968. Throughput: 0: 1704.9. Samples: 46407152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:52,934][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 06:51:57,462][42004] Updated weights for policy 0, policy_version 50216 (0.0028) +[2024-11-08 06:51:57,937][41694] Fps is (10 sec: 6962.7, 60 sec: 6826.5, 300 sec: 6664.6). Total num frames: 205684736. Throughput: 0: 1722.2. Samples: 46417954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:57,949][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:52:04,380][41694] Fps is (10 sec: 5008.7, 60 sec: 6532.4, 300 sec: 6590.7). Total num frames: 205709312. Throughput: 0: 1668.5. Samples: 46422578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:04,383][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 06:52:06,665][42004] Updated weights for policy 0, policy_version 50226 (0.0027) +[2024-11-08 06:52:07,934][41694] Fps is (10 sec: 4914.8, 60 sec: 6485.3, 300 sec: 6567.4). Total num frames: 205733888. Throughput: 0: 1599.8. Samples: 46427954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:07,940][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 06:52:12,339][42004] Updated weights for policy 0, policy_version 50236 (0.0031) +[2024-11-08 06:52:12,932][41694] Fps is (10 sec: 7185.1, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 205770752. Throughput: 0: 1613.2. Samples: 46438716. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:12,934][41694] Avg episode reward: [(0, '4.204')] +[2024-11-08 06:52:17,741][42004] Updated weights for policy 0, policy_version 50246 (0.0033) +[2024-11-08 06:52:17,932][41694] Fps is (10 sec: 7374.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 205807616. Throughput: 0: 1687.4. Samples: 46444172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:17,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:52:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 205844480. Throughput: 0: 1703.3. Samples: 46455784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:22,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:52:23,005][42004] Updated weights for policy 0, policy_version 50256 (0.0027) +[2024-11-08 06:52:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 205885440. Throughput: 0: 1709.7. Samples: 46467452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:27,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 06:52:28,296][42004] Updated weights for policy 0, policy_version 50266 (0.0023) +[2024-11-08 06:52:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6664.7). Total num frames: 205922304. Throughput: 0: 1703.6. Samples: 46472778. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:32,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:52:34,244][42004] Updated weights for policy 0, policy_version 50276 (0.0031) +[2024-11-08 06:52:39,071][41694] Fps is (10 sec: 5147.8, 60 sec: 6565.4, 300 sec: 6569.9). Total num frames: 205942784. Throughput: 0: 1638.9. Samples: 46482768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:39,073][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:52:42,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 205967360. Throughput: 0: 1566.0. Samples: 46488420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:42,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 06:52:43,053][42004] Updated weights for policy 0, policy_version 50286 (0.0037) +[2024-11-08 06:52:47,931][41694] Fps is (10 sec: 6934.2, 60 sec: 6485.4, 300 sec: 6553.6). Total num frames: 206004224. Throughput: 0: 1638.0. Samples: 46493916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:47,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:52:48,521][42004] Updated weights for policy 0, policy_version 50296 (0.0043) +[2024-11-08 06:52:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6567.5). Total num frames: 206045184. Throughput: 0: 1715.1. Samples: 46505130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:52:52,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:52:53,996][42004] Updated weights for policy 0, policy_version 50306 (0.0029) +[2024-11-08 06:52:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 206082048. Throughput: 0: 1728.9. Samples: 46516518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:52:57,934][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 06:52:59,462][42004] Updated weights for policy 0, policy_version 50316 (0.0034) +[2024-11-08 06:53:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6925.7, 300 sec: 6636.9). Total num frames: 206114816. Throughput: 0: 1731.3. Samples: 46522080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:02,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 06:53:05,486][42004] Updated weights for policy 0, policy_version 50326 (0.0028) +[2024-11-08 06:53:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6895.2, 300 sec: 6636.9). Total num frames: 206147584. Throughput: 0: 1696.5. Samples: 46532126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:07,936][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:53:13,787][41694] Fps is (10 sec: 4905.4, 60 sec: 6528.8, 300 sec: 6562.4). Total num frames: 206168064. Throughput: 0: 1504.7. Samples: 46536452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:13,788][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:53:14,631][42004] Updated weights for policy 0, policy_version 50336 (0.0050) +[2024-11-08 06:53:17,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206196736. Throughput: 0: 1544.0. Samples: 46542260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:17,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:53:20,277][42004] Updated weights for policy 0, policy_version 50346 (0.0033) +[2024-11-08 06:53:22,932][41694] Fps is (10 sec: 7166.2, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206233600. Throughput: 0: 1608.8. Samples: 46553330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:22,934][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 06:53:25,828][42004] Updated weights for policy 0, policy_version 50356 (0.0021) +[2024-11-08 06:53:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6417.1, 300 sec: 6553.6). Total num frames: 206270464. Throughput: 0: 1690.9. Samples: 46564512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:27,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:53:31,183][42004] Updated weights for policy 0, policy_version 50366 (0.0036) +[2024-11-08 06:53:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 206307328. Throughput: 0: 1692.2. Samples: 46570064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:32,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 06:53:36,717][42004] Updated weights for policy 0, policy_version 50376 (0.0038) +[2024-11-08 06:53:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6889.2, 300 sec: 6650.8). Total num frames: 206348288. Throughput: 0: 1694.2. Samples: 46581368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:37,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:53:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050378_206348288.pth... +[2024-11-08 06:53:38,252][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049994_204775424.pth +[2024-11-08 06:53:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6623.0). Total num frames: 206376960. Throughput: 0: 1659.8. Samples: 46591210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:42,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:53:42,964][42004] Updated weights for policy 0, policy_version 50386 (0.0036) +[2024-11-08 06:53:48,481][41694] Fps is (10 sec: 4659.0, 60 sec: 6494.1, 300 sec: 6555.3). Total num frames: 206397440. Throughput: 0: 1620.1. Samples: 46595874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:48,483][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 06:53:51,414][42004] Updated weights for policy 0, policy_version 50396 (0.0029) +[2024-11-08 06:53:52,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6417.0, 300 sec: 6539.7). Total num frames: 206430208. Throughput: 0: 1560.3. Samples: 46602338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:52,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:53:57,130][42004] Updated weights for policy 0, policy_version 50406 (0.0032) +[2024-11-08 06:53:57,932][41694] Fps is (10 sec: 7368.3, 60 sec: 6417.0, 300 sec: 6553.6). Total num frames: 206467072. Throughput: 0: 1735.3. Samples: 46613056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:57,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 06:54:02,647][42004] Updated weights for policy 0, policy_version 50416 (0.0028) +[2024-11-08 06:54:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206503936. Throughput: 0: 1697.9. Samples: 46618666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:02,933][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 06:54:07,828][42004] Updated weights for policy 0, policy_version 50426 (0.0034) +[2024-11-08 06:54:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6637.0). Total num frames: 206544896. Throughput: 0: 1706.0. Samples: 46630098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:07,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 06:54:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6994.6, 300 sec: 6650.8). Total num frames: 206581760. Throughput: 0: 1721.4. Samples: 46641974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:12,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:54:13,064][42004] Updated weights for policy 0, policy_version 50436 (0.0030) +[2024-11-08 06:54:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6623.0). Total num frames: 206614528. Throughput: 0: 1712.3. Samples: 46647120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:17,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 06:54:19,459][42004] Updated weights for policy 0, policy_version 50446 (0.0048) +[2024-11-08 06:54:23,206][41694] Fps is (10 sec: 5182.6, 60 sec: 6659.7, 300 sec: 6561.4). Total num frames: 206635008. Throughput: 0: 1560.8. Samples: 46652030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:23,209][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:54:27,576][42004] Updated weights for policy 0, policy_version 50456 (0.0023) +[2024-11-08 06:54:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6539.7). Total num frames: 206667776. Throughput: 0: 1602.0. Samples: 46663298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:27,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:54:32,903][42004] Updated weights for policy 0, policy_version 50466 (0.0028) +[2024-11-08 06:54:32,931][41694] Fps is (10 sec: 7580.9, 60 sec: 6690.1, 300 sec: 6553.6). Total num frames: 206708736. Throughput: 0: 1644.2. Samples: 46668958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:32,933][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 06:54:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6553.6). Total num frames: 206745600. Throughput: 0: 1741.5. Samples: 46680706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:37,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 06:54:38,110][42004] Updated weights for policy 0, policy_version 50476 (0.0033) +[2024-11-08 06:54:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 206782464. Throughput: 0: 1760.2. Samples: 46692264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:42,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:54:43,515][42004] Updated weights for policy 0, policy_version 50486 (0.0027) +[2024-11-08 06:54:47,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7096.4, 300 sec: 6650.8). Total num frames: 206819328. Throughput: 0: 1758.4. Samples: 46697796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:47,936][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:54:49,613][42004] Updated weights for policy 0, policy_version 50496 (0.0040) +[2024-11-08 06:54:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6623.0). Total num frames: 206852096. Throughput: 0: 1727.4. Samples: 46707832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:52,933][41694] Avg episode reward: [(0, '4.808')] +[2024-11-08 06:54:57,973][41694] Fps is (10 sec: 4895.3, 60 sec: 6685.5, 300 sec: 6552.7). Total num frames: 206868480. Throughput: 0: 1576.3. Samples: 46712974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:57,974][41694] Avg episode reward: [(0, '4.944')] +[2024-11-08 06:54:57,999][41991] Saving new best policy, reward=4.944! +[2024-11-08 06:54:58,446][42004] Updated weights for policy 0, policy_version 50506 (0.0034) +[2024-11-08 06:55:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6539.7). Total num frames: 206901248. Throughput: 0: 1587.3. Samples: 46718546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:02,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 06:55:04,108][42004] Updated weights for policy 0, policy_version 50516 (0.0031) +[2024-11-08 06:55:07,932][41694] Fps is (10 sec: 7403.6, 60 sec: 6621.9, 300 sec: 6567.5). Total num frames: 206942208. Throughput: 0: 1733.4. Samples: 46729556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:07,934][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 06:55:09,420][42004] Updated weights for policy 0, policy_version 50526 (0.0029) +[2024-11-08 06:55:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 206979072. Throughput: 0: 1727.9. Samples: 46741054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:12,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:55:15,345][42004] Updated weights for policy 0, policy_version 50536 (0.0031) +[2024-11-08 06:55:17,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 207011840. Throughput: 0: 1710.5. Samples: 46745934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:17,935][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 06:55:21,683][42004] Updated weights for policy 0, policy_version 50546 (0.0038) +[2024-11-08 06:55:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6789.5, 300 sec: 6650.8). Total num frames: 207040512. Throughput: 0: 1664.5. Samples: 46755608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:22,933][41694] Avg episode reward: [(0, '4.892')] +[2024-11-08 06:55:27,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 207073280. Throughput: 0: 1620.9. Samples: 46765204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:27,933][41694] Avg episode reward: [(0, '4.838')] +[2024-11-08 06:55:27,957][42004] Updated weights for policy 0, policy_version 50556 (0.0026) +[2024-11-08 06:55:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207093760. Throughput: 0: 1614.9. Samples: 46770464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:32,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:55:36,270][42004] Updated weights for policy 0, policy_version 50566 (0.0030) +[2024-11-08 06:55:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207130624. Throughput: 0: 1533.6. Samples: 46776842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:37,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:55:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050569_207130624.pth... +[2024-11-08 06:55:38,085][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050181_205541376.pth +[2024-11-08 06:55:41,638][42004] Updated weights for policy 0, policy_version 50576 (0.0044) +[2024-11-08 06:55:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207167488. Throughput: 0: 1672.2. Samples: 46788152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:42,935][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:55:46,971][42004] Updated weights for policy 0, policy_version 50586 (0.0030) +[2024-11-08 06:55:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.2, 300 sec: 6595.3). Total num frames: 207204352. Throughput: 0: 1665.8. Samples: 46793506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:47,935][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 06:55:52,220][42004] Updated weights for policy 0, policy_version 50596 (0.0033) +[2024-11-08 06:55:52,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6553.5, 300 sec: 6678.5). Total num frames: 207245312. Throughput: 0: 1689.4. Samples: 46805580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:55:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6831.4, 300 sec: 6678.6). Total num frames: 207278080. Throughput: 0: 1666.0. Samples: 46816022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:57,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:55:58,135][42004] Updated weights for policy 0, policy_version 50606 (0.0031) +[2024-11-08 06:56:02,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6826.5, 300 sec: 6664.7). Total num frames: 207310848. Throughput: 0: 1663.0. Samples: 46820772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:56:02,937][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:56:04,633][42004] Updated weights for policy 0, policy_version 50616 (0.0030) +[2024-11-08 06:56:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 207327232. Throughput: 0: 1628.3. Samples: 46828884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:07,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 06:56:12,931][41694] Fps is (10 sec: 4916.0, 60 sec: 6348.8, 300 sec: 6609.1). Total num frames: 207360000. Throughput: 0: 1590.5. Samples: 46836774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:12,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:56:12,977][42004] Updated weights for policy 0, policy_version 50626 (0.0030) +[2024-11-08 06:56:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.5, 300 sec: 6623.0). Total num frames: 207400960. Throughput: 0: 1586.0. Samples: 46841832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:17,932][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 06:56:18,498][42004] Updated weights for policy 0, policy_version 50636 (0.0027) +[2024-11-08 06:56:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 207437824. Throughput: 0: 1702.4. Samples: 46853448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:22,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 06:56:23,948][42004] Updated weights for policy 0, policy_version 50646 (0.0020) +[2024-11-08 06:56:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 207474688. Throughput: 0: 1699.1. Samples: 46864610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:56:29,776][42004] Updated weights for policy 0, policy_version 50656 (0.0031) +[2024-11-08 06:56:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 207503360. Throughput: 0: 1690.9. Samples: 46869594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:56:35,830][42004] Updated weights for policy 0, policy_version 50666 (0.0030) +[2024-11-08 06:56:37,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 207540224. Throughput: 0: 1652.4. Samples: 46879936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:37,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 06:56:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207556608. Throughput: 0: 1546.9. Samples: 46885632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:42,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:56:44,368][42004] Updated weights for policy 0, policy_version 50676 (0.0038) +[2024-11-08 06:56:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.4, 300 sec: 6581.4). Total num frames: 207593472. Throughput: 0: 1561.9. Samples: 46891054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:47,933][41694] Avg episode reward: [(0, '4.673')] +[2024-11-08 06:56:49,627][42004] Updated weights for policy 0, policy_version 50686 (0.0031) +[2024-11-08 06:56:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6485.4, 300 sec: 6609.2). Total num frames: 207634432. Throughput: 0: 1634.2. Samples: 46902424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:52,935][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 06:56:55,028][42004] Updated weights for policy 0, policy_version 50696 (0.0037) +[2024-11-08 06:56:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6683.6). Total num frames: 207671296. Throughput: 0: 1716.0. Samples: 46913994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:57,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 06:57:00,642][42004] Updated weights for policy 0, policy_version 50706 (0.0028) +[2024-11-08 06:57:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.8, 300 sec: 6678.6). Total num frames: 207704064. Throughput: 0: 1723.7. Samples: 46919400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:57:02,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 06:57:06,947][42004] Updated weights for policy 0, policy_version 50716 (0.0042) +[2024-11-08 06:57:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 207736832. Throughput: 0: 1682.2. Samples: 46929148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:57:07,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:57:12,604][42004] Updated weights for policy 0, policy_version 50726 (0.0029) +[2024-11-08 06:57:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 207773696. Throughput: 0: 1671.3. Samples: 46939818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:12,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 06:57:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 207790080. Throughput: 0: 1649.2. Samples: 46943810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:17,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:57:20,943][42004] Updated weights for policy 0, policy_version 50736 (0.0027) +[2024-11-08 06:57:22,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207826944. Throughput: 0: 1587.6. Samples: 46951380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:22,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 06:57:26,447][42004] Updated weights for policy 0, policy_version 50746 (0.0030) +[2024-11-08 06:57:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207863808. Throughput: 0: 1707.4. Samples: 46962464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:27,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 06:57:31,955][42004] Updated weights for policy 0, policy_version 50756 (0.0029) +[2024-11-08 06:57:32,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6662.6). Total num frames: 207900672. Throughput: 0: 1705.5. Samples: 46967802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:32,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:57:37,574][42004] Updated weights for policy 0, policy_version 50766 (0.0049) +[2024-11-08 06:57:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 207937536. Throughput: 0: 1711.4. Samples: 46979438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:37,935][41694] Avg episode reward: [(0, '4.708')] +[2024-11-08 06:57:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050766_207937536.pth... +[2024-11-08 06:57:38,137][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050378_206348288.pth +[2024-11-08 06:57:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 207970304. Throughput: 0: 1668.3. Samples: 46989066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:42,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:57:43,574][42004] Updated weights for policy 0, policy_version 50776 (0.0034) +[2024-11-08 06:57:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 208007168. Throughput: 0: 1672.4. Samples: 46994658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:47,934][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 06:57:49,159][42004] Updated weights for policy 0, policy_version 50786 (0.0030) +[2024-11-08 06:57:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 208027648. Throughput: 0: 1625.5. Samples: 47002296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:52,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:57:57,296][42004] Updated weights for policy 0, policy_version 50796 (0.0029) +[2024-11-08 06:57:57,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 208064512. Throughput: 0: 1607.3. Samples: 47012148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:57,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:58:02,770][42004] Updated weights for policy 0, policy_version 50806 (0.0023) +[2024-11-08 06:58:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 208101376. Throughput: 0: 1644.8. Samples: 47017826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:58:02,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 06:58:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6698.0). Total num frames: 208138240. Throughput: 0: 1722.0. Samples: 47028870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:58:07,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:58:08,171][42004] Updated weights for policy 0, policy_version 50816 (0.0027) +[2024-11-08 06:58:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 208171008. Throughput: 0: 1718.6. Samples: 47039802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:12,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:58:14,142][42004] Updated weights for policy 0, policy_version 50826 (0.0037) +[2024-11-08 06:58:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 208207872. Throughput: 0: 1709.9. Samples: 47044746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:17,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:58:19,713][42004] Updated weights for policy 0, policy_version 50836 (0.0028) +[2024-11-08 06:58:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 208244736. Throughput: 0: 1701.7. Samples: 47056016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:22,935][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:58:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 208261120. Throughput: 0: 1614.4. Samples: 47061712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:27,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:58:28,234][42004] Updated weights for policy 0, policy_version 50846 (0.0028) +[2024-11-08 06:58:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 208297984. Throughput: 0: 1615.9. Samples: 47067372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:32,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 06:58:33,635][42004] Updated weights for policy 0, policy_version 50856 (0.0037) +[2024-11-08 06:58:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 208334848. Throughput: 0: 1691.6. Samples: 47078418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:37,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:58:39,042][42004] Updated weights for policy 0, policy_version 50866 (0.0023) +[2024-11-08 06:58:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6718.9). Total num frames: 208375808. Throughput: 0: 1728.2. Samples: 47089918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:42,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:58:44,976][42004] Updated weights for policy 0, policy_version 50876 (0.0035) +[2024-11-08 06:58:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 208404480. Throughput: 0: 1708.0. Samples: 47094688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:47,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 06:58:51,090][42004] Updated weights for policy 0, policy_version 50886 (0.0028) +[2024-11-08 06:58:52,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 208441344. Throughput: 0: 1685.7. Samples: 47104726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:52,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 06:58:57,044][42004] Updated weights for policy 0, policy_version 50896 (0.0029) +[2024-11-08 06:58:57,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 208474112. Throughput: 0: 1670.2. Samples: 47114962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 06:59:02,931][41694] Fps is (10 sec: 4915.5, 60 sec: 6485.4, 300 sec: 6595.3). Total num frames: 208490496. Throughput: 0: 1637.4. Samples: 47118430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:02,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:59:05,519][42004] Updated weights for policy 0, policy_version 50906 (0.0027) +[2024-11-08 06:59:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 208527360. Throughput: 0: 1562.0. Samples: 47126308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:07,933][41694] Avg episode reward: [(0, '4.771')] +[2024-11-08 06:59:10,969][42004] Updated weights for policy 0, policy_version 50916 (0.0023) +[2024-11-08 06:59:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 208564224. Throughput: 0: 1686.8. Samples: 47137620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:12,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 06:59:16,226][42004] Updated weights for policy 0, policy_version 50926 (0.0027) +[2024-11-08 06:59:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6670.9). Total num frames: 208601088. Throughput: 0: 1684.8. Samples: 47143188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:17,937][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 06:59:22,301][42004] Updated weights for policy 0, policy_version 50936 (0.0027) +[2024-11-08 06:59:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 208637952. Throughput: 0: 1675.9. Samples: 47153834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:22,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 06:59:27,551][42004] Updated weights for policy 0, policy_version 50946 (0.0029) +[2024-11-08 06:59:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 208674816. Throughput: 0: 1672.4. Samples: 47165176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:27,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 06:59:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 208711680. Throughput: 0: 1686.0. Samples: 47170560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:32,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 06:59:33,156][42004] Updated weights for policy 0, policy_version 50956 (0.0027) +[2024-11-08 06:59:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 208728064. Throughput: 0: 1617.5. Samples: 47177514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:37,934][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 06:59:37,980][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050960_208732160.pth... +[2024-11-08 06:59:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050569_207130624.pth +[2024-11-08 06:59:41,298][42004] Updated weights for policy 0, policy_version 50966 (0.0032) +[2024-11-08 06:59:42,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6485.2, 300 sec: 6595.2). Total num frames: 208764928. Throughput: 0: 1625.3. Samples: 47188102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:42,935][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 06:59:46,746][42004] Updated weights for policy 0, policy_version 50976 (0.0034) +[2024-11-08 06:59:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 208805888. Throughput: 0: 1663.5. Samples: 47193288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:47,943][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:59:52,557][42004] Updated weights for policy 0, policy_version 50986 (0.0032) +[2024-11-08 06:59:52,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6621.9, 300 sec: 6679.5). Total num frames: 208838656. Throughput: 0: 1746.0. Samples: 47204876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:52,934][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 06:59:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 208871424. Throughput: 0: 1710.3. Samples: 47214586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:57,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 06:59:58,543][42004] Updated weights for policy 0, policy_version 50996 (0.0045) +[2024-11-08 07:00:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6664.7). Total num frames: 208908288. Throughput: 0: 1709.5. Samples: 47220116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:02,935][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 07:00:04,236][42004] Updated weights for policy 0, policy_version 51006 (0.0031) +[2024-11-08 07:00:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6664.7). Total num frames: 208945152. Throughput: 0: 1707.5. Samples: 47230674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:07,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 07:00:12,771][42004] Updated weights for policy 0, policy_version 51016 (0.0038) +[2024-11-08 07:00:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.8, 300 sec: 6609.2). Total num frames: 208961536. Throughput: 0: 1589.6. Samples: 47236708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:12,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 07:00:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 208994304. Throughput: 0: 1580.6. Samples: 47241684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:17,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 07:00:18,699][42004] Updated weights for policy 0, policy_version 51026 (0.0026) +[2024-11-08 07:00:22,934][41694] Fps is (10 sec: 6552.0, 60 sec: 6485.0, 300 sec: 6623.0). Total num frames: 209027072. Throughput: 0: 1646.0. Samples: 47251588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:22,938][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:00:24,900][42004] Updated weights for policy 0, policy_version 51036 (0.0059) +[2024-11-08 07:00:27,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6416.9, 300 sec: 6664.6). Total num frames: 209059840. Throughput: 0: 1635.3. Samples: 47261692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:27,936][41694] Avg episode reward: [(0, '4.256')] +[2024-11-08 07:00:31,325][42004] Updated weights for policy 0, policy_version 51046 (0.0027) +[2024-11-08 07:00:32,932][41694] Fps is (10 sec: 6555.3, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 209092608. Throughput: 0: 1627.6. Samples: 47266528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:32,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:00:36,734][42004] Updated weights for policy 0, policy_version 51056 (0.0026) +[2024-11-08 07:00:37,932][41694] Fps is (10 sec: 7374.1, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 209133568. Throughput: 0: 1616.2. Samples: 47277604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:37,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 07:00:42,153][42004] Updated weights for policy 0, policy_version 51066 (0.0031) +[2024-11-08 07:00:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.6, 300 sec: 6664.7). Total num frames: 209170432. Throughput: 0: 1650.2. Samples: 47288844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:42,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 07:00:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 209190912. Throughput: 0: 1579.0. Samples: 47291170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:47,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 07:00:50,079][42004] Updated weights for policy 0, policy_version 51076 (0.0024) +[2024-11-08 07:00:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 209227776. Throughput: 0: 1563.9. Samples: 47301050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:52,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 07:00:55,583][42004] Updated weights for policy 0, policy_version 51086 (0.0029) +[2024-11-08 07:00:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 209264640. Throughput: 0: 1675.6. Samples: 47312112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:57,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 07:01:01,480][42004] Updated weights for policy 0, policy_version 51096 (0.0043) +[2024-11-08 07:01:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6664.7). Total num frames: 209293312. Throughput: 0: 1685.4. Samples: 47317528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 07:01:07,724][42004] Updated weights for policy 0, policy_version 51106 (0.0024) +[2024-11-08 07:01:07,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 209330176. Throughput: 0: 1668.1. Samples: 47326648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:07,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 07:01:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 209362944. Throughput: 0: 1675.8. Samples: 47337098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:12,935][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 07:01:14,029][42004] Updated weights for policy 0, policy_version 51116 (0.0027) +[2024-11-08 07:01:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 209399808. Throughput: 0: 1683.2. Samples: 47342270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:17,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 07:01:21,779][42004] Updated weights for policy 0, policy_version 51126 (0.0033) +[2024-11-08 07:01:22,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6485.6, 300 sec: 6581.4). Total num frames: 209416192. Throughput: 0: 1592.9. Samples: 47349286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:22,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 07:01:27,565][42004] Updated weights for policy 0, policy_version 51136 (0.0025) +[2024-11-08 07:01:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.8, 300 sec: 6609.1). Total num frames: 209453056. Throughput: 0: 1579.2. Samples: 47359906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:27,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 07:01:32,877][42004] Updated weights for policy 0, policy_version 51146 (0.0028) +[2024-11-08 07:01:32,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 209494016. Throughput: 0: 1652.8. Samples: 47365544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:32,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 07:01:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 209522688. Throughput: 0: 1664.1. Samples: 47375936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:37,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 07:01:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051154_209526784.pth... +[2024-11-08 07:01:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050766_207937536.pth +[2024-11-08 07:01:39,444][42004] Updated weights for policy 0, policy_version 51156 (0.0039) +[2024-11-08 07:01:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 209559552. Throughput: 0: 1638.6. Samples: 47385846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:42,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 07:01:45,031][42004] Updated weights for policy 0, policy_version 51166 (0.0034) +[2024-11-08 07:01:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 209596416. Throughput: 0: 1641.5. Samples: 47391396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:47,933][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 07:01:50,727][42004] Updated weights for policy 0, policy_version 51176 (0.0034) +[2024-11-08 07:01:54,959][41694] Fps is (10 sec: 5789.3, 60 sec: 6471.4, 300 sec: 6591.6). Total num frames: 209629184. Throughput: 0: 1608.0. Samples: 47402270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:54,961][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 07:01:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 209649664. Throughput: 0: 1597.7. Samples: 47408996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:01:57,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 07:01:58,834][42004] Updated weights for policy 0, policy_version 51186 (0.0022) +[2024-11-08 07:02:02,931][41694] Fps is (10 sec: 6679.2, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209682432. Throughput: 0: 1597.0. Samples: 47414136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:02,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:02:04,653][42004] Updated weights for policy 0, policy_version 51196 (0.0034) +[2024-11-08 07:02:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209719296. Throughput: 0: 1674.0. Samples: 47424616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:07,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 07:02:10,780][42004] Updated weights for policy 0, policy_version 51206 (0.0028) +[2024-11-08 07:02:12,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 209752064. Throughput: 0: 1658.2. Samples: 47434526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:12,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 07:02:16,725][42004] Updated weights for policy 0, policy_version 51216 (0.0029) +[2024-11-08 07:02:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 209788928. Throughput: 0: 1637.2. Samples: 47439220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:17,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 07:02:22,222][42004] Updated weights for policy 0, policy_version 51226 (0.0028) +[2024-11-08 07:02:22,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 209825792. Throughput: 0: 1657.1. Samples: 47450504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:22,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 07:02:29,614][41694] Fps is (10 sec: 5960.6, 60 sec: 6574.1, 300 sec: 6599.3). Total num frames: 209858560. Throughput: 0: 1629.1. Samples: 47461898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:29,616][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:02:30,173][42004] Updated weights for policy 0, policy_version 51236 (0.0026) +[2024-11-08 07:02:32,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209883136. Throughput: 0: 1590.7. Samples: 47462976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 07:02:35,600][42004] Updated weights for policy 0, policy_version 51246 (0.0028) +[2024-11-08 07:02:37,932][41694] Fps is (10 sec: 7386.4, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 209920000. Throughput: 0: 1675.3. Samples: 47474262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:37,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 07:02:41,275][42004] Updated weights for policy 0, policy_version 51256 (0.0038) +[2024-11-08 07:02:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 209952768. Throughput: 0: 1688.2. Samples: 47484966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:42,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 07:02:47,588][42004] Updated weights for policy 0, policy_version 51266 (0.0027) +[2024-11-08 07:02:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 209985536. Throughput: 0: 1673.7. Samples: 47489452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:47,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 07:02:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6782.8, 300 sec: 6636.9). Total num frames: 210022400. Throughput: 0: 1687.2. Samples: 47500540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:52,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 07:02:53,031][42004] Updated weights for policy 0, policy_version 51276 (0.0025) +[2024-11-08 07:02:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6895.0, 300 sec: 6650.8). Total num frames: 210063360. Throughput: 0: 1719.0. Samples: 47511880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:57,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 07:02:58,323][42004] Updated weights for policy 0, policy_version 51286 (0.0036) +[2024-11-08 07:03:04,189][41694] Fps is (10 sec: 5821.6, 60 sec: 6619.7, 300 sec: 6581.1). Total num frames: 210087936. Throughput: 0: 1680.0. Samples: 47516932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:04,192][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:03:06,629][42004] Updated weights for policy 0, policy_version 51296 (0.0035) +[2024-11-08 07:03:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6621.8, 300 sec: 6595.2). Total num frames: 210116608. Throughput: 0: 1621.2. Samples: 47523458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:07,937][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 07:03:12,098][42004] Updated weights for policy 0, policy_version 51306 (0.0033) +[2024-11-08 07:03:12,931][41694] Fps is (10 sec: 7496.1, 60 sec: 6690.2, 300 sec: 6595.3). Total num frames: 210153472. Throughput: 0: 1680.3. Samples: 47534684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:12,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 07:03:17,845][42004] Updated weights for policy 0, policy_version 51316 (0.0028) +[2024-11-08 07:03:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.1, 300 sec: 6595.3). Total num frames: 210190336. Throughput: 0: 1713.8. Samples: 47540096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:17,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 07:03:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 210223104. Throughput: 0: 1688.3. Samples: 47550234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:22,936][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 07:03:23,671][42004] Updated weights for policy 0, policy_version 51326 (0.0034) +[2024-11-08 07:03:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6953.3, 300 sec: 6664.7). Total num frames: 210264064. Throughput: 0: 1710.6. Samples: 47561942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:27,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 07:03:28,878][42004] Updated weights for policy 0, policy_version 51336 (0.0028) +[2024-11-08 07:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6664.7). Total num frames: 210300928. Throughput: 0: 1738.2. Samples: 47567672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:32,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 07:03:34,417][42004] Updated weights for policy 0, policy_version 51346 (0.0026) +[2024-11-08 07:03:38,780][41694] Fps is (10 sec: 5663.6, 60 sec: 6664.2, 300 sec: 6590.2). Total num frames: 210325504. Throughput: 0: 1708.0. Samples: 47578848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:38,782][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 07:03:38,796][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051349_210325504.pth... +[2024-11-08 07:03:38,940][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050960_208732160.pth +[2024-11-08 07:03:42,391][42004] Updated weights for policy 0, policy_version 51356 (0.0023) +[2024-11-08 07:03:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6758.3, 300 sec: 6623.0). Total num frames: 210358272. Throughput: 0: 1634.5. Samples: 47585434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:42,937][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 07:03:47,931][41694] Fps is (10 sec: 7161.0, 60 sec: 6758.4, 300 sec: 6609.2). Total num frames: 210391040. Throughput: 0: 1685.5. Samples: 47590660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:47,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 07:03:47,943][42004] Updated weights for policy 0, policy_version 51366 (0.0042) +[2024-11-08 07:03:52,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 210419712. Throughput: 0: 1720.6. Samples: 47600886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:52,935][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 07:03:57,786][42004] Updated weights for policy 0, policy_version 51376 (0.0043) +[2024-11-08 07:03:57,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6212.3, 300 sec: 6595.3). Total num frames: 210436096. Throughput: 0: 1576.9. Samples: 47605646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:57,937][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 07:04:02,933][41694] Fps is (10 sec: 4095.6, 60 sec: 6345.1, 300 sec: 6553.6). Total num frames: 210460672. Throughput: 0: 1527.3. Samples: 47608824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:04:02,943][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 07:04:05,143][42004] Updated weights for policy 0, policy_version 51386 (0.0036) +[2024-11-08 07:04:07,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6280.6, 300 sec: 6539.7). Total num frames: 210493440. Throughput: 0: 1508.3. Samples: 47618108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:07,934][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 07:04:13,017][41694] Fps is (10 sec: 4874.3, 60 sec: 5930.8, 300 sec: 6468.4). Total num frames: 210509824. Throughput: 0: 1354.5. Samples: 47623010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:13,018][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 07:04:13,694][42004] Updated weights for policy 0, policy_version 51396 (0.0039) +[2024-11-08 07:04:17,932][41694] Fps is (10 sec: 4915.4, 60 sec: 5870.9, 300 sec: 6456.4). Total num frames: 210542592. Throughput: 0: 1371.4. Samples: 47629386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:17,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:04:20,093][42004] Updated weights for policy 0, policy_version 51406 (0.0031) +[2024-11-08 07:04:22,931][41694] Fps is (10 sec: 6609.9, 60 sec: 5870.9, 300 sec: 6442.5). Total num frames: 210575360. Throughput: 0: 1357.9. Samples: 47638802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:22,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 07:04:26,384][42004] Updated weights for policy 0, policy_version 51416 (0.0044) +[2024-11-08 07:04:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 5734.4, 300 sec: 6428.6). Total num frames: 210608128. Throughput: 0: 1402.4. Samples: 47648540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:27,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 07:04:32,216][42004] Updated weights for policy 0, policy_version 51426 (0.0026) +[2024-11-08 07:04:32,933][41694] Fps is (10 sec: 6962.3, 60 sec: 5734.3, 300 sec: 6498.0). Total num frames: 210644992. Throughput: 0: 1398.2. Samples: 47653580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:32,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 07:04:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5955.1, 300 sec: 6484.2). Total num frames: 210677760. Throughput: 0: 1402.8. Samples: 47664012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:37,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:04:38,357][42004] Updated weights for policy 0, policy_version 51436 (0.0061) +[2024-11-08 07:04:42,932][41694] Fps is (10 sec: 6963.9, 60 sec: 5939.3, 300 sec: 6470.3). Total num frames: 210714624. Throughput: 0: 1534.4. Samples: 47674696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:42,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 07:04:44,083][42004] Updated weights for policy 0, policy_version 51446 (0.0023) +[2024-11-08 07:04:47,932][41694] Fps is (10 sec: 6143.8, 60 sec: 5802.6, 300 sec: 6442.5). Total num frames: 210739200. Throughput: 0: 1576.9. Samples: 47679784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:47,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 07:04:51,146][42004] Updated weights for policy 0, policy_version 51456 (0.0028) +[2024-11-08 07:04:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 6442.5). Total num frames: 210771968. Throughput: 0: 1556.0. Samples: 47688126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:52,937][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 07:04:56,840][42004] Updated weights for policy 0, policy_version 51466 (0.0036) +[2024-11-08 07:04:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6212.3, 300 sec: 6442.5). Total num frames: 210808832. Throughput: 0: 1691.9. Samples: 47699002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:57,936][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 07:05:02,750][42004] Updated weights for policy 0, policy_version 51476 (0.0027) +[2024-11-08 07:05:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.2, 300 sec: 6442.5). Total num frames: 210845696. Throughput: 0: 1659.9. Samples: 47704080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:05:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 07:05:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 210878464. Throughput: 0: 1683.7. Samples: 47714570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:05:07,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:05:08,894][42004] Updated weights for policy 0, policy_version 51486 (0.0035) +[2024-11-08 07:05:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6631.3, 300 sec: 6484.2). Total num frames: 210907136. Throughput: 0: 1672.0. Samples: 47723782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:12,933][41694] Avg episode reward: [(0, '4.707')] +[2024-11-08 07:05:15,602][42004] Updated weights for policy 0, policy_version 51496 (0.0038) +[2024-11-08 07:05:17,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 210939904. Throughput: 0: 1662.0. Samples: 47728370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:17,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 07:05:22,902][42004] Updated weights for policy 0, policy_version 51506 (0.0026) +[2024-11-08 07:05:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 210968576. Throughput: 0: 1603.6. Samples: 47736176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:22,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 07:05:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6456.4). Total num frames: 210997248. Throughput: 0: 1595.1. Samples: 47746476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:27,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 07:05:28,988][42004] Updated weights for policy 0, policy_version 51516 (0.0026) +[2024-11-08 07:05:32,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6485.5, 300 sec: 6442.5). Total num frames: 211034112. Throughput: 0: 1593.9. Samples: 47751510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:32,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 07:05:34,796][42004] Updated weights for policy 0, policy_version 51526 (0.0038) +[2024-11-08 07:05:37,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211070976. Throughput: 0: 1644.7. Samples: 47762140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:37,934][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 07:05:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051531_211070976.pth... +[2024-11-08 07:05:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051154_209526784.pth +[2024-11-08 07:05:40,935][42004] Updated weights for policy 0, policy_version 51536 (0.0029) +[2024-11-08 07:05:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 211099648. Throughput: 0: 1620.8. Samples: 47771938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:42,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:05:47,673][42004] Updated weights for policy 0, policy_version 51546 (0.0036) +[2024-11-08 07:05:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211132416. Throughput: 0: 1597.9. Samples: 47775988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:47,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:05:53,348][41694] Fps is (10 sec: 6291.6, 60 sec: 6508.4, 300 sec: 6433.5). Total num frames: 211165184. Throughput: 0: 1586.3. Samples: 47786616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:53,354][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 07:05:54,242][42004] Updated weights for policy 0, policy_version 51556 (0.0032) +[2024-11-08 07:05:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6456.4). Total num frames: 211197952. Throughput: 0: 1592.3. Samples: 47795436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:57,934][41694] Avg episode reward: [(0, '4.757')] +[2024-11-08 07:06:00,279][42004] Updated weights for policy 0, policy_version 51566 (0.0036) +[2024-11-08 07:06:02,932][41694] Fps is (10 sec: 6838.4, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 211230720. Throughput: 0: 1609.6. Samples: 47800804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:06:02,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 07:06:06,195][42004] Updated weights for policy 0, policy_version 51576 (0.0024) +[2024-11-08 07:06:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 211263488. Throughput: 0: 1665.5. Samples: 47811124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:06:07,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 07:06:12,804][42004] Updated weights for policy 0, policy_version 51586 (0.0041) +[2024-11-08 07:06:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6428.6). Total num frames: 211296256. Throughput: 0: 1639.4. Samples: 47820250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:12,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 07:06:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 211324928. Throughput: 0: 1623.9. Samples: 47824586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:17,934][41694] Avg episode reward: [(0, '4.727')] +[2024-11-08 07:06:19,682][42004] Updated weights for policy 0, policy_version 51596 (0.0039) +[2024-11-08 07:06:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 6456.4). Total num frames: 211357696. Throughput: 0: 1603.2. Samples: 47834282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:22,935][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 07:06:25,429][42004] Updated weights for policy 0, policy_version 51606 (0.0030) +[2024-11-08 07:06:27,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.3, 300 sec: 6414.7). Total num frames: 211386368. Throughput: 0: 1591.8. Samples: 47843570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:27,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 07:06:31,946][42004] Updated weights for policy 0, policy_version 51616 (0.0031) +[2024-11-08 07:06:32,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6442.5). Total num frames: 211423232. Throughput: 0: 1610.3. Samples: 47848452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:32,936][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 07:06:37,722][42004] Updated weights for policy 0, policy_version 51626 (0.0037) +[2024-11-08 07:06:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6442.5). Total num frames: 211460096. Throughput: 0: 1631.5. Samples: 47859354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:37,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:06:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.8, 300 sec: 6442.5). Total num frames: 211496960. Throughput: 0: 1663.3. Samples: 47870284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:42,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:06:43,400][42004] Updated weights for policy 0, policy_version 51636 (0.0033) +[2024-11-08 07:06:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6487.1). Total num frames: 211529728. Throughput: 0: 1658.0. Samples: 47875416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:47,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 07:06:49,675][42004] Updated weights for policy 0, policy_version 51646 (0.0036) +[2024-11-08 07:06:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6599.4, 300 sec: 6470.3). Total num frames: 211558400. Throughput: 0: 1639.4. Samples: 47884896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:52,935][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 07:06:56,008][42004] Updated weights for policy 0, policy_version 51656 (0.0027) +[2024-11-08 07:06:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 211595264. Throughput: 0: 1663.5. Samples: 47895106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:57,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 07:07:02,921][42004] Updated weights for policy 0, policy_version 51666 (0.0042) +[2024-11-08 07:07:02,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211623936. Throughput: 0: 1648.8. Samples: 47898780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:02,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 07:07:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211656704. Throughput: 0: 1657.1. Samples: 47908852. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:07,934][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 07:07:09,048][42004] Updated weights for policy 0, policy_version 51676 (0.0043) +[2024-11-08 07:07:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211689472. Throughput: 0: 1666.7. Samples: 47918570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:12,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 07:07:15,026][42004] Updated weights for policy 0, policy_version 51686 (0.0032) +[2024-11-08 07:07:17,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.8, 300 sec: 6428.6). Total num frames: 211722240. Throughput: 0: 1679.4. Samples: 47924026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:17,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 07:07:20,885][42004] Updated weights for policy 0, policy_version 51696 (0.0032) +[2024-11-08 07:07:22,935][41694] Fps is (10 sec: 6551.1, 60 sec: 6621.4, 300 sec: 6465.4). Total num frames: 211755008. Throughput: 0: 1672.4. Samples: 47934616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:22,938][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 07:07:27,708][42004] Updated weights for policy 0, policy_version 51706 (0.0044) +[2024-11-08 07:07:27,932][41694] Fps is (10 sec: 6554.4, 60 sec: 6690.1, 300 sec: 6456.4). Total num frames: 211787776. Throughput: 0: 1622.6. Samples: 47943300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:27,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 07:07:32,932][41694] Fps is (10 sec: 6556.0, 60 sec: 6621.9, 300 sec: 6442.5). Total num frames: 211820544. Throughput: 0: 1630.2. Samples: 47948776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:32,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 07:07:33,831][42004] Updated weights for policy 0, policy_version 51716 (0.0033) +[2024-11-08 07:07:37,934][41694] Fps is (10 sec: 6552.2, 60 sec: 6553.4, 300 sec: 6442.5). Total num frames: 211853312. Throughput: 0: 1625.6. Samples: 47958050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:37,936][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 07:07:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051723_211857408.pth... +[2024-11-08 07:07:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051349_210325504.pth +[2024-11-08 07:07:39,981][42004] Updated weights for policy 0, policy_version 51726 (0.0037) +[2024-11-08 07:07:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6442.5). Total num frames: 211886080. Throughput: 0: 1626.1. Samples: 47968282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:42,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:07:45,713][42004] Updated weights for policy 0, policy_version 51736 (0.0030) +[2024-11-08 07:07:47,932][41694] Fps is (10 sec: 6964.5, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211922944. Throughput: 0: 1667.6. Samples: 47973822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:47,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:07:51,650][42004] Updated weights for policy 0, policy_version 51746 (0.0027) +[2024-11-08 07:07:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6428.6). Total num frames: 211959808. Throughput: 0: 1677.1. Samples: 47984320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:52,933][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 07:07:57,935][41694] Fps is (10 sec: 6551.8, 60 sec: 6553.3, 300 sec: 6470.0). Total num frames: 211988480. Throughput: 0: 1679.5. Samples: 47994154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:57,941][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 07:07:58,092][42004] Updated weights for policy 0, policy_version 51756 (0.0028) +[2024-11-08 07:08:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6456.4). Total num frames: 212021248. Throughput: 0: 1655.5. Samples: 47998522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:02,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 07:08:04,994][42004] Updated weights for policy 0, policy_version 51766 (0.0042) +[2024-11-08 07:08:07,932][41694] Fps is (10 sec: 6145.7, 60 sec: 6553.6, 300 sec: 6428.6). Total num frames: 212049920. Throughput: 0: 1621.9. Samples: 48007594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:07,937][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:08:10,980][42004] Updated weights for policy 0, policy_version 51776 (0.0026) +[2024-11-08 07:08:12,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6621.8, 300 sec: 6428.6). Total num frames: 212086784. Throughput: 0: 1661.3. Samples: 48018060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:12,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:08:16,895][42004] Updated weights for policy 0, policy_version 51786 (0.0028) +[2024-11-08 07:08:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.3, 300 sec: 6442.5). Total num frames: 212123648. Throughput: 0: 1647.4. Samples: 48022908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:08:22,530][42004] Updated weights for policy 0, policy_version 51796 (0.0026) +[2024-11-08 07:08:22,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.5, 300 sec: 6414.7). Total num frames: 212156416. Throughput: 0: 1688.5. Samples: 48034030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:22,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:08:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6414.8). Total num frames: 212193280. Throughput: 0: 1693.6. Samples: 48044494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:27,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:08:28,329][42004] Updated weights for policy 0, policy_version 51806 (0.0028) +[2024-11-08 07:08:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6447.2). Total num frames: 212221952. Throughput: 0: 1675.9. Samples: 48049236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:32,936][41694] Avg episode reward: [(0, '4.695')] +[2024-11-08 07:08:34,846][42004] Updated weights for policy 0, policy_version 51816 (0.0043) +[2024-11-08 07:08:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.4, 300 sec: 6428.7). Total num frames: 212254720. Throughput: 0: 1669.1. Samples: 48059428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:37,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 07:08:41,355][42004] Updated weights for policy 0, policy_version 51826 (0.0030) +[2024-11-08 07:08:42,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6428.6). Total num frames: 212287488. Throughput: 0: 1654.4. Samples: 48068600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:42,939][41694] Avg episode reward: [(0, '4.768')] +[2024-11-08 07:08:47,236][42004] Updated weights for policy 0, policy_version 51836 (0.0033) +[2024-11-08 07:08:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6456.4). Total num frames: 212324352. Throughput: 0: 1670.9. Samples: 48073712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:47,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 07:08:52,703][42004] Updated weights for policy 0, policy_version 51846 (0.0026) +[2024-11-08 07:08:52,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6525.8). Total num frames: 212361216. Throughput: 0: 1712.1. Samples: 48084638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:52,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 07:08:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6827.0, 300 sec: 6567.5). Total num frames: 212398080. Throughput: 0: 1715.5. Samples: 48095256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 07:08:58,474][42004] Updated weights for policy 0, policy_version 51856 (0.0032) +[2024-11-08 07:09:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6567.5). Total num frames: 212430848. Throughput: 0: 1731.1. Samples: 48100806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:02,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 07:09:04,893][42004] Updated weights for policy 0, policy_version 51866 (0.0034) +[2024-11-08 07:09:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.7, 300 sec: 6611.0). Total num frames: 212459520. Throughput: 0: 1692.1. Samples: 48110174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:07,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 07:09:11,398][42004] Updated weights for policy 0, policy_version 51876 (0.0027) +[2024-11-08 07:09:12,935][41694] Fps is (10 sec: 6141.9, 60 sec: 6758.1, 300 sec: 6609.1). Total num frames: 212492288. Throughput: 0: 1668.1. Samples: 48119564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:12,942][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 07:09:17,294][42004] Updated weights for policy 0, policy_version 51886 (0.0032) +[2024-11-08 07:09:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 212529152. Throughput: 0: 1670.8. Samples: 48124420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:17,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 07:09:22,837][42004] Updated weights for policy 0, policy_version 51896 (0.0029) +[2024-11-08 07:09:22,932][41694] Fps is (10 sec: 7374.9, 60 sec: 6826.6, 300 sec: 6636.9). Total num frames: 212566016. Throughput: 0: 1689.3. Samples: 48135446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:22,933][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 07:09:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 212602880. Throughput: 0: 1735.2. Samples: 48146682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:27,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 07:09:28,400][42004] Updated weights for policy 0, policy_version 51906 (0.0027) +[2024-11-08 07:09:32,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6650.8). Total num frames: 212639744. Throughput: 0: 1739.8. Samples: 48152002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:32,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 07:09:33,865][42004] Updated weights for policy 0, policy_version 51916 (0.0025) +[2024-11-08 07:09:37,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6623.0). Total num frames: 212668416. Throughput: 0: 1733.5. Samples: 48162644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:37,936][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 07:09:37,986][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051922_212672512.pth... +[2024-11-08 07:09:38,133][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051531_211070976.pth +[2024-11-08 07:09:40,610][42004] Updated weights for policy 0, policy_version 51926 (0.0034) +[2024-11-08 07:09:43,313][41694] Fps is (10 sec: 5918.2, 60 sec: 6851.4, 300 sec: 6642.2). Total num frames: 212701184. Throughput: 0: 1691.2. Samples: 48172006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:43,318][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 07:09:47,199][42004] Updated weights for policy 0, policy_version 51936 (0.0025) +[2024-11-08 07:09:47,931][41694] Fps is (10 sec: 6554.0, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 212733952. Throughput: 0: 1672.8. Samples: 48176084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 07:09:52,932][41694] Fps is (10 sec: 6813.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 212766720. Throughput: 0: 1700.3. Samples: 48186688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:52,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 07:09:53,069][42004] Updated weights for policy 0, policy_version 51946 (0.0021) +[2024-11-08 07:09:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 212803584. Throughput: 0: 1731.0. Samples: 48197452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:57,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 07:09:58,826][42004] Updated weights for policy 0, policy_version 51956 (0.0037) +[2024-11-08 07:10:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6758.3, 300 sec: 6636.9). Total num frames: 212836352. Throughput: 0: 1733.5. Samples: 48202428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:02,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 07:10:05,084][42004] Updated weights for policy 0, policy_version 51966 (0.0038) +[2024-11-08 07:10:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 212869120. Throughput: 0: 1710.7. Samples: 48212428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:07,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 07:10:11,700][42004] Updated weights for policy 0, policy_version 51976 (0.0035) +[2024-11-08 07:10:12,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6758.8, 300 sec: 6636.9). Total num frames: 212897792. Throughput: 0: 1659.8. Samples: 48221374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 07:10:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 212922368. Throughput: 0: 1620.2. Samples: 48224910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:17,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 07:10:19,699][42004] Updated weights for policy 0, policy_version 51986 (0.0034) +[2024-11-08 07:10:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 212955136. Throughput: 0: 1577.3. Samples: 48233622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:22,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:10:25,669][42004] Updated weights for policy 0, policy_version 51996 (0.0032) +[2024-11-08 07:10:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 212987904. Throughput: 0: 1609.4. Samples: 48243814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:27,934][41694] Avg episode reward: [(0, '4.754')] +[2024-11-08 07:10:31,666][42004] Updated weights for policy 0, policy_version 52006 (0.0036) +[2024-11-08 07:10:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 213024768. Throughput: 0: 1622.0. Samples: 48249074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:32,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 07:10:37,842][42004] Updated weights for policy 0, policy_version 52016 (0.0025) +[2024-11-08 07:10:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 213057536. Throughput: 0: 1597.3. Samples: 48258566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:37,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 07:10:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6526.8, 300 sec: 6636.9). Total num frames: 213090304. Throughput: 0: 1592.7. Samples: 48269124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:42,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 07:10:43,743][42004] Updated weights for policy 0, policy_version 52026 (0.0028) +[2024-11-08 07:10:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6646.3). Total num frames: 213123072. Throughput: 0: 1588.3. Samples: 48273900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:47,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 07:10:51,007][42004] Updated weights for policy 0, policy_version 52036 (0.0057) +[2024-11-08 07:10:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 213151744. Throughput: 0: 1555.6. Samples: 48282428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:52,932][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 07:10:56,984][42004] Updated weights for policy 0, policy_version 52046 (0.0043) +[2024-11-08 07:10:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6623.0). Total num frames: 213184512. Throughput: 0: 1589.3. Samples: 48292894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:57,935][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 07:11:02,690][42004] Updated weights for policy 0, policy_version 52056 (0.0033) +[2024-11-08 07:11:02,935][41694] Fps is (10 sec: 6961.0, 60 sec: 6416.8, 300 sec: 6636.8). Total num frames: 213221376. Throughput: 0: 1628.7. Samples: 48298208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:02,937][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 07:11:07,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6417.0, 300 sec: 6636.9). Total num frames: 213254144. Throughput: 0: 1667.6. Samples: 48308664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:07,937][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 07:11:08,548][42004] Updated weights for policy 0, policy_version 52066 (0.0026) +[2024-11-08 07:11:12,932][41694] Fps is (10 sec: 6555.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 213286912. Throughput: 0: 1655.5. Samples: 48318314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:12,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:11:15,271][42004] Updated weights for policy 0, policy_version 52076 (0.0035) +[2024-11-08 07:11:17,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213319680. Throughput: 0: 1642.5. Samples: 48322986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:17,935][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 07:11:21,553][42004] Updated weights for policy 0, policy_version 52086 (0.0026) +[2024-11-08 07:11:22,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 213348352. Throughput: 0: 1646.0. Samples: 48332636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:22,936][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 07:11:27,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 213381120. Throughput: 0: 1620.6. Samples: 48342052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:27,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 07:11:28,052][42004] Updated weights for policy 0, policy_version 52096 (0.0027) +[2024-11-08 07:11:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 213417984. Throughput: 0: 1625.4. Samples: 48347044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 07:11:33,757][42004] Updated weights for policy 0, policy_version 52106 (0.0029) +[2024-11-08 07:11:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 213450752. Throughput: 0: 1669.1. Samples: 48357538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:37,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 07:11:38,049][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052113_213454848.pth... +[2024-11-08 07:11:38,161][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051723_211857408.pth +[2024-11-08 07:11:39,900][42004] Updated weights for policy 0, policy_version 52116 (0.0028) +[2024-11-08 07:11:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 213487616. Throughput: 0: 1669.4. Samples: 48368016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:42,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 07:11:45,723][42004] Updated weights for policy 0, policy_version 52126 (0.0026) +[2024-11-08 07:11:47,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213520384. Throughput: 0: 1669.2. Samples: 48373320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:47,935][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 07:11:51,781][42004] Updated weights for policy 0, policy_version 52136 (0.0043) +[2024-11-08 07:11:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 213553152. Throughput: 0: 1664.9. Samples: 48383584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:52,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:11:57,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 213581824. Throughput: 0: 1637.0. Samples: 48391976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:57,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 07:11:58,715][42004] Updated weights for policy 0, policy_version 52146 (0.0035) +[2024-11-08 07:12:02,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6553.9, 300 sec: 6636.9). Total num frames: 213614592. Throughput: 0: 1648.5. Samples: 48397170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:02,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 07:12:05,330][42004] Updated weights for policy 0, policy_version 52156 (0.0035) +[2024-11-08 07:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 213647360. Throughput: 0: 1637.6. Samples: 48406328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:07,937][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 07:12:11,284][42004] Updated weights for policy 0, policy_version 52166 (0.0024) +[2024-11-08 07:12:12,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 213680128. Throughput: 0: 1660.6. Samples: 48416780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:12,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 07:12:16,995][42004] Updated weights for policy 0, policy_version 52176 (0.0031) +[2024-11-08 07:12:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6650.9). Total num frames: 213716992. Throughput: 0: 1669.4. Samples: 48422166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:17,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 07:12:22,666][42004] Updated weights for policy 0, policy_version 52186 (0.0025) +[2024-11-08 07:12:22,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 213753856. Throughput: 0: 1672.0. Samples: 48432776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:22,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 07:12:27,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 213782528. Throughput: 0: 1665.8. Samples: 48442976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:27,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 07:12:29,609][42004] Updated weights for policy 0, policy_version 52196 (0.0038) +[2024-11-08 07:12:32,934][41694] Fps is (10 sec: 6142.4, 60 sec: 6621.6, 300 sec: 6650.8). Total num frames: 213815296. Throughput: 0: 1633.2. Samples: 48446818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:32,936][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 07:12:35,617][42004] Updated weights for policy 0, policy_version 52206 (0.0024) +[2024-11-08 07:12:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 213852160. Throughput: 0: 1637.4. Samples: 48457266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:37,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 07:12:41,278][42004] Updated weights for policy 0, policy_version 52216 (0.0035) +[2024-11-08 07:12:42,932][41694] Fps is (10 sec: 6964.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213884928. Throughput: 0: 1691.3. Samples: 48468086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:42,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 07:12:46,902][42004] Updated weights for policy 0, policy_version 52226 (0.0024) +[2024-11-08 07:12:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 213925888. Throughput: 0: 1684.3. Samples: 48472964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:47,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 07:12:52,649][42004] Updated weights for policy 0, policy_version 52236 (0.0031) +[2024-11-08 07:12:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 213958656. Throughput: 0: 1732.5. Samples: 48484290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:52,937][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:12:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 213995520. Throughput: 0: 1742.8. Samples: 48495206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:57,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:12:58,341][42004] Updated weights for policy 0, policy_version 52246 (0.0033) +[2024-11-08 07:13:02,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6758.5, 300 sec: 6678.6). Total num frames: 214020096. Throughput: 0: 1718.0. Samples: 48499474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:02,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 07:13:06,235][42004] Updated weights for policy 0, policy_version 52256 (0.0035) +[2024-11-08 07:13:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 214048768. Throughput: 0: 1657.5. Samples: 48507364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:07,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 07:13:11,933][42004] Updated weights for policy 0, policy_version 52266 (0.0025) +[2024-11-08 07:13:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 214085632. Throughput: 0: 1670.7. Samples: 48518156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:12,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 07:13:17,679][42004] Updated weights for policy 0, policy_version 52276 (0.0024) +[2024-11-08 07:13:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214122496. Throughput: 0: 1699.1. Samples: 48523274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:17,935][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 07:13:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214159360. Throughput: 0: 1715.6. Samples: 48534470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:22,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:13:23,215][42004] Updated weights for policy 0, policy_version 52286 (0.0038) +[2024-11-08 07:13:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 214196224. Throughput: 0: 1718.0. Samples: 48545396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:27,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 07:13:28,718][42004] Updated weights for policy 0, policy_version 52296 (0.0028) +[2024-11-08 07:13:33,200][41694] Fps is (10 sec: 6780.8, 60 sec: 6864.5, 300 sec: 6686.4). Total num frames: 214228992. Throughput: 0: 1718.2. Samples: 48550746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:33,203][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 07:13:35,708][42004] Updated weights for policy 0, policy_version 52306 (0.0036) +[2024-11-08 07:13:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 214257664. Throughput: 0: 1670.0. Samples: 48559438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:37,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 07:13:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052309_214257664.pth... +[2024-11-08 07:13:38,119][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051922_212672512.pth +[2024-11-08 07:13:41,970][42004] Updated weights for policy 0, policy_version 52316 (0.0049) +[2024-11-08 07:13:42,931][41694] Fps is (10 sec: 6313.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214290432. Throughput: 0: 1645.8. Samples: 48569266. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:42,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:13:47,568][42004] Updated weights for policy 0, policy_version 52326 (0.0047) +[2024-11-08 07:13:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 214327296. Throughput: 0: 1676.7. Samples: 48574924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:47,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 07:13:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214364160. Throughput: 0: 1724.0. Samples: 48584946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:13:52,933][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 07:13:53,472][42004] Updated weights for policy 0, policy_version 52336 (0.0032) +[2024-11-08 07:13:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 214396928. Throughput: 0: 1723.5. Samples: 48595712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:13:57,937][41694] Avg episode reward: [(0, '4.763')] +[2024-11-08 07:13:59,335][42004] Updated weights for policy 0, policy_version 52346 (0.0028) +[2024-11-08 07:14:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214425600. Throughput: 0: 1720.0. Samples: 48600672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:02,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 07:14:06,814][42004] Updated weights for policy 0, policy_version 52356 (0.0035) +[2024-11-08 07:14:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6650.9). Total num frames: 214454272. Throughput: 0: 1650.8. Samples: 48608758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:07,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 07:14:12,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 214487040. Throughput: 0: 1622.4. Samples: 48618402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:12,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 07:14:13,071][42004] Updated weights for policy 0, policy_version 52366 (0.0040) +[2024-11-08 07:14:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 214523904. Throughput: 0: 1625.6. Samples: 48623460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:17,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 07:14:18,671][42004] Updated weights for policy 0, policy_version 52376 (0.0033) +[2024-11-08 07:14:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 214560768. Throughput: 0: 1667.6. Samples: 48634478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:22,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 07:14:24,556][42004] Updated weights for policy 0, policy_version 52386 (0.0040) +[2024-11-08 07:14:27,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 214593536. Throughput: 0: 1684.3. Samples: 48645060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:27,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 07:14:30,459][42004] Updated weights for policy 0, policy_version 52396 (0.0024) +[2024-11-08 07:14:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6651.7, 300 sec: 6636.9). Total num frames: 214626304. Throughput: 0: 1673.1. Samples: 48650212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:32,945][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:14:37,934][41694] Fps is (10 sec: 5323.5, 60 sec: 6485.1, 300 sec: 6603.7). Total num frames: 214646784. Throughput: 0: 1608.8. Samples: 48657346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:37,938][41694] Avg episode reward: [(0, '4.738')] +[2024-11-08 07:14:40,144][42004] Updated weights for policy 0, policy_version 52406 (0.0048) +[2024-11-08 07:14:42,932][41694] Fps is (10 sec: 3686.3, 60 sec: 6212.2, 300 sec: 6539.7). Total num frames: 214663168. Throughput: 0: 1497.1. Samples: 48663080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:42,936][41694] Avg episode reward: [(0, '4.804')] +[2024-11-08 07:14:47,931][41694] Fps is (10 sec: 3687.3, 60 sec: 5939.2, 300 sec: 6498.1). Total num frames: 214683648. Throughput: 0: 1437.1. Samples: 48665340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:47,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 07:14:49,523][42004] Updated weights for policy 0, policy_version 52416 (0.0049) +[2024-11-08 07:14:52,932][41694] Fps is (10 sec: 5324.9, 60 sec: 5870.9, 300 sec: 6484.2). Total num frames: 214716416. Throughput: 0: 1448.6. Samples: 48673946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:52,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 07:14:55,822][42004] Updated weights for policy 0, policy_version 52426 (0.0032) +[2024-11-08 07:14:57,932][41694] Fps is (10 sec: 6553.0, 60 sec: 5870.9, 300 sec: 6484.2). Total num frames: 214749184. Throughput: 0: 1451.2. Samples: 48683708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:14:57,936][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:15:02,517][42004] Updated weights for policy 0, policy_version 52436 (0.0028) +[2024-11-08 07:15:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 5870.9, 300 sec: 6470.3). Total num frames: 214777856. Throughput: 0: 1445.6. Samples: 48688510. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:02,937][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:15:07,934][41694] Fps is (10 sec: 6552.6, 60 sec: 6007.2, 300 sec: 6498.0). Total num frames: 214814720. Throughput: 0: 1412.1. Samples: 48698026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:07,935][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 07:15:08,302][42004] Updated weights for policy 0, policy_version 52446 (0.0034) +[2024-11-08 07:15:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 6511.9). Total num frames: 214843392. Throughput: 0: 1379.5. Samples: 48707136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:12,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 07:15:15,616][42004] Updated weights for policy 0, policy_version 52456 (0.0039) +[2024-11-08 07:15:17,932][41694] Fps is (10 sec: 5735.6, 60 sec: 5802.7, 300 sec: 6498.1). Total num frames: 214872064. Throughput: 0: 1362.3. Samples: 48711518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 07:15:22,548][42004] Updated weights for policy 0, policy_version 52466 (0.0046) +[2024-11-08 07:15:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 5666.1, 300 sec: 6484.2). Total num frames: 214900736. Throughput: 0: 1388.8. Samples: 48719840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:22,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:15:27,933][41694] Fps is (10 sec: 6143.2, 60 sec: 5666.0, 300 sec: 6470.3). Total num frames: 214933504. Throughput: 0: 1479.7. Samples: 48729670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:27,936][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 07:15:28,797][42004] Updated weights for policy 0, policy_version 52476 (0.0032) +[2024-11-08 07:15:32,932][41694] Fps is (10 sec: 6553.7, 60 sec: 5666.1, 300 sec: 6470.3). Total num frames: 214966272. Throughput: 0: 1544.3. Samples: 48734836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:32,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 07:15:35,288][42004] Updated weights for policy 0, policy_version 52486 (0.0033) +[2024-11-08 07:15:37,938][41694] Fps is (10 sec: 6552.6, 60 sec: 5870.9, 300 sec: 6470.2). Total num frames: 214999040. Throughput: 0: 1569.1. Samples: 48744562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:37,941][41694] Avg episode reward: [(0, '4.704')] +[2024-11-08 07:15:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052490_214999040.pth... +[2024-11-08 07:15:38,137][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052113_213454848.pth +[2024-11-08 07:15:41,527][42004] Updated weights for policy 0, policy_version 52496 (0.0032) +[2024-11-08 07:15:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6470.3). Total num frames: 215031808. Throughput: 0: 1568.4. Samples: 48754284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:42,934][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 07:15:47,882][42004] Updated weights for policy 0, policy_version 52506 (0.0025) +[2024-11-08 07:15:47,932][41694] Fps is (10 sec: 6555.6, 60 sec: 6348.8, 300 sec: 6484.2). Total num frames: 215064576. Throughput: 0: 1543.6. Samples: 48757972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:47,933][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 07:15:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6470.3). Total num frames: 215093248. Throughput: 0: 1572.0. Samples: 48768762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:52,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 07:15:54,682][42004] Updated weights for policy 0, policy_version 52516 (0.0029) +[2024-11-08 07:15:57,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6212.3, 300 sec: 6442.6). Total num frames: 215121920. Throughput: 0: 1565.4. Samples: 48777580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:57,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 07:16:01,563][42004] Updated weights for policy 0, policy_version 52526 (0.0048) +[2024-11-08 07:16:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6442.5). Total num frames: 215154688. Throughput: 0: 1561.1. Samples: 48781766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:02,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 07:16:07,873][42004] Updated weights for policy 0, policy_version 52536 (0.0026) +[2024-11-08 07:16:07,933][41694] Fps is (10 sec: 6553.4, 60 sec: 6212.4, 300 sec: 6442.5). Total num frames: 215187456. Throughput: 0: 1585.1. Samples: 48791170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:07,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 07:16:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 215216128. Throughput: 0: 1578.7. Samples: 48800708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:12,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 07:16:14,691][42004] Updated weights for policy 0, policy_version 52546 (0.0039) +[2024-11-08 07:16:17,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 215244800. Throughput: 0: 1560.9. Samples: 48805078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:17,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 07:16:21,154][42004] Updated weights for policy 0, policy_version 52556 (0.0505) +[2024-11-08 07:16:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6428.6). Total num frames: 215277568. Throughput: 0: 1557.4. Samples: 48814642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:22,935][41694] Avg episode reward: [(0, '4.745')] +[2024-11-08 07:16:27,612][42004] Updated weights for policy 0, policy_version 52566 (0.0026) +[2024-11-08 07:16:27,934][41694] Fps is (10 sec: 6552.0, 60 sec: 6280.5, 300 sec: 6414.7). Total num frames: 215310336. Throughput: 0: 1552.8. Samples: 48824164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:27,937][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 07:16:32,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6400.9). Total num frames: 215339008. Throughput: 0: 1567.7. Samples: 48828518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:32,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 07:16:34,354][42004] Updated weights for policy 0, policy_version 52576 (0.0029) +[2024-11-08 07:16:37,932][41694] Fps is (10 sec: 6145.3, 60 sec: 6212.6, 300 sec: 6387.0). Total num frames: 215371776. Throughput: 0: 1545.3. Samples: 48838300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:37,934][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 07:16:40,911][42004] Updated weights for policy 0, policy_version 52586 (0.0034) +[2024-11-08 07:16:42,933][41694] Fps is (10 sec: 6143.2, 60 sec: 6143.9, 300 sec: 6373.1). Total num frames: 215400448. Throughput: 0: 1546.7. Samples: 48847182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:42,938][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 07:16:47,447][42004] Updated weights for policy 0, policy_version 52596 (0.0032) +[2024-11-08 07:16:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6373.1). Total num frames: 215433216. Throughput: 0: 1547.4. Samples: 48851398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:47,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 07:16:52,932][41694] Fps is (10 sec: 6554.3, 60 sec: 6212.3, 300 sec: 6387.0). Total num frames: 215465984. Throughput: 0: 1556.3. Samples: 48861204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:52,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 07:16:53,675][42004] Updated weights for policy 0, policy_version 52606 (0.0033) +[2024-11-08 07:16:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6387.0). Total num frames: 215498752. Throughput: 0: 1555.8. Samples: 48870720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:57,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 07:17:00,172][42004] Updated weights for policy 0, policy_version 52616 (0.0041) +[2024-11-08 07:17:02,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6373.1). Total num frames: 215527424. Throughput: 0: 1577.6. Samples: 48876068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:02,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-08 07:17:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6075.8, 300 sec: 6345.3). Total num frames: 215552000. Throughput: 0: 1527.0. Samples: 48883358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:07,938][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 07:17:08,353][42004] Updated weights for policy 0, policy_version 52626 (0.0031) +[2024-11-08 07:17:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6331.4). Total num frames: 215584768. Throughput: 0: 1529.0. Samples: 48892964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:12,935][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 07:17:14,904][42004] Updated weights for policy 0, policy_version 52636 (0.0048) +[2024-11-08 07:17:17,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6143.9, 300 sec: 6303.7). Total num frames: 215613440. Throughput: 0: 1522.8. Samples: 48897044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:17,938][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 07:17:21,101][42004] Updated weights for policy 0, policy_version 52646 (0.0040) +[2024-11-08 07:17:22,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6143.9, 300 sec: 6317.5). Total num frames: 215646208. Throughput: 0: 1523.9. Samples: 48906876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:22,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 07:17:27,881][42004] Updated weights for policy 0, policy_version 52656 (0.0041) +[2024-11-08 07:17:27,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6144.2, 300 sec: 6317.6). Total num frames: 215678976. Throughput: 0: 1523.1. Samples: 48915718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:27,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 07:17:29,276][41694] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 41694], exiting... +[2024-11-08 07:17:29,285][41694] Runner profile tree view: +main_loop: 29019.2188 +[2024-11-08 07:17:29,294][41694] Collected {0: 215687168}, FPS: 6742.9 +[2024-11-08 07:17:29,328][41991] Stopping Batcher_0... +[2024-11-08 07:17:29,333][41991] Loop batcher_evt_loop terminating... +[2024-11-08 07:17:29,341][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:17:29,520][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052309_214257664.pth +[2024-11-08 07:17:29,533][41991] Stopping LearnerWorker_p0... +[2024-11-08 07:17:29,533][41991] Loop learner_proc0_evt_loop terminating... +[2024-11-08 07:17:29,843][42009] Stopping RolloutWorker_w5... +[2024-11-08 07:17:29,844][42006] Stopping RolloutWorker_w1... +[2024-11-08 07:17:29,850][42006] Loop rollout_proc1_evt_loop terminating... +[2024-11-08 07:17:29,850][42009] Loop rollout_proc5_evt_loop terminating... +[2024-11-08 07:17:29,847][42008] Stopping RolloutWorker_w3... +[2024-11-08 07:17:29,860][42008] Loop rollout_proc3_evt_loop terminating... +[2024-11-08 07:17:29,867][42010] Stopping RolloutWorker_w4... +[2024-11-08 07:17:29,881][42010] Loop rollout_proc4_evt_loop terminating... +[2024-11-08 07:17:29,858][42005] Stopping RolloutWorker_w0... +[2024-11-08 07:17:29,889][42005] Loop rollout_proc0_evt_loop terminating... +[2024-11-08 07:17:29,955][42004] Weights refcount: 2 0 +[2024-11-08 07:17:29,964][42004] Stopping InferenceWorker_p0-w0... +[2024-11-08 07:17:29,964][42004] Loop inference_proc0-0_evt_loop terminating... +[2024-11-08 07:17:29,934][42018] Stopping RolloutWorker_w7... +[2024-11-08 07:17:29,968][42018] Loop rollout_proc7_evt_loop terminating... +[2024-11-08 07:17:30,233][42007] Stopping RolloutWorker_w2... +[2024-11-08 07:17:30,253][42007] Loop rollout_proc2_evt_loop terminating... +[2024-11-08 07:17:31,452][42017] Stopping RolloutWorker_w6... +[2024-11-08 07:17:31,680][42017] Loop rollout_proc6_evt_loop terminating... +[2024-11-08 07:18:22,720][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:18:22,722][41694] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 07:18:22,724][41694] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 07:18:22,726][41694] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 07:18:22,728][41694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:18:22,730][41694] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 07:18:22,732][41694] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:18:22,734][41694] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-08 07:18:22,737][41694] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-08 07:18:22,738][41694] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-08 07:18:22,740][41694] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 07:18:22,742][41694] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 07:18:22,745][41694] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 07:18:22,746][41694] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-08 07:18:22,748][41694] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 07:18:22,824][41694] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 07:18:22,842][41694] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 07:18:22,861][41694] RunningMeanStd input shape: (1,) +[2024-11-08 07:18:22,952][41694] ConvEncoder: input_channels=3 +[2024-11-08 07:18:23,210][41694] Conv encoder output size: 512 +[2024-11-08 07:18:23,216][41694] Policy head output size: 512 +[2024-11-08 07:18:24,814][41694] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:18:27,067][41694] Num frames 100... +[2024-11-08 07:18:27,242][41694] Num frames 200... +[2024-11-08 07:18:27,445][41694] Num frames 300... +[2024-11-08 07:18:28,200][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:18:28,206][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:18:28,262][41694] Num frames 400... +[2024-11-08 07:18:28,436][41694] Num frames 500... +[2024-11-08 07:18:28,660][41694] Num frames 600... +[2024-11-08 07:18:28,872][41694] Num frames 700... +[2024-11-08 07:18:29,088][41694] Num frames 800... +[2024-11-08 07:18:29,284][41694] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 +[2024-11-08 07:18:29,289][41694] Avg episode reward: 5.320, avg true_objective: 4.320 +[2024-11-08 07:18:29,366][41694] Num frames 900... +[2024-11-08 07:18:29,532][41694] Num frames 1000... +[2024-11-08 07:18:29,700][41694] Num frames 1100... +[2024-11-08 07:18:29,942][41694] Num frames 1200... +[2024-11-08 07:18:30,174][41694] Avg episode rewards: #0: 5.647, true rewards: #0: 4.313 +[2024-11-08 07:18:30,178][41694] Avg episode reward: 5.647, avg true_objective: 4.313 +[2024-11-08 07:18:30,204][41694] Num frames 1300... +[2024-11-08 07:18:30,587][41694] Num frames 1400... +[2024-11-08 07:18:31,071][41694] Num frames 1500... +[2024-11-08 07:18:31,256][41694] Num frames 1600... +[2024-11-08 07:18:31,449][41694] Num frames 1700... +[2024-11-08 07:18:31,555][41694] Avg episode rewards: #0: 5.525, true rewards: #0: 4.275 +[2024-11-08 07:18:31,562][41694] Avg episode reward: 5.525, avg true_objective: 4.275 +[2024-11-08 07:18:31,752][41694] Num frames 1800... +[2024-11-08 07:18:31,938][41694] Num frames 1900... +[2024-11-08 07:18:32,185][41694] Avg episode rewards: #0: 5.196, true rewards: #0: 3.996 +[2024-11-08 07:18:32,187][41694] Avg episode reward: 5.196, avg true_objective: 3.996 +[2024-11-08 07:18:32,191][41694] Num frames 2000... +[2024-11-08 07:18:32,440][41694] Num frames 2100... +[2024-11-08 07:18:32,630][41694] Num frames 2200... +[2024-11-08 07:18:32,887][41694] Num frames 2300... +[2024-11-08 07:18:33,110][41694] Avg episode rewards: #0: 4.970, true rewards: #0: 3.970 +[2024-11-08 07:18:33,111][41694] Avg episode reward: 4.970, avg true_objective: 3.970 +[2024-11-08 07:18:33,150][41694] Num frames 2400... +[2024-11-08 07:18:33,359][41694] Num frames 2500... +[2024-11-08 07:18:33,545][41694] Num frames 2600... +[2024-11-08 07:18:33,738][41694] Num frames 2700... +[2024-11-08 07:18:33,924][41694] Avg episode rewards: #0: 4.809, true rewards: #0: 3.951 +[2024-11-08 07:18:33,926][41694] Avg episode reward: 4.809, avg true_objective: 3.951 +[2024-11-08 07:18:33,998][41694] Num frames 2800... +[2024-11-08 07:18:34,215][41694] Num frames 2900... +[2024-11-08 07:18:34,414][41694] Num frames 3000... +[2024-11-08 07:18:34,660][41694] Num frames 3100... +[2024-11-08 07:18:34,895][41694] Avg episode rewards: #0: 4.688, true rewards: #0: 3.937 +[2024-11-08 07:18:34,899][41694] Avg episode reward: 4.688, avg true_objective: 3.937 +[2024-11-08 07:18:35,060][41694] Num frames 3200... +[2024-11-08 07:18:35,233][41694] Num frames 3300... +[2024-11-08 07:18:35,403][41694] Num frames 3400... +[2024-11-08 07:18:35,580][41694] Num frames 3500... +[2024-11-08 07:18:35,694][41694] Avg episode rewards: #0: 4.593, true rewards: #0: 3.927 +[2024-11-08 07:18:35,696][41694] Avg episode reward: 4.593, avg true_objective: 3.927 +[2024-11-08 07:18:35,822][41694] Num frames 3600... +[2024-11-08 07:18:35,993][41694] Num frames 3700... +[2024-11-08 07:18:36,161][41694] Num frames 3800... +[2024-11-08 07:18:36,351][41694] Num frames 3900... +[2024-11-08 07:18:36,523][41694] Num frames 4000... +[2024-11-08 07:18:36,716][41694] Avg episode rewards: #0: 4.878, true rewards: #0: 4.078 +[2024-11-08 07:18:36,721][41694] Avg episode reward: 4.878, avg true_objective: 4.078 +[2024-11-08 07:18:36,786][41694] Num frames 4100... +[2024-11-08 07:18:36,953][41694] Num frames 4200... +[2024-11-08 07:18:37,115][41694] Num frames 4300... +[2024-11-08 07:18:37,291][41694] Num frames 4400... +[2024-11-08 07:18:37,464][41694] Num frames 4500... +[2024-11-08 07:18:37,566][41694] Avg episode rewards: #0: 4.933, true rewards: #0: 4.115 +[2024-11-08 07:18:37,569][41694] Avg episode reward: 4.933, avg true_objective: 4.115 +[2024-11-08 07:18:37,708][41694] Num frames 4600... +[2024-11-08 07:18:37,879][41694] Num frames 4700... +[2024-11-08 07:18:38,059][41694] Num frames 4800... +[2024-11-08 07:18:38,235][41694] Num frames 4900... +[2024-11-08 07:18:38,310][41694] Avg episode rewards: #0: 4.842, true rewards: #0: 4.092 +[2024-11-08 07:18:38,311][41694] Avg episode reward: 4.842, avg true_objective: 4.092 +[2024-11-08 07:18:38,489][41694] Num frames 5000... +[2024-11-08 07:18:38,672][41694] Num frames 5100... +[2024-11-08 07:18:38,873][41694] Num frames 5200... +[2024-11-08 07:18:39,114][41694] Num frames 5300... +[2024-11-08 07:18:39,329][41694] Avg episode rewards: #0: 4.891, true rewards: #0: 4.122 +[2024-11-08 07:18:39,332][41694] Avg episode reward: 4.891, avg true_objective: 4.122 +[2024-11-08 07:18:39,430][41694] Num frames 5400... +[2024-11-08 07:18:39,638][41694] Num frames 5500... +[2024-11-08 07:18:39,838][41694] Num frames 5600... +[2024-11-08 07:18:40,057][41694] Num frames 5700... +[2024-11-08 07:18:40,265][41694] Avg episode rewards: #0: 4.910, true rewards: #0: 4.124 +[2024-11-08 07:18:40,271][41694] Avg episode reward: 4.910, avg true_objective: 4.124 +[2024-11-08 07:18:40,357][41694] Num frames 5800... +[2024-11-08 07:18:40,574][41694] Num frames 5900... +[2024-11-08 07:18:40,785][41694] Num frames 6000... +[2024-11-08 07:18:40,985][41694] Num frames 6100... +[2024-11-08 07:18:41,174][41694] Avg episode rewards: #0: 4.839, true rewards: #0: 4.105 +[2024-11-08 07:18:41,176][41694] Avg episode reward: 4.839, avg true_objective: 4.105 +[2024-11-08 07:18:41,262][41694] Num frames 6200... +[2024-11-08 07:18:41,471][41694] Num frames 6300... +[2024-11-08 07:18:41,683][41694] Num frames 6400... +[2024-11-08 07:18:41,879][41694] Num frames 6500... +[2024-11-08 07:18:42,023][41694] Avg episode rewards: #0: 4.776, true rewards: #0: 4.089 +[2024-11-08 07:18:42,030][41694] Avg episode reward: 4.776, avg true_objective: 4.089 +[2024-11-08 07:18:42,158][41694] Num frames 6600... +[2024-11-08 07:18:42,360][41694] Num frames 6700... +[2024-11-08 07:18:42,586][41694] Num frames 6800... +[2024-11-08 07:18:42,828][41694] Num frames 6900... +[2024-11-08 07:18:43,108][41694] Avg episode rewards: #0: 4.818, true rewards: #0: 4.112 +[2024-11-08 07:18:43,110][41694] Avg episode reward: 4.818, avg true_objective: 4.112 +[2024-11-08 07:18:43,139][41694] Num frames 7000... +[2024-11-08 07:18:43,373][41694] Num frames 7100... +[2024-11-08 07:18:43,615][41694] Num frames 7200... +[2024-11-08 07:18:43,846][41694] Num frames 7300... +[2024-11-08 07:18:44,054][41694] Avg episode rewards: #0: 4.763, true rewards: #0: 4.097 +[2024-11-08 07:18:44,056][41694] Avg episode reward: 4.763, avg true_objective: 4.097 +[2024-11-08 07:18:44,111][41694] Num frames 7400... +[2024-11-08 07:18:44,344][41694] Num frames 7500... +[2024-11-08 07:18:44,535][41694] Num frames 7600... +[2024-11-08 07:18:44,742][41694] Num frames 7700... +[2024-11-08 07:18:44,927][41694] Num frames 7800... +[2024-11-08 07:18:45,082][41694] Avg episode rewards: #0: 4.818, true rewards: #0: 4.134 +[2024-11-08 07:18:45,083][41694] Avg episode reward: 4.818, avg true_objective: 4.134 +[2024-11-08 07:18:45,171][41694] Num frames 7900... +[2024-11-08 07:18:45,385][41694] Num frames 8000... +[2024-11-08 07:18:45,669][41694] Num frames 8100... +[2024-11-08 07:18:45,930][41694] Num frames 8200... +[2024-11-08 07:18:46,067][41694] Avg episode rewards: #0: 4.769, true rewards: #0: 4.119 +[2024-11-08 07:18:46,068][41694] Avg episode reward: 4.769, avg true_objective: 4.119 +[2024-11-08 07:18:46,209][41694] Num frames 8300... +[2024-11-08 07:18:46,405][41694] Num frames 8400... +[2024-11-08 07:18:46,594][41694] Num frames 8500... +[2024-11-08 07:18:46,799][41694] Num frames 8600... +[2024-11-08 07:18:46,887][41694] Avg episode rewards: #0: 4.720, true rewards: #0: 4.101 +[2024-11-08 07:18:46,888][41694] Avg episode reward: 4.720, avg true_objective: 4.101 +[2024-11-08 07:18:47,076][41694] Num frames 8700... +[2024-11-08 07:18:47,301][41694] Num frames 8800... +[2024-11-08 07:18:47,504][41694] Num frames 8900... +[2024-11-08 07:18:47,757][41694] Avg episode rewards: #0: 4.680, true rewards: #0: 4.090 +[2024-11-08 07:18:47,762][41694] Avg episode reward: 4.680, avg true_objective: 4.090 +[2024-11-08 07:18:47,781][41694] Num frames 9000... +[2024-11-08 07:18:48,003][41694] Num frames 9100... +[2024-11-08 07:18:48,218][41694] Num frames 9200... +[2024-11-08 07:18:48,428][41694] Num frames 9300... +[2024-11-08 07:18:48,654][41694] Avg episode rewards: #0: 4.644, true rewards: #0: 4.079 +[2024-11-08 07:18:48,657][41694] Avg episode reward: 4.644, avg true_objective: 4.079 +[2024-11-08 07:18:48,713][41694] Num frames 9400... +[2024-11-08 07:18:48,922][41694] Num frames 9500... +[2024-11-08 07:18:49,136][41694] Num frames 9600... +[2024-11-08 07:18:49,347][41694] Num frames 9700... +[2024-11-08 07:18:49,538][41694] Avg episode rewards: #0: 4.610, true rewards: #0: 4.069 +[2024-11-08 07:18:49,541][41694] Avg episode reward: 4.610, avg true_objective: 4.069 +[2024-11-08 07:18:49,631][41694] Num frames 9800... +[2024-11-08 07:18:49,846][41694] Num frames 9900... +[2024-11-08 07:18:50,060][41694] Num frames 10000... +[2024-11-08 07:18:50,276][41694] Num frames 10100... +[2024-11-08 07:18:50,443][41694] Avg episode rewards: #0: 4.580, true rewards: #0: 4.060 +[2024-11-08 07:18:50,448][41694] Avg episode reward: 4.580, avg true_objective: 4.060 +[2024-11-08 07:18:50,566][41694] Num frames 10200... +[2024-11-08 07:18:50,768][41694] Num frames 10300... +[2024-11-08 07:18:51,005][41694] Num frames 10400... +[2024-11-08 07:18:51,210][41694] Num frames 10500... +[2024-11-08 07:18:51,480][41694] Avg episode rewards: #0: 4.614, true rewards: #0: 4.076 +[2024-11-08 07:18:51,482][41694] Avg episode reward: 4.614, avg true_objective: 4.076 +[2024-11-08 07:18:51,491][41694] Num frames 10600... +[2024-11-08 07:18:51,703][41694] Num frames 10700... +[2024-11-08 07:18:51,897][41694] Num frames 10800... +[2024-11-08 07:18:52,102][41694] Num frames 10900... +[2024-11-08 07:18:52,317][41694] Avg episode rewards: #0: 4.586, true rewards: #0: 4.067 +[2024-11-08 07:18:52,321][41694] Avg episode reward: 4.586, avg true_objective: 4.067 +[2024-11-08 07:18:52,375][41694] Num frames 11000... +[2024-11-08 07:18:52,563][41694] Num frames 11100... +[2024-11-08 07:18:52,751][41694] Num frames 11200... +[2024-11-08 07:18:52,942][41694] Num frames 11300... +[2024-11-08 07:18:53,120][41694] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-08 07:18:53,127][41694] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-08 07:18:53,208][41694] Num frames 11400... +[2024-11-08 07:18:53,389][41694] Num frames 11500... +[2024-11-08 07:18:53,567][41694] Num frames 11600... +[2024-11-08 07:18:53,749][41694] Num frames 11700... +[2024-11-08 07:18:53,897][41694] Avg episode rewards: #0: 4.534, true rewards: #0: 4.051 +[2024-11-08 07:18:53,901][41694] Avg episode reward: 4.534, avg true_objective: 4.051 +[2024-11-08 07:18:54,012][41694] Num frames 11800... +[2024-11-08 07:18:54,196][41694] Num frames 11900... +[2024-11-08 07:18:54,375][41694] Num frames 12000... +[2024-11-08 07:18:54,555][41694] Num frames 12100... +[2024-11-08 07:18:54,796][41694] Avg episode rewards: #0: 4.566, true rewards: #0: 4.066 +[2024-11-08 07:18:54,798][41694] Avg episode reward: 4.566, avg true_objective: 4.066 +[2024-11-08 07:18:54,806][41694] Num frames 12200... +[2024-11-08 07:18:55,133][41694] Num frames 12300... +[2024-11-08 07:18:55,418][41694] Num frames 12400... +[2024-11-08 07:18:55,680][41694] Num frames 12500... +[2024-11-08 07:18:55,959][41694] Avg episode rewards: #0: 4.542, true rewards: #0: 4.058 +[2024-11-08 07:18:55,963][41694] Avg episode reward: 4.542, avg true_objective: 4.058 +[2024-11-08 07:18:56,019][41694] Num frames 12600... +[2024-11-08 07:18:56,300][41694] Num frames 12700... +[2024-11-08 07:18:56,582][41694] Num frames 12800... +[2024-11-08 07:18:56,862][41694] Num frames 12900... +[2024-11-08 07:18:57,097][41694] Avg episode rewards: #0: 4.520, true rewards: #0: 4.052 +[2024-11-08 07:18:57,099][41694] Avg episode reward: 4.520, avg true_objective: 4.052 +[2024-11-08 07:18:57,197][41694] Num frames 13000... +[2024-11-08 07:18:57,453][41694] Num frames 13100... +[2024-11-08 07:18:57,683][41694] Num frames 13200... +[2024-11-08 07:18:57,915][41694] Num frames 13300... +[2024-11-08 07:18:58,104][41694] Avg episode rewards: #0: 4.500, true rewards: #0: 4.045 +[2024-11-08 07:18:58,106][41694] Avg episode reward: 4.500, avg true_objective: 4.045 +[2024-11-08 07:18:58,221][41694] Num frames 13400... +[2024-11-08 07:18:58,418][41694] Num frames 13500... +[2024-11-08 07:18:58,611][41694] Num frames 13600... +[2024-11-08 07:18:58,799][41694] Num frames 13700... +[2024-11-08 07:18:58,925][41694] Avg episode rewards: #0: 4.480, true rewards: #0: 4.039 +[2024-11-08 07:18:58,930][41694] Avg episode reward: 4.480, avg true_objective: 4.039 +[2024-11-08 07:18:59,082][41694] Num frames 13800... +[2024-11-08 07:18:59,280][41694] Num frames 13900... +[2024-11-08 07:18:59,479][41694] Num frames 14000... +[2024-11-08 07:18:59,683][41694] Num frames 14100... +[2024-11-08 07:18:59,785][41694] Avg episode rewards: #0: 4.462, true rewards: #0: 4.033 +[2024-11-08 07:18:59,786][41694] Avg episode reward: 4.462, avg true_objective: 4.033 +[2024-11-08 07:18:59,962][41694] Num frames 14200... +[2024-11-08 07:19:00,269][41694] Num frames 14300... +[2024-11-08 07:19:00,973][41694] Num frames 14400... +[2024-11-08 07:19:01,201][41694] Num frames 14500... +[2024-11-08 07:19:01,262][41694] Avg episode rewards: #0: 4.445, true rewards: #0: 4.028 +[2024-11-08 07:19:01,264][41694] Avg episode reward: 4.445, avg true_objective: 4.028 +[2024-11-08 07:19:01,514][41694] Num frames 14600... +[2024-11-08 07:19:01,693][41694] Num frames 14700... +[2024-11-08 07:19:01,881][41694] Num frames 14800... +[2024-11-08 07:19:02,089][41694] Avg episode rewards: #0: 4.428, true rewards: #0: 4.023 +[2024-11-08 07:19:02,091][41694] Avg episode reward: 4.428, avg true_objective: 4.023 +[2024-11-08 07:19:02,140][41694] Num frames 14900... +[2024-11-08 07:19:02,330][41694] Num frames 15000... +[2024-11-08 07:19:02,503][41694] Num frames 15100... +[2024-11-08 07:19:02,676][41694] Num frames 15200... +[2024-11-08 07:19:02,854][41694] Avg episode rewards: #0: 4.413, true rewards: #0: 4.018 +[2024-11-08 07:19:02,858][41694] Avg episode reward: 4.413, avg true_objective: 4.018 +[2024-11-08 07:19:02,936][41694] Num frames 15300... +[2024-11-08 07:19:03,115][41694] Num frames 15400... +[2024-11-08 07:19:03,304][41694] Num frames 15500... +[2024-11-08 07:19:03,504][41694] Num frames 15600... +[2024-11-08 07:19:03,671][41694] Avg episode rewards: #0: 4.398, true rewards: #0: 4.014 +[2024-11-08 07:19:03,677][41694] Avg episode reward: 4.398, avg true_objective: 4.014 +[2024-11-08 07:19:03,791][41694] Num frames 15700... +[2024-11-08 07:19:04,007][41694] Num frames 15800... +[2024-11-08 07:19:04,194][41694] Num frames 15900... +[2024-11-08 07:19:04,386][41694] Num frames 16000... +[2024-11-08 07:19:04,521][41694] Avg episode rewards: #0: 4.384, true rewards: #0: 4.009 +[2024-11-08 07:19:04,528][41694] Avg episode reward: 4.384, avg true_objective: 4.009 +[2024-11-08 07:19:04,670][41694] Num frames 16100... +[2024-11-08 07:19:04,857][41694] Num frames 16200... +[2024-11-08 07:19:05,084][41694] Num frames 16300... +[2024-11-08 07:19:05,287][41694] Num frames 16400... +[2024-11-08 07:19:05,489][41694] Num frames 16500... +[2024-11-08 07:19:05,707][41694] Avg episode rewards: #0: 4.459, true rewards: #0: 4.044 +[2024-11-08 07:19:05,712][41694] Avg episode reward: 4.459, avg true_objective: 4.044 +[2024-11-08 07:19:05,769][41694] Num frames 16600... +[2024-11-08 07:19:05,964][41694] Num frames 16700... +[2024-11-08 07:19:06,164][41694] Num frames 16800... +[2024-11-08 07:19:06,355][41694] Num frames 16900... +[2024-11-08 07:19:06,539][41694] Avg episode rewards: #0: 4.444, true rewards: #0: 4.039 +[2024-11-08 07:19:06,542][41694] Avg episode reward: 4.444, avg true_objective: 4.039 +[2024-11-08 07:19:06,638][41694] Num frames 17000... +[2024-11-08 07:19:06,836][41694] Num frames 17100... +[2024-11-08 07:19:07,040][41694] Num frames 17200... +[2024-11-08 07:19:07,228][41694] Num frames 17300... +[2024-11-08 07:19:07,411][41694] Num frames 17400... +[2024-11-08 07:19:07,498][41694] Avg episode rewards: #0: 4.468, true rewards: #0: 4.050 +[2024-11-08 07:19:07,504][41694] Avg episode reward: 4.468, avg true_objective: 4.050 +[2024-11-08 07:19:07,686][41694] Num frames 17500... +[2024-11-08 07:19:07,860][41694] Num frames 17600... +[2024-11-08 07:19:08,042][41694] Num frames 17700... +[2024-11-08 07:19:08,278][41694] Avg episode rewards: #0: 4.454, true rewards: #0: 4.045 +[2024-11-08 07:19:08,281][41694] Avg episode reward: 4.454, avg true_objective: 4.045 +[2024-11-08 07:19:08,290][41694] Num frames 17800... +[2024-11-08 07:19:08,483][41694] Num frames 17900... +[2024-11-08 07:19:08,671][41694] Num frames 18000... +[2024-11-08 07:19:08,853][41694] Num frames 18100... +[2024-11-08 07:19:09,027][41694] Num frames 18200... +[2024-11-08 07:19:09,116][41694] Avg episode rewards: #0: 4.470, true rewards: #0: 4.047 +[2024-11-08 07:19:09,120][41694] Avg episode reward: 4.470, avg true_objective: 4.047 +[2024-11-08 07:19:09,289][41694] Num frames 18300... +[2024-11-08 07:19:09,506][41694] Num frames 18400... +[2024-11-08 07:19:09,684][41694] Num frames 18500... +[2024-11-08 07:19:09,930][41694] Avg episode rewards: #0: 4.456, true rewards: #0: 4.043 +[2024-11-08 07:19:09,934][41694] Avg episode reward: 4.456, avg true_objective: 4.043 +[2024-11-08 07:19:09,947][41694] Num frames 18600... +[2024-11-08 07:19:10,139][41694] Num frames 18700... +[2024-11-08 07:19:10,325][41694] Num frames 18800... +[2024-11-08 07:19:10,508][41694] Num frames 18900... +[2024-11-08 07:19:10,685][41694] Num frames 19000... +[2024-11-08 07:19:10,827][41694] Avg episode rewards: #0: 4.478, true rewards: #0: 4.052 +[2024-11-08 07:19:10,829][41694] Avg episode reward: 4.478, avg true_objective: 4.052 +[2024-11-08 07:19:10,932][41694] Num frames 19100... +[2024-11-08 07:19:11,119][41694] Num frames 19200... +[2024-11-08 07:19:11,328][41694] Num frames 19300... +[2024-11-08 07:19:11,564][41694] Num frames 19400... +[2024-11-08 07:19:11,688][41694] Avg episode rewards: #0: 4.464, true rewards: #0: 4.048 +[2024-11-08 07:19:11,690][41694] Avg episode reward: 4.464, avg true_objective: 4.048 +[2024-11-08 07:19:11,841][41694] Num frames 19500... +[2024-11-08 07:19:12,049][41694] Num frames 19600... +[2024-11-08 07:19:12,264][41694] Num frames 19700... +[2024-11-08 07:19:12,454][41694] Num frames 19800... +[2024-11-08 07:19:12,541][41694] Avg episode rewards: #0: 4.452, true rewards: #0: 4.043 +[2024-11-08 07:19:12,543][41694] Avg episode reward: 4.452, avg true_objective: 4.043 +[2024-11-08 07:19:12,729][41694] Num frames 19900... +[2024-11-08 07:19:12,918][41694] Num frames 20000... +[2024-11-08 07:19:13,115][41694] Num frames 20100... +[2024-11-08 07:19:13,336][41694] Num frames 20200... +[2024-11-08 07:19:13,514][41694] Avg episode rewards: #0: 4.472, true rewards: #0: 4.052 +[2024-11-08 07:19:13,517][41694] Avg episode reward: 4.472, avg true_objective: 4.052 +[2024-11-08 07:19:13,627][41694] Num frames 20300... +[2024-11-08 07:19:13,825][41694] Num frames 20400... +[2024-11-08 07:19:14,026][41694] Num frames 20500... +[2024-11-08 07:19:14,217][41694] Num frames 20600... +[2024-11-08 07:19:14,374][41694] Avg episode rewards: #0: 4.460, true rewards: #0: 4.048 +[2024-11-08 07:19:14,378][41694] Avg episode reward: 4.460, avg true_objective: 4.048 +[2024-11-08 07:19:14,512][41694] Num frames 20700... +[2024-11-08 07:19:14,735][41694] Num frames 20800... +[2024-11-08 07:19:14,961][41694] Num frames 20900... +[2024-11-08 07:19:15,200][41694] Num frames 21000... +[2024-11-08 07:19:15,329][41694] Avg episode rewards: #0: 4.448, true rewards: #0: 4.044 +[2024-11-08 07:19:15,332][41694] Avg episode reward: 4.448, avg true_objective: 4.044 +[2024-11-08 07:19:15,500][41694] Num frames 21100... +[2024-11-08 07:19:15,732][41694] Num frames 21200... +[2024-11-08 07:19:15,985][41694] Num frames 21300... +[2024-11-08 07:19:16,210][41694] Num frames 21400... +[2024-11-08 07:19:16,435][41694] Avg episode rewards: #0: 4.467, true rewards: #0: 4.052 +[2024-11-08 07:19:16,436][41694] Avg episode reward: 4.467, avg true_objective: 4.052 +[2024-11-08 07:19:16,488][41694] Num frames 21500... +[2024-11-08 07:19:16,715][41694] Num frames 21600... +[2024-11-08 07:19:16,900][41694] Num frames 21700... +[2024-11-08 07:19:17,092][41694] Num frames 21800... +[2024-11-08 07:19:17,272][41694] Avg episode rewards: #0: 4.456, true rewards: #0: 4.048 +[2024-11-08 07:19:17,274][41694] Avg episode reward: 4.456, avg true_objective: 4.048 +[2024-11-08 07:19:17,353][41694] Num frames 21900... +[2024-11-08 07:19:17,558][41694] Num frames 22000... +[2024-11-08 07:19:17,760][41694] Num frames 22100... +[2024-11-08 07:19:17,974][41694] Num frames 22200... +[2024-11-08 07:19:18,171][41694] Num frames 22300... +[2024-11-08 07:19:18,249][41694] Avg episode rewards: #0: 4.474, true rewards: #0: 4.056 +[2024-11-08 07:19:18,251][41694] Avg episode reward: 4.474, avg true_objective: 4.056 +[2024-11-08 07:19:18,439][41694] Num frames 22400... +[2024-11-08 07:19:18,617][41694] Num frames 22500... +[2024-11-08 07:19:18,796][41694] Num frames 22600... +[2024-11-08 07:19:18,973][41694] Num frames 22700... +[2024-11-08 07:19:19,163][41694] Num frames 22800... +[2024-11-08 07:19:19,387][41694] Avg episode rewards: #0: 4.551, true rewards: #0: 4.087 +[2024-11-08 07:19:19,393][41694] Avg episode reward: 4.551, avg true_objective: 4.087 +[2024-11-08 07:19:19,442][41694] Num frames 22900... +[2024-11-08 07:19:19,650][41694] Num frames 23000... +[2024-11-08 07:19:19,849][41694] Num frames 23100... +[2024-11-08 07:19:20,048][41694] Num frames 23200... +[2024-11-08 07:19:20,251][41694] Num frames 23300... +[2024-11-08 07:19:20,313][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.088 +[2024-11-08 07:19:20,314][41694] Avg episode reward: 4.544, avg true_objective: 4.088 +[2024-11-08 07:19:20,523][41694] Num frames 23400... +[2024-11-08 07:19:20,723][41694] Num frames 23500... +[2024-11-08 07:19:20,914][41694] Num frames 23600... +[2024-11-08 07:19:21,141][41694] Avg episode rewards: #0: 4.532, true rewards: #0: 4.084 +[2024-11-08 07:19:21,146][41694] Avg episode reward: 4.532, avg true_objective: 4.084 +[2024-11-08 07:19:21,194][41694] Num frames 23700... +[2024-11-08 07:19:21,394][41694] Num frames 23800... +[2024-11-08 07:19:21,604][41694] Num frames 23900... +[2024-11-08 07:19:21,807][41694] Num frames 24000... +[2024-11-08 07:19:22,003][41694] Num frames 24100... +[2024-11-08 07:19:22,128][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.090 +[2024-11-08 07:19:22,132][41694] Avg episode reward: 4.548, avg true_objective: 4.090 +[2024-11-08 07:19:22,290][41694] Num frames 24200... +[2024-11-08 07:19:22,494][41694] Num frames 24300... +[2024-11-08 07:19:22,701][41694] Num frames 24400... +[2024-11-08 07:19:22,901][41694] Num frames 24500... +[2024-11-08 07:19:22,997][41694] Avg episode rewards: #0: 4.536, true rewards: #0: 4.086 +[2024-11-08 07:19:23,002][41694] Avg episode reward: 4.536, avg true_objective: 4.086 +[2024-11-08 07:19:23,189][41694] Num frames 24600... +[2024-11-08 07:19:23,380][41694] Num frames 24700... +[2024-11-08 07:19:23,575][41694] Num frames 24800... +[2024-11-08 07:19:23,778][41694] Num frames 24900... +[2024-11-08 07:19:23,839][41694] Avg episode rewards: #0: 4.525, true rewards: #0: 4.082 +[2024-11-08 07:19:23,841][41694] Avg episode reward: 4.525, avg true_objective: 4.082 +[2024-11-08 07:19:24,038][41694] Num frames 25000... +[2024-11-08 07:19:24,233][41694] Num frames 25100... +[2024-11-08 07:19:24,429][41694] Num frames 25200... +[2024-11-08 07:19:24,635][41694] Num frames 25300... +[2024-11-08 07:19:24,791][41694] Avg episode rewards: #0: 4.540, true rewards: #0: 4.089 +[2024-11-08 07:19:24,797][41694] Avg episode reward: 4.540, avg true_objective: 4.089 +[2024-11-08 07:19:24,911][41694] Num frames 25400... +[2024-11-08 07:19:25,088][41694] Num frames 25500... +[2024-11-08 07:19:25,263][41694] Num frames 25600... +[2024-11-08 07:19:25,449][41694] Num frames 25700... +[2024-11-08 07:19:25,691][41694] Avg episode rewards: #0: 4.555, true rewards: #0: 4.095 +[2024-11-08 07:19:25,694][41694] Avg episode reward: 4.555, avg true_objective: 4.095 +[2024-11-08 07:19:25,710][41694] Num frames 25800... +[2024-11-08 07:19:25,916][41694] Num frames 25900... +[2024-11-08 07:19:26,104][41694] Num frames 26000... +[2024-11-08 07:19:26,289][41694] Num frames 26100... +[2024-11-08 07:19:26,583][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.091 +[2024-11-08 07:19:26,587][41694] Avg episode reward: 4.544, avg true_objective: 4.091 +[2024-11-08 07:19:26,652][41694] Num frames 26200... +[2024-11-08 07:19:26,835][41694] Num frames 26300... +[2024-11-08 07:19:27,010][41694] Num frames 26400... +[2024-11-08 07:19:27,195][41694] Num frames 26500... +[2024-11-08 07:19:27,395][41694] Num frames 26600... +[2024-11-08 07:19:27,636][41694] Avg episode rewards: #0: 4.584, true rewards: #0: 4.107 +[2024-11-08 07:19:27,639][41694] Avg episode reward: 4.584, avg true_objective: 4.107 +[2024-11-08 07:19:27,669][41694] Num frames 26700... +[2024-11-08 07:19:27,865][41694] Num frames 26800... +[2024-11-08 07:19:28,066][41694] Num frames 26900... +[2024-11-08 07:19:28,260][41694] Num frames 27000... +[2024-11-08 07:19:28,467][41694] Avg episode rewards: #0: 4.572, true rewards: #0: 4.103 +[2024-11-08 07:19:28,471][41694] Avg episode reward: 4.572, avg true_objective: 4.103 +[2024-11-08 07:19:28,540][41694] Num frames 27100... +[2024-11-08 07:19:28,730][41694] Num frames 27200... +[2024-11-08 07:19:28,926][41694] Num frames 27300... +[2024-11-08 07:19:29,146][41694] Num frames 27400... +[2024-11-08 07:19:29,336][41694] Avg episode rewards: #0: 4.561, true rewards: #0: 4.099 +[2024-11-08 07:19:29,342][41694] Avg episode reward: 4.561, avg true_objective: 4.099 +[2024-11-08 07:19:29,444][41694] Num frames 27500... +[2024-11-08 07:19:29,643][41694] Num frames 27600... +[2024-11-08 07:19:29,850][41694] Num frames 27700... +[2024-11-08 07:19:30,071][41694] Num frames 27800... +[2024-11-08 07:19:30,280][41694] Num frames 27900... +[2024-11-08 07:19:30,362][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.104 +[2024-11-08 07:19:30,366][41694] Avg episode reward: 4.575, avg true_objective: 4.104 +[2024-11-08 07:19:30,583][41694] Num frames 28000... +[2024-11-08 07:19:30,786][41694] Num frames 28100... +[2024-11-08 07:19:31,000][41694] Num frames 28200... +[2024-11-08 07:19:31,238][41694] Num frames 28300... +[2024-11-08 07:19:31,416][41694] Avg episode rewards: #0: 4.588, true rewards: #0: 4.110 +[2024-11-08 07:19:31,421][41694] Avg episode reward: 4.588, avg true_objective: 4.110 +[2024-11-08 07:19:31,527][41694] Num frames 28400... +[2024-11-08 07:19:31,735][41694] Num frames 28500... +[2024-11-08 07:19:31,945][41694] Num frames 28600... +[2024-11-08 07:19:32,150][41694] Num frames 28700... +[2024-11-08 07:19:32,374][41694] Num frames 28800... +[2024-11-08 07:19:32,447][41694] Avg episode rewards: #0: 4.601, true rewards: #0: 4.115 +[2024-11-08 07:19:32,452][41694] Avg episode reward: 4.601, avg true_objective: 4.115 +[2024-11-08 07:19:32,674][41694] Num frames 28900... +[2024-11-08 07:19:32,891][41694] Num frames 29000... +[2024-11-08 07:19:33,569][41694] Num frames 29100... +[2024-11-08 07:19:33,831][41694] Avg episode rewards: #0: 4.590, true rewards: #0: 4.111 +[2024-11-08 07:19:33,835][41694] Avg episode reward: 4.590, avg true_objective: 4.111 +[2024-11-08 07:19:33,879][41694] Num frames 29200... +[2024-11-08 07:19:34,101][41694] Num frames 29300... +[2024-11-08 07:19:34,294][41694] Num frames 29400... +[2024-11-08 07:19:34,489][41694] Num frames 29500... +[2024-11-08 07:19:34,692][41694] Avg episode rewards: #0: 4.580, true rewards: #0: 4.107 +[2024-11-08 07:19:34,696][41694] Avg episode reward: 4.580, avg true_objective: 4.107 +[2024-11-08 07:19:34,765][41694] Num frames 29600... +[2024-11-08 07:19:34,980][41694] Num frames 29700... +[2024-11-08 07:19:35,206][41694] Num frames 29800... +[2024-11-08 07:19:35,421][41694] Num frames 29900... +[2024-11-08 07:19:35,594][41694] Avg episode rewards: #0: 4.569, true rewards: #0: 4.104 +[2024-11-08 07:19:35,598][41694] Avg episode reward: 4.569, avg true_objective: 4.104 +[2024-11-08 07:19:35,794][41694] Num frames 30000... +[2024-11-08 07:19:36,015][41694] Num frames 30100... +[2024-11-08 07:19:36,228][41694] Num frames 30200... +[2024-11-08 07:19:36,438][41694] Num frames 30300... +[2024-11-08 07:19:36,590][41694] Avg episode rewards: #0: 4.560, true rewards: #0: 4.100 +[2024-11-08 07:19:36,595][41694] Avg episode reward: 4.560, avg true_objective: 4.100 +[2024-11-08 07:19:36,741][41694] Num frames 30400... +[2024-11-08 07:19:36,969][41694] Num frames 30500... +[2024-11-08 07:19:37,189][41694] Num frames 30600... +[2024-11-08 07:19:37,394][41694] Num frames 30700... +[2024-11-08 07:19:37,563][41694] Avg episode rewards: #0: 4.554, true rewards: #0: 4.101 +[2024-11-08 07:19:37,569][41694] Avg episode reward: 4.554, avg true_objective: 4.101 +[2024-11-08 07:19:37,673][41694] Num frames 30800... +[2024-11-08 07:19:37,890][41694] Num frames 30900... +[2024-11-08 07:19:38,094][41694] Num frames 31000... +[2024-11-08 07:19:38,298][41694] Num frames 31100... +[2024-11-08 07:19:38,442][41694] Avg episode rewards: #0: 4.545, true rewards: #0: 4.097 +[2024-11-08 07:19:38,447][41694] Avg episode reward: 4.545, avg true_objective: 4.097 +[2024-11-08 07:19:38,582][41694] Num frames 31200... +[2024-11-08 07:19:38,786][41694] Num frames 31300... +[2024-11-08 07:19:38,994][41694] Num frames 31400... +[2024-11-08 07:19:39,192][41694] Num frames 31500... +[2024-11-08 07:19:39,302][41694] Avg episode rewards: #0: 4.536, true rewards: #0: 4.094 +[2024-11-08 07:19:39,308][41694] Avg episode reward: 4.536, avg true_objective: 4.094 +[2024-11-08 07:19:39,468][41694] Num frames 31600... +[2024-11-08 07:19:39,645][41694] Num frames 31700... +[2024-11-08 07:19:39,872][41694] Num frames 31800... +[2024-11-08 07:19:40,056][41694] Num frames 31900... +[2024-11-08 07:19:40,133][41694] Avg episode rewards: #0: 4.527, true rewards: #0: 4.091 +[2024-11-08 07:19:40,137][41694] Avg episode reward: 4.527, avg true_objective: 4.091 +[2024-11-08 07:19:40,319][41694] Num frames 32000... +[2024-11-08 07:19:40,524][41694] Num frames 32100... +[2024-11-08 07:19:40,727][41694] Num frames 32200... +[2024-11-08 07:19:40,957][41694] Avg episode rewards: #0: 4.518, true rewards: #0: 4.088 +[2024-11-08 07:19:40,958][41694] Avg episode reward: 4.518, avg true_objective: 4.088 +[2024-11-08 07:19:40,978][41694] Num frames 32300... +[2024-11-08 07:19:41,168][41694] Num frames 32400... +[2024-11-08 07:19:41,347][41694] Num frames 32500... +[2024-11-08 07:19:41,546][41694] Avg episode rewards: #0: 4.510, true rewards: #0: 4.073 +[2024-11-08 07:19:41,549][41694] Avg episode reward: 4.510, avg true_objective: 4.073 +[2024-11-08 07:19:41,610][41694] Num frames 32600... +[2024-11-08 07:19:41,838][41694] Num frames 32700... +[2024-11-08 07:19:42,019][41694] Num frames 32800... +[2024-11-08 07:19:42,208][41694] Num frames 32900... +[2024-11-08 07:19:42,385][41694] Avg episode rewards: #0: 4.502, true rewards: #0: 4.070 +[2024-11-08 07:19:42,386][41694] Avg episode reward: 4.502, avg true_objective: 4.070 +[2024-11-08 07:19:42,470][41694] Num frames 33000... +[2024-11-08 07:19:42,656][41694] Num frames 33100... +[2024-11-08 07:19:42,836][41694] Num frames 33200... +[2024-11-08 07:19:43,019][41694] Num frames 33300... +[2024-11-08 07:19:43,165][41694] Avg episode rewards: #0: 4.494, true rewards: #0: 4.067 +[2024-11-08 07:19:43,167][41694] Avg episode reward: 4.494, avg true_objective: 4.067 +[2024-11-08 07:19:43,260][41694] Num frames 33400... +[2024-11-08 07:19:43,446][41694] Num frames 33500... +[2024-11-08 07:19:43,632][41694] Num frames 33600... +[2024-11-08 07:19:43,844][41694] Num frames 33700... +[2024-11-08 07:19:44,035][41694] Avg episode rewards: #0: 4.502, true rewards: #0: 4.068 +[2024-11-08 07:19:44,038][41694] Avg episode reward: 4.502, avg true_objective: 4.068 +[2024-11-08 07:19:44,118][41694] Num frames 33800... +[2024-11-08 07:19:44,323][41694] Num frames 33900... +[2024-11-08 07:19:44,526][41694] Num frames 34000... +[2024-11-08 07:19:44,733][41694] Num frames 34100... +[2024-11-08 07:19:44,943][41694] Num frames 34200... +[2024-11-08 07:19:45,031][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.073 +[2024-11-08 07:19:45,035][41694] Avg episode reward: 4.513, avg true_objective: 4.073 +[2024-11-08 07:19:45,217][41694] Num frames 34300... +[2024-11-08 07:19:45,408][41694] Num frames 34400... +[2024-11-08 07:19:45,613][41694] Num frames 34500... +[2024-11-08 07:19:45,811][41694] Num frames 34600... +[2024-11-08 07:19:45,930][41694] Avg episode rewards: #0: 4.521, true rewards: #0: 4.074 +[2024-11-08 07:19:45,935][41694] Avg episode reward: 4.521, avg true_objective: 4.074 +[2024-11-08 07:19:46,110][41694] Num frames 34700... +[2024-11-08 07:19:46,332][41694] Num frames 34800... +[2024-11-08 07:19:46,545][41694] Num frames 34900... +[2024-11-08 07:19:46,762][41694] Num frames 35000... +[2024-11-08 07:19:46,855][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.071 +[2024-11-08 07:19:46,856][41694] Avg episode reward: 4.513, avg true_objective: 4.071 +[2024-11-08 07:19:47,048][41694] Num frames 35100... +[2024-11-08 07:19:47,288][41694] Num frames 35200... +[2024-11-08 07:19:47,519][41694] Num frames 35300... +[2024-11-08 07:19:47,795][41694] Avg episode rewards: #0: 4.505, true rewards: #0: 4.069 +[2024-11-08 07:19:47,797][41694] Avg episode reward: 4.505, avg true_objective: 4.069 +[2024-11-08 07:19:47,805][41694] Num frames 35400... +[2024-11-08 07:19:48,035][41694] Num frames 35500... +[2024-11-08 07:19:48,258][41694] Num frames 35600... +[2024-11-08 07:19:48,484][41694] Num frames 35700... +[2024-11-08 07:19:48,712][41694] Avg episode rewards: #0: 4.498, true rewards: #0: 4.066 +[2024-11-08 07:19:48,714][41694] Avg episode reward: 4.498, avg true_objective: 4.066 +[2024-11-08 07:19:48,781][41694] Num frames 35800... +[2024-11-08 07:19:48,983][41694] Num frames 35900... +[2024-11-08 07:19:49,162][41694] Num frames 36000... +[2024-11-08 07:19:49,344][41694] Num frames 36100... +[2024-11-08 07:19:49,535][41694] Num frames 36200... +[2024-11-08 07:19:49,716][41694] Num frames 36300... +[2024-11-08 07:19:49,952][41694] Avg episode rewards: #0: 4.549, true rewards: #0: 4.089 +[2024-11-08 07:19:49,954][41694] Avg episode reward: 4.549, avg true_objective: 4.089 +[2024-11-08 07:19:49,980][41694] Num frames 36400... +[2024-11-08 07:19:50,203][41694] Num frames 36500... +[2024-11-08 07:19:50,413][41694] Num frames 36600... +[2024-11-08 07:19:50,625][41694] Num frames 36700... +[2024-11-08 07:19:50,817][41694] Num frames 36800... +[2024-11-08 07:19:50,950][41694] Avg episode rewards: #0: 4.560, true rewards: #0: 4.093 +[2024-11-08 07:19:50,956][41694] Avg episode reward: 4.560, avg true_objective: 4.093 +[2024-11-08 07:19:51,079][41694] Num frames 36900... +[2024-11-08 07:19:51,257][41694] Num frames 37000... +[2024-11-08 07:19:51,447][41694] Num frames 37100... +[2024-11-08 07:19:51,625][41694] Num frames 37200... +[2024-11-08 07:19:51,727][41694] Avg episode rewards: #0: 4.552, true rewards: #0: 4.090 +[2024-11-08 07:19:51,731][41694] Avg episode reward: 4.552, avg true_objective: 4.090 +[2024-11-08 07:19:51,902][41694] Num frames 37300... +[2024-11-08 07:19:52,091][41694] Num frames 37400... +[2024-11-08 07:19:52,282][41694] Num frames 37500... +[2024-11-08 07:19:52,478][41694] Num frames 37600... +[2024-11-08 07:19:52,547][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.087 +[2024-11-08 07:19:52,550][41694] Avg episode reward: 4.544, avg true_objective: 4.087 +[2024-11-08 07:19:52,752][41694] Num frames 37700... +[2024-11-08 07:19:52,951][41694] Num frames 37800... +[2024-11-08 07:19:53,127][41694] Avg episode rewards: #0: 4.523, true rewards: #0: 4.071 +[2024-11-08 07:19:53,134][41694] Avg episode reward: 4.523, avg true_objective: 4.071 +[2024-11-08 07:19:53,231][41694] Num frames 37900... +[2024-11-08 07:19:53,441][41694] Num frames 38000... +[2024-11-08 07:19:53,633][41694] Num frames 38100... +[2024-11-08 07:19:53,831][41694] Num frames 38200... +[2024-11-08 07:19:53,976][41694] Avg episode rewards: #0: 4.515, true rewards: #0: 4.069 +[2024-11-08 07:19:53,980][41694] Avg episode reward: 4.515, avg true_objective: 4.069 +[2024-11-08 07:19:54,100][41694] Num frames 38300... +[2024-11-08 07:19:54,292][41694] Num frames 38400... +[2024-11-08 07:19:54,489][41694] Num frames 38500... +[2024-11-08 07:19:54,689][41694] Num frames 38600... +[2024-11-08 07:19:54,808][41694] Avg episode rewards: #0: 4.508, true rewards: #0: 4.066 +[2024-11-08 07:19:54,814][41694] Avg episode reward: 4.508, avg true_objective: 4.066 +[2024-11-08 07:19:54,967][41694] Num frames 38700... +[2024-11-08 07:19:55,154][41694] Num frames 38800... +[2024-11-08 07:19:55,339][41694] Num frames 38900... +[2024-11-08 07:19:55,528][41694] Num frames 39000... +[2024-11-08 07:19:55,622][41694] Avg episode rewards: #0: 4.501, true rewards: #0: 4.064 +[2024-11-08 07:19:55,625][41694] Avg episode reward: 4.501, avg true_objective: 4.064 +[2024-11-08 07:19:55,814][41694] Num frames 39100... +[2024-11-08 07:19:56,001][41694] Num frames 39200... +[2024-11-08 07:19:56,196][41694] Avg episode rewards: #0: 4.481, true rewards: #0: 4.048 +[2024-11-08 07:19:56,201][41694] Avg episode reward: 4.481, avg true_objective: 4.048 +[2024-11-08 07:19:56,287][41694] Num frames 39300... +[2024-11-08 07:19:56,472][41694] Num frames 39400... +[2024-11-08 07:19:56,676][41694] Num frames 39500... +[2024-11-08 07:19:56,878][41694] Num frames 39600... +[2024-11-08 07:19:57,041][41694] Avg episode rewards: #0: 4.475, true rewards: #0: 4.046 +[2024-11-08 07:19:57,043][41694] Avg episode reward: 4.475, avg true_objective: 4.046 +[2024-11-08 07:19:57,161][41694] Num frames 39700... +[2024-11-08 07:19:57,353][41694] Num frames 39800... +[2024-11-08 07:19:57,543][41694] Num frames 39900... +[2024-11-08 07:19:57,740][41694] Num frames 40000... +[2024-11-08 07:19:57,937][41694] Num frames 40100... +[2024-11-08 07:19:57,997][41694] Avg episode rewards: #0: 4.485, true rewards: #0: 4.051 +[2024-11-08 07:19:58,000][41694] Avg episode reward: 4.485, avg true_objective: 4.051 +[2024-11-08 07:19:58,176][41694] Num frames 40200... +[2024-11-08 07:19:58,351][41694] Num frames 40300... +[2024-11-08 07:19:58,534][41694] Num frames 40400... +[2024-11-08 07:19:58,743][41694] Avg episode rewards: #0: 4.479, true rewards: #0: 4.048 +[2024-11-08 07:19:58,745][41694] Avg episode reward: 4.479, avg true_objective: 4.048 +[2024-11-08 07:21:36,776][41694] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-08 07:21:37,442][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:21:37,444][41694] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 07:21:37,445][41694] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 07:21:37,447][41694] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 07:21:37,449][41694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:21:37,452][41694] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 07:21:37,455][41694] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-08 07:21:37,456][41694] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-08 07:21:37,459][41694] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-08 07:21:37,461][41694] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme-alid' that is not in the saved config file! +[2024-11-08 07:21:37,462][41694] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 07:21:37,463][41694] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 07:21:37,466][41694] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 07:21:37,468][41694] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-08 07:21:37,470][41694] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 07:21:37,513][41694] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 07:21:37,520][41694] RunningMeanStd input shape: (1,) +[2024-11-08 07:21:37,560][41694] ConvEncoder: input_channels=3 +[2024-11-08 07:21:37,628][41694] Conv encoder output size: 512 +[2024-11-08 07:21:37,630][41694] Policy head output size: 512 +[2024-11-08 07:21:37,676][41694] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:21:38,452][41694] Num frames 100... +[2024-11-08 07:21:38,676][41694] Num frames 200... +[2024-11-08 07:21:38,899][41694] Num frames 300... +[2024-11-08 07:21:39,174][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:21:39,176][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:21:39,263][41694] Num frames 400... +[2024-11-08 07:21:39,530][41694] Num frames 500... +[2024-11-08 07:21:39,729][41694] Num frames 600... +[2024-11-08 07:21:39,952][41694] Num frames 700... +[2024-11-08 07:21:40,170][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:21:40,173][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:21:40,248][41694] Num frames 800... +[2024-11-08 07:21:40,446][41694] Num frames 900... +[2024-11-08 07:21:40,670][41694] Num frames 1000... +[2024-11-08 07:21:40,911][41694] Num frames 1100... +[2024-11-08 07:21:41,130][41694] Num frames 1200... +[2024-11-08 07:21:41,226][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:41,228][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:41,410][41694] Num frames 1300... +[2024-11-08 07:21:41,624][41694] Num frames 1400... +[2024-11-08 07:21:41,833][41694] Num frames 1500... +[2024-11-08 07:21:42,056][41694] Num frames 1600... +[2024-11-08 07:21:42,259][41694] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-08 07:21:42,262][41694] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-08 07:21:42,347][41694] Num frames 1700... +[2024-11-08 07:21:42,548][41694] Num frames 1800... +[2024-11-08 07:21:42,752][41694] Num frames 1900... +[2024-11-08 07:21:42,949][41694] Num frames 2000... +[2024-11-08 07:21:43,101][41694] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 +[2024-11-08 07:21:43,102][41694] Avg episode reward: 4.496, avg true_objective: 4.096 +[2024-11-08 07:21:43,203][41694] Num frames 2100... +[2024-11-08 07:21:43,409][41694] Num frames 2200... +[2024-11-08 07:21:43,620][41694] Num frames 2300... +[2024-11-08 07:21:43,816][41694] Num frames 2400... +[2024-11-08 07:21:43,935][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:43,937][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:44,528][41694] Num frames 2500... +[2024-11-08 07:21:44,728][41694] Num frames 2600... +[2024-11-08 07:21:44,932][41694] Num frames 2700... +[2024-11-08 07:21:45,128][41694] Num frames 2800... +[2024-11-08 07:21:45,215][41694] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 +[2024-11-08 07:21:45,218][41694] Avg episode reward: 4.309, avg true_objective: 4.023 +[2024-11-08 07:21:45,406][41694] Num frames 2900... +[2024-11-08 07:21:45,587][41694] Num frames 3000... +[2024-11-08 07:21:45,796][41694] Num frames 3100... +[2024-11-08 07:21:45,993][41694] Num frames 3200... +[2024-11-08 07:21:46,187][41694] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 +[2024-11-08 07:21:46,190][41694] Avg episode reward: 4.455, avg true_objective: 4.080 +[2024-11-08 07:21:46,294][41694] Num frames 3300... +[2024-11-08 07:21:46,527][41694] Num frames 3400... +[2024-11-08 07:21:46,764][41694] Num frames 3500... +[2024-11-08 07:21:46,964][41694] Num frames 3600... +[2024-11-08 07:21:47,111][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:47,115][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:47,240][41694] Num frames 3700... +[2024-11-08 07:21:47,433][41694] Num frames 3800... +[2024-11-08 07:21:47,645][41694] Num frames 3900... +[2024-11-08 07:21:47,849][41694] Num frames 4000... +[2024-11-08 07:21:47,971][41694] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 +[2024-11-08 07:21:47,974][41694] Avg episode reward: 4.332, avg true_objective: 4.032 +[2024-11-08 07:21:48,125][41694] Num frames 4100... +[2024-11-08 07:21:48,320][41694] Num frames 4200... +[2024-11-08 07:21:48,510][41694] Num frames 4300... +[2024-11-08 07:21:48,714][41694] Num frames 4400... +[2024-11-08 07:21:48,806][41694] Avg episode rewards: #0: 4.287, true rewards: #0: 4.015 +[2024-11-08 07:21:48,810][41694] Avg episode reward: 4.287, avg true_objective: 4.015 +[2024-11-08 07:21:49,001][41694] Num frames 4500... +[2024-11-08 07:21:49,252][41694] Num frames 4600... +[2024-11-08 07:21:49,449][41694] Num frames 4700... +[2024-11-08 07:21:49,648][41694] Num frames 4800... +[2024-11-08 07:21:49,789][41694] Avg episode rewards: #0: 4.360, true rewards: #0: 4.027 +[2024-11-08 07:21:49,796][41694] Avg episode reward: 4.360, avg true_objective: 4.027 +[2024-11-08 07:21:49,945][41694] Num frames 4900... +[2024-11-08 07:21:50,138][41694] Num frames 5000... +[2024-11-08 07:21:50,345][41694] Num frames 5100... +[2024-11-08 07:21:50,554][41694] Num frames 5200... +[2024-11-08 07:21:50,643][41694] Avg episode rewards: #0: 4.320, true rewards: #0: 4.012 +[2024-11-08 07:21:50,647][41694] Avg episode reward: 4.320, avg true_objective: 4.012 +[2024-11-08 07:21:50,867][41694] Num frames 5300... +[2024-11-08 07:21:51,061][41694] Num frames 5400... +[2024-11-08 07:21:51,245][41694] Num frames 5500... +[2024-11-08 07:21:51,430][41694] Num frames 5600... +[2024-11-08 07:21:51,483][41694] Avg episode rewards: #0: 4.286, true rewards: #0: 4.000 +[2024-11-08 07:21:51,486][41694] Avg episode reward: 4.286, avg true_objective: 4.000 +[2024-11-08 07:21:51,688][41694] Num frames 5700... +[2024-11-08 07:21:51,883][41694] Num frames 5800... +[2024-11-08 07:21:52,089][41694] Num frames 5900... +[2024-11-08 07:21:52,299][41694] Avg episode rewards: #0: 4.256, true rewards: #0: 3.989 +[2024-11-08 07:21:52,303][41694] Avg episode reward: 4.256, avg true_objective: 3.989 +[2024-11-08 07:21:52,342][41694] Num frames 6000... +[2024-11-08 07:21:52,542][41694] Num frames 6100... +[2024-11-08 07:21:52,749][41694] Num frames 6200... +[2024-11-08 07:21:52,941][41694] Num frames 6300... +[2024-11-08 07:21:53,129][41694] Avg episode rewards: #0: 4.230, true rewards: #0: 3.980 +[2024-11-08 07:21:53,133][41694] Avg episode reward: 4.230, avg true_objective: 3.980 +[2024-11-08 07:21:53,210][41694] Num frames 6400... +[2024-11-08 07:21:53,401][41694] Num frames 6500... +[2024-11-08 07:21:53,595][41694] Num frames 6600... +[2024-11-08 07:21:53,762][41694] Avg episode rewards: #0: 4.209, true rewards: #0: 3.915 +[2024-11-08 07:21:53,763][41694] Avg episode reward: 4.209, avg true_objective: 3.915 +[2024-11-08 07:21:53,868][41694] Num frames 6700... +[2024-11-08 07:21:54,073][41694] Num frames 6800... +[2024-11-08 07:21:54,283][41694] Num frames 6900... +[2024-11-08 07:21:54,480][41694] Num frames 7000... +[2024-11-08 07:21:54,625][41694] Avg episode rewards: #0: 4.189, true rewards: #0: 3.911 +[2024-11-08 07:21:54,626][41694] Avg episode reward: 4.189, avg true_objective: 3.911 +[2024-11-08 07:21:54,758][41694] Num frames 7100... +[2024-11-08 07:21:54,970][41694] Num frames 7200... +[2024-11-08 07:21:55,243][41694] Num frames 7300... +[2024-11-08 07:21:55,542][41694] Num frames 7400... +[2024-11-08 07:21:55,649][41694] Avg episode rewards: #0: 4.171, true rewards: #0: 3.907 +[2024-11-08 07:21:55,650][41694] Avg episode reward: 4.171, avg true_objective: 3.907 +[2024-11-08 07:21:55,818][41694] Num frames 7500... +[2024-11-08 07:21:56,042][41694] Num frames 7600... +[2024-11-08 07:21:56,247][41694] Num frames 7700... +[2024-11-08 07:21:56,459][41694] Num frames 7800... +[2024-11-08 07:21:56,737][41694] Avg episode rewards: #0: 4.236, true rewards: #0: 3.936 +[2024-11-08 07:21:56,744][41694] Avg episode reward: 4.236, avg true_objective: 3.936 +[2024-11-08 07:21:56,830][41694] Num frames 7900... +[2024-11-08 07:21:57,068][41694] Num frames 8000... +[2024-11-08 07:21:57,324][41694] Num frames 8100... +[2024-11-08 07:21:57,451][41694] Avg episode rewards: #0: 4.156, true rewards: #0: 3.870 +[2024-11-08 07:21:57,454][41694] Avg episode reward: 4.156, avg true_objective: 3.870 +[2024-11-08 07:21:57,631][41694] Num frames 8200... +[2024-11-08 07:21:57,923][41694] Num frames 8300... +[2024-11-08 07:21:58,163][41694] Num frames 8400... +[2024-11-08 07:21:58,432][41694] Num frames 8500... +[2024-11-08 07:21:58,665][41694] Avg episode rewards: #0: 4.216, true rewards: #0: 3.898 +[2024-11-08 07:21:58,667][41694] Avg episode reward: 4.216, avg true_objective: 3.898 +[2024-11-08 07:21:58,754][41694] Num frames 8600... +[2024-11-08 07:21:58,989][41694] Num frames 8700... +[2024-11-08 07:21:59,267][41694] Num frames 8800... +[2024-11-08 07:21:59,490][41694] Num frames 8900... +[2024-11-08 07:21:59,720][41694] Num frames 9000... +[2024-11-08 07:21:59,940][41694] Num frames 9100... +[2024-11-08 07:22:00,173][41694] Avg episode rewards: #0: 4.428, true rewards: #0: 3.993 +[2024-11-08 07:22:00,175][41694] Avg episode reward: 4.428, avg true_objective: 3.993 +[2024-11-08 07:22:00,208][41694] Num frames 9200... +[2024-11-08 07:22:00,488][41694] Num frames 9300... +[2024-11-08 07:22:00,734][41694] Num frames 9400... +[2024-11-08 07:22:01,010][41694] Num frames 9500... +[2024-11-08 07:22:01,233][41694] Avg episode rewards: #0: 4.403, true rewards: #0: 3.987 +[2024-11-08 07:22:01,235][41694] Avg episode reward: 4.403, avg true_objective: 3.987 +[2024-11-08 07:22:01,338][41694] Num frames 9600... +[2024-11-08 07:22:01,587][41694] Num frames 9700... +[2024-11-08 07:22:01,782][41694] Num frames 9800... +[2024-11-08 07:22:01,992][41694] Num frames 9900... +[2024-11-08 07:22:02,216][41694] Num frames 10000... +[2024-11-08 07:22:02,307][41694] Avg episode rewards: #0: 4.446, true rewards: #0: 4.006 +[2024-11-08 07:22:02,310][41694] Avg episode reward: 4.446, avg true_objective: 4.006 +[2024-11-08 07:22:02,477][41694] Num frames 10100... +[2024-11-08 07:22:02,676][41694] Num frames 10200... +[2024-11-08 07:22:02,894][41694] Num frames 10300... +[2024-11-08 07:22:03,090][41694] Num frames 10400... +[2024-11-08 07:22:03,287][41694] Avg episode rewards: #0: 4.486, true rewards: #0: 4.025 +[2024-11-08 07:22:03,293][41694] Avg episode reward: 4.486, avg true_objective: 4.025 +[2024-11-08 07:22:03,470][41694] Num frames 10500... +[2024-11-08 07:22:03,679][41694] Num frames 10600... +[2024-11-08 07:22:03,898][41694] Num frames 10700... +[2024-11-08 07:22:04,114][41694] Num frames 10800... +[2024-11-08 07:22:04,332][41694] Num frames 10900... +[2024-11-08 07:22:04,418][41694] Avg episode rewards: #0: 4.523, true rewards: #0: 4.041 +[2024-11-08 07:22:04,424][41694] Avg episode reward: 4.523, avg true_objective: 4.041 +[2024-11-08 07:22:04,610][41694] Num frames 11000... +[2024-11-08 07:22:04,816][41694] Num frames 11100... +[2024-11-08 07:22:05,037][41694] Num frames 11200... +[2024-11-08 07:22:05,248][41694] Num frames 11300... +[2024-11-08 07:22:05,469][41694] Num frames 11400... +[2024-11-08 07:22:05,647][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.091 +[2024-11-08 07:22:05,648][41694] Avg episode reward: 4.627, avg true_objective: 4.091 +[2024-11-08 07:22:05,752][41694] Num frames 11500... +[2024-11-08 07:22:05,964][41694] Num frames 11600... +[2024-11-08 07:22:06,174][41694] Num frames 11700... +[2024-11-08 07:22:06,381][41694] Num frames 11800... +[2024-11-08 07:22:06,533][41694] Avg episode rewards: #0: 4.600, true rewards: #0: 4.083 +[2024-11-08 07:22:06,537][41694] Avg episode reward: 4.600, avg true_objective: 4.083 +[2024-11-08 07:22:06,681][41694] Num frames 11900... +[2024-11-08 07:22:06,890][41694] Num frames 12000... +[2024-11-08 07:22:07,075][41694] Num frames 12100... +[2024-11-08 07:22:07,266][41694] Num frames 12200... +[2024-11-08 07:22:07,386][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.075 +[2024-11-08 07:22:07,387][41694] Avg episode reward: 4.575, avg true_objective: 4.075 +[2024-11-08 07:22:07,589][41694] Num frames 12300... +[2024-11-08 07:22:07,789][41694] Num frames 12400... +[2024-11-08 07:22:07,985][41694] Num frames 12500... +[2024-11-08 07:22:08,178][41694] Num frames 12600... +[2024-11-08 07:22:08,250][41694] Avg episode rewards: #0: 4.551, true rewards: #0: 4.067 +[2024-11-08 07:22:08,254][41694] Avg episode reward: 4.551, avg true_objective: 4.067 +[2024-11-08 07:22:08,483][41694] Num frames 12700... +[2024-11-08 07:22:08,700][41694] Num frames 12800... +[2024-11-08 07:22:08,972][41694] Num frames 12900... +[2024-11-08 07:22:09,247][41694] Avg episode rewards: #0: 4.529, true rewards: #0: 4.060 +[2024-11-08 07:22:09,249][41694] Avg episode reward: 4.529, avg true_objective: 4.060 +[2024-11-08 07:22:09,269][41694] Num frames 13000... +[2024-11-08 07:22:09,485][41694] Num frames 13100... +[2024-11-08 07:22:09,683][41694] Num frames 13200... +[2024-11-08 07:22:09,911][41694] Num frames 13300... +[2024-11-08 07:22:10,122][41694] Num frames 13400... +[2024-11-08 07:22:10,198][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.063 +[2024-11-08 07:22:10,202][41694] Avg episode reward: 4.548, avg true_objective: 4.063 +[2024-11-08 07:22:10,404][41694] Num frames 13500... +[2024-11-08 07:22:10,612][41694] Num frames 13600... +[2024-11-08 07:22:10,823][41694] Num frames 13700... +[2024-11-08 07:22:11,086][41694] Avg episode rewards: #0: 4.527, true rewards: #0: 4.056 +[2024-11-08 07:22:11,090][41694] Avg episode reward: 4.527, avg true_objective: 4.056 +[2024-11-08 07:22:11,129][41694] Num frames 13800... +[2024-11-08 07:22:11,354][41694] Num frames 13900... +[2024-11-08 07:22:11,666][41694] Num frames 14000... +[2024-11-08 07:22:11,907][41694] Num frames 14100... +[2024-11-08 07:22:12,119][41694] Avg episode rewards: #0: 4.507, true rewards: #0: 4.050 +[2024-11-08 07:22:12,122][41694] Avg episode reward: 4.507, avg true_objective: 4.050 +[2024-11-08 07:22:12,211][41694] Num frames 14200... +[2024-11-08 07:22:12,502][41694] Num frames 14300... +[2024-11-08 07:22:12,843][41694] Num frames 14400... +[2024-11-08 07:22:13,074][41694] Num frames 14500... +[2024-11-08 07:22:13,293][41694] Num frames 14600... +[2024-11-08 07:22:13,504][41694] Num frames 14700... +[2024-11-08 07:22:13,609][41694] Avg episode rewards: #0: 4.589, true rewards: #0: 4.089 +[2024-11-08 07:22:13,613][41694] Avg episode reward: 4.589, avg true_objective: 4.089 +[2024-11-08 07:22:13,817][41694] Num frames 14800... +[2024-11-08 07:22:14,076][41694] Num frames 14900... +[2024-11-08 07:22:14,290][41694] Num frames 15000... +[2024-11-08 07:22:14,514][41694] Num frames 15100... +[2024-11-08 07:22:14,711][41694] Avg episode rewards: #0: 4.613, true rewards: #0: 4.099 +[2024-11-08 07:22:14,715][41694] Avg episode reward: 4.613, avg true_objective: 4.099 +[2024-11-08 07:22:14,798][41694] Num frames 15200... +[2024-11-08 07:22:15,026][41694] Num frames 15300... +[2024-11-08 07:22:15,231][41694] Num frames 15400... +[2024-11-08 07:22:15,414][41694] Num frames 15500... +[2024-11-08 07:22:15,564][41694] Avg episode rewards: #0: 4.593, true rewards: #0: 4.093 +[2024-11-08 07:22:15,570][41694] Avg episode reward: 4.593, avg true_objective: 4.093 +[2024-11-08 07:22:15,676][41694] Num frames 15600... +[2024-11-08 07:22:15,862][41694] Num frames 15700... +[2024-11-08 07:22:16,058][41694] Num frames 15800... +[2024-11-08 07:22:16,255][41694] Num frames 15900... +[2024-11-08 07:22:16,390][41694] Avg episode rewards: #0: 4.573, true rewards: #0: 4.086 +[2024-11-08 07:22:16,395][41694] Avg episode reward: 4.573, avg true_objective: 4.086 +[2024-11-08 07:22:16,569][41694] Num frames 16000... +[2024-11-08 07:22:17,223][41694] Num frames 16100... +[2024-11-08 07:22:17,465][41694] Num frames 16200... +[2024-11-08 07:22:17,700][41694] Num frames 16300... +[2024-11-08 07:22:17,799][41694] Avg episode rewards: #0: 4.555, true rewards: #0: 4.080 +[2024-11-08 07:22:17,801][41694] Avg episode reward: 4.555, avg true_objective: 4.080 +[2024-11-08 07:22:17,993][41694] Num frames 16400... +[2024-11-08 07:22:18,195][41694] Num frames 16500... +[2024-11-08 07:22:18,446][41694] Num frames 16600... +[2024-11-08 07:22:18,668][41694] Num frames 16700... +[2024-11-08 07:22:18,897][41694] Avg episode rewards: #0: 4.578, true rewards: #0: 4.090 +[2024-11-08 07:22:18,901][41694] Avg episode reward: 4.578, avg true_objective: 4.090 +[2024-11-08 07:22:19,000][41694] Num frames 16800... +[2024-11-08 07:22:19,242][41694] Num frames 16900... +[2024-11-08 07:22:19,479][41694] Num frames 17000... +[2024-11-08 07:22:19,588][41694] Avg episode rewards: #0: 4.530, true rewards: #0: 4.053 +[2024-11-08 07:22:19,589][41694] Avg episode reward: 4.530, avg true_objective: 4.053 +[2024-11-08 07:22:19,759][41694] Num frames 17100... +[2024-11-08 07:22:19,985][41694] Num frames 17200... +[2024-11-08 07:22:20,201][41694] Num frames 17300... +[2024-11-08 07:22:20,417][41694] Num frames 17400... +[2024-11-08 07:22:20,500][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.048 +[2024-11-08 07:22:20,505][41694] Avg episode reward: 4.513, avg true_objective: 4.048 +[2024-11-08 07:22:20,723][41694] Num frames 17500... +[2024-11-08 07:22:20,945][41694] Num frames 17600... +[2024-11-08 07:22:21,235][41694] Num frames 17700... +[2024-11-08 07:22:21,449][41694] Num frames 17800... +[2024-11-08 07:22:21,645][41694] Avg episode rewards: #0: 4.535, true rewards: #0: 4.058 +[2024-11-08 07:22:21,650][41694] Avg episode reward: 4.535, avg true_objective: 4.058 +[2024-11-08 07:22:21,787][41694] Num frames 17900... +[2024-11-08 07:22:22,074][41694] Num frames 18000... +[2024-11-08 07:22:22,338][41694] Num frames 18100... +[2024-11-08 07:22:22,581][41694] Num frames 18200... +[2024-11-08 07:22:22,802][41694] Avg episode rewards: #0: 4.549, true rewards: #0: 4.060 +[2024-11-08 07:22:22,806][41694] Avg episode reward: 4.549, avg true_objective: 4.060 +[2024-11-08 07:22:22,888][41694] Num frames 18300... +[2024-11-08 07:22:23,117][41694] Num frames 18400... +[2024-11-08 07:22:23,334][41694] Num frames 18500... +[2024-11-08 07:22:23,563][41694] Num frames 18600... +[2024-11-08 07:22:23,762][41694] Avg episode rewards: #0: 4.534, true rewards: #0: 4.056 +[2024-11-08 07:22:23,766][41694] Avg episode reward: 4.534, avg true_objective: 4.056 +[2024-11-08 07:22:23,877][41694] Num frames 18700... +[2024-11-08 07:22:24,101][41694] Num frames 18800... +[2024-11-08 07:22:24,334][41694] Num frames 18900... +[2024-11-08 07:22:24,542][41694] Num frames 19000... +[2024-11-08 07:22:24,733][41694] Num frames 19100... +[2024-11-08 07:22:24,796][41694] Avg episode rewards: #0: 4.554, true rewards: #0: 4.065 +[2024-11-08 07:22:24,801][41694] Avg episode reward: 4.554, avg true_objective: 4.065 +[2024-11-08 07:22:24,997][41694] Num frames 19200... +[2024-11-08 07:22:25,195][41694] Num frames 19300... +[2024-11-08 07:22:25,385][41694] Num frames 19400... +[2024-11-08 07:22:25,597][41694] Avg episode rewards: #0: 4.539, true rewards: #0: 4.060 +[2024-11-08 07:22:25,603][41694] Avg episode reward: 4.539, avg true_objective: 4.060 +[2024-11-08 07:22:25,650][41694] Num frames 19500... +[2024-11-08 07:22:25,865][41694] Num frames 19600... +[2024-11-08 07:22:26,093][41694] Num frames 19700... +[2024-11-08 07:22:26,250][41694] Avg episode rewards: #0: 4.499, true rewards: #0: 4.029 +[2024-11-08 07:22:26,253][41694] Avg episode reward: 4.499, avg true_objective: 4.029 +[2024-11-08 07:22:26,394][41694] Num frames 19800... +[2024-11-08 07:22:26,626][41694] Num frames 19900... +[2024-11-08 07:22:26,892][41694] Num frames 20000... +[2024-11-08 07:22:27,163][41694] Num frames 20100... +[2024-11-08 07:22:27,298][41694] Avg episode rewards: #0: 4.486, true rewards: #0: 4.026 +[2024-11-08 07:22:27,303][41694] Avg episode reward: 4.486, avg true_objective: 4.026 +[2024-11-08 07:22:27,491][41694] Num frames 20200... +[2024-11-08 07:22:27,748][41694] Num frames 20300... +[2024-11-08 07:22:27,954][41694] Num frames 20400... +[2024-11-08 07:22:28,150][41694] Num frames 20500... +[2024-11-08 07:22:28,239][41694] Avg episode rewards: #0: 4.473, true rewards: #0: 4.022 +[2024-11-08 07:22:28,241][41694] Avg episode reward: 4.473, avg true_objective: 4.022 +[2024-11-08 07:22:28,419][41694] Num frames 20600... +[2024-11-08 07:22:28,618][41694] Num frames 20700... +[2024-11-08 07:22:28,835][41694] Num frames 20800... +[2024-11-08 07:22:29,067][41694] Num frames 20900... +[2024-11-08 07:22:29,278][41694] Avg episode rewards: #0: 4.492, true rewards: #0: 4.031 +[2024-11-08 07:22:29,282][41694] Avg episode reward: 4.492, avg true_objective: 4.031 +[2024-11-08 07:22:29,405][41694] Num frames 21000... +[2024-11-08 07:22:29,665][41694] Num frames 21100... +[2024-11-08 07:22:29,917][41694] Num frames 21200... +[2024-11-08 07:22:30,169][41694] Num frames 21300... +[2024-11-08 07:22:30,433][41694] Num frames 21400... +[2024-11-08 07:22:30,675][41694] Num frames 21500... +[2024-11-08 07:22:30,742][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.057 +[2024-11-08 07:22:30,744][41694] Avg episode reward: 4.548, avg true_objective: 4.057 +[2024-11-08 07:22:31,002][41694] Num frames 21600... +[2024-11-08 07:22:31,292][41694] Num frames 21700... +[2024-11-08 07:22:31,561][41694] Num frames 21800... +[2024-11-08 07:22:31,817][41694] Avg episode rewards: #0: 4.535, true rewards: #0: 4.053 +[2024-11-08 07:22:31,819][41694] Avg episode reward: 4.535, avg true_objective: 4.053 +[2024-11-08 07:22:31,849][41694] Num frames 21900... +[2024-11-08 07:22:32,079][41694] Num frames 22000... +[2024-11-08 07:22:32,304][41694] Num frames 22100... +[2024-11-08 07:22:32,519][41694] Num frames 22200... +[2024-11-08 07:22:32,727][41694] Avg episode rewards: #0: 4.522, true rewards: #0: 4.049 +[2024-11-08 07:22:32,732][41694] Avg episode reward: 4.522, avg true_objective: 4.049 +[2024-11-08 07:22:32,817][41694] Num frames 22300... +[2024-11-08 07:22:33,074][41694] Num frames 22400... +[2024-11-08 07:22:33,314][41694] Num frames 22500... +[2024-11-08 07:22:33,546][41694] Num frames 22600... +[2024-11-08 07:22:33,755][41694] Avg episode rewards: #0: 4.510, true rewards: #0: 4.046 +[2024-11-08 07:22:33,760][41694] Avg episode reward: 4.510, avg true_objective: 4.046 +[2024-11-08 07:22:33,860][41694] Num frames 22700... +[2024-11-08 07:22:34,062][41694] Num frames 22800... +[2024-11-08 07:22:34,274][41694] Num frames 22900... +[2024-11-08 07:22:34,505][41694] Num frames 23000... +[2024-11-08 07:22:34,662][41694] Avg episode rewards: #0: 4.498, true rewards: #0: 4.042 +[2024-11-08 07:22:34,667][41694] Avg episode reward: 4.498, avg true_objective: 4.042 +[2024-11-08 07:22:34,809][41694] Num frames 23100... +[2024-11-08 07:22:35,054][41694] Num frames 23200... +[2024-11-08 07:22:35,283][41694] Num frames 23300... +[2024-11-08 07:22:35,503][41694] Num frames 23400... +[2024-11-08 07:22:35,620][41694] Avg episode rewards: #0: 4.487, true rewards: #0: 4.039 +[2024-11-08 07:22:35,624][41694] Avg episode reward: 4.487, avg true_objective: 4.039 +[2024-11-08 07:22:35,811][41694] Num frames 23500... +[2024-11-08 07:22:36,073][41694] Num frames 23600... +[2024-11-08 07:22:36,338][41694] Num frames 23700... +[2024-11-08 07:22:36,606][41694] Num frames 23800... +[2024-11-08 07:22:36,686][41694] Avg episode rewards: #0: 4.476, true rewards: #0: 4.035 +[2024-11-08 07:22:36,690][41694] Avg episode reward: 4.476, avg true_objective: 4.035 +[2024-11-08 07:22:36,941][41694] Num frames 23900... +[2024-11-08 07:22:37,203][41694] Num frames 24000... +[2024-11-08 07:22:37,451][41694] Num frames 24100... +[2024-11-08 07:22:37,731][41694] Avg episode rewards: #0: 4.465, true rewards: #0: 4.032 +[2024-11-08 07:22:37,738][41694] Avg episode reward: 4.465, avg true_objective: 4.032 +[2024-11-08 07:22:37,773][41694] Num frames 24200... +[2024-11-08 07:22:38,027][41694] Num frames 24300... +[2024-11-08 07:22:38,235][41694] Num frames 24400... +[2024-11-08 07:22:38,474][41694] Num frames 24500... +[2024-11-08 07:22:38,732][41694] Num frames 24600... +[2024-11-08 07:22:38,995][41694] Num frames 24700... +[2024-11-08 07:22:39,138][41694] Avg episode rewards: #0: 4.514, true rewards: #0: 4.055 +[2024-11-08 07:22:39,142][41694] Avg episode reward: 4.514, avg true_objective: 4.055 +[2024-11-08 07:22:39,310][41694] Num frames 24800... +[2024-11-08 07:22:39,534][41694] Num frames 24900... +[2024-11-08 07:22:39,767][41694] Num frames 25000... +[2024-11-08 07:22:39,998][41694] Num frames 25100... +[2024-11-08 07:22:40,099][41694] Avg episode rewards: #0: 4.503, true rewards: #0: 4.052 +[2024-11-08 07:22:40,102][41694] Avg episode reward: 4.503, avg true_objective: 4.052 +[2024-11-08 07:22:40,286][41694] Num frames 25200... +[2024-11-08 07:22:40,506][41694] Num frames 25300... +[2024-11-08 07:22:40,735][41694] Num frames 25400... +[2024-11-08 07:22:40,957][41694] Num frames 25500... +[2024-11-08 07:22:41,018][41694] Avg episode rewards: #0: 4.493, true rewards: #0: 4.048 +[2024-11-08 07:22:41,025][41694] Avg episode reward: 4.493, avg true_objective: 4.048 +[2024-11-08 07:22:41,247][41694] Num frames 25600... +[2024-11-08 07:22:41,463][41694] Num frames 25700... +[2024-11-08 07:22:41,764][41694] Num frames 25800... +[2024-11-08 07:22:41,992][41694] Num frames 25900... +[2024-11-08 07:22:42,094][41694] Avg episode rewards: #0: 4.488, true rewards: #0: 4.050 +[2024-11-08 07:22:42,095][41694] Avg episode reward: 4.488, avg true_objective: 4.050 +[2024-11-08 07:22:42,273][41694] Num frames 26000... +[2024-11-08 07:22:42,507][41694] Num frames 26100... +[2024-11-08 07:22:42,728][41694] Num frames 26200... +[2024-11-08 07:22:42,935][41694] Num frames 26300... +[2024-11-08 07:22:43,011][41694] Avg episode rewards: #0: 4.478, true rewards: #0: 4.047 +[2024-11-08 07:22:43,013][41694] Avg episode reward: 4.478, avg true_objective: 4.047 +[2024-11-08 07:22:43,255][41694] Num frames 26400... +[2024-11-08 07:22:43,488][41694] Num frames 26500... +[2024-11-08 07:22:43,730][41694] Num frames 26600... +[2024-11-08 07:22:44,013][41694] Avg episode rewards: #0: 4.468, true rewards: #0: 4.044 +[2024-11-08 07:22:44,019][41694] Avg episode reward: 4.468, avg true_objective: 4.044 +[2024-11-08 07:22:44,069][41694] Num frames 26700... +[2024-11-08 07:22:44,325][41694] Num frames 26800... +[2024-11-08 07:22:44,576][41694] Num frames 26900... +[2024-11-08 07:22:44,825][41694] Num frames 27000... +[2024-11-08 07:22:45,127][41694] Avg episode rewards: #0: 4.476, true rewards: #0: 4.043 +[2024-11-08 07:22:45,134][41694] Avg episode reward: 4.476, avg true_objective: 4.043 +[2024-11-08 07:22:45,182][41694] Num frames 27100... +[2024-11-08 07:22:45,424][41694] Num frames 27200... +[2024-11-08 07:22:45,670][41694] Num frames 27300... +[2024-11-08 07:22:45,932][41694] Num frames 27400... +[2024-11-08 07:22:46,170][41694] Avg episode rewards: #0: 4.467, true rewards: #0: 4.040 +[2024-11-08 07:22:46,174][41694] Avg episode reward: 4.467, avg true_objective: 4.040 +[2024-11-08 07:22:46,253][41694] Num frames 27500... +[2024-11-08 07:22:46,453][41694] Num frames 27600... +[2024-11-08 07:22:46,672][41694] Num frames 27700... +[2024-11-08 07:22:46,887][41694] Num frames 27800... +[2024-11-08 07:22:47,101][41694] Num frames 27900... +[2024-11-08 07:22:47,211][41694] Avg episode rewards: #0: 4.481, true rewards: #0: 4.047 +[2024-11-08 07:22:47,216][41694] Avg episode reward: 4.481, avg true_objective: 4.047 +[2024-11-08 07:22:47,392][41694] Num frames 28000... +[2024-11-08 07:22:47,587][41694] Num frames 28100... +[2024-11-08 07:22:47,786][41694] Num frames 28200... +[2024-11-08 07:22:47,981][41694] Num frames 28300... +[2024-11-08 07:22:48,185][41694] Num frames 28400... +[2024-11-08 07:22:48,373][41694] Num frames 28500... +[2024-11-08 07:22:48,567][41694] Num frames 28600... +[2024-11-08 07:22:48,680][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.089 +[2024-11-08 07:22:48,682][41694] Avg episode reward: 4.575, avg true_objective: 4.089 +[2024-11-08 07:22:48,833][41694] Num frames 28700... +[2024-11-08 07:22:49,031][41694] Num frames 28800... +[2024-11-08 07:22:49,234][41694] Num frames 28900... +[2024-11-08 07:22:49,899][41694] Num frames 29000... +[2024-11-08 07:22:50,129][41694] Avg episode rewards: #0: 4.588, true rewards: #0: 4.095 +[2024-11-08 07:22:50,132][41694] Avg episode reward: 4.588, avg true_objective: 4.095 +[2024-11-08 07:22:50,188][41694] Num frames 29100... +[2024-11-08 07:22:50,402][41694] Num frames 29200... +[2024-11-08 07:22:50,606][41694] Num frames 29300... +[2024-11-08 07:22:50,825][41694] Num frames 29400... +[2024-11-08 07:22:51,103][41694] Avg episode rewards: #0: 4.596, true rewards: #0: 4.096 +[2024-11-08 07:22:51,110][41694] Avg episode reward: 4.596, avg true_objective: 4.096 +[2024-11-08 07:22:51,153][41694] Num frames 29500... +[2024-11-08 07:22:51,396][41694] Num frames 29600... +[2024-11-08 07:22:51,621][41694] Num frames 29700... +[2024-11-08 07:22:51,835][41694] Num frames 29800... +[2024-11-08 07:22:52,055][41694] Avg episode rewards: #0: 4.585, true rewards: #0: 4.092 +[2024-11-08 07:22:52,058][41694] Avg episode reward: 4.585, avg true_objective: 4.092 +[2024-11-08 07:22:52,140][41694] Num frames 29900... +[2024-11-08 07:22:52,398][41694] Num frames 30000... +[2024-11-08 07:22:52,637][41694] Num frames 30100... +[2024-11-08 07:22:52,884][41694] Num frames 30200... +[2024-11-08 07:22:53,140][41694] Num frames 30300... +[2024-11-08 07:22:53,333][41694] Avg episode rewards: #0: 4.602, true rewards: #0: 4.102 +[2024-11-08 07:22:53,335][41694] Avg episode reward: 4.602, avg true_objective: 4.102 +[2024-11-08 07:22:53,442][41694] Num frames 30400... +[2024-11-08 07:22:53,671][41694] Num frames 30500... +[2024-11-08 07:22:53,922][41694] Num frames 30600... +[2024-11-08 07:22:54,178][41694] Num frames 30700... +[2024-11-08 07:22:54,422][41694] Num frames 30800... +[2024-11-08 07:22:54,487][41694] Avg episode rewards: #0: 4.613, true rewards: #0: 4.107 +[2024-11-08 07:22:54,491][41694] Avg episode reward: 4.613, avg true_objective: 4.107 +[2024-11-08 07:22:54,710][41694] Num frames 30900... +[2024-11-08 07:22:54,969][41694] Num frames 31000... +[2024-11-08 07:22:55,196][41694] Num frames 31100... +[2024-11-08 07:22:55,397][41694] Num frames 31200... +[2024-11-08 07:22:55,561][41694] Avg episode rewards: #0: 4.625, true rewards: #0: 4.112 +[2024-11-08 07:22:55,565][41694] Avg episode reward: 4.625, avg true_objective: 4.112 +[2024-11-08 07:22:55,685][41694] Num frames 31300... +[2024-11-08 07:22:55,947][41694] Num frames 31400... +[2024-11-08 07:22:56,188][41694] Num frames 31500... +[2024-11-08 07:22:56,369][41694] Num frames 31600... +[2024-11-08 07:22:56,616][41694] Avg episode rewards: #0: 4.636, true rewards: #0: 4.116 +[2024-11-08 07:22:56,622][41694] Avg episode reward: 4.636, avg true_objective: 4.116 +[2024-11-08 07:22:56,638][41694] Num frames 31700... +[2024-11-08 07:22:56,851][41694] Num frames 31800... +[2024-11-08 07:22:57,061][41694] Num frames 31900... +[2024-11-08 07:22:57,324][41694] Num frames 32000... +[2024-11-08 07:22:57,531][41694] Num frames 32100... +[2024-11-08 07:22:57,731][41694] Num frames 32200... +[2024-11-08 07:22:57,865][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.133 +[2024-11-08 07:22:57,867][41694] Avg episode reward: 4.672, avg true_objective: 4.133 +[2024-11-08 07:22:58,003][41694] Num frames 32300... +[2024-11-08 07:22:58,220][41694] Num frames 32400... +[2024-11-08 07:22:58,424][41694] Num frames 32500... +[2024-11-08 07:22:58,632][41694] Num frames 32600... +[2024-11-08 07:22:58,875][41694] Avg episode rewards: #0: 4.682, true rewards: #0: 4.138 +[2024-11-08 07:22:58,878][41694] Avg episode reward: 4.682, avg true_objective: 4.138 +[2024-11-08 07:22:58,925][41694] Num frames 32700... +[2024-11-08 07:22:59,140][41694] Num frames 32800... +[2024-11-08 07:22:59,361][41694] Num frames 32900... +[2024-11-08 07:22:59,576][41694] Num frames 33000... +[2024-11-08 07:22:59,807][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.134 +[2024-11-08 07:22:59,808][41694] Avg episode reward: 4.672, avg true_objective: 4.134 +[2024-11-08 07:22:59,889][41694] Num frames 33100... +[2024-11-08 07:23:00,126][41694] Num frames 33200... +[2024-11-08 07:23:00,380][41694] Num frames 33300... +[2024-11-08 07:23:00,721][41694] Num frames 33400... +[2024-11-08 07:23:00,997][41694] Num frames 33500... +[2024-11-08 07:23:01,115][41694] Avg episode rewards: #0: 4.682, true rewards: #0: 4.138 +[2024-11-08 07:23:01,119][41694] Avg episode reward: 4.682, avg true_objective: 4.138 +[2024-11-08 07:23:01,338][41694] Num frames 33600... +[2024-11-08 07:23:01,583][41694] Num frames 33700... +[2024-11-08 07:23:01,795][41694] Num frames 33800... +[2024-11-08 07:23:02,014][41694] Num frames 33900... +[2024-11-08 07:23:02,222][41694] Avg episode rewards: #0: 4.691, true rewards: #0: 4.143 +[2024-11-08 07:23:02,223][41694] Avg episode reward: 4.691, avg true_objective: 4.143 +[2024-11-08 07:23:02,306][41694] Num frames 34000... +[2024-11-08 07:23:02,557][41694] Num frames 34100... +[2024-11-08 07:23:02,832][41694] Num frames 34200... +[2024-11-08 07:23:03,123][41694] Num frames 34300... +[2024-11-08 07:23:03,341][41694] Avg episode rewards: #0: 4.681, true rewards: #0: 4.139 +[2024-11-08 07:23:03,342][41694] Avg episode reward: 4.681, avg true_objective: 4.139 +[2024-11-08 07:23:03,460][41694] Num frames 34400... +[2024-11-08 07:23:03,723][41694] Num frames 34500... +[2024-11-08 07:23:04,011][41694] Num frames 34600... +[2024-11-08 07:23:04,270][41694] Num frames 34700... +[2024-11-08 07:23:04,462][41694] Num frames 34800... +[2024-11-08 07:23:04,522][41694] Avg episode rewards: #0: 4.691, true rewards: #0: 4.143 +[2024-11-08 07:23:04,525][41694] Avg episode reward: 4.691, avg true_objective: 4.143 +[2024-11-08 07:23:04,725][41694] Num frames 34900... +[2024-11-08 07:23:04,920][41694] Num frames 35000... +[2024-11-08 07:23:05,132][41694] Num frames 35100... +[2024-11-08 07:23:05,386][41694] Avg episode rewards: #0: 4.681, true rewards: #0: 4.139 +[2024-11-08 07:23:05,387][41694] Avg episode reward: 4.681, avg true_objective: 4.139 +[2024-11-08 07:23:05,423][41694] Num frames 35200... +[2024-11-08 07:23:05,747][41694] Num frames 35300... +[2024-11-08 07:23:05,973][41694] Num frames 35400... +[2024-11-08 07:23:06,215][41694] Num frames 35500... +[2024-11-08 07:23:06,494][41694] Num frames 35600... +[2024-11-08 07:23:06,631][41694] Avg episode rewards: #0: 4.690, true rewards: #0: 4.143 +[2024-11-08 07:23:06,632][41694] Avg episode reward: 4.690, avg true_objective: 4.143 +[2024-11-08 07:23:06,807][41694] Num frames 35700... +[2024-11-08 07:23:07,044][41694] Num frames 35800... +[2024-11-08 07:23:07,240][41694] Num frames 35900... +[2024-11-08 07:23:07,444][41694] Num frames 36000... +[2024-11-08 07:23:07,541][41694] Avg episode rewards: #0: 4.680, true rewards: #0: 4.140 +[2024-11-08 07:23:07,546][41694] Avg episode reward: 4.680, avg true_objective: 4.140 +[2024-11-08 07:23:07,744][41694] Num frames 36100... +[2024-11-08 07:23:07,958][41694] Num frames 36200... +[2024-11-08 07:23:08,188][41694] Num frames 36300... +[2024-11-08 07:23:08,448][41694] Num frames 36400... +[2024-11-08 07:23:08,516][41694] Avg episode rewards: #0: 4.671, true rewards: #0: 4.136 +[2024-11-08 07:23:08,519][41694] Avg episode reward: 4.671, avg true_objective: 4.136 +[2024-11-08 07:23:08,743][41694] Num frames 36500... +[2024-11-08 07:23:08,978][41694] Num frames 36600... +[2024-11-08 07:23:09,256][41694] Num frames 36700... +[2024-11-08 07:23:09,495][41694] Avg episode rewards: #0: 4.661, true rewards: #0: 4.133 +[2024-11-08 07:23:09,497][41694] Avg episode reward: 4.661, avg true_objective: 4.133 +[2024-11-08 07:23:09,531][41694] Num frames 36800... +[2024-11-08 07:23:09,743][41694] Num frames 36900... +[2024-11-08 07:23:10,025][41694] Num frames 37000... +[2024-11-08 07:23:10,257][41694] Num frames 37100... +[2024-11-08 07:23:10,462][41694] Avg episode rewards: #0: 4.652, true rewards: #0: 4.130 +[2024-11-08 07:23:10,464][41694] Avg episode reward: 4.652, avg true_objective: 4.130 +[2024-11-08 07:23:10,531][41694] Num frames 37200... +[2024-11-08 07:23:10,741][41694] Num frames 37300... +[2024-11-08 07:23:10,945][41694] Num frames 37400... +[2024-11-08 07:23:11,153][41694] Num frames 37500... +[2024-11-08 07:23:11,325][41694] Avg episode rewards: #0: 4.643, true rewards: #0: 4.127 +[2024-11-08 07:23:11,327][41694] Avg episode reward: 4.643, avg true_objective: 4.127 +[2024-11-08 07:23:11,438][41694] Num frames 37600... +[2024-11-08 07:23:11,680][41694] Num frames 37700... +[2024-11-08 07:23:12,035][41694] Num frames 37800... +[2024-11-08 07:23:12,456][41694] Num frames 37900... +[2024-11-08 07:23:12,652][41694] Avg episode rewards: #0: 4.634, true rewards: #0: 4.124 +[2024-11-08 07:23:12,657][41694] Avg episode reward: 4.634, avg true_objective: 4.124 +[2024-11-08 07:23:12,798][41694] Num frames 38000... +[2024-11-08 07:23:12,997][41694] Num frames 38100... +[2024-11-08 07:23:13,220][41694] Num frames 38200... +[2024-11-08 07:23:13,568][41694] Num frames 38300... +[2024-11-08 07:23:13,700][41694] Avg episode rewards: #0: 4.626, true rewards: #0: 4.121 +[2024-11-08 07:23:13,703][41694] Avg episode reward: 4.626, avg true_objective: 4.121 +[2024-11-08 07:23:13,949][41694] Num frames 38400... +[2024-11-08 07:23:14,201][41694] Num frames 38500... +[2024-11-08 07:23:14,460][41694] Num frames 38600... +[2024-11-08 07:23:14,660][41694] Num frames 38700... +[2024-11-08 07:23:14,973][41694] Avg episode rewards: #0: 4.635, true rewards: #0: 4.124 +[2024-11-08 07:23:14,975][41694] Avg episode reward: 4.635, avg true_objective: 4.124 +[2024-11-08 07:23:15,050][41694] Num frames 38800... +[2024-11-08 07:23:15,483][41694] Num frames 38900... +[2024-11-08 07:23:15,793][41694] Num frames 39000... +[2024-11-08 07:23:15,996][41694] Num frames 39100... +[2024-11-08 07:23:16,194][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.121 +[2024-11-08 07:23:16,197][41694] Avg episode reward: 4.627, avg true_objective: 4.121 +[2024-11-08 07:23:16,333][41694] Num frames 39200... +[2024-11-08 07:23:16,635][41694] Num frames 39300... +[2024-11-08 07:23:16,960][41694] Num frames 39400... +[2024-11-08 07:23:17,218][41694] Num frames 39500... +[2024-11-08 07:23:17,386][41694] Avg episode rewards: #0: 4.618, true rewards: #0: 4.118 +[2024-11-08 07:23:17,394][41694] Avg episode reward: 4.618, avg true_objective: 4.118 +[2024-11-08 07:23:17,589][41694] Num frames 39600... +[2024-11-08 07:23:17,882][41694] Num frames 39700... +[2024-11-08 07:23:18,142][41694] Num frames 39800... +[2024-11-08 07:23:18,510][41694] Num frames 39900... +[2024-11-08 07:23:18,646][41694] Avg episode rewards: #0: 4.610, true rewards: #0: 4.116 +[2024-11-08 07:23:18,649][41694] Avg episode reward: 4.610, avg true_objective: 4.116 +[2024-11-08 07:23:19,014][41694] Num frames 40000... +[2024-11-08 07:23:19,354][41694] Num frames 40100... +[2024-11-08 07:23:19,591][41694] Num frames 40200... +[2024-11-08 07:23:19,852][41694] Num frames 40300... +[2024-11-08 07:23:19,929][41694] Avg episode rewards: #0: 4.603, true rewards: #0: 4.113 +[2024-11-08 07:23:19,931][41694] Avg episode reward: 4.603, avg true_objective: 4.113 +[2024-11-08 07:23:20,124][41694] Num frames 40400... +[2024-11-08 07:23:20,325][41694] Num frames 40500... +[2024-11-08 07:23:20,532][41694] Num frames 40600... +[2024-11-08 07:23:20,822][41694] Num frames 40700... +[2024-11-08 07:23:21,134][41694] Avg episode rewards: #0: 4.611, true rewards: #0: 4.116 +[2024-11-08 07:23:21,135][41694] Avg episode reward: 4.611, avg true_objective: 4.116 +[2024-11-08 07:23:21,247][41694] Num frames 40800... +[2024-11-08 07:23:21,495][41694] Num frames 40900... +[2024-11-08 07:23:21,858][41694] Num frames 41000... +[2024-11-08 07:23:22,657][41694] Num frames 41100... +[2024-11-08 07:23:22,827][41694] Avg episode rewards: #0: 4.604, true rewards: #0: 4.114 +[2024-11-08 07:23:22,829][41694] Avg episode reward: 4.604, avg true_objective: 4.114 +[2024-11-08 07:25:09,068][41694] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-08 07:25:55,668][41694] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme-alid +[2024-11-08 07:29:58,984][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:29:58,986][41694] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 07:29:58,987][41694] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 07:29:58,988][41694] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 07:29:58,992][41694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:29:58,995][41694] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 07:29:58,996][41694] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-08 07:29:58,998][41694] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-08 07:29:58,999][41694] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-08 07:29:59,002][41694] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme-alid' that is not in the saved config file! +[2024-11-08 07:29:59,003][41694] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 07:29:59,006][41694] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 07:29:59,007][41694] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 07:29:59,009][41694] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-08 07:29:59,012][41694] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 07:29:59,101][41694] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 07:29:59,160][41694] RunningMeanStd input shape: (1,) +[2024-11-08 07:29:59,257][41694] ConvEncoder: input_channels=3 +[2024-11-08 07:29:59,321][41694] Conv encoder output size: 512 +[2024-11-08 07:29:59,323][41694] Policy head output size: 512 +[2024-11-08 07:30:00,646][41694] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:30:01,794][41694] Num frames 100... +[2024-11-08 07:30:02,261][41694] Num frames 200... +[2024-11-08 07:30:02,708][41694] Num frames 300... +[2024-11-08 07:30:03,173][41694] Num frames 400... +[2024-11-08 07:30:03,398][41694] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-08 07:30:03,401][41694] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-08 07:30:03,546][41694] Num frames 500... +[2024-11-08 07:30:03,816][41694] Num frames 600... +[2024-11-08 07:30:04,059][41694] Num frames 700... +[2024-11-08 07:30:04,126][41694] Avg episode rewards: #0: 4.020, true rewards: #0: 3.520 +[2024-11-08 07:30:04,129][41694] Avg episode reward: 4.020, avg true_objective: 3.520 +[2024-11-08 07:30:04,372][41694] Num frames 800... +[2024-11-08 07:30:04,594][41694] Num frames 900... +[2024-11-08 07:30:04,834][41694] Num frames 1000... +[2024-11-08 07:30:05,100][41694] Num frames 1100... +[2024-11-08 07:30:05,293][41694] Avg episode rewards: #0: 4.507, true rewards: #0: 3.840 +[2024-11-08 07:30:05,297][41694] Avg episode reward: 4.507, avg true_objective: 3.840 +[2024-11-08 07:30:05,452][41694] Num frames 1200... +[2024-11-08 07:30:05,755][41694] Num frames 1300... +[2024-11-08 07:30:06,055][41694] Num frames 1400... +[2024-11-08 07:30:06,313][41694] Num frames 1500... +[2024-11-08 07:30:06,455][41694] Avg episode rewards: #0: 4.340, true rewards: #0: 3.840 +[2024-11-08 07:30:06,457][41694] Avg episode reward: 4.340, avg true_objective: 3.840 +[2024-11-08 07:30:06,640][41694] Num frames 1600... +[2024-11-08 07:30:06,922][41694] Num frames 1700... +[2024-11-08 07:30:07,167][41694] Num frames 1800... +[2024-11-08 07:30:07,446][41694] Num frames 1900... +[2024-11-08 07:30:07,564][41694] Avg episode rewards: #0: 4.240, true rewards: #0: 3.840 +[2024-11-08 07:30:07,566][41694] Avg episode reward: 4.240, avg true_objective: 3.840 +[2024-11-08 07:30:07,826][41694] Num frames 2000... +[2024-11-08 07:30:08,037][41694] Num frames 2100... +[2024-11-08 07:30:08,261][41694] Num frames 2200... +[2024-11-08 07:30:08,459][41694] Num frames 2300... +[2024-11-08 07:30:08,657][41694] Avg episode rewards: #0: 4.447, true rewards: #0: 3.947 +[2024-11-08 07:30:08,660][41694] Avg episode reward: 4.447, avg true_objective: 3.947 +[2024-11-08 07:30:08,736][41694] Num frames 2400... +[2024-11-08 07:30:09,234][41694] Num frames 2500... +[2024-11-08 07:30:09,830][41694] Num frames 2600... +[2024-11-08 07:30:10,227][41694] Num frames 2700... +[2024-11-08 07:30:10,689][41694] Avg episode rewards: #0: 4.360, true rewards: #0: 3.931 +[2024-11-08 07:30:10,691][41694] Avg episode reward: 4.360, avg true_objective: 3.931 +[2024-11-08 07:30:10,944][41694] Num frames 2800... +[2024-11-08 07:30:11,284][41694] Num frames 2900... +[2024-11-08 07:30:11,598][41694] Num frames 3000... +[2024-11-08 07:30:11,890][41694] Num frames 3100... +[2024-11-08 07:30:12,154][41694] Num frames 3200... +[2024-11-08 07:30:12,397][41694] Num frames 3300... +[2024-11-08 07:30:12,622][41694] Num frames 3400... +[2024-11-08 07:30:12,749][41694] Avg episode rewards: #0: 5.155, true rewards: #0: 4.280 +[2024-11-08 07:30:12,752][41694] Avg episode reward: 5.155, avg true_objective: 4.280 +[2024-11-08 07:30:13,067][41694] Num frames 3500... +[2024-11-08 07:30:13,472][41694] Num frames 3600... +[2024-11-08 07:30:13,886][41694] Num frames 3700... +[2024-11-08 07:30:14,302][41694] Num frames 3800... +[2024-11-08 07:30:14,407][41694] Avg episode rewards: #0: 5.009, true rewards: #0: 4.231 +[2024-11-08 07:30:14,409][41694] Avg episode reward: 5.009, avg true_objective: 4.231 +[2024-11-08 07:30:15,032][41694] Num frames 3900... +[2024-11-08 07:30:15,484][41694] Num frames 4000... +[2024-11-08 07:30:15,808][41694] Avg episode rewards: #0: 4.764, true rewards: #0: 4.064 +[2024-11-08 07:30:15,810][41694] Avg episode reward: 4.764, avg true_objective: 4.064 +[2024-11-08 07:30:15,940][41694] Num frames 4100... +[2024-11-08 07:30:16,375][41694] Num frames 4200... +[2024-11-08 07:30:16,715][41694] Num frames 4300... +[2024-11-08 07:30:17,063][41694] Num frames 4400... +[2024-11-08 07:30:17,328][41694] Avg episode rewards: #0: 4.680, true rewards: #0: 4.044 +[2024-11-08 07:30:17,330][41694] Avg episode reward: 4.680, avg true_objective: 4.044 +[2024-11-08 07:30:17,487][41694] Num frames 4500... +[2024-11-08 07:30:17,776][41694] Num frames 4600... +[2024-11-08 07:30:18,342][41694] Num frames 4700... +[2024-11-08 07:30:18,855][41694] Num frames 4800... +[2024-11-08 07:30:19,598][41694] Avg episode rewards: #0: 4.747, true rewards: #0: 4.080 +[2024-11-08 07:30:19,599][41694] Avg episode reward: 4.747, avg true_objective: 4.080 +[2024-11-08 07:30:19,617][41694] Num frames 4900... +[2024-11-08 07:30:20,028][41694] Num frames 5000... +[2024-11-08 07:30:20,361][41694] Num frames 5100... +[2024-11-08 07:30:20,855][41694] Num frames 5200... +[2024-11-08 07:30:21,155][41694] Avg episode rewards: #0: 4.677, true rewards: #0: 4.062 +[2024-11-08 07:30:21,157][41694] Avg episode reward: 4.677, avg true_objective: 4.062 +[2024-11-08 07:30:21,280][41694] Num frames 5300... +[2024-11-08 07:30:21,890][41694] Num frames 5400... +[2024-11-08 07:30:22,236][41694] Num frames 5500... +[2024-11-08 07:30:22,727][41694] Num frames 5600... +[2024-11-08 07:30:23,395][41694] Num frames 5700... +[2024-11-08 07:30:23,652][41694] Avg episode rewards: #0: 4.734, true rewards: #0: 4.091 +[2024-11-08 07:30:23,654][41694] Avg episode reward: 4.734, avg true_objective: 4.091 +[2024-11-08 07:30:23,917][41694] Num frames 5800... +[2024-11-08 07:30:24,254][41694] Num frames 5900... +[2024-11-08 07:30:24,621][41694] Num frames 6000... +[2024-11-08 07:30:25,052][41694] Num frames 6100... +[2024-11-08 07:30:25,172][41694] Avg episode rewards: #0: 4.675, true rewards: #0: 4.075 +[2024-11-08 07:30:25,175][41694] Avg episode reward: 4.675, avg true_objective: 4.075 +[2024-11-08 07:30:25,556][41694] Num frames 6200... +[2024-11-08 07:30:25,913][41694] Num frames 6300... +[2024-11-08 07:30:26,321][41694] Num frames 6400... +[2024-11-08 07:30:26,719][41694] Avg episode rewards: #0: 4.623, true rewards: #0: 4.060 +[2024-11-08 07:30:26,722][41694] Avg episode reward: 4.623, avg true_objective: 4.060 +[2024-11-08 07:30:26,747][41694] Num frames 6500... +[2024-11-08 07:30:27,067][41694] Num frames 6600... +[2024-11-08 07:30:27,268][41694] Num frames 6700... +[2024-11-08 07:30:27,583][41694] Num frames 6800... +[2024-11-08 07:30:27,963][41694] Avg episode rewards: #0: 4.576, true rewards: #0: 4.047 +[2024-11-08 07:30:27,966][41694] Avg episode reward: 4.576, avg true_objective: 4.047 +[2024-11-08 07:30:28,077][41694] Num frames 6900... +[2024-11-08 07:30:28,659][41694] Num frames 7000... +[2024-11-08 07:30:28,953][41694] Num frames 7100... +[2024-11-08 07:30:29,073][41694] Avg episode rewards: #0: 4.464, true rewards: #0: 3.964 +[2024-11-08 07:30:29,078][41694] Avg episode reward: 4.464, avg true_objective: 3.964 +[2024-11-08 07:30:29,222][41694] Num frames 7200... +[2024-11-08 07:30:29,397][41694] Num frames 7300... +[2024-11-08 07:30:29,597][41694] Num frames 7400... +[2024-11-08 07:30:29,767][41694] Num frames 7500... +[2024-11-08 07:30:30,016][41694] Avg episode rewards: #0: 4.518, true rewards: #0: 3.992 +[2024-11-08 07:30:30,020][41694] Avg episode reward: 4.518, avg true_objective: 3.992 +[2024-11-08 07:30:30,083][41694] Num frames 7600... +[2024-11-08 07:30:30,289][41694] Num frames 7700... +[2024-11-08 07:30:30,670][41694] Num frames 7800... +[2024-11-08 07:30:31,036][41694] Num frames 7900... +[2024-11-08 07:30:31,233][41694] Avg episode rewards: #0: 4.484, true rewards: #0: 3.984 +[2024-11-08 07:30:31,238][41694] Avg episode reward: 4.484, avg true_objective: 3.984 +[2024-11-08 07:30:31,322][41694] Num frames 8000... +[2024-11-08 07:30:31,530][41694] Num frames 8100... +[2024-11-08 07:30:31,743][41694] Num frames 8200... +[2024-11-08 07:30:32,005][41694] Num frames 8300... +[2024-11-08 07:30:32,129][41694] Avg episode rewards: #0: 4.486, true rewards: #0: 3.962 +[2024-11-08 07:30:32,131][41694] Avg episode reward: 4.486, avg true_objective: 3.962 +[2024-11-08 07:30:32,407][41694] Num frames 8400... +[2024-11-08 07:30:32,650][41694] Num frames 8500... +[2024-11-08 07:30:32,902][41694] Num frames 8600... +[2024-11-08 07:30:33,191][41694] Num frames 8700... +[2024-11-08 07:30:33,452][41694] Avg episode rewards: #0: 4.531, true rewards: #0: 3.985 +[2024-11-08 07:30:33,455][41694] Avg episode reward: 4.531, avg true_objective: 3.985 +[2024-11-08 07:30:33,557][41694] Num frames 8800... +[2024-11-08 07:30:33,736][41694] Num frames 8900... +[2024-11-08 07:30:33,941][41694] Num frames 9000... +[2024-11-08 07:30:34,125][41694] Num frames 9100... +[2024-11-08 07:30:34,291][41694] Num frames 9200... +[2024-11-08 07:30:34,380][41694] Avg episode rewards: #0: 4.572, true rewards: #0: 4.007 +[2024-11-08 07:30:34,385][41694] Avg episode reward: 4.572, avg true_objective: 4.007 +[2024-11-08 07:30:34,557][41694] Num frames 9300... +[2024-11-08 07:30:34,735][41694] Num frames 9400... +[2024-11-08 07:30:34,996][41694] Avg episode rewards: #0: 4.488, true rewards: #0: 3.947 +[2024-11-08 07:30:35,000][41694] Avg episode reward: 4.488, avg true_objective: 3.947 +[2024-11-08 07:30:35,083][41694] Num frames 9500... +[2024-11-08 07:30:35,267][41694] Num frames 9600... +[2024-11-08 07:30:35,436][41694] Num frames 9700... +[2024-11-08 07:30:35,716][41694] Num frames 9800... +[2024-11-08 07:30:35,907][41694] Num frames 9900... +[2024-11-08 07:30:36,097][41694] Num frames 10000... +[2024-11-08 07:30:36,387][41694] Num frames 10100... +[2024-11-08 07:30:36,633][41694] Avg episode rewards: #0: 4.750, true rewards: #0: 4.070 +[2024-11-08 07:30:36,638][41694] Avg episode reward: 4.750, avg true_objective: 4.070 +[2024-11-08 07:30:36,713][41694] Num frames 10200... +[2024-11-08 07:30:37,163][41694] Num frames 10300... +[2024-11-08 07:30:37,639][41694] Num frames 10400... +[2024-11-08 07:30:37,867][41694] Num frames 10500... +[2024-11-08 07:30:38,078][41694] Avg episode rewards: #0: 4.715, true rewards: #0: 4.062 +[2024-11-08 07:30:38,080][41694] Avg episode reward: 4.715, avg true_objective: 4.062 +[2024-11-08 07:30:38,194][41694] Num frames 10600... +[2024-11-08 07:30:38,496][41694] Num frames 10700... +[2024-11-08 07:30:38,768][41694] Num frames 10800... +[2024-11-08 07:30:38,861][41694] Avg episode rewards: #0: 4.636, true rewards: #0: 4.006 +[2024-11-08 07:30:38,863][41694] Avg episode reward: 4.636, avg true_objective: 4.006 +[2024-11-08 07:30:39,120][41694] Num frames 10900... +[2024-11-08 07:30:39,539][41694] Num frames 11000... +[2024-11-08 07:30:39,828][41694] Num frames 11100... +[2024-11-08 07:30:40,118][41694] Num frames 11200... +[2024-11-08 07:30:40,418][41694] Num frames 11300... +[2024-11-08 07:30:40,661][41694] Avg episode rewards: #0: 4.736, true rewards: #0: 4.057 +[2024-11-08 07:30:40,664][41694] Avg episode reward: 4.736, avg true_objective: 4.057 +[2024-11-08 07:30:40,828][41694] Num frames 11400... +[2024-11-08 07:30:41,164][41694] Num frames 11500... +[2024-11-08 07:30:41,466][41694] Num frames 11600... +[2024-11-08 07:30:41,907][41694] Num frames 11700... +[2024-11-08 07:30:42,105][41694] Avg episode rewards: #0: 4.705, true rewards: #0: 4.050 +[2024-11-08 07:30:42,107][41694] Avg episode reward: 4.705, avg true_objective: 4.050 +[2024-11-08 07:30:42,384][41694] Num frames 11800... +[2024-11-08 07:30:42,930][41694] Num frames 11900... +[2024-11-08 07:30:43,457][41694] Num frames 12000... +[2024-11-08 07:30:43,919][41694] Num frames 12100... +[2024-11-08 07:30:44,544][41694] Avg episode rewards: #0: 4.731, true rewards: #0: 4.064 +[2024-11-08 07:30:44,546][41694] Avg episode reward: 4.731, avg true_objective: 4.064 +[2024-11-08 07:30:44,604][41694] Num frames 12200... +[2024-11-08 07:30:45,005][41694] Num frames 12300... +[2024-11-08 07:30:45,511][41694] Num frames 12400... +[2024-11-08 07:30:45,947][41694] Num frames 12500... +[2024-11-08 07:30:46,308][41694] Num frames 12600... +[2024-11-08 07:30:46,392][41694] Avg episode rewards: #0: 4.745, true rewards: #0: 4.067 +[2024-11-08 07:30:46,394][41694] Avg episode reward: 4.745, avg true_objective: 4.067 +[2024-11-08 07:30:46,604][41694] Num frames 12700... +[2024-11-08 07:30:46,828][41694] Num frames 12800... +[2024-11-08 07:30:47,103][41694] Num frames 12900... +[2024-11-08 07:30:47,405][41694] Num frames 13000... +[2024-11-08 07:30:47,580][41694] Avg episode rewards: #0: 4.768, true rewards: #0: 4.080 +[2024-11-08 07:30:47,584][41694] Avg episode reward: 4.768, avg true_objective: 4.080 +[2024-11-08 07:30:47,677][41694] Num frames 13100... +[2024-11-08 07:30:47,888][41694] Num frames 13200... +[2024-11-08 07:30:48,098][41694] Num frames 13300... +[2024-11-08 07:30:48,318][41694] Num frames 13400... +[2024-11-08 07:30:48,545][41694] Num frames 13500... +[2024-11-08 07:30:48,609][41694] Avg episode rewards: #0: 4.789, true rewards: #0: 4.092 +[2024-11-08 07:30:48,612][41694] Avg episode reward: 4.789, avg true_objective: 4.092 +[2024-11-08 07:30:48,934][41694] Num frames 13600... +[2024-11-08 07:30:49,193][41694] Num frames 13700... +[2024-11-08 07:30:49,384][41694] Num frames 13800... +[2024-11-08 07:30:49,612][41694] Avg episode rewards: #0: 4.761, true rewards: #0: 4.085 +[2024-11-08 07:30:49,614][41694] Avg episode reward: 4.761, avg true_objective: 4.085 +[2024-11-08 07:30:49,645][41694] Num frames 13900... +[2024-11-08 07:30:49,863][41694] Num frames 14000... +[2024-11-08 07:30:50,080][41694] Num frames 14100... +[2024-11-08 07:30:50,508][41694] Num frames 14200... +[2024-11-08 07:30:50,765][41694] Num frames 14300... +[2024-11-08 07:30:50,830][41694] Avg episode rewards: #0: 4.773, true rewards: #0: 4.087 +[2024-11-08 07:30:50,833][41694] Avg episode reward: 4.773, avg true_objective: 4.087 +[2024-11-08 07:30:51,033][41694] Num frames 14400... +[2024-11-08 07:30:51,250][41694] Num frames 14500... +[2024-11-08 07:30:51,467][41694] Num frames 14600... +[2024-11-08 07:30:51,692][41694] Avg episode rewards: #0: 4.747, true rewards: #0: 4.080 +[2024-11-08 07:30:51,695][41694] Avg episode reward: 4.747, avg true_objective: 4.080 +[2024-11-08 07:30:51,743][41694] Num frames 14700... +[2024-11-08 07:30:52,171][41694] Num frames 14800... +[2024-11-08 07:30:52,586][41694] Num frames 14900... +[2024-11-08 07:30:52,976][41694] Num frames 15000... +[2024-11-08 07:30:53,293][41694] Avg episode rewards: #0: 4.722, true rewards: #0: 4.074 +[2024-11-08 07:30:53,294][41694] Avg episode reward: 4.722, avg true_objective: 4.074 +[2024-11-08 07:30:53,355][41694] Num frames 15100... +[2024-11-08 07:30:53,598][41694] Num frames 15200... +[2024-11-08 07:30:53,870][41694] Num frames 15300... +[2024-11-08 07:30:54,084][41694] Num frames 15400... +[2024-11-08 07:30:54,319][41694] Avg episode rewards: #0: 4.734, true rewards: #0: 4.076 +[2024-11-08 07:30:54,321][41694] Avg episode reward: 4.734, avg true_objective: 4.076 +[2024-11-08 07:30:54,358][41694] Num frames 15500... +[2024-11-08 07:30:54,578][41694] Num frames 15600... +[2024-11-08 07:30:54,845][41694] Num frames 15700... +[2024-11-08 07:30:55,168][41694] Num frames 15800... +[2024-11-08 07:30:55,542][41694] Num frames 15900... +[2024-11-08 07:30:55,724][41694] Avg episode rewards: #0: 4.753, true rewards: #0: 4.086 +[2024-11-08 07:30:55,725][41694] Avg episode reward: 4.753, avg true_objective: 4.086 +[2024-11-08 07:30:55,941][41694] Num frames 16000... +[2024-11-08 07:30:56,264][41694] Num frames 16100... +[2024-11-08 07:30:56,530][41694] Num frames 16200... +[2024-11-08 07:30:56,859][41694] Num frames 16300... +[2024-11-08 07:30:56,974][41694] Avg episode rewards: #0: 4.730, true rewards: #0: 4.080 +[2024-11-08 07:30:56,976][41694] Avg episode reward: 4.730, avg true_objective: 4.080 +[2024-11-08 07:30:57,333][41694] Num frames 16400... +[2024-11-08 07:30:57,733][41694] Num frames 16500... +[2024-11-08 07:30:57,996][41694] Num frames 16600... +[2024-11-08 07:30:58,229][41694] Num frames 16700... +[2024-11-08 07:30:58,539][41694] Num frames 16800... +[2024-11-08 07:30:58,592][41694] Avg episode rewards: #0: 4.756, true rewards: #0: 4.098 +[2024-11-08 07:30:58,594][41694] Avg episode reward: 4.756, avg true_objective: 4.098 +[2024-11-08 07:30:58,848][41694] Num frames 16900... +[2024-11-08 07:30:59,225][41694] Num frames 17000... +[2024-11-08 07:30:59,502][41694] Num frames 17100... +[2024-11-08 07:30:59,737][41694] Num frames 17200... +[2024-11-08 07:30:59,901][41694] Avg episode rewards: #0: 4.773, true rewards: #0: 4.107 +[2024-11-08 07:30:59,903][41694] Avg episode reward: 4.773, avg true_objective: 4.107 +[2024-11-08 07:31:00,040][41694] Num frames 17300... +[2024-11-08 07:31:00,295][41694] Num frames 17400... +[2024-11-08 07:31:00,544][41694] Num frames 17500... +[2024-11-08 07:31:00,977][41694] Num frames 17600... +[2024-11-08 07:31:01,451][41694] Avg episode rewards: #0: 4.782, true rewards: #0: 4.108 +[2024-11-08 07:31:01,454][41694] Avg episode reward: 4.782, avg true_objective: 4.108 +[2024-11-08 07:31:01,558][41694] Num frames 17700... +[2024-11-08 07:31:01,957][41694] Num frames 17800... +[2024-11-08 07:31:02,399][41694] Num frames 17900... +[2024-11-08 07:31:02,662][41694] Num frames 18000... +[2024-11-08 07:31:02,848][41694] Avg episode rewards: #0: 4.761, true rewards: #0: 4.102 +[2024-11-08 07:31:02,850][41694] Avg episode reward: 4.761, avg true_objective: 4.102 +[2024-11-08 07:31:03,002][41694] Num frames 18100... +[2024-11-08 07:31:03,283][41694] Num frames 18200... +[2024-11-08 07:31:03,597][41694] Num frames 18300... +[2024-11-08 07:31:04,027][41694] Num frames 18400... +[2024-11-08 07:31:04,209][41694] Avg episode rewards: #0: 4.740, true rewards: #0: 4.096 +[2024-11-08 07:31:04,212][41694] Avg episode reward: 4.740, avg true_objective: 4.096 +[2024-11-08 07:31:04,536][41694] Num frames 18500... +[2024-11-08 07:31:05,018][41694] Num frames 18600... +[2024-11-08 07:31:05,437][41694] Num frames 18700... +[2024-11-08 07:31:05,721][41694] Num frames 18800... +[2024-11-08 07:31:06,002][41694] Num frames 18900... +[2024-11-08 07:31:06,188][41694] Avg episode rewards: #0: 4.792, true rewards: #0: 4.118 +[2024-11-08 07:31:06,196][41694] Avg episode reward: 4.792, avg true_objective: 4.118 +[2024-11-08 07:31:06,339][41694] Num frames 19000... +[2024-11-08 07:31:06,580][41694] Num frames 19100... +[2024-11-08 07:31:06,798][41694] Num frames 19200... +[2024-11-08 07:31:07,053][41694] Num frames 19300... +[2024-11-08 07:31:07,191][41694] Avg episode rewards: #0: 4.772, true rewards: #0: 4.112 +[2024-11-08 07:31:07,194][41694] Avg episode reward: 4.772, avg true_objective: 4.112 +[2024-11-08 07:31:07,414][41694] Num frames 19400... +[2024-11-08 07:31:07,853][41694] Num frames 19500... +[2024-11-08 07:31:08,085][41694] Num frames 19600... +[2024-11-08 07:31:08,249][41694] Avg episode rewards: #0: 4.760, true rewards: #0: 4.093 +[2024-11-08 07:31:08,253][41694] Avg episode reward: 4.760, avg true_objective: 4.093 +[2024-11-08 07:31:08,394][41694] Num frames 19700... +[2024-11-08 07:31:08,612][41694] Num frames 19800... +[2024-11-08 07:31:08,804][41694] Num frames 19900... +[2024-11-08 07:31:09,021][41694] Num frames 20000... +[2024-11-08 07:31:09,262][41694] Avg episode rewards: #0: 4.775, true rewards: #0: 4.101 +[2024-11-08 07:31:09,266][41694] Avg episode reward: 4.775, avg true_objective: 4.101 +[2024-11-08 07:31:09,293][41694] Num frames 20100... +[2024-11-08 07:31:09,534][41694] Num frames 20200... +[2024-11-08 07:31:09,745][41694] Num frames 20300... +[2024-11-08 07:31:10,046][41694] Num frames 20400... +[2024-11-08 07:31:10,293][41694] Avg episode rewards: #0: 4.756, true rewards: #0: 4.096 +[2024-11-08 07:31:10,294][41694] Avg episode reward: 4.756, avg true_objective: 4.096 +[2024-11-08 07:31:10,341][41694] Num frames 20500... +[2024-11-08 07:31:10,564][41694] Num frames 20600... +[2024-11-08 07:31:10,797][41694] Num frames 20700... +[2024-11-08 07:31:11,041][41694] Num frames 20800... +[2024-11-08 07:31:11,256][41694] Avg episode rewards: #0: 4.738, true rewards: #0: 4.091 +[2024-11-08 07:31:11,259][41694] Avg episode reward: 4.738, avg true_objective: 4.091 +[2024-11-08 07:31:11,339][41694] Num frames 20900... +[2024-11-08 07:31:11,603][41694] Num frames 21000... +[2024-11-08 07:31:11,868][41694] Num frames 21100... +[2024-11-08 07:31:12,104][41694] Num frames 21200... +[2024-11-08 07:31:12,345][41694] Avg episode rewards: #0: 4.746, true rewards: #0: 4.092 +[2024-11-08 07:31:12,346][41694] Avg episode reward: 4.746, avg true_objective: 4.092 +[2024-11-08 07:31:12,404][41694] Num frames 21300... +[2024-11-08 07:31:12,655][41694] Num frames 21400... +[2024-11-08 07:31:12,935][41694] Num frames 21500... +[2024-11-08 07:31:13,099][41694] Avg episode rewards: #0: 4.705, true rewards: #0: 4.063 +[2024-11-08 07:31:13,101][41694] Avg episode reward: 4.705, avg true_objective: 4.063 +[2024-11-08 07:31:13,281][41694] Num frames 21600... +[2024-11-08 07:31:13,584][41694] Num frames 21700... +[2024-11-08 07:31:13,841][41694] Num frames 21800... +[2024-11-08 07:31:14,056][41694] Num frames 21900... +[2024-11-08 07:31:14,280][41694] Avg episode rewards: #0: 4.719, true rewards: #0: 4.071 +[2024-11-08 07:31:14,283][41694] Avg episode reward: 4.719, avg true_objective: 4.071 +[2024-11-08 07:31:14,340][41694] Num frames 22000... +[2024-11-08 07:31:14,555][41694] Num frames 22100... +[2024-11-08 07:31:14,769][41694] Num frames 22200... +[2024-11-08 07:31:15,012][41694] Num frames 22300... +[2024-11-08 07:31:15,256][41694] Avg episode rewards: #0: 4.703, true rewards: #0: 4.067 +[2024-11-08 07:31:15,258][41694] Avg episode reward: 4.703, avg true_objective: 4.067 +[2024-11-08 07:31:15,367][41694] Num frames 22400... +[2024-11-08 07:31:15,630][41694] Num frames 22500... +[2024-11-08 07:31:15,822][41694] Num frames 22600... +[2024-11-08 07:31:16,108][41694] Num frames 22700... +[2024-11-08 07:31:16,290][41694] Avg episode rewards: #0: 4.688, true rewards: #0: 4.063 +[2024-11-08 07:31:16,295][41694] Avg episode reward: 4.688, avg true_objective: 4.063 +[2024-11-08 07:31:16,408][41694] Num frames 22800... +[2024-11-08 07:31:16,606][41694] Num frames 22900... +[2024-11-08 07:31:16,824][41694] Num frames 23000... +[2024-11-08 07:31:17,165][41694] Num frames 23100... +[2024-11-08 07:31:17,310][41694] Avg episode rewards: #0: 4.673, true rewards: #0: 4.059 +[2024-11-08 07:31:17,315][41694] Avg episode reward: 4.673, avg true_objective: 4.059 +[2024-11-08 07:31:17,476][41694] Num frames 23200... +[2024-11-08 07:31:17,720][41694] Num frames 23300... +[2024-11-08 07:31:18,125][41694] Num frames 23400... +[2024-11-08 07:31:18,474][41694] Num frames 23500... +[2024-11-08 07:31:18,769][41694] Avg episode rewards: #0: 4.687, true rewards: #0: 4.066 +[2024-11-08 07:31:18,772][41694] Avg episode reward: 4.687, avg true_objective: 4.066 +[2024-11-08 07:31:18,873][41694] Num frames 23600... +[2024-11-08 07:31:19,292][41694] Num frames 23700... +[2024-11-08 07:31:19,537][41694] Num frames 23800... +[2024-11-08 07:31:19,765][41694] Num frames 23900... +[2024-11-08 07:31:20,057][41694] Num frames 24000... +[2024-11-08 07:31:20,204][41694] Avg episode rewards: #0: 4.700, true rewards: #0: 4.073 +[2024-11-08 07:31:20,206][41694] Avg episode reward: 4.700, avg true_objective: 4.073 +[2024-11-08 07:31:20,445][41694] Num frames 24100... +[2024-11-08 07:31:20,787][41694] Num frames 24200... +[2024-11-08 07:31:21,033][41694] Num frames 24300... +[2024-11-08 07:31:21,235][41694] Num frames 24400... +[2024-11-08 07:31:21,331][41694] Avg episode rewards: #0: 4.686, true rewards: #0: 4.069 +[2024-11-08 07:31:21,337][41694] Avg episode reward: 4.686, avg true_objective: 4.069 +[2024-11-08 07:31:21,530][41694] Num frames 24500... +[2024-11-08 07:31:21,801][41694] Num frames 24600... +[2024-11-08 07:31:22,067][41694] Num frames 24700... +[2024-11-08 07:31:22,314][41694] Num frames 24800... +[2024-11-08 07:31:22,368][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.066 +[2024-11-08 07:31:22,370][41694] Avg episode reward: 4.672, avg true_objective: 4.066 +[2024-11-08 07:31:22,655][41694] Num frames 24900... +[2024-11-08 07:31:22,855][41694] Num frames 25000... +[2024-11-08 07:31:23,061][41694] Num frames 25100... +[2024-11-08 07:31:23,285][41694] Avg episode rewards: #0: 4.659, true rewards: #0: 4.062 +[2024-11-08 07:31:23,288][41694] Avg episode reward: 4.659, avg true_objective: 4.062 +[2024-11-08 07:31:23,340][41694] Num frames 25200... +[2024-11-08 07:31:23,610][41694] Num frames 25300... +[2024-11-08 07:31:23,966][41694] Num frames 25400... +[2024-11-08 07:31:24,123][41694] Avg episode rewards: #0: 4.625, true rewards: #0: 4.038 +[2024-11-08 07:31:24,127][41694] Avg episode reward: 4.625, avg true_objective: 4.038 +[2024-11-08 07:31:24,267][41694] Num frames 25500... +[2024-11-08 07:31:24,524][41694] Num frames 25600... +[2024-11-08 07:31:24,841][41694] Num frames 25700... +[2024-11-08 07:31:25,074][41694] Num frames 25800... +[2024-11-08 07:31:25,188][41694] Avg episode rewards: #0: 4.613, true rewards: #0: 4.035 +[2024-11-08 07:31:25,189][41694] Avg episode reward: 4.613, avg true_objective: 4.035 +[2024-11-08 07:31:25,366][41694] Num frames 25900... +[2024-11-08 07:31:25,582][41694] Num frames 26000... +[2024-11-08 07:31:25,816][41694] Num frames 26100... +[2024-11-08 07:31:26,135][41694] Num frames 26200... +[2024-11-08 07:31:26,236][41694] Avg episode rewards: #0: 4.601, true rewards: #0: 4.032 +[2024-11-08 07:31:26,239][41694] Avg episode reward: 4.601, avg true_objective: 4.032 +[2024-11-08 07:31:26,573][41694] Num frames 26300... +[2024-11-08 07:31:26,892][41694] Num frames 26400... +[2024-11-08 07:31:27,357][41694] Num frames 26500... +[2024-11-08 07:31:27,851][41694] Avg episode rewards: #0: 4.590, true rewards: #0: 4.029 +[2024-11-08 07:31:27,854][41694] Avg episode reward: 4.590, avg true_objective: 4.029 +[2024-11-08 07:31:27,884][41694] Num frames 26600... +[2024-11-08 07:31:28,188][41694] Num frames 26700... +[2024-11-08 07:31:28,592][41694] Num frames 26800... +[2024-11-08 07:31:28,886][41694] Num frames 26900... +[2024-11-08 07:31:29,147][41694] Num frames 27000... +[2024-11-08 07:31:29,296][41694] Avg episode rewards: #0: 4.603, true rewards: #0: 4.036 +[2024-11-08 07:31:29,299][41694] Avg episode reward: 4.603, avg true_objective: 4.036 +[2024-11-08 07:31:29,493][41694] Num frames 27100... +[2024-11-08 07:31:29,788][41694] Num frames 27200... +[2024-11-08 07:31:30,071][41694] Num frames 27300... +[2024-11-08 07:31:30,483][41694] Num frames 27400... +[2024-11-08 07:31:30,833][41694] Avg episode rewards: #0: 4.616, true rewards: #0: 4.042 +[2024-11-08 07:31:30,837][41694] Avg episode reward: 4.616, avg true_objective: 4.042 +[2024-11-08 07:31:30,910][41694] Num frames 27500... +[2024-11-08 07:31:31,153][41694] Num frames 27600... +[2024-11-08 07:31:31,369][41694] Num frames 27700... +[2024-11-08 07:31:31,583][41694] Num frames 27800... +[2024-11-08 07:31:31,795][41694] Num frames 27900... +[2024-11-08 07:31:31,989][41694] Num frames 28000... +[2024-11-08 07:31:32,117][41694] Avg episode rewards: #0: 4.657, true rewards: #0: 4.063 +[2024-11-08 07:31:32,121][41694] Avg episode reward: 4.657, avg true_objective: 4.063 +[2024-11-08 07:31:32,279][41694] Num frames 28100... +[2024-11-08 07:31:32,483][41694] Num frames 28200... +[2024-11-08 07:31:32,818][41694] Num frames 28300... +[2024-11-08 07:31:33,079][41694] Num frames 28400... +[2024-11-08 07:31:33,186][41694] Avg episode rewards: #0: 4.645, true rewards: #0: 4.059 +[2024-11-08 07:31:33,190][41694] Avg episode reward: 4.645, avg true_objective: 4.059 +[2024-11-08 07:31:33,404][41694] Num frames 28500... +[2024-11-08 07:31:33,635][41694] Num frames 28600... +[2024-11-08 07:31:33,957][41694] Num frames 28700... +[2024-11-08 07:31:34,229][41694] Num frames 28800... +[2024-11-08 07:31:34,282][41694] Avg episode rewards: #0: 4.634, true rewards: #0: 4.056 +[2024-11-08 07:31:34,284][41694] Avg episode reward: 4.634, avg true_objective: 4.056 +[2024-11-08 07:31:34,797][41694] Num frames 28900... +[2024-11-08 07:31:35,487][41694] Num frames 29000... +[2024-11-08 07:31:36,167][41694] Num frames 29100... +[2024-11-08 07:31:36,718][41694] Num frames 29200... +[2024-11-08 07:31:36,927][41694] Avg episode rewards: #0: 4.646, true rewards: #0: 4.062 +[2024-11-08 07:31:36,930][41694] Avg episode reward: 4.646, avg true_objective: 4.062 +[2024-11-08 07:31:37,113][41694] Num frames 29300... +[2024-11-08 07:31:37,661][41694] Num frames 29400... +[2024-11-08 07:31:38,407][41694] Num frames 29500... +[2024-11-08 07:31:38,949][41694] Num frames 29600... +[2024-11-08 07:31:39,500][41694] Num frames 29700... +[2024-11-08 07:31:40,096][41694] Avg episode rewards: #0: 4.684, true rewards: #0: 4.081 +[2024-11-08 07:31:40,098][41694] Avg episode reward: 4.684, avg true_objective: 4.081 +[2024-11-08 07:31:40,150][41694] Num frames 29800... +[2024-11-08 07:31:40,741][41694] Num frames 29900... +[2024-11-08 07:31:41,206][41694] Num frames 30000... +[2024-11-08 07:31:41,701][41694] Num frames 30100... +[2024-11-08 07:31:42,314][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.078 +[2024-11-08 07:31:42,319][41694] Avg episode reward: 4.672, avg true_objective: 4.078 +[2024-11-08 07:31:42,556][41694] Num frames 30200... +[2024-11-08 07:31:43,282][41694] Num frames 30300... +[2024-11-08 07:31:43,784][41694] Num frames 30400... +[2024-11-08 07:31:44,202][41694] Num frames 30500... +[2024-11-08 07:31:44,421][41694] Avg episode rewards: #0: 4.661, true rewards: #0: 4.075 +[2024-11-08 07:31:44,424][41694] Avg episode reward: 4.661, avg true_objective: 4.075 +[2024-11-08 07:31:44,561][41694] Num frames 30600... +[2024-11-08 07:31:44,926][41694] Num frames 30700... +[2024-11-08 07:31:45,279][41694] Num frames 30800... +[2024-11-08 07:31:45,677][41694] Num frames 30900... +[2024-11-08 07:31:45,861][41694] Avg episode rewards: #0: 4.651, true rewards: #0: 4.072 +[2024-11-08 07:31:45,862][41694] Avg episode reward: 4.651, avg true_objective: 4.072 +[2024-11-08 07:31:46,053][41694] Num frames 31000... +[2024-11-08 07:31:46,375][41694] Num frames 31100... +[2024-11-08 07:31:46,652][41694] Num frames 31200... +[2024-11-08 07:31:47,104][41694] Num frames 31300... +[2024-11-08 07:31:47,329][41694] Avg episode rewards: #0: 4.640, true rewards: #0: 4.069 +[2024-11-08 07:31:47,332][41694] Avg episode reward: 4.640, avg true_objective: 4.069 +[2024-11-08 07:31:47,638][41694] Num frames 31400... +[2024-11-08 07:31:48,021][41694] Num frames 31500... +[2024-11-08 07:31:48,466][41694] Num frames 31600... +[2024-11-08 07:31:48,988][41694] Num frames 31700... +[2024-11-08 07:31:49,097][41694] Avg episode rewards: #0: 4.630, true rewards: #0: 4.066 +[2024-11-08 07:31:49,101][41694] Avg episode reward: 4.630, avg true_objective: 4.066 +[2024-11-08 07:31:49,497][41694] Num frames 31800... +[2024-11-08 07:31:49,791][41694] Num frames 31900... +[2024-11-08 07:31:50,331][41694] Num frames 32000... +[2024-11-08 07:31:50,654][41694] Avg episode rewards: #0: 4.620, true rewards: #0: 4.063 +[2024-11-08 07:31:50,655][41694] Avg episode reward: 4.620, avg true_objective: 4.063 +[2024-11-08 07:31:50,667][41694] Num frames 32100... +[2024-11-08 07:31:51,086][41694] Num frames 32200... +[2024-11-08 07:31:51,454][41694] Num frames 32300... +[2024-11-08 07:31:51,923][41694] Num frames 32400... +[2024-11-08 07:31:52,392][41694] Num frames 32500... +[2024-11-08 07:31:52,716][41694] Avg episode rewards: #0: 4.631, true rewards: #0: 4.068 +[2024-11-08 07:31:52,719][41694] Avg episode reward: 4.631, avg true_objective: 4.068 +[2024-11-08 07:31:52,931][41694] Num frames 32600... +[2024-11-08 07:31:53,310][41694] Num frames 32700... +[2024-11-08 07:31:53,780][41694] Num frames 32800... +[2024-11-08 07:31:54,199][41694] Num frames 32900... +[2024-11-08 07:31:54,340][41694] Avg episode rewards: #0: 4.621, true rewards: #0: 4.065 +[2024-11-08 07:31:54,342][41694] Avg episode reward: 4.621, avg true_objective: 4.065 +[2024-11-08 07:31:54,548][41694] Num frames 33000... +[2024-11-08 07:31:54,898][41694] Num frames 33100... +[2024-11-08 07:31:55,203][41694] Num frames 33200... +[2024-11-08 07:31:55,497][41694] Num frames 33300... +[2024-11-08 07:31:55,782][41694] Avg episode rewards: #0: 4.631, true rewards: #0: 4.070 +[2024-11-08 07:31:55,786][41694] Avg episode reward: 4.631, avg true_objective: 4.070 +[2024-11-08 07:31:55,864][41694] Num frames 33400... +[2024-11-08 07:31:56,132][41694] Num frames 33500... +[2024-11-08 07:31:56,397][41694] Num frames 33600... +[2024-11-08 07:31:56,673][41694] Num frames 33700... +[2024-11-08 07:31:56,951][41694] Avg episode rewards: #0: 4.622, true rewards: #0: 4.067 +[2024-11-08 07:31:56,953][41694] Avg episode reward: 4.622, avg true_objective: 4.067 +[2024-11-08 07:31:57,051][41694] Num frames 33800... +[2024-11-08 07:31:57,262][41694] Num frames 33900... +[2024-11-08 07:31:57,466][41694] Num frames 34000... +[2024-11-08 07:31:57,663][41694] Num frames 34100... +[2024-11-08 07:31:57,725][41694] Avg episode rewards: #0: 4.607, true rewards: #0: 4.060 +[2024-11-08 07:31:57,730][41694] Avg episode reward: 4.607, avg true_objective: 4.060 +[2024-11-08 07:31:57,954][41694] Num frames 34200... +[2024-11-08 07:31:58,156][41694] Num frames 34300... +[2024-11-08 07:31:58,346][41694] Num frames 34400... +[2024-11-08 07:31:58,545][41694] Num frames 34500... +[2024-11-08 07:31:58,638][41694] Avg episode rewards: #0: 4.614, true rewards: #0: 4.061 +[2024-11-08 07:31:58,641][41694] Avg episode reward: 4.614, avg true_objective: 4.061 +[2024-11-08 07:31:58,831][41694] Num frames 34600... +[2024-11-08 07:31:59,020][41694] Num frames 34700... +[2024-11-08 07:31:59,225][41694] Num frames 34800... +[2024-11-08 07:31:59,430][41694] Num frames 34900... +[2024-11-08 07:31:59,555][41694] Avg episode rewards: #0: 4.620, true rewards: #0: 4.062 +[2024-11-08 07:31:59,562][41694] Avg episode reward: 4.620, avg true_objective: 4.062 +[2024-11-08 07:31:59,696][41694] Num frames 35000... +[2024-11-08 07:31:59,888][41694] Num frames 35100... +[2024-11-08 07:32:00,098][41694] Num frames 35200... +[2024-11-08 07:32:00,327][41694] Num frames 35300... +[2024-11-08 07:32:00,427][41694] Avg episode rewards: #0: 4.611, true rewards: #0: 4.059 +[2024-11-08 07:32:00,430][41694] Avg episode reward: 4.611, avg true_objective: 4.059 +[2024-11-08 07:32:00,670][41694] Num frames 35400... +[2024-11-08 07:32:00,974][41694] Num frames 35500... +[2024-11-08 07:32:01,256][41694] Num frames 35600... +[2024-11-08 07:32:01,555][41694] Num frames 35700... +[2024-11-08 07:32:01,619][41694] Avg episode rewards: #0: 4.602, true rewards: #0: 4.057 +[2024-11-08 07:32:01,622][41694] Avg episode reward: 4.602, avg true_objective: 4.057 +[2024-11-08 07:32:01,841][41694] Num frames 35800... +[2024-11-08 07:32:02,034][41694] Num frames 35900... +[2024-11-08 07:32:02,231][41694] Num frames 36000... +[2024-11-08 07:32:02,438][41694] Avg episode rewards: #0: 4.601, true rewards: #0: 4.051 +[2024-11-08 07:32:02,441][41694] Avg episode reward: 4.601, avg true_objective: 4.051 +[2024-11-08 07:32:02,549][41694] Num frames 36100... +[2024-11-08 07:32:02,834][41694] Num frames 36200... +[2024-11-08 07:32:03,202][41694] Num frames 36300... +[2024-11-08 07:32:03,426][41694] Num frames 36400... +[2024-11-08 07:32:03,646][41694] Avg episode rewards: #0: 4.597, true rewards: #0: 4.052 +[2024-11-08 07:32:03,650][41694] Avg episode reward: 4.597, avg true_objective: 4.052 +[2024-11-08 07:32:03,727][41694] Num frames 36500... +[2024-11-08 07:32:03,941][41694] Num frames 36600... +[2024-11-08 07:32:04,258][41694] Num frames 36700... +[2024-11-08 07:32:04,471][41694] Num frames 36800... +[2024-11-08 07:32:04,776][41694] Num frames 36900... +[2024-11-08 07:32:04,877][41694] Avg episode rewards: #0: 4.606, true rewards: #0: 4.057 +[2024-11-08 07:32:04,881][41694] Avg episode reward: 4.606, avg true_objective: 4.057 +[2024-11-08 07:32:05,069][41694] Num frames 37000... +[2024-11-08 07:32:05,281][41694] Num frames 37100... +[2024-11-08 07:32:05,503][41694] Num frames 37200... +[2024-11-08 07:32:05,787][41694] Num frames 37300... +[2024-11-08 07:32:05,990][41694] Avg episode rewards: #0: 4.612, true rewards: #0: 4.058 +[2024-11-08 07:32:05,994][41694] Avg episode reward: 4.612, avg true_objective: 4.058 +[2024-11-08 07:32:06,179][41694] Num frames 37400... +[2024-11-08 07:32:06,448][41694] Num frames 37500... +[2024-11-08 07:32:06,767][41694] Num frames 37600... +[2024-11-08 07:32:07,060][41694] Num frames 37700... +[2024-11-08 07:32:07,330][41694] Num frames 37800... +[2024-11-08 07:32:07,603][41694] Avg episode rewards: #0: 4.643, true rewards: #0: 4.073 +[2024-11-08 07:32:07,605][41694] Avg episode reward: 4.643, avg true_objective: 4.073 +[2024-11-08 07:32:07,707][41694] Num frames 37900... +[2024-11-08 07:32:08,006][41694] Num frames 38000... +[2024-11-08 07:32:08,269][41694] Num frames 38100... +[2024-11-08 07:32:08,637][41694] Num frames 38200... +[2024-11-08 07:32:08,953][41694] Avg episode rewards: #0: 4.634, true rewards: #0: 4.070 +[2024-11-08 07:32:08,956][41694] Avg episode reward: 4.634, avg true_objective: 4.070 +[2024-11-08 07:32:09,061][41694] Num frames 38300... +[2024-11-08 07:32:09,255][41694] Num frames 38400... +[2024-11-08 07:32:09,459][41694] Num frames 38500... +[2024-11-08 07:32:09,655][41694] Num frames 38600... +[2024-11-08 07:32:09,865][41694] Num frames 38700... +[2024-11-08 07:32:09,943][41694] Avg episode rewards: #0: 4.643, true rewards: #0: 4.075 +[2024-11-08 07:32:09,947][41694] Avg episode reward: 4.643, avg true_objective: 4.075 +[2024-11-08 07:32:10,138][41694] Num frames 38800... +[2024-11-08 07:32:10,322][41694] Num frames 38900... +[2024-11-08 07:32:10,514][41694] Num frames 39000... +[2024-11-08 07:32:10,743][41694] Avg episode rewards: #0: 4.635, true rewards: #0: 4.072 +[2024-11-08 07:32:10,748][41694] Avg episode reward: 4.635, avg true_objective: 4.072 +[2024-11-08 07:32:10,783][41694] Num frames 39100... +[2024-11-08 07:32:10,986][41694] Num frames 39200... +[2024-11-08 07:32:11,181][41694] Num frames 39300... +[2024-11-08 07:32:11,376][41694] Num frames 39400... +[2024-11-08 07:32:11,582][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.070 +[2024-11-08 07:32:11,585][41694] Avg episode reward: 4.627, avg true_objective: 4.070 +[2024-11-08 07:32:11,655][41694] Num frames 39500... +[2024-11-08 07:32:11,850][41694] Num frames 39600... +[2024-11-08 07:32:12,052][41694] Num frames 39700... +[2024-11-08 07:32:12,244][41694] Num frames 39800... +[2024-11-08 07:32:12,416][41694] Avg episode rewards: #0: 4.618, true rewards: #0: 4.067 +[2024-11-08 07:32:12,419][41694] Avg episode reward: 4.618, avg true_objective: 4.067 +[2024-11-08 07:32:12,519][41694] Num frames 39900... +[2024-11-08 07:32:12,725][41694] Num frames 40000... +[2024-11-08 07:32:12,923][41694] Num frames 40100... +[2024-11-08 07:32:13,122][41694] Num frames 40200... +[2024-11-08 07:32:13,317][41694] Num frames 40300... +[2024-11-08 07:32:13,394][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.072 +[2024-11-08 07:32:13,398][41694] Avg episode reward: 4.627, avg true_objective: 4.072 +[2024-11-08 07:32:13,589][41694] Num frames 40400... +[2024-11-08 07:32:13,790][41694] Num frames 40500... +[2024-11-08 07:32:14,006][41694] Num frames 40600... +[2024-11-08 07:32:14,337][41694] Avg episode rewards: #0: 4.619, true rewards: #0: 4.069 +[2024-11-08 07:32:14,339][41694] Avg episode reward: 4.619, avg true_objective: 4.069 +[2024-11-08 07:34:27,542][41694] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-08 07:34:47,182][41694] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme-alid +[2024-11-08 07:44:57,962][41694] Environment doom_basic already registered, overwriting... +[2024-11-08 07:44:58,190][41694] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-08 07:44:58,195][41694] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-08 07:44:58,200][41694] Environment doom_dm already registered, overwriting... +[2024-11-08 07:44:58,205][41694] Environment doom_dwango5 already registered, overwriting... +[2024-11-08 07:44:58,208][41694] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-08 07:44:58,212][41694] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-08 07:44:58,216][41694] Environment doom_my_way_home already registered, overwriting... +[2024-11-08 07:44:58,218][41694] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-08 07:44:58,220][41694] Environment doom_defend_the_center already registered, overwriting... +[2024-11-08 07:44:58,223][41694] Environment doom_defend_the_line already registered, overwriting... +[2024-11-08 07:44:58,225][41694] Environment doom_health_gathering already registered, overwriting... +[2024-11-08 07:44:58,227][41694] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-08 07:44:58,229][41694] Environment doom_battle already registered, overwriting... +[2024-11-08 07:44:58,230][41694] Environment doom_battle2 already registered, overwriting... +[2024-11-08 07:44:58,234][41694] Environment doom_duel_bots already registered, overwriting... +[2024-11-08 07:44:58,236][41694] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-08 07:44:58,238][41694] Environment doom_duel already registered, overwriting... +[2024-11-08 07:44:58,241][41694] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-08 07:44:58,242][41694] Environment doom_benchmark already registered, overwriting... +[2024-11-08 07:44:58,245][41694] register_encoder_factory: +[2024-11-08 07:44:58,567][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:44:58,570][41694] Overriding arg 'num_workers' with value 16 passed from command line +[2024-11-08 07:44:58,571][41694] Overriding arg 'num_envs_per_worker' with value 8 passed from command line +[2024-11-08 07:44:58,573][41694] Overriding arg 'train_for_env_steps' with value 5000 passed from command line +[2024-11-08 07:44:58,582][41694] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-08 07:44:58,585][41694] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-08 07:44:58,589][41694] Weights and Biases integration disabled +[2024-11-08 07:44:58,655][41694] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-08 07:45:16,919][41694] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=16 +num_envs_per_worker=8 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0003 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=5000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-08 07:45:16,921][41694] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-08 07:45:16,924][41694] Rollout worker 0 uses device cpu +[2024-11-08 07:45:16,926][41694] Rollout worker 1 uses device cpu +[2024-11-08 07:45:16,928][41694] Rollout worker 2 uses device cpu +[2024-11-08 07:45:16,930][41694] Rollout worker 3 uses device cpu +[2024-11-08 07:45:16,933][41694] Rollout worker 4 uses device cpu +[2024-11-08 07:45:16,938][41694] Rollout worker 5 uses device cpu +[2024-11-08 07:45:16,940][41694] Rollout worker 6 uses device cpu +[2024-11-08 07:45:16,942][41694] Rollout worker 7 uses device cpu +[2024-11-08 07:45:16,944][41694] Rollout worker 8 uses device cpu +[2024-11-08 07:45:16,947][41694] Rollout worker 9 uses device cpu +[2024-11-08 07:45:16,949][41694] Rollout worker 10 uses device cpu +[2024-11-08 07:45:16,952][41694] Rollout worker 11 uses device cpu +[2024-11-08 07:45:16,954][41694] Rollout worker 12 uses device cpu +[2024-11-08 07:45:16,956][41694] Rollout worker 13 uses device cpu +[2024-11-08 07:45:16,958][41694] Rollout worker 14 uses device cpu +[2024-11-08 07:45:16,960][41694] Rollout worker 15 uses device cpu +[2024-11-08 07:45:17,210][41694] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 07:45:17,213][41694] InferenceWorker_p0-w0: min num requests: 5 +[2024-11-08 07:45:17,322][41694] Starting all processes... +[2024-11-08 07:45:17,323][41694] Starting process learner_proc0 +[2024-11-08 07:45:17,406][41694] Starting all processes... +[2024-11-08 07:45:17,418][41694] Starting process inference_proc0-0 +[2024-11-08 07:45:17,420][41694] Starting process rollout_proc0 +[2024-11-08 07:45:17,421][41694] Starting process rollout_proc1 +[2024-11-08 07:45:17,421][41694] Starting process rollout_proc2 +[2024-11-08 07:45:17,424][41694] Starting process rollout_proc3 +[2024-11-08 07:45:17,426][41694] Starting process rollout_proc4 +[2024-11-08 07:45:17,433][41694] Starting process rollout_proc5 +[2024-11-08 07:45:17,440][41694] Starting process rollout_proc6 +[2024-11-08 07:45:17,452][41694] Starting process rollout_proc7 +[2024-11-08 07:45:17,453][41694] Starting process rollout_proc8 +[2024-11-08 07:45:17,454][41694] Starting process rollout_proc9 +[2024-11-08 07:45:17,463][41694] Starting process rollout_proc10 +[2024-11-08 07:45:17,467][41694] Starting process rollout_proc11 +[2024-11-08 07:45:17,479][41694] Starting process rollout_proc12 +[2024-11-08 07:45:17,487][41694] Starting process rollout_proc13 +[2024-11-08 07:45:17,487][41694] Starting process rollout_proc14 +[2024-11-08 07:45:17,620][41694] Starting process rollout_proc15 +[2024-11-08 12:00:59,827][06692] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-08 12:00:59,831][06692] Rollout worker 0 uses device cpu +[2024-11-08 12:00:59,833][06692] Rollout worker 1 uses device cpu +[2024-11-08 12:00:59,835][06692] Rollout worker 2 uses device cpu +[2024-11-08 12:00:59,837][06692] Rollout worker 3 uses device cpu +[2024-11-08 12:00:59,840][06692] Rollout worker 4 uses device cpu +[2024-11-08 12:00:59,843][06692] Rollout worker 5 uses device cpu +[2024-11-08 12:00:59,846][06692] Rollout worker 6 uses device cpu +[2024-11-08 12:00:59,847][06692] Rollout worker 7 uses device cpu +[2024-11-08 12:00:59,947][06692] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:00:59,962][06692] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-08 12:01:00,052][06692] Starting all processes... +[2024-11-08 12:01:00,053][06692] Starting process learner_proc0 +[2024-11-08 12:01:00,262][06692] Starting all processes... +[2024-11-08 12:01:00,351][06692] Starting process inference_proc0-0 +[2024-11-08 12:01:00,352][06692] Starting process rollout_proc0 +[2024-11-08 12:01:00,352][06692] Starting process rollout_proc1 +[2024-11-08 12:01:00,353][06692] Starting process rollout_proc2 +[2024-11-08 12:01:00,354][06692] Starting process rollout_proc3 +[2024-11-08 12:01:00,354][06692] Starting process rollout_proc4 +[2024-11-08 12:01:00,355][06692] Starting process rollout_proc5 +[2024-11-08 12:01:00,355][06692] Starting process rollout_proc6 +[2024-11-08 12:01:00,356][06692] Starting process rollout_proc7 +[2024-11-08 12:01:06,252][06692] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 6692], exiting... +[2024-11-08 12:01:06,254][06692] Runner profile tree view: +main_loop: 6.2030 +[2024-11-08 12:01:06,258][06692] Collected {}, FPS: 0.0 +[2024-11-08 12:01:09,627][06692] Environment doom_basic already registered, overwriting... +[2024-11-08 12:01:09,647][06692] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-08 12:01:09,649][06692] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-08 12:01:09,655][06692] Environment doom_dm already registered, overwriting... +[2024-11-08 12:01:09,656][06692] Environment doom_dwango5 already registered, overwriting... +[2024-11-08 12:01:09,657][06692] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-08 12:01:09,658][06692] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-08 12:01:09,659][06692] Environment doom_my_way_home already registered, overwriting... +[2024-11-08 12:01:09,660][06692] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-08 12:01:09,661][06692] Environment doom_defend_the_center already registered, overwriting... +[2024-11-08 12:01:09,676][06692] Environment doom_defend_the_line already registered, overwriting... +[2024-11-08 12:01:09,677][06692] Environment doom_health_gathering already registered, overwriting... +[2024-11-08 12:01:09,678][06692] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-08 12:01:09,679][06692] Environment doom_battle already registered, overwriting... +[2024-11-08 12:01:09,682][06692] Environment doom_battle2 already registered, overwriting... +[2024-11-08 12:01:09,683][06692] Environment doom_duel_bots already registered, overwriting... +[2024-11-08 12:01:09,684][06692] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-08 12:01:09,685][06692] Environment doom_duel already registered, overwriting... +[2024-11-08 12:01:09,686][06692] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-08 12:01:09,697][06692] Environment doom_benchmark already registered, overwriting... +[2024-11-08 12:01:09,698][06692] register_encoder_factory: +[2024-11-08 12:01:14,737][17665] Worker 3 uses CPU cores [3] +[2024-11-08 12:01:14,738][17667] Worker 5 uses CPU cores [5] +[2024-11-08 12:01:14,742][17664] Worker 1 uses CPU cores [1] +[2024-11-08 12:01:14,737][17674] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-08 12:01:14,742][17676] Worker 6 uses CPU cores [6] +[2024-11-08 12:01:14,743][17663] Worker 0 uses CPU cores [0] +[2024-11-08 12:01:14,744][17662] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:14,744][17662] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-08 12:01:14,744][17666] Worker 2 uses CPU cores [2] +[2024-11-08 12:01:14,745][17649] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:14,746][17649] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-08 12:01:14,748][17675] Worker 4 uses CPU cores [4] +[2024-11-08 12:01:14,839][17665] Stopping RolloutWorker_w3... +[2024-11-08 12:01:14,839][17666] Stopping RolloutWorker_w2... +[2024-11-08 12:01:14,840][17666] Loop rollout_proc2_evt_loop terminating... +[2024-11-08 12:01:14,840][17663] Stopping RolloutWorker_w0... +[2024-11-08 12:01:14,841][17663] Loop rollout_proc0_evt_loop terminating... +[2024-11-08 12:01:14,842][17675] Stopping RolloutWorker_w4... +[2024-11-08 12:01:14,842][17675] Loop rollout_proc4_evt_loop terminating... +[2024-11-08 12:01:14,840][17665] Loop rollout_proc3_evt_loop terminating... +[2024-11-08 12:01:14,860][17674] Stopping RolloutWorker_w7... +[2024-11-08 12:01:14,862][17674] Loop rollout_proc7_evt_loop terminating... +[2024-11-08 12:01:14,867][17676] Stopping RolloutWorker_w6... +[2024-11-08 12:01:14,868][17676] Loop rollout_proc6_evt_loop terminating... +[2024-11-08 12:01:14,868][17649] Num visible devices: 1 +[2024-11-08 12:01:14,870][17664] Stopping RolloutWorker_w1... +[2024-11-08 12:01:14,871][17664] Loop rollout_proc1_evt_loop terminating... +[2024-11-08 12:01:14,874][17667] Stopping RolloutWorker_w5... +[2024-11-08 12:01:14,875][17667] Loop rollout_proc5_evt_loop terminating... +[2024-11-08 12:01:14,924][17662] Num visible devices: 1 +[2024-11-08 12:01:14,971][17649] Starting seed is not provided +[2024-11-08 12:01:14,971][17649] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:14,972][17649] Initializing actor-critic model on device cuda:0 +[2024-11-08 12:01:14,982][17662] Stopping InferenceWorker_p0-w0... +[2024-11-08 12:01:14,983][17662] Loop inference_proc0-0_evt_loop terminating... +[2024-11-08 12:01:14,984][17649] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 12:01:14,986][17649] Stopping Batcher_0... +[2024-11-08 12:01:14,988][17649] Loop batcher_evt_loop terminating... +[2024-11-08 12:01:15,076][17649] RunningMeanStd input shape: (1,) +[2024-11-08 12:01:15,240][17649] ConvEncoder: input_channels=3 +[2024-11-08 12:01:16,515][17649] Conv encoder output size: 512 +[2024-11-08 12:01:16,516][17649] Policy head output size: 512 +[2024-11-08 12:01:16,589][17649] Created Actor Critic model with architecture: +[2024-11-08 12:01:16,592][17649] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-08 12:01:18,383][17649] Using optimizer +[2024-11-08 12:01:22,789][06692] Environment doom_basic already registered, overwriting... +[2024-11-08 12:01:22,794][06692] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-08 12:01:22,797][06692] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-08 12:01:22,798][06692] Environment doom_dm already registered, overwriting... +[2024-11-08 12:01:22,802][06692] Environment doom_dwango5 already registered, overwriting... +[2024-11-08 12:01:22,803][06692] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-08 12:01:22,805][06692] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-08 12:01:22,806][06692] Environment doom_my_way_home already registered, overwriting... +[2024-11-08 12:01:22,808][06692] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-08 12:01:22,812][06692] Environment doom_defend_the_center already registered, overwriting... +[2024-11-08 12:01:22,814][06692] Environment doom_defend_the_line already registered, overwriting... +[2024-11-08 12:01:22,816][06692] Environment doom_health_gathering already registered, overwriting... +[2024-11-08 12:01:22,818][06692] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-08 12:01:22,821][06692] Environment doom_battle already registered, overwriting... +[2024-11-08 12:01:22,823][06692] Environment doom_battle2 already registered, overwriting... +[2024-11-08 12:01:22,824][06692] Environment doom_duel_bots already registered, overwriting... +[2024-11-08 12:01:22,826][06692] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-08 12:01:22,827][06692] Environment doom_duel already registered, overwriting... +[2024-11-08 12:01:22,830][06692] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-08 12:01:22,832][06692] Environment doom_benchmark already registered, overwriting... +[2024-11-08 12:01:22,834][06692] register_encoder_factory: +[2024-11-08 12:01:22,894][06692] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 12:01:22,896][06692] Overriding arg 'learning_rate' with value 0.0001 passed from command line +[2024-11-08 12:01:22,906][06692] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-08 12:01:22,908][06692] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-08 12:01:22,910][06692] Weights and Biases integration disabled +[2024-11-08 12:01:22,930][06692] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-08 12:01:25,372][17649] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 12:01:25,874][17649] Loading model from checkpoint +[2024-11-08 12:01:25,876][17649] Loaded experiment state at self.train_step=52658, self.env_steps=215687168 +[2024-11-08 12:01:25,879][17649] Initialized policy 0 weights for model version 52658 +[2024-11-08 12:01:25,932][17649] LearnerWorker_p0 finished initialization! +[2024-11-08 12:01:25,934][17649] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 12:01:26,130][17649] Stopping LearnerWorker_p0... +[2024-11-08 12:01:26,130][17649] Loop learner_proc0_evt_loop terminating... +[2024-11-08 12:01:26,813][06692] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-08 12:01:26,815][06692] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-08 12:01:26,817][06692] Rollout worker 0 uses device cpu +[2024-11-08 12:01:26,818][06692] Rollout worker 1 uses device cpu +[2024-11-08 12:01:26,820][06692] Rollout worker 2 uses device cpu +[2024-11-08 12:01:26,823][06692] Rollout worker 3 uses device cpu +[2024-11-08 12:01:26,824][06692] Rollout worker 4 uses device cpu +[2024-11-08 12:01:26,826][06692] Rollout worker 5 uses device cpu +[2024-11-08 12:01:26,828][06692] Rollout worker 6 uses device cpu +[2024-11-08 12:01:26,831][06692] Rollout worker 7 uses device cpu +[2024-11-08 12:01:26,901][06692] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:26,902][06692] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-08 12:01:26,939][06692] Starting all processes... +[2024-11-08 12:01:26,941][06692] Starting process learner_proc0 +[2024-11-08 12:01:26,990][06692] Starting all processes... +[2024-11-08 12:01:26,997][06692] Starting process inference_proc0-0 +[2024-11-08 12:01:26,999][06692] Starting process rollout_proc0 +[2024-11-08 12:01:27,000][06692] Starting process rollout_proc1 +[2024-11-08 12:01:27,000][06692] Starting process rollout_proc2 +[2024-11-08 12:01:27,000][06692] Starting process rollout_proc3 +[2024-11-08 12:01:27,001][06692] Starting process rollout_proc4 +[2024-11-08 12:01:27,002][06692] Starting process rollout_proc5 +[2024-11-08 12:01:27,004][06692] Starting process rollout_proc6 +[2024-11-08 12:01:27,004][06692] Starting process rollout_proc7 +[2024-11-08 12:01:32,928][17877] Worker 1 uses CPU cores [1] +[2024-11-08 12:01:33,039][17861] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:33,039][17861] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-08 12:01:33,097][17861] Num visible devices: 1 +[2024-11-08 12:01:33,183][17861] Starting seed is not provided +[2024-11-08 12:01:33,183][17861] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:33,184][17861] Initializing actor-critic model on device cuda:0 +[2024-11-08 12:01:33,184][17861] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 12:01:33,188][17861] RunningMeanStd input shape: (1,) +[2024-11-08 12:01:33,249][17861] ConvEncoder: input_channels=3 +[2024-11-08 12:01:33,355][17879] Worker 4 uses CPU cores [4] +[2024-11-08 12:01:33,596][17861] Conv encoder output size: 512 +[2024-11-08 12:01:33,598][17861] Policy head output size: 512 +[2024-11-08 12:01:33,695][17861] Created Actor Critic model with architecture: +[2024-11-08 12:01:33,695][17861] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-08 12:01:34,059][17888] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-08 12:01:34,072][17878] Worker 3 uses CPU cores [3] +[2024-11-08 12:01:34,082][17881] Worker 6 uses CPU cores [6] +[2024-11-08 12:01:34,113][17875] Worker 0 uses CPU cores [0] +[2024-11-08 12:01:34,658][17876] Worker 2 uses CPU cores [2] +[2024-11-08 12:01:34,663][17874] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:34,663][17874] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-08 12:01:34,708][17861] Using optimizer +[2024-11-08 12:01:34,736][17880] Worker 5 uses CPU cores [5] +[2024-11-08 12:01:34,741][17874] Num visible devices: 1 +[2024-11-08 12:01:44,181][17861] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 12:01:44,868][17861] Loading model from checkpoint +[2024-11-08 12:01:44,870][17861] Loaded experiment state at self.train_step=52658, self.env_steps=215687168 +[2024-11-08 12:01:44,871][17861] Initialized policy 0 weights for model version 52658 +[2024-11-08 12:01:44,881][17861] LearnerWorker_p0 finished initialization! +[2024-11-08 12:01:44,882][17861] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-08 12:01:45,350][17874] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 12:01:45,354][17874] RunningMeanStd input shape: (1,) +[2024-11-08 12:01:45,374][17874] ConvEncoder: input_channels=3 +[2024-11-08 12:01:45,652][17874] Conv encoder output size: 512 +[2024-11-08 12:01:45,652][17874] Policy head output size: 512 +[2024-11-08 12:01:45,837][06692] Inference worker 0-0 is ready! +[2024-11-08 12:01:45,839][06692] All inference workers are ready! Signal rollout workers to start! +[2024-11-08 12:01:46,396][17878] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,397][17879] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,401][17876] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,402][17877] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,404][17888] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,404][17875] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,406][17880] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,407][17881] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 12:01:46,892][06692] Heartbeat connected on Batcher_0 +[2024-11-08 12:01:46,897][06692] Heartbeat connected on LearnerWorker_p0 +[2024-11-08 12:01:46,953][06692] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-08 12:01:47,940][06692] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 215687168. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:01:50,226][17876] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,226][17881] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,226][17879] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,227][17875] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,228][17877] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,227][17878] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,227][17888] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,728][17879] Decorrelating experience for 32 frames... +[2024-11-08 12:01:50,732][17876] Decorrelating experience for 32 frames... +[2024-11-08 12:01:50,746][17880] Decorrelating experience for 0 frames... +[2024-11-08 12:01:50,753][17877] Decorrelating experience for 32 frames... +[2024-11-08 12:01:50,758][17875] Decorrelating experience for 32 frames... +[2024-11-08 12:01:50,885][17881] Decorrelating experience for 32 frames... +[2024-11-08 12:01:51,186][17878] Decorrelating experience for 32 frames... +[2024-11-08 12:01:51,215][17880] Decorrelating experience for 32 frames... +[2024-11-08 12:01:51,413][17879] Decorrelating experience for 64 frames... +[2024-11-08 12:01:51,522][17877] Decorrelating experience for 64 frames... +[2024-11-08 12:01:51,633][17876] Decorrelating experience for 64 frames... +[2024-11-08 12:01:51,829][17875] Decorrelating experience for 64 frames... +[2024-11-08 12:01:51,976][17880] Decorrelating experience for 64 frames... +[2024-11-08 12:01:52,069][17879] Decorrelating experience for 96 frames... +[2024-11-08 12:01:52,170][06692] Heartbeat connected on RolloutWorker_w4 +[2024-11-08 12:01:52,220][17881] Decorrelating experience for 64 frames... +[2024-11-08 12:01:52,240][17878] Decorrelating experience for 64 frames... +[2024-11-08 12:01:54,117][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:01:54,146][17876] Decorrelating experience for 96 frames... +[2024-11-08 12:01:54,806][06692] Heartbeat connected on RolloutWorker_w2 +[2024-11-08 12:01:54,907][17875] Decorrelating experience for 96 frames... +[2024-11-08 12:01:54,931][17881] Decorrelating experience for 96 frames... +[2024-11-08 12:01:54,983][17880] Decorrelating experience for 96 frames... +[2024-11-08 12:01:55,030][06692] Heartbeat connected on RolloutWorker_w0 +[2024-11-08 12:01:55,080][06692] Heartbeat connected on RolloutWorker_w6 +[2024-11-08 12:01:55,122][06692] Heartbeat connected on RolloutWorker_w5 +[2024-11-08 12:01:55,697][17877] Decorrelating experience for 96 frames... +[2024-11-08 12:01:55,802][06692] Heartbeat connected on RolloutWorker_w1 +[2024-11-08 12:01:56,207][17878] Decorrelating experience for 96 frames... +[2024-11-08 12:01:56,293][06692] Heartbeat connected on RolloutWorker_w3 +[2024-11-08 12:01:56,695][17888] Decorrelating experience for 32 frames... +[2024-11-08 12:01:57,422][17888] Decorrelating experience for 64 frames... +[2024-11-08 12:01:57,623][17888] Decorrelating experience for 96 frames... +[2024-11-08 12:01:57,807][06692] Heartbeat connected on RolloutWorker_w7 +[2024-11-08 12:01:57,931][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:02,930][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:07,930][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 4.9. Samples: 98. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:10,302][17861] Signal inference workers to stop experience collection... +[2024-11-08 12:02:10,316][17874] InferenceWorker_p0-w0: stopping experience collection +[2024-11-08 12:02:12,930][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 101.7. Samples: 2542. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:12,940][06692] Avg episode reward: [(0, '2.321')] +[2024-11-08 12:02:17,931][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 84.7. Samples: 2542. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:17,933][06692] Avg episode reward: [(0, '2.321')] +[2024-11-08 12:02:22,930][06692] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 215687168. Throughput: 0: 72.6. Samples: 2542. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-08 12:02:22,931][06692] Avg episode reward: [(0, '2.321')] +[2024-11-08 12:02:27,180][17861] Signal inference workers to resume experience collection... +[2024-11-08 12:02:27,181][17874] InferenceWorker_p0-w0: resuming experience collection +[2024-11-08 12:02:27,205][17861] Stopping Batcher_0... +[2024-11-08 12:02:27,206][17861] Loop batcher_evt_loop terminating... +[2024-11-08 12:02:27,206][06692] Component Batcher_0 stopped! +[2024-11-08 12:02:27,335][17874] Weights refcount: 2 0 +[2024-11-08 12:02:27,340][17874] Stopping InferenceWorker_p0-w0... +[2024-11-08 12:02:27,340][17874] Loop inference_proc0-0_evt_loop terminating... +[2024-11-08 12:02:27,340][06692] Component InferenceWorker_p0-w0 stopped! +[2024-11-08 12:02:27,667][17876] Stopping RolloutWorker_w2... +[2024-11-08 12:02:27,668][17876] Loop rollout_proc2_evt_loop terminating... +[2024-11-08 12:02:27,668][17878] Stopping RolloutWorker_w3... +[2024-11-08 12:02:27,669][17878] Loop rollout_proc3_evt_loop terminating... +[2024-11-08 12:02:27,667][06692] Component RolloutWorker_w2 stopped! +[2024-11-08 12:02:27,670][17877] Stopping RolloutWorker_w1... +[2024-11-08 12:02:27,671][17877] Loop rollout_proc1_evt_loop terminating... +[2024-11-08 12:02:27,670][06692] Component RolloutWorker_w3 stopped! +[2024-11-08 12:02:27,675][06692] Component RolloutWorker_w1 stopped! +[2024-11-08 12:02:27,682][17881] Stopping RolloutWorker_w6... +[2024-11-08 12:02:27,683][17881] Loop rollout_proc6_evt_loop terminating... +[2024-11-08 12:02:27,682][06692] Component RolloutWorker_w4 stopped! +[2024-11-08 12:02:27,684][06692] Component RolloutWorker_w6 stopped! +[2024-11-08 12:02:27,683][17879] Stopping RolloutWorker_w4... +[2024-11-08 12:02:27,687][17879] Loop rollout_proc4_evt_loop terminating... +[2024-11-08 12:02:27,724][17880] Stopping RolloutWorker_w5... +[2024-11-08 12:02:27,724][06692] Component RolloutWorker_w5 stopped! +[2024-11-08 12:02:27,729][17880] Loop rollout_proc5_evt_loop terminating... +[2024-11-08 12:02:27,739][06692] Component RolloutWorker_w7 stopped! +[2024-11-08 12:02:27,738][17888] Stopping RolloutWorker_w7... +[2024-11-08 12:02:27,743][17888] Loop rollout_proc7_evt_loop terminating... +[2024-11-08 12:02:27,814][17875] Stopping RolloutWorker_w0... +[2024-11-08 12:02:27,814][06692] Component RolloutWorker_w0 stopped! +[2024-11-08 12:02:27,815][17875] Loop rollout_proc0_evt_loop terminating... +[2024-11-08 12:02:29,069][17861] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052660_215695360.pth... +[2024-11-08 12:02:29,781][17861] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052490_214999040.pth +[2024-11-08 12:02:30,218][17861] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052660_215695360.pth... +[2024-11-08 12:02:30,354][17861] Stopping LearnerWorker_p0... +[2024-11-08 12:02:30,354][17861] Loop learner_proc0_evt_loop terminating... +[2024-11-08 12:02:30,354][06692] Component LearnerWorker_p0 stopped! +[2024-11-08 12:02:30,357][06692] Waiting for process learner_proc0 to stop... +[2024-11-08 12:02:37,870][06692] Waiting for process inference_proc0-0 to join... +[2024-11-08 12:02:37,873][06692] Waiting for process rollout_proc0 to join... +[2024-11-08 12:02:37,875][06692] Waiting for process rollout_proc1 to join... +[2024-11-08 12:02:37,876][06692] Waiting for process rollout_proc2 to join... +[2024-11-08 12:02:37,879][06692] Waiting for process rollout_proc3 to join... +[2024-11-08 12:02:37,881][06692] Waiting for process rollout_proc4 to join... +[2024-11-08 12:02:37,884][06692] Waiting for process rollout_proc5 to join... +[2024-11-08 12:02:37,887][06692] Waiting for process rollout_proc6 to join... +[2024-11-08 12:02:37,889][06692] Waiting for process rollout_proc7 to join... +[2024-11-08 12:02:37,892][06692] Batcher 0 profile tree view: +batching: 0.4873, releasing_batches: 0.0007 +[2024-11-08 12:02:37,894][06692] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0052 + wait_policy_total: 6.1678 +update_model: 1.9686 + weight_update: 0.0025 +one_step: 0.0132 + handle_policy_step: 16.0007 + deserialize: 0.6107, stack: 0.0072, obs_to_device_normalize: 1.9624, forward: 12.8832, send_messages: 0.1214 + prepare_outputs: 0.3447 + to_cpu: 0.2406 +[2024-11-08 12:02:37,897][06692] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 6.9579 +train: 12.7083 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0015, kl_divergence: 0.0890, after_optimizer: 0.5321 + calculate_losses: 2.5129 + losses_init: 0.0000, forward_head: 0.5594, bptt_initial: 1.1007, tail: 0.2342, advantages_returns: 0.0047, losses: 0.4035 + bptt: 0.2086 + bptt_forward_core: 0.2083 + update: 9.5358 + clip: 1.8980 +[2024-11-08 12:02:37,899][06692] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0516, env_step: 0.4258, overhead: 0.0350, complete_rollouts: 0.0013 +save_policy_outputs: 0.0427 + split_output_tensors: 0.0154 +[2024-11-08 12:02:37,900][06692] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0485, env_step: 0.7034, overhead: 0.0313, complete_rollouts: 0.0009 +save_policy_outputs: 0.0506 + split_output_tensors: 0.0176 +[2024-11-08 12:02:37,905][06692] Loop Runner_EvtLoop terminating... +[2024-11-08 12:02:37,908][06692] Runner profile tree view: +main_loop: 70.9686 +[2024-11-08 12:02:37,910][06692] Collected {0: 215695360}, FPS: 115.4 +[2024-11-08 12:03:38,495][06692] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 12:03:38,497][06692] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 12:03:38,500][06692] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 12:03:38,502][06692] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 12:03:38,506][06692] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 12:03:38,508][06692] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 12:03:38,510][06692] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-08 12:03:38,512][06692] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-08 12:03:38,514][06692] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-08 12:03:38,516][06692] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-08 12:03:38,519][06692] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 12:03:38,520][06692] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 12:03:38,523][06692] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 12:03:38,525][06692] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-08 12:03:38,527][06692] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 12:03:39,172][06692] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 12:03:39,188][06692] RunningMeanStd input shape: (1,) +[2024-11-08 12:03:39,373][06692] ConvEncoder: input_channels=3 +[2024-11-08 12:03:39,687][06692] Conv encoder output size: 512 +[2024-11-08 12:03:39,689][06692] Policy head output size: 512 +[2024-11-08 12:03:39,874][06692] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052660_215695360.pth... +[2024-11-08 12:03:42,529][06692] Num frames 100... +[2024-11-08 12:03:42,799][06692] Num frames 200... +[2024-11-08 12:03:43,041][06692] Num frames 300... +[2024-11-08 12:03:43,299][06692] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 12:03:43,306][06692] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 12:03:43,364][06692] Num frames 400... +[2024-11-08 12:03:43,673][06692] Num frames 500... +[2024-11-08 12:03:44,025][06692] Num frames 600... +[2024-11-08 12:03:44,333][06692] Num frames 700... +[2024-11-08 12:03:44,635][06692] Num frames 800... +[2024-11-08 12:03:44,794][06692] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-08 12:03:44,800][06692] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-08 12:03:45,030][06692] Num frames 900... +[2024-11-08 12:03:45,539][06692] Num frames 1000... +[2024-11-08 12:03:45,822][06692] Num frames 1100... +[2024-11-08 12:03:46,172][06692] Num frames 1200... +[2024-11-08 12:03:46,365][06692] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 12:03:46,369][06692] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 12:03:46,701][06692] Num frames 1300... +[2024-11-08 12:03:46,949][06692] Num frames 1400... +[2024-11-08 12:03:47,178][06692] Num frames 1500... +[2024-11-08 12:03:47,442][06692] Num frames 1600... +[2024-11-08 12:03:47,644][06692] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-08 12:03:47,651][06692] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-08 12:03:47,812][06692] Num frames 1700... +[2024-11-08 12:03:48,120][06692] Num frames 1800... +[2024-11-08 12:03:48,514][06692] Num frames 1900... +[2024-11-08 12:03:48,817][06692] Num frames 2000... +[2024-11-08 12:03:49,165][06692] Num frames 2100... +[2024-11-08 12:03:49,261][06692] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 +[2024-11-08 12:03:49,267][06692] Avg episode reward: 4.824, avg true_objective: 4.224 +[2024-11-08 12:03:49,681][06692] Num frames 2200... +[2024-11-08 12:03:50,299][06692] Num frames 2300... +[2024-11-08 12:03:50,741][06692] Num frames 2400... +[2024-11-08 12:03:51,110][06692] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-08 12:03:51,111][06692] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-08 12:03:51,143][06692] Num frames 2500... +[2024-11-08 12:03:51,850][06692] Num frames 2600... +[2024-11-08 12:03:52,271][06692] Num frames 2700... +[2024-11-08 12:03:52,719][06692] Num frames 2800... +[2024-11-08 12:03:52,976][06692] Avg episode rewards: #0: 4.594, true rewards: #0: 4.023 +[2024-11-08 12:03:52,980][06692] Avg episode reward: 4.594, avg true_objective: 4.023 +[2024-11-08 12:03:53,304][06692] Num frames 2900... +[2024-11-08 12:03:53,750][06692] Num frames 3000... +[2024-11-08 12:03:54,125][06692] Num frames 3100... +[2024-11-08 12:03:54,492][06692] Num frames 3200... +[2024-11-08 12:03:54,762][06692] Avg episode rewards: #0: 4.705, true rewards: #0: 4.080 +[2024-11-08 12:03:54,766][06692] Avg episode reward: 4.705, avg true_objective: 4.080 +[2024-11-08 12:03:54,894][06692] Num frames 3300... +[2024-11-08 12:03:55,215][06692] Num frames 3400... +[2024-11-08 12:03:55,518][06692] Num frames 3500... +[2024-11-08 12:03:55,841][06692] Num frames 3600... +[2024-11-08 12:03:56,060][06692] Avg episode rewards: #0: 4.609, true rewards: #0: 4.053 +[2024-11-08 12:03:56,064][06692] Avg episode reward: 4.609, avg true_objective: 4.053 +[2024-11-08 12:03:56,251][06692] Num frames 3700... +[2024-11-08 12:03:56,543][06692] Num frames 3800... +[2024-11-08 12:03:56,855][06692] Num frames 3900... +[2024-11-08 12:03:57,177][06692] Num frames 4000... +[2024-11-08 12:03:57,418][06692] Avg episode rewards: #0: 4.664, true rewards: #0: 4.064 +[2024-11-08 12:03:57,422][06692] Avg episode reward: 4.664, avg true_objective: 4.064 +[2024-11-08 12:04:13,149][06692] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!