W0225 19:21:58.664000 31390 site-packages/torch/distributed/run.py:852]
W0225 19:21:58.664000 31390 site-packages/torch/distributed/run.py:852] *****************************************
W0225 19:21:58.664000 31390 site-packages/torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0225 19:21:58.664000 31390 site-packages/torch/distributed/run.py:852] *****************************************
/home/zihan/miniconda3/envs/video-r1-bw/lib/python3.11/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[2026-02-25 19:22:02,641] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (override)
VIDEO_MAX_PIXELS: 100352
[2026-02-25 19:22:03,946] [INFO] [comm.py:652:init_distributed] cdb=None
[2026-02-25 19:22:04,008] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[INFO] Dataset split: train=402, test=40
using: [grpo]
Loading MemGen model with dtype=torch.bfloat16, weight_sharing=False
[MemGen] Loading base models from: Qwen/Qwen2.5-VL-7B-Instruct
[MemGen] dtype=torch.bfloat16, attn=sdpa, weight_sharing=False
`torch_dtype` is deprecated! Use `dtype` instead!
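The two warnings above each name their own remedy: transformers wants `HF_HOME` instead of the deprecated `TRANSFORMERS_CACHE`, and torchrun defaults `OMP_NUM_THREADS` to 1 until it is tuned explicitly. A minimal sketch of the environment setup that silences both; the concrete values (cache path, thread count) are illustrative, not taken from this run:

```shell
# Replace the deprecated TRANSFORMERS_CACHE with HF_HOME (path is illustrative;
# this happens to be the library default).
export HF_HOME="$HOME/.cache/huggingface"

# torchrun sets OMP_NUM_THREADS=1 per process unless told otherwise; a common
# starting point is cores / nproc_per_node, tuned per workload.
export OMP_NUM_THREADS=8
```
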
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
[VisMemModel] Loaded vismem_params.pt
[grpo] VisMemModel created
  - Query len: 8, Mem len: 8
  - use_context_in_former: False
  - LoRA: r=16, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj']
[grpo] MemGen model created
  - Prompt latents: 0
  - Inference latents: 8
  - Trainable params: 120,296,960 (100.00%)
[grpo] Memory footprint:
  - Reasoner: 15816MiB
  - Weaver: 0MiB
  - Trainable: 248MiB
[grpo] aug_skip_prob set to 0.5
[LatentModelBase] Gradient checkpointing enabled (use_reentrant=False)
[VisMemModel] GC enabled on reasoner with {'use_reentrant': False}
[VisMemModel] Gradient checkpointing enabled (external checkpoint for form_memory)
[grpo] Enabled gradient checkpointing with kwargs={'use_reentrant': False}
[grpo] use_latent=True, use_video_memory=False, use_vismem_video=False, use_vismem=True, model=VisMemModel
[grpo] training_args: GRPOConfig(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=True,
batch_eval_metrics=False,
beta=0.0,
bf16=True,
bf16_full_eval=False,
cache_implementation=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=/home/zihan/proj-eidos/third_party/Video-R1/src/r1-v/local_scripts/zero2_noreduce.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=False,
ds3_gather_for_generation=True,
epsilon=0.2,
epsilon_high=None,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=20,
eval_strategy=IntervalStrategy.STEPS,
eval_use_gather_object=False,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=None,
hub_revision=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=no,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=1e-05,
length_column_name=length,
liger_kernel_config=None,
load_best_model_at_end=False,
local_rank=1,
log_completions=False,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/mnt/nas2/zihan/Video-R1-Workspace/checkpoints/vismem_rl_7b_woH_membonus_mem8_20260225_192156/runs/Feb25_19-22-04_memories-burlingame,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_completion_length=768,
max_grad_norm=10.0,
max_prompt_length=16384,
max_steps=-1,
metric_for_best_model=None,
min_p=None,
model_init_kwargs=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_generations=8,
num_iterations=1,
num_train_epochs=2.0,
optim=OptimizerNames.ADAMW_TORCH_FUSED,
optim_args=None,
optim_target_modules=None,
output_dir=/mnt/nas2/zihan/Video-R1-Workspace/checkpoints/vismem_rl_7b_woH_membonus_mem8_20260225_192156,
overwrite_output_dir=False,
parallelism_config=None,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
project=huggingface,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
ref_model_mixup_alpha=0.6,
ref_model_sync_steps=512,
remove_unused_columns=False,
repetition_penalty=1.0,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
reward_weights=None,
run_name=vismem_rl_7b_woH_membonus_mem8_20260225_192156,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=100.0,
save_strategy=no,
save_total_limit=None,
scale_rewards=True,
seed=42,
skip_memory_metrics=True,
sync_ref_model=False,
temperature=0.9,
tf32=None,
top_k=50,
top_p=1.0,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
trackio_space_id=trackio,
use_cpu=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
use_vllm=False,
vllm_device=None,
vllm_dtype=None,
vllm_enable_prefix_caching=None,
vllm_gpu_memory_utilization=None,
vllm_guided_decoding_regex=None,
vllm_max_model_len=None,
vllm_server_host=0.0.0.0,
vllm_server_port=8000,
vllm_server_timeout=120.0,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.01,
)
[grpo] DeepSpeed config before trainer: zero_stage=2
[grpo] DeepSpeed full zero_optimization config: {'stage': 2, 'offload_optimizer': {'device': 'none', 'pin_memory': True}, 'allgather_partitions': True, 'allgather_bucket_size': 200000000.0, 'overlap_comm': False, 'reduce_scatter': False, 'reduce_bucket_size': 200000000.0, 'contiguous_gradients': False, 'ignore_unused_parameters': True}
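The run points DeepSpeed at `local_scripts/zero2_noreduce.json`, and the log prints the resulting `zero_optimization` dict. A sketch of what that JSON file plausibly contains, reconstructed from the logged dict; the `bf16`, `train_micro_batch_size_per_gpu`, and `gradient_accumulation_steps` entries are assumptions inferred from `bf16=True`, `per_device_train_batch_size=1`, and the later "DeepSpeed config has 2" gradient-accumulation message, not copied from the file:

```json
{
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 2,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "none", "pin_memory": true },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": false,
    "reduce_scatter": false,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": false,
    "ignore_unused_parameters": true
  }
}
```

The `_noreduce` suffix matches `reduce_scatter=False` in the logged dict: gradients are all-reduced rather than reduce-scattered across the ZeRO-2 partitions.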
============================================================
TRAINABLE PARAMETERS:
============================================================
m_init: (8, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_B.memory_former.weight: (3584, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.k_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.k_proj.lora_B.memory_former.weight: (512, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.v_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.v_proj.lora_B.memory_former.weight: (512, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.o_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.0.self_attn.o_proj.lora_B.memory_former.weight: (3584, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.q_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.q_proj.lora_B.memory_former.weight: (3584, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.k_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.k_proj.lora_B.memory_former.weight: (512, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.v_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.v_proj.lora_B.memory_former.weight: (512, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.o_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.1.self_attn.o_proj.lora_B.memory_former.weight: (3584, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.2.self_attn.q_proj.lora_A.memory_former.weight: (16, 3584)
[LORA] memory_former.base_model.model.model.language_model.layers.2.self_attn.q_proj.lora_B.memory_former.weight: (3584, 16)
[LORA] memory_former.base_model.model.model.language_model.layers.2.self_attn.k_proj.lora_A.memory_former.weight: (16, 3584)
...
and 221 more
Total: 241 param groups, 120,296,960 params
LoRA: 224 param groups
============================================================
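The shapes in the listing above are enough to tally the LoRA portion of the trainable parameters. A sketch of that arithmetic (not project code): `r`, `hidden`, and `kv_out` are read off the printed `(16, 3584)` / `(3584, 16)` / `(512, 16)` shapes, and the layer count follows from the log's own "LoRA: 224 param groups" (8 LoRA tensors per layer). The smaller k/v output width is the grouped-query-attention KV projection. Note the LoRA matrices plus `m_init` account for only ~10.1M of the 120,296,960 trainable parameters; the remainder sits in the 16 non-LoRA param groups hidden inside "and 221 more".

```python
r = 16        # LoRA rank, from lora_A shape (16, 3584)
hidden = 3584 # q_proj/o_proj width, from lora_B shape (3584, 16)
kv_out = 512  # k_proj/v_proj output width, from lora_B shape (512, 16)

qo = r * hidden + hidden * r   # q_proj or o_proj: lora_A + lora_B
kv = r * hidden + kv_out * r   # k_proj or v_proj: lora_A + lora_B
per_layer = 2 * qo + 2 * kv    # 4 target projections, 2 matrices each

layers = 224 // 8              # 8 LoRA tensors per layer -> 28 decoder layers
lora_total = layers * per_layer
m_init = 8 * 3584              # the (8, 3584) m_init tensor

non_lora_groups = 241 - 224 - 1  # param groups not shown in the listing
print(layers, per_layer, lora_total, lora_total + m_init, non_lora_groups)
```
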
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
[grpo] DeepSpeed config after trainer: zero_stage=2
[grpo] Accelerator is_deepspeed_enabled: True
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
[LatentModelBase] Gradient checkpointing enabled (use_reentrant=False)
[VisMemModel] GC enabled on reasoner with {'use_reentrant': False}
[VisMemModel] Gradient checkpointing enabled (external checkpoint for form_memory)
Gradient accumulation steps mismatch: GradientAccumulationPlugin has 1, DeepSpeed config has 2. Using DeepSpeed's value.
[OPTIMIZER] Uniform LR=1e-05 for 241 trainable tensors
qwen-vl-utils using decord to read video.
[LatentModelBase] Gradient checkpointing disabled
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: ziz131 (ziz131-uc-san-diego). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.3
wandb: Run data is saved locally in /mnt/nas2/zihan/Video-R1-Workspace/wandb_logs/wandb/run-20260225_192217-jsihcxss
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run vismem_rl_7b_woH_membonus_mem8_20260225_192156
wandb: ⭐️ View project at https://wandb.ai/ziz131-uc-san-diego/video-r1-7b-vismem-rl
wandb: 🚀 View run at https://wandb.ai/ziz131-uc-san-diego/video-r1-7b-vismem-rl/runs/jsihcxss
  0%|          | 0/102 [00:00<?, ?it/s]
Analyze the video and select the best option for the following multiple choice question.
Question: What will happen next?
Options:
A. The red object and the brown cube collide
B. The sphere collides with the cylinder
Please think about this question as if you were a human pondering deeply. Engage in an internal dialogue using expressions such as 'let me think', 'wait', 'Hmm', 'oh, I see', 'let's break it down', etc, or other natural language thought expressions. It's encouraged to include self-reflection or verification in the reasoning process. Provide your detailed reasoning between the <think> </think> tags, and then give your final answer between the <answer> </answer> tags. Please provide only the single option letter (e.g., A, B, C, D, etc.) within the <answer> </answer> tags.
------------------------------------------------------------------------------------------------------------------------
[EXPECTED SOLUTION]
B
------------------------------------------------------------------------------------------------------------------------
[ROLLOUTS]
------------------------------------------------------------------------------------------------------------------------
Rollout 1/8
completion_tokens : 352 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this.[I_LAT_0] The video shows a scene with several objects: a red cube, a brown cube, a small gray cube, and a couple of gold spheres. A purple cylinder and a pink cube have been introduced in the scene.[I_LAT_1] The gold spheres are rolling and seem to be moving towards the center of the scene. Hmm, let's break it down. The red cube is currently not in the scene, but it appears to be moving towards the left side of the image. The spheres are also moving in that direction, and they seem to be colliding with the gray cube.[I_LAT_2] It is possible that the red cube will also move to the left, but the question asks about the next event. [I_LAT_3]Oh, I see. The red cube is moving towards the left, but it is not yet visible in the scene.[I_LAT_4] The gold spheres are already in contact with the gray cube and seem to be continuing to move, which could lead to a collision with the red cube if it enters the scene. Wait, let's consider the options. Option A suggests that the red object and the brown cube collide, but the red object is not yet visible, and the brown cube is not in the path of the gold spheres. Option B states that the sphere will collide with the cylinder. The spheres are moving in the direction of the purple cylinder, so it is possible that they could collide with it.
Given this analysis, I believe the most likely next event is that the gold spheres will collide with the purple cylinder. Let's verify this by observing the scene closely. The spheres are moving in the direction of the cylinder, and there are no obstructions preventing this collision. The final answer is B. B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 2/8
completion_tokens : 276 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this for a moment. The red cube is currently not in contact with the brown cube, so it seems unlikely that they will collide next.[I_LAT_0] However, the sphere is very close to the red cube and has a high chance of colliding with it.[I_LAT_1] The cylinder is further away and not directly involved in the current dynamics. [I_LAT_2]Hmm, let's break it down step by step.[I_LAT_3] The red cube is moving towards the left, and the sphere is positioned near it.[I_LAT_4] The cylinder is further back and doesn't seem to be in immediate proximity to any other objects. Therefore, the most likely next interaction is between the sphere and the red cube. Oh, I see. The red cube is moving towards the left, and the sphere is in the path of this movement, making a collision with the red cube the most probable next event. Wait, let's verify if the cylinder could have a collision. The cylinder is currently above and to the right of the red cube and the sphere, and it's not moving in a way that suggests it will collide with any object soon. Based on this reasoning, the most likely next event is a collision between the red cube and the sphere, not between the red cube and the brown cube, nor the cylinder.
So, after careful consideration, my final answer is: B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 3/8
completion_tokens : 310 (before EOS mask) / 411 (raw)
acc_reward        : 0.0000
fmt_reward        : 1.0000
total_reward      : 1.0000 (= 0.00 + 1.00)
completion (latent-marked):
[P_LAT]Let me think about this. The video shows a scene with cubes and spheres, and it looks like the cubes are moving towards the center of the scene.[I_LAT_0] The red cube appears to be moving towards the center, and it seems like it might collide with the brown cube if it continues in that direction. However, the question asks about what will happen next, so let's break it down. [I_LAT_1]First, the red cube is moving forward. It looks like it could potentially collide with the brown cube if it continues straight ahead.[I_LAT_2] But, the question doesn't specify a particular path or collision, it just asks about the next event. Given the current positions and movements, it seems the red cube is moving towards the brown cube, and a collision is likely the next event. Hmm, let's consider the options:
[I_LAT_3]A. The red object and the brown cube collide
[I_LAT_4]B. The sphere collides with the cylinder
The sphere and the cylinder seem to be moving in different directions and don't appear to be in a direct path towards each other. So, option B seems less likely. Oh, I see. Based on the current movements and the question's context, it seems the most logical next event would be the collision between the red cube and the brown cube. Let's verify this. The red cube is moving towards the brown cube, and it looks like they are on a direct collision path. Therefore, the next event will likely be a collision between the red cube and the brown cube.
A<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 4/8
completion_tokens : 201 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this... The red object is currently not colliding with the brown cube.[I_LAT_0] The sphere is colliding with the red object, but the red object is not colliding with the brown cube. Let's break it down further.[I_LAT_1] The red object is closer to the sphere than to the brown cube.[I_LAT_2] If the red object continues to move in its current direction, it will eventually collide with the sphere. Hmm, the red object doesn't appear to be directly heading towards the brown cube.[I_LAT_3] The brown cube is further away.[I_LAT_4] Oh, I see. The red object will likely collide with the sphere before it reaches the brown cube. Therefore, the next event will be a collision between the red object and the sphere. Now, let's verify. Yes, the red object is heading towards the sphere, and there's no indication that it will move away from the sphere soon. Hence, the collision will happen first with the sphere. B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 5/8
completion_tokens : 213 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this. The video shows various objects moving around on a flat surface.[I_LAT_0] Currently, the red object appears to be approaching the sphere, which is also moving.[I_LAT_1] The red object is positioned to the left of the sphere, and it seems to be moving towards it.[I_LAT_2] If the red object continues its path, it will likely collide with the sphere.[I_LAT_3] [I_LAT_4]Hmm, let's break it down.
The red object is a cube, and it appears to be solid. The sphere is metallic and appears to be in motion as well. The cylinder that was seen earlier is now stationary and positioned near the brown cube. Oh, I see. Given the current position and motion of the red object and the sphere, the most likely scenario is that the red object will collide with the sphere before it collides with the cylinder. Therefore, the cylinder will not be the next collision point for the red object. After analyzing the options, the best choice is B. The sphere collides with the cylinder. B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 6/8
completion_tokens : 411 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this for a moment. The current scene shows several objects in a room, including a gray cube, a brown cube, two golden spheres, a red cube, and a purple cylinder. First, let's analyze the positions of these objects.[I_LAT_0] The gray cube and the brown cube are relatively far apart from each other.[I_LAT_1] The golden spheres are in the foreground and seem to be moving towards the gray cube.[I_LAT_2] The red cube and the purple cylinder are also present in the scene. [I_LAT_3]Now, let's break it down further.[I_LAT_4] The red cube and the purple cylinder are positioned on the same side of the gray cube. The golden spheres are moving towards the gray cube, which suggests a potential collision. Hmm, the question asks what will happen next. The red cube and the brown cube are currently not in close proximity, so they are unlikely to collide in the near future. The purple cylinder and the gray cube are also not likely to collide immediately, as the golden spheres appear to be the ones moving towards the gray cube. Wait, I see.
The golden spheres are indeed moving towards the gray cube, and the red cube is positioned in a way that it could potentially collide with the gray cube. However, the golden spheres might also collide with the purple cylinder first, as they are closer to the cylinder than the gray cube. Let's consider the options:
A. The red object and the brown cube collide.
B. The sphere collides with the cylinder.
The golden spheres are moving towards the gray cube, but they could also collide with the purple cylinder. The purple cylinder is positioned between the gray cube and the golden spheres, so it's a more likely candidate for a collision first. Final verification: The golden spheres are closer to the purple cylinder than to the gray cube, and they are moving towards it. Therefore, the next collision is more likely to be with the purple cylinder. Oh, I see. The golden spheres will most likely collide with the purple cylinder next. B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 7/8
completion_tokens : 164 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this for a moment. The red object and the brown cube are currently separated by a distance, so the likelihood of a collision between them seems low.[I_LAT_0] Let's focus on the sphere.[I_LAT_1] It appears to be moving and may have a higher chance of colliding with other objects. The cylinder is also moving and might collide with the sphere.[I_LAT_2] However, the red object and the brown cube are not in the immediate path of the sphere or the cylinder. [I_LAT_3]Hmm, let's break it down again. The sphere is moving towards the left, and the cylinder is following a similar path. They seem to be on a collision course.[I_LAT_4] Oh, I see.
The sphere is close to the cylinder, and their paths intersect, indicating a possible collision. B<|im_end|>
------------------------------------------------------------------------------------------------------------------------
Rollout 8/8
completion_tokens : 246 (before EOS mask) / 411 (raw)
acc_reward        : 1.0000
fmt_reward        : 1.0000
total_reward      : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)
completion (latent-marked):
[P_LAT]Let me think about this for a moment. The video shows various objects in a scene, including cubes, a sphere, and a cylinder. The sphere and the cylinder seem to be positioned in a way that suggests a potential collision between them.[I_LAT_0] Let's break it down:
[I_LAT_1]1.[I_LAT_2] The sphere is moving towards the cylinder.
[I_LAT_3]2.[I_LAT_4] The cylinder is stationary.
3. The sphere and the cylinder are aligned in such a way that they appear to be moving towards each other.
Hmm, if the sphere continues its trajectory, it is likely to collide with the cylinder. The other objects in the scene, such as the red cube and the brown cube, are not aligned with the trajectory of the sphere in a way that suggests a potential collision. Wait, I should verify this by looking at the direction and speed of the sphere. The sphere seems to be moving at a speed that would likely cause it to collide with the cylinder if it continues on its current path. Oh, I see. The trajectory of the sphere and the position of the cylinder suggest that a collision is imminent. Therefore, the best option for what will happen next is that the sphere will collide with the cylinder.
B<|im_end|>
========================================================================================================================
[VISMEM STATS] qb=0.005288 | m_std=0.005079 m_abs=0.017822
[LatentModelBase] Gradient checkpointing disabled
[LatentModelBase] Gradient checkpointing enabled (use_reentrant=False)
[VisMemModel] GC enabled on reasoner with {}
[VisMemModel] Gradient checkpointing enabled (external checkpoint for form_memory)
[MEMORY_REWARD] mem_acc=1.000 nomem_acc=1.000 delta_acc=+0.000 | verdict=HELPED bonus_applied=8 penalty_applied=0 | delta_bonus: ['+1.0', '+1.0', '+1.0', '+1.0', '+1.0', '+1.0', '+1.0', '+1.0'] combined: ['3.00', '3.00', '3.00', '3.00', '3.00', '3.00', '3.00', '3.00']
[MEMORY_REWARD] nomem_acc_per_rollout=['1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0']
========================================================================================================================
[PROBLEM INFO]
problem_id   : 193135
data_type    : video
problem_type : multiple choice
file         : /home/zihan/proj-eidos/third_party/Video-R1/src/r1-v/Video-R1-data/LLaVA-Video-178K/academic_source/youcook2/204/SRdsaqpnfd4/split_3.mp4
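The per-rollout breakdowns printed above (e.g. "total_reward : 3.0000 (= 1.00 + 1.00 + 1.00 bonus)" versus rollout 3's "1.0000 (= 0.00 + 1.00)" with no bonus) suggest the reward is a simple sum of accuracy, format, and a memory bonus that is only granted on correct rollouts. A hypothetical sketch of that combination (`combine_rewards` and the lists below are illustrative, not the trainer's actual API):

```python
# Assumed reward combination, inferred from the log's printed breakdowns:
# total = acc_reward + fmt_reward + memory bonus (bonus only on correct rollouts).
def combine_rewards(acc, fmt, bonus):
    return [a + f + b for a, f, b in zip(acc, fmt, bonus)]

acc   = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # acc_reward per rollout above
fmt   = [1.0] * 8                                   # fmt_reward per rollout above
bonus = [1.0 if a == 1.0 else 0.0 for a in acc]     # bonus withheld from rollout 3
print(combine_rewards(acc, fmt, bonus))
# [3.0, 3.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```

Note the [MEMORY_REWARD] line in this log reports bonus_applied=8 with all combined rewards at 3.00, so it likely summarizes a different batch than the eight rollouts printed above (where rollout 3 was incorrect).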
------------------------------------------------------------------------------------------------------------------------
[TOKEN INFO]
prompt_length : 1156
video_info    : grid=[8,16,30], tokens=960
image_info    : N/A
------------------------------------------------------------------------------------------------------------------------
[GENERATION DEBUG]
completion_ids_shape : (8, 294)
eos_token_id         : 151645
eos_idx (per sample) : [293, 200, 230, 268, 191, 214, 159, 217]
------------------------------------------------------------------------------------------------------------------------
[PROMPT]
[USER]
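The [GENERATION DEBUG] block reports the first-EOS position per sample over the (8, 294) completion batch. A minimal sketch of how such indices can be recovered from the token ids (an assumed reconstruction, not the trainer's actual code):

```python
import torch

def first_eos_idx(completion_ids, eos_token_id):
    # Position of the first EOS token per row; rows with no EOS fall back to
    # the last position (the sequence ran to max length, as in rollout 6's 411/411).
    is_eos = completion_ids == eos_token_id
    return torch.where(
        is_eos.any(dim=1),
        is_eos.int().argmax(dim=1),  # argmax finds the first True per row
        torch.full((completion_ids.size(0),), completion_ids.size(1) - 1),
    )

ids = torch.full((2, 6), 7)   # toy batch of two 6-token completions
ids[0, 3] = 151645            # EOS at position 3 in the first row only
print(first_eos_idx(ids, 151645).tolist())  # [3, 5]
```

Tokens after the first EOS are typically masked out of the loss, which is why the log distinguishes "before EOS mask" token counts from the raw generation length.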