How to use wuc1/sarm_current_only_full_dataset with LeRobot:
A Stage-Aware Reward Model (SARM) predicts a dense reward signal from observations. It is typically used downstream for reinforcement learning or human-in-the-loop fine-tuning when task success is not directly observable.
This reward model has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.
lerobot-train \
--dataset.repo_id=${HF_USER}/<dataset> \
--reward_model.type=sarm \
--output_dir=outputs/train/<desired_reward_model_repo_id> \
--job_name=lerobot_reward_training \
--reward_model.device=cuda \
--reward_model.repo_id=${HF_USER}/<desired_reward_model_repo_id> \
--wandb.enable=true
Training writes checkpoints to outputs/train/<desired_reward_model_repo_id>/checkpoints/.
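To pick up the most recent checkpoint from that directory, something like the following could work. It is a minimal sketch that assumes each checkpoint is saved as a subdirectory named after its training step number (for example `005000`); that naming scheme is an assumption, and the helper `latest_checkpoint` is hypothetical, not part of LeRobot.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoints_dir: str) -> Optional[Path]:
    """Return the checkpoint subdirectory with the highest numeric name, or None."""
    root = Path(checkpoints_dir)
    if not root.is_dir():
        return None
    # Assumption: checkpoints are directories named by step number, e.g. "005000".
    # Non-numeric entries (such as a "last" alias) are ignored.
    candidates = [p for p in root.iterdir() if p.is_dir() and p.name.isdecimal()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: int(p.name))
```

Calling `latest_checkpoint("outputs/train/<desired_reward_model_repo_id>/checkpoints")` returns `None` when the directory is missing or holds no numerically named subdirectories, so the caller can decide whether to start training from scratch.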
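from lerobot.rewards import make_reward_model

# Load the trained reward model from the Hub.
reward_model = make_reward_model(pretrained_path="<hf_user>/<reward_model_repo_id>")
# `batch` is not defined in this snippet; the caller must supply a batch of observations.
reward = reward_model.compute_reward(batch)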
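When the predicted rewards feed a reinforcement learning pipeline, a common next step is to accumulate them into discounted returns. The sketch below assumes the rewards for one episode have already been converted to a plain sequence of floats (the output type of `compute_reward` is not specified here); `discounted_returns` and `gamma` are illustrative names, not LeRobot API.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

A smaller `gamma` weights near-term reward more heavily, which can matter for long-horizon tasks where the dense signal is meant to guide progress between sparse successes.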