Reward Model Card for sarm

A Stage-Aware Reward Model (SARM) predicts a dense reward signal from observations. It is typically used downstream for reinforcement learning or human-in-the-loop fine-tuning when task success is not directly observable.

This reward model has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.


How to Get Started with the Reward Model

Train from scratch

lerobot-train \
  --dataset.repo_id=${HF_USER}/<dataset> \
  --reward_model.type=sarm \
  --output_dir=outputs/train/<desired_reward_model_repo_id> \
  --job_name=lerobot_reward_training \
  --reward_model.device=cuda \
  --reward_model.repo_id=${HF_USER}/<desired_reward_model_repo_id> \
  --wandb.enable=true

Training writes checkpoints to outputs/train/<desired_reward_model_repo_id>/checkpoints/.
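
To pick up the most recent checkpoint from that directory, a small helper along these lines may be useful. This is a minimal sketch, not part of LeRobot: the function name `latest_checkpoint` is hypothetical, and it assumes that each checkpoint lives in a subdirectory named after its training step (for example `005000`), which this card does not state.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoints_dir: Path) -> Optional[Path]:
    """Return the step-named subdirectory with the highest step, or None.

    Assumption (not confirmed by this card): checkpoint subdirectories are
    named with the training step as digits, e.g. "005000".
    """
    if not checkpoints_dir.is_dir():
        return None
    # Keep only subdirectories whose names are all digits; anything else
    # (for example an alias such as "last") is ignored.
    step_dirs = [
        p for p in checkpoints_dir.iterdir() if p.is_dir() and p.name.isdigit()
    ]
    if not step_dirs:
        return None
    # Compare numerically so that "10000" sorts after "9000".
    return max(step_dirs, key=lambda p: int(p.name))
```

For example, `latest_checkpoint(Path("outputs/train/<desired_reward_model_repo_id>/checkpoints"))` returns the highest-numbered step directory if one exists, and `None` otherwise.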

Load the reward model in Python

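from lerobot.rewards import make_reward_model

# Load the trained reward model from the Hub.
reward_model = make_reward_model(pretrained_path="<hf_user>/<reward_model_repo_id>")

# `batch` is not defined here: supply a batch of observations in the format the model was trained on.
reward = reward_model.compute_reward(batch)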
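
When scoring a whole episode, the call above is usually repeated over several batches and the results are collected into one reward trace. The sketch below shows one way to do that. It relies only on the `compute_reward(batch)` method shown above; everything else is an assumption made for illustration: the helper names, the `DummyRewardModel` stand-in, the batch keys `frame_index` and `episode_length`, and the idea that the return value is either a scalar or a one-dimensional sequence of per-frame values.

```python
from typing import Any, Dict, Iterable, List


def to_floats(output: Any) -> List[float]:
    """Convert a scalar or a 1-D sequence of scalars to a list of floats.

    Assumption: the reward model returns one of these two shapes.
    """
    try:
        return [float(x) for x in output]
    except TypeError:
        # Not iterable (plain number or 0-d tensor): treat as one value.
        return [float(output)]


def score_batches(reward_model: Any, batches: Iterable[Dict[str, Any]]) -> List[float]:
    """Call compute_reward on each batch and concatenate the results."""
    rewards: List[float] = []
    for batch in batches:
        rewards.extend(to_floats(reward_model.compute_reward(batch)))
    return rewards


class DummyRewardModel:
    """Hypothetical stand-in so the sketch runs without a trained model.

    It reports the fraction of the episode elapsed at each frame; a real
    reward model would predict its reward from the observations instead.
    """

    def compute_reward(self, batch: Dict[str, Any]) -> List[float]:
        return [t / batch["episode_length"] for t in batch["frame_index"]]


if __name__ == "__main__":
    # Two batches covering a four-frame episode (keys are illustrative only).
    batches = [
        {"frame_index": [0, 1], "episode_length": 4},
        {"frame_index": [2, 3], "episode_length": 4},
    ]
    print(score_batches(DummyRewardModel(), batches))
```

Replacing `DummyRewardModel()` with the loaded `reward_model` should work as long as its output matches the assumed shape; if it returns something else (for example a dictionary), `to_floats` would need to be adapted.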

Model Details

  • License: apache-2.0
  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32

Dataset used to train wuc1/sarm_current_only_full_dataset