Insert Mouse Battery ResNet18 Reward V1
This repository contains a binary ResNet18 reward model for the insert-mouse-battery task.
Input Format
- Image input: RGB three-view vertical stack in Wan world-model format.
- View order: `cam_high + cam_left_wrist + cam_right_wrist`.
- Image size: `[3, 544, 320]` as a CHW tensor.
- Layout: each view is `180x320`; the stacked image is `540x320`; the full stack is resized to `544x320`.
- Intended preprocessing: the same normalization/preprocessing path used by the RLinf ResNet reward model.
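
The layout arithmetic above can be sketched as follows. This is a minimal illustration, not the RLinf preprocessing path: the function name `stack_views` is hypothetical, the sizes are read as height x width, and nearest-neighbor row resampling is an assumption, since the interpolation used for the 540-to-544 resize is not stated here. Normalization is deliberately left out.

```python
import numpy as np

VIEW_H, VIEW_W = 180, 320      # per-view size (height x width)
TARGET_H, TARGET_W = 544, 320  # final stacked size (height x width)


def stack_views(cam_high, cam_left_wrist, cam_right_wrist):
    """Stack three HWC RGB views vertically and return a CHW array.

    Hypothetical helper: the resize below uses nearest-neighbor row
    sampling, which is an assumption, not the documented method.
    """
    views = [cam_high, cam_left_wrist, cam_right_wrist]
    for view in views:
        assert view.shape == (VIEW_H, VIEW_W, 3), view.shape

    # Vertical stack in the documented view order: (540, 320, 3).
    stacked = np.concatenate(views, axis=0)

    # Resample rows 540 -> 544; the width is already 320.
    rows = np.arange(TARGET_H) * stacked.shape[0] // TARGET_H
    resized = stacked[rows]

    # HWC -> CHW, giving shape (3, 544, 320).
    return np.transpose(resized, (2, 0, 1))


if __name__ == "__main__":
    dummy = [np.zeros((VIEW_H, VIEW_W, 3), dtype=np.uint8) for _ in range(3)]
    print(stack_views(*dummy).shape)
```

The output still needs the normalization step from the RLinf ResNet reward model before it is passed to the checkpoint.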
Output Format
The checkpoint is a binary reward model. For a single input image, the model outputs one scalar logit. Applying sigmoid gives the estimated full-task success probability.
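
As a small sketch of the post-processing described above, the following converts a scalar logit into a success probability. The function names and the `threshold` parameter are hypothetical; no decision threshold is specified for this checkpoint, so `0.5` is only an illustrative default.

```python
import math


def logit_to_success_probability(logit: float) -> float:
    """Numerically stable sigmoid for a single scalar logit."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits, avoid overflow in exp(-logit).
    e = math.exp(logit)
    return e / (1.0 + e)


def binary_reward(logit: float, threshold: float = 0.5) -> float:
    """Hypothetical hard reward: 1.0 if the probability clears the threshold."""
    return 1.0 if logit_to_success_probability(logit) >= threshold else 0.0
```

A logit of `0.0` maps to a probability of `0.5`; positive logits indicate that the model estimates success as more likely than failure.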
Files
| File | Description |
|---|---|
| `full_weights.pt` | RLinf ResNet18 reward checkpoint. |
| `model_metadata.json` | Input/output format and dataset construction metadata. |
| `eval_summary.json` | Train/validation/hard-validation metrics. |
| `train_grid.jpg` | Sampled training examples. |
| `val_grid.jpg` | Sampled validation examples. |
| `hard_val_grid.jpg` | Sampled hard-validation examples. |
Test Results
| split | samples | positives | negatives | accuracy | AUC | loss |
|---|---|---|---|---|---|---|
| train | 5582 | 2791 | 2791 | 0.9946 | 0.9999 | 0.0189 |
| val | 1388 | 694 | 694 | 0.9856 | 0.9915 | 0.1085 |
| hard_val | 536 | 268 | 268 | 0.9832 | 0.9921 | 0.1076 |
Training Data
The dataset uses three-view videos from expert-data, success-and-hil-data, and failure-data.
- Positive samples: tail frames from expert and success/HIL episodes.
- Negative samples: early and middle frames from expert and success/HIL episodes, plus sampled frames from failure episodes.
- Train/validation splitting is done at the episode level.
- The train and validation sets are class-balanced.
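
The episode-level split and class balancing described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the dataset construction code used for this checkpoint: the sample tuple layout, the function names, `val_fraction`, and balancing by random downsampling are all hypothetical.

```python
import random


def split_by_episode(samples, val_fraction=0.2, seed=0):
    """Split (episode_id, frame_index, label) samples by episode.

    All frames of an episode land in the same split, so validation
    never sees frames from a training episode.
    """
    episode_ids = sorted({s[0] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(episode_ids)
    n_val = max(1, int(len(episode_ids) * val_fraction))
    val_ids = set(episode_ids[:n_val])
    train = [s for s in samples if s[0] not in val_ids]
    val = [s for s in samples if s[0] in val_ids]
    return train, val


def balance_classes(samples, seed=0):
    """Downsample the larger class so positives and negatives match."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[2] == 1]
    negatives = [s for s in samples if s[2] == 0]
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n) + rng.sample(negatives, n)
```

Splitting by episode rather than by frame matters here because adjacent frames of one video are nearly identical, so a frame-level split would leak near-duplicates into validation.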