OpenVLA ManiSkill RPD Weights

This repo contains the OpenVLA weights used in Refined Policy Distillation (RPD). RPD distills vision-language-action models (VLAs) into small expert policies using online reinforcement learning.
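The core idea of combining an RL objective with a distillation target from the VLA can be illustrated with a toy gradient step. The linear student policy, the loss weight `lambda_distill`, and the learning rate below are illustrative assumptions for the sketch, not the actual RPD architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 16-dim state, 7-DoF action (matching the Franka setup below)
state_dim, action_dim = 16, 7
W = rng.normal(scale=0.1, size=(action_dim, state_dim))  # student policy weights

# One sampled state and the teacher (VLA) action for it
s = rng.normal(size=state_dim)
teacher_a = rng.normal(size=action_dim)

# Distillation term: MSE between student and teacher actions.
# In RPD this is combined with an online RL loss; only the
# distillation part is sketched here.
lambda_distill = 1.0  # illustrative weight
err = W @ s - teacher_a
loss = lambda_distill * np.mean(err ** 2)

# One gradient-descent step on the distillation term
grad_W = lambda_distill * (2.0 / action_dim) * np.outer(err, s)
W -= 0.01 * grad_W

# The step moves the student's action toward the teacher's
new_loss = lambda_distill * np.mean((W @ s - teacher_a) ** 2)
assert new_loss < loss
```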

Project Page: https://refined-policy-distillation.github.io
Code: https://github.com/Refined-Policy-Distillation/RPD

The dataset used to fine-tune this checkpoint can be found here.

Also check out the RPD Octo weights.

Usage

Adapted from the OpenVLA Repo:

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

import torch

# Load Processor & VLA
processor = AutoProcessor.from_pretrained("Juelg/openvla-7b-finetuned-maniskill", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "Juelg/openvla-7b-finetuned-maniskill",  # this fine-tuned checkpoint, not the base model
    attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
    torch_dtype=torch.bfloat16, 
    low_cpu_mem_usage=True, 
    trust_remote_code=True
).to("cuda:0")

# Grab image input & format prompt
image: Image.Image = get_from_camera(...)
prompt = "In: What action should the robot take to {<INSTRUCTION>}?\nOut:"

# Predict Action (7-DoF Franka; un-normalize for the ManiSkill env)
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="maniskill_human:7.0.0", do_sample=False)

# Execute...
robot.act(action, ...)
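
The `unnorm_key` selects per-dimension action statistics from the fine-tuning dataset so that `predict_action` can map the model's normalized outputs back to the environment's action range. Conceptually the mapping looks like the sketch below; the bounds shown are made-up placeholder values, not the real `maniskill_human:7.0.0` statistics:

```python
import numpy as np

# Illustrative per-dimension action bounds (NOT the real dataset statistics)
action_low = np.array([-0.05, -0.05, -0.05, -0.5, -0.5, -0.5, 0.0])
action_high = np.array([0.05, 0.05, 0.05, 0.5, 0.5, 0.5, 1.0])

def unnormalize(a_norm):
    """Map a normalized action in [-1, 1] back to the dataset's action range."""
    return 0.5 * (a_norm + 1.0) * (action_high - action_low) + action_low

# -1 maps to the lower bound, +1 to the upper bound, 0 to the midpoint
mid = unnormalize(np.zeros(7))
```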

For details on how OpenVLA is used in RPD, check out the RPD code repo and the Agents library.

Citation

If you find RPD useful for your work, please consider citing it:

@inproceedings{juelg2025refinedpolicydistillationvla,
    title={{Refined Policy Distillation}: {F}rom {VLA} Generalists to {RL} Experts}, 
    author={Tobias Jülg and Wolfram Burgard and Florian Walter},
    year={2025},
    booktitle={Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
    note={Accepted for publication.}
}