Refined Policy Distillation: From VLA Generalists to RL Experts
Paper: arXiv:2503.05833
This repo contains the OpenVLA weights used in Refined Policy Distillation (RPD). RPD distills VLAs into small expert policies using online Reinforcement Learning.
Project Page: https://refined-policy-distillation.github.io
Code: https://github.com/Refined-Policy-Distillation/RPD
The dataset used to fine-tune this checkpoint can be found here.
Also check out the RPD Octo weights.
Adapted from the OpenVLA Repo:
```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load Processor & VLA
processor = AutoProcessor.from_pretrained("Juelg/openvla-7b-finetuned-maniskill", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "Juelg/openvla-7b-finetuned-maniskill",
    attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# Grab image input & format prompt
image: Image.Image = get_from_camera(...)
prompt = "In: What action should the robot take to {<INSTRUCTION>}?\nOut:"

# Predict action (7-DoF Franka; un-normalized for the ManiSkill env)
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="maniskill_human:7.0.0", do_sample=False)

# Execute...
robot.act(action, ...)
```
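The `unnorm_key` above selects per-dimension dataset statistics that `predict_action` uses to map the model's normalized outputs in [-1, 1] back to the environment's action range. A minimal sketch of that affine un-normalization, with placeholder bounds (the real ManiSkill statistics live in the checkpoint's norm stats, not here):

```python
# Hedged sketch of OpenVLA-style action un-normalization.
# The bounds below are illustrative placeholders, NOT the actual
# "maniskill_human:7.0.0" dataset statistics.
def unnormalize(action_norm, low, high):
    """Affine map from [-1, 1] back to the dataset's [low, high] range."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo)
            for a, lo, hi in zip(action_norm, low, high)]

low = [-0.05] * 6 + [0.0]   # placeholder minima for a 7-DoF action
high = [0.05] * 6 + [1.0]   # placeholder maxima (last dim: gripper)

print(unnormalize([0.0] * 7, low, high))  # midpoint of each range
```

A normalized action of 0.0 lands at the midpoint of each dimension's range, and ±1.0 hits the recorded minimum/maximum.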
For details on how OpenVLA was used in RPD, check out the RPD Code Repo and the Agents library.
If you find RPD useful for your work, please consider citing it:
@inproceedings{juelg2025refinedpolicydistillationvla,
title={{Refined Policy Distillation}: {F}rom {VLA} Generalists to {RL} Experts},
author={Tobias Jülg and Wolfram Burgard and Florian Walter},
year={2025},
booktitle={Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
note={Accepted for publication.}
}