opus-27b-dsl-step210-2026-05-02
LoRA adapter trained with reinforcement learning (GRPO via Thinking Machines' Tinker SDK) on the Opus-Magnum puzzle-solving REPL benchmark, snapshotted at training step 210.
Training setup
- Base model: Qwen/Qwen3.5-27B
- Renderer: qwen3_5_disable_thinking
- Representation: dsl (action language the agent emits)
- Adapter: LoRA, rank 32
- RL recipe: GRPO via Tinker
- Hyperparameters:
  - learning_rate = 1e-5
  - group_size = 8, groups_per_batch = 16
  - max_tokens = 1024, max_trajectory_tokens = 12000
  - distances = 1,2,3,4
  - max_steps_off_policy = None
  - save_every = 5
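As a rough illustration of what group_size and groups_per_batch imply, the sketch below computes the number of rollouts per batch and a group-relative advantage for one group. It assumes the common GRPO formulation, (reward - group mean) / group std, and assumes that each group holds rollouts for the same puzzle. Neither the exact normalization used for this run nor the reward values are stated in this README, so the rewards shown are hypothetical.

```python
from statistics import mean, pstdev

GROUP_SIZE = 8          # group_size from the hyperparameter list
GROUPS_PER_BATCH = 16   # groups_per_batch from the hyperparameter list


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of rollouts.

    Assumption: the widely used GRPO form (r - mean) / (std + eps).
    The normalization actually used for this adapter is not documented here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rollouts sampled per batch under these settings.
rollouts_per_batch = GROUP_SIZE * GROUPS_PER_BATCH

# Hypothetical rewards for one group: two solved rollouts, six failed.
example_rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(example_rewards)
```

Under this formulation, rollouts that beat their group's average receive a positive advantage and the rest a negative one, so a group in which every rollout gets the same reward contributes no learning signal.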
Files
- adapter_model.safetensors: Tinker raw LoRA adapter weights
- adapter_config.json: adapter metadata (rank, alpha, target modules)
- README.md: this file
Provenance
Tinker checkpoint:
tinker://6b93f86a-7788-58f9-ac5a-4a33711d8367:train:0/sampler_weights/000210
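The checkpoint path can be split into its parts to confirm that it refers to step 210. The helper below is a hypothetical sketch, not part of the Tinker SDK, and it assumes the layout tinker://<run id>/<weights kind>/<zero-padded step>, which is inferred only from the single path shown above.

```python
def parse_tinker_checkpoint(uri):
    """Split a tinker:// checkpoint path into run id, weights kind, and step.

    Assumption: the path has exactly three slash-separated parts after the
    scheme; this layout is inferred from one example, not from a spec.
    """
    scheme, sep, rest = uri.partition("://")
    if scheme != "tinker" or not sep:
        raise ValueError(f"not a tinker:// path: {uri!r}")
    run_id, kind, step = rest.split("/")
    return {"run_id": run_id, "kind": kind, "step": int(step)}


checkpoint = parse_tinker_checkpoint(
    "tinker://6b93f86a-7788-58f9-ac5a-4a33711d8367:train:0/sampler_weights/000210"
)

# With save_every = 5, a saved step should be a multiple of 5.
is_on_save_schedule = checkpoint["step"] % 5 == 0
```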
Converting to PEFT format
The files above are in Tinker's raw adapter format. To convert them to PEFT format,
which vLLM can load directly via --lora-modules, run the following on a machine that can
host the base model:
from tinker_cookbook.weights import build_lora_adapter

build_lora_adapter(
    base_model="Qwen/Qwen3.5-27B",
    adapter_path="./tinker_adapter",  # this repo's contents
    output_path="./peft_adapter",
)
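Once the conversion has produced ./peft_adapter, the adapter could be served with vLLM along the following lines. This is a minimal sketch that only assembles the command line. The adapter name opus-dsl is a hypothetical label, and --max-lora-rank is passed on the assumption that vLLM's default rank limit may be lower than this adapter's rank of 32. Whether a given vLLM release supports the base model is not covered here.

```python
import shlex


def build_vllm_serve_command(base_model, adapter_name, adapter_path, lora_rank):
    """Assemble a `vllm serve` command that loads one LoRA adapter.

    Uses the vLLM CLI flags --enable-lora, --max-lora-rank, and
    --lora-modules (name=path form). The adapter name is a free-form label.
    """
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        "--max-lora-rank", str(lora_rank),
        "--lora-modules", f"{adapter_name}={adapter_path}",
    ]


command = build_vllm_serve_command(
    base_model="Qwen/Qwen3.5-27B",
    adapter_name="opus-dsl",        # hypothetical label for requests
    adapter_path="./peft_adapter",  # output_path from the conversion step
    lora_rank=32,                   # adapter rank from the training setup
)

# Shell-quoted form, suitable for printing or pasting into a terminal.
command_line = shlex.join(command)
```

The list form can be passed to subprocess.run as-is. Requests to the running server would then select the adapter by using the chosen adapter name as the model name.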