# Wrench 9B - Purpose-Built Agentic Model
A LoRA fine-tuned version of Qwen3.5-9B (dense), purpose-built for tool calling, error recovery, and system prompt following. Runs on 8GB VRAM.
Part of the Wrench family - the 35B sibling scores 82% on the Berkeley Function Calling Leaderboard (BFCL).
## Benchmarks
| Benchmark | Score | Details |
|---|---|---|
| Clank Agentic Benchmark | 114/120 (95%) | 40-prompt, 8-category tool-calling evaluation |
### Category Breakdown
| Category | Score | Max |
|---|---|---|
| Basic Tool Use | 15 | 15 |
| Multi-Step Tasks | 15 | 15 |
| Error Recovery | 14 | 15 |
| Response Quality | 14 | 15 |
| System Prompt Following | 15 | 15 |
| Planning & Reasoning | 15 | 15 |
| Tool Format Correctness | 15 | 15 |
| Safety & Restraint | 15 | 15 |
| Total | 114 | 120 |
### vs. Frontier Models
| Model | Score | Runs On | Cost |
|---|---|---|---|
| Wrench 35B v7 | 118/120 + 82% BFCL | 16GB GPU | Free |
| Claude Sonnet 4.6 | ~114/120 | Cloud | $20/mo |
| Wrench 9B v4 | 114/120 | 8GB GPU | Free |
| GPT-4o | ~110/120 | Cloud | $20/mo |
| Claude Haiku | ~100/120 | Cloud | Paid |
| Base Qwen 3.5 9B | ~50/120 | 8GB GPU | Free |
## Quick Start
### Ollama (recommended)
Download the GGUF and Modelfile from the Files tab, then:
```shell
ollama create wrench-9b -f Modelfile
ollama run wrench-9b
```
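If you want to see what the Modelfile contains (or write one by hand), a minimal sketch might look like the following. The GGUF filename and sampling parameters here are assumptions carried over from the llama.cpp invocation in the next section; prefer the Modelfile published in the Files tab, since it also carries the correct chat template.

```
# Minimal Modelfile sketch - filename and parameters are assumptions;
# the published Modelfile in the Files tab is authoritative.
FROM ./wrench-9B-Q4_K_M.gguf
PARAMETER temperature 0.4
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER num_ctx 8192
```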
### llama.cpp
```shell
./llama-server -m wrench-9B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 8192
```
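With `--jinja` enabled, `llama-server` exposes an OpenAI-compatible chat endpoint (`/v1/chat/completions`, port 8080 by default) that supports tool definitions. A minimal request body might look like the sketch below; the `get_weather` tool is purely illustrative, not part of the model or benchmark.

```json
{
  "messages": [
    {"role": "user", "content": "What's the weather in Berlin?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
```

If the model decides to call the tool, the response carries a `tool_calls` entry rather than plain text content.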
### With Clank Gateway
```shell
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench-9b" in config
```
## Model Details
| Detail | Value |
|---|---|
| Base Model | Qwen3.5-9B (dense) |
| Fine-Tune Method | LoRA (rank 32, alpha 64) via HuggingFace PEFT |
| Training Data | 1,356 examples across 15 categories |
| Hardware | 1x NVIDIA H100 80GB |
| Training Time | ~30 minutes |
| Final Loss | 0.1512 |
| Quantization | Q4_K_M GGUF (~5GB) |
| Context Window | 8,192 tokens |
| License | Apache 2.0 |
## Training Data
All training data is published and auditable: ClankLabs/wrench-training-data
1,356 examples (1,251 base + 105 frontier-targeted) across 15 categories.
## Links
- Wrench 35B - 118/120 + 82% BFCL, runs on 16GB VRAM
- Training Data
- Clank Gateway - the AI agent gateway Wrench was built for
- clanklabs.dev/wrench
- Benchmark Methodology