Qwen3.5-4B Capability Vector v1 (cross-model contrast, 2026-05-14)

STATUS: null result in aggregate. This was the first attempt at a CAA-style (contrastive activation addition) additive "capability direction". Across ~80 Docker runs of terminal-bench-2 sweeps, the vector produced no statistically significant pass-rate lift. The direction separates the two trace groups perfectly in the residual stream (discriminator AUC=1.0), but what it encodes is output style (a parse_fail ↔ no_cmd trade-off), not task-solving capability. See the capvec-v2-samemodel and abliterated repos for the follow-up experiments that confirmed this.

What this is

A residual-stream direction tensor for Qwen/Qwen3.5-4B, computed as the mean difference between 5 successful SFT agent traces and 12 failing base/cp600/DPO traces on the terminal-bench-2 sprint tasks. The method is adapted from NousResearch/llm-abliteration and failspy's ortho cookbook, but inverted: the direction is added to the residual stream at inference instead of a refusal direction being orthogonally removed from the weights.
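The mean-difference step can be sketched as follows. This is a minimal illustration, not the code in scripts/: it assumes each trace has already been reduced to one activation vector per layer (how traces are pooled is not stated here), and the function name and synthetic data are hypothetical.

```python
import torch


def mean_difference_direction(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm mean-difference direction for a single layer.

    pos_acts: (n_pos, hidden) activations from successful traces
    neg_acts: (n_neg, hidden) activations from failing traces
    """
    # Difference of the two group means, computed in float32 for stability.
    diff = pos_acts.float().mean(dim=0) - neg_acts.float().mean(dim=0)
    # Normalise to unit length; the clamp guards against a zero vector.
    return diff / diff.norm().clamp_min(1e-12)


# Synthetic stand-in data (assumption): 5 positive and 12 negative traces, hidden size 16.
torch.manual_seed(0)
hidden = 16
pos = torch.randn(5, hidden) + 1.0
neg = torch.randn(12, hidden) - 1.0

direction = mean_difference_direction(pos, neg)
```

By construction the positive group projects higher onto `direction` than the negative group on average, which is what the per-layer ranking then quantifies.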

Per-layer AUC ranking

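Layers 12–22 all reach AUC=1.000 when traces are separated by their projection onto the direction. Layer 22 is the layer recommended for use with the published dir.pt, because it has the largest margin among the layers with AUC=1.000. Layer 26 has a larger margin (2.65) but does not separate the traces perfectly (AUC=0.98).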

layer	AUC	margin
22	1.000	1.95
19	1.000	1.44
26	0.98	2.65

See vectors/ranking.csv for all 32 layers.
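A rough sketch of how a per-layer score of this kind could be computed from per-trace projections is shown below. The AUC is the standard pairwise-ranking form. The margin definition is an assumption (difference of group means), since the exact definition used for ranking.csv is not given here. Function names and numbers are illustrative only.

```python
def projection_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_margin(pos_scores, neg_scores):
    """Assumed margin: mean positive projection minus mean negative projection."""
    return sum(pos_scores) / len(pos_scores) - sum(neg_scores) / len(neg_scores)


# Illustrative projections only, not values from ranking.csv.
separated_auc = projection_auc([2.1, 1.8, 2.5], [0.2, -0.3, 0.9, 0.1])
overlapping_auc = projection_auc([1.0, 0.5], [0.7, 0.0])
example_margin = mean_margin([2.0, 1.0], [0.0, 1.0])
```

Under this assumed definition, a layer can have a large mean gap while still misordering a few pairs, so AUC and margin need not rank layers the same way.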

Behavioural results (terminal-bench-2)

sweep	configuration	pass / N	Fisher's p vs base
α=4 sweep, log-summary	steered-L22-α4	1/3	0.21
α-grid log-summary	α ∈ {2,4,6,8}, L26-α4, N=5 each	0/20	—
wide-task sweep (5 sprint tasks)	steered-L22-α4	1/5	1.0 (same as base)
multi-layer L13/16/19/22	α=0.5, 1.0	0/9	—
negative α (direction subtracted)	α=−2, α=−4	1/12	0.62

Net: null lift across all sweeps. The single early α=4 win on log-summary did not replicate.
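For readers who want to check p-values of this kind, here is a small pure-Python sketch of a two-sided Fisher's exact test on pass/fail counts. The base-arm counts are not listed in the table, so the inputs below are hypothetical; `scipy.stats.fisher_exact` would be the usual library route.

```python
from math import comb


def fisher_exact_two_sided(pass_a, n_a, pass_b, n_b):
    """Two-sided Fisher's exact p-value for pass counts in two arms of sizes n_a and n_b."""
    total_pass = pass_a + pass_b
    denom = comb(n_a + n_b, total_pass)

    def prob(k):
        # Hypergeometric probability that arm A has exactly k passes, margins fixed.
        return comb(n_a, k) * comb(n_b, total_pass - k) / denom

    p_obs = prob(pass_a)
    lo = max(0, total_pass - n_b)
    hi = min(n_a, total_pass)
    # Sum every outcome that is no more likely than the observed one
    # (small relative tolerance for floating-point ties).
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Hypothetical counts: steered 1/5 against an assumed base of 1/5.
p_same = fisher_exact_two_sided(1, 5, 1, 5)
# Hypothetical counts: steered 2/5 against an assumed base of 0/5.
p_small_lift = fisher_exact_two_sided(2, 5, 0, 5)
```

With only 5 trials per arm, even a 2/5 vs 0/5 split gives p of about 0.44, so sweeps of this size can only detect very large lifts.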

Quick use (still functional as a research artifact)

import torch
from transformers import AutoTokenizer, AutoModelForImageTextToText
from huggingface_hub import hf_hub_download

tok = AutoTokenizer.from_pretrained('Qwen/Qwen3.5-4B')
model = AutoModelForImageTextToText.from_pretrained(
    'Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map={'':0})

vec_path = hf_hub_download('AlexWortega/qwen3.5-4b-capability-vector-20260514', 'vectors/dir.pt')
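# weights_only=False unpickles arbitrary Python objects, so only load files you trust.
vec = torch.load(vec_path, map_location='cpu', weights_only=False)
# dir.pt holds one unit-norm direction per decoder layer; layer 22 is the best layer.
# Apply it at inference via a residual-stream hook.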

See scripts/steer.py for the hook implementation.
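For orientation, an additive residual-stream hook can look roughly like the sketch below. It is not the implementation in scripts/steer.py. The helper name is hypothetical, and it is exercised on a toy `nn.Identity` module, because the attribute path to the decoder layers of the loaded model is not shown here and depends on the model class.

```python
import torch
from torch import nn


def add_steering_hook(layer: nn.Module, direction: torch.Tensor, alpha: float):
    """Register a forward hook that adds alpha * direction to the layer's output hidden states."""

    def hook(module, inputs, output):
        # Decoder layers often return a tuple whose first element is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(device=hidden.device, dtype=hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered  # a non-None return value replaces the layer output

    return layer.register_forward_hook(hook)


# Toy check on a stand-in layer (assumption: hidden size 8, direction along axis 0).
layer = nn.Identity()
direction = torch.zeros(8)
direction[0] = 1.0

handle = add_steering_hook(layer, direction, alpha=4.0)
x = torch.zeros(1, 3, 8)       # (batch, sequence, hidden)
y = layer(x)                   # steered output
handle.remove()                # always remove the hook when done
y_after = layer(x)             # unsteered again
```

A negative `alpha` subtracts the direction instead, which corresponds to the negative α sweep in the results table.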

Files

vectors/dir.pt — 32 unit-norm direction tensors (one per decoder layer)
vectors/ranking.csv — per-layer AUC/margin
scripts/* — full reproducer (collect/capture/compute/steer/serve/sweep)
TASK.md RESEARCH.md PLAN.md RESULTS.md VERIFY.md — full report bundle

Caveats

Five positive traces is a small sample. The AUC=1.0 separation holds on the collected traces, but with n=5 vs 12 it has little statistical slack.
The direction conflates "agent-trace style" with "task-solving": the extracted contrast was between different LoRAs producing different traces, not the same model under different conditions. v2 (same-model contrast) addresses this confound.
α≥6 induces "I am done" early-bail behavior. α≤2 has no measurable effect.
α=4 sits at the over-steering boundary on log-summary: it can flip a single trial, but such flips do not replicate.
