Qwen3.5-4B Capability Vector v1 (cross-model contrast, 2026-05-14)
STATUS: null result in aggregate. This was the first attempt at a CAA-style additive "capability direction". Across ~80 docker runs of
terminal-bench-2 sweeps, the vector did not produce a statistically significant pass-rate lift. The discriminator reaches AUC=1.0 in the residual stream, but the direction encodes output style (a parse_fail ↔ no_cmd trade-off), not task-solving capability. See the capvec-v2-samemodel and abliterated repos for the follow-up experiments that confirmed this.
What this is
A residual-stream direction tensor for Qwen/Qwen3.5-4B, computed via
mean-difference between 5 SFT-successful agent traces and 12
base/cp600/DPO-failing traces on terminal-bench-2 sprint. Adapted from
NousResearch/llm-abliteration
and failspy's ortho cookbook
but inverted: we add the direction at inference instead of orthogonally
removing a refusal direction from weights.
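The mean-difference step described above can be sketched as follows. This is a minimal illustration, not the repo's `scripts/*` code: it assumes one activation vector per trace at a given layer (how traces are pooled into a single vector is not stated here), and the tensors are synthetic stand-ins with the same 5-vs-12 group sizes.

```python
import torch

def mean_difference_direction(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm difference between the mean positive and mean negative activation.

    pos_acts: (n_pos, d_model), neg_acts: (n_neg, d_model). One row per trace
    is an assumption made for this sketch.
    """
    diff = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
    return diff / diff.norm()

# Synthetic data only: 5 "successful" and 12 "failing" traces.
torch.manual_seed(0)
d_model = 16
offset = torch.zeros(d_model)
offset[0] = 3.0
pos = torch.randn(5, d_model) + offset
neg = torch.randn(12, d_model)

direction = mean_difference_direction(pos, neg)
pos_proj = pos @ direction  # per-trace projections onto the direction
neg_proj = neg @ direction
```

The per-trace projections `pos_proj` and `neg_proj` are the quantities a per-layer separation score would be computed from.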
Per-layer AUC ranking
Layers 12–22 all reach AUC=1.000 on per-trace projection separation.
Layer 22 was picked for the published dir.pt (max margin among 1.0-AUC layers).
| layer | AUC | margin |
|---|---|---|
| 22 | 1.000 | 1.95 |
| 19 | 1.000 | 1.44 |
| 26 | 0.98 | 2.65 |
See vectors/ranking.csv for all 32 layers.
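For readers who want to recompute a ranking of this kind, here is a small sketch of AUC over per-trace projections, computed as the fraction of (positive, negative) pairs that are correctly ordered. The margin definition used below (difference of mean projections) is an assumption, since the card does not define it, and the scores are synthetic.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (pos, neg) pairs with pos > neg; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def mean_margin(pos_scores, neg_scores):
    """Assumed margin: mean positive projection minus mean negative projection."""
    return sum(pos_scores) / len(pos_scores) - sum(neg_scores) / len(neg_scores)

# Synthetic projections: 5 positives, 12 negatives.
pos = [2.0, 2.5, 3.0, 1.8, 2.2]
neg = [0.1 * i for i in range(12)]        # 0.0 .. 1.1, all below every positive
auc_clean = auc(pos, neg)                 # perfectly separated

neg_one_inversion = neg[:-1] + [1.9]      # one negative now outranks one positive
auc_one_inversion = auc(pos, neg_one_inversion)
```

With 5 positives and 12 negatives there are 60 pairs, so a single misordered pair gives 59/60 ≈ 0.983.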
Behavioural results (terminal-bench-2)
| sweep | configuration | pass / N | Fisher's p vs base |
|---|---|---|---|
| α=4 sweep, log-summary | steered-L22-α4 | 1/3 | 0.21 |
| α-grid log-summary | α ∈ {2,4,6,8}, L26-α4, N=5 each | 0/20 | — |
| wide-task sweep (5 sprint tasks) | steered-L22-α4 | 1/5 | 1.0 (same as base) |
| multi-layer L13/16/19/22 | α=0.5, 1.0 | 0/9 | — |
| negative α (sub) | α=−2, α=−4 | 1/12 | 0.62 |
Net: null lift across all sweeps. The single early α=4 win on log-summary did not replicate.
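The p-values in the table are Fisher's exact tests against the base model. A dependency-free sketch of the two-sided test is shown below. The base counts are not listed in the table, so the example assumes base = 1/5 for the wide-task row, inferred from its "same as base" note.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. steered, base); columns are pass and fail.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x passes in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Steered 1/5 vs an assumed base of 1/5 (wide-task sweep).
p_wide = fisher_exact_two_sided(1, 4, 1, 4)
```

At sample sizes this small the test has very little power, so even a real effect of moderate size would be unlikely to reach significance.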
Quick use (still functional as a research artifact)
```python
import torch
from transformers import AutoTokenizer, AutoModelForImageTextToText
from huggingface_hub import hf_hub_download

tok = AutoTokenizer.from_pretrained('Qwen/Qwen3.5-4B')
model = AutoModelForImageTextToText.from_pretrained(
    'Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map={'': 0})
vec_path = hf_hub_download('AlexWortega/qwen3.5-4b-capability-vector-20260514', 'vectors/dir.pt')
vec = torch.load(vec_path, weights_only=False)
# Best layer = 22. Apply at inference via residual-stream hook.
```
See scripts/steer.py for the hook implementation.
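For orientation, additive steering through a forward hook can look roughly like the sketch below. It is not the contents of `scripts/steer.py`: the helper name `make_steering_hook` is hypothetical, the model is a toy `nn.Sequential` stand-in rather than Qwen3.5-4B, and the direction is random, because the layout of `dir.pt` and the module path to the decoder layers are not specified here.

```python
import torch
from torch import nn

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Build a forward hook that adds alpha * unit(direction) to a layer's output."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # Decoder layers often return a tuple whose first element is the
        # hidden state; plain modules return a bare tensor. Handle both.
        if isinstance(output, tuple):
            hidden = output[0]
            steered = hidden + alpha * unit.to(device=hidden.device, dtype=hidden.dtype)
            return (steered,) + tuple(output[1:])
        return output + alpha * unit.to(device=output.device, dtype=output.dtype)

    return hook

# Toy stand-in for one decoder layer followed by the rest of the network.
torch.manual_seed(0)
d_model = 8
toy = nn.Sequential(nn.Linear(d_model, d_model), nn.Identity())
direction = torch.randn(d_model)          # hypothetical direction
x = torch.randn(2, 3, d_model)            # (batch, seq, d_model)

with torch.no_grad():
    baseline = toy(x)
    handle = toy[0].register_forward_hook(make_steering_hook(direction, alpha=4.0))
    steered = toy(x)
    handle.remove()                       # always detach the hook afterwards
    restored = toy(x)

shift = steered - baseline                # equals alpha * unit at every position
```

On the real model, the same hook would be registered on the chosen decoder layer (layer 22 per the ranking above) and removed after generation so later calls run unsteered.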
Files
- vectors/dir.pt: 32 unit-norm direction tensors (one per decoder layer)
- vectors/ranking.csv: per-layer AUC/margin
- scripts/*: full reproducer (collect/capture/compute/steer/serve/sweep)
- TASK.md, RESEARCH.md, PLAN.md, RESULTS.md, VERIFY.md: full report bundle
Caveats
- Five positive traces is a small sample. AUC=1.0 at that n is a genuine separation on this data, but it rests on very few points.
- The direction conflates "agent-trace style" with "task-solving": the extracted contrast was between different LoRAs producing different traces, not the same model under different conditions. v2 (same-model contrast) addresses this confound.
- α≥6 induces "I am done" early-bail behavior. α≤2 has no measurable effect.
- α=4 sits at the over-steering boundary on log-summary, capable of one-trial flips that don't replicate.