# Qwen3.5-9B-Antirep

A merged Qwen3.5-9B model with an anti-repetition DPO (Direct Preference Optimization) adapter baked in. This model reduces the tendency of Qwen3.5-9B to fall into repetitive generation loops, particularly during long-form outputs.

Inspired by ConicCat/Qwen3.5-Antirep-27B.

## Training Details

- Base model: Qwen/Qwen3.5-9B
- Method: QLoRA DPO training, then merged into the base model
- Training data: 481 on-policy preference pairs (chosen = clean completions with thinking traces; rejected = degenerate repetitive loops)
- Categories: general (181), reasoning (106), code (93), math (73), safety (28)
- LoRA config: r=32, alpha=16, RSLoRA, targeting all attention and MLP modules
- DPO config: beta=0.1, sigmoid loss
- Training: 3 epochs, 363 steps, ~48 minutes on a single RTX 3090
- Sequence length: 768 tokens (max_prompt=256, max_completion=512)
- Optimizer: paged_adamw_8bit, lr=5e-6, cosine schedule, warmup_ratio=0.1
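The hyperparameters above map onto a TRL + PEFT setup roughly as follows. This is a sketch of the configuration only, not the actual training script: dataset loading, 4-bit quantization, the `DPOTrainer` call, and the final merge step are all omitted, and the exact argument names may differ slightly across TRL versions.

```python
# Sketch of the LoRA + DPO configuration described above (TRL + PEFT).
from peft import LoraConfig
from trl import DPOConfig

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    use_rslora=True,  # rank-stabilized LoRA scaling
    target_modules=[  # all attention + MLP projections
        "in_proj_qkv", "in_proj_z", "in_proj_a", "in_proj_b", "out_proj",
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.1,
    loss_type="sigmoid",
    max_prompt_length=256,
    max_completion_length=512,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
)
```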

## Training Metrics

| Epoch | Avg Loss | Reward Margins | Accuracy |
|-------|----------|----------------|----------|
| 1     | ~0.065   | ~5             | 100%     |
| 2     | ~0.001   | ~10            | 100%     |
| 3     | ~0.002   | ~8-13          | 100%     |
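As a sanity check on these numbers: under the sigmoid DPO loss, the per-example loss is -log σ(reward margin), so large margins imply losses near zero. The table reports per-epoch averages over varying margins, so the values won't match a single-margin evaluation exactly, but they are in the right ballpark:

```python
import math

def dpo_sigmoid_loss(reward_margin: float) -> float:
    """Per-example sigmoid DPO loss: -log(sigmoid(chosen_reward - rejected_reward))."""
    return -math.log(1.0 / (1.0 + math.exp(-reward_margin)))

print(dpo_sigmoid_loss(5.0))   # margin ~5  -> loss ~0.0067
print(dpo_sigmoid_loss(10.0))  # margin ~10 -> loss ~4.5e-05
```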

## Evaluation

Tested on prompts that previously triggered repetitive outputs from the base model:

| Model           | Repetition Rate | Avg Repetition Score |
|-----------------|-----------------|----------------------|
| Base Qwen3.5-9B | 10%             | 0.033                |
| This model      | 0%              | 0.010                |

The adapter reduces sub-threshold repetition scores by roughly 50-70% and eliminated all hard repetition loops on this evaluation set.
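The exact repetition metric is not specified in this card. One common way to score repetition, shown here purely as an illustrative sketch (not the actual evaluation code), is the fraction of duplicated word n-grams in the output:

```python
def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of duplicated word n-grams in `text` (0.0 = no repetition)."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

print(repetition_score("the answer is 42 because the problem states it"))  # varied text -> 0.0
print(repetition_score("the answer is " * 20))                             # hard loop -> near 1.0
```

A "repetition rate" can then be defined as the share of outputs whose score exceeds a chosen threshold, with sub-threshold scores averaged separately.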

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ToastyPigeon/Qwen3.5-9B-Antirep",
    torch_dtype=torch.bfloat16,  # weights are stored in bf16
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ToastyPigeon/Qwen3.5-9B-Antirep")
```

## Architecture

Qwen3.5-9B is a hybrid architecture with 32 layers:

- 24 GDN (Gated Delta Net / linear attention) layers
- 8 standard full attention layers (every 4th layer)
- 9B parameters, 248K vocabulary, bf16

The LoRA adapter targeted all attention projections (GDN: in_proj_qkv, in_proj_z, in_proj_a, in_proj_b, out_proj; standard: q/k/v/o_proj) and all MLP projections (gate/up/down_proj).
