# Qwen3.5-9B-Antirep

A merged Qwen3.5-9B model with an anti-repetition DPO (Direct Preference Optimization) adapter baked in. This model reduces the tendency of Qwen3.5-9B to fall into repetitive generation loops, particularly during long-form outputs.

Inspired by ConicCat/Qwen3.5-Antirep-27B.

## Training Details

- Base model: Qwen/Qwen3.5-9B
- Method: QLoRA DPO training, then merged into the base model
- Training data: 481 on-policy preference pairs (chosen = clean completions with thinking traces; rejected = degenerate repetitive loops)
- Categories: general (181), reasoning (106), code (93), math (73), safety (28)
- LoRA config: r=32, alpha=16, RSLoRA, targeting all attention and MLP modules
- DPO config: beta=0.1, sigmoid loss
- Training: 3 epochs, 363 steps, ~48 minutes on a single RTX 3090
- Sequence length: 768 tokens (max_prompt=256, max_completion=512)
- Optimizer: paged_adamw_8bit, lr=5e-6, cosine schedule, warmup_ratio=0.1
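The hyperparameters above map onto a TRL + PEFT setup roughly as follows. This is a sketch of the configuration only, not the actual training script: dataset loading, 4-bit quantization, the `DPOTrainer` call, and the final merge step are all omitted, and the exact argument names may differ slightly across TRL versions.

```python
# Sketch of the LoRA + DPO configuration described above (TRL + PEFT).
from peft import LoraConfig
from trl import DPOConfig

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    use_rslora=True,  # rank-stabilized LoRA scaling
    target_modules=[  # all attention + MLP projections
        "in_proj_qkv", "in_proj_z", "in_proj_a", "in_proj_b", "out_proj",
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.1,
    loss_type="sigmoid",
    max_prompt_length=256,
    max_completion_length=512,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
)
```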

## Training Metrics

| Epoch | Avg Loss | Reward Margins | Accuracy |
|-------|----------|----------------|----------|
| 1     | ~0.065   | ~5             | 100%     |
| 2     | ~0.001   | ~10            | 100%     |
| 3     | ~0.002   | ~8-13          | 100%     |
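As a sanity check on these numbers: under the sigmoid DPO loss, the per-example loss is -log σ(reward margin), so large margins imply losses near zero. The table reports per-epoch averages over varying margins, so the values won't match a single-margin evaluation exactly, but they are in the right ballpark:

```python
import math

def dpo_sigmoid_loss(reward_margin: float) -> float:
    """Per-example sigmoid DPO loss: -log(sigmoid(chosen_reward - rejected_reward))."""
    return -math.log(1.0 / (1.0 + math.exp(-reward_margin)))

print(dpo_sigmoid_loss(5.0))   # margin ~5  -> loss ~0.0067
print(dpo_sigmoid_loss(10.0))  # margin ~10 -> loss ~4.5e-05
```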

## Evaluation

Tested on prompts that previously triggered repetitive outputs from the base model:

| Model           | Repetition Rate | Avg Repetition Score |
|-----------------|-----------------|----------------------|
| Base Qwen3.5-9B | 10%             | 0.033                |
| This model      | 0%              | 0.010                |

The adapter reduces sub-threshold repetition scores by roughly 50-70% and eliminated all hard repetition loops on this evaluation set.
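The exact repetition metric is not specified in this card. One common way to score repetition, shown here purely as an illustrative sketch (not the actual evaluation code), is the fraction of duplicated word n-grams in the output:

```python
def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of duplicated word n-grams in `text` (0.0 = no repetition)."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

print(repetition_score("the answer is 42 because the problem states it"))  # varied text -> 0.0
print(repetition_score("the answer is " * 20))                             # hard loop -> near 1.0
```

A "repetition rate" can then be defined as the share of outputs whose score exceeds a chosen threshold, with sub-threshold scores averaged separately.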

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ToastyPigeon/Qwen3.5-9B-Antirep",
    torch_dtype=torch.bfloat16,  # weights are stored in bf16
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ToastyPigeon/Qwen3.5-9B-Antirep")
```

## Architecture

Qwen3.5-9B is a hybrid architecture with 32 layers:

- 24 GDN (Gated Delta Net / linear attention) layers
- 8 standard full attention layers (every 4th layer)
- 9B parameters, 248K vocabulary, bf16

The LoRA adapter targeted all attention projections (GDN: in_proj_qkv, in_proj_z, in_proj_a, in_proj_b, out_proj; standard: q/k/v/o_proj) and all MLP projections (gate/up/down_proj).
