PERSUADE DPO LoRA Adapter (Checkpoint 1000)

A DPO LoRA adapter trained on top of the SFT-full Qwen3-4B model for debate persuasion, part of the PERSUADE framework (A Debate Arena Framework for Benchmarking, Optimizing, and Ethically Auditing Persuasion in Language Models).

Quick Summary

| Field | Value |
|---|---|
| Base model | Mishrakshitij/spar-qwen3-4b-sft-full (Qwen3-4B-Instruct, fully fine-tuned with SFT on arena demonstrations) |
| Method | DPO (Direct Preference Optimization) with beta=0.2 |
| LoRA config | r=16, alpha=32, dropout=0.05, all linear layers |
| Preference pairs | 81,192 pairs from the debate arena |
| Checkpoint | Step 1000 / 10149 |
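The LoRA setup above can be expressed as a PEFT configuration. A minimal sketch is shown below; the `target_modules="all-linear"` value and `task_type` are assumptions inferred from the card's "all linear layers" description, not confirmed settings:

```python
from peft import LoraConfig

# Sketch of the adapter config described in the table above.
# "all-linear" targeting and CAUSAL_LM task type are assumptions.
lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # scaling factor
    lora_dropout=0.05,
    target_modules="all-linear",   # "all linear layers" per the card
    task_type="CAUSAL_LM",
)
```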

How the Preference Data Was Created

The preference pairs are derived from a large-scale debate arena where 10 LLMs (ranging from 4B to 235B parameters) debated each other on 259 diverse topics.

Arena Setup

  • Models: Kimi-K2, DeepSeek-R1, Qwen3-235B, Qwen3-235B-Instruct, Qwen3-32B, Qwen3-14B, Qwen3-4B, DeepSeek-V3.1, GLM-4-Plus-0711, LLaMA-3.3-70B
  • Topics: 259 debate topics spanning politics, ethics, technology, science, society, etc.
  • Format: 10-turn structured debates (5 turns per side, alternating FOR/AGAINST)
  • Judge: GPT-5.2 evaluated each debate and declared a winner based on argument quality, evidence, and persuasiveness
  • Total debates: 13,726 (10,487 train / 3,239 test split by topic — 200 train topics, 59 test topics)
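Because the train/test split is by topic, every debate on a given topic lands in the same split, so test topics are never seen during training. A plain-Python sketch of that split (the record fields and helper name are hypothetical; the actual pipeline is not part of this card):

```python
def split_by_topic(debates, train_topics):
    """Split debates so each topic appears in exactly one split.

    debates: list of dicts with a 'topic' key.
    train_topics: set of topic strings assigned to the training split.
    """
    train = [d for d in debates if d["topic"] in train_topics]
    test = [d for d in debates if d["topic"] not in train_topics]
    return train, test

# Toy example: two topics, three debates.
debates = [
    {"topic": "AI regulation", "id": 1},
    {"topic": "Space mining", "id": 2},
    {"topic": "AI regulation", "id": 3},
]
train, test = split_by_topic(debates, train_topics={"AI regulation"})
# train keeps ids 1 and 3; test keeps id 2
```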

Preference Pair Construction

For each debate involving the chosen model (Kimi-K2, the top-performing arena model), preference pairs are constructed at the turn level:

  1. Identify winning/losing sides: For each debate, the judge's verdict determines which side (FOR or AGAINST) won
  2. Index by (topic, position, turn): Each argument is keyed by its debate topic, the position it argued (FOR/AGAINST), and the turn number
  3. Pair winners with losers: For the same (topic, position, turn) key, arguments from the winning side become chosen and arguments from the losing side become rejected
  4. Cross-product pairing: All winning arguments are paired with all losing arguments for the same key (excluding identical texts), creating a dense set of preference pairs
  5. Think-tag stripping: Internal chain-of-thought (<think>...</think>) tags are stripped from arguments before pairing, so the model learns from clean debate text only

This produces 81,192 preference pairs from the training split, where each pair shares the same debate context (topic, position, turn number) but differs in argument quality as determined by the judge's outcome.
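Steps 1-5 above can be sketched in plain Python. The record fields and helper name below are hypothetical illustrations of the described scheme, not the actual pipeline code:

```python
import re
from collections import defaultdict

# Step 5: strip internal chain-of-thought before pairing.
THINK_TAG = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def build_pairs(arguments):
    """Build (chosen, rejected) pairs per the cross-product scheme above.

    arguments: list of dicts with keys
      topic, position ('FOR'/'AGAINST'), turn, text, side_won (bool).
    """
    winners, losers = defaultdict(list), defaultdict(list)
    for arg in arguments:
        text = THINK_TAG.sub("", arg["text"]).strip()
        key = (arg["topic"], arg["position"], arg["turn"])   # step 2: index
        (winners if arg["side_won"] else losers)[key].append(text)  # step 1
    pairs = []
    for key, wins in winners.items():        # steps 3-4: cross-product pairing
        for won in wins:
            for lost in losers.get(key, []):
                if won != lost:              # exclude identical texts
                    pairs.append({"chosen": won, "rejected": lost})
    return pairs

# Toy example: two debates on the same topic/position/turn, one side won.
args = [
    {"topic": "T", "position": "FOR", "turn": 1, "side_won": True,
     "text": "<think>plan</think>Strong argument."},
    {"topic": "T", "position": "FOR", "turn": 1, "side_won": False,
     "text": "Weak argument."},
]
pairs = build_pairs(args)
# -> [{"chosen": "Strong argument.", "rejected": "Weak argument."}]
```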

Why This Approach?

  • Outcome-grounded: Preferences come from actual debate outcomes judged by GPT-5.2, not synthetic ratings
  • Turn-level granularity: Rather than comparing entire debates, we compare individual arguments at the same point in a debate, giving the model fine-grained signal about what makes a specific argument more persuasive
  • Cross-model diversity: Since the arena includes 10 diverse models, the rejected arguments represent a wide range of failure modes (weak evidence, poor engagement, repetition, etc.)
  • Chosen model filter: Using Kimi-K2 (the arena's top performer with 1,666 wins) as the chosen model ensures the preferred arguments represent high-quality persuasive writing

Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Beta (DPO) | 0.2 |
| Batch size | 8 |
| Epochs | 1 |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Precision | AMP (mixed precision) |
| PPL upper bound | 10.0 (safety stop) |
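For reference, the DPO objective with beta=0.2 can be written out in a few lines of plain Python. The per-pair sequence log-probabilities are assumed inputs (in training they come from the policy and the frozen SFT reference model):

```python
import math

BETA = 0.2  # matches the hyperparameter table above

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    """DPO loss for one preference pair: -log sigmoid(beta * reward margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-BETA * margin)))

# Before any training, policy == reference, so the margin is 0 and the
# loss is ln 2 ≈ 0.693 for every pair, the standard DPO starting loss.
loss0 = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The reported pair accuracy is the fraction of pairs whose implicit reward margin is positive, i.e. the policy prefers the chosen argument more strongly than the reference does.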

Training Progress at Checkpoint 1000

| Metric | Value |
|---|---|
| Loss | ~0.01-0.10 (down from initial 0.693) |
| Accuracy | ~87-100% |
| Perplexity | 4.70 (stable; initial: 4.68) |

How to Use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the SFT-full base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Mishrakshitij/spar-qwen3-4b-sft-full",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mishrakshitij/spar-qwen3-4b-sft-full")

# Apply the DPO LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mishrakshitij/spar-dpo-lora-adapter-ckpt1000")

# Optional: merge the adapter weights into the base model for faster inference
model = model.merge_and_unload()
```

Framework versions

  • PEFT 0.17.1