PERSUADE DPO LoRA Adapter (Checkpoint 1000)
A DPO LoRA adapter trained on top of the SFT-full Qwen3-4B model for debate persuasion, part of the PERSUADE framework (A Debate Arena Framework for Benchmarking, Optimizing, and Ethically Auditing Persuasion in Language Models).
Quick Summary
| Field | Value |
|---|---|
| Base model | Mishrakshitij/spar-qwen3-4b-sft-full (Qwen3-4B-Instruct, fully fine-tuned with SFT on arena demonstrations) |
| Method | DPO (Direct Preference Optimization) with beta=0.2 |
| LoRA config | r=16, alpha=32, dropout=0.05, all linear layers |
| Preference pairs | 81,192 pairs from debate arena |
| Checkpoint | Step 1000 / 10149 |
How the Preference Data Was Created
The preference pairs are derived from a large-scale debate arena where 10 LLMs (ranging from 4B to 235B parameters) debated each other on 259 diverse topics.
Arena Setup
- Models: Kimi-K2, DeepSeek-R1, Qwen3-235B, Qwen3-235B-Instruct, Qwen3-32B, Qwen3-14B, Qwen3-4B, DeepSeek-V3.1, GLM-4-Plus-0711, LLaMA-3.3-70B
- Topics: 259 debate topics spanning politics, ethics, technology, science, society, etc.
- Format: 10-turn structured debates (5 turns per side, alternating FOR/AGAINST)
- Judge: GPT-5.2 evaluated each debate and declared a winner based on argument quality, evidence, and persuasiveness
- Total debates: 13,726 (10,487 train / 3,239 test split by topic — 200 train topics, 59 test topics)
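The topic-held-out split described above can be sketched in a few lines. This is an illustrative sketch, not the framework's actual code; the `topic` field name and the record schema are assumptions.

```python
import random

def split_by_topic(debates, n_test_topics, seed=0):
    """Split debates into train/test so no topic appears on both sides."""
    topics = sorted({d["topic"] for d in debates})
    rng = random.Random(seed)
    test_topics = set(rng.sample(topics, n_test_topics))
    train = [d for d in debates if d["topic"] not in test_topics]
    test = [d for d in debates if d["topic"] in test_topics]
    return train, test

# Toy example: 4 topics, 2 debates each, 1 topic held out for test
debates = [{"topic": t, "id": i} for i, t in enumerate(["a", "b", "c", "d"] * 2)]
train, test = split_by_topic(debates, n_test_topics=1)
```

Splitting by topic (rather than by debate) ensures test-set topics are entirely unseen during training, which is why 259 topics yield the 200/59 partition rather than a per-debate shuffle.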
Preference Pair Construction
For each debate involving the chosen model (Kimi-K2, the top-performing arena model), preference pairs are constructed at the turn level:
- Identify winning/losing sides: For each debate, the judge's verdict determines which side (FOR or AGAINST) won
- Index by (topic, position, turn): Each argument is keyed by its debate topic, the position it argued (FOR/AGAINST), and the turn number
- Pair winners with losers: For the same (topic, position, turn) key, arguments from the winning side become `chosen` and arguments from the losing side become `rejected`
- Cross-product pairing: All winning arguments are paired with all losing arguments for the same key (excluding identical texts), creating a dense set of preference pairs
- Think-tag stripping: Internal chain-of-thought (`<think>...</think>`) tags are stripped from arguments before pairing, so the model learns from clean debate text only
This produces 81,192 preference pairs from the training split, where each pair shares the same debate context (topic, position, turn number) but differs in argument quality as determined by the judge's outcome.
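The construction steps above can be sketched as follows. Field names (`side_won`, `text`, etc.) are illustrative assumptions, not the framework's actual schema.

```python
import re
from itertools import product

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text):
    """Remove internal chain-of-thought tags before pairing."""
    return THINK_RE.sub("", text).strip()

def build_pairs(arguments):
    """Build (chosen, rejected) pairs keyed by (topic, position, turn).

    arguments: dicts with topic, position, turn, side_won (bool), text.
    """
    by_key = {}
    for a in arguments:
        key = (a["topic"], a["position"], a["turn"])
        bucket = by_key.setdefault(key, {"won": [], "lost": []})
        bucket["won" if a["side_won"] else "lost"].append(strip_think(a["text"]))
    pairs = []
    for groups in by_key.values():
        # Cross-product: every winning argument vs every losing argument
        for chosen, rejected in product(groups["won"], groups["lost"]):
            if chosen != rejected:  # exclude identical texts
                pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs
```

Because pairing is a cross-product within each key, the number of pairs grows multiplicatively with the number of arena debates per topic, which is how 13,726 debates yield 81,192 pairs.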
Why This Approach?
- Outcome-grounded: Preferences come from actual debate outcomes judged by GPT-5.2, not synthetic ratings
- Turn-level granularity: Rather than comparing entire debates, we compare individual arguments at the same point in a debate, giving the model fine-grained signal about what makes a specific argument more persuasive
- Cross-model diversity: Since the arena includes 10 diverse models, the rejected arguments represent a wide range of failure modes (weak evidence, poor engagement, repetition, etc.)
- Chosen model filter: Using Kimi-K2 (the arena's top performer with 1,666 wins) as the chosen model ensures the preferred arguments represent high-quality persuasive writing
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Beta (DPO) | 0.2 |
| Batch size | 8 |
| Epochs | 1 |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Precision | AMP (mixed precision) |
| PPL upper bound | 10.0 (safety stop) |
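For reference, the per-pair DPO objective with beta=0.2 is -log sigmoid(beta * ((log pi(chosen) - log pi(rejected)) - (log ref(chosen) - log ref(rejected)))). A minimal pure-Python sketch, with log-probabilities as placeholders for actual model outputs:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.2):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# At initialization the policy equals the reference, so logits = 0 and the
# loss is ln 2 ≈ 0.693 — the initial loss value a DPO run starts from.
```

The loss falls toward zero as the policy's chosen/rejected margin grows relative to the reference's.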
Training Progress at Checkpoint 1000
| Metric | Value |
|---|---|
| Loss | ~0.01-0.10 (down from 0.693 initial) |
| Accuracy | ~87-100% |
| Perplexity | 4.70 (stable, initial: 4.68) |
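The PPL safety stop can be sketched as checking exp of the mean per-token negative log-likelihood against the 10.0 bound. This is an assumed implementation, not the framework's actual monitoring code:

```python
import math

PPL_UPPER_BOUND = 10.0

def perplexity(mean_nll):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

def should_stop(mean_nll, bound=PPL_UPPER_BOUND):
    """Trigger the safety stop if perplexity drifts above the bound."""
    return perplexity(mean_nll) > bound

# A perplexity of 4.70 corresponds to mean NLL = ln(4.70) ≈ 1.55,
# comfortably below the bound, so training continues at this checkpoint.
```

The bound guards against the DPO objective degrading fluency: preference optimization can trade off language-modeling quality, and a runaway perplexity is an early signal of that.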
How to Use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the SFT-full base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Mishrakshitij/spar-qwen3-4b-sft-full",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mishrakshitij/spar-qwen3-4b-sft-full")

# Apply DPO LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mishrakshitij/spar-dpo-lora-adapter-ckpt1000")

# Optional: merge for faster inference
model = model.merge_and_unload()
```
Framework versions
- PEFT 0.17.1