PERSUADE DPO LoRA Adapter (Checkpoint 1000)

A DPO LoRA adapter trained on top of the SFT-full Qwen3-4B model for debate persuasion, part of the PERSUADE framework (A Debate Arena Framework for Benchmarking, Optimizing, and Ethically Auditing Persuasion in Language Models).

Quick Summary

| Field | Value |
|---|---|
| Base model | Mishrakshitij/spar-qwen3-4b-sft-full (Qwen3-4B-Instruct, fully fine-tuned with SFT on arena demonstrations) |
| Method | DPO (Direct Preference Optimization) with beta=0.2 |
| LoRA config | r=16, alpha=32, dropout=0.05, all linear layers |
| Preference pairs | 81,192 pairs from the debate arena |
| Checkpoint | Step 1000 / 10149 |
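The LoRA setup above can be expressed as a PEFT configuration. A minimal sketch is shown below; the `target_modules="all-linear"` value and `task_type` are assumptions inferred from the card's "all linear layers" description, not confirmed settings:

```python
from peft import LoraConfig

# Sketch of the adapter config described in the table above.
# "all-linear" targeting and CAUSAL_LM task type are assumptions.
lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # scaling factor
    lora_dropout=0.05,
    target_modules="all-linear",   # "all linear layers" per the card
    task_type="CAUSAL_LM",
)
```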

How the Preference Data Was Created

The preference pairs are derived from a large-scale debate arena where 10 LLMs (ranging from 4B to 235B parameters) debated each other on 259 diverse topics.

Arena Setup

  • Models: Kimi-K2, DeepSeek-R1, Qwen3-235B, Qwen3-235B-Instruct, Qwen3-32B, Qwen3-14B, Qwen3-4B, DeepSeek-V3.1, GLM-4-Plus-0711, LLaMA-3.3-70B
  • Topics: 259 debate topics spanning politics, ethics, technology, science, society, etc.
  • Format: 10-turn structured debates (5 turns per side, alternating FOR/AGAINST)
  • Judge: GPT-5.2 evaluated each debate and declared a winner based on argument quality, evidence, and persuasiveness
  • Total debates: 13,726 (10,487 train / 3,239 test split by topic — 200 train topics, 59 test topics)
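Because the train/test split is by topic, every debate on a given topic lands in the same split, so test topics are never seen during training. A plain-Python sketch of that split (the record fields and helper name are hypothetical; the actual pipeline is not part of this card):

```python
def split_by_topic(debates, train_topics):
    """Split debates so each topic appears in exactly one split.

    debates: list of dicts with a 'topic' key.
    train_topics: set of topic strings assigned to the training split.
    """
    train = [d for d in debates if d["topic"] in train_topics]
    test = [d for d in debates if d["topic"] not in train_topics]
    return train, test

# Toy example: two topics, three debates.
debates = [
    {"topic": "AI regulation", "id": 1},
    {"topic": "Space mining", "id": 2},
    {"topic": "AI regulation", "id": 3},
]
train, test = split_by_topic(debates, train_topics={"AI regulation"})
# train keeps ids 1 and 3; test keeps id 2
```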

Preference Pair Construction

For each debate involving the chosen model (Kimi-K2, the top-performing arena model), preference pairs are constructed at the turn level:

  1. Identify winning/losing sides: For each debate, the judge's verdict determines which side (FOR or AGAINST) won
  2. Index by (topic, position, turn): Each argument is keyed by its debate topic, the position it argued (FOR/AGAINST), and the turn number
  3. Pair winners with losers: For the same (topic, position, turn) key, arguments from the winning side become chosen and arguments from the losing side become rejected
  4. Cross-product pairing: All winning arguments are paired with all losing arguments for the same key (excluding identical texts), creating a dense set of preference pairs
  5. Think-tag stripping: Internal chain-of-thought (<think>...</think>) tags are stripped from arguments before pairing, so the model learns from clean debate text only

This produces 81,192 preference pairs from the training split, where each pair shares the same debate context (topic, position, turn number) but differs in argument quality as determined by the judge's outcome.
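Steps 1-5 above can be sketched in plain Python. The record fields and helper name below are hypothetical illustrations of the described scheme, not the actual pipeline code:

```python
import re
from collections import defaultdict

# Step 5: strip internal chain-of-thought before pairing.
THINK_TAG = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def build_pairs(arguments):
    """Build (chosen, rejected) pairs per the cross-product scheme above.

    arguments: list of dicts with keys
      topic, position ('FOR'/'AGAINST'), turn, text, side_won (bool).
    """
    winners, losers = defaultdict(list), defaultdict(list)
    for arg in arguments:
        text = THINK_TAG.sub("", arg["text"]).strip()
        key = (arg["topic"], arg["position"], arg["turn"])   # step 2: index
        (winners if arg["side_won"] else losers)[key].append(text)  # step 1
    pairs = []
    for key, wins in winners.items():        # steps 3-4: cross-product pairing
        for won in wins:
            for lost in losers.get(key, []):
                if won != lost:              # exclude identical texts
                    pairs.append({"chosen": won, "rejected": lost})
    return pairs

# Toy example: two debates on the same topic/position/turn, one side won.
args = [
    {"topic": "T", "position": "FOR", "turn": 1, "side_won": True,
     "text": "<think>plan</think>Strong argument."},
    {"topic": "T", "position": "FOR", "turn": 1, "side_won": False,
     "text": "Weak argument."},
]
pairs = build_pairs(args)
# -> [{"chosen": "Strong argument.", "rejected": "Weak argument."}]
```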

Why This Approach?

  • Outcome-grounded: Preferences come from actual debate outcomes judged by GPT-5.2, not synthetic ratings
  • Turn-level granularity: Rather than comparing entire debates, we compare individual arguments at the same point in a debate, giving the model fine-grained signal about what makes a specific argument more persuasive
  • Cross-model diversity: Since the arena includes 10 diverse models, the rejected arguments represent a wide range of failure modes (weak evidence, poor engagement, repetition, etc.)
  • Chosen model filter: Using Kimi-K2 (the arena's top performer with 1,666 wins) as the chosen model ensures the preferred arguments represent high-quality persuasive writing

Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Beta (DPO) | 0.2 |
| Batch size | 8 |
| Epochs | 1 |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Precision | AMP (mixed precision) |
| PPL upper bound | 10.0 (safety stop) |
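For reference, the DPO objective with beta=0.2 can be written out in a few lines of plain Python. The per-pair sequence log-probabilities are assumed inputs (in training they come from the policy and the frozen SFT reference model):

```python
import math

BETA = 0.2  # matches the hyperparameter table above

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    """DPO loss for one preference pair: -log sigmoid(beta * reward margin)."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-BETA * margin)))

# Before any training, policy == reference, so the margin is 0 and the
# loss is ln 2 ≈ 0.693 for every pair, the standard DPO starting loss.
loss0 = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The reported pair accuracy is the fraction of pairs whose implicit reward margin is positive, i.e. the policy prefers the chosen argument more strongly than the reference does.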

Training Progress at Checkpoint 1000

| Metric | Value |
|---|---|
| Loss | ~0.01-0.10 (down from initial 0.693) |
| Accuracy | ~87-100% |
| Perplexity | 4.70 (stable; initial: 4.68) |

How to Use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the SFT-full base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Mishrakshitij/spar-qwen3-4b-sft-full",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mishrakshitij/spar-qwen3-4b-sft-full")

# Apply the DPO LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mishrakshitij/spar-dpo-lora-adapter-ckpt1000")

# Optional: merge the adapter weights into the base model for faster inference
model = model.merge_and_unload()
```

Framework versions

  • PEFT 0.17.1