Qwen3.5-9B Medical Triage — Stage 3 DPO (v4)

Emergency department triage model fine-tuned on Qwen3.5-9B via a 3-stage pipeline: Stage 1 (general medical SFT) → Stage 2 (ED intake SOAP → ESI decision SFT) → Stage 3 (DPO alignment to reduce over-triage, this model).

Quantized to Q4_K_M GGUF for on-device inference.


Model Description

Given an ED SOAP intake note, the model outputs a structured triage decision:

  • ESI level (1–5) with justification
  • Key clinical findings
  • Time-to-provider target
  • Immediate interventions required

ESI Scale: 1 = Immediate life threat · 2 = Emergent high-risk · 3 = Urgent stable · 4 = Less urgent · 5 = Non-urgent


Training Pipeline

Stage  Method           Objective
1      SFT (LoRA r=16)  General medical knowledge (PubMed, clinical guidelines)
2      SFT (LoRA r=16)  SOAP note → structured ESI triage decision
3      DPO (LoRA r=8)   Reduce over-triage · preserve ESI 1/2 high-risk recall

Stage 3 DPO Details

  • Base: Stage 2 LoRA checkpoint (vadimbelsky/qwen3.5-medical-ft-stage2)
  • Dataset: dpo_dataset_v4.jsonl — 5,413 raw pairs → 7,789 weighted pairs
  • Loss: Combined apo_down × 0.3 + sft × 1.0 (MPO-style)
  • Beta: 0.5 · LR: 5e-5 · Epochs: 0.1 (47 steps)
  • Batch: 2 × 8 gradient accumulation = effective 16
  • ESI label prepending: All chosen/rejected completions prefixed with explicit ESI label (e.g. ESI 2 — Emergent (high risk)\n\n...) to anchor preference signal at token position 0
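The label-prepending step above can be sketched as follows. This is a hypothetical illustration, not the card author's actual preprocessing code: the `ESI_LABELS` names are assumed from the ESI scale in the Model Description, and the `label_pair` field names (`chosen_esi`, `rejected_esi`, etc.) are invented for the sketch.

```python
# Hypothetical sketch of ESI label prepending: every chosen/rejected
# completion gets an explicit "ESI <n> — <name>" prefix so the preference
# signal diverges at token position 0. Label names are assumptions.
ESI_LABELS = {
    1: "Immediate (life threat)",
    2: "Emergent (high risk)",
    3: "Urgent (stable)",
    4: "Less urgent",
    5: "Non-urgent",
}

def prepend_esi_label(esi_level: int, completion: str) -> str:
    """Anchor the preference signal by putting the ESI label first."""
    return f"ESI {esi_level} — {ESI_LABELS[esi_level]}\n\n{completion}"

def label_pair(pair: dict) -> dict:
    """Apply the prefix to both sides of a DPO preference pair."""
    return {
        "prompt": pair["prompt"],
        "chosen": prepend_esi_label(pair["chosen_esi"], pair["chosen"]),
        "rejected": prepend_esi_label(pair["rejected_esi"], pair["rejected"]),
    }
```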

Dataset Sources (v4)

Source   Description                                           Raw pairs  Weight  Weighted
A        Anti-overtriage synthetic (ESI 3→1/2 rejected)            2,388      ×1     2,388
B        Anti-overtriage synthetic (ESI 4/5→1/2 rejected)          1,500      ×1     1,500
C        Edge cases (synthetic boundary scenarios)                    39      ×1        39
D        ESI 1/2 anchor pairs (high-risk recall preservation)        890      ×3     2,670
E-over   ESI 3 bidirectional — anti-overtriage                       297      ×2       594
E-under  ESI 3 bidirectional — anti-undertriage                      299      ×2       598
Total                                                              5,413             7,789
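The raw-to-weighted expansion is simple duplication by source weight; a minimal sketch, with the weights inferred from the table (D ×3, E ×2, everything else ×1):

```python
# Sketch of how 5,413 raw pairs become 7,789 weighted pairs: each raw
# pair from a source is repeated `weight` times in the training file.
SOURCE_WEIGHTS = {"A": 1, "B": 1, "C": 1, "D": 3, "E-over": 2, "E-under": 2}
RAW_COUNTS = {"A": 2388, "B": 1500, "C": 39, "D": 890, "E-over": 297, "E-under": 299}

def weighted_total(raw_counts: dict, weights: dict) -> int:
    """Total pairs after per-source duplication."""
    return sum(n * weights[src] for src, n in raw_counts.items())
```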

Evaluation Results

Evaluated on the MIMIC-IV-Ext Triage Instruction Corpus (MIETIC) — 36 human-expert-validated RETAIN cases.

v4 vs Previous Stages

Metric                      Stage 2 (SFT)  v1 DPO  v2 DPO  v3 DPO  v4 DPO  Target
Accuracy                    ~68%           55.6%   50.0%   27.8%   75.0%   >82%
Over-triage rate            ~22%           22.2%   30.6%   0%      13.9%   <10%
Under-triage rate           ~8%            36.1%   41.7%   72.2%   11.1%   <6%
High-risk recall (ESI 1+2)  ~84%           76%     64%     40%     92%     100%
ESI 3 accuracy              ~45%           ~40%    ~30%    ~0%     60%     >65%

v4 Detailed Results (MIETIC, n=36)

Samples evaluated   : 36
ESI level parsed    : 36 / 36
Correct             : 27
Accuracy            : 75.0%
Under-triage rate   : 11.1% (4 cases)
Over-triage rate    : 13.9% (5 cases)
High-risk recall    : 92.0% (ESI 1+2, n=25)

Per-ESI Accuracy:

ESI Level  N   Correct  Accuracy
ESI 1      14  12       85.7%
ESI 2      11  9        81.8%
ESI 3      5   3        60.0%
ESI 4      4   2        50.0%
ESI 5      2   1        50.0%

Confusion Matrix (rows = ground truth, cols = predicted):

GT \ Pred  ESI 1  ESI 2  ESI 3  ESI 4  ESI 5
ESI 1         12      2      0      0      0
ESI 2          0      9      2      0      0
ESI 3          0      2      3      0      0
ESI 4          0      0      2      2      0
ESI 5          0      0      0      1      1

All remaining errors are ±1 ESI boundary confusions — no catastrophic mis-triage.
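The headline metrics can be recomputed directly from the confusion matrix above; a small sketch (note that with ESI, a *higher* predicted number than ground truth means under-triage, i.e. the patient was rated less urgent than they are):

```python
# Recompute v4 metrics from the confusion matrix (rows = ground truth
# ESI 1–5, cols = predicted ESI 1–5), as reported in the card.
CM = [
    [12, 2, 0, 0, 0],  # GT ESI 1
    [ 0, 9, 2, 0, 0],  # GT ESI 2
    [ 0, 2, 3, 0, 0],  # GT ESI 3
    [ 0, 0, 2, 2, 0],  # GT ESI 4
    [ 0, 0, 0, 1, 1],  # GT ESI 5
]

n = sum(sum(row) for row in CM)
correct = sum(CM[i][i] for i in range(5))
# Upper triangle: predicted a less urgent (higher) level → under-triage
under = sum(CM[i][j] for i in range(5) for j in range(5) if j > i)
# Lower triangle: predicted a more urgent (lower) level → over-triage
over = sum(CM[i][j] for i in range(5) for j in range(5) if j < i)
# High-risk recall: GT ESI 1/2 cases that were predicted ESI 1 or 2
hr_total = sum(sum(CM[i]) for i in range(2))
hr_hit = sum(CM[i][j] for i in range(2) for j in range(2))
```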


Key Lessons from DPO Iteration

  • v1–v3 failure: IPO/sigmoid loss collapsed when dataset direction was 100% anti-overtriage → catastrophic under-triage regression (40% high-risk recall at worst)
  • v4 fix: (1) ESI label prepended at token position 0 for unambiguous preference signal; (2) apo_down + sft combined loss preserves ESI 1/2 recall via SFT component; (3) Sources D (ESI 1/2 anchors ×3) + E (ESI 3 bidirectional ×2) balance dataset direction
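The combined apo_down + sft loss can be expressed as a config fragment. This is a sketch only, assuming a recent TRL release with multi-loss (MPO-style) DPO support where `loss_type` accepts a list with matching `loss_weights`; the hyperparameter values come from the Stage 3 details above.

```python
# Config-only sketch of the Stage 3 training setup (assumes TRL with
# MPO-style multi-loss support); not the author's exact script.
from trl import DPOConfig

dpo_config = DPOConfig(
    loss_type=["apo_down", "sft"],   # anchored preference loss + SFT term
    loss_weights=[0.3, 1.0],         # apo_down × 0.3 + sft × 1.0
    beta=0.5,
    learning_rate=5e-5,
    num_train_epochs=0.1,            # ≈ 47 steps over 7,789 weighted pairs
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
)
```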

Usage

# Requires llama.cpp server running with the Q4_K_M GGUF
# llama-server --model qwen3.5-medical-ft-stage3-dpo-q4km.gguf --port 8080 -c 4096

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

SYSTEM_PROMPT = (
    "You are an expert emergency medicine triage nurse. "
    "Given a SOAP intake note, provide a structured triage decision including "
    "ESI level with justification, key clinical findings, time-to-provider target, "
    "and any immediate interventions required."
)

response = client.chat.completions.create(
    model="local",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "<SOAP intake note here>"},
    ],
    temperature=0.1,
    max_tokens=512,
)
print(response.choices[0].message.content)

Limitations & Safety

⚠️ This model is for research purposes only. It must NOT be used for clinical decision-making without licensed clinician oversight.

  • Evaluated on 36 MIETIC validation cases — not a clinical trial
  • 11.1% under-triage rate means critical patients may be down-triaged
  • 92% high-risk recall means ~8% of ESI 1/2 patients may be missed
  • Model has not been validated on real ED populations
  • Fine-tuned on synthetic + MIMIC-IV derived data only

Training Infrastructure

  • Hardware: NVIDIA GB10 (121 GB VRAM), 1 GPU
  • Framework: Unsloth 2026.3.4 + TRL DPOTrainer + Transformers 5.2.0
  • Training time: ~2 hours (47 steps)
  • Quantization: GGUF Q4_K_M via llama.cpp

Fine-tuned with Unsloth 🦥
