Qwen3.5-9B Medical Triage — Stage 3 DPO (v4)

Emergency department triage model fine-tuned on Qwen3.5-9B via a 3-stage pipeline: Stage 1 (general medical SFT) → Stage 2 (ED intake SOAP → ESI decision SFT) → Stage 3 (DPO alignment to reduce over-triage, this model).

Quantized to Q4_K_M GGUF for on-device inference.


Model Description

Given an ED SOAP intake note, the model outputs a structured triage decision:

  • ESI level (1–5) with justification
  • Key clinical findings
  • Time-to-provider target
  • Immediate interventions required

ESI Scale: 1 = Immediate life threat · 2 = Emergent high-risk · 3 = Urgent stable · 4 = Less urgent · 5 = Non-urgent


Training Pipeline

Stage  Method           Objective
1      SFT (LoRA r=16)  General medical knowledge (PubMed, clinical guidelines)
2      SFT (LoRA r=16)  SOAP note → structured ESI triage decision
3      DPO (LoRA r=8)   Reduce over-triage · preserve ESI 1/2 high-risk recall

Stage 3 DPO Details

  • Base: Stage 2 LoRA checkpoint (vadimbelsky/qwen3.5-medical-ft-stage2)
  • Dataset: dpo_dataset_v4.jsonl — 5,413 raw pairs → 7,789 weighted pairs
  • Loss: Combined apo_down × 0.3 + sft × 1.0 (MPO-style)
  • Beta: 0.5 · LR: 5e-5 · Epochs: 0.1 (47 steps)
  • Batch: 2 × 8 gradient accumulation = effective 16
  • ESI label prepending: All chosen/rejected completions prefixed with explicit ESI label (e.g. ESI 2 — Emergent (high risk)\n\n...) to anchor preference signal at token position 0
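The label-prepending step above can be sketched as follows. This is a hypothetical illustration, not the card author's actual preprocessing code: the `ESI_LABELS` names are assumed from the ESI scale in the Model Description, and the `label_pair` field names (`chosen_esi`, `rejected_esi`, etc.) are invented for the sketch.

```python
# Hypothetical sketch of ESI label prepending: every chosen/rejected
# completion gets an explicit "ESI <n> — <name>" prefix so the preference
# signal diverges at token position 0. Label names are assumptions.
ESI_LABELS = {
    1: "Immediate (life threat)",
    2: "Emergent (high risk)",
    3: "Urgent (stable)",
    4: "Less urgent",
    5: "Non-urgent",
}

def prepend_esi_label(esi_level: int, completion: str) -> str:
    """Anchor the preference signal by putting the ESI label first."""
    return f"ESI {esi_level} — {ESI_LABELS[esi_level]}\n\n{completion}"

def label_pair(pair: dict) -> dict:
    """Apply the prefix to both sides of a DPO preference pair."""
    return {
        "prompt": pair["prompt"],
        "chosen": prepend_esi_label(pair["chosen_esi"], pair["chosen"]),
        "rejected": prepend_esi_label(pair["rejected_esi"], pair["rejected"]),
    }
```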

Dataset Sources (v4)

Source   Description                                           Raw pairs  Weight  Weighted
A        Anti-overtriage synthetic (ESI 3→1/2 rejected)            2,388      ×1     2,388
B        Anti-overtriage synthetic (ESI 4/5→1/2 rejected)          1,500      ×1     1,500
C        Edge cases (synthetic boundary scenarios)                    39      ×1        39
D        ESI 1/2 anchor pairs (high-risk recall preservation)        890      ×3     2,670
E-over   ESI 3 bidirectional — anti-overtriage                       297      ×2       594
E-under  ESI 3 bidirectional — anti-undertriage                      299      ×2       598
Total                                                              5,413             7,789
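The raw-to-weighted expansion is simple duplication by source weight; a minimal sketch, with the weights inferred from the table (D ×3, E ×2, everything else ×1):

```python
# Sketch of how 5,413 raw pairs become 7,789 weighted pairs: each raw
# pair from a source is repeated `weight` times in the training file.
SOURCE_WEIGHTS = {"A": 1, "B": 1, "C": 1, "D": 3, "E-over": 2, "E-under": 2}
RAW_COUNTS = {"A": 2388, "B": 1500, "C": 39, "D": 890, "E-over": 297, "E-under": 299}

def weighted_total(raw_counts: dict, weights: dict) -> int:
    """Total pairs after per-source duplication."""
    return sum(n * weights[src] for src, n in raw_counts.items())
```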

Evaluation Results

Evaluated on the MIMIC-IV-Ext Triage Instruction Corpus (MIETIC) — 36 human-expert-validated RETAIN cases.

v4 vs Previous Stages

Metric                      Stage 2 (SFT)  v1 DPO  v2 DPO  v3 DPO  v4 DPO  Target
Accuracy                    ~68%           55.6%   50.0%   27.8%   75.0%   >82%
Over-triage rate            ~22%           22.2%   30.6%   0%      13.9%   <10%
Under-triage rate           ~8%            36.1%   41.7%   72.2%   11.1%   <6%
High-risk recall (ESI 1+2)  ~84%           76%     64%     40%     92%     100%
ESI 3 accuracy              ~45%           ~40%    ~30%    ~0%     60%     >65%

v4 Detailed Results (MIETIC, n=36)

Samples evaluated   : 36
ESI level parsed    : 36 / 36
Correct             : 27
Accuracy            : 75.0%
Under-triage rate   : 11.1% (4 cases)
Over-triage rate    : 13.9% (5 cases)
High-risk recall    : 92.0% (ESI 1+2, n=25)

Per-ESI Accuracy:

ESI Level  N   Correct  Accuracy
ESI 1      14  12       85.7%
ESI 2      11  9        81.8%
ESI 3      5   3        60.0%
ESI 4      4   2        50.0%
ESI 5      2   1        50.0%

Confusion Matrix (rows = ground truth, cols = predicted):

GT \ Pred  ESI 1  ESI 2  ESI 3  ESI 4  ESI 5
ESI 1         12      2      0      0      0
ESI 2          0      9      2      0      0
ESI 3          0      2      3      0      0
ESI 4          0      0      2      2      0
ESI 5          0      0      0      1      1

All remaining errors are ±1 ESI boundary confusions — no catastrophic mis-triage.
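The headline metrics can be recomputed directly from the confusion matrix above; a small sketch (note that with ESI, a *higher* predicted number than ground truth means under-triage, i.e. the patient was rated less urgent than they are):

```python
# Recompute v4 metrics from the confusion matrix (rows = ground truth
# ESI 1–5, cols = predicted ESI 1–5), as reported in the card.
CM = [
    [12, 2, 0, 0, 0],  # GT ESI 1
    [ 0, 9, 2, 0, 0],  # GT ESI 2
    [ 0, 2, 3, 0, 0],  # GT ESI 3
    [ 0, 0, 2, 2, 0],  # GT ESI 4
    [ 0, 0, 0, 1, 1],  # GT ESI 5
]

n = sum(sum(row) for row in CM)
correct = sum(CM[i][i] for i in range(5))
# Upper triangle: predicted a less urgent (higher) level → under-triage
under = sum(CM[i][j] for i in range(5) for j in range(5) if j > i)
# Lower triangle: predicted a more urgent (lower) level → over-triage
over = sum(CM[i][j] for i in range(5) for j in range(5) if j < i)
# High-risk recall: GT ESI 1/2 cases that were predicted ESI 1 or 2
hr_total = sum(sum(CM[i]) for i in range(2))
hr_hit = sum(CM[i][j] for i in range(2) for j in range(2))
```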


Key Lessons from DPO Iteration

  • v1–v3 failure: IPO/sigmoid loss collapsed when dataset direction was 100% anti-overtriage → catastrophic under-triage regression (40% high-risk recall at worst)
  • v4 fix: (1) ESI label prepended at token position 0 for unambiguous preference signal; (2) apo_down + sft combined loss preserves ESI 1/2 recall via SFT component; (3) Sources D (ESI 1/2 anchors ×3) + E (ESI 3 bidirectional ×2) balance dataset direction
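The combined apo_down + sft loss can be expressed as a config fragment. This is a sketch only, assuming a recent TRL release with multi-loss (MPO-style) DPO support where `loss_type` accepts a list with matching `loss_weights`; the hyperparameter values come from the Stage 3 details above.

```python
# Config-only sketch of the Stage 3 training setup (assumes TRL with
# MPO-style multi-loss support); not the author's exact script.
from trl import DPOConfig

dpo_config = DPOConfig(
    loss_type=["apo_down", "sft"],   # anchored preference loss + SFT term
    loss_weights=[0.3, 1.0],         # apo_down × 0.3 + sft × 1.0
    beta=0.5,
    learning_rate=5e-5,
    num_train_epochs=0.1,            # ≈ 47 steps over 7,789 weighted pairs
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
)
```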

Usage

# Requires llama.cpp server running with the Q4_K_M GGUF
# llama-server --model qwen3.5-medical-ft-stage3-dpo-q4km.gguf --port 8080 -c 4096

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

SYSTEM_PROMPT = (
    "You are an expert emergency medicine triage nurse. "
    "Given a SOAP intake note, provide a structured triage decision including "
    "ESI level with justification, key clinical findings, time-to-provider target, "
    "and any immediate interventions required."
)

response = client.chat.completions.create(
    model="local",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "<SOAP intake note here>"},
    ],
    temperature=0.1,
    max_tokens=512,
)
print(response.choices[0].message.content)

Limitations & Safety

⚠️ This model is for research purposes only. It must NOT be used for clinical decision-making without licensed clinician oversight.

  • Evaluated on 36 MIETIC validation cases — not a clinical trial
  • 11.1% under-triage rate means critical patients may be down-triaged
  • 92% high-risk recall means ~8% of ESI 1/2 patients may be missed
  • Model has not been validated on real ED populations
  • Fine-tuned on synthetic + MIMIC-IV derived data only

Training Infrastructure

  • Hardware: NVIDIA GB10 (121 GB VRAM), 1 GPU
  • Framework: Unsloth 2026.3.4 + TRL DPOTrainer + Transformers 5.2.0
  • Training time: ~2 hours (47 steps)
  • Quantization: GGUF Q4_K_M via llama.cpp

Fine-tuned with Unsloth 🦥
