Qwen3.5-9B Medical Triage — Stage 3 DPO (v4)
Emergency department triage model fine-tuned on Qwen3.5-9B via a 3-stage pipeline: Stage 1 (general medical SFT) → Stage 2 (ED intake SOAP → ESI decision SFT) → Stage 3 (DPO alignment to reduce over-triage, this model).
Quantized to Q4_K_M GGUF for on-device inference.
Model Description
Given an ED SOAP intake note, the model outputs a structured triage decision:
- ESI level (1–5) with justification
- Key clinical findings
- Time-to-provider target
- Immediate interventions required
ESI Scale: 1 = Immediate life threat · 2 = Emergent high-risk · 3 = Urgent stable · 4 = Less urgent · 5 = Non-urgent
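To score or route the structured output, the ESI level has to be pulled out of the generated text. The following is a minimal sketch of one way to do that, assuming the decision mentions the level as `ESI <digit>` (as in the label format used for the training completions). The name `parse_esi_level` and the regular expression are illustrative and are not the parser used for the evaluation below.

```python
import re
from typing import Optional

# Assumed pattern: "ESI 2", "ESI: 2", "ESI level 2", "ESI Level: 2" (case-insensitive).
# Only digits 1-5 are accepted, and the digit must not be part of a longer number.
ESI_PATTERN = re.compile(r"\bESI\s*(?:level\s*)?:?\s*([1-5])\b", re.IGNORECASE)


def parse_esi_level(text: str) -> Optional[int]:
    """Return the first ESI level (1-5) mentioned in `text`, or None if absent."""
    match = ESI_PATTERN.search(text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "ESI 2 - Emergent (high risk)\n\nKey findings: chest pain, diaphoresis."
    print(parse_esi_level(sample))
```

Returning `None` rather than a default level keeps unparseable outputs visible, so they can be counted separately instead of being silently scored as a triage decision.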
Training Pipeline
| Stage | Method | Objective |
|---|---|---|
| 1 | SFT (LoRA r=16) | General medical knowledge (PubMed, clinical guidelines) |
| 2 | SFT (LoRA r=16) | SOAP note → structured ESI triage decision |
| 3 | DPO (LoRA r=8) | Reduce over-triage · preserve ESI 1/2 high-risk recall |
Stage 3 DPO Details
- Base: Stage 2 LoRA checkpoint (`vadimbelsky/qwen3.5-medical-ft-stage2`)
- Dataset: `dpo_dataset_v4.jsonl` (5,413 raw pairs → 7,789 weighted pairs)
- Loss: Combined `apo_down × 0.3 + sft × 1.0` (MPO-style)
- Beta: 0.5 · LR: 5e-5 · Epochs: 0.1 (47 steps)
- Batch: 2 × 8 gradient accumulation = effective 16
- ESI label prepending: All chosen/rejected completions are prefixed with an explicit ESI label (e.g. `ESI 2 — Emergent (high risk)\n\n...`) to anchor the preference signal at token position 0
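The label-prepending step can be sketched as below. Only the ESI 2 wording comes from the example above; the other four label names are assumptions derived from the ESI scale summary, and the `chosen_esi` / `rejected_esi` fields are a hypothetical record layout, not the actual schema of `dpo_dataset_v4.jsonl`.

```python
# ESI 2 wording is taken from the card's example label; levels 1, 3, 4 and 5
# are ASSUMED wordings based on the ESI scale summary.
ESI_NAMES = {
    1: "Immediate (life threat)",
    2: "Emergent (high risk)",
    3: "Urgent (stable)",
    4: "Less urgent",
    5: "Non-urgent",
}
SEPARATOR = " \u2014 "  # em dash, matching the card's example label


def esi_label(level: int) -> str:
    """Build the label line that opens a completion, e.g. for level 2."""
    return f"ESI {level}{SEPARATOR}{ESI_NAMES[level]}"


def prepend_label(pair: dict) -> dict:
    """Return a copy of a DPO pair with ESI labels prefixed to both completions.

    Assumes hypothetical integer fields `chosen_esi` and `rejected_esi`.
    """
    out = dict(pair)
    out["chosen"] = f"{esi_label(pair['chosen_esi'])}\n\n{pair['chosen']}"
    out["rejected"] = f"{esi_label(pair['rejected_esi'])}\n\n{pair['rejected']}"
    return out


if __name__ == "__main__":
    example = {
        "prompt": "<SOAP intake note here>",
        "chosen": "Stable vitals, needs labs and imaging.",
        "rejected": "High-risk presentation, immediate bed.",
        "chosen_esi": 3,
        "rejected_esi": 2,
    }
    print(prepend_label(example)["rejected"])
```

With the label first, the chosen and rejected completions of a pair differ from their very first tokens, which is the property the bullet above relies on.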
Dataset Sources (v4)
| Source | Description | Raw pairs | Weight | Weighted |
|---|---|---|---|---|
| A | Anti-overtriage synthetic (ESI 3→1/2 rejected) | 2,388 | 1× | 2,388 |
| B | Anti-overtriage synthetic (ESI 4/5→1/2 rejected) | 1,500 | 1× | 1,500 |
| C | Edge cases (synthetic boundary scenarios) | 39 | 1× | 39 |
| D | ESI 1/2 anchor pairs (high-risk recall preservation) | 890 | 3× | 2,670 |
| E-over | ESI 3 bidirectional — anti-overtriage | 297 | 2× | 594 |
| E-under | ESI 3 bidirectional — anti-undertriage | 299 | 2× | 598 |
| Total | | 5,413 | | 7,789 |
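The weighted totals in the table follow directly from the per-source counts and weights. The sketch below reproduces that arithmetic and shows one plausible way to realise the weights, by repeating each pair; whether v4 used duplication or another weighting mechanism is not stated here, so `upsample` is an assumption.

```python
# (raw pairs, weight) per source, copied from the table above.
SOURCES = {
    "A": (2388, 1),
    "B": (1500, 1),
    "C": (39, 1),
    "D": (890, 3),
    "E-over": (297, 2),
    "E-under": (299, 2),
}


def weighted_counts(sources: dict) -> dict:
    """Multiply each source's raw pair count by its weight."""
    return {name: raw * weight for name, (raw, weight) in sources.items()}


def upsample(pairs: list, weight: int) -> list:
    """Repeat every pair `weight` times (ASSUMED weighting mechanism)."""
    return [pair for pair in pairs for _ in range(weight)]


raw_total = sum(raw for raw, _ in SOURCES.values())
weighted_total = sum(weighted_counts(SOURCES).values())

if __name__ == "__main__":
    print(raw_total, weighted_total)  # 5413 7789
```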
Evaluation Results
Evaluated on MIMIC-IV-Ext Triage Instruction Corpus (MIETIC) — 36 human-expert validated RETAIN cases.
v4 vs Previous Stages
| Metric | Stage 2 (SFT) | v1 DPO | v2 DPO | v3 DPO | v4 DPO | Target |
|---|---|---|---|---|---|---|
| Accuracy | ~68% | 55.6% | 50.0% | 27.8% | 75.0% | >82% |
| Over-triage rate | ~22% | 22.2% | 30.6% | 0% | 13.9% | <10% |
| Under-triage rate | ~8% | 36.1% | 41.7% | 72.2% | 11.1% | <6% |
| High-risk recall (ESI 1+2) | ~84% | 76% | 64% | 40% | 92% | 100% |
| ESI 3 accuracy | ~45% | ~40% | ~30% | ~0% | 60% | >65% |
v4 Detailed Results (MIETIC, n=36)
| Metric | Value |
|---|---|
| Samples evaluated | 36 |
| ESI level parsed | 36 / 36 |
| Correct | 27 |
| Accuracy | 75.0% |
| Under-triage rate | 11.1% (4 cases) |
| Over-triage rate | 13.9% (5 cases) |
| High-risk recall | 92.0% (ESI 1+2, n=25) |
Per-ESI Accuracy:
| ESI Level | N | Correct | Accuracy |
|---|---|---|---|
| ESI 1 | 14 | 12 | 85.7% |
| ESI 2 | 11 | 9 | 81.8% |
| ESI 3 | 5 | 3 | 60.0% |
| ESI 4 | 4 | 2 | 50.0% |
| ESI 5 | 2 | 1 | 50.0% |
Confusion Matrix (rows = ground truth, cols = predicted):
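| GT \ Pred | ESI 1 | ESI 2 | ESI 3 | ESI 4 | ESI 5 |
|---|---|---|---|---|---|
| ESI 1 | 12 | 2 | 0 | 0 | 0 |
| ESI 2 | 0 | 9 | 2 | 0 | 0 |
| ESI 3 | 0 | 2 | 3 | 0 | 0 |
| ESI 4 | 0 | 0 | 2 | 2 | 0 |
| ESI 5 | 0 | 0 | 0 | 1 | 1 |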
All 9 remaining errors are ±1 ESI boundary confusions; no case is mis-triaged by two or more levels.
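The headline metrics can be recomputed from the confusion matrix. The sketch below assumes these definitions: under-triage is a predicted ESI number higher (less urgent) than ground truth, over-triage is the reverse, and high-risk recall is the share of ground-truth ESI 1/2 cases predicted as ESI 1 or 2. These definitions are inferred because they reproduce the reported figures; `triage_metrics` is an illustrative helper, not the evaluation script.

```python
# Rows = ground truth, columns = predicted; index 0 is ESI 1 (values from the card).
CONFUSION = [
    [12, 2, 0, 0, 0],
    [0, 9, 2, 0, 0],
    [0, 2, 3, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 1, 1],
]


def triage_metrics(cm: list) -> dict:
    """Compute accuracy, under/over-triage and high-risk recall from a 5x5 matrix."""
    n = len(cm)
    cells = [(i, j) for i in range(n) for j in range(n)]
    total = sum(cm[i][j] for i, j in cells)
    correct = sum(cm[i][i] for i in range(n))
    # Under-triage: predicted level is less urgent (larger ESI number) than truth.
    under = sum(cm[i][j] for i, j in cells if j > i)
    # Over-triage: predicted level is more urgent (smaller ESI number) than truth.
    over = sum(cm[i][j] for i, j in cells if j < i)
    # High-risk recall: ground-truth ESI 1/2 cases still predicted as ESI 1 or 2.
    high_risk_total = sum(cm[0]) + sum(cm[1])
    high_risk_hits = sum(cm[i][j] for i in (0, 1) for j in (0, 1))
    # Largest distance in ESI levels between truth and prediction.
    max_error = max((abs(i - j) for i, j in cells if cm[i][j] > 0), default=0)
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total,
        "under_triage": under,
        "over_triage": over,
        "high_risk_recall": high_risk_hits / high_risk_total,
        "max_error": max_error,
    }


if __name__ == "__main__":
    for name, value in triage_metrics(CONFUSION).items():
        print(f"{name}: {value}")
```

Run on the matrix above, this gives 27 of 36 correct, 4 under-triaged, 5 over-triaged, 23 of 25 high-risk cases recalled, and a maximum error of one level.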
Key Lessons from DPO Iteration
- v1–v3 failure: IPO/sigmoid loss collapsed when dataset direction was 100% anti-overtriage → catastrophic under-triage regression (40% high-risk recall at worst)
- v4 fix: (1) ESI label prepended at token position 0 for unambiguous preference signal; (2) `apo_down + sft` combined loss preserves ESI 1/2 recall via SFT component; (3) Sources D (ESI 1/2 anchors ×3) + E (ESI 3 bidirectional ×2) balance dataset direction
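To make the combined objective concrete, here is a per-pair sketch in plain Python. It assumes the `apo_down` formulation as implemented in TRL (a sigmoid term on the chosen log-ratio plus one minus a sigmoid on the chosen-minus-rejected margin) and treats the SFT term as the mean per-token negative log-likelihood of the chosen completion. The function names are illustrative; actual training goes through TRL's `DPOTrainer`, not this code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apo_down_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.5) -> float:
    """APO-down for one pair (ASSUMED to match TRL's `apo_down`).

    Each log-ratio is log pi_policy(y|x) - log pi_ref(y|x) for that completion.
    """
    chosen_term = sigmoid(beta * chosen_logratio)
    margin_term = 1.0 - sigmoid(beta * (chosen_logratio - rejected_logratio))
    return chosen_term + margin_term


def combined_loss(
    chosen_logratio: float,
    rejected_logratio: float,
    chosen_nll: float,
    beta: float = 0.5,
    w_apo: float = 0.3,
    w_sft: float = 1.0,
) -> float:
    """Weighted sum used in v4: apo_down x 0.3 + sft x 1.0."""
    apo = apo_down_loss(chosen_logratio, rejected_logratio, beta)
    return w_apo * apo + w_sft * chosen_nll


if __name__ == "__main__":
    # At initialisation the policy equals the reference, so both log-ratios are 0.
    print(combined_loss(0.0, 0.0, chosen_nll=2.0))
```

Because the SFT term carries the larger weight, the likelihood of the chosen completion stays directly supervised even when the preference term is pushing log-ratios down, which is the mechanism credited above for preserving ESI 1/2 recall.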
Usage
```python
# Requires a llama.cpp server running with the Q4_K_M GGUF, e.g.:
#   llama-server --model qwen3.5-medical-ft-stage3-dpo-q4km.gguf --port 8080 -c 4096
from openai import OpenAI

# llama-server exposes an OpenAI-compatible endpoint; the API key is not checked.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

SYSTEM_PROMPT = (
    "You are an expert emergency medicine triage nurse. "
    "Given a SOAP intake note, provide a structured triage decision including "
    "ESI level with justification, key clinical findings, time-to-provider target, "
    "and any immediate interventions required."
)

response = client.chat.completions.create(
    model="local",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "<SOAP intake note here>"},
    ],
    temperature=0.1,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
Limitations & Safety
⚠️ This model is for research purposes only. It must NOT be used for clinical decision-making without licensed clinician oversight.
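- Evaluated on only 36 MIETIC validation cases, not a clinical trial; per-level accuracy rests on as few as 2 cases (ESI 5)
- The 11.1% under-triage rate (4 of 36 cases) means critical patients may be down-triaged
- The 92% high-risk recall means 2 of 25 ESI 1/2 cases (8%) were assigned ESI 3 and could be missed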
- Model has not been validated on real ED populations
- Fine-tuned on synthetic + MIMIC-IV derived data only
Training Infrastructure
- Hardware: NVIDIA GB10 (121 GB VRAM), 1 GPU
- Framework: Unsloth 2026.3.4 + TRL DPOTrainer + Transformers 5.2.0
- Training time: ~2 hours (47 steps)
- Quantization: GGUF Q4_K_M via llama.cpp
Fine-tuned with Unsloth 🦥