Qwen3.5-9B — Medical Triage LoRA Adapters

LoRA fine-tune of Qwen3.5-9B on synthetic clinical triage Q&A pairs generated from PubMed Central open-access papers. The model is specialized for emergency-medicine decision-making: triaging patients, applying clinical decision rules, and generating protocol-grounded triage recommendations.

**Disclaimer — Not for clinical use.** This model is a research artifact trained on synthetic data. It must not be used to inform real patient care decisions.

Model Details

Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-9B"
adapter = "vadimbelsky/qwen3.5-medical-ft"

# Load the base model first, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = (
    "A 67-year-old male presents with sudden onset crushing chest pain radiating to "
    "the left arm, diaphoresis, and mild dyspnea. BP 145/90, HR 102, SpO2 96%. "
    "What is the triage priority and initial management?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Low temperature keeps clinical answers relatively deterministic.
output = model.generate(**inputs, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Intended Uses

| Use | Suitable? |
|---|---|
| Research on medical NLP / domain adaptation | ✅ Yes |
| Educational demonstrations of LLM fine-tuning | ✅ Yes |
| Clinical decision support in production | ❌ No |
| Patient-facing applications | ❌ No |

Out-of-Scope Use

  • Real-time triage of actual patients
  • Replacing trained clinicians or validated clinical decision rules
  • Domains outside emergency medicine / triage

Training Details

Dataset

47,500 synthetic Q&A pairs generated from **2,100 PubMed Central open-access papers** covering emergency medicine, triage protocols, clinical decision rules, prehospital care, pediatric emergencies, and sepsis management.

Generation pipeline:

  1. Papers downloaded from PMC via fetch_papers.py
  2. Relevant clinical sections extracted and classified
  3. Q&A pairs generated by gpt-4.1-nano via NVIDIA NeMo Data Designer
  4. Quality-filtered by LLM judge on three axes: relevance, faithfulness, and clinical realism
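The quality-filtering step (4) amounts to a threshold filter over per-axis judge scores. A minimal sketch, assuming a 1–5 rating scale and a `judge_score` callable standing in for the actual LLM-judge call (both hypothetical, not the project's pipeline code):

```python
# Sketch of step 4: keep a Q&A pair only if the judge rates it at or above
# a threshold on all three axes. The scale and threshold are assumptions.
AXES = ("relevance", "faithfulness", "clinical_realism")
THRESHOLD = 4  # assumed 1-5 rating scale

def passes_filter(scores: dict, threshold: int = THRESHOLD) -> bool:
    """A pair survives only if every axis meets the threshold."""
    return all(scores.get(axis, 0) >= threshold for axis in AXES)

def filter_pairs(pairs, judge_score):
    """Keep only pairs whose judge scores pass on all axes."""
    return [p for p in pairs if passes_filter(judge_score(p))]

# Toy demonstration with a stubbed judge: pair-2 fails on faithfulness.
keep = filter_pairs(
    ["pair-1", "pair-2"],
    lambda p: {
        "relevance": 5,
        "faithfulness": 5 if p == "pair-1" else 2,
        "clinical_realism": 5,
    },
)
# keep == ["pair-1"]
```

Requiring every axis to pass (rather than averaging) is the stricter policy; a single weak axis rejects the pair.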

Train/validation split: 95 / 5 (≈45,125 / 2,375 samples)
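The split arithmetic checks out (47,500 × 0.95 = 45,125). A seeded shuffle-and-slice along these lines reproduces the counts; this is an illustrative sketch, not the project's actual split script:

```python
import random

def split_dataset(pairs, val_fraction=0.05, seed=42):
    """Shuffle deterministically, then carve off the validation slice."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_dataset(list(range(47_500)))
len(train), len(val)  # (45125, 2375)
```

Seeding the shuffle keeps the split reproducible across reruns, which matters when eval-loss numbers are compared between training runs.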

Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Learning rate | 2e-4 |
| Global batch size | 16 (8 per device × 2 gradient accumulation steps) |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW 8-bit |
| Precision | BFloat16 |
| LR scheduler | Cosine |
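The LoRA rows of the table map directly onto a PEFT `LoraConfig`; a sketch with the values above (the `task_type` is an assumption, standard for causal-LM fine-tunes):

```python
from peft import LoraConfig

# Values taken from the hyperparameter table; task_type is assumed.
lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=16,      # scaling factor (alpha / r = 1.0 here)
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

With alpha equal to rank, the effective LoRA scaling factor is 1.0, a common default that avoids rescaling the adapter's contribution.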

Training Run

| Metric | Value |
|---|---|
| Steps | 3,249 |
| Final training loss | 0.596 |
| Final eval loss | 0.552 |
| Wall-clock time | ~15.2 hours |
| Framework | TRL 0.29.0 + Unsloth 2026.3.4 |
| Hardware | NVIDIA Blackwell GB10 (DGX Spark) |

Evaluation — Pre Fine-Tune Baseline

Zero-shot evaluation of the quantized base model (Q4_K_M GGUF) using lm-evaluation-harness:

| Benchmark | Accuracy |
|---|---|
| MedMCQA (validation) | 32.2% ± 0.72% |
| MedQA-4options (test) | 27.7% ± 1.26% |

Post fine-tune evaluation on these benchmarks is in progress.

Bias, Risks, and Limitations

  • Synthetic data: All training Q&A pairs are machine-generated from research papers; the model may reflect biases or errors in the source literature and generation pipeline.
  • No clinical validation: The model has not been evaluated against real patient outcomes or validated clinical guidelines.
  • English only: Trained exclusively on English-language literature.
  • Hallucination risk: Like all LLMs, this model can generate plausible-sounding but factually incorrect clinical information.
  • Narrow scope: Optimized for emergency triage scenarios; performance outside this domain is untested.

Environmental Impact

  • Hardware: NVIDIA Blackwell GB10 (DGX Spark)
  • Training time: ~15.2 hours
  • Carbon estimate: Use the ML CO₂ Impact calculator for an estimate based on your region.
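The calculator's underlying arithmetic is straightforward: energy drawn, scaled by datacenter overhead (PUE) and grid carbon intensity. The wattage, PUE, and intensity below are illustrative placeholders, not measured values for this run:

```python
def co2_kg(avg_power_w, hours, pue=1.1, grid_kg_per_kwh=0.4):
    """Energy (kWh) x datacenter overhead (PUE) x grid carbon intensity.

    All defaults are placeholder assumptions, not measurements.
    """
    energy_kwh = avg_power_w / 1000 * hours
    return energy_kwh * pue * grid_kg_per_kwh

# Hypothetical 300 W average draw over the ~15.2 h run:
co2_kg(avg_power_w=300, hours=15.2)  # ~2 kg CO2e under these assumptions
```

For a real estimate, substitute the measured average power and your region's grid intensity.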

Technical Specifications

Model Architecture

Qwen3.5-9B is a transformer-based causal language model with:

  • 32 layers, mixed linear/full attention (Blackwell-optimized)
  • 9.4 B total parameters
  • Context window: 262,144 tokens

The LoRA adapters add ~40 M trainable parameters (≈0.4% of the base model's parameters).
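The ~40 M figure follows from the LoRA parameter formula: adapting a weight of shape (d_out × d_in) at rank r adds matrices B (d_out × r) and A (r × d_in), i.e. r·(d_in + d_out) extra parameters per adapted layer. A sketch (the 4096×4096 dimensions are illustrative, not Qwen3.5's actual projection shapes):

```python
def lora_params(r, d_in, d_out):
    """Parameters added by LoRA to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# Example: one square 4096x4096 projection at rank 16 (illustrative dims)
lora_params(16, 4096, 4096)  # 131072 extra parameters
```

Summed over 7 target modules in each of 32 layers, per-layer counts of this magnitude land in the tens of millions, consistent with the reported adapter size.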

Software

| Library | Version |
|---|---|
| transformers | 5.3.0 |
| peft | 0.18.1 |
| trl | 0.29.0 |
| unsloth | 2026.3.4 |
| torch | 2.10.0 |
| accelerate | 1.13.0 |

Model Card Authors

Vadim Belski

Framework versions

  • PEFT 0.18.1