# Qwen3.5-9B — Medical Triage LoRA Adapters
LoRA fine-tune of Qwen3.5-9B on synthetic clinical triage Q&A pairs generated from PubMed Central open-access papers. The model is specialized for emergency-medicine decision-making: triaging patients, applying clinical decision rules, and generating protocol-grounded triage recommendations.
> **Disclaimer — Not for clinical use.** This model is a research artifact trained on synthetic data. It must not be used to inform real patient care decisions.
## Model Details
- Developed by: Vadim Belsky
- Base model: Qwen/Qwen3.5-9B (9.4 B parameters)
- Fine-tuning method: LoRA (PEFT 0.18.1)
- Language: English
- License: Apache 2.0
- Training dataset: vadimbelsky/medical-triage-qa-50k
## Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-9B"
adapter = "vadimbelsky/qwen3.5-medical-ft"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = (
    "A 67-year-old male presents with sudden onset crushing chest pain radiating to "
    "the left arm, diaphoresis, and mild dyspnea. BP 145/90, HR 102, SpO2 96%. "
    "What is the triage priority and initial management?"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Intended Uses
| Use | Suitable? |
|---|---|
| Research on medical NLP / domain adaptation | ✅ |
| Educational demonstrations of LLM fine-tuning | ✅ |
| Clinical decision support in production | ❌ |
| Patient-facing applications | ❌ |
### Out-of-Scope Use
- Real-time triage of actual patients
- Replacing trained clinicians or validated clinical decision rules
- Domains outside emergency medicine / triage
## Training Details

### Dataset
47,500 synthetic Q&A pairs generated from **2,100 PubMed Central open-access papers** covering emergency medicine, triage protocols, clinical decision rules, prehospital care, pediatric emergencies, and sepsis management.
Generation pipeline:
- Papers downloaded from PMC via `fetch_papers.py`
- Relevant clinical sections extracted and classified
- Q&A pairs generated by `gpt-4.1-nano` via NVIDIA NeMo Data Designer
- Quality-filtered by an LLM judge on three axes: relevance, faithfulness, and clinical realism
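The final quality-filtering step can be sketched as follows. The scoring scale, threshold, and example scores here are hypothetical placeholders, not the pipeline's actual rubric:

```python
# Hypothetical sketch of the LLM-judge quality filter: each Q&A pair is
# scored on three axes and kept only if every axis clears a threshold.
# The 1-5 scale and threshold of 3 are assumptions for illustration.
AXES = ("relevance", "faithfulness", "clinical_realism")
THRESHOLD = 3

def passes_filter(scores: dict) -> bool:
    """Keep a pair only if all three judge axes meet the threshold."""
    return all(scores[axis] >= THRESHOLD for axis in AXES)

candidates = [
    {"relevance": 5, "faithfulness": 4, "clinical_realism": 5},  # kept
    {"relevance": 4, "faithfulness": 2, "clinical_realism": 5},  # dropped
]
kept = [s for s in candidates if passes_filter(s)]
print(len(kept))  # → 1
```

Requiring every axis to pass (rather than averaging) means a single unfaithful or clinically unrealistic pair is dropped even if it scores well elsewhere.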
Train/validation split: 95 / 5 (≈45,125 / 2,375 samples)
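A 95/5 split of 47,500 samples yields exactly the counts above; a minimal sketch using a seeded shuffle (the seed and approach are illustrative, since the actual split script is not published):

```python
import random

def train_val_split(n_samples: int, val_frac: float = 0.05, seed: int = 42):
    """Deterministically split sample indices into train/validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_val = round(n_samples * val_frac)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = train_val_split(47_500)
print(len(train_idx), len(val_idx))  # → 45125 2375
```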
### Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Learning rate | 2e-4 |
| Global batch size | 16 (8 per-device × 2 gradient accumulation) |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW 8-bit |
| Precision | BFloat16 |
| LR scheduler | cosine |
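The cosine scheduler row means the learning rate decays from the 2e-4 peak toward zero over the run. A minimal sketch, ignoring any warmup phase the actual trainer may apply:

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 3_249  # from the training-run table below

def cosine_lr(step: int) -> float:
    """Cosine decay from PEAK_LR at step 0 to 0 at TOTAL_STEPS."""
    progress = step / TOTAL_STEPS
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))              # → 0.0002 (peak)
print(round(cosine_lr(3_249), 10))  # → 0.0 (end of training)
```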
### Training Run
| Metric | Value |
|---|---|
| Steps | 3,249 |
| Final training loss | 0.596 |
| Final eval loss | 0.552 |
| Wall-clock time | ~15.2 hours |
| Framework | TRL 0.29.0 + Unsloth 2026.3.4 |
| Hardware | NVIDIA Blackwell GB10 (DGX Spark) |
## Evaluation — Pre Fine-Tune Baseline
Zero-shot evaluation of the quantized base model (Q4_K_M GGUF) using lm-evaluation-harness:
| Benchmark | Accuracy |
|---|---|
| MedMCQA (validation) | 32.2% ± 0.72% |
| MedQA-4options (test) | 27.7% ± 1.26% |
Post-fine-tune evaluation on these benchmarks is in progress.
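As a sanity check, the reported standard errors are consistent with the binomial formula sqrt(p(1−p)/n), assuming the standard split sizes of 4,183 questions for MedMCQA validation and 1,273 for the MedQA 4-options test set (those counts are an assumption here, not stated in the harness output):

```python
import math

def binomial_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured over n questions."""
    return math.sqrt(p * (1 - p) / n)

# Assumed benchmark sizes: MedMCQA validation ~4,183, MedQA test ~1,273.
print(round(binomial_stderr(0.322, 4_183), 4))  # → 0.0072
print(round(binomial_stderr(0.277, 1_273), 4))  # → 0.0125
```

Both values land close to the reported ±0.72% and ±1.26%, which supports the assumed split sizes.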
## Bias, Risks, and Limitations
- Synthetic data: All training Q&A pairs are machine-generated from research papers; the model may reflect biases or errors in the source literature and generation pipeline.
- No clinical validation: The model has not been evaluated against real patient outcomes or validated clinical guidelines.
- English only: Trained exclusively on English-language literature.
- Hallucination risk: Like all LLMs, this model can generate plausible-sounding but factually incorrect clinical information.
- Narrow scope: Optimized for emergency triage scenarios; performance outside this domain is untested.
## Environmental Impact
- Hardware: NVIDIA Blackwell GB10 (DGX Spark)
- Training time: ~15.2 hours
- Carbon estimate: Use the ML CO₂ Impact calculator for an estimate based on your region.
## Technical Specifications

### Model Architecture
Qwen3.5-9B is a transformer-based causal language model with:
- 32 layers, mixed linear/full attention (Blackwell-optimized)
- 9.4 B total parameters
- Context window: 262,144 tokens
The LoRA adapters add ~40 M trainable parameters (≈0.4% of the base model's parameters).
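For context, each linear layer adapted by LoRA adds r × (d_in + d_out) trainable parameters (the low-rank A and B matrices). A sketch with hypothetical projection shapes, since Qwen3.5-9B's exact dimensions are not listed here:

```python
def lora_param_count(shapes, r: int = 16) -> int:
    """Trainable params for LoRA adapters: r * (d_in + d_out) per target layer."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

# Illustrative (assumed) shapes for one block's seven target projections.
block_shapes = [
    (4096, 4096),   # q_proj
    (4096, 1024),   # k_proj
    (4096, 1024),   # v_proj
    (4096, 4096),   # o_proj
    (4096, 12288),  # gate_proj
    (4096, 12288),  # up_proj
    (12288, 4096),  # down_proj
]
per_block = lora_param_count(block_shapes)
print(per_block * 32)  # total over 32 layers → 38797312
```

With these assumed dims the total comes out near 39 M, in line with the card's ~40 M figure.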
### Software
| Library | Version |
|---|---|
| transformers | 5.3.0 |
| peft | 0.18.1 |
| trl | 0.29.0 |
| unsloth | 2026.3.4 |
| torch | 2.10.0 |
| accelerate | 1.13.0 |
## Model Card Authors

Vadim Belsky
## Framework versions

- PEFT 0.18.1