Qwen3.5-9B — Medical Triage LoRA Adapters

LoRA fine-tune of Qwen3.5-9B on synthetic clinical triage Q&A pairs generated from PubMed Central open-access papers. The model is specialized for emergency-medicine decision-making: triaging patients, applying clinical decision rules, and generating protocol-grounded triage recommendations.

**Disclaimer — Not for clinical use.** This model is a research artifact trained on synthetic data. It must not be used to inform real patient care decisions.

Model Details

Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-9B"
adapter = "vadimbelsky/qwen3.5-medical-ft"

# Load the base model first, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = (
    "A 67-year-old male presents with sudden onset crushing chest pain radiating to "
    "the left arm, diaphoresis, and mild dyspnea. BP 145/90, HR 102, SpO2 96%. "
    "What is the triage priority and initial management?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Low temperature keeps clinical answers relatively deterministic.
output = model.generate(**inputs, max_new_tokens=512, temperature=0.3, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Intended Uses

| Use | Suitable? |
|---|---|
| Research on medical NLP / domain adaptation | ✅ Yes |
| Educational demonstrations of LLM fine-tuning | ✅ Yes |
| Clinical decision support in production | ❌ No |
| Patient-facing applications | ❌ No |

Out-of-Scope Use

  • Real-time triage of actual patients
  • Replacing trained clinicians or validated clinical decision rules
  • Domains outside emergency medicine / triage

Training Details

Dataset

47,500 synthetic Q&A pairs generated from **2,100 PubMed Central open-access papers** covering emergency medicine, triage protocols, clinical decision rules, prehospital care, pediatric emergencies, and sepsis management.

Generation pipeline:

  1. Papers downloaded from PMC via fetch_papers.py
  2. Relevant clinical sections extracted and classified
  3. Q&A pairs generated by gpt-4.1-nano via NVIDIA NeMo Data Designer
  4. Quality-filtered by LLM judge on three axes: relevance, faithfulness, and clinical realism
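The quality-filtering step (4) amounts to a threshold filter over per-axis judge scores. A minimal sketch, assuming a 1–5 rating scale and a `judge_score` callable standing in for the actual LLM-judge call (both hypothetical, not the project's pipeline code):

```python
# Sketch of step 4: keep a Q&A pair only if the judge rates it at or above
# a threshold on all three axes. The scale and threshold are assumptions.
AXES = ("relevance", "faithfulness", "clinical_realism")
THRESHOLD = 4  # assumed 1-5 rating scale

def passes_filter(scores: dict, threshold: int = THRESHOLD) -> bool:
    """A pair survives only if every axis meets the threshold."""
    return all(scores.get(axis, 0) >= threshold for axis in AXES)

def filter_pairs(pairs, judge_score):
    """Keep only pairs whose judge scores pass on all axes."""
    return [p for p in pairs if passes_filter(judge_score(p))]

# Toy demonstration with a stubbed judge: pair-2 fails on faithfulness.
keep = filter_pairs(
    ["pair-1", "pair-2"],
    lambda p: {
        "relevance": 5,
        "faithfulness": 5 if p == "pair-1" else 2,
        "clinical_realism": 5,
    },
)
# keep == ["pair-1"]
```

Requiring every axis to pass (rather than averaging) is the stricter policy; a single weak axis rejects the pair.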

Train/validation split: 95 / 5 (≈45,125 / 2,375 samples)
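The split arithmetic checks out (47,500 × 0.95 = 45,125). A seeded shuffle-and-slice along these lines reproduces the counts; this is an illustrative sketch, not the project's actual split script:

```python
import random

def split_dataset(pairs, val_fraction=0.05, seed=42):
    """Shuffle deterministically, then carve off the validation slice."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_dataset(list(range(47_500)))
len(train), len(val)  # (45125, 2375)
```

Seeding the shuffle keeps the split reproducible across reruns, which matters when eval-loss numbers are compared between training runs.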

Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Learning rate | 2e-4 |
| Global batch size | 16 (8 per device × 2 gradient accumulation steps) |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW 8-bit |
| Precision | BFloat16 |
| LR scheduler | Cosine |
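The LoRA rows of the table map directly onto a PEFT `LoraConfig`; a sketch with the values above (the `task_type` is an assumption, standard for causal-LM fine-tunes):

```python
from peft import LoraConfig

# Values taken from the hyperparameter table; task_type is assumed.
lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=16,      # scaling factor (alpha / r = 1.0 here)
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

With alpha equal to rank, the effective LoRA scaling factor is 1.0, a common default that avoids rescaling the adapter's contribution.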

Training Run

| Metric | Value |
|---|---|
| Steps | 3,249 |
| Final training loss | 0.596 |
| Final eval loss | 0.552 |
| Wall-clock time | ~15.2 hours |
| Framework | TRL 0.29.0 + Unsloth 2026.3.4 |
| Hardware | NVIDIA Blackwell GB10 (DGX Spark) |

Evaluation — Pre Fine-Tune Baseline

Zero-shot evaluation of the quantized base model (Q4_K_M GGUF) using lm-evaluation-harness:

| Benchmark | Accuracy |
|---|---|
| MedMCQA (validation) | 32.2% ± 0.72% |
| MedQA-4options (test) | 27.7% ± 1.26% |

Post fine-tune evaluation on these benchmarks is in progress.

Bias, Risks, and Limitations

  • Synthetic data: All training Q&A pairs are machine-generated from research papers; the model may reflect biases or errors in the source literature and generation pipeline.
  • No clinical validation: The model has not been evaluated against real patient outcomes or validated clinical guidelines.
  • English only: Trained exclusively on English-language literature.
  • Hallucination risk: Like all LLMs, this model can generate plausible-sounding but factually incorrect clinical information.
  • Narrow scope: Optimized for emergency triage scenarios; performance outside this domain is untested.

Environmental Impact

  • Hardware: NVIDIA Blackwell GB10 (DGX Spark)
  • Training time: ~15.2 hours
  • Carbon estimate: Use the ML CO₂ Impact calculator for an estimate based on your region.
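The calculator's underlying arithmetic is straightforward: energy drawn, scaled by datacenter overhead (PUE) and grid carbon intensity. The wattage, PUE, and intensity below are illustrative placeholders, not measured values for this run:

```python
def co2_kg(avg_power_w, hours, pue=1.1, grid_kg_per_kwh=0.4):
    """Energy (kWh) x datacenter overhead (PUE) x grid carbon intensity.

    All defaults are placeholder assumptions, not measurements.
    """
    energy_kwh = avg_power_w / 1000 * hours
    return energy_kwh * pue * grid_kg_per_kwh

# Hypothetical 300 W average draw over the ~15.2 h run:
co2_kg(avg_power_w=300, hours=15.2)  # ~2 kg CO2e under these assumptions
```

For a real estimate, substitute the measured average power and your region's grid intensity.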

Technical Specifications

Model Architecture

Qwen3.5-9B is a transformer-based causal language model with:

  • 32 layers, mixed linear/full attention (Blackwell-optimized)
  • 9.4 B total parameters
  • Context window: 262,144 tokens

The LoRA adapters add ~40 M trainable parameters (≈0.4% of the base model's parameters).
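The ~40 M figure follows from the LoRA parameter formula: adapting a weight of shape (d_out × d_in) at rank r adds matrices B (d_out × r) and A (r × d_in), i.e. r·(d_in + d_out) extra parameters per adapted layer. A sketch (the 4096×4096 dimensions are illustrative, not Qwen3.5's actual projection shapes):

```python
def lora_params(r, d_in, d_out):
    """Parameters added by LoRA to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# Example: one square 4096x4096 projection at rank 16 (illustrative dims)
lora_params(16, 4096, 4096)  # 131072 extra parameters
```

Summed over 7 target modules in each of 32 layers, per-layer counts of this magnitude land in the tens of millions, consistent with the reported adapter size.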

Software

| Library | Version |
|---|---|
| transformers | 5.3.0 |
| peft | 0.18.1 |
| trl | 0.29.0 |
| unsloth | 2026.3.4 |
| torch | 2.10.0 |
| accelerate | 1.13.0 |

Model Card Authors

Vadim Belski

Framework versions

  • PEFT 0.18.1