# Ministral-DrugDetector-14B-All9
A LoRA fine-tune of Ministral-3-14B-Instruct-2512 for multi-label substance use detection in clinical notes. Given a medical note, the model predicts illicit or harmful use across nine substance classes and, for each positive finding, classifies the use as current, historical, or unknown, all in a single forward pass.
## Substances Detected

| Substance | Detection criterion |
|---|---|
| Methamphetamine | Illicit use |
| Fentanyl | Illicit use |
| Injection Drug Use (IDU) | Any route |
| Heroin | Illicit use |
| Prescription Opioid Misuse | Non-prescribed / abusive context |
| Cocaine | Illicit use |
| Alcohol | Harmful use (AUD, intoxication, withdrawal) |
| Cannabis | Illicit / recreational use |
| Benzodiazepines | Misuse / non-prescribed |
## Output Format

The model emits one `Label: Value` line per class:

```
Methamphetamine Illicit Use: True/False/Unknown
Fentanyl Illicit Use: True/False/Unknown
Injection Drug Use: True/False/Unknown
Heroin Illicit Use: True/False/Unknown
Prescription Opioid Misuse: True/False/Unknown
Cocaine Illicit Use: True/False/Unknown
Alcohol Harmful Use: True/False/Unknown
Cannabis Illicit Use: True/False/Unknown
Benzodiazepine Misuse: True/False/Unknown
Methamphetamine Temporal Status: Current/Historical/Unknown/N/A
Fentanyl Temporal Status: Current/Historical/Unknown/N/A
Injection Drug Use Temporal Status: Current/Historical/Unknown/N/A
Heroin Temporal Status: Current/Historical/Unknown/N/A
Prescription Opioid Temporal Status: Current/Historical/Unknown/N/A
Cocaine Temporal Status: Current/Historical/Unknown/N/A
Alcohol Temporal Status: Current/Historical/Unknown/N/A
Cannabis Temporal Status: Current/Historical/Unknown/N/A
Benzodiazepine Temporal Status: Current/Historical/Unknown/N/A
```
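Because the output is a flat list of `Label: Value` lines, downstream code can parse it into a dict with a few lines of Python. A minimal sketch (the helper name `parse_predictions` is illustrative, not part of this repo):

```python
def parse_predictions(text: str) -> dict:
    """Parse the model's 'Label: Value' output lines into a dict."""
    predictions = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # skip anything that isn't a 'key: value' line
        key, _, value = line.partition(":")
        predictions[key.strip()] = value.strip()
    return predictions

sample = """Methamphetamine Illicit Use: True
Methamphetamine Temporal Status: Current
Heroin Illicit Use: True
Heroin Temporal Status: Historical"""
print(parse_predictions(sample)["Heroin Temporal Status"])  # → Historical
```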
## Performance (held-out test set, n=187 clinical notes)

### Illicit Use Detection (F1, positive class = True)
| Substance | Precision | Recall | F1 |
|---|---|---|---|
| Methamphetamine | 1.000 | 0.963 | 0.981 |
| Fentanyl | 0.971 | 0.971 | 0.971 |
| Rx Opioid Misuse | 1.000 | 0.917 | 0.957 |
| Cannabis | 0.971 | 0.943 | 0.957 |
| Benzodiazepine Misuse | 0.967 | 0.935 | 0.951 |
| Heroin | 1.000 | 0.897 | 0.945 |
| Injection Drug Use | 1.000 | 0.886 | 0.939 |
| Cocaine | 1.000 | 0.844 | 0.915 |
| Alcohol (harmful) | 1.000 | 0.795 | 0.886 |
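F1 is the harmonic mean of precision and recall, so each row of the table can be checked directly, e.g. the methamphetamine row:

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Methamphetamine row from the table above
print(round(f1(1.000, 0.963), 3))  # → 0.981
```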
### Temporal Classification (accuracy among True cases)
| Substance | N | Accuracy |
|---|---|---|
| Benzodiazepine Misuse | 24 | 0.833 |
| Fentanyl | 35 | 0.800 |
| Methamphetamine | 41 | 0.756 |
| Heroin | 22 | 0.773 |
| Cannabis | 25 | 0.760 |
| Rx Opioid Misuse | 31 | 0.742 |
| Injection Drug Use | 27 | 0.630 |
| Cocaine | 24 | 0.583 |
| Alcohol (harmful) | 27 | 0.556 |
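As a rough single number, the case-weighted mean temporal accuracy across the nine classes works out to about 0.72 (this is derived arithmetically from the table above, not a separately reported metric):

```python
# (N, accuracy) pairs from the temporal classification table
rows = [(24, 0.833), (35, 0.800), (41, 0.756), (22, 0.773), (25, 0.760),
        (31, 0.742), (27, 0.630), (24, 0.583), (27, 0.556)]

total = sum(n for n, _ in rows)
weighted = sum(n * acc for n, acc in rows) / total
print(total, round(weighted, 3))  # → 256 0.719
```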
## Training

- Base model: mistralai/Ministral-3-14B-Instruct-2512
- Method: LoRA (r=16, alpha=32, dropout=0.05, target modules: q_proj/k_proj/v_proj/o_proj)
- Training data: 495 manually annotated clinical notes (UCLA Health, de-identified)
- Epochs: 3 | Batch size: 1 (gradient accumulation 2) | Learning rate: 2e-4
- Trainable parameters: 22.8M (0.16% of 14B)
- Hardware: 1× NVIDIA RTX A6000 (48 GB)
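The hyperparameters above correspond to a PEFT configuration along these lines. This is a sketch reconstructed from the listed values, not the exact config used for training:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                       # low-rank dimension
    lora_alpha=32,              # scaling factor (alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
```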
## Usage

```python
import torch
from transformers import AutoTokenizer
from peft import PeftModel

try:
    # Newer transformers releases expose the Mistral 3 conditional-generation class
    from transformers import Mistral3ForConditionalGeneration
    model = Mistral3ForConditionalGeneration.from_pretrained(
        "mistralai/Ministral-3-14B-Instruct-2512",
        torch_dtype=torch.bfloat16, device_map="auto"
    )
except ImportError:
    # Fall back to the generic causal-LM loader on older versions
    from transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Ministral-3-14B-Instruct-2512",
        torch_dtype=torch.bfloat16, device_map="auto"
    )

model = PeftModel.from_pretrained(model, "fabriceyhc/Ministral-DrugDetector-14B-All9")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("fabriceyhc/Ministral-DrugDetector-14B-All9")

note = "Patient presents with polysubstance use disorder. History of IV heroin use, now on MAT. Recent urine tox positive for methamphetamine."

prompt = f"""<s>[INST] ### Task Description:
Please carefully review the following medical note and identify illicit or harmful substance use.
### The medical note to evaluate:
{note}
### Answer:
[/INST]"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, do_sample=False)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Intended Use
Designed for NLP annotation of de-identified clinical notes in research settings. Not intended for clinical decision-making.
## Framework versions
- PEFT 0.18.1