---
language:
- en
license: apache-2.0
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- medical
- triage
- few-shot-learning
- patient-safety
datasets:
- custom
metrics:
- f1
- accuracy
- recall
pipeline_tag: text-classification
base_model: sentence-transformers/all-mpnet-base-v2
model-index:
- name: medical-query-router
  results:
  - task:
      type: text-classification
      name: Medical Query Triage
    metrics:
    - name: Weighted F1
      type: f1
      value: 0.888
    - name: Accuracy
      type: accuracy
      value: 0.889
    - name: Urgent Recall
      type: recall
      value: 0.933
---

# 🏥 Medical Query Router

**Few-shot classifier that routes patient queries into 3 safety tiers.**

Built with [SetFit](https://github.com/huggingface/setfit) and trained with contrastive learning on just **90 hand-crafted examples** (30 per class).

## Classes

| Tier | Label | Action | Example |
|------|-------|--------|---------|
| 🟢 | `low_stakes` | Chatbot answers directly | *"How much paracetamol for a headache? I'm 30 and healthy"* |
| 🟡 | `high_stakes` | Doctor reviews before responding | *"Can I take ibuprofen while on blood thinners?"* |
| 🔴 | `urgent` | Tell patient to call 911/999/112 NOW | *"Crushing chest pain going down my left arm"* |

## Performance

Evaluated on 45 held-out examples (15 per class), including deliberate edge cases:

| Metric | Score |
|--------|-------|
| **Weighted F1** | **0.888** |
| **Accuracy** | **88.9%** |
| **Urgent Recall** | **93.3%** |
| Urgent Precision | 82.4% |
| Low Stakes F1 | 0.933 |
| High Stakes F1 | 0.857 |

### Confusion Matrix

Rows are true labels; columns are predicted labels.

```
              Predicted →  low   high  urgent
low_stakes                  14     0      1
high_stakes                  1    12      2
urgent                       0     1     14
```
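The scores in the Performance table can be recomputed from the confusion matrix above. The snippet below is a small dependency-free sketch that does so; the variable and function names are illustrative and not part of the model's code.

```python
LABELS = ["low_stakes", "high_stakes", "urgent"]

# Rows are true labels, columns are predicted labels,
# copied from the confusion matrix in this model card.
CM = [
    [14, 0, 1],   # true low_stakes
    [1, 12, 2],   # true high_stakes
    [0, 1, 14],   # true urgent
]


def per_class_scores(cm):
    """Return precision, recall, F1 and support for each class."""
    scores = {}
    for i, label in enumerate(LABELS):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum
        actual = sum(cm[i])                    # row sum (support)
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return scores


scores = per_class_scores(CM)
total = sum(sum(row) for row in CM)
accuracy = sum(CM[i][i] for i in range(len(LABELS))) / total
# Weighted F1: per-class F1 averaged with class support as the weight.
weighted_f1 = sum(s["f1"] * s["support"] for s in scores.values()) / total

print(f"accuracy         = {accuracy:.3f}")                       # 0.889
print(f"weighted F1      = {weighted_f1:.3f}")                    # 0.888
print(f"urgent recall    = {scores['urgent']['recall']:.3f}")     # 0.933
print(f"urgent precision = {scores['urgent']['precision']:.3f}")  # 0.824
```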

### Backbone Comparison

We trained three backbones and selected `all-mpnet-base-v2`, which scored highest on F1, urgent recall, and safety score:

| Backbone | F1 | Urgent Recall | Safety Score |
|----------|-----|---------------|-------------|
| **all-mpnet-base-v2** ★ | **0.888** | **0.933** | **0.859** |
| all-MiniLM-L6-v2 | 0.846 | 0.867 | 0.789 |
| MedEmbed-base-v0.1 | 0.801 | 0.867 | 0.748 |

## Usage

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("boredpanda9/medical-query-router")

queries = [
    "What are some healthy ways to lose weight?",
    "Can I take naproxen with my blood pressure medication?",
    "I have crushing chest pain spreading to my left arm",
]

predictions = model.predict(queries)
print(predictions)
# ['low_stakes', 'high_stakes', 'urgent']

# Per-class probabilities (one row per query), usable as confidence scores
probabilities = model.predict_proba(queries)
print(probabilities)
```
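In an application, the predicted tier still has to be mapped to an action. The sketch below shows one possible way to do that from a row of class probabilities, with a simple escalation rule for low-confidence `low_stakes` predictions. It is an assumption-laden illustration, not part of this model: `route`, `ACTIONS`, and the `min_confidence` value of 0.6 are hypothetical and untuned, and the code assumes the probability columns are ordered as in `labels`, which should be verified against the loaded model.

```python
from typing import Sequence

# Tier names follow the model card, ordered from least to most severe.
TIERS = ["low_stakes", "high_stakes", "urgent"]

# Actions taken from the Classes table.
ACTIONS = {
    "low_stakes": "chatbot answers directly",
    "high_stakes": "doctor reviews before responding",
    "urgent": "tell patient to call emergency services",
}


def route(
    probs: Sequence[float],
    labels: Sequence[str] = TIERS,
    min_confidence: float = 0.6,  # hypothetical threshold, not tuned
) -> str:
    """Pick a tier from one row of class probabilities.

    A `low_stakes` prediction whose probability is below
    `min_confidence` is escalated to `high_stakes`, so that a
    human reviews it instead of the chatbot answering.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    best = max(range(len(probs)), key=lambda i: probs[i])
    label = labels[best]
    if label == "low_stakes" and probs[best] < min_confidence:
        return "high_stakes"
    return label


# Example with made-up probability rows (not real model output).
for row in ([0.90, 0.05, 0.05], [0.50, 0.30, 0.20], [0.10, 0.20, 0.70]):
    tier = route(row)
    print(tier, "->", ACTIONS[tier])
```

To use this with the model, each row returned by `model.predict_proba(queries)` would be converted to plain floats and passed to `route`. Any threshold should be chosen on held-out data rather than taken from this sketch.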

## Training Details

- **Method**: SetFit (Sentence-Transformer fine-tuning + Logistic Regression head)
- **Paper**: [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Base model**: `sentence-transformers/all-mpnet-base-v2` (109.5M params)
- **Training data**: 90 hand-crafted examples (30 per class)
- **Contrastive pairs**: 3,600 (R=20 sampling iterations × 90 examples × 2 pairs per example)
- **Epochs**: 1 (contrastive phase) + 1 (head phase)
- **Body learning rate**: 2e-5
- **Head learning rate**: 1e-2
- **Batch size**: 16 (contrastive), 2 (head)
- **Loss**: CosineSimilarityLoss
- **Head**: Logistic Regression with balanced class weights
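
The settings above could be expressed with SetFit's `TrainingArguments` roughly as follows. This is a sketch under stated assumptions, not the original training script: it assumes the SetFit 1.x API, where tuple values mean (embedding phase, classifier phase), and it assumes that each sampling iteration yields one positive and one negative pair per example, which reproduces the 3,600 figure. The helper names are illustrative.

```python
from math import ceil

N_EXAMPLES = 90         # 30 per class x 3 classes
NUM_ITERATIONS = 20     # the "R" in R=20 pair sampling
CONTRASTIVE_BATCH = 16  # batch size in the contrastive phase


def contrastive_pair_count(n_examples: int, num_iterations: int) -> int:
    # Assumed sampling scheme: one positive and one negative pair
    # per example per iteration.
    return 2 * n_examples * num_iterations


def build_training_args():
    """Map the listed hyperparameters onto SetFit's TrainingArguments.

    Imports are done lazily so the arithmetic below runs even
    when setfit is not installed.
    """
    from sentence_transformers.losses import CosineSimilarityLoss
    from setfit import TrainingArguments

    return TrainingArguments(
        batch_size=(CONTRASTIVE_BATCH, 2),  # (contrastive, head)
        num_epochs=(1, 1),                  # (contrastive, head)
        body_learning_rate=2e-5,
        head_learning_rate=1e-2,
        num_iterations=NUM_ITERATIONS,
        loss=CosineSimilarityLoss,
    )


pairs = contrastive_pair_count(N_EXAMPLES, NUM_ITERATIONS)
# Batches in one contrastive epoch, assuming every pair is seen once.
steps = ceil(pairs / CONTRASTIVE_BATCH)
print(pairs, steps)  # 3600 225
```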

## Class Design Rationale

### 🟢 Low Stakes
Queries where a chatbot can safely provide general information:
- OTC medication dosing for otherwise **healthy adults** (paracetamol, ibuprofen, antihistamines)
- General wellness (weight loss, sleep, hydration, exercise)
- Mild, self-limiting symptoms with **no red flags** (common cold, mild fever in children who are otherwise well, minor cuts/grazes)
- Lifestyle and prevention advice

### 🟡 High Stakes
Queries requiring clinical judgement — a doctor must review before responding:
- **Prescription medication dosing** where errors cause harm (insulin, warfarin, metformin, chemotherapy)
- **Drug interactions** (especially with narrow therapeutic index drugs)
- **Comorbidities** that change management (diabetes + wound, COPD + ankle swelling)
- **Pregnancy/breastfeeding** medication safety
- **Chronic disease** management and flare-ups
- **Red flags** in symptoms (unexplained weight loss, persistent cough >3 weeks, changing moles)
- **Children's prescription** medications
- **Mental health** (non-crisis)

### 🔴 Urgent
Life-threatening emergencies — patient must call 911/999/112 immediately:
- Signs of **heart attack** (chest pain + arm/jaw, sweating, collapse)
- Signs of **stroke** (FAST: Face drooping, Arm weakness, Speech difficulty, Time to call)
- **Breathing emergencies** (anaphylaxis, severe asthma, choking, blue lips)
- **Overdose or poisoning** (especially in children)
- **Suicidal crisis** (active plan, immediate danger)
- **Severe bleeding** or major trauma
- **Meningitis** signs (non-blanching rash + fever + neck stiffness)
- **Seizures** lasting >5 minutes
- **Unconscious/unresponsive** person

## Limitations

⚠️ **This is a routing tool, not a diagnostic tool.** It decides *who* should answer a query, not *what* the answer is.

- Trained on 90 examples — may misclassify unusual or ambiguous queries
- Designed for English-language queries in UK/US healthcare contexts
- Should be used as a **first-pass filter** with human oversight, never as the sole decision-maker
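- The model is designed to err toward safety (`high_stakes`/`urgent`) when uncertain: on the held-out set, 3 of the 5 misclassifications moved a query to a higher tier, while 2 moved it to a lower one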
- Not validated on real clinical data — performance on actual patient messages may differ from the eval set
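
Because the eval set has only 15 examples per class, the reported rates carry sizeable sampling uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for the urgent recall of 14/15. The choice of the Wilson interval is an assumption made here for illustration; it is not an analysis from the original evaluation.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Urgent recall: 14 of 15 urgent queries were classified as urgent.
low, high = wilson_interval(14, 15)
print(f"urgent recall 95% interval: {low:.2f} to {high:.2f}")  # 0.70 to 0.99
```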

## License

Apache 2.0