---
language: en
license: apache-2.0
library_name: setfit
tags:
  - text-classification
  - transaction-classification
  - banking
  - finance
  - setfit
  - sentence-transformers
  - few-shot-learning
  - contrastive-learning
datasets:
  - mitulshah/transaction-categorization
base_model: sentence-transformers/all-MiniLM-L6-v2
pipeline_tag: text-classification
model-index:
  - name: transaction-classifier-setfit
    results:
      - task:
          type: text-classification
          name: Transaction Classification
        metrics:
          - name: Real-World Accuracy (Weighted)
            type: accuracy
            value: 0.805
          - name: SetFit-Only Accuracy
            type: accuracy
            value: 0.667
          - name: Validation Accuracy
            type: accuracy
            value: 0.98
---

# Transaction Classifier — SetFit (v3)

A [SetFit](https://github.com/huggingface/setfit) model built on `sentence-transformers/all-MiniLM-L6-v2` that classifies bank transaction strings into 10 budget categories using contrastive few-shot learning.

This is **version 3** in a progressive model development series. It showed that pre-trained semantic embeddings outperform the earlier FastText approach (v2) for transaction classification, raising real-world accuracy from 55.7% to 80.5%.

## Model Details

| Property | Value |
|---|---|
| Base model | `sentence-transformers/all-MiniLM-L6-v2` (22M params) |
| Framework | SetFit (contrastive learning + logistic head) |
| Task | Multi-class text classification (10 categories) |
| Training samples | 8,000 |
| Contrastive iterations | 20 |
| Epochs | 1 |
| Batch size | 32 |
| Format | SafeTensors + model_head.pkl |
| Trained | 2026-03-28 |

## Categories

| ID | Category |
|---|---|
| 0 | Food & Dining |
| 1 | Transportation |
| 2 | Shopping & Retail |
| 3 | Entertainment & Recreation |
| 4 | Healthcare & Medical |
| 5 | Utilities & Services |
| 6 | Financial Services |
| 7 | Income |
| 8 | Government & Legal |
| 9 | Charity & Donations |
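
The table above can be expressed as a lookup for turning numeric predictions into readable names. This is a minimal sketch that assumes the model returns the integer IDs listed in the table; `ID_TO_CATEGORY` and `decode` are hypothetical helper names, not part of the SetFit API.

```python
# Lookup built from the category table above.
ID_TO_CATEGORY = {
    0: "Food & Dining",
    1: "Transportation",
    2: "Shopping & Retail",
    3: "Entertainment & Recreation",
    4: "Healthcare & Medical",
    5: "Utilities & Services",
    6: "Financial Services",
    7: "Income",
    8: "Government & Legal",
    9: "Charity & Donations",
}


def decode(predictions):
    """Map model outputs to category names.

    Assumption: predictions are integer IDs (plain ints, NumPy ints, or
    0-d tensors). Strings pass through unchanged in case the model
    already returns label names.
    """
    return [p if isinstance(p, str) else ID_TO_CATEGORY[int(p)] for p in predictions]


print(decode([0, 1, 3]))
# ['Food & Dining', 'Transportation', 'Entertainment & Recreation']
```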

## Performance

Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026).

### Overall

| Metric | Score |
|---|---|
| Real-world accuracy (weighted) | **80.5%** |
| SetFit-only accuracy | **66.7%** |
| Validation accuracy | 98.0% |
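
The card does not spell out the weighting scheme. One plausible reading of "505 unique, 3,113 weighted" is that each unique transaction string counts as many times as it occurs in the statement history. The sketch below illustrates that reading only; `weighted_accuracy` and the sample rows are hypothetical.

```python
def weighted_accuracy(rows):
    """Accuracy where each unique string counts `count` times.

    rows: list of (true_label, predicted_label, count) tuples.
    Assumption: `count` is how often the unique string occurred.
    """
    total = sum(count for _, _, count in rows)
    correct = sum(count for true, pred, count in rows if true == pred)
    return correct / total if total else 0.0


# Hypothetical data: three unique strings seen 10, 5, and 1 times.
rows = [
    ("Food & Dining", "Food & Dining", 10),
    ("Transportation", "Transportation", 5),
    ("Utilities & Services", "Shopping & Retail", 1),
]

# Same predictions with every string counted once, for comparison.
unweighted_rows = [(true, pred, 1) for true, pred, _ in rows]

print(f"{weighted_accuracy(rows):.4f}")             # 0.9375
print(f"{weighted_accuracy(unweighted_rows):.4f}")  # 0.6667
```

Under this reading, frequent merchants dominate the score, so a weighted figure can sit well above or below the per-unique-string figure.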

### Per-Category Accuracy

| Category | Accuracy |
|---|---|
| Healthcare & Medical | 100.0% |
| Income | 97.8% |
| Financial Services | 91.2% |
| Entertainment & Recreation | 88.6% |
| Food & Dining | 83.9% |
| Transportation | 83.3% |
| Shopping & Retail | 74.6% |
| Government & Legal | 54.5% |
| Utilities & Services | 34.2% |
| Charity & Donations | 0.0% |
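
A breakdown like the one above can be computed by grouping predictions by their true category. This is a minimal sketch, assuming per-category accuracy means the share of transactions in a true category that were predicted correctly (per-class recall). Whether the card's figures are occurrence-weighted is not stated, and `per_category_accuracy` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def per_category_accuracy(pairs):
    """Return {true_category: fraction predicted correctly}.

    pairs: list of (true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, pred in pairs:
        total[true] += 1
        correct[true] += int(true == pred)
    return {category: correct[category] / total[category] for category in total}


# Hypothetical labelled predictions.
pairs = [
    ("Income", "Income"),
    ("Income", "Income"),
    ("Charity & Donations", "Income"),
    ("Utilities & Services", "Utilities & Services"),
    ("Utilities & Services", "Shopping & Retail"),
]

for category, accuracy in per_category_accuracy(pairs).items():
    print(f"{category}: {accuracy:.1%}")
```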

## Usage

```python
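from setfit import SetFitModel

# Download the model from the Hugging Face Hub (requires `pip install setfit`).
model = SetFitModel.from_pretrained("maaz-zaidi/transaction-classifier-setfit")

# Classify raw transaction description strings; one prediction per input.
predictions = model.predict([
    "STARBUCKS STORE 12345",
    "SHELL GAS STATION",
    "NETFLIX.COM",
])
print(predictions)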
```

## Training Data

- **Primary**: [mitulshah/transaction-categorization](https://huggingface.co/datasets/mitulshah/transaction-categorization) - 8,000 training samples drawn from 3.6M records (gated dataset)
- **Evaluation**: 505 real-world RBC bank transactions (2019-2026)

## Key Breakthrough

SetFit's contrastive learning approach was the turning point in this project:
- **v2 (FastText) -> v3 (SetFit)**: real-world accuracy (weighted) rose from 55.7% to 80.5%.
- ML-only accuracy rose from 14.8% with FastText, which was severely biased toward the Income category, to 66.7% with SetFit.
- Pre-trained sentence embeddings understand real-world merchant concepts that character n-grams cannot capture.
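
The n-gram limitation can be illustrated with plain set overlap. The sketch below is a deliberately simplified stand-in, not FastText's actual featurization: it compares character trigram sets of transaction strings. `char_ngrams` and `jaccard` are hypothetical helpers, and `TIM HORTONS` is an example string that does not come from the evaluation data.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in `text` (case-insensitive)."""
    text = text.upper()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# Same merchant, different formatting: the trigram sets overlap.
print(jaccard("STARBUCKS", "STARBUCKS STORE 12345"))

# Two coffee chains with no shared trigrams: surface overlap is zero,
# even though both belong in Food & Dining.
print(jaccard("STARBUCKS", "TIM HORTONS"))  # 0.0
```

A sentence-embedding model is not limited to surface overlap, which is consistent with the accuracy gain reported above.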

## Part of a Series

See the [Transaction Classifier collection](https://huggingface.co/collections/maaz-zaidi/transaction-classifier) for all 7 model versions.

## Limitations

- Contrastive learning with a logistic regression head is outperformed by standard cross-entropy fine-tuning at this data scale (see v4)
- Three categories remain weak: Utilities & Services (34.2% accuracy), Government & Legal (54.5%), and Charity & Donations (0.0%)
- Domain-specific to Canadian banking transaction formats

## Citation

```bibtex
@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-setfit}
}
```