metadata
language: en
license: apache-2.0
library_name: setfit
tags:
  - text-classification
  - transaction-classification
  - banking
  - finance
  - setfit
  - sentence-transformers
  - few-shot-learning
  - contrastive-learning
datasets:
  - mitulshah/transaction-categorization
base_model: sentence-transformers/all-MiniLM-L6-v2
pipeline_tag: text-classification
model-index:
  - name: transaction-classifier-setfit
    results:
      - task:
          type: text-classification
          name: Transaction Classification
        metrics:
          - name: Real-World Accuracy (Weighted)
            type: accuracy
            value: 0.805
          - name: SetFit-Only Accuracy
            type: accuracy
            value: 0.667
          - name: Validation Accuracy
            type: accuracy
            value: 0.98

Transaction Classifier — SetFit (v3)

A SetFit model built on sentence-transformers/all-MiniLM-L6-v2 that classifies bank transaction strings into 10 budget categories using contrastive few-shot learning.

This is version 3 in a progressive model development series. Replacing v2's FastText model with pre-trained sentence embeddings raised real-world accuracy from 55.7% to 80.5%, a gain of 24.8 percentage points.

Model Details

| Property | Value |
|---|---|
| Base model | sentence-transformers/all-MiniLM-L6-v2 (22M params) |
| Framework | SetFit (contrastive learning + logistic head) |
| Task | Multi-class text classification (10 categories) |
| Training samples | 8,000 |
| Contrastive iterations | 20 |
| Epochs | 1 |
| Batch size | 32 |
| Format | SafeTensors + model_head.pkl |
| Trained | 2026-03-28 |
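
For a rough sense of scale, the settings above can be turned into an estimate of the contrastive training workload. This is a back-of-the-envelope sketch, not a figure reported by the card. It assumes that "Contrastive iterations" corresponds to SetFit's `num_iterations` setting and that each iteration draws one positive and one negative sentence pair per training sample.

```python
# Estimate of contrastive training size from the Model Details values.
# Assumption (not confirmed by this card): every iteration generates one
# positive and one negative pair for each training sample.
train_samples = 8_000
num_iterations = 20
batch_size = 32
epochs = 1

pairs_per_epoch = 2 * num_iterations * train_samples
steps = epochs * pairs_per_epoch // batch_size

print(f"{pairs_per_epoch:,} pairs, {steps:,} optimizer steps")
# prints: 320,000 pairs, 10,000 optimizer steps
```

Under those assumptions, a single epoch already covers 320,000 sentence pairs, which may explain why one epoch was used.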

Categories

| ID | Category |
|---|---|
| 0 | Food & Dining |
| 1 | Transportation |
| 2 | Shopping & Retail |
| 3 | Entertainment & Recreation |
| 4 | Healthcare & Medical |
| 5 | Utilities & Services |
| 6 | Financial Services |
| 7 | Income |
| 8 | Government & Legal |
| 9 | Charity & Donations |
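
If the model returns integer class IDs, the table above can be kept in code as a lookup. The sketch below is illustrative only: the names `ID2LABEL`, `LABEL2ID`, and `to_category` are hypothetical helpers, and it assumes predictions are integer-like values in the range 0 to 9. If the model already returns category strings, no mapping is needed.

```python
# Category lookup copied from the table above.
ID2LABEL = {
    0: "Food & Dining",
    1: "Transportation",
    2: "Shopping & Retail",
    3: "Entertainment & Recreation",
    4: "Healthcare & Medical",
    5: "Utilities & Services",
    6: "Financial Services",
    7: "Income",
    8: "Government & Legal",
    9: "Charity & Donations",
}

# Reverse lookup, handy when preparing labelled evaluation data.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_category(pred) -> str:
    """Map an integer-like prediction (int, NumPy scalar, 0-d tensor) to its name."""
    return ID2LABEL[int(pred)]


print(to_category(7))  # Income
```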

Performance

Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026).

Overall

| Metric | Score |
|---|---|
| Real-world accuracy (weighted) | 80.5% |
| SetFit-only accuracy | 66.7% |
| Validation accuracy | 98.0% |
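
The card does not define how the weighted figure is computed. One plausible reading is that each unique transaction string counts as many times as it occurs in the statement history, which would turn 505 unique strings into 3,113 weighted ones. The sketch below implements that reading on toy data. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def weighted_accuracy(
    y_true: Sequence, y_pred: Sequence, weights: Sequence[float]
) -> float:
    """Accuracy where each example contributes its weight instead of 1."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / total


# Toy data: three unique strings seen 10, 3 and 1 times (assumed counts).
y_true = [0, 1, 5]
y_pred = [0, 1, 2]
counts = [10, 3, 1]

print(f"{weighted_accuracy(y_true, y_pred, counts):.3f}")     # 0.929
print(f"{weighted_accuracy(y_true, y_pred, [1, 1, 1]):.3f}")  # 0.667
```

In the toy data, one rare string is misclassified, so the weighted score is higher than the unweighted one. A frequent merchant being wrong would have the opposite effect.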

Per-Category Accuracy

| Category | Accuracy |
|---|---|
| Income | 97.8% |
| Healthcare & Medical | 100.0% |
| Financial Services | 91.2% |
| Entertainment & Recreation | 88.6% |
| Food & Dining | 83.9% |
| Transportation | 83.3% |
| Shopping & Retail | 74.6% |
| Government & Legal | 54.5% |
| Utilities & Services | 34.2% |
| Charity & Donations | 0.0% |
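
Per-category accuracy of this kind is usually the share of each true category's examples that were predicted correctly. The sketch below shows that calculation on toy data. The function name, the optional weights, and the sample values are assumptions for illustration, and the card does not say whether its per-category figures are weighted.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def per_category_accuracy(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Fraction of each true category's (weighted) examples predicted correctly."""
    if weights is None:
        weights = [1.0] * len(y_true)
    correct = defaultdict(float)
    total = defaultdict(float)
    for truth, pred, w in zip(y_true, y_pred, weights):
        total[truth] += w
        if truth == pred:
            correct[truth] += w
    # Skip categories whose total weight is zero to avoid division by zero.
    return {cat: correct[cat] / total[cat] for cat in total if total[cat] > 0}


# Toy data: the only donation example is absorbed by another category.
y_true = ["Income", "Income", "Charity & Donations"]
y_pred = ["Income", "Income", "Income"]
print(per_category_accuracy(y_true, y_pred))
# {'Income': 1.0, 'Charity & Donations': 0.0}
```

The toy example also shows how a small category can score 0.0%: if every one of its examples is assigned to another class, no correct prediction remains.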

Usage

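from setfit import SetFitModel

# Download the fine-tuned model from the Hugging Face Hub
model = SetFitModel.from_pretrained("maaz-zaidi/transaction-classifier-setfit")

# Classify raw transaction strings; one prediction is returned per input
predictions = model.predict([
    "STARBUCKS STORE 12345",
    "SHELL GAS STATION",
    "NETFLIX.COM",
])
print(predictions)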
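
Given the uneven per-category results, it may be useful to inspect class probabilities rather than only the top prediction. The helper below is a minimal sketch. `flag_low_confidence` and the 0.5 threshold are hypothetical, it accepts any object exposing a `predict_proba` method (as `SetFitModel` does), and it assumes the result can be converted to a 2-D NumPy array with one column per class. The returned index is a column position, which matches the category ID only if the classifier head orders its classes 0 to 9.

```python
import numpy as np


def flag_low_confidence(model, texts, threshold=0.5):
    """Return (text, best_column, confidence, needs_review) for each input."""
    # Assumes predict_proba yields an array-like of shape (n_texts, n_classes).
    probs = np.asarray(model.predict_proba(texts))
    best = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    return [
        (text, int(idx), float(conf), bool(conf < threshold))
        for text, idx, conf in zip(texts, best, confidence)
    ]
```

With the model loaded as in the snippet above, `flag_low_confidence(model, ["NETFLIX.COM"])` would return one tuple per string. Rows with `needs_review` set to `True` are candidates for manual checking.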

Training Data

The model metadata lists mitulshah/transaction-categorization as the dataset, and 8,000 samples were used for training (see Model Details).

Key Breakthrough

SetFit's contrastive learning approach was the breakthrough moment in this project:

  • v2 (FastText) -> v3 (SetFit): 55.7% -> 80.5% overall accuracy
  • FastText's ML-only accuracy was 14.8% (severe Income category bias). SetFit's ML-only accuracy was 66.7%.
  • Pre-trained sentence embeddings understand real-world merchant concepts that character n-grams cannot capture.
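
The last point can be illustrated with a small experiment. The sketch below extracts FastText-style character n-grams (lengths 3 to 6, with `<` and `>` boundary markers) from whole merchant strings. It is an illustration only: the actual v2 configuration is not described in this card, treating the full string as one token is a simplification, and "SPOTIFY" and "STARBUCKS #221" are made-up example strings.

```python
def char_ngrams(text: str, n_min: int = 3, n_max: int = 6) -> set:
    """Character n-grams of a lowercased string wrapped in boundary markers."""
    token = f"<{text.lower()}>"
    return {
        token[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    }


# Two streaming services: related in meaning, unrelated in spelling.
netflix = char_ngrams("NETFLIX.COM")
spotify = char_ngrams("SPOTIFY")
print(len(netflix & spotify))  # 0

# Two spellings of the same merchant share many n-grams.
store_a = char_ngrams("STARBUCKS STORE 12345")
store_b = char_ngrams("STARBUCKS #221")
print(len(store_a & store_b) > 0)  # True
```

Character n-grams link strings that look alike, but they give no signal that two differently spelled merchants belong to the same category. A pre-trained sentence encoder can carry that signal from its pre-training data.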

Part of a Series

See the Transaction Classifier collection for all 7 model versions.

Limitations

  • Contrastive learning with a logistic regression head is outperformed by standard cross-entropy fine-tuning at this data scale (see v4)
  • Weak categories on real-world data: Charity & Donations at 0.0%, Utilities & Services at 34.2%, and Government & Legal at 54.5% accuracy
  • Domain-specific to Canadian banking transaction formats

Citation

@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-setfit}
}