Note: This hybrid program is intended to be used in its corresponding Space.

ScandiProb is an intentionally data-constrained, multi-label hybrid text classifier for language identification of Norwegian, Swedish, and Danish, based on ScandiBERT. It was developed as an undergraduate final project for a Spring 2026 NLP course at the University of Alaska Fairbanks and is licensed under AGPL-3.0.

The full program uses a fine-tuned ScandiBERT, trained on a limited amount of OPUS-100 data, combined with regex-enforced heuristics. It achieves a macro-F1 score of ~93% on the OPUS-100 test set and ~84% on the comprehensive SLIDE eval set, using only a fraction of the training data used in SLIDE.
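To illustrate how a classifier's per-language scores might be combined with regex heuristics in a multi-label setting, here is a minimal sketch. It is not the project's actual code: the label codes (`no`, `sv`, `da`), the character cues, the `penalty` factor, and the `threshold` are all assumptions chosen for illustration, and the probabilities are passed in by hand rather than produced by the fine-tuned ScandiBERT.

```python
import re

# Hypothetical label codes for Norwegian, Swedish, and Danish.
LANGS = ("no", "sv", "da")

# Illustrative character cues (assumed, not taken from the project):
# Swedish orthography uses ä/ö, while Norwegian and Danish use æ/ø.
SWEDISH_CUE = re.compile(r"[äö]", re.IGNORECASE)
NORDANISH_CUE = re.compile(r"[æø]", re.IGNORECASE)


def apply_heuristics(text, probs, penalty=0.5):
    """Down-weight languages whose alphabet conflicts with the text.

    `probs` maps each label to an independent (multi-label) probability,
    as a sigmoid-style classifier head would produce.
    """
    adjusted = dict(probs)
    has_swedish = bool(SWEDISH_CUE.search(text))
    has_nordanish = bool(NORDANISH_CUE.search(text))
    if has_swedish and not has_nordanish:
        adjusted["no"] *= penalty
        adjusted["da"] *= penalty
    elif has_nordanish and not has_swedish:
        adjusted["sv"] *= penalty
    return adjusted


def predict_labels(text, probs, threshold=0.5):
    """Return every language whose adjusted score clears the threshold."""
    adjusted = apply_heuristics(text, probs)
    return [lang for lang in LANGS if adjusted[lang] >= threshold]


if __name__ == "__main__":
    # Hand-written scores standing in for model output.
    scores = {"no": 0.6, "sv": 0.7, "da": 0.55}
    print(predict_labels("Jag älskar kött", scores))
    print(predict_labels("Jeg elsker kjøtt", scores))
```

Because each label is thresholded independently, a sentence that is valid in more than one language (common between Norwegian and Danish) can keep several labels, which is the point of a multi-label formulation.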

(GitHub | Kaggle Notebooks | Space)


Model size: 0.1B params (Safetensors, tensor type F32)

Model tree for ianro04/ScandiProb: this model is fine-tuned from the base model vesteinn/ScandiBERT.
