Note: This hybrid program is intended to be used in its corresponding Space.
ScandiProb is an intentionally data-constrained, multi-label hybrid text classifier for language identification of Norwegian, Swedish, and Danish, based on ScandiBERT. It was developed as an undergraduate final project for a Spring 2026 NLP course at the University of Alaska Fairbanks and is licensed under AGPL-3.0.
The full program uses a fine-tuned ScandiBERT model, trained on a limited amount of OPUS-100 data and combined with regex-enforced heuristics. It achieves a macro-F1 score of ~93% on the OPUS-100 test set and ~84% on the comprehensive SLIDE eval set, while using only a fraction of the training data used in SLIDE.
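As a rough illustration of how model probabilities and regex heuristics can be combined in a multi-label setting, here is a minimal Python sketch. It is not the project's actual rule set: the label codes, the `apply_heuristics` and `predict_labels` helpers, the `penalty` factor, and the `threshold` are all hypothetical, and the probabilities are passed in by hand rather than produced by ScandiBERT. The only linguistic fact it relies on is that Swedish orthography uses ä/ö where Danish and Norwegian use æ/ø.

```python
import re

# Hypothetical label codes for Danish, Norwegian, and Swedish.
LABELS = ("da", "no", "sv")

# Orthographic cues: Swedish writes ä/ö, Danish and Norwegian write æ/ø.
SWEDISH_CHARS = re.compile(r"[äö]", re.IGNORECASE)
DANO_NORWEGIAN_CHARS = re.compile(r"[æø]", re.IGNORECASE)


def apply_heuristics(text, probs, penalty=0.5):
    """Scale down labels whose orthography conflicts with the text.

    `probs` maps each label to an independent probability, as a
    multi-label classifier head would produce. `penalty` is an
    assumed value chosen only for illustration.
    """
    adjusted = dict(probs)
    has_swedish = bool(SWEDISH_CHARS.search(text))
    has_dano_norwegian = bool(DANO_NORWEGIAN_CHARS.search(text))
    if has_swedish and not has_dano_norwegian:
        adjusted["da"] *= penalty
        adjusted["no"] *= penalty
    elif has_dano_norwegian and not has_swedish:
        adjusted["sv"] *= penalty
    return adjusted


def predict_labels(text, probs, threshold=0.5):
    """Return every label whose adjusted probability clears the threshold."""
    adjusted = apply_heuristics(text, probs)
    return [label for label in LABELS if adjusted[label] >= threshold]


if __name__ == "__main__":
    # Stand-in probabilities; a real run would take these from the model.
    print(predict_labels("Jag älskar att läsa böcker",
                         {"da": 0.6, "no": 0.55, "sv": 0.9}))
```

Because each label is thresholded independently, a sentence that is plausible in both Danish and Norwegian can keep both labels, which is the point of treating the task as multi-label rather than single-label.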
(GitHub | Kaggle Notebooks | Space)
Model tree for ianro04/ScandiProb
Base model: vesteinn/ScandiBERT