Sentence Similarity
sentence-transformers
GGUF
feature-extraction
medical
biology
embeddings

BioLORD-2023-M GGUF

GGUF-quantized version of FremyCompany/BioLORD-2023-M, a multilingual biomedical sentence embedding model trained with the BioLORD strategy on SNOMED CT and UMLS ontologies.

Model Details

| Property | Value |
|---|---|
| Architecture | XLM-RoBERTa (12 layers, 768-dim) |
| Parameters | ~278M |
| Context length | 512 tokens |
| Pooling | Mean token pooling |
| Quantization | Q8_0 |
| File size | ~296 MB |
| Base model | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 |
| Languages | English, Spanish, French, German, Italian*, Dutch, Danish, Swedish |

* Italian is not officially supported by the upstream model, but in our tests cross-lingual similarity (IT↔EN) on biomedical terms scores 0.95–0.99, on par with the officially supported languages.
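
The mean token pooling listed above averages the token vectors of a sentence, skipping padding positions. A minimal NumPy sketch for illustration (the shapes and the `mean_pool` helper are assumptions, not the model's actual code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring positions where mask == 0.

    token_embeddings: (batch, seq_len, dim), attention_mask: (batch, seq_len).
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid div-by-zero
    return summed / counts
```

Padding tokens contribute nothing to the sentence vector, which is why the same pooling works for inputs of different lengths.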

Available Files

| File | Quantization | Size | Description |
|---|---|---|---|
| BioLORD-2023-M-Q8_0.gguf | Q8_0 | ~296 MB | 8-bit quantization, near-lossless quality |
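
The "near-lossless" quality of Q8_0 comes from storing weights as signed 8-bit integers with one scale per 32-value block. A simplified round-trip sketch of that idea (not llama.cpp's exact layout, which additionally stores the scale as fp16):

```python
import numpy as np

BLOCK = 32  # Q8_0 quantizes in blocks of 32 values

def q8_0_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize to int8 per 32-value block, then dequantize again."""
    blocks = x.reshape(-1, BLOCK)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)                 # all-zero block
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return (q * scale).reshape(-1)
```

The per-element reconstruction error is bounded by half a quantization step (scale / 2), which is why 8-bit embeddings stay very close to the fp32 originals.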

Usage with llama.cpp

```bash
# Generate an embedding for a single input
llama-embedding -m BioLORD-2023-M-Q8_0.gguf -p "atrial fibrillation"
```
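
The vectors produced by `llama-embedding` (or by a binding such as llama-cpp-python's `Llama(..., embedding=True)`, an assumption about your toolchain) are typically compared with cosine similarity. A minimal helper for two already-extracted vectors:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Scores near 1.0 indicate near-identical meaning (e.g. a term and its translation); unrelated concepts score much lower.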

About BioLORD-2023-M

BioLORD is a pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. It overcomes limitations of prior methods by grounding concept representations using definitions and short descriptions derived from biomedical ontologies (SNOMED CT, UMLS).

BioLORD-2023-M is the multilingual variant, distilled from the English-only BioLORD-2023 model. It achieves state-of-the-art results for text similarity on clinical sentences (MedSTS) and biomedical concepts (EHR-Rel-B).

Sibling models

License

This model inherits the licensing terms of the original FremyCompany/BioLORD-2023-M.

Important: The training data includes concepts from SNOMED CT (IHTSDO license) and UMLS (NLM license). Users must comply with the respective data use agreements for each of these sources.

The model weights themselves derive from the paraphrase-multilingual-mpnet-base-v2 base (Apache 2.0), but the combined work carries the IHTSDO and NLM licensing constraints from the training data.

Citation

```bibtex
@article{remy-etal-2023-biolord,
    author = {Remy, François and Demuynck, Kris and Demeester, Thomas},
    title = "{BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights}",
    journal = {Journal of the American Medical Informatics Association},
    pages = {ocae029},
    year = {2024},
    month = {02},
    doi = {10.1093/jamia/ocae029},
}
```

Conversion

Converted from safetensors to GGUF with llama.cpp's `convert_hf_to_gguf.py`.
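
A sketch of the conversion command, assuming a local llama.cpp checkout and the upstream safetensors model downloaded to `./BioLORD-2023-M` (both paths are illustrative):

```shell
# Convert the HF safetensors model straight to a Q8_0 GGUF file
python llama.cpp/convert_hf_to_gguf.py ./BioLORD-2023-M \
    --outtype q8_0 \
    --outfile BioLORD-2023-M-Q8_0.gguf
```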
