Sentence Similarity
sentence-transformers
GGUF
feature-extraction
medical
biology
embeddings

BioLORD-2023-M GGUF

GGUF-quantized version of FremyCompany/BioLORD-2023-M, a multilingual biomedical sentence embedding model trained with the BioLORD strategy on SNOMED CT and UMLS ontologies.

Model Details

| Property | Value |
|---|---|
| Architecture | XLM-RoBERTa (12 layers, 768-dim) |
| Parameters | ~278M |
| Context length | 512 tokens |
| Pooling | Mean token pooling |
| Quantization | Q8_0 |
| File size | ~296 MB |
| Base model | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 |
| Languages | English, Spanish, French, German, Italian*, Dutch, Danish, Swedish |

* Italian is not officially supported by the upstream model, but in our tests cross-lingual similarity (IT↔EN) on biomedical terms scores 0.95–0.99, on par with the officially supported languages.
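
The mean token pooling listed above averages the token vectors of a sentence, skipping padding positions. A minimal NumPy sketch for illustration (the shapes and the `mean_pool` helper are assumptions, not the model's actual code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring positions where mask == 0.

    token_embeddings: (batch, seq_len, dim), attention_mask: (batch, seq_len).
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid div-by-zero
    return summed / counts
```

Padding tokens contribute nothing to the sentence vector, which is why the same pooling works for inputs of different lengths.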

Available Files

| File | Quantization | Size | Description |
|---|---|---|---|
| BioLORD-2023-M-Q8_0.gguf | Q8_0 | ~296 MB | 8-bit quantization, near-lossless quality |
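
The "near-lossless" quality of Q8_0 comes from storing weights as signed 8-bit integers with one scale per 32-value block. A simplified round-trip sketch of that idea (not llama.cpp's exact layout, which additionally stores the scale as fp16):

```python
import numpy as np

BLOCK = 32  # Q8_0 quantizes in blocks of 32 values

def q8_0_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize to int8 per 32-value block, then dequantize again."""
    blocks = x.reshape(-1, BLOCK)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)                 # all-zero block
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return (q * scale).reshape(-1)
```

The per-element reconstruction error is bounded by half a quantization step (scale / 2), which is why 8-bit embeddings stay very close to the fp32 originals.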

Usage with llama.cpp

```bash
# Generate an embedding for a single input
llama-embedding -m BioLORD-2023-M-Q8_0.gguf -p "atrial fibrillation"
```
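
The vectors produced by `llama-embedding` (or by a binding such as llama-cpp-python's `Llama(..., embedding=True)`, an assumption about your toolchain) are typically compared with cosine similarity. A minimal helper for two already-extracted vectors:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Scores near 1.0 indicate near-identical meaning (e.g. a term and its translation); unrelated concepts score much lower.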

About BioLORD-2023-M

BioLORD is a pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. It overcomes limitations of prior methods by grounding concept representations using definitions and short descriptions derived from biomedical ontologies (SNOMED CT, UMLS).

BioLORD-2023-M is the multilingual variant, distilled from the English-only BioLORD-2023 model. It achieves state-of-the-art results for text similarity on clinical sentences (MedSTS) and biomedical concepts (EHR-Rel-B).

Sibling models

License

This model inherits the licensing terms of the original FremyCompany/BioLORD-2023-M.

Important: The training data includes concepts from SNOMED CT (IHTSDO license) and UMLS (NLM license). Users must comply with the respective data use agreements for each of these sources.

The model weights themselves derive from the paraphrase-multilingual-mpnet-base-v2 base (Apache 2.0), but the combined work carries the IHTSDO and NLM licensing constraints from the training data.

Citation

```bibtex
@article{remy-etal-2023-biolord,
    author = {Remy, François and Demuynck, Kris and Demeester, Thomas},
    title = "{BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights}",
    journal = {Journal of the American Medical Informatics Association},
    pages = {ocae029},
    year = {2024},
    month = {02},
    doi = {10.1093/jamia/ocae029},
}
```

Conversion

Converted from safetensors to GGUF with llama.cpp's `convert_hf_to_gguf.py`.
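
A sketch of the conversion command, assuming a local llama.cpp checkout and the upstream safetensors model downloaded to `./BioLORD-2023-M` (both paths are illustrative):

```shell
# Convert the HF safetensors model straight to a Q8_0 GGUF file
python llama.cpp/convert_hf_to_gguf.py ./BioLORD-2023-M \
    --outtype q8_0 \
    --outfile BioLORD-2023-M-Q8_0.gguf
```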
