# BioLORD-2023-M GGUF
GGUF-quantized version of FremyCompany/BioLORD-2023-M, a multilingual biomedical sentence embedding model trained with the BioLORD strategy on SNOMED CT and UMLS ontologies.
## Model Details
| Property | Value |
|---|---|
| Architecture | XLM-RoBERTa (12 layers, 768-dim) |
| Parameters | ~278M |
| Context length | 512 tokens |
| Pooling | Mean token pooling |
| Quantization | Q8_0 |
| File size | ~296 MB |
| Base model | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 |
| Languages | English, Spanish, French, German, Italian*, Dutch, Danish, Swedish |
\* Italian is not officially supported by the upstream model, but cross-lingual similarity tests (IT↔EN) score 0.95–0.99 on biomedical terms, on par with the officially supported languages.
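The "Mean token pooling" row above means the sentence embedding is the average of the token embeddings, weighted by the attention mask so padding positions are ignored. A minimal numpy sketch of that operation (illustrative only, not the model's internal code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, skipping padding tokens."""
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)
    counts = np.clip(mask.sum(axis=0), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy example: 3 tokens of dimension 4, last token is padding
tokens = np.array([[1.0, 2.0, 3.0, 4.0],
                   [3.0, 4.0, 5.0, 6.0],
                   [9.0, 9.0, 9.0, 9.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # averages only the two unmasked rows
```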
## Available Files
| File | Quantization | Size | Description |
|---|---|---|---|
| BioLORD-2023-M-Q8_0.gguf | Q8_0 | ~296 MB | 8-bit quantization, near-lossless quality |
## Usage with llama.cpp
```sh
# Generate an embedding for a single phrase
llama-embedding -m BioLORD-2023-M-Q8_0.gguf -p "atrial fibrillation"
```
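Embedding vectors produced this way are typically compared with cosine similarity. A small helper, where the vectors below are illustrative placeholders standing in for real model output:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for the embeddings of
# "atrial fibrillation" (EN) and "fibrilación auricular" (ES)
emb_en = np.array([0.12, -0.40, 0.33, 0.08])
emb_es = np.array([0.11, -0.38, 0.35, 0.06])
print(f"similarity: {cosine_similarity(emb_en, emb_es):.3f}")
```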
## About BioLORD-2023-M
BioLORD is a pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. It overcomes limitations of prior methods by grounding concept representations using definitions and short descriptions derived from biomedical ontologies (SNOMED CT, UMLS).
BioLORD-2023-M is the multilingual variant, distilled from the English-only BioLORD-2023 model. It achieves state-of-the-art results for text similarity on clinical sentences (MedSTS) and biomedical concepts (EHR-Rel-B).
### Sibling models
- BioLORD-2023 — best monolingual English model
- BioLORD-2023-S — monolingual English, no model averaging
- BioLORD-2023-C — contrastive training only
## License
This model inherits the licensing terms of the original FremyCompany/BioLORD-2023-M.
Important: The training data includes concepts from SNOMED CT (IHTSDO license) and UMLS (NLM license). Users must comply with the respective data use agreements:
- SNOMED CT: Requires an IHTSDO affiliate license for use in countries without a national license.
- UMLS: Requires a free UMLS Terminology Services (UTS) account and agreement to the UMLS Metathesaurus License.
The model weights themselves derive from the paraphrase-multilingual-mpnet-base-v2 base (Apache 2.0), but the combined work carries the IHTSDO and NLM licensing constraints from the training data.
## Citation
```bibtex
@article{remy-etal-2023-biolord,
    author  = {Remy, François and Demuynck, Kris and Demeester, Thomas},
    title   = "{BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights}",
    journal = {Journal of the American Medical Informatics Association},
    pages   = {ocae029},
    year    = {2024},
    month   = {02},
    doi     = {10.1093/jamia/ocae029},
}
```
## Conversion
Converted from safetensors to GGUF using llama.cpp's `convert_hf_to_gguf.py` script.