VAGO solutions SauerkrautLM-LFM2.5-GLiNER
Zero-Shot NER Model: bidirectional GLiNER on an LFM2.5-350M backbone, with strong multilingual, PII and biomedical entity extraction
Introducing SauerkrautLM-LFM2.5-GLiNER, our zero-shot Named Entity Recognition model built on the LFM2.5-350M backbone, converted from causal to bidirectional attention and fine-tuned for the GLiNER span-label matching task.
- Zero-shot extraction of arbitrary entity types provided as labels at inference, with no retraining
- Multilingual: English, French, German, Italian, Spanish
- Best PII / privacy extraction among the compared models in all five languages (avg 79.5 F1)
- Large margin on biomedical NER (BioNLP-CG: 54.6 F1 vs. 36.3 for the next-best compared model)
- Compact 350M backbone that is efficient to deploy
Table of Contents
- Overview of all SauerkrautLM-LFM2.5-GLiNER Models
- Model Details
- Capabilities
- Evaluation
- Usage
- Disclaimer
- Contact
- Collaborations
- Citation
- Acknowledgement
All SauerkrautLM-LFM2.5-GLiNER Models
| Model | HF | ONNX | GGUF |
|---|---|---|---|
| SauerkrautLM-LFM2.5-GLiNER | Link | coming soon | coming soon |
Model Details
- Model Name: SauerkrautLM-LFM2.5-GLiNER
- Backbone: LFM2.5-350M (causal → bidirectional)
- Task: Zero-shot Named Entity Recognition (GLiNER span-label matching)
- Language(s): English, French, German, Italian, Spanish
- License: lfm1.0
- Contact: VAGO solutions
Architecture
- Backbone: LFM2.5-350M, converted from causal to bidirectional attention (full self-attention + symmetric, center-padded convolutions) so that every token attends to the full context in both directions.
- Head: standard GLiNER scoring head. Text spans and entity-type labels pass through the same encoder; entities are predicted via the dot product between span representations and label representations in a shared latent space. Because labels are free-form phrases supplied at inference, the model performs open-vocabulary, zero-shot extraction.
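The scoring step described above can be illustrated with a minimal sketch. It assumes that span and label embeddings have already been produced by the encoder; the 3-dimensional vectors, the `score_spans` helper and the 0.5 cut-off are invented for illustration and are not taken from the model or the GLiNER library.

```python
import math


def dot(u, v):
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_spans(span_vecs, label_vecs):
    """Return {(span, label): probability} from dot products in a shared space."""
    return {
        (span, label): sigmoid(dot(s_vec, l_vec))
        for span, s_vec in span_vecs.items()
        for label, l_vec in label_vecs.items()
    }


# Toy embeddings (hypothetical values, not real model outputs).
span_vecs = {
    "Siemens": [2.0, -1.0, 0.0],
    "Maria Schmidt": [-1.0, 2.0, 0.0],
}
label_vecs = {
    "organization": [2.0, -1.0, 0.0],
    "person": [-1.0, 2.0, 0.0],
}

scores = score_spans(span_vecs, label_vecs)
# Keep the (span, label) pairs whose probability clears the threshold.
predictions = {pair: p for pair, p in scores.items() if p > 0.5}
```

Because the label vectors come from encoding free-form label text, adding a new entity type only means adding another entry to `label_vecs`; nothing in the scoring step is tied to a fixed label set.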
Why Bidirectional Attention?
The backbone was pretrained causally: each token only sees the tokens before it. NER, however, is not a left-to-right generation task: deciding what a token is and where an entity starts and ends frequently depends on context that appears after the token. Converting the encoder to bidirectional attention (and replacing the causal, left-padded convolutions with symmetric, center-padded ones) lets every token condition on the full sentence.
Cases where right-hand context is decisive:
- Type disambiguation: In "Apple slipped 3% after the iPhone launch" vs "Apple is rich in fiber", the token Apple is an ORG in the first but not an entity in the second. The distinguishing evidence (slipped 3% / rich in fiber) comes later, so a causal encoder cannot see it when encoding Apple.
- Span boundaries: In "Bank of America", whether Bank opens a multi-token organization span only becomes clear from the of America that follows.
- Structured / PII formats: Emails, phone numbers, IBANs and IDs are defined by their whole surface form. Recognizing such a span requires seeing the trailing characters (domain, check digits, separators), which a left-to-right view truncates. This is a likely cause of the weak PII recall observed with the causal backbone.
This is also why a masked-language-modeling adaptation stage precedes task training: bidirectional MLM is what teaches the converted encoder to actually use right-hand context and to build the surface-form sensitivity that structured-entity recall depends on. The measured outcome is consistent with this: strong PII results across all five languages and a large biomedical-NER margin.
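To make the difference in visible context concrete, the sketch below lists which positions a token can draw on under causal versus bidirectional attention, and under left-padded versus center-padded convolutions. It is a toy illustration only: the helper names are hypothetical and the kernel size of 3 is an assumption, not a documented property of the backbone.

```python
def attention_visible(seq_len, pos, causal):
    """Indices a token at `pos` may attend to."""
    last = pos if causal else seq_len - 1
    return list(range(0, last + 1))


def conv_visible(seq_len, pos, kernel_size, causal):
    """Indices inside the convolution window that produces output `pos`."""
    if causal:
        # Left padding: the window ends at `pos`, so it only looks backwards.
        start, end = pos - kernel_size + 1, pos
    else:
        # Center padding: the window is centered on `pos` (odd kernel size).
        half = kernel_size // 2
        start, end = pos - half, pos + half
    # Positions outside the sequence are padding and carry no information.
    return [i for i in range(start, end + 1) if 0 <= i < seq_len]


tokens = ["Apple", "is", "rich", "in", "fiber"]
n = len(tokens)

# Context available when encoding the first token, "Apple".
causal_attn = [tokens[i] for i in attention_visible(n, 0, causal=True)]
bidir_attn = [tokens[i] for i in attention_visible(n, 0, causal=False)]
causal_conv = [tokens[i] for i in conv_visible(n, 0, 3, causal=True)]
center_conv = [tokens[i] for i in conv_visible(n, 0, 3, causal=False)]
```

In the causal setting "Apple" is encoded from itself alone, while the bidirectional setting exposes "rich in fiber", which is the evidence needed to decide that it is not an organization here.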
Training Procedure
The model is produced in three sequential stages.
Stage 1: Bidirectional MLM adaptation. The causal backbone is converted to bidirectional attention and adapted with a masked-language-modeling objective. This teaches the model to use both left and right context and builds the surface-form / format sensitivity that NER (especially structured entities) depends on. Output: a dense bidirectional encoder checkpoint.
- Data: ≈3.8M documents (multilingual), 2 epochs
- Sequence length 512, 15% token masking
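As a rough illustration of the masking setup, the sketch below truncates a token sequence to 512 positions and masks 15% of them. It is a simplified stand-in under stated assumptions: the `[MASK]` string, the `mask_tokens` helper and the choice to mask an exact 15% share are hypothetical, and the model card does not say whether the actual training used per-token sampling or a replacement scheme such as 80/10/10.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumed, not the real tokenizer's)


def mask_tokens(tokens, mask_prob=0.15, max_len=512, seed=0):
    """Return (inputs, labels) for a masked-language-modeling example.

    `labels` holds the original token at masked positions and None elsewhere,
    so the loss would only be computed where a token was hidden.
    """
    tokens = tokens[:max_len]
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels


# A synthetic 600-token document, truncated to the 512-token sequence length.
document = [f"tok{i}" for i in range(600)]
inputs, labels = mask_tokens(document)
```

With bidirectional attention, the prediction for each masked position can use tokens on both sides of it, which is what pushes the converted encoder to rely on right-hand context.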
Stage 2: GLiNER task training. The adapted encoder is fine-tuned on the GLiNER NER objective (BCE over span-label scores), establishing the shared latent space between spans and labels and the zero-shot capability.
- Data: ≈772k annotated examples (multilingual)
- ≈110k distinct entity-type labels (free-form phrases), enabling open-vocabulary zero-shot extraction
Stage 3: Refinement on higher-quality data. A second fine-tuning pass on a smaller, cleaned, higher-quality set sharpens precision and recall. This stage delivers the main quality gains over the task-trained model.
- Data: ≈79k high-quality examples (multilingual)
- ≈96k distinct entity-type labels
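The Stage 2 objective, binary cross-entropy over span-label scores, can be sketched as follows. This is a minimal pure-Python illustration, not the training code: the logits are invented, and `span_label_loss` is a hypothetical helper that averages over every (span, label) cell, which may differ from how the real loss is reduced or weighted.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def span_label_loss(logits, gold):
    """Mean BCE over all (span, label) cells; `gold` is the set of true pairs."""
    losses = [
        bce_with_logits(logit, 1.0 if pair in gold else 0.0)
        for pair, logit in logits.items()
    ]
    return sum(losses) / len(losses)


gold = {("Siemens", "organization")}

# Scores that agree with the annotation give a small loss ...
good_logits = {("Siemens", "organization"): 5.0, ("Siemens", "location"): -4.0}
# ... while scores that contradict it give a large one.
bad_logits = {("Siemens", "organization"): -4.0, ("Siemens", "location"): 5.0}

good_loss = span_label_loss(good_logits, gold)
bad_loss = span_label_loss(bad_logits, gold)
```

Each (span, label) pair is treated as an independent yes/no decision, which is why the label set can change freely from one training example, or one inference request, to the next.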
Capabilities
- Zero-shot NER: extract arbitrary entity types provided as labels at inference time, no retraining required.
- Multilingual: English, French, German, Italian, Spanish.
- Strong on general NER (CrossNER), privacy / PII entities, and domain benchmarks (biomedical).
Evaluation
All models were evaluated under a single shared benchmark harness (F1 ×100). Absolute scores may differ from those reported by other published pipelines; the relative differences between models remain consistent.
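For readers unfamiliar with the metric, the sketch below computes an entity-level F1 on the ×100 scale. The matching rule is an assumption: it counts a prediction as correct only on an exact match of document, character offsets and label, and it micro-averages over all entities. The model card does not state which matching or averaging rule the harness uses, and `micro_f1` is a hypothetical helper.

```python
def micro_f1(gold, predicted):
    """Entity-level micro F1 (x100) over sets of (doc_id, start, end, label)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


# Invented annotations for a single document.
gold = {
    (0, 0, 13, "person"),
    (0, 27, 34, "organization"),
    (0, 38, 45, "location"),
}
predicted = {
    (0, 0, 13, "person"),          # correct
    (0, 27, 34, "organization"),   # correct
    (0, 38, 45, "organization"),   # right span, wrong label
}

score = micro_f1(gold, predicted)  # precision = recall = 2/3
```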
Capability Overview: Final Checkpoint
| Benchmark | F1 |
|---|---|
| CrossNER, English (avg) | 78.4 |
| CrossNER, multilingual (avg) | 72.5 |
| Privacy / PII, multilingual (avg) | 79.5 |
| Biomedical NER (BioNLP-CG) | 54.6 |
CrossNER: Multilingual Zero-Shot NER
| Model | EN | FR | DE | IT | ES | avg |
|---|---|---|---|---|---|---|
| SauerkrautLM-LFM2.5-GLiNER (ours) | 78.4 | 71.4 | 69.0 | 71.2 | 72.4 | 72.5 |
| SauerkrautLM-GLiNER | 73.8 | 71.2 | 68.7 | 71.3 | 72.0 | 71.4 |
| urchade/gliner_large-v2.1 | 71.9 | 57.3 | 55.8 | 58.1 | 58.6 | 60.3 |
| urchade/gliner_multi-v2.1 | 72.2 | 46.7 | 46.8 | 48.1 | 48.9 | 52.5 |
Privacy / PII: Multilingual Entity Extraction
| Model | EN | FR | DE | IT | ES | avg |
|---|---|---|---|---|---|---|
| SauerkrautLM-LFM2.5-GLiNER (ours) | 78.7 | 81.8 | 76.5 | 79.4 | 81.4 | 79.5 |
| urchade/gliner_large-v2.1 | 72.0 | 76.1 | 70.3 | 68.9 | 72.2 | 71.9 |
| urchade/gliner_multi-v2.1 | 51.1 | 62.2 | 58.6 | 57.6 | 58.0 | 57.5 |
| SauerkrautLM-GLiNER | 65.8 | 52.9 | 57.8 | 53.6 | 46.2 | 55.2 |
Biomedical NER: BioNLP-CG (EN)
| Model | F1 |
|---|---|
| SauerkrautLM-LFM2.5-GLiNER (ours) | 54.6 |
| SauerkrautLM-GLiNER | 36.3 |
| urchade/gliner_large-v2.1 | 35.5 |
| urchade/gliner_multi-v2.1 | 29.4 |
Usage
This is a GLiNER model. Install the library and provide the entity types you want to extract as labels at inference time:
```shell
pip install gliner
```

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER")

text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"

# Free-form labels: change them per request, no retraining needed
labels = ["person", "organization", "location", "email"]

entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(f"{entity['text']} => {entity['label']} ({entity['score']:.2f})")
```
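A common follow-up for PII use cases is to redact the detected spans. The sketch below relies on the `start` and `end` character offsets that `predict_entities` returns alongside `text`, `label` and `score`. The `redact` and `make_entity` helpers are hypothetical, and the entity list is a hand-written stand-in with invented scores, not real model output.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with its upper-cased label in brackets."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Work right to left so that earlier character offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"[{e['label'].upper()}]" + text[e["end"]:]
    return text


def make_entity(text, surface, label, score):
    """Build a dict shaped like one predict_entities result (stand-in only)."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface),
            "text": surface, "label": label, "score": score}


text = "Maria Schmidt arbeitet bei Siemens, E-Mail: maria.schmidt@siemens.com"

# Stand-in for model.predict_entities(text, labels); scores are invented.
entities = [
    make_entity(text, "Maria Schmidt", "person", 0.97),
    make_entity(text, "Siemens", "organization", 0.95),
    make_entity(text, "maria.schmidt@siemens.com", "email", 0.99),
]

redacted = redact(text, entities)
print(redacted)
```

Raising `min_score` trades recall for precision; for redaction it is usually safer to keep the threshold low and accept some over-masking.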
Disclaimer
We must inform users that despite our best efforts in data cleansing, the possibility of uncensored or incorrect content slipping through cannot be entirely ruled out. We cannot guarantee consistently appropriate behavior. Therefore, if you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided. Additionally, it is essential to understand that the licensing of these models does not constitute legal advice. We are not held responsible for the actions of third parties who utilize our models.
Contact
If you are interested in customized LLMs or NER/PII extraction solutions for business applications, please get in contact with us via our website. We are also grateful for your feedback and suggestions.
Collaborations
We are also keenly seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us at VAGO solutions.
Citation
If you use SauerkrautLM-LFM2.5-GLiNER in your research or applications, please cite:
```bibtex
@misc{SauerkrautLM-LFM2.5-GLiNER,
  title={SauerkrautLM-LFM2.5-GLiNER},
  author={Michele Montebovi},
  organization={VAGO Solutions},
  url={https://huggingface.co/VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER},
  year={2026}
}
```
Acknowledgement
Many thanks to Liquid AI for the LFM2 base model, to urchade for the GLiNER framework, and to our community for their continued support and engagement.





