VAGO solutions SauerkrautLM-LFM2.5-GLiNER

Zero-Shot NER Model – Bidirectional GLiNER on an LFM2.5-350M backbone — strong multilingual, PII and biomedical entity extraction

Introducing SauerkrautLM-LFM2.5-GLiNER – our zero-shot Named Entity Recognition model built on the LFM2.5-350M backbone, converted from causal to bidirectional attention and fine-tuned for the GLiNER span–label matching task.

Zero-shot extraction of arbitrary entity types provided as labels at inference — no retraining
Multilingual: English, French, German, Italian, Spanish
Best PII / privacy extraction among the compared models in all five languages (avg 79.5 F1)
Large margin on biomedical NER (BioNLP-CG: 54.6 F1 vs. 36.3 for the next-best compared model)
Compact 350M backbone — efficient to deploy

Overview of all SauerkrautLM-LFM2.5-GLiNER Models
Model Details
Capabilities
Evaluation
Usage
Disclaimer
Contact
Collaborations
Citation
Acknowledgement

All SauerkrautLM-LFM2.5-GLiNER Models

Model	HF	ONNX	GGUF
SauerkrautLM-LFM2.5-GLiNER	Link	coming soon	coming soon

Model Details

Model Name: SauerkrautLM-LFM2.5-GLiNER
Backbone: LFM2.5-350M (causal → bidirectional)
Task: Zero-shot Named Entity Recognition (GLiNER span–label matching)
Language(s): English, French, German, Italian, Spanish
License: lfm1.0
Contact: VAGO solutions

Architecture

Backbone: LFM2.5-350M, converted from causal to bidirectional attention (full self-attention + symmetric, center-padded convolutions) so that every token attends to the full context in both directions.
Head: standard GLiNER scoring head. Text spans and entity-type labels pass through the same encoder; entities are predicted via the dot product between span representations and label representations in a shared latent space. Because labels are free-form phrases supplied at inference, the model performs open-vocabulary, zero-shot extraction.
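The span–label matching described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-dimensional vectors are hand-picked toy values rather than encoder outputs, the helper names (`enumerate_spans`, `score_spans`) and the maximum span width are hypothetical, and squashing the dot product with a sigmoid is assumed here because it pairs naturally with a BCE objective.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def enumerate_spans(num_tokens, max_width):
    """All candidate (start, end) token spans, end inclusive, up to max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_width, num_tokens))
    ]


def score_spans(span_vecs, label_vecs, threshold=0.5):
    """Keep every (span, label) pair whose sigmoid(dot product) reaches the threshold."""
    hits = []
    for span, span_vec in span_vecs.items():
        for label, label_vec in label_vecs.items():
            score = sigmoid(dot(span_vec, label_vec))
            if score >= threshold:
                hits.append((span, label, score))
    return hits


tokens = ["Maria", "Schmidt", "visits", "Munich"]
candidates = enumerate_spans(len(tokens), max_width=2)

# Toy span and label vectors in a shared 2-d space (assumed values, not model outputs).
span_vecs = {
    (0, 1): [2.0, 0.1],    # "Maria Schmidt"
    (2, 2): [-1.0, -1.0],  # "visits"
    (3, 3): [0.1, 2.0],    # "Munich"
}
label_vecs = {"person": [1.5, -0.5], "location": [-0.5, 1.5]}

hits = score_spans(span_vecs, label_vecs)
for (start, end), label, score in hits:
    print(" ".join(tokens[start:end + 1]), "=>", label, f"({score:.2f})")
```

Because the labels are just another set of vectors in the same space, swapping in a new label only means encoding a new phrase; nothing in the scoring step has to change.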

Why Bidirectional Attention?

The backbone was pretrained causally — each token only sees the tokens before it. NER, however, is not a left-to-right generation task: deciding what a token is and where an entity starts and ends frequently depends on context that appears after the token. Converting the encoder to bidirectional attention (and replacing the causal, left-padded convolutions with symmetric, center-padded ones) lets every token condition on the full sentence.
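The difference between the two views can be illustrated with a toy sketch. The helper names `attention_mask` and `conv1d`, the kernel size of 3, and the all-ones kernel are assumptions chosen for illustration; none of this is taken from the LFM2.5 implementation.

```python
def attention_mask(n, causal):
    """mask[i][j] is True when token i may attend to token j."""
    return [[(j <= i) or not causal for j in range(n)] for i in range(n)]


def conv1d(x, kernel, causal):
    """Zero-padded 1-d sliding-window sum of products.

    causal=True pads only on the left, so output[i] depends on x[..i].
    causal=False pads both sides equally, so output[i] also sees later inputs.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [sum(kernel[t] * padded[i + t] for t in range(k)) for i in range(len(x))]


tokens = ["Apple", "slipped", "3%"]
causal_row = attention_mask(len(tokens), causal=True)[0]   # what "Apple" can see
bidir_row = attention_mask(len(tokens), causal=False)[0]

# A single impulse at position 2 shows which outputs each padding scheme lets it reach.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
causal_out = conv1d(impulse, kernel, causal=True)
centered_out = conv1d(impulse, kernel, causal=False)

print("causal mask row for 'Apple':       ", causal_row)
print("bidirectional mask row for 'Apple':", bidir_row)
print("left-padded conv output:  ", causal_out)
print("center-padded conv output:", centered_out)
```

With left padding the impulse only influences positions 2 and later; with center padding it also reaches position 1, which is the convolutional counterpart of letting a token see what follows it.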

Cases where right-hand context is decisive:

Type disambiguation — In "Apple slipped 3% after the iPhone launch" vs "Apple is rich in fiber", the token Apple is an ORG in the first but not an entity in the second. The distinguishing evidence (slipped 3% / rich in fiber) comes later — a causal encoder cannot see it when encoding Apple.
Span boundaries — In "Bank of America", whether Bank opens a multi-token organization span only becomes clear from the of America that follows.
Structured / PII formats — Emails, phone numbers, IBANs and IDs are defined by their whole surface form. Recognizing such a span requires seeing the trailing characters (domain, check digits, separators), which a left-to-right view truncates — a likely cause of the weak PII recall observed with the causal backbone.

This is also why a masked-language-modeling adaptation stage precedes task training: bidirectional MLM is what teaches the converted encoder to actually use right-hand context and to build the surface-form sensitivity that structured-entity recall depends on. The measured outcome is consistent with this — strong PII results across all five languages and a large biomedical-NER margin.

Training Procedure

The model is produced in three sequential stages.

Stage 1 — Bidirectional MLM adaptation. The causal backbone is converted to bidirectional attention and adapted with a masked-language-modeling objective. This teaches the model to use both left and right context and builds the surface-form / format sensitivity that NER (especially structured entities) depends on. Output: a dense bidirectional encoder checkpoint.

Data: ≈3.8M documents (multilingual), 2 epochs
Sequence length 512, 15% token masking
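As a rough sketch of what 15% masking at sequence length 512 means for a single document: the function name `mask_for_mlm`, the `[MASK]` placeholder string, and the choice to mask an exact 15% of positions (rather than sampling each position independently, or mixing in random and unchanged tokens) are all assumptions for illustration. The source only states the masking rate and the sequence length.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder; the real tokenizer's mask token may differ


def mask_for_mlm(tokens, mask_prob=0.15, max_len=512, seed=0):
    """Truncate to max_len, mask about mask_prob of the positions, return (inputs, targets).

    targets holds the original token at masked positions and None elsewhere,
    so a loss would only be computed where a token was hidden.
    """
    tokens = list(tokens[:max_len])
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK_TOKEN if i in masked_positions else tok for i, tok in enumerate(tokens)]
    targets = [tok if i in masked_positions else None for i, tok in enumerate(tokens)]
    return inputs, targets


document = [f"tok{i}" for i in range(600)]  # longer than one training sequence
inputs, targets = mask_for_mlm(document)
print(len(inputs), "tokens,", inputs.count(MASK_TOKEN), "masked")
```

Because a hidden token can sit anywhere in the sequence, recovering it rewards the encoder for using context on both sides, which is the behavior the stage is meant to instill.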

Stage 2 — GLiNER task training. The adapted encoder is fine-tuned on the GLiNER NER objective (BCE over span–label scores), establishing the shared latent space between spans and labels and the zero-shot capability.

Data: ≈772k annotated examples (multilingual)
≈110k distinct entity-type labels — free-form phrases, enabling open-vocabulary zero-shot extraction
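The BCE objective over span–label scores can be written out as follows. This is a minimal sketch, assuming each (span, label) pair yields one raw score (a logit) and one 0/1 target; the function name `bce_with_logits` and the toy numbers are illustrative, and the actual training code may weight or sample pairs differently.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over span-label logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Two span-label pairs: the first is a true match (target 1), the second is not (target 0).
good = bce_with_logits([5.0, -5.0], [1.0, 0.0])  # confident and correct
bad = bce_with_logits([5.0, -5.0], [0.0, 1.0])   # confident and wrong
print(f"correct scores: {good:.4f}, wrong scores: {bad:.4f}")
```

Each pair is treated as an independent yes/no decision, so a single span may match several labels, or none, without the labels competing against each other.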

Stage 3 — Refinement on higher-quality data. A second fine-tuning pass on a smaller, cleaned, higher-quality set sharpens precision and recall. This stage delivers the main quality gains over the task-trained model.

Data: ≈79k high-quality examples (multilingual)
≈96k distinct entity-type labels

Capabilities

Zero-shot NER — extract arbitrary entity types provided as labels at inference time, no retraining required.
Multilingual — English, French, German, Italian, Spanish.
Strong on general NER (CrossNER), privacy / PII entities, and domain benchmarks (biomedical).

Evaluation

All models were evaluated with a single shared benchmark harness; all scores are F1 ×100. Absolute numbers may differ from those reported by other published pipelines, but because every model here was scored the same way, the relative differences between models are comparable.
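For readers unfamiliar with the metric, here is one common way to compute an entity-level F1 ×100. The matching rule is an assumption: a prediction counts as correct only when its start offset, end offset, and label all match a gold entity exactly. The harness used for the tables may apply a different rule, and the function name `f1_x100` and the example offsets are illustrative.

```python
def f1_x100(gold, predicted):
    """Entity-level F1 scaled to 0-100, using exact (start, end, label) matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 100.0 * 2 * precision * recall / (precision + recall)


gold = [(0, 13, "person"), (27, 34, "organization"), (38, 45, "location")]
predicted = [
    (0, 13, "person"),          # correct
    (27, 34, "organization"),   # correct
    (38, 45, "organization"),   # right span, wrong label
    (55, 80, "email"),          # not in the gold annotations
]
print(round(f1_x100(gold, predicted), 1))
```

Here two of four predictions are correct (precision 0.5) and two of three gold entities are found (recall 0.67), which gives an F1 of about 57.1.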

Capability Overview — Final Checkpoint

Benchmark	F1
CrossNER — English (avg)	78.4
CrossNER — multilingual (avg)	72.5
Privacy / PII — multilingual (avg)	79.5
Biomedical NER (BioNLP-CG)	54.6

CrossNER — Multilingual Zero-Shot NER

Model	EN	FR	DE	IT	ES	avg
SauerkrautLM-LFM2.5-GLiNER (ours)	78.4	71.4	69.0	71.2	72.4	72.5
SauerkrautLM-GLiNER	73.8	71.2	68.7	71.3	72.0	71.4
urchade/gliner_large-v2.1	71.9	57.3	55.8	58.1	58.6	60.3
urchade/gliner_multi-v2.1	72.2	46.7	46.8	48.1	48.9	52.5

Privacy / PII — Multilingual Entity Extraction

Model	EN	FR	DE	IT	ES	avg
SauerkrautLM-LFM2.5-GLiNER (ours)	78.7	81.8	76.5	79.4	81.4	79.5
urchade/gliner_large-v2.1	72.0	76.1	70.3	68.9	72.2	71.9
urchade/gliner_multi-v2.1	51.1	62.2	58.6	57.6	58.0	57.5
SauerkrautLM-GLiNER	65.8	52.9	57.8	53.6	46.2	55.2

Biomedical NER — BioNLP-CG (EN)

Model	F1
SauerkrautLM-LFM2.5-GLiNER (ours)	54.6
SauerkrautLM-GLiNER	36.3
urchade/gliner_large-v2.1	35.5
urchade/gliner_multi-v2.1	29.4

Usage

This is a GLiNER model. Install the library and provide the entity types you want to extract as labels at inference time:

pip install gliner

from gliner import GLiNER

model = GLiNER.from_pretrained("VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER")

text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"

# free-form labels — change them per request, no retraining needed
labels = ["person", "organization", "location", "email"]

entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(f"{entity['text']} => {entity['label']}  ({entity['score']:.2f})")
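Since PII extraction is a typical use case, the predictions can be used to redact text. The sketch below assumes the entities are dictionaries with `start`, `end`, and `label` keys (character offsets, end exclusive), as returned by `predict_entities`, and that spans do not overlap. The `redact` and `mock_entity` helpers are hypothetical, and the entity list is hand-written so the example runs without downloading the model.

```python
def redact(text, entities, labels_to_mask=("person", "email")):
    """Replace every entity whose label is in labels_to_mask with a [LABEL] tag."""
    redacted = text
    # Work from right to left so that earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["label"] in labels_to_mask:
            tag = "[" + ent["label"].upper() + "]"
            redacted = redacted[:ent["start"]] + tag + redacted[ent["end"]:]
    return redacted


def mock_entity(text, surface, label):
    """Build a stand-in prediction by locating the surface string in the text."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface), "text": surface, "label": label}


text = "Maria Schmidt arbeitet bei Siemens in München, E-Mail: maria.schmidt@siemens.com"
entities = [
    mock_entity(text, "Maria Schmidt", "person"),
    mock_entity(text, "Siemens", "organization"),
    mock_entity(text, "München", "location"),
    mock_entity(text, "maria.schmidt@siemens.com", "email"),
]
print(redact(text, entities))
```

With the real model, the list returned by `model.predict_entities(text, labels, threshold=0.5)` can be passed to `redact` in place of the mock entities.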

Disclaimer

Despite our best efforts in data cleansing, we cannot entirely rule out that uncensored or incorrect content has slipped through, and we cannot guarantee consistently appropriate behavior. If you encounter any issues or inappropriate content, please inform us through the contact information provided. Please also note that the licensing of these models does not constitute legal advice, and that we are not responsible for the actions of third parties who use our models.

Contact

If you are interested in customized LLMs or NER/PII extraction solutions for business applications, please get in contact with us via our website. We are also grateful for your feedback and suggestions.

Collaborations

We are also keenly seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us at VAGO solutions.

Citation

If you use SauerkrautLM-LFM2.5-GLiNER in your research or applications, please cite:

@misc{SauerkrautLM-LFM2.5-GLiNER,
  title={SauerkrautLM-LFM2.5-GLiNER},
  author={Michele Montebovi},
  organization={VAGO Solutions},
  url={https://huggingface.co/VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER},
  year={2026}
}

Acknowledgement

Many thanks to Liquid AI for the LFM2.5 base model, to urchade for the GLiNER framework, and to our community for their continued support and engagement.
