How to use samirmsallem/xlm-roberta-base-definitions_ner with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="samirmsallem/xlm-roberta-base-definitions_ner")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("samirmsallem/xlm-roberta-base-definitions_ner")
model = AutoModelForTokenClassification.from_pretrained("samirmsallem/xlm-roberta-base-definitions_ner")
```
NER model for definition component recognition in German scientific texts
xlm-roberta-base-definitions_ner is a NER (token classification) model for German scientific text, fine-tuned from xlm-roberta-base. It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples containing definition- and non-definition-related sentences from German Wikipedia articles.
The model is specifically designed to recognize and classify components of definitions, using the following entity labels:
- DF: Definiendum (the term being defined)
- VF: Definitor (the verb or phrase introducing the definition)
- GF: Definiens (the explanation or meaning)
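To illustrate how these labels might be consumed downstream, here is a minimal sketch that groups predicted spans by definition component. It assumes the input has the shape that the Transformers token-classification pipeline returns when created with `aggregation_strategy="simple"` (dicts with `entity_group`, `word`, `score`, `start`, `end`). The helper `group_components`, the `LABEL_NAMES` mapping, and the example predictions are hypothetical and hand-written for illustration; they are not real model output.

```python
from collections import defaultdict

# Label codes come from the model card; the readable names are for display only.
LABEL_NAMES = {"DF": "definiendum", "VF": "definitor", "GF": "definiens"}


def group_components(entities):
    """Group aggregated token-classification results by definition component."""
    grouped = defaultdict(list)
    # Sort by character offset so spans appear in reading order.
    for ent in sorted(entities, key=lambda e: e["start"]):
        label = ent["entity_group"]
        if label in LABEL_NAMES:  # ignore any label not listed in the card
            grouped[LABEL_NAMES[label]].append(ent["word"])
    return dict(grouped)


# Hypothetical predictions for the sentence
# "Die Mathematik ist eine Wissenschaft von Zahlen und Strukturen"
# (hand-written for illustration, NOT produced by the model).
example = [
    {"entity_group": "GF", "word": "eine Wissenschaft von Zahlen und Strukturen",
     "score": 0.95, "start": 19, "end": 62},
    {"entity_group": "DF", "word": "Mathematik", "score": 0.99, "start": 4, "end": 14},
    {"entity_group": "VF", "word": "ist", "score": 0.98, "start": 15, "end": 18},
]

components = group_components(example)
print(components)
```

With a real pipeline, the list passed to `group_components` would be the result of calling the pipeline on a sentence; whether the model's raw labels carry `B-`/`I-` prefixes is not stated here, so the aggregated form is assumed.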
Training was conducted using a standard NER (token classification) objective. The model achieves an F1 score of about 82.6% on the evaluation set.
Here are the overall final metrics on the test dataset after 5 epochs of training:
- f1: 0.8262004492199356
- precision: 0.8189914550487424
- recall: 0.8335374816266536
- loss: 0.312337189912796
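As a quick sanity check, the reported F1 can be recomputed as the harmonic mean of the reported precision and recall. This sketch assumes all three numbers were computed over the same aggregate counts (for example micro-averaged over entities), which is not stated explicitly; the helper name `f1_score` is illustrative.

```python
# Metric values copied from the list above.
precision = 0.8189914550487424
recall = 0.8335374816266536
reported_f1 = 0.8262004492199356


def f1_score(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8262
print(abs(f1 - reported_f1) < 1e-6)  # True: the reported values are consistent
```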
Model Performance Comparison on wiki_definitions_de_multitask (precision, recall, and F1 in percent):
| Model | Precision | Recall | F1 Score | Eval Samples per Second | Epoch |
|---|---|---|---|---|---|
| distilbert-base-multilingual-cased-definitions_ner | 80.76 | 81.74 | 81.25 | 457.53 | 5.0 |
| scibert_scivocab_cased-definitions_ner | 80.54 | 82.11 | 81.32 | 236.61 | 4.0 |
| GottBERT_base_best-definitions_ner | 82.98 | 82.81 | 82.90 | 272.26 | 5.0 |
| xlm-roberta-base-definitions_ner | 81.90 | 83.35 | 82.62 | 241.21 | 5.0 |
| gbert-base-definitions_ner | 82.73 | 83.56 | 83.14 | 278.87 | 5.0 |
| gbert-large-definitions_ner | 80.67 | 83.36 | 81.99 | 109.83 | 2.0 |
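The comparison can also be read programmatically. The sketch below copies the table rows into a small Python structure and picks the model with the highest F1 and the one with the highest evaluation throughput; the `Row` type and variable names are illustrative only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    precision: float
    recall: float
    f1: float
    samples_per_second: float
    epoch: float


# Values copied from the comparison table above.
rows = [
    Row("distilbert-base-multilingual-cased-definitions_ner", 80.76, 81.74, 81.25, 457.53, 5.0),
    Row("scibert_scivocab_cased-definitions_ner", 80.54, 82.11, 81.32, 236.61, 4.0),
    Row("GottBERT_base_best-definitions_ner", 82.98, 82.81, 82.90, 272.26, 5.0),
    Row("xlm-roberta-base-definitions_ner", 81.90, 83.35, 82.62, 241.21, 5.0),
    Row("gbert-base-definitions_ner", 82.73, 83.56, 83.14, 278.87, 5.0),
    Row("gbert-large-definitions_ner", 80.67, 83.36, 81.99, 109.83, 2.0),
]

best_f1 = max(rows, key=lambda r: r.f1)
fastest = max(rows, key=lambda r: r.samples_per_second)

# Rank all models by F1, highest first.
for rank, row in enumerate(sorted(rows, key=lambda r: r.f1, reverse=True), start=1):
    print(f"{rank}. {row.model}: F1={row.f1:.2f}, {row.samples_per_second:.2f} samples/s")
```

In this table, gbert-base-definitions_ner has the highest F1 (83.14) and distilbert-base-multilingual-cased-definitions_ner the highest evaluation throughput (457.53 samples per second); xlm-roberta-base-definitions_ner ranks third by F1.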
Model tree for samirmsallem/xlm-roberta-base-definitions_ner
- Base model: FacebookAI/xlm-roberta-base
Evaluation results
- F1 on samirmsallem/wiki_def_de_multitask (self-reported): 0.826
- Precision on samirmsallem/wiki_def_de_multitask (self-reported): 0.819
- Recall on samirmsallem/wiki_def_de_multitask (self-reported): 0.834
- Loss on samirmsallem/wiki_def_de_multitask (self-reported): 0.312