SpanMarker for GermEval 2014 NER
This is a SpanMarker model that was fine-tuned on the GermEval 2014 NER Dataset.
The GermEval 2014 NER Shared Task builds on a new dataset with German Named Entity annotation with the following
properties: The data was sampled from German Wikipedia and News Corpora as a collection of citations. The dataset
covers over 31,000 sentences corresponding to over 590,000 tokens. The NER annotation uses the NoSta-D guidelines,
which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating
embeddings among NEs such as [ORG FC Kickers [LOC Darmstadt]].
Twelve classes of Named Entities are annotated and must be recognized: the four main classes PERson, LOCation, ORGanisation,
and OTHer, plus their subclasses, which are formed with two fine-grained labels: -deriv marks derivations from NEs such as
"englisch" ("English"), and -part marks compounds that include an NE as a subsequence, such as "deutschlandweit" ("Germany-wide").
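The count of twelve follows from combining each of the four main classes with three variants (plain, -deriv, -part). A minimal sketch of that enumeration is shown here; the concatenated spelling of the tags (for example `LOCderiv`) is an assumption about how the labels are written in the data files, and the names `MAIN_CLASSES`, `VARIANTS`, and `LABELS` are purely illustrative.

```python
# Four main classes named in the task description
MAIN_CLASSES = ["PER", "LOC", "ORG", "OTH"]

# Plain class, derivation, and compound part.
# Assumption: the suffix is appended directly to the class name.
VARIANTS = ["", "deriv", "part"]

# Every main class combined with every variant
LABELS = [main + variant for main in MAIN_CLASSES for variant in VARIANTS]

print(len(LABELS))
print(LABELS)
```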
Fine-Tuning
We use the same hyper-parameters as in the "German's Next Language Model" paper, with the released GELECTRA Large model as the backbone.
Evaluation is performed with SpanMarker's internal evaluation code, which uses seqeval. Additionally, we use
the official GermEval 2014 evaluation script to double-check the results. A backup of the nereval.py script
can be found here.
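To illustrate what a strict, span-level score measures, here is a simplified pure-Python sketch. It is neither seqeval nor nereval.py: it assumes a single layer of BIO tags (the official script also scores nested annotations), and the function names `extract_spans` and `strict_prf` are hypothetical. A predicted entity counts as correct only if its label and both boundaries match a gold entity exactly.

```python
def extract_spans(tags):
    """Collect (label, start, end) spans from one BIO tag sequence."""
    spans = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if label is not None and (tag == "O" or starts_new):
            spans.add((label, start, i))
            start, label = None, None
        if starts_new:
            start, label = i, tag[2:]
    return spans


def strict_prf(gold_seqs, pred_seqs):
    """Precision, recall, and F1 over exactly matching spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy sequences for illustration only: the ORG span is predicted too short
gold = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]
pred = [["B-PER", "I-PER", "O", "B-ORG", "O"]]
print(strict_prf(gold, pred))
```

In the toy example the truncated ORG span earns no partial credit, which is why strict scores are sensitive to boundary errors.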
We fine-tune 5 models and upload the model with the best F1-score on the development set. Results on the development set are in brackets, followed by the result on the test set:
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|---|---|---|---|---|---|---|
| GELECTRA Large (5e-05) | (89.99) / 89.08 | (89.55) / 89.23 | (89.60) / 89.10 | (89.34) / 89.02 | (89.68) / 88.80 | (89.63) / 89.05 |
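The model selection and the averages in the table can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping: the scores are copied from the table, with the bracketed development score first and the test score second, and the variable names are illustrative.

```python
# (development F1, test F1) for each run, copied from the table
runs = {
    "Run 1": (89.99, 89.08),
    "Run 2": (89.55, 89.23),
    "Run 3": (89.60, 89.10),
    "Run 4": (89.34, 89.02),
    "Run 5": (89.68, 88.80),
}

# Select the run with the highest development score, not the highest test score
best_run = max(runs, key=lambda name: runs[name][0])

avg_dev = sum(dev for dev, _ in runs.values()) / len(runs)
avg_test = sum(test for _, test in runs.values()) / len(runs)

print(best_run, runs[best_run])
print(f"({avg_dev:.2f}) / {avg_test:.2f}")
```

Selecting on the development set keeps the test set untouched during model selection, which is why the uploaded model is not the run with the highest test score.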
The best model (Run 1) achieves a final test F1-score of 89.08%:
1. Strict, Combined Evaluation (official):
Accuracy: 99.26%;
Precision: 89.01%;
Recall: 89.16%;
FB1: 89.08
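FB1 is the harmonic mean of precision and recall, so the reported value can be re-derived from the two numbers above as a quick sanity check. The helper name `f1_score` is illustrative and not part of the evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (F-beta with beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported by the official script, in percent
fb1 = f1_score(89.01, 89.16)
print(f"{fb1:.2f}")  # 89.08
```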
Scripts for training and evaluation are also available.
Usage
The fine-tuned model can be used as follows:
```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("stefan-it/span-marker-gelectra-large-germeval14")

# Run inference
entities = model.predict("Jürgen Schmidhuber studierte ab 1983 Informatik und Mathematik an der TU München .")
```
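The predictions can then be post-processed, for example by grouping the recognized spans by label. The sketch below assumes that `predict` returns a list of dictionaries with `span`, `label`, and `score` keys; please check the SpanMarker documentation for the exact format. The helper `group_by_label` is hypothetical, and the example list is a hand-written placeholder, not real output of this model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity spans by label, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Placeholder data in the assumed output format (labels and scores are made up)
example_entities = [
    {"span": "Jürgen Schmidhuber", "label": "PER", "score": 0.99},
    {"span": "TU München", "label": "ORG", "score": 0.42},
]

print(group_by_label(example_entities, min_score=0.5))
```

With real predictions, `example_entities` would be replaced by the `entities` list returned by the model.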