| Feature | Description |
| --- | --- |
| Name | BiomedNLP-PubMedBERT-ProteinStructure-NER-v1.4 |
| Default Pipeline | transformer, ner |
| Components | transformer, ner |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | Melanie Vollmar |
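
The feature table follows the layout of a spaCy pipeline card, with a `transformer` and an `ner` component, so the model can presumably be used like any other spaCy pipeline. The following is a minimal sketch under that assumption. The helper names `load_pipeline` and `group_entities` and the `package_name` argument are hypothetical and not part of this model card; the package name depends on how the model was installed locally.

```python
from collections import defaultdict


def load_pipeline(package_name):
    """Load an installed spaCy pipeline by its package name.

    Assumption: the model has been installed as a Python package and
    spaCy (plus spacy-transformers for the transformer component) is
    available. `package_name` is whatever name it was installed under.
    """
    import spacy  # imported lazily so group_entities works without spaCy

    return spacy.load(package_name)


def group_entities(doc):
    """Group entity texts by label for any object exposing spaCy's `doc.ents`."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

With a loaded pipeline `nlp`, calling `group_entities(nlp(text))` would return a dictionary that maps each predicted label, such as `"protein"` or `"residue_name_number"`, to the list of matching text spans.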
Label Scheme
19 labels for 1 component:

| Component | Labels |
| --- | --- |
| ner | "chemical", "complex_assembly", "evidence", "experimental_method", "gene", "mutant", "oligomeric_state", "protein", "protein_state", "protein_type", "ptm", "residue_name", "residue_name_number", "residue_number", "residue_range", "site", "species", "structure_element", "taxonomy_domain" |
Scores for entity types
| entity type | precision | recall | F1 | sample number |
| --- | ---: | ---: | ---: | ---: |
| "chemical" | 0.90 | 0.93 | 0.92 | 390 |
| "complex_assembly" | 0.88 | 0.91 | 0.89 | 162 |
| "evidence" | 0.86 | 0.89 | 0.88 | 272 |
| "experimental_method" | 0.73 | 0.76 | 0.75 | 240 |
| "gene" | 0.89 | 0.86 | 0.88 | 66 |
| "mutant" | 0.93 | 0.95 | 0.94 | 495 |
| "oligomeric_state" | 0.88 | 1.00 | 0.93 | 64 |
| "protein" | 0.97 | 0.97 | 0.97 | 1017 |
| "protein_state" | 0.78 | 0.85 | 0.81 | 363 |
| "protein_type" | 0.84 | 0.90 | 0.87 | 262 |
| "ptm" | 0.64 | 0.81 | 0.71 | 37 |
| "residue_name" | 0.97 | 0.92 | 0.94 | 84 |
| "residue_name_number" | 0.98 | 0.99 | 0.99 | 487 |
| "residue_number" | 1.00 | 0.93 | 0.96 | 14 |
| "residue_range" | 0.86 | 0.91 | 0.89 | 47 |
| "site" | 0.83 | 0.86 | 0.85 | 139 |
| "species" | 0.97 | 1.00 | 0.98 | 59 |
| "structure_element" | 0.91 | 0.92 | 0.91 | 677 |
| "taxonomy_domain" | 0.97 | 0.96 | 0.97 | 73 |
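
The per-type scores can be summarized or cross-checked with a few lines of Python. The sketch below copies the rows from the score table and uses the standard definition of F1 as the harmonic mean of precision and recall. The function names and the choice of macro and sample-weighted averages are illustrative assumptions, not part of the original evaluation. Because the table reports precision and recall rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.

```python
# (label, precision, recall, reported F1, sample number), copied from the table.
SCORES = [
    ("chemical", 0.90, 0.93, 0.92, 390),
    ("complex_assembly", 0.88, 0.91, 0.89, 162),
    ("evidence", 0.86, 0.89, 0.88, 272),
    ("experimental_method", 0.73, 0.76, 0.75, 240),
    ("gene", 0.89, 0.86, 0.88, 66),
    ("mutant", 0.93, 0.95, 0.94, 495),
    ("oligomeric_state", 0.88, 1.00, 0.93, 64),
    ("protein", 0.97, 0.97, 0.97, 1017),
    ("protein_state", 0.78, 0.85, 0.81, 363),
    ("protein_type", 0.84, 0.90, 0.87, 262),
    ("ptm", 0.64, 0.81, 0.71, 37),
    ("residue_name", 0.97, 0.92, 0.94, 84),
    ("residue_name_number", 0.98, 0.99, 0.99, 487),
    ("residue_number", 1.00, 0.93, 0.96, 14),
    ("residue_range", 0.86, 0.91, 0.89, 47),
    ("site", 0.83, 0.86, 0.85, 139),
    ("species", 0.97, 1.00, 0.98, 59),
    ("structure_element", 0.91, 0.92, 0.91, 677),
    ("taxonomy_domain", 0.97, 0.96, 0.97, 73),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows):
    """Unweighted mean of the reported per-type F1 values."""
    return sum(row[3] for row in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the reported F1 values, weighted by sample number."""
    total = sum(row[4] for row in rows)
    return sum(row[3] * row[4] for row in rows) / total
```

Calling `macro_f1(SCORES)` treats every entity type equally, while `weighted_f1(SCORES)` lets frequent types such as "protein" dominate. Comparing the two gives a rough sense of how much the rarer types, for example "ptm" with 37 samples, pull the average down.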
Data and annotations
The dataset can be found here: https://huggingface.co/datasets/mevol/protein_structure_NER_model_v1.4
Citation
Vollmar, M., Tirunagari, S., Harrus, D. et al. Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature. Sci Data 11, 1032 (2024). https://doi.org/10.1038/s41597-024-03841-9