SciBERT Concept Annotation
This model is a fine-tuned version of SciBERT for concept annotation. Given a document text and a concept (term), it classifies the relationship between the two using sequence classification.
Model Description
- Model type: SciBERT (BERT-based)
- Language(s): English
- License: Apache 2.0
- Fine-tuned from model: allenai/scibert_scivocab_uncased
Usage
You can run this model with a short custom inference script. The model weights are hosted in this repository, but the tokenizer should be loaded from allenai/scibert_scivocab_uncased, which the model is designed to work with.
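Because the model receives the document text and the concept together, it may help to see how a BERT-style tokenizer lays out such a pair. The sketch below is a toy illustration only: it splits on whitespace instead of using the real WordPiece vocabulary, and the helper name `pair_layout` is hypothetical. It shows the `[CLS] text [SEP] concept [SEP]` ordering and the segment ids (0 for the first segment, 1 for the second) that BERT tokenizers produce for two-part inputs.

```python
# Toy illustration of the BERT sentence-pair layout.
# Assumption: whitespace splitting stands in for the real WordPiece tokenizer,
# so the actual tokens produced by allenai/scibert_scivocab_uncased will differ.

def pair_layout(text, concept):
    """Return (tokens, segment_ids) in the order a BERT tokenizer uses for pairs."""
    first = text.lower().split()      # uncased model: lowercase the input
    second = concept.lower().split()
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # Segment 0 covers [CLS], the text, and the first [SEP];
    # segment 1 covers the concept and the final [SEP].
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids


tokens, segment_ids = pair_layout(
    "Large Language Model in Law Documents Hub",
    "natural language processing",
)
for token, segment in zip(tokens, segment_ids):
    print(segment, token)
```

With the real tokenizer, the segment ids are returned as `token_type_ids` alongside `input_ids` and `attention_mask`.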
Example Code
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Use a GPU when one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model_id = "linh101201/scibert-concept-annotation"
tokenizer_id = "allenai/scibert_scivocab_uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2).to(device)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

# Example inputs: document text and the concept to annotate
text = "Large Language Model in Law Documents Hub"
concept = "natural language processing"

# Encode text and concept as a pair; truncate inputs that exceed the model's maximum length
inputs = tokenizer(text, concept, return_tensors="pt", truncation=True).to(device)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Apply softmax to get probabilities
probs = torch.nn.functional.softmax(logits, dim=-1)
print(f"Logits: {logits}")
print(f"Probabilities: {probs}")
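To turn the two probabilities into a yes/no annotation, one option is to threshold the probability of one class. The sketch below uses only the standard library and made-up logits, not actual model output. It assumes that label index 1 means "the concept applies to the text"; the model card does not state the label mapping, so check the model's configuration before relying on this. The names `decide`, `positive_index`, and `threshold` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decide(logits, positive_index=1, threshold=0.5):
    """Return (is_match, probs).

    Assumption: positive_index=1 is the "concept applies" class. This mapping
    is not documented in the model card and must be verified.
    """
    probs = softmax(logits)
    return probs[positive_index] >= threshold, probs


# Made-up logits for illustration only
example_logits = [-1.2, 2.3]
is_match, probs = decide(example_logits)
print(is_match, [round(p, 4) for p in probs])
```

With the tensors from the example code, the same decision could be made by passing `logits[0].tolist()` to `decide`. Raising `threshold` trades recall for precision.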