LegalBERT Fine-Tuned on LEDGAR Dataset

This model is a fine-tuned version of LegalBERT (nlpaueb/legal-bert-base-uncased) on the LEDGAR dataset for legal clause classification.
It assigns each legal clause to one of 100 clause types (e.g., confidentiality, termination, liability).

Model Overview

  • Base Model: nlpaueb/legal-bert-base-uncased
  • Task: Multi-class clause classification
  • Dataset: LEDGAR
  • Language: English
  • Number of labels: 100
  • Fine-tuning epochs: 4
  • Batch size: 32
  • Optimizer: AdamW
  • Mixed Precision (FP16): Enabled when CUDA is available
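
The training script itself is not included here; as a rough sketch, the hyperparameters above map onto Hugging Face TrainingArguments like this (the output directory and anything not in the list above are assumptions):

import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Base model with a 100-way classification head, as listed above.
model = AutoModelForSequenceClassification.from_pretrained(
    "nlpaueb/legal-bert-base-uncased",
    num_labels=100,
)

# Hyperparameters from the overview; output_dir is an assumption.
args = TrainingArguments(
    output_dir="legal-bert-ledgar",
    num_train_epochs=4,
    per_device_train_batch_size=32,
    fp16=torch.cuda.is_available(),  # mixed precision only on CUDA
)
# The Trainer's default optimizer is AdamW, matching the setting above.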

Dataset Details

Split   Samples   Description
Train   60,000    Used for model fine-tuning
Eval    10,000    Used for validation during training
Test    10,000    Held-out test set for final evaluation
  • Total samples: 80,000
  • Number of labels: 100
  • Text column: text (contains the clause text)
  • Label column: label
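
LEDGAR is distributed on the Hugging Face Hub as part of the LexGLUE benchmark; assuming that release was used (the card does not name the exact source), the splits above can be loaded and inspected like this:

from datasets import load_dataset

# LEDGAR as packaged in LexGLUE (assumed source); older `datasets`
# versions load it via the dataset script.
ds = load_dataset("lex_glue", "ledgar")

print(ds)                                        # train/validation/test split sizes
print(ds["train"][0]["text"][:200])              # the clause text
print(ds["train"].features["label"].names[:5])   # human-readable clause types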

Evaluation Results (on Test Set)

Metric            Score
Accuracy          0.8678
Macro F1          0.7779
Macro Precision   0.7917
Macro Recall      0.7763
Evaluation Time   38.37 sec
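
The macro-averaged scores treat all 100 clause types equally, so rare clause types pull them below plain accuracy. As a minimal sketch (not the original evaluation script), such numbers can be reproduced with scikit-learn once predictions are collected as in the usage example below:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred would be the 10,000 gold and predicted label IDs
# from the test split; tiny placeholder values are used here.
y_true = [3, 7, 3, 1]
y_pred = [3, 7, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={accuracy:.4f}  macro_p={precision:.4f}  "
      f"macro_r={recall:.4f}  macro_f1={f1:.4f}")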

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "FENTECH/Legal-BERT-Clause-Classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example inference
text = "The contractor shall maintain confidentiality of all client information."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)

predicted_id = outputs.logits.argmax(dim=-1).item()
print("Predicted label ID:", predicted_id)
# Map the ID to a clause type name (falls back to "LABEL_<id>" if the
# config carries no names).
print("Predicted clause type:", model.config.id2label[predicted_id])