# LegalBERT Fine-Tuned on LEDGAR Dataset
This model is a fine-tuned version of LegalBERT on the LEDGAR dataset for legal clause classification.
It classifies legal clauses into one of 100 clause types (e.g., confidentiality, termination, liability).
## Model Overview
- Base Model: nlpaueb/legal-bert-base-uncased
- Task: Multi-class clause classification
- Dataset: LEDGAR
- Language: English
- Number of labels: 100
- Fine-tuning epochs: 4
- Batch size: 32
- Optimizer: AdamW
- Mixed Precision (FP16): Enabled when CUDA is available (see the training sketch below)
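
The training script itself is not published in this card; the sketch below reconstructs a comparable setup from the hyperparameters above using the standard `Trainer` API. The output directory, `max_length`, and the LexGLUE dataset path are assumptions for illustration, not values taken from this model.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "nlpaueb/legal-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model, num_labels=100)  # 100 LEDGAR clause types

# LEDGAR as packaged in LexGLUE on the Hugging Face Hub (assumed source).
dataset = load_dataset("coastalcph/lex_glue", "ledgar")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True)

args = TrainingArguments(
    output_dir="legal-bert-ledgar",        # illustrative path
    num_train_epochs=4,                    # fine-tuning epochs
    per_device_train_batch_size=32,        # batch size
    fp16=torch.cuda.is_available(),        # mixed precision when CUDA is available
)

# Trainer optimizes with AdamW by default, matching the optimizer listed above.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```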
## Dataset Details
| Split | Samples | Description |
|---|---|---|
| Train | 60,000 | Used for model fine-tuning |
| Eval | 10,000 | Used for validation during training |
| Test | 10,000 | Held-out test set for final evaluation |
- Total samples: 80,000
- Number of labels: 100
- Text column: `text` (contains the clause text)
- Label column: `label`
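
LEDGAR is distributed as part of the LexGLUE benchmark on the Hugging Face Hub; assuming that source (its split sizes match the table above), the splits and label names can be inspected like this:

```python
from datasets import load_dataset

# LexGLUE packaging of LEDGAR (assumed source; the dataset path is not
# confirmed by this model card).
dataset = load_dataset("coastalcph/lex_glue", "ledgar")

print(dataset)                      # train (60,000) / validation (10,000) / test (10,000)
print(dataset["train"][0]["text"])  # the clause text column
label_names = dataset["train"].features["label"].names
print(len(label_names), label_names[:5])  # 100 clause-type names
```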
## Evaluation Results (on Test Set)
| Metric | Score |
|---|---|
| Accuracy | 0.8678 |
| Macro F1 | 0.7779 |
| Macro Precision | 0.7917 |
| Macro Recall | 0.7763 |
| Evaluation Time | 38.37 sec |
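
For reference, the macro-averaged metrics in this table correspond to the following computation. This is a hedged sketch using scikit-learn, not the evaluation script that produced the numbers above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def macro_metrics(labels: np.ndarray, preds: np.ndarray) -> dict:
    """Accuracy plus macro precision/recall/F1 from gold and predicted label IDs."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_precision": precision,
        "macro_recall": recall,
        "macro_f1": f1,
    }
```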
## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "FENTECH/Legal-BERT-Clause-Classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example inference
text = "The contractor shall maintain confidentiality of all client information."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(dim=-1).item()
print("Predicted label ID:", predicted_label)
```