LexGLUE–LEDGAR DistilBERT Legal Clause Classifier
Author: Salman Abbasi
Affiliation: Beijing Institute of Technology, China
Research Area: NLP × Agentic AI × Prompt Engineering
Date: October 2025
Model Overview
This model is a DistilBERT-based legal clause classifier fine-tuned on the LexGLUE–LEDGAR dataset.
Given a legal paragraph or clause, it predicts the type of contract clause (e.g., Confidentiality, Termination, Liability, Payment).
- Base Model: distilbert-base-uncased
- Dataset: coastalcph/lex_glue (ledgar configuration)
- Task Type: Multi-class text classification
- Language: English
- Domain: Legal / Contractual Texts
With 0.8618 accuracy and 0.7842 macro F1 (see Model Performance), the model performs well for its small size, making it efficient to deploy in legal NLP systems and agentic AI applications.
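A minimal inference sketch, assuming the fine-tuned weights are published on the Hugging Face Hub and loaded through the `transformers` `pipeline` API. The repository ID below is a hypothetical placeholder, since this card does not state it, and `load_classifier`, `classify_clause`, and `format_prediction` are illustrative helper names rather than part of any library.

```python
from typing import Any, Dict

# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-username/ledgar-distilbert"


def load_classifier(model_id: str = MODEL_ID) -> Any:
    """Build a text-classification pipeline for the fine-tuned model."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_clause(classifier: Any, clause: str) -> Dict[str, Any]:
    """Return the top prediction as a dict with "label" and "score" keys."""
    # Truncate to 512 tokens, matching the max sequence length listed in this card.
    return classifier(clause, truncation=True, max_length=512)[0]


def format_prediction(prediction: Dict[str, Any]) -> str:
    """Render a prediction dict as 'Label (score)' with two decimals."""
    return f"{prediction['label']} ({prediction['score']:.2f})"
```

With `transformers` and a backend such as PyTorch installed, `format_prediction(classify_clause(load_classifier(), "Each party shall keep the terms of this Agreement confidential."))` would return the predicted provision type with its score. Whether labels appear as names or as generic `LABEL_<n>` strings depends on whether an `id2label` mapping was saved in the model config.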
Model Performance
| Metric / Hyperparameter | Value |
|---|---|
| Accuracy | 0.8618 |
| Macro F1 | 0.7842 |
| Validation Loss | 0.5756 |
| Epochs | 3 |
| Batch Size | 8 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 512 |
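The gap between accuracy (0.8618) and macro F1 (0.7842) is what one would expect when some provision types are rare: accuracy weights every sample equally, while macro F1 averages the per-class F1 scores so that each class counts the same. The sketch below computes both on a small made-up label set (not LEDGAR data) to illustrate the definitions. It is not the evaluation script used for this model.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data: one frequent class and one rare class with a single miss.
Y_TRUE = ["Payments"] * 8 + ["Confidentiality"] * 2
Y_PRED = ["Payments"] * 8 + ["Confidentiality", "Payments"]

print(f"accuracy={accuracy(Y_TRUE, Y_PRED):.4f}")  # accuracy=0.9000
print(f"macro_f1={macro_f1(Y_TRUE, Y_PRED):.4f}")  # macro_f1=0.8039
```

A single error on the rare class barely moves accuracy but pulls that class's F1 down to 2/3, which lowers the macro average.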
Dataset: LexGLUE LEDGAR
The LEDGAR configuration of LexGLUE contains contract provisions, each annotated with one of 100 provision types.
It is part of the LexGLUE benchmark, which evaluates legal text understanding models.
Each sample includes:
{
"text": "This clause sets out the obligations of each party regarding confidentiality.",
"label": "Confidentiality"
}
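In the `coastalcph/lex_glue` release on the Hugging Face Hub, `label` is stored as an integer class ID, with the names available from the dataset features, so the string label shown above is the decoded view. A minimal sketch of loading and decoding, assuming the `datasets` library is installed and the Hub is reachable; `load_ledgar` and `decode_sample` are illustrative names, not library functions.

```python
from typing import Any, Dict, List


def load_ledgar() -> Any:
    """Load the LEDGAR configuration of LexGLUE (requires network access)."""
    # Imported lazily so decode_sample can be used without datasets installed.
    from datasets import load_dataset

    return load_dataset("coastalcph/lex_glue", "ledgar")


def decode_sample(sample: Dict[str, Any], label_names: List[str]) -> Dict[str, str]:
    """Replace the integer class ID in a sample with its provision-type name."""
    return {"text": sample["text"], "label": label_names[sample["label"]]}
```

For example, after `ds = load_ledgar()`, the label names can be read from `ds["train"].features["label"].names` and passed to `decode_sample(ds["train"][0], names)` to obtain a record in the shape shown above.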