# Legal-BERT-PEFT-EURLEX
A BERT model fine-tuned on EU legal documents from the Pile of Law dataset using Parameter-Efficient Fine-Tuning (PEFT).
## Model Details

### Base Model

- Architecture: bert-base-uncased
- Parameters: 110 million
- Language: English
### Fine-tuning Details
- Method: PEFT with LoRA (Low-Rank Adaptation); an illustrative configuration is sketched below
- Trainable Parameters: 1.3 million (1.21% of total)
- Training Approach: Masked Language Modeling (MLM)
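The exact LoRA settings for this checkpoint are not reported. The snippet below is a minimal sketch of how a LoRA adapter can be attached to `bert-base-uncased` for MLM training; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values used for this model.

```python
from transformers import AutoModelForMaskedLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Assumed hyperparameters: the actual rank/alpha/target modules are not published
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```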
### Training Data
- Dataset: EURLEX subset of Pile of Law (see the loading sketch after this list)
- Training Samples: 20,000 legal documents
- Domain: European Union Legal Documents
- Text Length: Average 13,327 characters per document
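The exact preprocessing pipeline is not published with the card. As a rough sketch, a comparable training set can be drawn from the Pile of Law dataset on the Hub; the `eurlex` configuration name below is an assumption based on the dataset's published subsets.

```python
from datasets import load_dataset

# Assumption: "eurlex" is the Pile of Law configuration for EU legal documents
dataset = load_dataset("pile-of-law/pile-of-law", "eurlex", split="train", streaming=True)

# Take the first 20,000 documents, matching the sample count reported above
samples = [example["text"] for _, example in zip(range(20_000), dataset)]
```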
## Performance

### Training Results
| Metric | Base Model | Fine-tuned Model | Improvement |
|---|---|---|---|
| Test Loss | 1.9327 | 0.6580 | 66.69% |
| Perplexity | 6.91 | 1.93 | 72.07% |
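Perplexity here is simply the exponential of the test cross-entropy loss, so the two rows are consistent with each other:

```python
import math

print(math.exp(1.9327))  # ≈ 6.91, base model perplexity
print(math.exp(0.6580))  # ≈ 1.93, fine-tuned model perplexity
```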
## Training Configuration

### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 16 effective (per-device batch 8 × gradient accumulation 2) |
| Epochs | 3 |
| Max Sequence Length | 512 |
| Warmup Steps | 500 |
| Weight Decay | 0.01 |
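The original training script is not included in the repository; the sketch below only shows how the reported values map onto standard `transformers.TrainingArguments` fields (the output directory and any unlisted options are assumptions).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="legal-bert-peft-eurlex",   # assumed path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,         # effective batch size of 16
    num_train_epochs=3,
    warmup_steps=500,
    weight_decay=0.01,
)
# The 512-token maximum sequence length is applied at tokenization time, not here.
```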
## Intended Use Cases

### Recommended Use
- Legal document analysis and processing
- Masked language modeling in legal contexts
- Legal text understanding and generation
- Research in computational law and legal AI
- Educational purposes in legal technology
## Limitations and Bias

### Limitations
- Domain Specific: Primarily effective on legal text, especially EU law
- Language: English only
- Scope: Trained on a subset of EURLEX documents
- Temporal Scope: Training data up to 2022 only
## Qualitative Examples

### Example 1: Legal Judgment

- Input: "The court found the defendant [MASK] of all charges."
- Predictions: ["guilty", "innocent", "acquitted", "free", "liable"]

### Example 2: Contract Law

- Input: "The contract was declared [MASK] due to fraudulent activities."
- Predictions: ["void", "invalid", "null", "bankrupt", "cancelled"]

### Example 3: Civil Law

- Input: "The plaintiff sought [MASK] for damages incurred."
- Predictions: ["compensation", "damages", "only", "insurance", "forgiveness"]
## Usage
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForMaskedLM.from_pretrained("Nahla-yasmine/legal-bert-peft-eurlex")
tokenizer = AutoTokenizer.from_pretrained("Nahla-yasmine/legal-bert-peft-eurlex")

# Example: masked language prediction
text = "The court found the defendant [MASK] of all charges."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Get the top 5 predictions for the [MASK] position
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
logits = outputs.logits[0, mask_token_index, :]
top_tokens = torch.topk(logits, 5, dim=1).indices[0].tolist()

for i, token_id in enumerate(top_tokens):
    predicted_token = tokenizer.decode([token_id])
    print(f"{i+1}. {predicted_token}")
```
### Advanced Usage with PEFT
```python
from peft import PeftModel
from transformers import AutoModelForMaskedLM

# Load base model
base_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Load the PEFT adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Nahla-yasmine/legal-bert-peft-eurlex")
```
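If a standalone checkpoint is preferred for inference, the adapter can optionally be merged into the base weights with PEFT's `merge_and_unload`; the output path below is only an example.

```python
# Optional: fold the LoRA weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("legal-bert-eurlex-merged")  # example path
```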
## Citation

```bibtex
@software{legal_bert_peft_2024,
  title  = {Legal-BERT-PEFT-EURLEX},
  author = {Nahla-yasmine},
  year   = {2024},
  url    = {https://huggingface.co/Nahla-yasmine/legal-bert-peft-eurlex}
}
```
### Framework versions

- PEFT 0.17.0