Legal-BERT-PEFT-EURLEX

A BERT model fine-tuned on EU legal documents from the Pile of Law dataset using Parameter-Efficient Fine-Tuning (PEFT).

Model Details

Base Model

  • Architecture: bert-base-uncased
  • Parameters: 110 million
  • Language: English

Fine-tuning Details

  • Method: PEFT with LoRA (Low-Rank Adaptation)
  • Trainable Parameters: 1.3 million (1.21% of total)
  • Training Approach: Masked Language Modeling (MLM)
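
A minimal sketch of how such an adapter can be set up with the peft library is shown below; the LoRA rank, alpha, and target modules are illustrative assumptions and are not guaranteed to reproduce the exact 1.3 million trainable parameters reported above.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForMaskedLM

# Wrap the base model with an illustrative LoRA adapter
base_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
lora_config = LoraConfig(
    r=16,                                # rank of the low-rank update (assumption)
    lora_alpha=32,                       # scaling factor (assumption)
    lora_dropout=0.1,
    target_modules=["query", "value"],   # attention projections to adapt (assumption)
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # prints trainable vs. total parameter counts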

Training Data

  • Dataset: EURLEX subset of Pile of Law
  • Training Samples: 20,000 legal documents
  • Domain: European Union Legal Documents
  • Text Length: Average 13,327 characters per document
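
A minimal sketch of how this training subset could be assembled with the datasets library, assuming the "eurlex" configuration name and a "text" field as documented on the Pile of Law dataset card; the streaming approach and the 20,000-document cap are illustrative.

from datasets import load_dataset

# Stream the EURLEX subset of Pile of Law and keep the first 20,000 documents
# ("eurlex" config name and "text" field are assumptions from the dataset card)
dataset = load_dataset(
    "pile-of-law/pile-of-law",
    "eurlex",
    split="train",
    streaming=True,
    trust_remote_code=True,
)
train_docs = [example["text"] for example, _ in zip(dataset, range(20_000))]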

Performance

Training Results

| Metric     | Base Model | Fine-tuned Model | Improvement |
|------------|------------|------------------|-------------|
| Test Loss  | 1.9327     | 0.6580           | 66.69%      |
| Perplexity | 6.91       | 1.93             | 72.07%      |
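
Perplexity here is simply the exponential of the cross-entropy loss, so the two rows are consistent with each other:

import math

# perplexity = exp(cross-entropy loss)
print(math.exp(1.9327))  # ≈ 6.91 (base model)
print(math.exp(0.6580))  # ≈ 1.93 (fine-tuned model)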

Training Configuration

Hyperparameters

| Parameter           | Value                                       |
|---------------------|---------------------------------------------|
| Learning Rate       | 2e-4                                        |
| Batch Size          | 16 (per-device 8 × gradient accumulation 2) |
| Epochs              | 3                                           |
| Max Sequence Length | 512                                         |
| Warmup Steps        | 500                                         |
| Weight Decay        | 0.01                                        |
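
A minimal sketch of how these values map onto transformers.TrainingArguments; the output directory is a placeholder, and the maximum sequence length of 512 is applied at tokenization time rather than here.

from transformers import TrainingArguments

# Illustrative TrainingArguments mirroring the hyperparameter table
training_args = TrainingArguments(
    output_dir="legal-bert-peft-eurlex",   # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,         # effective batch size of 16
    num_train_epochs=3,
    warmup_steps=500,
    weight_decay=0.01,
)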

Intended Use Cases

Recommended Use

  • Legal document analysis and processing
  • Masked language modeling in legal contexts
  • Legal text understanding and mask-filling completion
  • Research in computational law and legal AI
  • Educational purposes in legal technology

Limitations and Bias

Limitations

  • Domain Specific: Primarily effective on legal text, especially EU law
  • Language: English only
  • Scope: Trained on a subset of EURLEX documents
  • Temporal Scope: Training data up to 2022 only

Qualitative Examples

Example 1: Legal Judgment

Input: "The court found the defendant [MASK] of all charges."
Predictions: ["guilty", "innocent", "acquitted", "free", "liable"]

Example 2: Contract Law

Input: "The contract was declared [MASK] due to fraudulent activities."
Predictions: ["void", "invalid", "null", "bankrupt", "cancelled"]

Example 3: Civil Law

Input: "The plaintiff sought [MASK] for damages incurred."
Predictions: ["compensation", "damages", "only", "insurance", "forgiveness"]

Usage

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load model and tokenizer (if this repository hosts a PEFT adapter,
# the peft package must be installed for direct loading to work)
model = AutoModelForMaskedLM.from_pretrained("Nahla-yasmine/legal-bert-peft-eurlex")
tokenizer = AutoTokenizer.from_pretrained("Nahla-yasmine/legal-bert-peft-eurlex")

# Example: Masked language prediction
text = "The court found the defendant [MASK] of all charges."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Get top predictions
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
logits = outputs.logits[0, mask_token_index, :]
top_tokens = torch.topk(logits, 5, dim=1).indices[0].tolist()

for i, token_id in enumerate(top_tokens):
    predicted_token = tokenizer.decode([token_id])
    print(f"{i+1}. {predicted_token}")

Advanced Usage with PEFT

from peft import PeftModel, PeftConfig
from transformers import AutoModelForMaskedLM

# Load base model
base_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Load PEFT adapter
model = PeftModel.from_pretrained(base_model, "Nahla-yasmine/legal-bert-peft-eurlex")
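
If a standalone model is preferred for inference, the adapter weights can be folded into the base model with PEFT's merge_and_unload(); the tokenizer itself is unchanged and can be loaded from bert-base-uncased.

# Merge the LoRA weights into the base model and drop the PEFT wrappers
merged_model = model.merge_and_unload()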

Citation

@software{legal_bert_peft_2024,
  title  = {Legal-BERT-PEFT-EURLEX},
  author = {Nahla-yasmine},
  year   = {2024},
  url    = {https://huggingface.co/Nahla-yasmine/legal-bert-peft-eurlex}
}

Framework Versions

  • PEFT 0.17.0