Spam Email Classifier - RoBERTa-base with LoRA (r=4)

This model is a LoRA adapter for spam email classification, fine-tuned on the Email Spam Classification Dataset with 83,448 emails.

Model Description

  • Base Model: FacebookAI/roberta-base
  • LoRA Rank: 4
  • LoRA Alpha: 8
  • Task: Binary Text Classification (Spam/Ham)
  • Training Dataset: 83,448 emails (66,758 training samples)
  • Trainable Parameters: 1,255,682 (1.00% of total)
  • Total Parameters: 125,902,852
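
The parameter counts above can be verified directly after loading the adapter. A minimal sketch using peft (note that PeftModel.from_pretrained freezes the adapter by default, so is_trainable=True is passed here to keep the adapter weights counted as trainable):

from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "FacebookAI/roberta-base", num_labels=2
)
model = PeftModel.from_pretrained(
    base,
    "ssheroz/spam-email-classifier-roberta-r4",
    is_trainable=True,  # keep adapter weights unfrozen so they are counted
)

# Should report roughly 1,255,682 trainable of 125,902,852 total parameters
model.print_trainable_parameters()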

Performance

Metric     Score
---------  ------
Accuracy   99.41%
Precision  99.50%
Recall     99.39%
F1 Score   99.44%
ROC-AUC    0.9990
PR-AUC     0.9988

Training Time: 544.60 minutes (~9.1 hours)
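
Metrics of this kind can be recomputed on the held-out test split with scikit-learn. A sketch, assuming y_true holds the gold labels (1 = spam) and y_prob the model's predicted spam probabilities; the tiny arrays below are placeholders, not the actual test data:

import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, average_precision_score,
)

y_true = np.array([1, 0, 1, 0])              # placeholder gold labels
y_prob = np.array([0.98, 0.02, 0.91, 0.10])  # placeholder spam probabilities
y_pred = (y_prob >= 0.5).astype(int)         # default 0.5 decision threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))            # threshold-free
print("PR-AUC   :", average_precision_score(y_true, y_prob))  # threshold-free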

Usage

Method 1: Using the Inference Script (Recommended)

Download the inference script and config from the GitHub repository:

# Download inference files
wget https://raw.githubusercontent.com/sherozshaikh/spam-email-classification-lora/main/inference/inference.py
wget https://raw.githubusercontent.com/sherozshaikh/spam-email-classification-lora/main/inference/inference_config.yaml

# Update inference_config.yaml with this model:
# base_model_name: "FacebookAI/roberta-base"
# adapter_path: "ssheroz/spam-email-classifier-roberta-r4"

Python API:

from inference import SpamClassifier

# Initialize classifier
classifier = SpamClassifier(config_path="inference_config.yaml")

# Classify single email
email = "Subject: URGENT! You've won $1,000,000! Click here to claim now!"
result = classifier.predict_single(email)

print(f"Prediction: {result['label']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Probabilities: {result['probabilities']}")

Command Line:

# Single email prediction
python inference.py --text "Subject: Meeting tomorrow at 2pm"

# Batch prediction from CSV
python inference.py --input_file emails.csv --output_file predictions.csv
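
The column layout that inference.py expects in emails.csv is defined by the script itself; purely as an illustration, a file with one email per row under a single text column could be prepared like this (the column name "text" is an assumption, not the script's documented interface):

import pandas as pd

emails = pd.DataFrame({
    "text": [  # hypothetical column name; check inference.py for the real one
        "Subject: Meeting tomorrow at 2pm",
        "Subject: URGENT! You've won $1,000,000! Click here to claim now!",
    ]
})
emails.to_csv("emails.csv", index=False)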

Method 2: Direct Usage with Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_name = "FacebookAI/roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=2,
    problem_type="single_label_classification"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "ssheroz/spam-email-classifier-roberta-r4")
model.eval()

# Inference
text = "Subject: URGENT! You've won $1,000,000! Click here now!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=1)
    prediction = torch.argmax(probabilities, dim=1).item()

label = "SPAM" if prediction == 1 else "HAM"
confidence = probabilities[0][prediction].item()

print(f"Prediction: {label} (Confidence: {confidence:.2%})")

Training Details

Hyperparameters

  • Epochs: 2
  • Learning Rate: 2e-4
  • Batch Size: 16
  • Optimizer: AdamW with weight decay (0.01)
  • Scheduler: Cosine with warmup (10% warmup ratio)
  • Gradient Clipping: 1.0
  • Mixed Precision: FP16
  • Early Stopping: Patience=2
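
These settings map naturally onto transformers.TrainingArguments; the following is a sketch under that assumption, not the author's exact training script:

from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="spam-classifier-lora",  # hypothetical output directory
    num_train_epochs=2,
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    weight_decay=0.01,            # AdamW is the Trainer's default optimizer
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_grad_norm=1.0,            # gradient clipping
    fp16=True,                    # mixed precision
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
)

# Early stopping with patience=2, passed to the Trainer via callbacks=[...]
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)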

LoRA Configuration

  • Rank (r): 4
  • Alpha: 8
  • Dropout: 0.1
  • Target Modules: query, key, value, dense (all attention layers)
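
In peft terms this corresponds to a LoraConfig along the following lines; a sketch, noting that peft matches target_modules by module-name suffix and that the SEQ_CLS task type additionally keeps the classification head fully trainable (consistent with the ~1.26M trainable parameters reported above):

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # also marks the classifier head as trainable
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["query", "key", "value", "dense"],  # matched by suffix
)

# base_model as loaded in the "Direct Usage" section above
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()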

Data Split

  • Train: 66,758 samples (80%)
  • Validation: 8,345 samples (10%)
  • Test: 8,345 samples (10%)
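
A split like this can be produced with two stratified calls to scikit-learn's train_test_split; a sketch, assuming the full dataset lives in a DataFrame df with a label column (stratification and the seed are assumptions, not documented choices):

from sklearn.model_selection import train_test_split

# df is a placeholder for the full 83,448-email dataset
train_df, temp_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42
)
# -> 66,758 train / 8,345 validation / 8,345 test (up to rounding)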

Limitations

  • Trained primarily on English emails
  • Performance may degrade on domain-specific spam (e.g., social media, SMS)
  • Requires periodic retraining for evolving spam patterns
  • False positives (legitimate emails marked as spam) can occur with unusual email patterns

Ethical Considerations

  • False positives may cause users to miss important emails
  • Should be used as part of a larger system with human oversight for critical applications
  • Regular monitoring and updates recommended to maintain effectiveness

Citation

If you use this model, please cite:

@misc{shaikh2025spamclassifier,
  author = {Sheroz Shaikh},
  title = {Spam Email Classification using LoRA Fine-tuned Transformers},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ssheroz/spam-email-classifier-roberta-r4}}
}

GitHub Repository

Full training code, analysis, and inference scripts: https://github.com/sherozshaikh/spam-email-classification-lora

License

MIT License - See LICENSE for details.
