AbuseBERT

Model Description

AbuseBERT is a BERT-based classification model fine-tuned for abusive language detection, optimized for cross-dataset generalization.

Abusive language detection models often suffer from poor generalization due to sampling and lexical biases in individual datasets. Our approach addresses this by integrating publicly available abusive language datasets, harmonizing labels and preprocessing textual samples to create a broader and more representative training distribution.
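
The integration step described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the dataset names, the label vocabularies in `LABEL_MAPS`, the binary target scheme, and the specific preprocessing rules (URL and mention masking, whitespace cleanup, de-duplication) are all assumptions made for the example.

```python
import re

# Hypothetical label vocabularies for two made-up datasets, mapped onto a
# shared binary scheme (1 = abusive, 0 = not abusive). The real datasets and
# their label sets are not listed in this card.
LABEL_MAPS = {
    "dataset_a": {"hate": 1, "offensive": 1, "neither": 0},
    "dataset_b": {"toxic": 1, "non-toxic": 0},
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def preprocess(text):
    """Mask URLs and user mentions, then collapse runs of whitespace."""
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("@user", text)
    return SPACE_RE.sub(" ", text).strip()


def harmonize(dataset_name, rows):
    """Convert (text, original_label) pairs to the shared label scheme."""
    label_map = LABEL_MAPS[dataset_name]
    return [
        {"text": preprocess(text), "label": label_map[label], "source": dataset_name}
        for text, label in rows
    ]


def integrate(datasets):
    """Merge harmonized datasets, dropping case-insensitive duplicate texts."""
    merged, seen = [], set()
    for name, rows in datasets.items():
        for example in harmonize(name, rows):
            key = example["text"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(example)
    return merged
```

Keeping the `source` field on each example makes it possible to measure afterwards how much each dataset contributes to the integrated model.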

Key Findings using 10 datasets:

  • Individual dataset models: average F1 = 0.60
  • Integrated model: F1 = 0.84
  • A dataset's contribution to the performance improvement correlates with its lexical diversity (correlation = 0.71)
  • Integration exposes models to diverse abuse patterns, enhancing real-world generalization
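
For readers who want to reproduce this kind of analysis on their own data, the two quantities reported above can be computed as in the sketch below. The card does not say which F1 variant or which correlation coefficient was used, so the binary F1 on the abusive class and the Pearson coefficient shown here are assumptions. The inputs in the usage lines are toy values, not results from the paper.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (abusive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy labels and predictions, for illustration only.
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.8
```

In an analysis like the one summarized above, `xs` would hold one lexical diversity score per dataset and `ys` the corresponding change in F1 when that dataset is included.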

Conclusion / Takeaways

  • No single dataset captures the full spectrum of abusive language; each dataset reflects a limited slice of the problem space.
  • Systematically integrating ten heterogeneous datasets significantly improves classification performance on a held-out benchmark.
  • Lexically dissimilar datasets contribute more to enhancing generalization.
  • The integrated model demonstrates superior cross-dataset performance compared to models trained on individual datasets.

Paper Reference

Samaneh Hosseini Moghaddam, Kelly Lyons, Frank Rudzicz, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr,
“Enhancing machine learning in abusive language detection with dataset aggregation,” in Proc. 35th IEEE Int. Conf. Collaborative Advances in Software Computing (CASC), 2025.


Intended Use

Recommended:

  • Detecting abusive, offensive, or toxic language in text from social media, online forums, or messaging platforms.

  • Supporting research on online harassment, cyber violence, and hate speech analysis.

  • Assisting human moderators in content review or flagging potentially harmful content.

  • Evaluating trends, prevalence, or patterns of abusive language in large-scale textual datasets.

Not Recommended:

  • Fully automated moderation without human oversight
  • High-stakes legal or policy decisions

Usage Example

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the model
model_name = "Samanehmoghaddam/AbuseBERT"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a pipeline for text classification
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example texts to classify
texts = [
    "@user You are amazing!",
    "@user You are stupid!",
]

# Run the classifier
results = classifier(texts)

# Print results
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result}")
    print("-" * 40)
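
Because fully automated moderation is not recommended, one option is to use the predictions only to prioritize items for human review. The sketch below assumes the pipeline returns one dict per text with `label` and `score` keys, which is the usual text-classification output. The label name `LABEL_1` and the threshold are hypothetical: check `model.config.id2label` for the actual label names and tune the threshold on your own data.

```python
# Assumption: "LABEL_1" is the abusive class. Verify with model.config.id2label.
ABUSIVE_LABEL = "LABEL_1"
# Hypothetical cut-off, not a value recommended by the authors.
REVIEW_THRESHOLD = 0.5


def route_for_review(texts, results, abusive_label=ABUSIVE_LABEL,
                     threshold=REVIEW_THRESHOLD):
    """Return flagged texts for a human moderator, highest score first."""
    queue = [
        {"text": text, "score": result["score"]}
        for text, result in zip(texts, results)
        if result["label"] == abusive_label and result["score"] >= threshold
    ]
    return sorted(queue, key=lambda item: item["score"], reverse=True)
```

Calling `route_for_review(texts, results)` with the variables from the example above yields a review queue instead of an automatic decision, so a moderator makes the final call.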