BERTweet-BR Fine-tuned for Multilabel Hate Speech Detection (ToLD-Br)
Model Description
This model is a fine-tuned version of melll-uff/bertweetbr for multilabel hate speech detection in Brazilian Portuguese. It was fine-tuned on the multilabel version of the ToLD-Br dataset and predicts six categories of toxic content in Portuguese social media text.
Training Dataset
ToLD-Br (Toxic Language Detection in Brazilian Portuguese) multilabel version contains:
- 21,000 tweets annotated by 42 carefully selected annotators
- Multilabel classification across 6 categories: [homophobia, obscene, insult, racism, misogyny, xenophobia]
- Each text can be labeled with multiple categories simultaneously
- Diverse annotator demographics to reduce bias
- Focus on Brazilian Portuguese social media language
The dataset covers the following categories of toxic content:
- Homophobia/LGBTQ+phobia: Discrimination against LGBTQ+ individuals
- Obscene language: Vulgar, profane, or sexually explicit content
- Insults: General offensive language and personal attacks
- Racism: Racial discrimination and prejudice
- Misogyny: Gender-based discrimination against women
- Xenophobia: Discrimination against foreigners or different cultures
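Each tweet can carry any subset of the six categories, so a natural target representation is a multi-hot vector with one 0/1 entry per category. The sketch below is illustrative only. `to_multi_hot` and `from_multi_hot` are hypothetical helpers, and the label order is assumed to match `label_names` in the Usage section. The card does not say how the dataset files store labels.

```python
from typing import Iterable, List

# Label order as listed in this card; assumed to match the model's output logits.
LABEL_NAMES = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]


def to_multi_hot(labels: Iterable[str], label_names: List[str] = LABEL_NAMES) -> List[int]:
    """Encode a set of category names as a 0/1 vector in label_names order."""
    label_set = set(labels)
    unknown = label_set - set(label_names)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if name in label_set else 0 for name in label_names]


def from_multi_hot(vector: List[int], label_names: List[str] = LABEL_NAMES) -> List[str]:
    """Decode a 0/1 vector back into the list of active category names."""
    return [name for name, flag in zip(label_names, vector) if flag]


# A tweet can carry several labels at once, or none at all.
print(to_multi_hot(["obscene", "insult"]))  # [0, 1, 1, 0, 0, 0]
print(to_multi_hot([]))                     # [0, 0, 0, 0, 0, 0]
```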
Training Configuration
- Base Model: melll-uff/bertweetbr
- Task: Multilabel sequence classification
- Number of Labels: 6
- Max Sequence Length: 70 tokens
- Batch Size: 16
- Learning Rate: 2e-5
- Epochs: 3
- Optimizer: AdamW
- Scheduler: Linear with warmup
- Problem Type: multi_label_classification
- Loss Function: Binary cross-entropy over per-label sigmoid outputs
- Decision Rule (inference): each label's sigmoid probability is thresholded at 0.5
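The following minimal pure-Python sketch makes the loss and decision rule concrete. It covers three pieces:

- per-label binary cross-entropy computed from logits;
- the separate sigmoid-plus-0.5-threshold prediction step;
- a linear warmup/decay learning-rate multiplier with the same shape as `transformers.get_linear_schedule_with_warmup`.

This is not the training code used for this model. The logits, targets, and warmup step counts are hypothetical, because the card does not state the number of warmup steps.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    The 0.5 threshold plays no role here; it is only used at prediction time.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def predict(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Independent sigmoid per label, then a fixed decision threshold."""
    return [1 if sigmoid(x) > threshold else 0 for x in logits]


def linear_warmup_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """LR multiplier: linear ramp from 0 to 1, then linear decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


logits = [-3.0, 2.0, 0.4, -4.0, -1.0, -5.0]  # hypothetical model output for one tweet
targets = [0, 1, 1, 0, 0, 0]                 # hypothetical gold labels
print(round(bce_with_logits(logits, targets), 4))
print(predict(logits))  # [0, 1, 1, 0, 0, 0]
print(2e-5 * linear_warmup_multiplier(step=50, warmup_steps=100, total_steps=1000))
```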
Performance Metrics
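Evaluated on the ToLD-Br multilabel test set. Support is the number of positive instances of each category; across the six categories these total 1,183 positive labels: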
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Homophobia | 0.58 | 0.65 | 0.61 | 23 |
| Obscene | 0.66 | 0.72 | 0.69 | 671 |
| Insult | 0.70 | 0.69 | 0.70 | 414 |
| Racism | 0.00 | 0.00 | 0.00 | 16 |
| Misogyny | 0.65 | 0.29 | 0.40 | 45 |
| Xenophobia | 0.00 | 0.00 | 0.00 | 14 |
Overall Metrics:
- Micro Average: Precision: 0.68, Recall: 0.68, F1-Score: 0.68
- Macro Average: Precision: 0.43, Recall: 0.39, F1-Score: 0.40
- Weighted Average: Precision: 0.66, Recall: 0.68, F1-Score: 0.66
- Samples Average: Precision: 0.31, Recall: 0.30, F1-Score: 0.30
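The two averages behave differently:

- Macro averaging takes the unweighted mean over the six categories, so the two 0.00 rows pull it down sharply.
- Weighted averaging weights each category by its support, so obscene and insult dominate it.

The sketch below recomputes both from the rounded per-category table. Micro and samples averages cannot be reproduced this way, because they need the raw per-tweet predictions: micro pools true and false positives across labels, and samples averages scores per tweet.

```python
# Per-category rows copied from the table above: (precision, recall, f1, support)
REPORT = {
    "homophobia": (0.58, 0.65, 0.61, 23),
    "obscene": (0.66, 0.72, 0.69, 671),
    "insult": (0.70, 0.69, 0.70, 414),
    "racism": (0.00, 0.00, 0.00, 16),
    "misogyny": (0.65, 0.29, 0.40, 45),
    "xenophobia": (0.00, 0.00, 0.00, 14),
}


def summarize_report(report):
    """Return (macro, weighted, total_support); averages are (precision, recall, f1)."""
    rows = list(report.values())
    n = len(rows)
    total_support = sum(r[3] for r in rows)
    # Macro: every category counts equally, regardless of support.
    macro = tuple(sum(r[i] for r in rows) / n for i in range(3))
    # Weighted: each category counts in proportion to its support.
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total_support for i in range(3))
    return macro, weighted, total_support


macro, weighted, total = summarize_report(REPORT)
print("total support:", total)
print("macro (P, R, F1):", [round(v, 2) for v in macro])
print("weighted (P, R, F1):", [round(v, 2) for v in weighted])
```

With these inputs the macro values come out as 0.43 / 0.39 / 0.40, and the weighted values as about 0.66 / 0.67 / 0.66. The 0.01 gap in weighted recall relative to the card is expected, since the per-category inputs are already rounded to two decimals.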
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer (replace the placeholder repository ID with the real one)
model_name = "your-username/bertweetbr-toldbr-multilabel-hate-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define label names (order must match the model's output logits)
label_names = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

def predict_hate_speech_multilabel(text, threshold=0.5):
    inputs = tokenizer(
        text,
        truncation=True,
        padding=True,
        max_length=70,
        return_tensors='pt'
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Apply sigmoid to get an independent probability per label
    probabilities = torch.sigmoid(outputs.logits)
    # Apply threshold to get binary predictions for the single input text
    predictions = (probabilities > threshold).int().numpy()[0]
    # Return dictionaries keyed by label name
    results = {label_names[i]: bool(predictions[i]) for i in range(len(label_names))}
    probabilities_dict = {label_names[i]: float(probabilities[0][i]) for i in range(len(label_names))}
    return {
        'predictions': results,
        'probabilities': probabilities_dict
    }

# Test the model ("Example text to classify")
text = "Exemplo de texto para classificar"
result = predict_hate_speech_multilabel(text)
print(f"Predictions: {result['predictions']}")
print(f"Probabilities: {result['probabilities']}")
```
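`predict_hate_speech_multilabel` applies one threshold to all six labels. Because the categories perform so unevenly, it can help to list the active labels by score and, optionally, override the threshold for individual labels. The helper below is a hypothetical post-processing sketch that operates on the `result['probabilities']` dictionary. The example probabilities and the 0.3 override are made up; real per-label thresholds would need to be chosen on held-out data.

```python
from typing import Dict, List, Optional


def active_labels(
    probabilities: Dict[str, float],
    threshold: float = 0.5,
    per_label_thresholds: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return labels whose probability exceeds its threshold, highest score first.

    per_label_thresholds (hypothetical) overrides the global threshold for
    specific labels; labels not listed fall back to `threshold`.
    """
    overrides = per_label_thresholds or {}
    hits = [
        (name, p)
        for name, p in probabilities.items()
        if p > overrides.get(name, threshold)
    ]
    return [name for name, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


# Hypothetical output in the shape of result['probabilities']
probs = {"homophobia": 0.02, "obscene": 0.81, "insult": 0.64,
         "racism": 0.01, "misogyny": 0.35, "xenophobia": 0.03}
print(active_labels(probs))                                          # ['obscene', 'insult']
print(active_labels(probs, per_label_thresholds={"misogyny": 0.3}))  # ['obscene', 'insult', 'misogyny']
```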
Model Performance Analysis
The model shows varying performance across different categories:
Strong Performance:
- Insult detection: Best performing category (F1: 0.70) with balanced precision and recall
- Obscene content: Good performance (F1: 0.69) with strong recall (0.72)
Moderate Performance:
- Homophobia: Moderate performance (F1: 0.61) with decent recall but lower precision
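- Misogyny: Challenging category (F1: 0.40) with low recall (0.29) despite reasonable precision (0.65)
Poor Performance:
- Racism and Xenophobia: Very poor performance (F1: 0.00; no test instance of either category was correctly identified), consistent with how rarely these categories occur (only 16 and 14 positive instances in the test set, respectively)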
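With so few positive test instances, each per-category score is a noisy estimate. For racism, a single additional true positive would move recall by 1/16 ≈ 0.06. A Wilson score interval on recall is one way to see this uncertainty. The true-positive counts used here are an assumption, back-computed from the rounded recall and support values: 13 of 45 for misogyny and 15 of 23 for homophobia.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


# Assumed true-positive counts, inferred from rounded recall x support:
# misogyny 0.29 * 45 -> 13, homophobia 0.65 * 23 -> 15.
for name, tp, support in [("misogyny", 13, 45), ("homophobia", 15, 23)]:
    low, high = wilson_interval(tp, support)
    print(f"{name}: recall {tp / support:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

For misogyny this gives roughly [0.18, 0.43]. That interval is wide enough that the reported 0.29 should be read as approximate.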
Intended Use
This model is designed for:
- Research purposes in multilabel hate speech detection
- Content moderation assistance in Brazilian Portuguese social media
- Social media monitoring applications across multiple hate speech categories
- Educational purposes in NLP, ethics, and bias detection
Limitations and Biases
- Language: Optimized specifically for Brazilian Portuguese
- Domain: Trained on social media text (Twitter-like content)
- Class Imbalance: Performance varies significantly across categories because the training data is imbalanced (the racism and xenophobia categories are severely underrepresented)
- Multilabel Complexity: Some categories may be more difficult to detect when co-occurring with others
- Bias: Despite efforts to reduce bias through diverse annotators, the model may still reflect societal biases present in the training data
- Context: May struggle with context-dependent language such as sarcasm or irony
- Generalization: Performance may vary on texts significantly different from social media posts
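The imbalance can be quantified per label as the ratio of negative to positive examples. One common mitigation, which this card does not say was used, is to pass such ratios as the `pos_weight` argument of PyTorch's `torch.nn.BCEWithLogitsLoss`, so that rare positives count more in the loss. The sketch computes the ratios in plain Python on a hypothetical toy label matrix, not on ToLD-Br counts.

```python
from typing import List, Sequence


def positive_weights(label_matrix: Sequence[Sequence[int]]) -> List[float]:
    """Per-label ratio of negative to positive examples (neg / pos).

    Labels with no positive examples get weight 1.0 to avoid division by zero.
    """
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        pos = sum(row[j] for row in label_matrix)
        neg = n_samples - pos
        weights.append(neg / pos if pos else 1.0)
    return weights


# Hypothetical toy data: 6 tweets x 3 labels (not ToLD-Br counts)
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
print(positive_weights(toy))  # [0.5, 2.0, 5.0]
```

In a PyTorch training loop, the resulting list could be wrapped with `torch.tensor(...)` and passed as `pos_weight`. Very large ratios for the rarest labels may also raise false positives, so any weighting would need validation.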
Ethical Considerations
- This model should be used as an assistive tool rather than for automated decision-making
- Human oversight is strongly recommended, especially for categories with poor performance
- Be aware of class imbalance effects when interpreting results
- Consider that hate speech categories can intersect (for example, a single post may be both racist and misogynistic)
- Be mindful of false positives/negatives in each category when moderating content
- Consider the cultural and linguistic context of Brazilian Portuguese
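The human-oversight recommendation can be turned into a simple triage rule that routes some predictions to a moderator instead of acting on them automatically. The function below is a hypothetical policy, not part of this model. It flags any noticeable score on racism or xenophobia, the two categories with F1 of 0.00 above, and any label whose probability falls near the decision threshold. The `margin` and `weak_label_floor` defaults are arbitrary and would need tuning.

```python
from typing import Dict, List

# Categories with F1 of 0.00 in this card's test results (hypothetical policy input)
WEAK_LABELS = {"racism", "xenophobia"}


def review_reasons(
    probabilities: Dict[str, float],
    threshold: float = 0.5,
    margin: float = 0.15,
    weak_label_floor: float = 0.2,
) -> List[str]:
    """Return reasons to send a prediction to a human moderator (empty list = none).

    Hypothetical policy: flag weakly supported labels scoring at or above
    `weak_label_floor`; otherwise flag any label whose probability lies
    within `margin` of the decision threshold.
    """
    reasons = []
    for name, p in probabilities.items():
        if name in WEAK_LABELS and p >= weak_label_floor:
            reasons.append(f"{name}: score {p:.2f} on a weakly supported label")
        elif abs(p - threshold) <= margin:
            reasons.append(f"{name}: score {p:.2f} near the threshold")
    return reasons


# Hypothetical output in the shape of result['probabilities']
probs = {"homophobia": 0.02, "obscene": 0.81, "insult": 0.55,
         "racism": 0.30, "misogyny": 0.10, "xenophobia": 0.01}
for reason in review_reasons(probs):
    print(reason)
```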
Citation
If you use this model, please cite the original ToLD-Br dataset:
```bibtex
@article{DBLP:journals/corr/abs-2010.04543,
  author    = {Joao Augusto Leite and
               Diego F. Silva and
               Kalina Bontcheva and
               Carolina Scarton},
  title     = {Toxic Language Detection in Social Media for Brazilian Portuguese:
               New Dataset and Multilingual Analysis},
  journal   = {CoRR},
  volume    = {abs/2010.04543},
  year      = {2020},
  url       = {https://arxiv.org/abs/2010.04543}
}
```
License
This model inherits the license from the base BERTweet-BR model and ToLD-Br dataset. Please refer to their respective repositories for detailed licensing information.
Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use of the model in their applications. The significant class imbalance in the training data should be carefully considered when deploying this model in production environments.