
BERTweet-BR Fine-tuned for Multilabel Hate Speech Detection (ToLD-Br)

Model Description

This model is a fine-tuned version of melll-uff/bertweetbr trained specifically for multilabel hate speech detection in Brazilian Portuguese. It was fine-tuned on the multilabel version of the ToLD-Br dataset and predicts six categories of toxic content in Portuguese social media text.

Training Dataset

ToLD-Br (Toxic Language Detection in Brazilian Portuguese) multilabel version contains:

  • 21,000 tweets annotated by 42 carefully selected annotators
  • Multilabel classification across 6 categories: [homophobia, obscene, insult, racism, misogyny, xenophobia]
  • Each text can be labeled with multiple categories simultaneously
  • Diverse annotator demographics to reduce bias
  • Focus on Brazilian Portuguese social media language
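
The multilabel data can be loaded through the datasets library. A minimal sketch, assuming the Hub mirror JAugusto97/told-br exposes a multilabel configuration (the identifier and configuration name are assumptions; adjust them to your copy of the dataset):

from datasets import load_dataset

# Assumed dataset identifier and configuration name; adjust to your copy of ToLD-Br.
told_br = load_dataset("JAugusto97/told-br", "multilabel")
print(told_br["train"][0])  # tweet text plus one annotation column per category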

The dataset covers the following categories of toxic content:

  • Homophobia/LGBTQ+phobia: Discrimination against LGBTQ+ individuals
  • Obscene language: Vulgar, profane, or sexually explicit content
  • Insults: General offensive language and personal attacks
  • Racism: Racial discrimination and prejudice
  • Misogyny: Gender-based discrimination against women
  • Xenophobia: Discrimination against foreigners or different cultures

Training Configuration

  • Base Model: melll-uff/bertweetbr
  • Task: Multilabel sequence classification
  • Number of Labels: 6
  • Max Sequence Length: 70 tokens
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Linear with warmup
  • Problem Type: multi_label_classification
  • Loss Function: Binary cross-entropy with logits (sigmoid activation; predictions thresholded at 0.5 at inference time)
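
This configuration maps directly onto the Hugging Face Trainer API. The following is a minimal sketch rather than the exact training script; the toy in-memory dataset and the warmup_ratio value are assumptions:

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("melll-uff/bertweetbr")
model = AutoModelForSequenceClassification.from_pretrained(
    "melll-uff/bertweetbr",
    num_labels=6,
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss internally
)

# Toy stand-in for the ToLD-Br multilabel training split; labels must be
# float vectors (one 0/1 entry per category) for the BCE loss.
train_data = Dataset.from_dict({
    "text": ["primeiro tweet de exemplo", "segundo tweet de exemplo"],
    "labels": [[0.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0] * 6],
}).map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=70),
       batched=True)

args = TrainingArguments(
    output_dir="bertweetbr-toldbr-multilabel",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    lr_scheduler_type="linear",  # AdamW is the Trainer default optimizer
    warmup_ratio=0.1,            # assumed value; the card only says "linear with warmup"
)

Trainer(model=model, args=args, train_dataset=train_data, tokenizer=tokenizer).train()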

Performance Metrics

Evaluated on the ToLD-Br multilabel test set (1,183 samples):

Category     Precision   Recall   F1-Score   Support
Homophobia   0.58        0.65     0.61       23
Obscene      0.66        0.72     0.69       671
Insult       0.70        0.69     0.70       414
Racism       0.00        0.00     0.00       16
Misogyny     0.65        0.29     0.40       45
Xenophobia   0.00        0.00     0.00       14

Overall Metrics:

  • Micro Average: Precision: 0.68, Recall: 0.68, F1-Score: 0.68
  • Macro Average: Precision: 0.43, Recall: 0.39, F1-Score: 0.40
  • Weighted Average: Precision: 0.66, Recall: 0.68, F1-Score: 0.66
  • Samples Average: Precision: 0.31, Recall: 0.30, F1-Score: 0.30
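
These averages follow scikit-learn's standard multilabel definitions and can be reproduced from binary label and prediction matrices. A minimal sketch with placeholder arrays (y_true and y_pred stand in for the real test-set matrices):

import numpy as np
from sklearn.metrics import classification_report

label_names = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

# Placeholder (n_samples, 6) binary matrices; substitute the real test labels and predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 6))
y_pred = rng.integers(0, 2, size=(100, 6))

# Prints per-label precision/recall/F1 plus the micro, macro, weighted, and samples averages.
print(classification_report(y_true, y_pred, target_names=label_names, zero_division=0))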

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# Load model and tokenizer
model_name = "your-username/bertweetbr-toldbr-multilabel-hate-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define label names
label_names = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

def predict_hate_speech_multilabel(text, threshold=0.5):
    inputs = tokenizer(
        text,
        truncation=True,
        padding=True,
        max_length=70,
        return_tensors='pt'
    )
    
    with torch.no_grad():
        outputs = model(**inputs)
        # Apply sigmoid to get probabilities
        probabilities = torch.sigmoid(outputs.logits)
        # Apply threshold to get binary predictions
        predictions = (probabilities > threshold).int().numpy()[0]
    
    # Map each label name to its boolean prediction and sigmoid probability
    results = {label: bool(pred) for label, pred in zip(label_names, predictions)}
    probabilities_dict = {label: float(prob) for label, prob in zip(label_names, probabilities[0])}
    
    return {
        'predictions': results,
        'probabilities': probabilities_dict
    }

# Test the model
text = "Exemplo de texto para classificar"
result = predict_hate_speech_multilabel(text)
print(f"Predictions: {result['predictions']}")
print(f"Probabilities: {result['probabilities']}")

Model Performance Analysis

The model shows varying performance across different categories:

Strong Performance:

  • Insult detection: Best performing category (F1: 0.70) with balanced precision and recall
  • Obscene content: Good performance (F1: 0.69) with strong recall (0.72)

Moderate Performance:

  • Homophobia: Moderate performance (F1: 0.61) with decent recall but lower precision
  • Misogyny: Challenging category (F1: 0.40) with low recall (0.29) despite good precision

Poor Performance:

  • Racism and Xenophobia: F1 of 0.00 for both, reflecting extremely limited data for these labels (only 16 and 14 positive test examples, respectively)

Intended Use

This model is designed for:

  • Research purposes in multilabel hate speech detection
  • Content moderation assistance in Brazilian Portuguese social media
  • Social media monitoring applications across multiple hate speech categories
  • Educational purposes in NLP, ethics, and bias detection

Limitations and Biases

  • Language: Optimized specifically for Brazilian Portuguese
  • Domain: Trained on social media text (Twitter-like content)
  • Class Imbalance: Significant performance variations due to unbalanced training data (the racism and xenophobia categories are severely underrepresented); a per-label threshold-tuning sketch follows this list
  • Multilabel Complexity: Some categories may be more difficult to detect when co-occurring with others
  • Bias: Despite efforts to reduce bias through diverse annotators, the model may still reflect societal biases present in the training data
  • Context: May struggle with highly context-dependent sarcasm or irony
  • Generalization: Performance may vary on texts significantly different from social media posts
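
One practical mitigation for the class imbalance noted above is to tune a separate decision threshold per label on held-out validation data instead of using a global 0.5. A minimal sketch, assuming val_probs and val_true are (n_samples, 6) arrays of sigmoid probabilities and gold labels:

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs, val_true, grid=np.linspace(0.05, 0.95, 19)):
    """For each label, pick the threshold that maximizes F1 on validation data."""
    best = []
    for j in range(val_probs.shape[1]):
        scores = [f1_score(val_true[:, j], (val_probs[:, j] >= t).astype(int),
                           zero_division=0) for t in grid]
        best.append(float(grid[int(np.argmax(scores))]))
    return best

# Example with placeholder data:
rng = np.random.default_rng(0)
print(tune_thresholds(rng.random((200, 6)), rng.integers(0, 2, size=(200, 6))))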

Ethical Considerations

  • This model should be used as an assistive tool rather than for automated decision-making
  • Human oversight is strongly recommended, especially for categories with poor performance
  • Be aware of class imbalance effects when interpreting results
  • Consider the intersectionality of hate speech categories
  • Be mindful of false positives/negatives in each category when moderating content
  • Consider the cultural and linguistic context of Brazilian Portuguese

Citation

If you use this model, please cite the original ToLD-Br dataset:

@article{DBLP:journals/corr/abs-2010.04543,
  author    = {Joao Augusto Leite and
               Diego F. Silva and
               Kalina Bontcheva and
               Carolina Scarton},
  title     = {Toxic Language Detection in Social Media for Brazilian Portuguese:
               New Dataset and Multilingual Analysis},
  journal   = {CoRR},
  volume    = {abs/2010.04543},
  year      = {2020},
  url       = {https://arxiv.org/abs/2010.04543}
}

License

This model inherits the license from the base BERTweet-BR model and ToLD-Br dataset. Please refer to their respective repositories for detailed licensing information.


Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use of the model in their applications. The significant class imbalance in the training data should be carefully considered when deploying this model in production environments.
