
BERTweet-BR Fine-tuned for Multilabel Hate Speech Detection (ToLD-Br)

Model Description

This model is a fine-tuned version of melll-uff/bertweetbr trained specifically for multilabel hate speech detection in Brazilian Portuguese. It was fine-tuned on the multilabel version of the ToLD-Br dataset and predicts six categories of toxic content in Portuguese social media text.

Training Dataset

ToLD-Br (Toxic Language Detection in Brazilian Portuguese) multilabel version contains:

  • 21,000 tweets annotated by 42 carefully selected annotators
  • Multilabel classification across 6 categories: [homophobia, obscene, insult, racism, misogyny, xenophobia]
  • Each text can be labeled with multiple categories simultaneously
  • Diverse annotator demographics to reduce bias
  • Focus on Brazilian Portuguese social media language
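
The multilabel data can be loaded through the datasets library. A minimal sketch, assuming the Hub mirror JAugusto97/told-br exposes a multilabel configuration (the identifier and configuration name are assumptions; adjust them to your copy of the dataset):

from datasets import load_dataset

# Assumed dataset identifier and configuration name; adjust to your copy of ToLD-Br.
told_br = load_dataset("JAugusto97/told-br", "multilabel")
print(told_br["train"][0])  # tweet text plus one annotation column per category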

The dataset covers the following categories of toxic content:

  • Homophobia/LGBTQ+phobia: Discrimination against LGBTQ+ individuals
  • Obscene language: Vulgar, profane, or sexually explicit content
  • Insults: General offensive language and personal attacks
  • Racism: Racial discrimination and prejudice
  • Misogyny: Gender-based discrimination against women
  • Xenophobia: Discrimination against foreigners or different cultures

Training Configuration

  • Base Model: melll-uff/bertweetbr
  • Task: Multilabel sequence classification
  • Number of Labels: 6
  • Max Sequence Length: 70 tokens
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Linear with warmup
  • Problem Type: multi_label_classification
  • Loss Function: Binary cross-entropy with logits (sigmoid activation; predictions thresholded at 0.5 at inference time)
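
This configuration maps directly onto the Hugging Face Trainer API. The following is a minimal sketch rather than the exact training script; the toy in-memory dataset and the warmup_ratio value are assumptions:

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("melll-uff/bertweetbr")
model = AutoModelForSequenceClassification.from_pretrained(
    "melll-uff/bertweetbr",
    num_labels=6,
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss internally
)

# Toy stand-in for the ToLD-Br multilabel training split; labels must be
# float vectors (one 0/1 entry per category) for the BCE loss.
train_data = Dataset.from_dict({
    "text": ["primeiro tweet de exemplo", "segundo tweet de exemplo"],
    "labels": [[0.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0] * 6],
}).map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=70),
       batched=True)

args = TrainingArguments(
    output_dir="bertweetbr-toldbr-multilabel",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    lr_scheduler_type="linear",  # AdamW is the Trainer default optimizer
    warmup_ratio=0.1,            # assumed value; the card only says "linear with warmup"
)

Trainer(model=model, args=args, train_dataset=train_data, tokenizer=tokenizer).train()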

Performance Metrics

Evaluated on the ToLD-Br multilabel test set (1,183 samples):

Category     Precision   Recall   F1-Score   Support
Homophobia   0.58        0.65     0.61       23
Obscene      0.66        0.72     0.69       671
Insult       0.70        0.69     0.70       414
Racism       0.00        0.00     0.00       16
Misogyny     0.65        0.29     0.40       45
Xenophobia   0.00        0.00     0.00       14

Overall Metrics:

  • Micro Average: Precision: 0.68, Recall: 0.68, F1-Score: 0.68
  • Macro Average: Precision: 0.43, Recall: 0.39, F1-Score: 0.40
  • Weighted Average: Precision: 0.66, Recall: 0.68, F1-Score: 0.66
  • Samples Average: Precision: 0.31, Recall: 0.30, F1-Score: 0.30
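
These averages follow scikit-learn's standard multilabel definitions and can be reproduced from binary label and prediction matrices. A minimal sketch with placeholder arrays (y_true and y_pred stand in for the real test-set matrices):

import numpy as np
from sklearn.metrics import classification_report

label_names = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

# Placeholder (n_samples, 6) binary matrices; substitute the real test labels and predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 6))
y_pred = rng.integers(0, 2, size=(100, 6))

# Prints per-label precision/recall/F1 plus the micro, macro, weighted, and samples averages.
print(classification_report(y_true, y_pred, target_names=label_names, zero_division=0))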

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# Load model and tokenizer
model_name = "your-username/bertweetbr-toldbr-multilabel-hate-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define label names
label_names = ["homophobia", "obscene", "insult", "racism", "misogyny", "xenophobia"]

def predict_hate_speech_multilabel(text, threshold=0.5):
    inputs = tokenizer(
        text,
        truncation=True,
        padding=True,
        max_length=70,
        return_tensors='pt'
    )
    
    with torch.no_grad():
        outputs = model(**inputs)
        # Apply sigmoid to get probabilities
        probabilities = torch.sigmoid(outputs.logits)
        # Apply threshold to get binary predictions
        predictions = (probabilities > threshold).int().numpy()[0]
    
    # Map each label name to its boolean prediction and sigmoid probability
    results = {label: bool(pred) for label, pred in zip(label_names, predictions)}
    probabilities_dict = {label: float(prob) for label, prob in zip(label_names, probabilities[0])}
    
    return {
        'predictions': results,
        'probabilities': probabilities_dict
    }

# Test the model
text = "Exemplo de texto para classificar"
result = predict_hate_speech_multilabel(text)
print(f"Predictions: {result['predictions']}")
print(f"Probabilities: {result['probabilities']}")

Model Performance Analysis

The model shows varying performance across different categories:

Strong Performance:

  • Insult detection: Best performing category (F1: 0.70) with balanced precision and recall
  • Obscene content: Good performance (F1: 0.69) with strong recall (0.72)

Moderate Performance:

  • Homophobia: Moderate performance (F1: 0.61) with decent recall but lower precision
  • Misogyny: Challenging category (F1: 0.40) with low recall (0.29) despite good precision

Poor Performance:

  • Racism and Xenophobia: F1 of 0.00 for both, reflecting extremely limited data for these labels (only 16 and 14 positive test examples, respectively)

Intended Use

This model is designed for:

  • Research purposes in multilabel hate speech detection
  • Content moderation assistance in Brazilian Portuguese social media
  • Social media monitoring applications across multiple hate speech categories
  • Educational purposes in NLP, ethics, and bias detection

Limitations and Biases

  • Language: Optimized specifically for Brazilian Portuguese
  • Domain: Trained on social media text (Twitter-like content)
  • Class Imbalance: Significant performance variations due to unbalanced training data (the racism and xenophobia categories are severely underrepresented); a per-label threshold-tuning sketch follows this list
  • Multilabel Complexity: Some categories may be more difficult to detect when co-occurring with others
  • Bias: Despite efforts to reduce bias through diverse annotators, the model may still reflect societal biases present in the training data
  • Context: May struggle with highly context-dependent sarcasm or irony
  • Generalization: Performance may vary on texts significantly different from social media posts
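
One practical mitigation for the class imbalance noted above is to tune a separate decision threshold per label on held-out validation data instead of using a global 0.5. A minimal sketch, assuming val_probs and val_true are (n_samples, 6) arrays of sigmoid probabilities and gold labels:

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs, val_true, grid=np.linspace(0.05, 0.95, 19)):
    """For each label, pick the threshold that maximizes F1 on validation data."""
    best = []
    for j in range(val_probs.shape[1]):
        scores = [f1_score(val_true[:, j], (val_probs[:, j] >= t).astype(int),
                           zero_division=0) for t in grid]
        best.append(float(grid[int(np.argmax(scores))]))
    return best

# Example with placeholder data:
rng = np.random.default_rng(0)
print(tune_thresholds(rng.random((200, 6)), rng.integers(0, 2, size=(200, 6))))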

Ethical Considerations

  • This model should be used as an assistive tool rather than for automated decision-making
  • Human oversight is strongly recommended, especially for categories with poor performance
  • Be aware of class imbalance effects when interpreting results
  • Consider the intersectionality of hate speech categories
  • Be mindful of false positives/negatives in each category when moderating content
  • Consider the cultural and linguistic context of Brazilian Portuguese

Citation

If you use this model, please cite the original ToLD-Br dataset:

@article{DBLP:journals/corr/abs-2010.04543,
  author    = {Joao Augusto Leite and
               Diego F. Silva and
               Kalina Bontcheva and
               Carolina Scarton},
  title     = {Toxic Language Detection in Social Media for Brazilian Portuguese:
               New Dataset and Multilingual Analysis},
  journal   = {CoRR},
  volume    = {abs/2010.04543},
  year      = {2020},
  url       = {https://arxiv.org/abs/2010.04543}
}

License

This model inherits the license from the base BERTweet-BR model and ToLD-Br dataset. Please refer to their respective repositories for detailed licensing information.


Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use of the model in their applications. The significant class imbalance in the training data should be carefully considered when deploying this model in production environments.
