Silly-Machine/TuPy-Bert-Base-Multilabel

Text Classification

FpOliveira committed on Dec 28, 2023

Commit 4cb78ba · 1 Parent(s): 82f9ab2

Create README.md

Files changed (1)

README.md +80 -0

README.md ADDED

	@@ -0,0 +1,80 @@

+---
+license: mit
+datasets:
+- Silly-Machine/TuPyE-Dataset
+language:
+- pt
+pipeline_tag: text-classification
+base_model: neuralmind/bert-base-portuguese-cased
+widget:
+- text: 'Bom dia, flor do dia!!'
+model-index:
+  - name: TuPy-Bert-Base-Multilabel
+    results:
+      - task:
+          type: text-classification
+        dataset:
+          name: Silly-Machine/TuPyE-Dataset
+          type: Silly-Machine/TuPyE-Dataset
+        metrics:
+          - name: f1
+            type: f1
+            value: 64.59
+        source:
+          name: Open LLM Leaderboard
+          url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+---
+## Introduction
+Tupi-BERT-Base is a fine-tuned BERT model designed specifically for binary classification of hate speech in Portuguese. Derived from the [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased), TuPi-Base is a refined solution for addressing hate speech concerns.
+For more details or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+Language model performance can vary considerably when the training and test data come from different domains. To build a Portuguese language model specialized for hate speech classification, the original BERTimbau model was fine-tuned on the [TuPi Hate Speech DataSet](https://huggingface.co/datasets/FpOliveira/TuPi-Portuguese-Hate-Speech-Dataset-Binary), which was sourced from diverse social networks.
+## Available models
+| Model                                    | Arch.      | #Layers | #Params |
+| ---------------------------------------- | ---------- | ------- | ------- |
+| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`  | BERT-Base  | 12      | 109M    |
+| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
+| `Silly-Machine/TuPy-Bert-Base-Multilabel` | BERT-Base | 12      | 109M    |
+| `Silly-Machine/TuPy-Bert-Large-Multilabel` | BERT-Large | 24      | 334M    |
+## Example usage
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
+import torch
+import numpy as np
+from scipy.special import softmax
+def classify_hate_speech(model_name, text):
+    """Print each label's softmax score for `text`, highest score first."""
+    model = AutoModelForSequenceClassification.from_pretrained(model_name)
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    config = AutoConfig.from_pretrained(model_name)
+    # Tokenize input text and prepare model input
+    model_input = tokenizer(text, padding=True, return_tensors="pt")
+    # Get model output scores
+    with torch.no_grad():
+        output = model(**model_input)
+        scores = softmax(output.logits.numpy(), axis=1)
+        ranking = np.argsort(scores[0])[::-1]
+    # Print the results
+    for i, rank in enumerate(ranking):
+        label = config.id2label[rank]
+        score = scores[0, rank]
+        print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+# Example usage
+model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel"
+text = "Bom dia, flor do dia!!"
+classify_hate_speech(model_name, text)
+```
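
The example above ranks labels with a softmax, which forces the scores to sum to 1. Since this checkpoint is named "Multilabel", independent per-label probabilities may be more useful. The following is a minimal sketch only, assuming the model was trained with a multi-label objective, which this README does not confirm. The helper name `multilabel_from_logits`, the 0.5 threshold, and the example logits and label names are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_from_logits(logits, id2label, threshold=0.5):
    """Return {label: probability} for labels whose sigmoid score meets the threshold.

    Hypothetical helper: it treats each logit as an independent per-label score,
    which is only appropriate if the checkpoint was trained as a multi-label
    classifier (an assumption, not something stated in the README).
    """
    selected = {}
    for idx, logit in enumerate(logits):
        prob = sigmoid(logit)
        if prob >= threshold:
            selected[id2label[idx]] = prob
    return selected


# Made-up logits and label names, for illustration only
example_logits = [2.0, -1.0, 0.0]
example_labels = {0: "label_a", 1: "label_b", 2: "label_c"}
print(multilabel_from_logits(example_logits, example_labels))
```

With the model loaded as in the example above, `output.logits[0].tolist()` and `config.id2label` could be passed in place of the made-up values. The threshold would normally be tuned on validation data rather than fixed at 0.5.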