---
pipeline_tag: text2text-generation
---
# 5CD-AI/visocial-T5-base
## Overview
We trimmed the vocabulary to 50,589 tokens and continually pretrained `google/mt5-base`[1] on a merged 20GB dataset (a minimal loading example follows the list). The training data includes:
- Crawled data (millions of comments and posts on Facebook)
- UIT data[2], which was used to pretrain `uitnlp/visobert`[2]
- The e-commerce subset of MC4
- 10.7M comments on the VOZ forum from `tarudesu/VOZ-HSD`[7]
- 3.6M Amazon reviews[3] translated into Vietnamese, from `5CD-AI/Vietnamese-amazon_polarity-gg-translated`
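
Here is a minimal sketch of loading the checkpoint with the `transformers` library. Since this is an mT5-style pretrained model, the sketch fills a sentinel span; the Vietnamese prompt ("The weather today is very <extra_id_0>.") is purely illustrative, and we assume the sentinel tokens survived the vocabulary trimming:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the continually pretrained checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("5CD-AI/visocial-T5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("5CD-AI/visocial-T5-base")

# mT5-style span corruption fills sentinel spans such as <extra_id_0>.
# This prompt is an illustrative example, not a prescribed input format.
text = "Thời tiết hôm nay rất <extra_id_0>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```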
Here are the results on 3 downstream tasks on Vietnamese social media texts: Hate Speech Detection (UIT-HSD), Toxic Speech Detection (ViCTSD), and Hate Spans Detection (ViHOS). Acc, WF1, and MF1 denote accuracy, weighted F1, and macro F1, respectively:
| Model | Average MF1 | UIT-HSD Acc | UIT-HSD WF1 | UIT-HSD MF1 | ViCTSD Acc | ViCTSD WF1 | ViCTSD MF1 | ViHOS Acc | ViHOS WF1 | ViHOS MF1 |
|---|---|---|---|---|---|---|---|---|---|---|
| PhoBERT[4] | 69.63 | 86.75 | 86.52 | 64.76 | 90.78 | 90.27 | 71.31 | 84.65 | 81.12 | 72.81 |
| PhoBERT_v2[4] | 70.50 | 87.42 | 87.33 | 66.60 | 90.23 | 89.78 | 71.39 | 84.92 | 81.51 | 73.51 |
| viBERT[5] | 67.80 | 86.33 | 85.79 | 62.85 | 88.81 | 88.17 | 67.65 | 84.63 | 81.28 | 72.91 |
| ViSoBERT[6] | 75.07 | 88.17 | 87.86 | 67.71 | 90.35 | 90.16 | 71.45 | 90.16 | 90.07 | 86.04 |
| ViHateT5[7] | 75.56 | 88.76 | 89.14 | 68.67 | 90.80 | 91.78 | 71.63 | 91.00 | 90.20 | 86.37 |
| visocial-T5-base (Ours) | 78.01 | 89.51 | 89.78 | 71.19 | 92.20 | 93.47 | 73.81 | 92.57 | 92.20 | 89.04 |
Below, visocial-T5-base is compared with other T5-based models on Vietnamese HSD-related tasks; all numbers are Macro F1-scores (MF1):
| Model | Hate Speech Detection | Toxic Speech Detection | Hate Spans Detection |
|---|---|---|---|
| mT5[1] | 66.76 | 69.93 | 86.60 |
| ViT5[8] | 66.95 | 64.82 | 86.90 |
| ViHateT5[7] | 68.67 | 71.63 | 86.37 |
| visocial-T5-base (Ours) | 71.90 | 73.81 | 89.04 |
## Fine-tune Configuration
We fine-tuned `5CD-AI/visocial-T5-base` on the 3 downstream tasks with the `transformers` library, using the following configuration (a code sketch follows the list):
- seed: 42
- training_epochs: 4
- train_batch_size: 4
- gradient_accumulation_steps: 8
- learning_rate: 3e-4
- lr_scheduler_type: linear
- model_max_length: 256
- metric_for_best_model: eval_loss
- evaluation_strategy: steps
- eval_steps: 0.1
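
For reference, here is a minimal sketch of how the configuration above maps onto `transformers`' `Seq2SeqTrainingArguments`. The output directory and the `load_best_model_at_end`/`save_*` settings are our assumptions (added so that `metric_for_best_model` takes effect); dataset preparation and the `Seq2SeqTrainer` call are omitted:

```python
from transformers import AutoTokenizer, Seq2SeqTrainingArguments

# model_max_length is a tokenizer setting, enforced at tokenization time.
tokenizer = AutoTokenizer.from_pretrained("5CD-AI/visocial-T5-base", model_max_length=256)

training_args = Seq2SeqTrainingArguments(
    output_dir="visocial-t5-finetuned",  # placeholder path, not the authors' directory
    seed=42,
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size: 4 * 8 = 32
    learning_rate=3e-4,
    lr_scheduler_type="linear",
    evaluation_strategy="steps",
    eval_steps=0.1,  # floats in (0, 1) are treated as a fraction of total training steps
    save_strategy="steps",       # assumption: must match evaluation_strategy
    save_steps=0.1,              # assumption: must match eval_steps
    metric_for_best_model="eval_loss",
    load_best_model_at_end=True,  # assumption: required for metric_for_best_model
)
```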
## References
[1] [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)
[2] [ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://aclanthology.org/2023.emnlp-main.315/)
[3] [The Amazon Polarity dataset](https://paperswithcode.com/dataset/amazon-polarity-1)
[4] [PhoBERT: Pre-trained language models for Vietnamese](https://aclanthology.org/2020.findings-emnlp.92/)
[5] [Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models](https://arxiv.org/abs/2006.15994)
[6] [ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://aclanthology.org/2023.emnlp-main.315/)
[7] [ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model](https://arxiv.org/abs/2405.14141)
[8] [ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation](https://aclanthology.org/2022.naacl-srw.18/)