This is a RuBERT model fine-tuned for emotion classification of short Russian texts. The task is multi-label classification with the following labels:
0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
Label to Russian label:
no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
Usage
from transformers import pipeline
model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9388909935951233}]
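Because the task is multi-label, a text may carry more than one emotion, while the call above returns only the top-scoring label. A minimal sketch of retrieving every label score and keeping those above a cutoff follows. The helper names `select_labels` and `predict_labels` and the 0.5 threshold are illustrative assumptions, not part of this model card; `top_k=None` is the `transformers` pipeline argument that requests scores for all labels.

```python
from typing import List


def select_labels(scores: List[dict], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score is at least `threshold`, highest score first.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    produced by a text-classification pipeline called with top_k=None.
    The 0.5 default is an assumed cutoff, not a value from the model card.
    """
    # Some transformers versions wrap the result for a single input in an outer list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    kept = [s for s in scores if s["score"] >= threshold]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in kept]


def predict_labels(text: str, threshold: float = 0.5) -> List[str]:
    """Hypothetical wrapper: score all labels for `text`, then apply the cutoff."""
    # Imported here so that select_labels can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
    return select_labels(classifier(text, top_k=None), threshold)
```

Calling `predict_labels("Привет, ты мне нравишься!")` downloads the model on first use. An empty result means no label reached the threshold, so the cutoff may need tuning per label on validation data.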
Dataset
This model was trained on the CEDR dataset.
An overview of the training data can be found in its Hugging Face card or in the source article.
Training
Training was done in this project with these parameters:
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
Eval results (on test split)
metric | no_emotion | joy | sadness | surprise | fear | anger | micro avg | macro avg | weighted avg |
---|---|---|---|---|---|---|---|---|---|
precision | 0.87 | 0.84 | 0.85 | 0.74 | 0.7 | 0.66 | 0.83 | 0.78 | 0.83 |
recall | 0.84 | 0.86 | 0.82 | 0.71 | 0.74 | 0.33 | 0.79 | 0.72 | 0.79 |
f1-score | 0.86 | 0.85 | 0.84 | 0.72 | 0.72 | 0.44 | 0.81 | 0.74 | 0.8 |
auc-roc | 0.95 | 0.97 | 0.96 | 0.94 | 0.93 | 0.86 | 0.95 | 0.93 | 0.95 |
support | 734 | 353 | 379 | 170 | 141 | 125 | 1902 | 1902 | 1902 |
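To make the averaged columns easier to interpret, here is a small sketch that recomputes the macro and weighted F1 averages from the per-label values in the table. It assumes the usual definitions (macro: unweighted mean over labels; weighted: mean weighted by support). Since the table values are rounded to two decimals, recomputing other rows this way can differ from the reported figure in the last digit.

```python
# Per-label F1 scores and supports, copied from the table above.
f1 = {
    "no_emotion": 0.86,
    "joy": 0.85,
    "sadness": 0.84,
    "surprise": 0.72,
    "fear": 0.72,
    "anger": 0.44,
}
support = {
    "no_emotion": 734,
    "joy": 353,
    "sadness": 379,
    "surprise": 170,
    "fear": 141,
    "anger": 125,
}


def macro_average(values: dict) -> float:
    """Unweighted mean over labels: every label counts equally."""
    return sum(values.values()) / len(values)


def weighted_average(values: dict, weights: dict) -> float:
    """Mean over labels, each weighted by its number of test examples."""
    total = sum(weights.values())
    return sum(values[label] * weights[label] for label in values) / total


macro_f1 = macro_average(f1)
weighted_f1 = weighted_average(f1, support)
print(round(macro_f1, 2), round(weighted_f1, 2))  # prints: 0.74 0.8
```

The macro average sits below the weighted one because the lowest-scoring label, anger, also has the smallest support and so pulls the unweighted mean down more.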