---
license: mit
datasets:
- mrm8488/goemotions
- IconicAI/DDD
language:
- en
metrics:
- accuracy
- f1
base_model:
- Mango-Juice/trpg_mlm
- microsoft/deberta-v3-large
library_name: transformers
model-index:
- name: trpg_emotion_classification
  results:
  - task:
      type: text-classification
    dataset:
      name: IconicAI/DDD (custom subset manually labeled)
      type: custom
      split: test
      config: csv
    metrics:
    - type: accuracy
      value: 0.929
    - type: f1
      value: 0.476
      name: f1 macro
---

# GoEmotions Fine-tuned Model

This is a multi-label emotion classification model trained in two stages: first on the GoEmotions dataset, then on TRPG (tabletop role-playing game) sentences.

## Model Information

- **Base Model**: Mango-Juice/trpg_mlm
- **Task**: Multi-label Emotion Classification
- **Labels**: 28 emotion labels
- **Training**: Two-stage fine-tuning (stage 1: GoEmotions data, stage 2: TRPG sentence data); a brief loading sketch follows this list
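
The training script is not published with this card; the following is a minimal sketch of how the second stage could be set up with the standard transformers API. The base checkpoint name and label count come from this card, while everything else is an illustrative assumption.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stage 2 (sketch): load the domain-adapted MLM checkpoint and attach a fresh
# 28-way classification head. `problem_type` makes the model use a
# BCE-with-logits loss when labels are passed. The card's actual training
# code is not published, so treat this setup as an assumption.
tokenizer = AutoTokenizer.from_pretrained("Mango-Juice/trpg_mlm")
model = AutoModelForSequenceClassification.from_pretrained(
    "Mango-Juice/trpg_mlm",
    num_labels=28,
    problem_type="multi_label_classification",
)
```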

## Emotion Labels

- admiration
- amusement
- anger
- annoyance
- approval
- caring
- confusion
- curiosity
- desire
- disappointment
- disapproval
- disgust
- embarrassment
- excitement
- fear
- gratitude
- grief
- joy
- love
- nervousness
- optimism
- pride
- realization
- relief
- remorse
- sadness
- surprise
- neutral

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mango-Juice/trpg_emotion_classification")
model = AutoModelForSequenceClassification.from_pretrained("Mango-Juice/trpg_emotion_classification")
model.eval()  # disable dropout for deterministic inference

EMOTION_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring',
    'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval',
    'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief',
    'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization',
    'relief', 'remorse', 'sadness', 'surprise', 'neutral',
]

# Multi-label inference: sigmoid (not softmax), so probabilities are independent
def predict_emotions(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).cpu().numpy()[0]
    return {emotion: float(prob) for emotion, prob in zip(EMOTION_LABELS, probs)}

# Example
text = "I am so happy today!"
emotions = predict_emotions(text)
print(emotions)
```
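
Because the head is multi-label, the returned probabilities are independent per emotion; to get discrete labels you apply a per-label cutoff. A minimal follow-up, assuming an illustrative threshold of 0.5 (the card does not specify a tuned value):

```python
# Keep every emotion whose probability clears the cutoff.
threshold = 0.5  # illustrative; tune on a validation set
predicted = [label for label, prob in emotions.items() if prob >= threshold]
print(predicted)
```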

## Performance

- On the manually labeled IconicAI/DDD test split, the model reaches 0.929 accuracy and 0.476 macro F1 (see the metadata above).
- Data augmentation was applied to minority classes; a sketch of the back-translation step follows this list.
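
The augmentation pipeline itself is not published; below is a minimal sketch of a back-translation round trip, assuming public MarianMT checkpoints with English→French→English as the pivot. Both the model names and the pivot language are illustrative assumptions.

```python
from transformers import pipeline

# Illustrative back-translation (en -> fr -> en). The actual pivot language
# and translation models used for this card's augmentation are not stated.
to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def back_translate(text):
    pivot = to_fr(text)[0]["translation_text"]
    return to_en(pivot)[0]["translation_text"]

print(back_translate("The rogue slips silently into the shadows."))
```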

## Training Details

- **Data Augmentation**: Oversampling via paraphrasing and back-translation
- **Loss Function**: Focal Loss with Label Smoothing (see the sketch after this list)
- **Optimizer**: AdamW
- **Scheduler**: ReduceLROnPlateau
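
The exact loss implementation is not included with the card; this is a minimal sketch of focal loss combined with label smoothing for multi-label targets, assuming a standard BCE-based formulation. The `gamma` and `smoothing` defaults are illustrative, not the card's actual values.

```python
import torch
import torch.nn.functional as F

def focal_bce_with_label_smoothing(logits, targets, gamma=2.0, smoothing=0.1):
    # Pull hard 0/1 targets toward 0.5 to soften overconfident labels.
    targets = targets * (1.0 - smoothing) + 0.5 * smoothing
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # exp(-bce) approximates the probability assigned to the target, so
    # (1 - p_t)^gamma down-weights easy examples and emphasizes rare classes.
    p_t = torch.exp(-bce)
    return ((1.0 - p_t) ** gamma * bce).mean()
```

In a custom training loop, a call like `focal_bce_with_label_smoothing(model(**inputs).logits, batch_labels.float())` would stand in for the default BCE loss.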