# Model Card: lstm_dom_emotion_model
## Model Summary
`nikatonika/lstm_dom_emotion_model` is a recurrent neural network trained to determine the dominant emotion in short video segments from sequences of frame-level emotion probabilities. The model uses a single-layer LSTM architecture and was developed as part of the EchoStressAI system for analyzing emotional dynamics in real-world operator video recordings.
## Use Case
Unlike conventional frame-based classifiers, this model aggregates temporal emotion patterns to infer a single dominant emotion for the entire segment. It is designed for:
- Emotion tracking in low-expressivity settings (e.g., fatigue, stress)
- Offline emotion summarization
- Operator condition monitoring
## Input and Architecture
- Input: sequences of per-frame emotion probability vectors (7-dimensional, padded)
- Classes: Angry, Disgusted, Happy, Neutral, Sad, Scared, Surprised
- Model: unidirectional LSTM, 1 layer, 128 hidden units
- Training: sequence padding with a mask-aware loss
The model is trained to output a single label for the whole sequence, corresponding to the dominant emotional state across time.
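A minimal PyTorch sketch of this sequence-to-one setup, assuming the sizes stated above (7-dimensional input, 128 hidden units, 7 classes) and that padding is handled by packing the sequences; the class and parameter names are illustrative, not the actual EchoStressAI implementation:

```python
import torch
import torch.nn as nn

class DominantEmotionLSTM(nn.Module):
    """Sequence-to-one classifier: per-frame emotion probabilities -> dominant emotion.

    Hypothetical reconstruction from the card; sizes match the stated
    architecture, but the original implementation may differ.
    """

    def __init__(self, input_dim=7, hidden_dim=128, num_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, lengths):
        # x: (batch, max_len, 7) padded probability sequences; lengths: true lengths.
        # Packing makes the LSTM skip padded time steps (one way to be "mask-aware").
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        # h_n[-1]: final hidden state per sequence -> one logit vector per segment.
        return self.head(h_n[-1])
```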
## Training Details
- Dataset: Structured video data with frame-level emotion probability vectors
- Loss: CrossEntropyLoss with time masking
- Epochs: 30
- Batch size: 64
- Optimizer: Adam
- Hardware: NVIDIA T4 GPU (Google Colab Pro)
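A sketch of the corresponding training loop under the same assumptions, using the hypothetical `DominantEmotionLSTM` from the snippet above and an assumed `train_loader` yielding padded probability sequences, their true lengths, and one dominant-emotion label per sequence. Because the model emits a single label per sequence, padded time steps are masked by the packing inside `forward()` rather than by a per-step loss mask:

```python
import torch

# Hypothetical setup; train_loader is assumed to yield (probs, lengths, labels)
# batches of size 64 with zero-padded sequences.
model = DominantEmotionLSTM()
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):                          # 30 epochs, as listed above
    for probs, lengths, labels in train_loader:
        optimizer.zero_grad()
        logits = model(probs, lengths)           # padding ignored via packing
        loss = criterion(logits, labels)         # one label per sequence
        loss.backward()
        optimizer.step()
```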
## Evaluation Results (Test Set)
| Metric | Value |
|---|---|
| Accuracy | 97.07% |
| MSE | n/a |
| R² (R-squared) | n/a |
The model produces stable predictions and successfully captures dominant affective patterns, though with slightly lower accuracy than its BiLSTM counterpart.
## Scientific Motivation
Emotion expression in video is often:
- Fragmented and inconsistent
- Influenced by microexpressions
- Not easily captured by frame-wise majority voting or softmax summing
This model was introduced to:
- Aggregate temporal patterns over time
- Improve robustness to fleeting changes
- Reduce sensitivity to frame-level fluctuations
## Comparison to BiLSTM
| Feature | LSTM | BiLSTM |
|---|---|---|
| Directionality | Unidirectional | Bidirectional |
| Accuracy (Test) | 97.07% | 99.10% |
| Robustness | Moderate | Higher (better with noise) |
| Use in Production | Experimental / fallback | Production model in EchoStressAI |
## Integration in EchoStressAI
The model can be integrated into the offline video analysis pipeline to:
- Compute dominant emotion over full video segments
- Assist in emotional trend detection
- Support fatigue/stress detection in operators
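As an illustration, a minimal inference helper under the same assumptions as the sketches above (the `DominantEmotionLSTM` class and its `forward(x, lengths)` signature are hypothetical):

```python
import torch

EMOTIONS = ["Angry", "Disgusted", "Happy", "Neutral", "Sad", "Scared", "Surprised"]

@torch.no_grad()
def dominant_emotion(model, frame_probs):
    """frame_probs: (T, 7) tensor of per-frame emotion probabilities for one segment."""
    model.eval()
    x = frame_probs.unsqueeze(0)                   # add batch dimension -> (1, T, 7)
    lengths = torch.tensor([frame_probs.shape[0]])
    logits = model(x, lengths)                     # (1, 7) class logits
    return EMOTIONS[logits.argmax(dim=-1).item()]
```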
## License
This model is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). It is free for commercial and research use with proper attribution.
## Contact
Developed by [nikatonika](https://huggingface.co/nikatonika) as part of the EchoStressAI project.