# Model Card: lstm_dom_emotion_model
## Model Summary
`nikatonika/lstm_dom_emotion_model` is a recurrent neural network trained to determine the dominant emotion in short video segments from sequences of frame-level emotion probabilities. The model uses a single-layer LSTM architecture and was developed as part of the EchoStressAI system for analyzing emotional dynamics in real-world operator video recordings.
## Use Case
Unlike conventional frame-based classifiers, this model aggregates temporal emotion patterns to infer a single dominant emotion for the entire segment. It is designed for:
- Emotion tracking in low-expressivity settings (e.g., fatigue, stress)
- Offline emotion summarization
- Operator condition monitoring
## Input and Architecture
- Input: sequences of per-frame emotion probability vectors (7-dimensional, padded)
- Classes: Angry, Disgusted, Happy, Neutral, Sad, Scared, Surprised
- Model: unidirectional LSTM, 1 layer, 128 hidden units
- Training: sequence padding with a mask-aware loss
The model is trained to output a single label for the whole sequence, corresponding to the dominant emotional state across time.
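A minimal PyTorch sketch of this sequence-to-one setup, assuming the sizes stated above (7-dimensional input, 128 hidden units, 7 classes) and that padding is handled by packing the sequences; the class and parameter names are illustrative, not the actual EchoStressAI implementation:

```python
import torch
import torch.nn as nn

class DominantEmotionLSTM(nn.Module):
    """Sequence-to-one classifier: per-frame emotion probabilities -> dominant emotion.

    Hypothetical reconstruction from the card; sizes match the stated
    architecture, but the original implementation may differ.
    """

    def __init__(self, input_dim=7, hidden_dim=128, num_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, lengths):
        # x: (batch, max_len, 7) padded probability sequences; lengths: true lengths.
        # Packing makes the LSTM skip padded time steps (one way to be "mask-aware").
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        # h_n[-1]: final hidden state per sequence -> one logit vector per segment.
        return self.head(h_n[-1])
```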
## Training Details
- Dataset: Structured video data with frame-level emotion probability vectors
- Loss: CrossEntropyLoss with time masking
- Epochs: 30
- Batch size: 64
- Optimizer: Adam
- Hardware: NVIDIA T4 GPU (Google Colab Pro)
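A sketch of the corresponding training loop under the same assumptions, using the hypothetical `DominantEmotionLSTM` from the snippet above and an assumed `train_loader` yielding padded probability sequences, their true lengths, and one dominant-emotion label per sequence. Because the model emits a single label per sequence, padded time steps are masked by the packing inside `forward()` rather than by a per-step loss mask:

```python
import torch

# Hypothetical setup; train_loader is assumed to yield (probs, lengths, labels)
# batches of size 64 with zero-padded sequences.
model = DominantEmotionLSTM()
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):                          # 30 epochs, as listed above
    for probs, lengths, labels in train_loader:
        optimizer.zero_grad()
        logits = model(probs, lengths)           # padding ignored via packing
        loss = criterion(logits, labels)         # one label per sequence
        loss.backward()
        optimizer.step()
```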
## Evaluation Results (Test Set)
| Metric | Value |
|---|---|
| Accuracy | 97.07% |
| MSE | n/a |
| R² (R-squared) | n/a |
The model produces stable predictions and successfully captures dominant affective patterns, though with slightly lower accuracy than its BiLSTM counterpart.
## Scientific Motivation
Emotion expression in video is often:
- Fragmented and inconsistent
- Influenced by microexpressions
- Not easily captured by frame-wise majority voting or softmax summing
This model was introduced to:
- Aggregate temporal patterns over time
- Improve robustness to fleeting changes
- Reduce sensitivity to frame-level fluctuations
## Comparison to BiLSTM
| Feature | LSTM | BiLSTM |
|---|---|---|
| Directionality | Unidirectional | Bidirectional |
| Accuracy (Test) | 97.07% | 99.10% |
| Robustness | Moderate | Higher (better with noise) |
| Use in Production | Experimental / fallback | Production model in EchoStressAI |
## Integration in EchoStressAI
The model can be integrated into the offline video analysis pipeline to:
- Compute dominant emotion over full video segments
- Assist in emotional trend detection
- Support fatigue/stress detection in operators
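As an illustration, a minimal inference helper under the same assumptions as the sketches above (the `DominantEmotionLSTM` class and its `forward(x, lengths)` signature are hypothetical):

```python
import torch

EMOTIONS = ["Angry", "Disgusted", "Happy", "Neutral", "Sad", "Scared", "Surprised"]

@torch.no_grad()
def dominant_emotion(model, frame_probs):
    """frame_probs: (T, 7) tensor of per-frame emotion probabilities for one segment."""
    model.eval()
    x = frame_probs.unsqueeze(0)                   # add batch dimension -> (1, T, 7)
    lengths = torch.tensor([frame_probs.shape[0]])
    logits = model(x, lengths)                     # (1, 7) class logits
    return EMOTIONS[logits.argmax(dim=-1).item()]
```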
## License
This model is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). It is free for commercial and research use with proper attribution.
## Contact
Developed by [nikatonika](https://huggingface.co/nikatonika) as part of the EchoStressAI project.