
Model Card: lstm_dom_emotion_model

Model Summary

nikatonika/lstm_dom_emotion_model is a recurrent neural network trained to determine the dominant emotion across short video segments based on sequences of frame-level emotion probabilities. The model uses a single-layer LSTM architecture and was developed as part of the EchoStressAI system for analyzing emotional dynamics in real-world operator video recordings.


Use Case

Unlike conventional frame-based classifiers, this model aggregates temporal emotion patterns to infer a single dominant emotion for an entire segment. It is designed for:

  • Emotion tracking in low-expressivity settings (e.g., fatigue, stress)
  • Offline emotion summarization
  • Operator condition monitoring

Input and Architecture

  • Input: Sequences of emotion probability vectors per frame (7-dimensional, padded)
  • Classes: Angry, Disgusted, Happy, Neutral, Sad, Scared, Surprised
  • Model: Unidirectional LSTM
    • 1 layer, 128 hidden units
    • Sequence padding and mask-aware loss

The model is trained to output a single label for the whole sequence, corresponding to the dominant emotional state across time.
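
The architecture described above can be written compactly in PyTorch. The following is a minimal sketch, not the released implementation: the class name `DomEmotionLSTM` and the use of `pack_padded_sequence` for padding masking are assumptions consistent with the card's description.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

NUM_EMOTIONS = 7  # Angry, Disgusted, Happy, Neutral, Sad, Scared, Surprised

class DomEmotionLSTM(nn.Module):
    """Single-layer unidirectional LSTM mapping a padded sequence of
    frame-level emotion probability vectors to one dominant-emotion label."""

    def __init__(self, input_size=NUM_EMOTIONS, hidden_size=128,
                 num_classes=NUM_EMOTIONS):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=1,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x, lengths):
        # x: (batch, max_len, 7) padded probabilities; lengths: true frame counts.
        # Packing lets the LSTM skip padded timesteps, so padding never
        # influences the final hidden state.
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        return self.classifier(h_n[-1])  # (batch, num_classes) logits
```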


Training Details

  • Dataset: Structured video data with frame-level emotion probability vectors
  • Loss: CrossEntropyLoss with time masking
  • Epochs: 30
  • Batch size: 64
  • Optimizer: Adam
  • Hardware: Google Colab Pro (NVIDIA T4 GPU)
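
A training step consistent with these settings might look as follows. This sketch assumes a `train_loader` yielding `(sequences, lengths, labels)` batches of size 64; the learning rate is illustrative (the card does not report one), and the time masking here is realized through sequence packing in the model's `forward` rather than inside the loss.

```python
import torch

device = "cuda"  # T4 GPU, as in the card
model = DomEmotionLSTM().to(device)
criterion = torch.nn.CrossEntropyLoss()  # one label per sequence
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed

for epoch in range(30):  # 30 epochs
    for sequences, lengths, labels in train_loader:
        sequences, labels = sequences.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(sequences, lengths)  # padding masked via packing
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
```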

Evaluation Results (Test Set)

| Metric | Value |
|---|---|
| Accuracy | 97.07% |
| MSE | – |
| R² (R-squared) | – |

The model provides stable predictions and successfully captures dominant affective patterns, though with slightly lower accuracy compared to its BiLSTM counterpart.
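
The reported figure is plain sequence-level classification accuracy. A minimal evaluation sketch, assuming the same `(sequences, lengths, labels)` loader format as above:

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Fraction of segments whose predicted dominant emotion matches the label."""
    model.eval()
    correct = total = 0
    for sequences, lengths, labels in loader:
        logits = model(sequences.to(device), lengths)
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.size(0)
    return correct / total
```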


Scientific Motivation

Emotion expression in video is often:

  • Fragmented and inconsistent
  • Influenced by microexpressions
  • Not easily captured by frame-wise majority voting or softmax summing

This model was introduced to:

  • Aggregate emotion evidence over the full time window
  • Improve robustness to fleeting expression changes
  • Smooth out frame-level prediction fluctuations

Comparison to BiLSTM

| Feature | LSTM | BiLSTM |
|---|---|---|
| Directionality | Unidirectional | Bidirectional |
| Accuracy (Test) | 97.07% | 99.10% |
| Robustness | Moderate | Higher (better with noise) |
| Use in Production | Experimental / fallback | ✅ Production model in EchoStressAI |

Integration in EchoStressAI

The model can be integrated into the offline video analysis pipeline (see the inference sketch after this list) to:

  • Compute dominant emotion over full video segments
  • Assist in emotional trend detection
  • Support fatigue/stress detection in operators
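
A minimal offline-inference sketch for one segment. The checkpoint file name, the source of the probability array, and the label ordering are assumptions; only the seven class names come from the card.

```python
import numpy as np
import torch

LABELS = ["Angry", "Disgusted", "Happy", "Neutral", "Sad", "Scared", "Surprised"]

model = DomEmotionLSTM()
state = torch.load("lstm_dom_emotion.pt", map_location="cpu")  # file name assumed
model.load_state_dict(state)
model.eval()

# frame_probs: (num_frames, 7) per-frame emotion probabilities for one segment.
frame_probs = np.load("segment_probs.npy")  # source of probabilities assumed
x = torch.tensor(frame_probs, dtype=torch.float32).unsqueeze(0)  # (1, T, 7)
lengths = torch.tensor([x.shape[1]])

with torch.no_grad():
    pred = model(x, lengths).argmax(dim=1).item()
print("Dominant emotion:", LABELS[pred])
```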

License

This model is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Free for commercial and research use with proper attribution.


Contact

Developed by nikatonika: https://huggingface.co/nikatonika
Part of the EchoStressAI project.
