Improve model card for MovieCORE dataset

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +143 -10
README.md CHANGED
@@ -1,22 +1,155 @@
  ---
- license: mit
  datasets:
  - MovieCORE/MovieCORE
  - Enxin/MovieChat-1K-test
- base_model:
- - lmsys/vicuna-7b-v1.1
  ---

- # HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

- [Project Page](https://joslefaure.github.io/assets/html/hermes.html) |
- [Model Paper](https://huggingface.co/papers/2408.17443) |
- [Github](https://github.com/joslefaure/HERMES)

- To use these models for inference, please look at our inference code and guidelines in this [Github Repo](https://github.com/joslefaure/HERMES)

- # References

- [MovieCORE](https://huggingface.co/papers/2508.19026)
  ---
+ base_model:
+ - lmsys/vicuna-7b-v1.1
  datasets:
  - MovieCORE/MovieCORE
  - Enxin/MovieChat-1K-test
+ license: mit
+ pipeline_tag: video-text-to-text
  ---

+ <div align="center">
+ <img src="https://github.com/joslefaure/MovieCORE/raw/main/assets/moviecore_icon.png" alt="MovieCORE Icon" width="150"/>
+
+ # MovieCORE: COgnitive REasoning in Movies
+
+ **A Video Question Answering Dataset for Probing Deeper Cognitive Understanding of Movie Content**
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2508.19026-b31b1b.svg)](https://arxiv.org/abs/2508.19026)
+ [![Hugging Face Paper](https://img.shields.io/badge/%F0%9F%A4%97%20Paper-HuggingFace-blue)](https://huggingface.co/papers/2508.19026)
+ [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-HuggingFace-yellow.svg)](https://huggingface.co/datasets/MovieCORE/MovieCORE)
+ [![GitHub Code](https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&)](https://github.com/joslefaure/moviecore)
+ [![Project Page](https://img.shields.io/badge/Project%20Page-Website-green.svg)](https://joslefaure.github.io/assets/html/moviecore.html)
+ [![License](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/joslefaure/MovieCORE/blob/main/LICENSE)
+
+ ![MovieCORE Dataset Teaser](https://github.com/joslefaure/MovieCORE/raw/main/assets/poster_teaser.png)
+ </div>
+
+ ## 📖 Overview
+
+ MovieCORE is a video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike traditional VQA datasets that focus on surface-level visual understanding, MovieCORE challenges models to demonstrate sophisticated reasoning about narrative structure, character development, thematic elements, and complex temporal relationships within cinematic content.
+
+ ## 🗂️ Data Preparation
+
+ The MovieCORE dataset builds upon video content from MovieChat. To get started:
+
+ ### Video Data
+ Download the video files from MovieChat's HuggingFace repositories:
+ - **Training Data**: [MovieChat-1K Train](https://huggingface.co/datasets/Enxin/MovieChat-1K_train)
+ - **Test Data**: [MovieChat-1K Test](https://huggingface.co/datasets/Enxin/MovieChat-1K-test)
+
+ ### Annotations
+ Access our annotations on HuggingFace:
+ - **MovieCORE Annotations**: [🤗 HuggingFace Dataset](https://huggingface.co/datasets/MovieCORE/MovieCORE/tree/main)
+
+ Extract and organize the data according to your model's requirements, then use our annotations for evaluation.
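+
+ For reference, here is a minimal download sketch using `huggingface_hub` and `datasets`. The local path is a placeholder, and whether the annotation files load directly via `load_dataset` depends on the repository layout, so treat this as an assumption and adjust it to your setup:
+
+ ```python
+ # Minimal sketch (assumed paths/layout), not an official download script.
+ from huggingface_hub import snapshot_download
+ from datasets import load_dataset
+
+ # Videos: fetch the MovieChat-1K test videos into a local folder.
+ video_dir = snapshot_download(
+     repo_id="Enxin/MovieChat-1K-test",
+     repo_type="dataset",
+     local_dir="data/moviechat_1k_test",  # hypothetical local path
+ )
+
+ # Annotations: load the MovieCORE QA annotations from the Hub.
+ # If the repo layout is not auto-detected, download the JSON files directly instead.
+ annotations = load_dataset("MovieCORE/MovieCORE")
+
+ print(video_dir)
+ print(annotations)
+ ```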
+
+ ## 🚀 Quick Start
+
+ ### Installation
+ ```bash
+ git clone https://github.com/joslefaure/MovieCORE.git
+ cd MovieCORE
+ ```
+
+ ## 🎯 Baselines
+ - We provide a script to run [HERMES](https://github.com/joslefaure/HERMES) (ICCV'25) on MovieCORE. Please check out the linked project.
+
+ ## 📊 Evaluation Dimensions

+ MovieCORE employs a multi-dimensional evaluation framework to assess model performance across different aspects of cognitive understanding:

+ | Dimension | Description |
+ |-----------|-------------|
+ | **🎯 Accuracy** | Measures semantic similarity between predicted and ground-truth answers |
+ | **📋 Comprehensiveness** | Assesses coverage of all key aspects mentioned in the ground truth |
+ | **🧠 Depth** | Evaluates the level of reasoning and insight demonstrated in predictions |
+ | **🔍 Evidence** | Checks the quality and relevance of supporting evidence provided |
+ | **🔗 Coherence** | Measures logical flow, organization, and clarity of responses |

+ Each dimension provides unique insights into different cognitive capabilities required for deep video understanding.
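+
+ Purely as an illustration of how per-dimension results might be aggregated downstream (the 0-5 scale and the field names below are assumptions, not the official output format of `evaluate_moviecore.py`):
+
+ ```python
+ # Illustrative aggregation of per-dimension judge scores (assumed 0-5 scale).
+ from statistics import mean
+
+ DIMENSIONS = ["accuracy", "comprehensiveness", "depth", "evidence", "coherence"]
+
+ def aggregate(scores):
+     """Average each dimension across all judged question/answer pairs."""
+     return {dim: mean(s[dim] for s in scores) for dim in DIMENSIONS}
+
+ example = [
+     {"accuracy": 4, "comprehensiveness": 3, "depth": 3, "evidence": 4, "coherence": 5},
+     {"accuracy": 2, "comprehensiveness": 2, "depth": 1, "evidence": 2, "coherence": 3},
+ ]
+ print(aggregate(example))  # e.g. {'accuracy': 3.0, ...}
+ ```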

+ ## 💻 Usage

+ ### Evaluation Script
+
+ Evaluate your model's performance on MovieCORE using our evaluation script:
+
+ ```bash
+ export OPENAI_API_KEY='your_openai_api_key'
+ python evaluate_moviecore.py --pred_path path/to/your/predictions.json
+ ```
+
+ ### 📝 Input Format
+
+ Your predictions should follow this JSON structure:
+
+ ```json
+ {
+   "video_1.mp4": [
+     {
+       "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?",
+       "answer": "The ground truth answer.",
+       "pred": "Your model's prediction.",
+       "classification": "the question classification"
+     },
+     {
+       "question": "The second question for video 1?",
+       "answer": "The ground truth answer.",
+       "pred": "Your model's prediction.",
+       "classification": "the question classification"
+     }
+   ],
+   "video_2.mp4": [
+     {
+       "question": "The only question for video 2",
+       "answer": "The ground truth answer.",
+       "pred": "Your model's prediction.",
+       "classification": "the question classification"
+     }
+   ]
+ }
+ ```
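+
+ As a sketch of how one might assemble this file from per-question model outputs, consider the snippet below; `build_predictions`, `run_model`, and the sample fields are hypothetical stand-ins for your own inference pipeline, not part of the released code:
+
+ ```python
+ # Sketch: build predictions.json in the structure shown above (hypothetical helpers).
+ import json
+ from collections import defaultdict
+
+ def build_predictions(samples, run_model):
+     """Group per-question results by video file.
+
+     samples: iterable of dicts with keys video, question, answer, classification.
+     run_model: callable (video, question) -> predicted answer string.
+     """
+     preds = defaultdict(list)
+     for s in samples:
+         preds[s["video"]].append({
+             "question": s["question"],
+             "answer": s["answer"],  # ground-truth answer from the annotations
+             "pred": run_model(s["video"], s["question"]),
+             "classification": s["classification"],
+         })
+     return dict(preds)
+
+ if __name__ == "__main__":
+     # Tiny demo with a dummy model; replace with your own inference loop.
+     demo = [{"video": "video_1.mp4", "question": "What happens?",
+              "answer": "gt answer", "classification": "deep"}]
+     predictions = build_predictions(demo, lambda v, q: "my prediction")
+     with open("predictions.json", "w") as f:
+         json.dump(predictions, f, indent=2)
+ ```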
+
+ ### 📈 Output
+
+ The evaluation script provides:
+ - Overall scores across all dimensions
+ - Classification-specific performance metrics
+ - Detailed breakdowns for comprehensive analysis
+
+ ## 📚 Citation
+
+ If you use MovieCORE in your research, please cite our paper:
+
+ ```bibtex
+ @misc{faure2025moviecorecognitivereasoningmovies,
+   title={MovieCORE: COgnitive REasoning in Movies},
+   author={Gueter Josmy Faure and Min-Hung Chen and Jia-Fong Yeh and Ying Cheng and Hung-Ting Su and Yung-Hao Tang and Shang-Hong Lai and Winston H. Hsu},
+   year={2025},
+   eprint={2508.19026},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2508.19026},
+ }
+ ```
+
+ ## 🤝 Contributing
+
+ We welcome contributions to MovieCORE! Please feel free to:
+ - Report issues or bugs
+ - Suggest improvements or new features
+ - Submit baseline implementations
+ - Provide feedback on the evaluation framework
+
+ ## 📄 License
+
+ This dataset is provided under the MIT License. See [LICENSE](https://github.com/joslefaure/MovieCORE/blob/main/LICENSE) for more details.
+
+ ---

+ <div align="center">
+ <p>🎬 <strong>Advancing Video Understanding Through Cognitive Evaluation</strong> 🎬</p>
+
+ **[📖 Paper](https://arxiv.org/abs/2508.19026v1) | [🤗 Dataset](https://huggingface.co/datasets/MovieCORE/MovieCORE) | [💻 Code](https://github.com/joslefaure/moviecore)**
+ </div>