---
tags:
- deep-learning
- vision
- VQA
- Transformer
- CNN
license: apache-2.0
datasets:
- LIVE-VQC
- KoNViD-1k
- YouTube-UGC
- CVD2014
- LSVQ
model-index:
- name: ReLaX-VQA
  results: []
pipeline_tag: visual-question-answering
---

# ReLaX-VQA

Official code for the following paper:

**X. Wang, A. Katsenou, and D. Bull**. [ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment](https://arxiv.org/abs/2407.11496)

---

[//]: # (## Abstract)

[//]: # (With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the diversity of video content and the assessment of its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open source the code and trained models to facilitate further research and applications of NR-VQA.)

## Performance

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/relax-vqa-residual-fragment-and-layer-stack/video-quality-assessment-on-live-vqc)](https://paperswithcode.com/sota/video-quality-assessment-on-live-vqc?p=relax-vqa-residual-fragment-and-layer-stack)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/relax-vqa-residual-fragment-and-layer-stack/video-quality-assessment-on-youtube-ugc)](https://paperswithcode.com/sota/video-quality-assessment-on-youtube-ugc?p=relax-vqa-residual-fragment-and-layer-stack)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/relax-vqa-residual-fragment-and-layer-stack/video-quality-assessment-on-konvid-1k)](https://paperswithcode.com/sota/video-quality-assessment-on-konvid-1k?p=relax-vqa-residual-fragment-and-layer-stack)

We evaluate ReLaX-VQA on four UGC datasets. Three variants are reported, differing in their training and testing strategy:

- **ReLaX-VQA**: trained and tested on each dataset with an **80%-20% random split**.
- **ReLaX-VQA (w/o FT)**: trained on **[LSVQ](https://github.com/baidut/PatchVQ)**; the frozen model was then tested on the other datasets.
- **ReLaX-VQA (w/ FT)**: trained on **[LSVQ](https://github.com/baidut/PatchVQ)** and then **fine-tuned** on each of the other datasets.
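The tables below report Spearman's Rank Correlation Coefficient (SRCC) and Pearson's Linear Correlation Coefficient (PLCC) between the predicted quality scores and the ground-truth mean opinion scores (MOS). As a quick reference, here is a minimal sketch of how these two metrics can be computed with SciPy; the `mos` and `pred` arrays are hypothetical placeholders, not results from the paper:

```python
# Minimal sketch: SRCC and PLCC between ground-truth MOS and predicted quality scores.
# The values below are illustrative placeholders only.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([3.2, 4.1, 2.5, 3.8, 4.6])   # hypothetical ground-truth MOS
pred = np.array([3.0, 4.3, 2.7, 3.6, 4.4])  # hypothetical model predictions

srcc, _ = spearmanr(mos, pred)
plcc, _ = pearsonr(mos, pred)
print(f"SRCC: {srcc:.4f}, PLCC: {plcc:.4f}")
```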
#### **Spearman’s Rank Correlation Coefficient (SRCC)**

| Model                 | CVD2014    | KoNViD-1k  | LIVE-VQC   | YouTube-UGC |
|-----------------------|------------|------------|------------|-------------|
| ReLaX-VQA             | 0.8643     | 0.8535     | 0.7655     | 0.8014      |
| ReLaX-VQA (w/o FT)    | 0.7845     | 0.8312     | 0.7664     | 0.8104      |
| **ReLaX-VQA (w/ FT)** | **0.8974** | **0.8720** | **0.8468** | **0.8469**  |

#### **Pearson’s Linear Correlation Coefficient (PLCC)**

| Model                 | CVD2014    | KoNViD-1k  | LIVE-VQC   | YouTube-UGC |
|-----------------------|------------|------------|------------|-------------|
| ReLaX-VQA             | 0.8895     | 0.8473     | 0.8079     | 0.8204      |
| ReLaX-VQA (w/o FT)    | 0.8336     | 0.8427     | 0.8242     | 0.8354      |
| **ReLaX-VQA (w/ FT)** | **0.9294** | **0.8668** | **0.8876** | **0.8652**  |

More results can be found in **[reported_result.ipynb](https://huggingface.co/xinyiW915/ReLaX-VQA/blob/main/reported_result.ipynb)**.

## Proposed Model

The figure below shows an overview of the proposed ReLaX-VQA framework. The architectures of the ResNet-50 Stack (I) and ResNet-50 Pool (II) modules are provided in Fig. 2 of the paper.

*Figure: proposed_ReLaX-VQA_framework*

## Usage

### 📌 Install Requirement

The repository is built with **Python 3.10.14** and can be set up with the following commands:

```shell
git clone https://github.com/xinyiW915/ReLaX-VQA.git
cd ReLaX-VQA
conda create -n relaxvqa python=3.10.14 -y
conda activate relaxvqa
pip install -r requirements.txt
```

### 📥 Download UGC Datasets

The raw video datasets can be downloaded from the following sources:
[LSVQ](https://github.com/baidut/PatchVQ), [KoNViD-1k](https://database.mmsp-kn.de/konvid-1k-database.html), [LIVE-VQC](https://live.ece.utexas.edu/research/LIVEVQC/), [YouTube-UGC](https://media.withyoutube.com/), [CVD2014](https://qualinet.github.io/databases/video/cvd2014_video_database/).

The metadata for the UGC datasets used in our experiments is available under [`./metadata`](https://huggingface.co/xinyiW915/ReLaX-VQA/tree/main/metadata).

Once downloaded, place the datasets in [`./ugc_original_videos`](https://huggingface.co/xinyiW915/ReLaX-VQA/tree/main/ugc_original_videos) or any other storage location of your choice. Make sure the `video_path` in the [`get_video_paths`](https://huggingface.co/xinyiW915/ReLaX-VQA/blob/main/src/main_relaxvqa_feats.py) function inside `main_relaxvqa_feats.py` is updated accordingly.

### 🎬 Test Demo

Run the pre-trained models to evaluate the quality of a single video. The model weights provided in [`./model`](https://huggingface.co/xinyiW915/ReLaX-VQA/tree/main/model) are the best-performing weights saved during training.

To evaluate the quality of a specific video, run the following command:

```shell
python demo_test_gpu.py -device -train_data_name -is_finetune -save_path -video_type -video_name -framerate
```

Or simply try our demo video by running:

```shell
python demo_test_gpu.py
```

### 🧪 How to Use the Pretrained Model

You can download and load the model using `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download
import torch

# Download the pretrained model weights from the Hub
model_path = hf_hub_download(
    repo_id="xinyiW915/ReLaX-VQA",
    filename="model/lsvq_train_relaxvqa_byrmse_trained_median_model_param_onLSVQ_TEST.pth"
)

# `fix_state_dict` and the ReLaX-VQA model class are defined in the repository source (see `src/`)
state_dict = torch.load(model_path)
fixed_state_dict = fix_state_dict(state_dict)
model.load_state_dict(fixed_state_dict)  # `model` is an instance of the ReLaX-VQA model class
```

## Training

Steps to train ReLaX-VQA from scratch on different datasets.
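At a high level, training proceeds in two stages: deep features are first extracted for each video, and a quality-regression model is then fitted on those features (with an 80%-20% random split when training and testing on the same dataset). The sketch below illustrates only this general idea on synthetic data; the feature dimension, regression head, and hyperparameters are placeholder assumptions rather than the configuration used by the repository scripts described in the steps that follow.

```python
# Illustrative sketch only (synthetic data): fit a regression head on precomputed
# per-video features with an 80%-20% random split. The feature dimension, head
# architecture, and hyperparameters below are assumptions, not the repository's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

num_videos, feat_dim = 200, 4096            # hypothetical sizes
feats = torch.randn(num_videos, feat_dim)   # stand-in for extracted video features
mos = torch.rand(num_videos) * 4 + 1        # stand-in MOS labels in [1, 5]

# 80%-20% random split
perm = torch.randperm(num_videos)
split = int(0.8 * num_videos)
train_idx, test_idx = perm[:split], perm[split:]

head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(20):
    optimizer.zero_grad()
    pred = head(feats[train_idx]).squeeze(-1)
    loss = loss_fn(pred, mos[train_idx])
    loss.backward()
    optimizer.step()

# Evaluate on the held-out 20%
with torch.no_grad():
    test_pred = head(feats[test_idx]).squeeze(-1)
    rmse = torch.sqrt(loss_fn(test_pred, mos[test_idx]))
    print(f"Held-out RMSE: {rmse.item():.4f}")
```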
### Extract Features

Run the following command to extract features from videos:

```shell
python main_relaxvqa_feats.py -device gpu -video_type youtube_ugc
```

### Train Model

Train our model using the extracted features:

```shell
python model_regression_simple.py -data_name youtube_ugc -feature_path ../features/ -save_path ../model/
```

For **LSVQ**, train the model using:

```shell
python model_regression.py -data_name lsvq_train -feature_path ../features/ -save_path ../model/
```

### Fine-Tuning

To fine-tune the pre-trained model on a new dataset, set [`train_data_name`](https://huggingface.co/xinyiW915/ReLaX-VQA/blob/main/src/model_finetune.py) to the dataset the model was trained on, and [`test_data_name`](https://huggingface.co/xinyiW915/ReLaX-VQA/blob/main/src/model_finetune.py) to the dataset you want to fine-tune on.

```shell
python model_finetune.py
```

## Ablation Study

A detailed analysis of the different components in ReLaX-VQA.

### Spatio-Temporal Fragmentation & DNN Layer Stacking

Key techniques used in ReLaX-VQA:

- **Fragmentation with DNN layer stacking:**
  ```shell
  python feature_fragment_layerstack.py
  ```
- **Fragmentation with DNN layer pooling:**
  ```shell
  python feature_fragment_pool.py
  ```
- **Frame with DNN layer stacking:**
  ```shell
  python feature_layerstack.py
  ```
- **Frame with DNN layer pooling:**
  ```shell
  python feature_pool.py
  ```

### Other Utilities

#### Excluding Greyscale Videos

We exclude greyscale videos from our experiments. You can use [`check_greyscale.py`](https://huggingface.co/xinyiW915/ReLaX-VQA/blob/main/src/data_processing/check_greyscale.py) to filter out greyscale videos from the VQA dataset you want to use.

```shell
python check_greyscale.py
```

#### Metadata Extraction

For easy extraction of metadata from your VQA dataset, use:

```shell
python extract_metadata_NR.py
```

## Acknowledgment

This work was funded by the UKRI MyWorld Strength in Places Programme (SIPF00006/1) as part of my PhD study.

## Citation

If you find this paper and the repo useful, please cite our paper 😊:

```bibtex
@article{wang2024relax,
  title={ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  year={2024},
  eprint={2407.11496},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2407.11496},
}
```

## Contact

Xinyi WANG, `xinyi.wang@bristol.ac.uk`