---
title: MVP
emoji: 🏆
colorFrom: blue
colorTo: pink
sdk: streamlit
app_file: app.py
pinned: false
short_description: msms annotation tool
python_version: 3.11.7
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🏆 MultiView Projection (MVP) for Spectra Annotation

### Authors
**Yan Zhou Chen, Soha Hassoun**
Department of Computer Science, Tufts University

---

MVP is a framework for **ranking molecular candidates given a spectrum**.
This repository provides the official implementation, pretrained models, and utilities for data preparation and training.

---

## 📑 Table of Contents
0. [Quick Test](#quick-test)
1. [Install & Setup](#install--setup)
2. [Data Preparation](#data-prep)
3. [Using the Pretrained Model](#use-our-pretrained-model)
4. [MassSpecGym Data Download](#massspecgym-data-download)
5. [Training from Scratch](#training-from-scratch)
6. [References](#references)

---

## 🚀 Quick Test
Run MVP instantly with our [interactive app](https://huggingface.co/spaces/HassounLab/MVP) for small-scale experiments.

---

## ⚙️ Install & setup
1. Clone the repository: `git clone https://huggingface.co/spaces/HassounLab/MVP/`
2. Install the full environment, or install only the key packages listed below:
```
conda create -n mvp python=3.11
conda activate mvp
pip install -r requirements.txt
```

#### Key packages
- python
- dgl
- pytorch
- rdkit
- pytorch-geometric
- numpy
- scikit-learn
- scipy
- massspecgym
- lightning

---

## 📂 Data prep
We provide sample spectra data and candidates in `data/sample`.
For preprocessing:
1. If using formSpec, compute subformula labels.
2. Run our preprocessing code to obtain fingerprints and consensus spectra files.
```
# If using formSpec
python subformula_assign/assign_subformulae.py --spec-files ../data/sample/data.tsv --output-dir ../data/sample/subformulae_default --max-formulae 60 --labels-file ../data/sample/data.tsv
python data_preprocess.py --spec_type formSpec --dataset_pth ../data/sample/data.tsv --candidates_pth ../data/sample/candidates_mass.json --subformula_dir_pth ../data/sample/subformulae_default/ --output_dir ../data/sample/

# If using binnedSpec
python data_preprocess.py --spec_type binnedSpec --dataset_pth ../data/sample/data.tsv --candidates_pth ../data/sample/candidates_mass.json --output_dir ../data/sample/
```
We include sample subformula, fingerprint, and consensus spectra data in `../data/sample/`.

## Use our pretrained model
You can use our pretrained model (trained on MassSpecGym) to rank molecular candidates by providing the spectra data and a list of candidates.
After preparing your data, edit `params_binnedSpec.yaml` or `params_formSpec.yaml` so that it points to your dataset paths, then run:
```
# If using formSpec
python test.py --param_pth params_formSpec.yaml

# If using binnedSpec
python test.py --param_pth params_binnedSpec.yaml
```
We provide a notebook showing sample result files in `notebooks/demo.ipynb`.

---

## MassSpecGym data download
Our model is trained on the [MassSpecGym dataset](https://github.com/pluskal-lab/MassSpecGym). Follow their instructions to download the spectra and candidate dataset.
You can preprocess the MassSpecGym dataset as described in the data prep section above, or download the preprocessed files as follows:
```
mkdir -p data/msgym/
cd data/msgym
# Quote the URL so the shell does not interpret "?", and use -O so the saved file name has no "?download=1" suffix
wget -O msgym_preprocessed.zip "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"
```

## Training from scratch
To train a model from scratch:
1. Prepare data as described in the data prep section.
2. Modify the configuration in the params file as necessary.
3. Train using the following:
```
# If using formSpec
python train.py --param_pth params_formSpec.yaml

# If using binnedSpec
python train.py --param_pth params_binnedSpec.yaml
```

---

## 📚 References
Preprint: [Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation](https://www.biorxiv.org/content/10.1101/2025.11.12.688047v1)

---

## 📧 Contact
For questions, reach out to: Soha.Hassoun@tufts.edu
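
A note on the MassSpecGym download step above: the preprocessed files arrive as a zip archive that has to be unpacked before use. The sketch below is one way to do that with the Python standard library. It is not part of this repository; the helper name `extract_archive` is hypothetical, and the path `data/msgym/msgym_preprocessed.zip` is an assumption (depending on how `wget` was invoked, the saved file name may carry a `?download=1` suffix, so adjust the path if needed).

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names.

    Hypothetical helper for illustration; not provided by the MVP repository.
    """
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    if not zip_path.is_file():
        raise FileNotFoundError(f"archive not found: {zip_path}")
    # Create the destination directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumed location of the downloaded archive; change it to match your setup.
    names = extract_archive("data/msgym/msgym_preprocessed.zip", "data/msgym")
    print(f"extracted {len(names)} entries")
```

After extraction, point the dataset paths in your params file at the unpacked files.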