---
title: MVP
emoji: π
colorFrom: blue
colorTo: pink
sdk: streamlit
app_file: app.py
pinned: false
short_description: msms annotation tool
python_version: 3.11.7
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# MultiView Projection (MVP) for Spectra Annotation

### Authors
**Yan Zhou Chen, Soha Hassoun**
Department of Computer Science, Tufts University

---

MVP is a framework for **ranking molecular candidates given a spectrum**. This repository provides the official implementation, pretrained models, and utilities for data preparation and training.
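
To make the ranking task concrete, here is a minimal sketch. It assumes the spectrum and each candidate molecule have already been mapped to embedding vectors and that candidates are scored by cosine similarity. Both are illustrative assumptions rather than a description of MVP's internals, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return candidate names sorted from most to least similar to the spectrum."""
    scores = {
        name: cosine_similarity(spectrum_embedding, embedding)
        for name, embedding in candidate_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-made embeddings (hypothetical values, not model outputs).
spectrum = [0.9, 0.1, 0.0]
candidates = {
    "cand_A": [0.0, 1.0, 0.0],
    "cand_B": [1.0, 0.0, 0.0],
    "cand_C": [0.5, 0.5, 0.0],
}
ranking = rank_candidates(spectrum, candidates)
print(ranking)
```

In this toy setup the candidate whose vector points in nearly the same direction as the spectrum vector is ranked first.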

---

## Table of Contents
0. [Quick Test](#quick-test)
1. [Install & Setup](#install--setup)
2. [Data Preparation](#data-prep)
3. [Using the Pretrained Model](#use-our-pretrained-model)
4. [MassSpecGym Data Download](#massspecgym-data-download)
5. [Training from Scratch](#training-from-scratch)
6. [References](#references)

---

## Quick Test
Run MVP instantly with our [interactive app](https://huggingface.co/spaces/HassounLab/MVP) for small-scale experiments.

---

## Install & setup
1. Clone the repository: `git clone https://huggingface.co/spaces/HassounLab/MVP/`
2. Install the full environment, or only the key packages listed below:
```
conda create -n mvp python=3.11
conda activate mvp
pip install -r requirements.txt
```
#### Key packages
- python
- dgl
- pytorch
- rdkit
- pytorch-geometric
- numpy
- scikit-learn
- scipy
- massspecgym
- lightning
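
After installing, a quick check that the key packages are importable can save a failed run later. The sketch below is not part of the repository. It assumes the usual import names for the packages listed above (for example `torch` for pytorch, `torch_geometric` for pytorch-geometric, and `sklearn` for scikit-learn).

```python
import importlib.util

# Import names assumed for the key packages listed above.
KEY_MODULES = [
    "dgl",
    "torch",
    "rdkit",
    "torch_geometric",
    "numpy",
    "sklearn",
    "scipy",
    "massspecgym",
    "lightning",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_modules(KEY_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All key packages are importable.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages.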

---

## Data prep
We provide sample spectra data and candidates in `data/sample`.

For preprocessing:
1. If using formSpec, compute the subformula labels.
2. Run our preprocessing code to obtain the fingerprint and consensus spectra files.
```
# If using formSpec
python subformula_assign/assign_subformulae.py --spec-files ../data/sample/data.tsv --output-dir ../data/sample/subformulae_default --max-formulae 60 --labels-file ../data/sample/data.tsv
python data_preprocess.py --spec_type formSpec --dataset_pth ../data/sample/data.tsv --candidates_pth ../data/sample/candidates_mass.json --subformula_dir_pth ../data/sample/subformulae_default/ --output_dir ../data/sample/
# If using binnedSpec
python data_preprocess.py --spec_type binnedSpec --dataset_pth ../data/sample/data.tsv --candidates_pth ../data/sample/candidates_mass.json --output_dir ../data/sample/
```
We include sample subformula, fingerprint, and consensus spectra data in `../data/sample/`.
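
For readers unfamiliar with binned spectra, the sketch below shows the general idea of turning a peak list into a fixed-length vector. It is not taken from `data_preprocess.py`: the m/z range, bin width, normalization, and function name are all assumptions chosen for illustration, and the actual binnedSpec representation may differ.

```python
def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins, then scale so the largest bin is 1.0.

    peaks: iterable of (mz, intensity) pairs. Peaks outside [0, max_mz) are ignored.
    """
    n_bins = int(max_mz / bin_width)
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0 <= mz < max_mz:
            # Clamp the index to guard against floating-point rounding at the upper edge.
            index = min(int(mz / bin_width), n_bins - 1)
            binned[index] += intensity
    top = max(binned)
    if top > 0:
        binned = [value / top for value in binned]
    return binned


# Toy peak list (hypothetical values): two peaks share a bin, one is out of range.
peaks = [(100.2, 50.0), (100.7, 50.0), (250.1, 25.0), (1200.0, 10.0)]
vector = bin_spectrum(peaks)
print(len(vector), vector[100], vector[250])
```

Peaks that fall into the same bin are summed, which is why the two peaks near m/z 100 end up as a single entry.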

## Use our pretrained model
You can use our pretrained model (trained on MassSpecGym) to rank molecular candidates by providing the spectra data and a list of candidates.

After preparing your data, edit `params_binnedSpec.yaml` or `params_formSpec.yaml` to point to your dataset paths, then run:
```
# If using formSpec
python test.py --param_pth params_formSpec.yaml
# If using binnedSpec
python test.py --param_pth params_binnedSpec.yaml
```
We provide a notebook showing sample result files in `notebooks/demo.ipynb`.
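
Once you have ranked candidates, a common way to summarize them is the fraction of spectra whose true molecule appears among the top k candidates. The sketch below illustrates that metric only. The dictionary layout is hypothetical and does not reflect the format of the result files written by `test.py`, so you would need to adapt the loading step to the actual output.

```python
def hit_rate_at_k(rankings, true_labels, k):
    """Fraction of spectra whose true candidate is among the top-k ranked candidates."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for spectrum_id, ranked in rankings.items()
        if true_labels[spectrum_id] in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical rankings: spectrum id -> candidates ordered from best to worst.
rankings = {
    "spec_1": ["A", "B", "C"],
    "spec_2": ["D", "E", "F"],
    "spec_3": ["G", "H", "I"],
    "spec_4": ["J", "K", "L"],
}
# Hypothetical ground truth: spectrum id -> correct candidate.
true_labels = {"spec_1": "A", "spec_2": "E", "spec_3": "I", "spec_4": "J"}

for k in (1, 2, 3):
    print(k, hit_rate_at_k(rankings, true_labels, k))
```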

---

## MassSpecGym data download
Our model is trained on the [MassSpecGym dataset](https://github.com/pluskal-lab/MassSpecGym). Follow their instructions to download the spectra and candidate datasets.

You can preprocess the MassSpecGym dataset as described in the data prep section, or download the preprocessed files as follows:
```
mkdir -p data/msgym/
cd data/msgym
# Quote the URL and name the output file so the query string does not end up in the filename
wget -O msgym_preprocessed.zip "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"
```
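
If `wget` is not available, the same download can be sketched with the Python standard library. This is an optional alternative rather than part of the repository. The helper names are hypothetical, and the contents and layout of the archive are not described here, so inspect the extracted files before pointing your params file at them.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same Zenodo URL as in the wget command above.
ARCHIVE_URL = "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"


def download_archive(url, dest_path):
    """Download url to dest_path, creating parent directories as needed."""
    dest_path = Path(dest_path)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract_archive(zip_path, out_dir):
    """Extract a zip archive into out_dir and return the sorted list of member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return sorted(archive.namelist())
```

Calling `download_archive(ARCHIVE_URL, "data/msgym/msgym_preprocessed.zip")` followed by `extract_archive("data/msgym/msgym_preprocessed.zip", "data/msgym")` would mirror the shell commands and additionally unpack the archive.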

## Training from scratch
To train a model from scratch:
1. Prepare the data as described in the data prep section.
2. Modify the configuration in the params file as necessary.
3. Train with one of the following commands:
```
# If using formSpec
python train.py --param_pth params_formSpec.yaml
# If using binnedSpec
python train.py --param_pth params_binnedSpec.yaml
```

---

## References
Preprint: [Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation](https://www.biorxiv.org/content/10.1101/2025.11.12.688047v1)

---

## Contact
For questions, reach out to: [email protected]
| ======= |