
	---
	title: MVP
	emoji: 🏆
	colorFrom: blue
	colorTo: pink
	sdk: streamlit
	app_file: app.py
	pinned: false
	short_description: msms annotation tool
	python_version: 3.11.7
	---


	# 🏆 MultiView Projection (MVP) for Spectra Annotation


	### Authors
	Yan Zhou Chen, Soha Hassoun
	Department of Computer Science, Tufts University

	---

	MVP is a framework for ranking candidate molecular structures given a tandem mass spectrum (MS/MS). This repository provides the official implementation, pretrained models, and utilities for data preparation and training.

	---

	## 📑 Table of Contents
	0. [Quick Test](#quick-test)
	1. [Install & Setup](#install--setup)
	2. [Data Preparation](#data-prep)
	3. [Using the Pretrained Model](#use-our-pretrained-model)
	4. [MassSpecGym Data Download](#massspecgym-data-download)
	5. [Training from Scratch](#training-from-scratch)
	6. [References](#references)
	7. [Contact](#contact)

	---


	## 🚀 Quick Test
	Run MVP instantly with our [interactive app](https://huggingface.co/spaces/HassounLab/MVP) for small-scale experiments.

	---


	## ⚙️ Install & setup
	1. Clone the repository and enter it: `git clone https://huggingface.co/spaces/HassounLab/MVP/`, then `cd MVP`
	2. Create the environment and install the requirements (or install only the key packages listed below):
	```
	conda create -n mvp python=3.11
	conda activate mvp
	pip install -r requirements.txt
	```
	#### Key packages
	- python
	- dgl
	- pytorch
	- rdkit
	- pytorch-geometric
	- numpy
	- scikit-learn
	- scipy
	- massspecgym
	- lightning

	---

	## 📂 Data prep
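	We provide sample spectra and candidate data in `data/sample`. The commands below are run from the directory containing `data_preprocess.py`, so the sample data is referenced as `../data/sample/`.
	Preprocessing has two steps:
	1. If using formSpec (the subformula-annotated spectrum representation), compute subformula labels for the spectra.
	2. Run our preprocessing script to obtain the fingerprint and consensus spectra files.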
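The core of subformula labeling is a mass check: a peak is labeled with a formula when the ion mass computed from that formula falls within a small tolerance of the peak's m/z. The sketch below illustrates only that check on a toy glucose example. It assumes CHNOPS formulae, singly charged protonated ions, and a 10 ppm tolerance, and `formula_mass`, `ppm_error`, and `match_peaks` are hypothetical helpers; `assign_subformulae.py` may use different adducts, tolerances, and candidate enumeration.

```python
import re

# Monoisotopic masses (Da) for CHNOPS elements (assumption: no other elements).
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass (Da), used for [F+H]+ ions


def formula_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO_MASS[element] * (int(count) if count else 1)
    return mass


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(peaks_mz, subformulae, tol_ppm=10.0):
    """Map each peak m/z to the subformulae whose [F+H]+ m/z lies within tol_ppm."""
    labels = {}
    for mz in peaks_mz:
        labels[mz] = [
            f for f in subformulae
            if abs(ppm_error(mz, formula_mass(f) + PROTON)) <= tol_ppm
        ]
    return labels


# Toy example: glucose [M+H]+ and its water-loss fragment; 100.0 matches nothing.
labels = match_peaks([163.0601, 181.0707, 100.0], ["C6H12O6", "C6H10O5"])
print(labels)  # {163.0601: ['C6H10O5'], 181.0707: ['C6H12O6'], 100.0: []}
```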

	```bash
	# If using formSpec
	python subformula_assign/assign_subformulae.py \
	    --spec-files ../data/sample/data.tsv \
	    --output-dir ../data/sample/subformulae_default --max-formulae 60 \
	    --labels-file ../data/sample/data.tsv
	python data_preprocess.py --spec_type formSpec \
	    --dataset_pth ../data/sample/data.tsv \
	    --candidates_pth ../data/sample/candidates_mass.json \
	    --subformula_dir_pth ../data/sample/subformulae_default/ \
	    --output_dir ../data/sample/

	# If using binnedSpec
	python data_preprocess.py --spec_type binnedSpec \
	    --dataset_pth ../data/sample/data.tsv \
	    --candidates_pth ../data/sample/candidates_mass.json \
	    --output_dir ../data/sample/
	```
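With `binnedSpec`, each spectrum becomes a fixed-length vector. As a minimal sketch of spectrum binning, assuming 1 Da bins over m/z 0 to 1000 and scaling so the tallest bin equals 1 (illustrative choices, not necessarily what `data_preprocess.py` uses), intensities that fall into the same bin are summed. `bin_spectrum` is a hypothetical helper.

```python
def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins, scaled so the max is 1.

    Peaks with m/z outside [0, max_mz) are ignored.
    """
    n_bins = int(round(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz < max_mz:
            # Guard against float rounding pushing a peak past the last bin.
            idx = min(int(mz // bin_width), n_bins - 1)
            vec[idx] += intensity
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


# Toy spectrum: two peaks share the 100-101 bin; the m/z 1200 peak is out of range.
vec = bin_spectrum([(100.2, 50.0), (100.7, 50.0), (250.1, 25.0), (1200.0, 99.0)])
print(len(vec), vec[100], vec[250])  # 1000 1.0 0.25
```

Summing within a bin keeps the vector length independent of the number of peaks, at the cost of merging peaks closer together than the bin width.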
	We include sample subformula, fingerprint, and consensus spectra data in `../data/sample/`.
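The preprocessing also produces consensus spectra. The exact procedure is defined in `data_preprocess.py`; purely as an illustration, one simple way to form a consensus spectrum is to average the binned spectra of all replicates that share a molecule and rescale the result. The grouping key (a SMILES string here) and the `consensus_spectra` helper are assumptions for this sketch.

```python
from collections import defaultdict


def consensus_spectra(records):
    """Average binned spectra that share a molecule identifier.

    records: iterable of (molecule_id, binned_vector) pairs; all vectors must
    have the same length. Returns {molecule_id: consensus_vector}, with each
    consensus vector scaled so its largest value is 1.
    """
    groups = defaultdict(list)
    for mol_id, vec in records:
        groups[mol_id].append(vec)

    result = {}
    for mol_id, vecs in groups.items():
        # Element-wise mean across all replicate spectra of this molecule.
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        top = max(mean)
        result[mol_id] = [v / top for v in mean] if top > 0 else mean
    return result


out = consensus_spectra([
    ("CCO", [1.0, 0.0, 0.5]),
    ("CCO", [0.0, 1.0, 0.5]),
    ("CC", [0.2, 0.4, 0.0]),
])
print(out["CCO"])  # [1.0, 1.0, 1.0]
```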

	---

	## Use our pretrained model
	You can use our model pretrained on MassSpecGym to rank molecular candidates for your own spectra by providing the spectra data and a list of candidates.

	After preparing your data as described above, set your dataset paths in `params_formSpec.yaml` or `params_binnedSpec.yaml`, then run:

	```
	# If using formSpec
	python test.py --param_pth params_formSpec.yaml

	# If using binnedSpec
	python test.py --param_pth params_binnedSpec.yaml
	```
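Conceptually, ranking means scoring each candidate against the query spectrum and sorting. As a sketch, assuming the spectrum and every candidate molecule have already been embedded into a shared vector space (as in contrastive retrieval setups), candidates can be ranked by cosine similarity and evaluated with hit rate at k, i.e. whether the true molecule appears among the top k candidates, a retrieval metric reported for example by MassSpecGym. `rank_candidates` and `hit_at_k` are illustrative helpers, not functions from `test.py`; MVP's actual scoring across its views is described in the preprint.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_candidates(spec_emb, cand_embs):
    """Return candidate IDs sorted from most to least similar to the spectrum.

    cand_embs: dict mapping candidate ID (e.g. SMILES) to its embedding.
    """
    scores = {cid: cosine(spec_emb, emb) for cid, emb in cand_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def hit_at_k(ranked_ids, true_id, k):
    """1 if the true molecule is among the top-k ranked candidates, else 0."""
    return int(true_id in ranked_ids[:k])


# Toy 2-D embeddings (hypothetical; real embeddings come from the trained model).
spectrum = [0.9, 0.1]
candidates = {"CCO": [1.0, 0.0], "CCN": [0.0, 1.0], "CCC": [0.7, 0.7]}
ranking = rank_candidates(spectrum, candidates)
print(ranking)  # ['CCO', 'CCC', 'CCN']
print(hit_at_k(ranking, "CCC", 1), hit_at_k(ranking, "CCC", 2))  # 0 1
```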

	We provide a notebook showing sample result files in `notebooks/demo.ipynb`.

	---
	## MassSpecGym data download
	Our model is trained on the [MassSpecGym dataset](https://github.com/pluskal-lab/MassSpecGym). Follow their instructions to download the spectra and candidate datasets.
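Before preprocessing, it can help to sanity-check the candidate file. The sketch below assumes the candidates JSON is a single object mapping each query molecule's SMILES to a list of candidate SMILES (the layout used by MassSpecGym's candidate files and, we assume, by `candidates_mass.json`); `load_candidates` and `summarize` are hypothetical helpers.

```python
import json


def load_candidates(path):
    """Load a candidate file assumed to be {query_smiles: [candidate_smiles, ...]}."""
    with open(path) as f:
        candidates = json.load(f)
    for query, cands in candidates.items():
        if not isinstance(cands, list):
            raise ValueError(f"candidates for {query!r} are not a list")
    return candidates


def summarize(candidates):
    """Return (number of queries, mean candidates per query, queries absent from their own list)."""
    sizes = [len(c) for c in candidates.values()]
    mean = sum(sizes) / len(sizes) if sizes else 0.0
    missing = [q for q, c in candidates.items() if q not in c]
    return len(candidates), mean, missing


cands = {"CCO": ["CCO", "COC"], "CCN": ["CNC", "CCC"]}
print(summarize(cands))  # (2, 2.0, ['CCN'])
```

The membership check compares strings, so SMILES should be canonicalized consistently (for example with RDKit) before relying on it.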

	You can preprocess the MassSpecGym dataset as described in the Data prep section, or download our preprocessed files:
	```bash
	mkdir -p data/msgym
	cd data/msgym
	wget -O msgym_preprocessed.zip "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"
	unzip msgym_preprocessed.zip
	```

	---

	## Training from scratch
	To train a model from scratch:
	1. Prepare your data as described in the Data prep section.
	2. Adjust the configuration (e.g., dataset paths) in the corresponding params file as needed.
	3. Start training:
	```
	# If using formSpec
	python train.py --param_pth params_formSpec.yaml

	# If using binnedSpec
	python train.py --param_pth params_binnedSpec.yaml
	```
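The preprint describes MVP as a multiview contrastive framework. To illustrate what a contrastive objective computes, the sketch below implements the standard one-directional InfoNCE loss (spectrum to molecule) for a batch of paired embeddings: each spectrum should score its own molecule higher than the other molecules in the batch. This is a generic loss written in plain Python for readability, with an arbitrarily chosen temperature of 0.1; it is not MVP's training code or its exact multiview loss.

```python
import math


def info_nce(spec_embs, mol_embs, temperature=0.1):
    """Mean InfoNCE loss where spec_embs[i] and mol_embs[i] are a positive pair.

    Embeddings (assumed nonzero) are L2-normalized, so dot products are cosine
    similarities. All other molecules in the batch act as negatives.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    specs = [normalize(v) for v in spec_embs]
    mols = [normalize(v) for v in mol_embs]
    total = 0.0
    for i, s in enumerate(specs):
        logits = [sum(a * b for a, b in zip(s, m)) / temperature for m in mols]
        # Numerically stable log-sum-exp over every molecule in the batch.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(specs)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily; training pushes the model toward the former.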
	---

	## 📚 References
	Preprint: [Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation](https://www.biorxiv.org/content/10.1101/2025.11.12.688047v1)

	---

	## 📧 Contact
	For questions, reach out to: [email protected]
