---
title: MVP
emoji: 🏆
colorFrom: blue
colorTo: pink
sdk: streamlit
app_file: app.py
pinned: false
short_description: msms annotation tool
python_version: 3.11.7
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🏆 MultiView Projection (MVP) for Spectra Annotation


### Authors  
**Yan Zhou Chen, Soha Hassoun**  
Department of Computer Science, Tufts University  

---

MVP is a framework for **ranking molecular candidates given a spectrum**. This repository provides the official implementation, pretrained models, and utilities for data preparation and training.

---

## 📑 Table of Contents
0. [Quick Test](#quick-test)  
1. [Install & Setup](#install--setup)  
2. [Data Preparation](#data-prep)  
3. [MassSpecGym Data Download](#massspecgym-data-download)  
4. [Using the Pretrained Model](#use-our-pretrained-model)  
5. [Training from Scratch](#training-from-scratch)  
6. [References](#references)  

---


## 🚀 Quick Test
Run MVP instantly with our [interactive app](https://huggingface.co/spaces/HassounLab/MVP) for small-scale experiments.

---


## ⚙️ Install & setup
1. Clone the repository: `git clone https://huggingface.co/spaces/HassounLab/MVP/`
2. Install the full environment, or install only the key packages listed below:
```
conda create -n mvp python=3.11
conda activate mvp
pip install -r requirements.txt
``` 
#### Key packages
- python
- dgl
- pytorch
- rdkit
- pytorch-geometric
- numpy
- scikit-learn
- scipy
- massspecgym
- lightning
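
After installation, a quick check that the key packages are importable can save a failed run later. The following is a minimal sketch, not part of the repository; the import names (for example `torch` for pytorch, `sklearn` for scikit-learn, `torch_geometric` for pytorch-geometric) are the usual ones for these packages and are assumed here.

```python
from importlib.util import find_spec

# Import names assumed for the key packages listed above
# (e.g. pytorch -> torch, scikit-learn -> sklearn).
KEY_MODULES = [
    "dgl", "torch", "rdkit", "torch_geometric", "numpy",
    "sklearn", "scipy", "massspecgym", "lightning",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(KEY_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All key packages are importable.")
```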

---

## 📂 Data prep
We provide sample spectra data and candidates in `data/sample`. 
For preprocessing:
1. If using formSpec, compute subformula labels
2. Run our preprocessing code to obtain fingerprint and consensus spectra files

```
# If using formSpec
python subformula_assign/assign_subformulae.py --spec-files ../data/sample/data.tsv --output-dir ../data/sample/subformulae_default --max-formulae 60 --labels-file ../data/sample/data.tsv
python data_preprocess.py --spec_type formSpec --dataset_pth ../data/sample/data.tsv --candidates_pth  ../data/sample/candidates_mass.json --subformula_dir_pth ../data/sample/subformulae_default/ --output_dir ../data/sample/

# If using binnedSpec
python data_preprocess.py --spec_type binnedSpec --dataset_pth ../data/sample/data.tsv --candidates_pth  ../data/sample/candidates_mass.json --output_dir ../data/sample/

```
We include sample subformula, fingerprint, and consensus spectra data in `../data/sample/`.
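
For readers unfamiliar with binned spectra, the general idea is to sum peak intensities into fixed-width m/z bins so that every spectrum becomes a vector of the same length. The sketch below illustrates that idea only: the function name, the bin width of 1.0, the upper limit of 1000 m/z, and the max-normalization are all assumptions, and the representation actually produced by `data_preprocess.py` may differ.

```python
def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Sum peak intensities into fixed-width m/z bins, scaled so the largest bin is 1.

    `peaks` is an iterable of (mz, intensity) pairs. The defaults are
    illustrative assumptions, not values taken from the repository.
    """
    n_bins = int(max_mz / bin_width)
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz < 0:
            continue  # ignore invalid m/z values
        index = int(mz / bin_width)
        if index < n_bins:  # peaks at or beyond max_mz are dropped
            binned[index] += intensity
    top = max(binned, default=0.0)
    return [value / top for value in binned] if top > 0 else binned


# Two peaks fall in the same bin (index 100); the peak at 1500 is out of range.
vector = bin_spectrum([(100.2, 50.0), (100.7, 50.0), (250.1, 25.0), (1500.0, 10.0)])
```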

## Use our pretrained model
You can use our pretrained model (on MassSpecGym) to rank molecular candidates by providing the spectra data and a list of candidates.

After preparing your data, edit `params_binnedSpec.yaml` or `params_formSpec.yaml` so that it points to your dataset paths, then run:

```
# If using formSpec
python test.py --param_pth params_formSpec.yaml

# If using binnedSpec
python test.py --param_pth params_binnedSpec.yaml
```
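
Conceptually, ranking candidates for a spectrum means scoring each candidate against the spectrum and sorting by score. The sketch below shows similarity-based ranking in general, using cosine similarity between hypothetical embedding vectors. It is not the scoring implemented in `test.py`, and the function names and toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_candidates(spectrum_vec, candidate_vecs):
    """Return candidate names sorted from most to least similar to the spectrum."""
    scores = {name: cosine(spectrum_vec, vec) for name, vec in candidate_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (hypothetical): candidate "A" matches the spectrum exactly.
ranking = rank_candidates(
    [1.0, 0.0, 1.0],
    {"C": [0.0, 1.0, 0.0], "A": [1.0, 0.0, 1.0], "B": [1.0, 1.0, 0.0]},
)
```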

We provide a notebook showing sample result files in `notebooks/demo.ipynb`.

---
## MassSpecGym data download
Our model is trained on the [MassSpecGym dataset](https://github.com/pluskal-lab/MassSpecGym). Follow their instructions to download the spectra and candidate datasets.

You can preprocess the MassSpecGym dataset as described in the data prep section, or download the preprocessed files as follows:
```
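mkdir -p data/msgym/
cd data/msgym
# Quote the URL and name the output so the file is not saved as "msgym_preprocessed.zip?download=1"
wget -O msgym_preprocessed.zip "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"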
```
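
On systems without `wget`, the same download can be scripted with the Python standard library. This is a minimal sketch under the assumption that the Zenodo file is a regular zip archive; the helper names `download` and `extract` are hypothetical, and the layout of the extracted files is not described here.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above.
URL = "https://zenodo.org/records/15223987/files/msgym_preprocessed.zip?download=1"


def download(url, dest):
    """Download `url` to the file `dest`, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path, out_dir):
    """Extract a zip archive into `out_dir` and return the list of member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

To use it from the repository root, call `download(URL, "data/msgym/msgym_preprocessed.zip")` and then `extract("data/msgym/msgym_preprocessed.zip", "data/msgym")`.
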
## Training from scratch
To train a model from scratch:
1. Prepare the data as described in the data prep section
2. Modify the configuration in the params file as necessary
3. Train using one of the following commands:
```
# If using formSpec
python train.py --param_pth params_formSpec.yaml

# If using binnedSpec
python train.py --param_pth params_binnedSpec.yaml
```
---

## 📚 References
Preprint: [Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation](https://www.biorxiv.org/content/10.1101/2025.11.12.688047v1)

---

## 📧 Contact
For questions, reach out to: [email protected]
