esunAI committed on
Commit e37da79 · verified · 1 Parent(s): d02bb74

Add comprehensive model card

Files changed (1)
  1. README.md +125 -3
README.md CHANGED
@@ -1,3 +1,125 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - protein-generation
+ - antimicrobial-peptides
+ - flow-matching
+ - protein-design
+ - esm
+ - amp
+ library_name: pytorch
+ ---
+
+ # FlowFinal: AMP Flow Matching Model
+
+ FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). It uses continuous normalizing flows to generate peptide embeddings in a compressed ESM-2 embedding space, which are then decoded into amino acid sequences.
+
+ ## Model Description
+
+ - **Model Type**: Flow Matching for Protein Generation
+ - **Domain**: Antimicrobial Peptide (AMP) Generation
+ - **Base Model**: ESM-2 (650M parameters)
+ - **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
+ - **Training Data**: Curated AMP dataset with ~7K sequences
+
+ ## Key Features
+
+ - **Classifier-Free Guidance (CFG)**: Enables controlled generation with different conditioning strengths
+ - **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
+ - **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
+ - **Multiple CFG Scales**: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
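
The CFG scales listed above can be read through the usual classifier-free guidance blend of a conditional and an unconditional prediction. The following is a minimal sketch of that common form, not the code in `final_flow_model.py`, which this diff does not show. The function `guided_velocity` and the example vectors are hypothetical. The form is chosen because a scale of 0.0 then reduces to the unconditional prediction, which matches the "no conditioning (0.0)" setting.

```python
from typing import List

Vector = List[float]


def guided_velocity(
    cond_velocity: Vector,
    uncond_velocity: Vector,
    cfg_scale: float,
) -> Vector:
    """Blend conditional and unconditional predictions (common CFG form).

    Assumption: v = v_uncond + cfg_scale * (v_cond - v_uncond).
    """
    return [
        u + cfg_scale * (c - u)
        for c, u in zip(cond_velocity, uncond_velocity)
    ]


# Hypothetical 3-dimensional predictions, for illustration only.
v_cond = [1.0, 2.0, -1.0]
v_uncond = [0.5, 0.0, 1.0]

no_guidance = guided_velocity(v_cond, v_uncond, 0.0)  # equals v_uncond
plain_cond = guided_velocity(v_cond, v_uncond, 1.0)   # equals v_cond
strong = guided_velocity(v_cond, v_uncond, 7.5)       # extrapolates past v_cond
```

Under this form, scales above 1.0 push the prediction beyond the conditional one, so very large scales can overshoot.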
+
+ ## Model Components
+
+ ### Core Architecture
+ - `final_flow_model.py`: Main flow matching model implementation
+ - `compressor_with_embeddings.py`: Embedding compression/decompression modules
+ - `final_sequence_decoder.py`: ESM-2 embedding to sequence decoder
+
+ ### Trained Weights
+ - `final_compressor_model.pth`: Trained compressor (315 MB)
+ - `final_decompressor_model.pth`: Trained decompressor (158 MB)
+ - `amp_flow_model_final_optimized.pth`: Main flow model checkpoint
+
+ ### Generated Samples (Today's Results)
+ - Generated AMP sequences with different CFG scales
+ - HMD-AMP validation results showing 8.8% AMP prediction rate
+
+ ## Performance Results
+
+ ### HMD-AMP Validation (80 sequences tested)
+ - **Total AMPs Predicted**: 7/80 (8.8%)
+ - **By CFG Configuration**:
+   - No CFG: 1/20 (5.0%)
+   - Weak CFG: 2/20 (10.0%)
+   - Strong CFG: 4/20 (20.0%) ← Best performance
+   - Very Strong CFG: 0/20 (0.0%)
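
The percentages above follow directly from the per-configuration counts. A short check of the arithmetic, using only the counts reported in the list (the variable names are illustrative):

```python
# (predicted AMPs, sequences tested) per CFG configuration, from the list above.
results = {
    "No CFG": (1, 20),
    "Weak CFG": (2, 20),
    "Strong CFG": (4, 20),
    "Very Strong CFG": (0, 20),
}

# Prediction rate in percent for each configuration.
rates = {name: 100 * hits / n for name, (hits, n) in results.items()}

total_hits = sum(hits for hits, _ in results.values())
total_n = sum(n for _, n in results.values())
overall_rate = 100 * total_hits / total_n  # 8.75, reported as 8.8%

# Configuration with the highest prediction rate.
best = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"Overall: {total_hits}/{total_n} ({overall_rate:.2f}%)")
```

With 20 sequences per configuration, each additional predicted AMP moves that configuration's rate by 5 percentage points.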
+
+ ### Best Performing Sequences
+ 1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
+ 2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
+ 3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)
+
+ ## Usage
+
+ ```python
+ from generate_amps import AMPGenerator
+
+ # Initialize generator
+ generator = AMPGenerator(
+     model_path="amp_flow_model_final_optimized.pth",
+     device='cuda'
+ )
+
+ # Generate AMP samples
+ samples = generator.generate_amps(
+     num_samples=20,
+     num_steps=25,
+     cfg_scale=7.5  # Strong CFG recommended
+ )
+ ```
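
The `num_steps` argument suggests a fixed-step ODE solver over the learned velocity field. The internals of `generate_amps` are not part of this diff, so the following is only a generic sketch of Euler integration as commonly used for flow matching sampling. `euler_sample`, `toy_velocity`, and `target` are hypothetical names, and the toy velocity field stands in for the trained network.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # the last evaluation is at t = 1 - dt, never at t = 1
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy velocity field standing in for the trained network: a straight-line
# path toward a fixed target, v(x, t) = (target - x) / (1 - t).
target = [2.0, -1.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


sample = euler_sample(toy_velocity, x0=[0.0, 0.0], num_steps=25)
```

In a real sampler, `x0` would be drawn from the noise distribution and `velocity_fn` would call the flow model, optionally with guidance applied to its output.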
+
+ ## Training Details
+
+ - **Optimizer**: AdamW with cosine annealing
+ - **Learning Rate**: 4e-4 (final)
+ - **Epochs**: 2000
+ - **Final Loss**: 1.318
+ - **Training Time**: 2.3 hours on H100
+ - **Dataset Size**: 6,983 samples
+
+ ## File Structure
+
+ ```
+ FlowFinal/
+ ├── models/
+ │   ├── final_compressor_model.pth
+ │   ├── final_decompressor_model.pth
+ │   └── amp_flow_model_final_optimized.pth
+ ├── generated_samples/
+ │   ├── generated_sequences_20250829.fasta
+ │   └── hmd_amp_detailed_results.csv
+ ├── src/
+ │   ├── final_flow_model.py
+ │   ├── compressor_with_embeddings.py
+ │   ├── final_sequence_decoder.py
+ │   └── generate_amps.py
+ └── README.md
+ ```
+
+ ## Citation
+
+ If you use FlowFinal in your research, please cite:
+
+ ```bibtex
+ @misc{flowfinal2025,
+   title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
+   author={Edward Sun},
+   year={2025},
+   url={https://huggingface.co/esunAI/FlowFinal}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License.