---
license: mit
tags:
- protein-generation
- antimicrobial-peptides
- flow-matching
- protein-design
- esm
- amp
library_name: pytorch
---
# FlowFinal: AMP Flow Matching Model
FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). The model uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.
## Model Description
- **Model Type**: Flow Matching for Protein Generation
- **Domain**: Antimicrobial Peptide (AMP) Generation
- **Base Model**: ESM-2 (650M parameters)
- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
- **Training Data**: Curated AMP dataset with ~7K sequences
## Key Features
- **Classifier-Free Guidance (CFG)**: Enables controlled generation with different conditioning strengths
- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
- **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
- **Multiple CFG Scales**: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
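The CFG scales above control how strongly the conditional signal steers generation. A minimal sketch of the standard classifier-free guidance combination of conditional and unconditional velocity predictions (function name hypothetical, NumPy standing in for the actual PyTorch model):

```python
import numpy as np

def cfg_velocity(v_cond, v_uncond, cfg_scale):
    """Combine conditional and unconditional velocity predictions.

    cfg_scale = 0.0 reduces to the unconditional velocity;
    larger scales push samples further toward the conditioning signal.
    """
    return v_uncond + cfg_scale * (v_cond - v_uncond)
```

At `cfg_scale=1.0` this recovers the purely conditional prediction; the scales listed above (3.0, 7.5, 15.0) extrapolate beyond it.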
## Model Components
### Core Architecture
- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding to sequence decoder
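The compressor maps 1280-dimensional ESM-2 embeddings into the 80-dimensional latent space the flow model operates in, and the decompressor maps back for the sequence decoder. A shape-level sketch with random matrices standing in for the trained weights (the actual modules in `compressor_with_embeddings.py` are learned and likely nonlinear):

```python
import numpy as np

rng = np.random.default_rng(0)
D_FULL, D_COMP = 1280, 80  # 16x compression, as in the model card

# Hypothetical stand-ins for the trained compressor/decompressor weights.
W_enc = rng.standard_normal((D_FULL, D_COMP)) / np.sqrt(D_FULL)
W_dec = rng.standard_normal((D_COMP, D_FULL)) / np.sqrt(D_COMP)

def compress(emb):
    # (L, 1280) ESM-2 embeddings -> (L, 80) latents for the flow model
    return emb @ W_enc

def decompress(latent):
    # (L, 80) latents -> (L, 1280) reconstruction for the sequence decoder
    return latent @ W_dec

x = rng.standard_normal((50, D_FULL))  # embeddings of a 50-residue peptide
z = compress(x)
assert z.shape == (50, D_COMP)
```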
### Trained Weights
- `final_compressor_model.pth`: Trained compressor (315MB)
- `final_decompressor_model.pth`: Trained decompressor (158MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint
### Generated Samples
- Generated AMP sequences with different CFG scales
- HMD-AMP validation results showing 8.8% AMP prediction rate
## Performance Results
### HMD-AMP Validation (80 sequences tested)
- **Total AMPs Predicted**: 7/80 (8.8%)
- **By CFG Configuration**:
- No CFG: 1/20 (5.0%)
- Weak CFG: 2/20 (10.0%)
  - Strong CFG: 4/20 (20.0%), the best performance
- Very Strong CFG: 0/20 (0.0%)
### Best Performing Sequences
1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)
## Usage
```python
from generate_amps import AMPGenerator

# Initialize the generator with the trained flow model checkpoint
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device="cuda",
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5,  # strong CFG performed best in validation
)
```
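Under the hood, `num_steps` controls how finely the learned ODE is integrated from noise at t=0 to data at t=1. A hypothetical Euler-style sampling loop with the CFG combination, using a toy velocity function in place of the trained flow model:

```python
import numpy as np

def sample_flow(velocity_fn, dim=80, num_steps=25, cfg_scale=7.5, seed=0):
    """Euler integration of the flow ODE from t=0 to t=1 (sketch).

    `velocity_fn(x, t)` should return (v_cond, v_uncond); the real model
    predicts these in the 80-dimensional compressed embedding space.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_cond, v_uncond = velocity_fn(x, t)
        v = v_uncond + cfg_scale * (v_cond - v_uncond)  # CFG combination
        x = x + dt * v  # Euler step
    return x
```

More steps give a finer integration at proportionally higher cost; 25 steps is the setting used in the example above.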
## Training Details
- **Optimizer**: AdamW with cosine annealing
- **Learning Rate**: 4e-4 (final)
- **Epochs**: 2000
- **Final Loss**: 1.318
- **Training Time**: 2.3 hours on H100
- **Dataset Size**: 6,983 samples
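Cosine annealing decays the learning rate along a half-cosine over training. A minimal sketch of the schedule, assuming the reported 4e-4 is the peak rate and the decay runs over the full 2000 epochs (both assumptions; the actual trainer may differ):

```python
import math

def cosine_lr(step, total_steps, base_lr=4e-4, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr (hypothetical schedule)."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```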
## Files Structure
```
FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md
```
## Citation
If you use FlowFinal in your research, please cite:
```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```
## License
This model is released under the MIT License.