---
license: mit
tags:
- protein-generation
- antimicrobial-peptides
- flow-matching
- protein-design
- esm
- amp
library_name: pytorch
---

# FlowFinal: AMP Flow Matching Model

FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). It uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.

## Model Description

- **Model Type**: Flow Matching for Protein Generation
- **Domain**: Antimicrobial Peptide (AMP) Generation
- **Base Model**: ESM-2 (650M parameters)
- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
- **Training Data**: Curated AMP dataset with ~7K sequences

## Key Features

- **Classifier-Free Guidance (CFG)**: Enables controlled generation with different conditioning strengths
- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
- **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 β†’ 80 dimensions)
- **Multiple CFG Scales**: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
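Conceptually, classifier-free guidance blends the model's unconditional and conditional velocity predictions at each integration step; a scale of 0.0 recovers the unconditional flow, while larger scales extrapolate further toward the conditioning signal. A minimal sketch (the `cfg_velocity` helper and the model call signature are illustrative, not the repository's actual API):

```python
import torch

def cfg_velocity(model, x_t, t, cond, cfg_scale):
    # Hypothetical helper: blend unconditional and conditional velocity
    # predictions. cfg_scale=0.0 recovers the unconditional flow; larger
    # scales push further toward the conditioning signal.
    v_uncond = model(x_t, t, None)
    v_cond = model(x_t, t, cond)
    return v_uncond + cfg_scale * (v_cond - v_uncond)

# Toy model: unconditional velocity is 0, conditional velocity is 1,
# so the blended output equals cfg_scale.
toy = lambda x, t, c: torch.zeros_like(x) if c is None else torch.ones_like(x)
v = cfg_velocity(toy, torch.zeros(2, 80), torch.tensor(0.5), "amp", cfg_scale=7.5)
print(v[0, 0].item())  # 7.5
```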

## Model Components

### Core Architecture
- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding to sequence decoder
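As a rough shape-level illustration of the 16x compression, the sketch below uses stand-in linear layers; the trained modules in `compressor_with_embeddings.py` are more elaborate than a single linear map.

```python
import torch
import torch.nn as nn

# Stand-in linear pair for the 1280 -> 80 compression described above;
# illustrative only, not the trained architecture.
compressor = nn.Linear(1280, 80)
decompressor = nn.Linear(80, 1280)

emb = torch.randn(1, 50, 1280)   # (batch, seq_len, ESM-2 hidden dim)
z = compressor(emb)              # (1, 50, 80): 16x fewer dimensions
recon = decompressor(z)          # (1, 50, 1280)
print(z.shape, recon.shape)
```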

### Trained Weights
- `final_compressor_model.pth`: Trained compressor (315MB)
- `final_decompressor_model.pth`: Trained decompressor (158MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint

### Generated Samples
- Generated AMP sequences with different CFG scales
- HMD-AMP validation results showing 8.8% AMP prediction rate

## Performance Results

### HMD-AMP Validation (80 sequences tested)
- **Total AMPs Predicted**: 7/80 (8.8%)
- **By CFG Configuration**:
  - No CFG: 1/20 (5.0%)
  - Weak CFG: 2/20 (10.0%)  
  - Strong CFG: 4/20 (20.0%) ← Best performance
  - Very Strong CFG: 0/20 (0.0%)

### Best Performing Sequences
1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)

## Usage

```python
from generate_amps import AMPGenerator

# Initialize generator
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device='cuda'
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5  # Strong CFG recommended
)
```
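Here `num_steps` controls the discretization of the flow ODE: the sampler integrates the learned velocity field from noise (t=0) to data (t=1). A generic Euler-integration sketch, using a toy velocity field as a stand-in for the trained flow model:

```python
import torch

def euler_sample(velocity_fn, num_samples, dim, num_steps):
    # Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps.
    x = torch.randn(num_samples, dim)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((num_samples,), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy velocity field drifting toward the origin.
samples = euler_sample(lambda x, t: -x, num_samples=4, dim=80, num_steps=25)
print(samples.shape)  # torch.Size([4, 80])
```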

## Training Details

- **Optimizer**: AdamW with cosine annealing
- **Learning Rate**: 4e-4 (final)
- **Epochs**: 2000
- **Final Loss**: 1.318
- **Training Time**: 2.3 hours on H100
- **Dataset Size**: 6,983 samples
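The stated optimizer setup can be sketched as follows. The model here is a placeholder, and since the card reports 4e-4 as the final learning rate, the sketch simply uses it as the scheduler's starting value for illustration:

```python
import torch

model = torch.nn.Linear(80, 80)  # placeholder for the flow model
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=2000)

for epoch in range(3):  # abbreviated; the card reports 2000 epochs
    # ... forward pass, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr()[0])  # cosine-decayed learning rate
```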

## Files Structure

```
FlowFinal/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ final_compressor_model.pth
β”‚   β”œβ”€β”€ final_decompressor_model.pth
β”‚   └── amp_flow_model_final_optimized.pth
β”œβ”€β”€ generated_samples/
β”‚   β”œβ”€β”€ generated_sequences_20250829.fasta
β”‚   └── hmd_amp_detailed_results.csv
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ final_flow_model.py
β”‚   β”œβ”€β”€ compressor_with_embeddings.py
β”‚   β”œβ”€β”€ final_sequence_decoder.py
β”‚   └── generate_amps.py
└── README.md
```

## Citation

If you use FlowFinal in your research, please cite:

```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```

## License

This model is released under the MIT License.