+---
+license: mit
+tags:
+- protein-generation
+- antimicrobial-peptides
+- flow-matching
+- protein-design
+- esm
+- amp
+library_name: pytorch
+---
+# FlowFinal: AMP Flow Matching Model
+FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). It uses a continuous normalizing flow to generate samples in the ESM-2 embedding space, which are then decoded into amino acid sequences.
+## Model Description
+- **Model Type**: Flow Matching for Protein Generation
+- **Domain**: Antimicrobial Peptide (AMP) Generation
+- **Base Model**: ESM-2 (650M parameters)
+- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
+- **Training Data**: Curated AMP dataset with ~7K sequences
+## Key Features
+- **Classifier-Free Guidance (CFG)**: Enables controlled generation with different conditioning strengths
+- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
+- **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
+- **Multiple CFG Scales**: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance
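The guidance scales above can be read through the usual classifier-free guidance rule, sketched here in plain Python. This is an assumption about how the scale enters, not a description of `final_flow_model.py`: the combination `v_uncond + scale * (v_cond - v_uncond)` is one common CFG convention, chosen here because it agrees with the statement that a scale of 0.0 means no conditioning. The function name `cfg_velocity` and the toy vectors are hypothetical.

```python
from typing import List


def cfg_velocity(v_uncond: List[float], v_cond: List[float], scale: float) -> List[float]:
    """Blend unconditional and conditional velocity predictions.

    scale = 0.0 returns the unconditional prediction, scale = 1.0 returns the
    conditional one, and larger values extrapolate past it.
    """
    if len(v_uncond) != len(v_cond):
        raise ValueError("velocity vectors must have the same length")
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]


# Toy 3-dimensional predictions (illustrative numbers only, not model output).
v_uncond = [0.0, 1.0, -1.0]
v_cond = [1.0, 1.0, 0.0]

for scale in (0.0, 3.0, 7.5, 15.0):
    print(scale, cfg_velocity(v_uncond, v_cond, scale))
```

Under this convention a scale above 1.0 extrapolates beyond the conditional prediction, so large scales amplify whatever the conditioning signal changes.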
+## Model Components
+### Core Architecture
+- `final_flow_model.py`: Main flow matching model implementation
+- `compressor_with_embeddings.py`: Embedding compression/decompression modules
+- `final_sequence_decoder.py`: ESM-2 embedding to sequence decoder
+### Trained Weights
+- `final_compressor_model.pth`: Trained compressor (315 MB)
+- `final_decompressor_model.pth`: Trained decompressor (158 MB)
+- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint
+### Generated Samples (2025-08-29 Results)
+- Generated AMP sequences at four CFG scales (0.0, 3.0, 7.5, 15.0)
+- HMD-AMP validation results: 7 of 80 generated sequences (8.8%) predicted to be AMPs
+## Performance Results
+### HMD-AMP Validation (80 sequences tested)
+- **Total AMPs Predicted**: 7/80 (8.8%)
+- **By CFG Configuration**:
+  - No CFG (scale 0.0): 1/20 (5.0%)
+  - Weak CFG (scale 3.0): 2/20 (10.0%)
+  - Strong CFG (scale 7.5): 4/20 (20.0%) ← Best performance
+  - Very Strong CFG (scale 15.0): 0/20 (0.0%)
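The percentages above follow directly from the reported counts. A small sketch that recomputes them is shown below; the counts are copied from the list, while the `results` dictionary and the `rate` helper are illustrative names and not part of the repository.

```python
# (AMPs predicted, sequences tested) per configuration, copied from the list above.
results = {
    "No CFG": (1, 20),
    "Weak CFG": (2, 20),
    "Strong CFG": (4, 20),
    "Very Strong CFG": (0, 20),
}


def rate(hits: int, total: int) -> float:
    """Return the predicted-AMP rate as a percentage."""
    return 100.0 * hits / total


total_hits = sum(hits for hits, _ in results.values())
total_tested = sum(total for _, total in results.values())
best = max(results, key=lambda name: rate(*results[name]))

for name, (hits, total) in results.items():
    print(f"{name}: {hits}/{total} ({rate(hits, total):.1f}%)")
print(f"Total: {total_hits}/{total_tested} ({rate(total_hits, total_tested):.1f}%)")
print(f"Best configuration: {best}")
```

With 20 sequences per configuration, each additional predicted AMP moves the rate by 5 percentage points, so the differences between configurations rest on very few sequences.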
+### Best Performing Sequences
+1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
+2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
+3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)
+## Usage
+```python
+from generate_amps import AMPGenerator
+# Initialize generator
+generator = AMPGenerator(
+    model_path="amp_flow_model_final_optimized.pth",
+    device='cuda'
+)
+# Generate AMP samples
+samples = generator.generate_amps(
+    num_samples=20,
+    num_steps=25,
+    cfg_scale=7.5  # Strong CFG recommended
+)
+```
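`num_steps` presumably sets how many integration steps the sampler takes when it follows the learned flow from noise to data. The sketch below illustrates that idea with fixed-step Euler integration on a toy one-dimensional velocity field. It assumes an Euler sampler and a straight-line (linear interpolation) probability path, neither of which is confirmed by this README, and `euler_sample` and `toy_velocity` are hypothetical names that do not come from `generate_amps.py`.

```python
from typing import Callable


def euler_sample(velocity: Callable[[float, float], float], x0: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = 1.0 / num_steps
    x = x0
    for k in range(num_steps):
        t = k * dt  # the last step starts at t = 1 - dt, so t never reaches 1
        x = x + dt * velocity(x, t)
    return x


TARGET = 2.0  # toy "data" point that the flow should reach at t = 1


def toy_velocity(x: float, t: float) -> float:
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * TARGET."""
    return (TARGET - x) / (1.0 - t)


sample = euler_sample(toy_velocity, x0=-1.0, num_steps=25)
print(sample)
```

In a real model the velocity would come from the trained network, optionally combined with guidance, and more steps generally trade runtime for a more accurate trajectory.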
+## Training Details
+- **Optimizer**: AdamW with a cosine annealing learning-rate schedule
+- **Learning Rate**: 4e-4 (final)
+- **Epochs**: 2000
+- **Final Loss**: 1.318
+- **Training Time**: 2.3 hours on H100
+- **Dataset Size**: 6,983 samples
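For reference, cosine annealing can be written in closed form. The sketch assumes that 4e-4 is the starting (peak) learning rate, that the schedule decays to a minimum of 0 over all 2000 epochs, and that it is stepped once per epoch with no warmup or restarts. The README does not state any of these, and `cosine_lr` is a hypothetical helper rather than code from this repository.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 2000, lr_max: float = 4e-4, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given epoch (no warmup, no restarts)."""
    # Clamp progress to [0, 1] so out-of-range epochs return the boundary values.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 500, 1000, 1500, 2000):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```

This is the same closed form used by schedulers such as PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` when stepped once per epoch with `T_max=2000`.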
+## File Structure
+```
+FlowFinal/
+├── models/
+│   ├── final_compressor_model.pth
+│   ├── final_decompressor_model.pth
+│   └── amp_flow_model_final_optimized.pth
+├── generated_samples/
+│   ├── generated_sequences_20250829.fasta
+│   └── hmd_amp_detailed_results.csv
+├── src/
+│   ├── final_flow_model.py
+│   ├── compressor_with_embeddings.py
+│   ├── final_sequence_decoder.py
+│   └── generate_amps.py
+└── README.md
+```
+## Citation
+If you use FlowFinal in your research, please cite:
+```bibtex
+@misc{flowfinal2025,
+  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
+  author={Edward Sun},
+  year={2025},
+  url={https://huggingface.co/esunAI/FlowFinal}
+}
+```
+## License
+This model is released under the MIT License.