File size: 3,314 Bytes
5e90249 eeeef3f 5e90249 b72ad57 5e90249 b72ad57 5e90249 b72ad57 5e90249 b72ad57 5e90249 b72ad57 5e90249 63d28a7 5e90249 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
# [TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171) 🤖🌳
[**Sophia Tang**](https://sophtang.github.io/)\*, [**Yuchen Zhu**](https://yuchen-zhu-zyc.github.io/)\*, [**Molei Tao**](https://mtao8.math.gatech.edu/), and [**Pranam Chatterjee**](https://www.chatterjeelab.com/)

This is the repository for **[TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)** 🤖🌳. It is partially built on the **[PepTune repo](https://huggingface.co/ChatterjeeLab/PepTune)** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) and **MDNS** ([Zhu et al. 2025](https://arxiv.org/abs/2508.10684)).
Inspired by the incredible success of off-policy reinforcement learning (RL), **TR2-D2** introduces a general framework that enhances the performance of off-policy RL with tree search for discrete diffusion fine-tuning.
🤖 Off-policy RL enables learning from diffusion trajectories from the non-gradient tracking policy model by storing samples in a replay buffer for repeated use.
🌳 Tree search efficiently explores high-dimensional discrete sequence spaces to find the (often sparse) subspace of high-reward sequences and leverages the structural similarities of optimal sequences to exploit optimal sampling paths in the next iteration.
We use this framework to develop an efficient discrete diffusion fine-tuning strategy that leverages **Monte-Carlo Tree Search (MCTS)** to curate a replay buffer of optimal trajectories combined with an **off-policy control-based RL algorithm grounded in stochastic optimal control theory**, yielding theoretically guaranteed convergence to the optimal distribution. 🌟
## Regulatory DNA Sequence Design 🧬
In this experiment, we fine-tune the pre-trained **DNA enhancer MDM from DRAKES** (Wang et al. 2025) trained on **~700k HepG2 sequences** to optimize the measured enhancer activity using the reward oracles from DRAKES. Code and instructions to reproduce our results are provided in `/tr2d2-dna`.
## Multi-Objective Therapeutic Peptide Design 🧫
In this experiment, we fine-tune the pre-trained **unconditional peptide SMILES MDM from PepTune** ([Tang et al. 2024](https://arxiv.org/abs/2412.17780)) to optimize **multiple therapeutic properties**, including target protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. We show that one-shot generation from the fine-tuned policy outperforms inference-time multi-objective guidance, marking a significant advance over prior fine-tuning methods. Code and instructions to reproduce our results are provided in `/tr2d2-pep`.

## Citation
If you find this repository helpful for your publications, please consider citing our paper:
```python
@article{tang2024tr2d2,
title={TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion},
author={Sophia Tang and Yuchen Zhu and Molei Tao and Pranam Chatterjee},
journal={arXiv preprint arXiv:2509.25171},
year={2025}
}
```
To use this repository, you agree to abide by the [PepTune License](https://drive.google.com/file/d/1Hsu91wTmxyoJLNJzfPDw5_nTbxVySP5x/view?usp=sharing). |