TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion

Sophia Tang*, Yuchen Zhu*, Molei Tao, and Pranam Chatterjee

This is the repository for TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion 🤖🌳. It is partially built on the PepTune repo (Tang et al. 2024) and MDNS (Zhu et al. 2025).

Inspired by the success of off-policy reinforcement learning (RL), TR2-D2 introduces a general framework that uses tree search to enhance off-policy RL for fine-tuning discrete diffusion models.

🤖 Off-policy RL lets the model learn from diffusion trajectories sampled by the policy without gradient tracking, storing them in a replay buffer so they can be reused across updates.

🌳 Tree search efficiently explores high-dimensional discrete sequence spaces to find the (often sparse) subspace of high-reward sequences and leverages the structural similarities of optimal sequences to exploit optimal sampling paths in the next iteration.
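Tree search has to balance exploring new branches against exploiting prefixes that already lead to high rewards. A common way to make that trade-off in MCTS is the UCT (Upper Confidence bound applied to Trees) selection rule. The sketch below is a generic illustration under that assumption: the `Node` fields, the exploration constant `c`, and the function names are hypothetical and do not reproduce the selection rule or multi-objective handling implemented in this repository.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a search tree over partially unmasked sequences (hypothetical)."""
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """UCT score: mean reward of the child plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always expand unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = 1.41) -> Node:
    """Pick the child that maximizes the UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Add a rollout's reward to every node on the root-to-leaf path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

Because rewards are accumulated along the whole root-to-leaf path, prefixes shared by many high-reward sequences accumulate high mean rewards and are revisited more often, which is one way a search can exploit structural similarity among good sequences.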

We use this framework to develop an efficient discrete diffusion fine-tuning strategy. It uses Monte Carlo Tree Search (MCTS) to curate a replay buffer of optimal trajectories and combines it with an off-policy, control-based RL algorithm grounded in stochastic optimal control theory, which yields theoretically guaranteed convergence to the optimal distribution. 🌟
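To make the replay-buffer bookkeeping concrete, here is a minimal sketch that assumes the buffer simply keeps the top-k trajectories by reward and samples them uniformly for repeated off-policy updates. The class name `TopKReplayBuffer`, its methods, and the uniform sampling scheme are hypothetical and not taken from this repository; trajectories are treated as opaque objects (for example, the sequence of partially unmasked states from a rollout run without gradient tracking).

```python
import heapq
import itertools
import random
from typing import Any, List, Optional, Tuple


class TopKReplayBuffer:
    """Keeps the `capacity` highest-reward trajectories seen so far (hypothetical helper)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Any]] = []  # min-heap keyed on reward
        self._counter = itertools.count()  # tie-breaker so trajectories are never compared
        self._rng = random.Random(seed)

    def add(self, trajectory: Any, reward: float) -> None:
        """Insert a trajectory, evicting the lowest-reward entry when the buffer is full."""
        item = (reward, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif reward > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)

    def sample(self, batch_size: int) -> List[Tuple[Any, float]]:
        """Draw stored (trajectory, reward) pairs uniformly with replacement for reuse."""
        if not self._heap:
            raise ValueError("cannot sample from an empty buffer")
        picks = self._rng.choices(self._heap, k=batch_size)
        return [(traj, reward) for reward, _, traj in picks]

    def __len__(self) -> int:
        return len(self._heap)
```

A bounded min-heap keeps the lowest-reward entry at `_heap[0]`, so deciding whether a newly found trajectory qualifies is a single comparison and each insertion costs O(log k).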

Regulatory DNA Sequence Design 🧬

In this experiment, we fine-tune the pre-trained DNA enhancer masked diffusion model (MDM) from DRAKES (Wang et al. 2025), which was trained on ~700k HepG2 sequences, to optimize enhancer activity as scored by the reward oracles from DRAKES. Code and instructions to reproduce our results are provided in /tr2d2-dna.

Multi-Objective Therapeutic Peptide Design 🧫

In this experiment, we fine-tune the pre-trained unconditional peptide SMILES MDM from PepTune (Tang et al. 2024) to optimize multiple therapeutic properties, including target protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. We show that one-shot generation from the fine-tuned policy outperforms inference-time multi-objective guidance, marking a significant advance over prior fine-tuning methods. Code and instructions to reproduce our results are provided in /tr2d2-pep.

Citation

If you find this repository helpful for your publications, please consider citing our paper:

@article{tang2024tr2d2,
  title={TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion},
  author={Sophia Tang and Yuchen Zhu and Molei Tao and Pranam Chatterjee},
  journal={arXiv preprint arXiv:2509.25171},
  year={2025}
}

To use this repository, you agree to abide by the PepTune License.
