# DICE: Self-alignment with DPO Implicit Rewards
This model was trained with **Bootstrapping Language Models with DPO Implicit Rewards (DICE)** at iteration 1, starting from HuggingFaceH4/zephyr-7b-beta.
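The quantity DICE bootstraps from is the implicit reward defined by a DPO-tuned policy relative to its reference model, r(x, y) = β · log(π(y|x) / π_ref(y|x)). A minimal sketch of that computation, with illustrative log-probabilities and an assumed β = 0.1 (not values from the paper):

```python
def dpo_implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO implicit reward r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)),
    computed from the summed token log-probabilities of a response y."""
    return beta * (logp_policy - logp_ref)

# Toy example: the DPO-tuned policy shifts likelihood toward the better response,
# so its implicit reward ranks that response above the worse one.
r_chosen = dpo_implicit_reward(logp_policy=-10.0, logp_ref=-14.0)    # 0.1 * 4.0  =  0.4
r_rejected = dpo_implicit_reward(logp_policy=-18.0, logp_ref=-15.0)  # 0.1 * -3.0 = -0.3
assert r_chosen > r_rejected  # this ranking is what labels new preference pairs
```

In DICE, rewards of this form score the current model's own responses to build preference pairs for the next DPO iteration; the snippet only shows the scoring step, not the full bootstrapping loop.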
| Model | LC Win Rate (%) | Win Rate (%) |
|---|---|---|
| Zephyr-7b-beta | 12.69 | 10.71 | 
| Zephyr-7B-DICE-Iter1 | 19.03 | 17.67 | 
| Zephyr-7B-DICE-Iter2 | 20.71 | 20.16 | 
**Code**: https://github.com/sail-sg/dice
```bibtex
@article{chen2024bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  journal={arXiv preprint arXiv:2406.09760},
  year={2024}
}
```