RefAlign: RL with Similarity-based Rewards
Collection
Datasets and models from the paper "Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data".
19 items