Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Paper • 2504.09895
Datasets and models in: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.