Datasets and models in: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.
- Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
  Paper • 2504.09895
- mzhaoshuai/Mistral-7B-v0.1-conf-sft
  Text Generation
- mzhaoshuai/Llama-3.3-70B-Inst-awq_ultrafeedback_1in3
  Dataset
- mzhaoshuai/Llama-3.3-70B-Inst-awq_SafeRLHF
  Dataset