atrost/math_sft_40K_trl_think_SFT_Regularized-0.5_Normalize-False • Text Generation • 2B • Updated Sep 25 • 5
atrost/math_sft_40K_trl_think_SFT_Regularized-0.3_Normalize-True • Text Generation • 2B • Updated Sep 25 • 2
atrost/math_sft_40K_trl_think_SFT_Regularized-0.3_Normalize-False • Text Generation • 2B • Updated Sep 25 • 1
atrost/math_sft_40K_trl_think_SFT_Regularized-0.1_Normalize-True • Text Generation • 2B • Updated Sep 25 • 2
atrost/math_sft_40K_trl_think_SFT_Regularized-0.1_Normalize-False • Text Generation • 2B • Updated Sep 25 • 7
atrost/math_sft_40K_trl_think_SFT_Regularized-0.7_Normalize-True • Text Generation • 2B • Updated Sep 25 • 2
atrost/math_sft_40K_trl_think_SFT_Regularized-0.7_Normalize-False • Text Generation • 2B • Updated Sep 25 • 2
hdong0/Qwen3-1.7B-base-Open-R1-GRPO_deepscaler_acc_8192_nokl • Text Generation • 2B • Updated 21 days ago • 210
hdong0/Qwen3-1.7B-base-Open-R1-GRPO_dapo_acc_4096_nokl • Text Generation • 2B • Updated 21 days ago • 132
Kazuki1450/Qwen3-1.7B-Base_lightr1_stage1_1p0_0p0_1p0_grpo • Text Generation • 2B • Updated 5 days ago • 46