rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921 Text Generation • 8B • Updated 27 days ago • 1.36k
rl-rag/sft_rejection_sampled_on_policy_long-_form_sft_0921 Viewer • Updated about 1 month ago • 2.22k • 11