TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated 19 days ago • 26 • 1
TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated 19 days ago • 35 • 1
TMLR-Group-HF/Self-Certainty-Qwen3-4B-Base-DAPO14k • Text Generation • 4B • Updated 19 days ago • 27 • 1
TMLR-Group-HF/Self-Certainty-Llama-3.2-3B-Instruct-DAPO14k • Text Generation • 4B • Updated 19 days ago • 46
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated 19 days ago • 32 • 2
TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated 19 days ago • 39 • 1
TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct-DAPO14k • Text Generation • 4B • Updated 19 days ago • 35
TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated 19 days ago • 27 • 1
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated 19 days ago • 30 • 1