TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct-DAPO14k Text Generation • 4B • Updated Oct 11 • 5
mradermacher/Self-Certainty-Qwen3-1.7B-Base-MATH-GGUF Reinforcement Learning • 2B • Updated Oct 11 • 94 • 1