Improve model card: Add pipeline tag, library, paper link, and correct GitHub/Citation
This PR significantly enhances the model card for `TMLR-Group-HF/Self-Certainty-Qwen2.5-7B` by:
* **Adding `pipeline_tag: text-generation`**: This categorizes the model correctly for better discoverability on the Hugging Face Hub, aligning with its function for reasoning in LLMs.
* **Adding `library_name: transformers`**: This indicates compatibility with the `transformers` library, enabling an automated inference widget and showcasing typical usage (see the sketch after this list).
* **Linking to the official Hugging Face paper page**: Providing a direct link to [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410) for comprehensive paper details.
* **Correcting the GitHub repository link**: Updating the link to point to the correct project repository: `https://github.com/tmlr-group/Co-rewarding`.
* **Updating the Citation**: Ensuring the citation block reflects the correct paper title and author list as found in the official GitHub repository and paper.
These changes will make the model card more informative, discoverable, and user-friendly.
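
For context on the `library_name: transformers` entry, a minimal usage sketch follows. It assumes a standard causal-LM setup; the prompt format, dtype/device settings, and generation length are illustrative assumptions, not something specified by the card or this PR.

```python
# Minimal usage sketch, assuming a standard causal-LM setup.
# The model ID comes from this card; the prompt and generation
# settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TMLR-Group-HF/Self-Certainty-Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A simple math-style prompt, since the model was trained on the MATH set.
prompt = "Question: What is 15% of 240?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```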
````diff
@@ -1,19 +1,22 @@
 ---
 license: mit
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 ## TMLR-Group-HF/Self-Certainty-Qwen2.5-7B

-This is the Qwen2.5-7B model trained by Self-Certainty method using MATH training set.
+This is the Qwen2.5-7B model trained by the Self-Certainty method using the MATH training set. It is part of the work presented in the paper [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410).

-
+For more details on the Co-rewarding framework and its implementation, you can find the code and further information on the official GitHub repository: [https://github.com/tmlr-group/Co-rewarding](https://github.com/tmlr-group/Co-rewarding).

 ## Citation

-```
+```bibtex
 @article{zhang2025coreward,
-  title={Co-
-  author={Zizhuo
-  journal={arXiv preprint arXiv:2508.00410}
-  year={2025}
+  title={Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models},
+  author={Zhang, Zizhuo and Zhu, Jianing and Ge, Xinmu and Zhao, Zihua and Zhou, Zhanke and Li, Xuan and Feng, Xiao and Yao, Jiangchao and Han, Bo},
+  journal={arXiv preprint arXiv:2508.00410},
+  year={2025}
 }
 ```
````
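
Once merged, the new metadata can be sanity-checked against the Hub API; a small sketch using `huggingface_hub` (attribute names per that library's `ModelInfo`):

```python
# Sketch: read back the card metadata via the Hub API.
from huggingface_hub import model_info

info = model_info("TMLR-Group-HF/Self-Certainty-Qwen2.5-7B")
print(info.pipeline_tag)  # expected: "text-generation"
print(info.library_name)  # expected: "transformers"
```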