Improve model card: Add pipeline tag, library, paper link, and correct GitHub/Citation
This PR significantly enhances the model card for `TMLR-Group-HF/Self-Certainty-Qwen2.5-7B` by:
* **Adding `pipeline_tag: text-generation`**: This categorizes the model correctly for better discoverability on the Hugging Face Hub, aligning with its function for reasoning in LLMs.
* **Adding `library_name: transformers`**: This indicates compatibility with the `transformers` library, enabling an automated inference widget and showcasing typical usage (see the sketch after this list).
* **Linking to the official Hugging Face paper page**: Providing a direct link to [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410) for comprehensive paper details.
* **Correcting the GitHub repository link**: Updating the link to point to the correct project repository: `https://github.com/tmlr-group/Co-rewarding`.
* **Updating the Citation**: Ensuring the citation block reflects the correct paper title and author list as found in the official GitHub repository and paper.
These changes will make the model card more informative, discoverable, and user-friendly.
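
For context on the `library_name: transformers` entry, a minimal usage sketch follows. It assumes a standard causal-LM setup; the prompt format, dtype/device settings, and generation length are illustrative assumptions, not something specified by the card or this PR.

```python
# Minimal usage sketch, assuming a standard causal-LM setup.
# The model ID comes from this card; the prompt and generation
# settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TMLR-Group-HF/Self-Certainty-Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A simple math-style prompt, since the model was trained on the MATH set.
prompt = "Question: What is 15% of 240?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```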
````diff
@@ -1,19 +1,22 @@
 ---
 license: mit
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 ## TMLR-Group-HF/Self-Certainty-Qwen2.5-7B

-This is the Qwen2.5-7B model trained by Self-Certainty method using MATH training set.
+This is the Qwen2.5-7B model trained by the Self-Certainty method using the MATH training set. It is part of the work presented in the paper [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410).

-
+For more details on the Co-rewarding framework and its implementation, you can find the code and further information on the official GitHub repository: [https://github.com/tmlr-group/Co-rewarding](https://github.com/tmlr-group/Co-rewarding).

 ## Citation

-```
+```bibtex
 @article{zhang2025coreward,
-  title={Co-
-  author={Zizhuo
-  journal={arXiv preprint arXiv:2508.00410}
-  year={2025}
+  title={Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models},
+  author={Zhang, Zizhuo and Zhu, Jianing and Ge, Xinmu and Zhao, Zihua and Zhou, Zhanke and Li, Xuan and Feng, Xiao and Yao, Jiangchao and Han, Bo},
+  journal={arXiv preprint arXiv:2508.00410},
+  year={2025}
 }
 ```
````
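
Once merged, the new metadata can be sanity-checked against the Hub API; a small sketch using `huggingface_hub` (attribute names per that library's `ModelInfo`):

```python
# Sketch: read back the card metadata via the Hub API.
from huggingface_hub import model_info

info = model_info("TMLR-Group-HF/Self-Certainty-Qwen2.5-7B")
print(info.pipeline_tag)  # expected: "text-generation"
print(info.library_name)  # expected: "transformers"
```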