Improve model card: Update pipeline tag, add library name, and link paper (#1)
Commit 92f921c845822e134ed5a982517ca2124ae3607c
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

@@ -1,47 +1,50 @@
 ---
-
-
-- RLinf
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
 language:
 - en
+license: mit
 metrics:
 - accuracy
-
-
-
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- RLinf
+- reinforcement-learning
 model-index:
 - name: RLinf-math-7B
   results:
   - task:
-      type: math
+      type: math
     dataset:
-
-
+      name: AIME24
+      type: aime_2024
     metrics:
-
-
+    - type: accuracy
+      value: 68.328125
   - task:
-      type: math
+      type: math
     dataset:
-
-
+      name: AIME25
+      type: aime_2025
     metrics:
-
-
+    - type: accuracy
+      value: 52.19375
   - task:
-      type: stem
+      type: stem
     dataset:
-
-
+      name: GPQA-diamond
+      type: gpqa_diamond
     metrics:
-
-
+    - type: accuracy
+      value: 48.178124999999994
 ---
 
 <div align="center">
   <img src="logo.svg" alt="RLinf-logo" width="500"/>
 </div>
 
+The model was presented in the paper [RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training](https://huggingface.co/papers/2510.06710).
 
 <div align="center">
 <!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->

@@ -96,10 +99,15 @@ We trained and evaluated two models using RLinf:
 | Model                                    | AIME 24   | AIME 25   | GPQA-diamond | Average   |
 | ---------------------------------------- | --------- | --------- | ------------ | --------- |
 | [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)  | 54.90     | 40.20     | 45.48        | 46.86     |
+
 | [AReaL-boba-RL-7B](https://huggingface.co/inclusionAI/AReaL-boba-RL-7B)                           | 61.66     | 49.38     | 46.93        | 52.66     |
+
 | [Skywork-OR1-7B](https://huggingface.co/Skywork/Skywork-OR1-7B)                           | 66.87     | 52.49     | 44.43        | 54.60     |
+
 | [Polaris-7B-Preview](https://huggingface.co/POLARIS-Project/Polaris-7B-Preview)                    | **68.55** | 51.24     | 43.88        | 54.56     |
+
 | [AceMath-RL-Nemotron-7B](https://huggingface.co/nvidia/AceMath-RL-Nemotron-7B)                   | 67.30     | **55.00** | 45.57        | 55.96     |
+
 | [RLinf-math-7B](https://huggingface.co/RLinf/RLinf-math-7B)                            | 68.33     | 52.19     | **48.18**    | **56.23** |
 
 

@@ -128,4 +136,4 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
 ## License
-This code repository and the model weights are licensed under the MIT License.
+This code repository and the model weights are licensed under the MIT License.
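Two notes on the added metadata. First, the `model-index` accuracy values (68.328125, 52.19375, 48.178124999999994) are the unrounded counterparts of the table entries 68.33, 52.19, and 48.18. Second, with `library_name: transformers` and `pipeline_tag: text-generation` set, the checkpoint is expected to load through the standard transformers text-generation path. The snippet below is a minimal sketch of that usage rather than the card's own example (which is only partially visible in the last hunk): the model id comes from the card, while the dtype, chat-template call, prompt, and generation settings are illustrative assumptions.

```python
# Minimal sketch: load RLinf-math-7B via transformers, consistent with the new
# `library_name: transformers` and `pipeline_tag: text-generation` metadata.
# dtype, device_map, prompt, and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLinf/RLinf-math-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes bf16 inference; device_map needs accelerate
    device_map="auto",
)

# The base model (DeepSeek-R1-Distill-Qwen-7B) ships a chat template, so we assume
# the fine-tuned tokenizer does as well.
messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With the same metadata, `pipeline("text-generation", model="RLinf/RLinf-math-7B")` is an equivalent entry point.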

