LiangJiang committed on
Commit 732d14d · verified · 1 Parent(s): c6399fe

Update README.md

  1. README.md +14 -0
README.md CHANGED
@@ -49,6 +49,20 @@ For the IMO 2025 test, similar to the previous preview version, we integrated Ri
  <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/oDFIRr9agCUAAAAAR-AAAAgADod9AQFr/original" width="500" />
  </p>
 
+ For the ICPC World Finals 2025, we compared GPT-5-Thinking, Gemini-2.5-Pro, and Ring-1T. In a test that allowed the models three attempts at direct problem-solving, they solved 6 (problems C, D, E, F, K, L), 3 (problems D, F, K), and 5 (problems D, F, J, K, L) problems, respectively. These results show that Ring-1T also performs strongly in top-tier international programming competitions. Further testing is ongoing, and we will open-source the models' solution traces for these competitions (IMO traces are provided at the end of the article). We look forward to working with the community to further develop the reasoning potential of this trillion-parameter thinking model.
+
+ ## Icepop: Ensuring Stable Reinforcement Learning Through Long-Term Training
+
+ In reinforcement learning for MoE models, discrepancies between the operator implementations of the training and inference engines are more pronounced than in dense models. The divergence grows as sequence length and training steps accumulate, particularly during long-sequence generation and extended training runs. As illustrated in the experiment below, the original GRPO algorithm begins to collapse after relatively few training steps. In contrast, our proposed Icepop algorithm mitigates this issue by correcting the distribution with masked bidirectional truncation, which narrows the gap between the training and inference phases and "cools down" the rapidly escalating training-inference discrepancy.
+
+ <p align="center">
+ <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/D1jaRoB7D4kAAAAAT6AAAAgADod9AQFr/original" width="500" />
+ </p>
+
+ <p align="center">
+ <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/9BqKQ7E46j0AAAAATLAAAAgADod9AQFr/original" width="500" />
+ </p>
+
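The masked bidirectional truncation mentioned in the Icepop paragraph can be sketched roughly as follows. This is an illustrative reading only, not the Ring-1T training code: the function names `icepop_mask` and `masked_mean_loss`, the bounds `low` and `high`, and the choice to weight kept tokens by the training/inference probability ratio are all assumptions made for this example.

```python
import math


def icepop_mask(train_logprobs, infer_logprobs, low=0.5, high=2.0):
    """Per-token weights from the training/inference probability ratio.

    A token keeps its ratio as a weight when the ratio lies inside
    [low, high]; tokens outside the band on either side get weight 0.0
    and so contribute nothing to the update.
    `low` and `high` are hypothetical values chosen for illustration.
    """
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    weights = []
    for lp_train, lp_infer in zip(train_logprobs, infer_logprobs):
        # Ratio of the probability assigned by the training engine to the
        # probability assigned by the inference engine for the same token.
        ratio = math.exp(lp_train - lp_infer)
        weights.append(ratio if low <= ratio <= high else 0.0)
    return weights


def masked_mean_loss(per_token_loss, weights):
    """Average the weighted per-token loss over the tokens that were kept."""
    kept = sum(1 for w in weights if w > 0.0)
    if kept == 0:
        return 0.0  # every token was masked out; nothing to learn from
    return sum(l * w for l, w in zip(per_token_loss, weights)) / kept
```

In this sketch a token whose two engines agree (ratio near 1) passes through almost unchanged, while a token whose ratio falls below `low` or above `high` is dropped entirely rather than clipped, which is the sense in which the truncation is both masked and two-sided.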
  ## Model Downloads
 
  You can download Ring-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope to speed up the download process.