Update README.md
README.md
CHANGED
@@ -74,7 +74,8 @@ zjunlp/KnowRL-Train-Data` dataset.
 * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
 * **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
 
-For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL
+).
 
 ---
 
@@ -82,7 +83,7 @@ For complete details on the training configuration and hyperparameters, please r
 If you find this model useful in your research, please consider citing our paper:
 ```bibtex
 @article{ren2025knowrl,
-title={
+title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
 author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
 journal={arXiv preprint arXiv:2506.19807},
 year={2025}