Update README.md

README.md CHANGED
```diff
@@ -19,9 +19,9 @@ library_name: transformers
 
 ## Description:
 
-Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned
+Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
 
-Given a conversation with multiple turns between user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
+Given a conversation with multiple turns between the user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
 
 For the same prompt, a response with a higher reward score fulfils the user-specified principle to a larger extent than another response with a lower reward score.
 
@@ -29,7 +29,6 @@ As of 24 Sep 2025, this model achieves [JudgeBench](https://huggingface.co/space
 
 See details on how this model was trained at [https://arxiv.org/abs/2509.21319](https://arxiv.org/abs/2509.21319)
 
-This model is ready for commercial/non-commercial use.
 
 ## License/Terms of Use:
```
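The ranking semantics described above (a higher reward score means the response better fulfils the user-specified principle) can be sketched as a best-of-n selection in plain Python. This is a minimal sketch: the `pick_best_response` helper and the placeholder scores are illustrative assumptions standing in for actual reward-model outputs, not part of the model's API.

```python
def pick_best_response(responses, scores):
    """Return the candidate response with the highest reward score.

    `scores` stands in for the reward model's per-response outputs;
    a higher score indicates the response fulfils the user-specified
    principle to a larger extent.
    """
    best_index = max(range(len(responses)), key=lambda i: scores[i])
    return responses[best_index]


# Hypothetical candidates for the same prompt, with placeholder scores
# (not real model output).
responses = [
    "Response A, which ignores the stated principle.",
    "Response B, which follows the stated principle.",
]
scores = [-1.2, 3.4]
print(pick_best_response(responses, scores))
```

In a best-of-n setup, each candidate would be scored independently against the same conversation and principle, and the highest-scoring turn kept.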