okuchaiev committed · verified
Commit ac3e237 · 1 Parent(s): d959c49

Update README.md

Files changed (1): README.md +2 -3
README.md CHANGED
@@ -19,9 +19,9 @@ library_name: transformers
 
 ## Description:
 
-Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned using to predict the extent to which LLM-generated responses fulfils user-specified principles.
+Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
 
-Given a conversation with multiple turns between user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
+Given a conversation with multiple turns between the user and assistant (of up to 4,096 tokens), and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
 
 For the same prompt, a response with higher reward score fulfils the user-specified principle to a larger extent than another response with a lower reward score.
 
@@ -29,7 +29,6 @@ As of 24 Sep 2025, this model achieves [JudgeBench](https://huggingface.co/space
 
 See details on how this model was trained at [https://arxiv.org/abs/2509.21319](https://arxiv.org/abs/2509.21319)
 
-This model is ready for commercial/non-commercial use.
 
 ## License/Terms of Use:
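The updated description defines a purely comparative semantics: for the same prompt and the same user-specified principle, the response with the higher reward score is the one that better fulfils the principle. A minimal sketch of that comparison step, using placeholder scores in place of actual model outputs (the `pick_preferred` helper and the score values are illustrative, not part of the model card):

```python
# Sketch of the comparison semantics described in the README: given several
# candidate responses to the same prompt and principle, prefer the one with
# the highest reward score. In practice the scores would come from
# Llama-3.3-Nemotron-70B-Reward-Principle; the values below are placeholders.

def pick_preferred(scored_responses):
    """Return the response text whose reward score is highest.

    scored_responses: list of (response_text, reward_score) pairs,
    all answering the same prompt under the same principle.
    """
    return max(scored_responses, key=lambda pair: pair[1])[0]


candidates = [
    ("Response A", -1.2),  # placeholder score
    ("Response B", 3.4),   # placeholder score
]
print(pick_preferred(candidates))  # prints the higher-scored response
```

Note that the scores are only meaningful relative to each other for the same prompt and principle; the README does not define an absolute scale.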