Update README.md

README.md CHANGED
```diff
@@ -19,9 +19,9 @@ library_name: transformers
 
 ## Description:
 
-Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned
+Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that leverages Meta-Llama-3.3-70B-Instruct as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
 
-Given a conversation with multiple turns between user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
+Given a conversation with multiple turns between the user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
 
 For the same prompt, a response with a higher reward score fulfils the user-specified principle to a larger extent than another response with a lower reward score.
 
@@ -29,7 +29,6 @@ As of 24 Sep 2025, this model achieves [JudgeBench](https://huggingface.co/space
 
 See details on how this model was trained at [https://arxiv.org/abs/2509.21319](https://arxiv.org/abs/2509.21319)
 
-This model is ready for commercial/non-commercial use.
 
 ## License/Terms of Use:
```
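The ranking semantics described above (a higher reward score means the response better fulfils the user-specified principle) can be sketched as a best-of-n selection in plain Python. This is a minimal sketch: the `pick_best_response` helper and the placeholder scores are illustrative assumptions standing in for actual reward-model outputs, not part of the model's API.

```python
def pick_best_response(responses, scores):
    """Return the candidate response with the highest reward score.

    `scores` stands in for the reward model's per-response outputs;
    a higher score indicates the response fulfils the user-specified
    principle to a larger extent.
    """
    best_index = max(range(len(responses)), key=lambda i: scores[i])
    return responses[best_index]


# Hypothetical candidates for the same prompt, with placeholder scores
# (not real model output).
responses = [
    "Response A, which ignores the stated principle.",
    "Response B, which follows the stated principle.",
]
scores = [-1.2, 3.4]
print(pick_best_response(responses, scores))
```

In a best-of-n setup, each candidate would be scored independently against the same conversation and principle, and the highest-scoring turn kept.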