zhilinw committed (verified)
Commit 9fd58dd · Parent(s): e28ed04

Update README.md

Files changed (1): README.md (+6 -6)

README.md CHANGED
@@ -19,7 +19,7 @@ library_name: transformers
 
 ## Description:
 
-Qwen-3-Nemotron-32B-GenRM-Principle is a large language model that leverages Qwen3-32B as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
+Qwen3-Nemotron-32B-GenRM-Principle is a large language model that leverages Qwen3-32B as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
 
 Given a conversation with multiple turns between user and assistant and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
 
@@ -41,11 +41,11 @@ Global
 
 ## Use Case:
 
-Qwen-3-Nemotron-32B-GenRM-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.
+Qwen3-Nemotron-32B-GenRM-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.
 
 ## Release Date:
 
-HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle
+HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle
 
 ## References:
 
@@ -63,7 +63,7 @@ As of 24 Sep 2025, our reward model is the top performing generative reward mode
 
 | Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Overall RM-Bench |
 |:-----------------------------|:------|:------|:------|:------|:------|:------|:------|:------|
-| **[Qwen-3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle)** | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | **86.2** |
+| **[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | **86.2** |
 | [Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 73.7 | 91.4 | 75.0 | 90.6 | 91.2 | 85.7 | 71.2 | 82.7 |
 | [RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 76.7 | 90.3 | 75.2 | 90.2 | 85.6 | 82.2 | 81.5 | 83.1 |
 | [RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 74.2 | 91.8 | 74.1 | 95.4 | 89.5 | 85.4 | 76.7 | 83.9 |
@@ -74,7 +74,7 @@ As of 24 Sep 2025, our reward model is the top performing models on [JudgeBench]
 
 | Model | Knowl. | Reason. | Math | Code | Overall JudgeBench |
 |:-----------------------------|:------|:------|:------|:------|:------|
-| **[Qwen-3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle)** | 74.6 | 85.7 | 85.7 | 90.5 | **81.4** |
+| **[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 74.6 | 85.7 | 85.7 | 90.5 | **81.4** |
 | [Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 71.4 | 73.5 | 87.5 | 76.2 | 75.1 |
 | [RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 61.0 | 57.1 | 73.2 | 66.7 | 62.6 |
 | [RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 56.5 | 66.3 | 85.7 | 73.8 | 66.0 |
@@ -121,7 +121,7 @@ This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv2
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "nvidia/Qwen-3-Nemotron-32B-GenRM-Principle"
+model_name = "nvidia/Qwen3-Nemotron-32B-GenRM-Principle"
 
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
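
The diffed hunk cuts off right after the tokenizer load (README line 127), so the scoring step itself is not visible here. For orientation only, below is a minimal sketch of how the renamed checkpoint might be exercised end to end, assuming the standard Transformers chat-template path for a generative reward model. The principle text, message layout, generation settings, and output handling are illustrative assumptions, not the model card's actual recipe, which continues past the truncated hunk.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# As in the diffed snippet: load with the corrected model id.
model_name = "nvidia/Qwen3-Nemotron-32B-GenRM-Principle"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical inputs: a user-specified principle plus a conversation
# whose final assistant turn is to be scored. The real prompt format
# is defined in the full README, not in this diff.
principle = "Responses should be factually accurate and concise."
messages = [
    {"role": "system", "content": f"Principle: {principle}"},
    {"role": "user", "content": "What causes tides?"},
    {"role": "assistant", "content": "Tides are caused mainly by the Moon's gravity."},
]

# Standard chat-template tokenization; generation parameters are guesses.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=512)

# A generative reward model emits its judgment as text; extracting the
# numeric reward from it depends on the model's actual output format.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The sketch only demonstrates the mechanics already implied by the snippet (bfloat16 load, `device_map="auto"`, chat-style input); consult the full model card for the prescribed principle formatting and score parsing.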