Update README.md
README.md (changed)
@@ -19,7 +19,7 @@ library_name: transformers

## Description:

-
+Qwen3-Nemotron-32B-GenRM-Principle is a large language model that uses Qwen3-32B as its foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfill user-specified principles.

Given a conversation with multiple turns between user and assistant and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
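To make the input/output contract described above concrete, here is a minimal sketch of the kind of data the model consumes, assuming a plain chat-message layout. The conversation contents, the principle wording, and the idea of holding the principle in a separate string are illustrative assumptions, not the model's documented prompt format.

```python
# Illustrative sketch only: the authoritative input format is the model's own chat template.
# The conversation and the principle below are made-up examples.
conversation = [
    {"role": "user", "content": "How do I reset my home router?"},
    {"role": "assistant", "content": "Hold the reset button for about ten seconds, then wait for the lights to stop blinking."},
]

# A user-specified principle that the final assistant turn is judged against.
principle = "Responses should be concise and avoid unnecessary technical jargon."
```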
@@ -41,11 +41,11 @@ Global

## Use Case:

-
+Qwen3-Nemotron-32B-GenRM-Principle assigns a reward score to an LLM-generated response, given the user query and a user-specified principle.

## Release Date:

-HuggingFace 10/27/2025 via https://huggingface.co/nvidia/
+HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle

## References:
@@ -63,7 +63,7 @@ As of 24 Sep 2025, our reward model is the top performing generative reward model

| Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Overall RM-Bench |
|:-----------------------------|:------|:------|:------|:------|:------|:------|:------|:------|
-|**[
+|**[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | **86.2** |
|[Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 73.7 | 91.4 | 75.0 | 90.6 | 91.2 | 85.7 | 71.2 | 82.7 |
|[RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 76.7 | 90.3 | 75.2 | 90.2 | 85.6 | 82.2 | 81.5 | 83.1 |
|[RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 74.2 | 91.8 | 74.1 | 95.4 | 89.5 | 85.4 | 76.7 | 83.9 |
@@ -74,7 +74,7 @@ As of 24 Sep 2025, our reward model is the top performing model on [JudgeBench]

| Model | Knowl. | Reason. | Math | Code | Overall JudgeBench |
|:-----------------------------|:------|:------|:------|:------|:------|
-| **[
+| **[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 74.6 | 85.7 | 85.7 | 90.5 | **81.4** |
| [Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 71.4 | 73.5 | 87.5 | 76.2 | 75.1 |
| [RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 61.0 | 57.1 | 73.2 | 66.7 | 62.6 |
| [RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 56.5 | 66.3 | 85.7 | 73.8 | 66.0 |
@@ -121,7 +121,7 @@ This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv2
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

-model_name = "nvidia/
+model_name = "nvidia/Qwen3-Nemotron-32B-GenRM-Principle"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
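Continuing from the loading snippet in the last hunk, a hedged sketch of how one might query the reward model: render the conversation and principle with the tokenizer's chat template, generate, and read the judgment. How the principle is supplied (here, via a system message) and how the reward score is read out of the generated text are assumptions for this sketch; the worked example on the model card itself is authoritative.

```python
# Sketch only; message layout and principle placement are assumptions.
messages = [
    {"role": "system", "content": "Judge the final assistant turn against this principle: be factually accurate and cite no fabricated sources."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    {"role": "assistant", "content": "Prince Hamlet feigns madness while seeking revenge for his father's murder, and the court of Denmark collapses around him."},
]

# Render the conversation with the model's chat template and tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the judgment; a generative reward model emits text from which the score is parsed.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```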