zhilinw committed (verified)
Commit 9fd58dd · Parent(s): e28ed04

Update README.md

Files changed (1): README.md (+6 -6)

README.md CHANGED
@@ -19,7 +19,7 @@ library_name: transformers
 
 ## Description:
 
-Qwen-3-Nemotron-32B-GenRM-Principle is a large language model that leverages Qwen3-32B as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
+Qwen3-Nemotron-32B-GenRM-Principle is a large language model that leverages Qwen3-32B as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
 
 Given a conversation with multiple turns between user and assistant and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
 
@@ -41,11 +41,11 @@ Global
 
 ## Use Case:
 
-Qwen-3-Nemotron-32B-GenRM-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.
+Qwen3-Nemotron-32B-GenRM-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.
 
 ## Release Date:
 
-HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle
+HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle
 
 ## References:
 
@@ -63,7 +63,7 @@ As of 24 Sep 2025, our reward model is the top performing generative reward mode
 
 | Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Overall RM-Bench |
 |:-----------------------------|:------|:------|:------|:------|:------|:------|:------|:------|
-| **[Qwen-3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle)** | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | **86.2** |
+| **[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | **86.2** |
 | [Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 73.7 | 91.4 | 75.0 | 90.6 | 91.2 | 85.7 | 71.2 | 82.7 |
 | [RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 76.7 | 90.3 | 75.2 | 90.2 | 85.6 | 82.2 | 81.5 | 83.1 |
 | [RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 74.2 | 91.8 | 74.1 | 95.4 | 89.5 | 85.4 | 76.7 | 83.9 |
@@ -74,7 +74,7 @@ As of 24 Sep 2025, our reward model is the top performing models on [JudgeBench]
 
 | Model | Knowl. | Reason. | Math | Code | Overall JudgeBench |
 |:-----------------------------|:------|:------|:------|:------|:------|
-| **[Qwen-3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen-3-Nemotron-32B-GenRM-Principle)** | 74.6 | 85.7 | 85.7 | 90.5 | **81.4** |
+| **[Qwen3-Nemotron-32B-GenRM-Principle](https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle)** | 74.6 | 85.7 | 85.7 | 90.5 | **81.4** |
 | [Llama-3_3-Nemotron-Super-49B-GenRM](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-GenRM) | 71.4 | 73.5 | 87.5 | 76.2 | 75.1 |
 | [RewardAnything-8B-v1](https://huggingface.co/WisdomShell/RewardAnything-8B-v1) | 61.0 | 57.1 | 73.2 | 66.7 | 62.6 |
 | [RM-R1-DeepSeek-Distilled-Qwen-32B](https://huggingface.co/gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B) | 56.5 | 66.3 | 85.7 | 73.8 | 66.0 |
@@ -121,7 +121,7 @@ This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv2
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "nvidia/Qwen-3-Nemotron-32B-GenRM-Principle"
+model_name = "nvidia/Qwen3-Nemotron-32B-GenRM-Principle"
 
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
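
The diffed hunk cuts off right after the tokenizer load (README line 127), so the scoring step itself is not visible here. For orientation only, below is a minimal sketch of how the renamed checkpoint might be exercised end to end, assuming the standard Transformers chat-template path for a generative reward model. The principle text, message layout, generation settings, and output handling are illustrative assumptions, not the model card's actual recipe, which continues past the truncated hunk.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# As in the diffed snippet: load with the corrected model id.
model_name = "nvidia/Qwen3-Nemotron-32B-GenRM-Principle"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical inputs: a user-specified principle plus a conversation
# whose final assistant turn is to be scored. The real prompt format
# is defined in the full README, not in this diff.
principle = "Responses should be factually accurate and concise."
messages = [
    {"role": "system", "content": f"Principle: {principle}"},
    {"role": "user", "content": "What causes tides?"},
    {"role": "assistant", "content": "Tides are caused mainly by the Moon's gravity."},
]

# Standard chat-template tokenization; generation parameters are guesses.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=512)

# A generative reward model emits its judgment as text; extracting the
# numeric reward from it depends on the model's actual output format.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The sketch only demonstrates the mechanics already implied by the snippet (bfloat16 load, `device_map="auto"`, chat-style input); consult the full model card for the prescribed principle formatting and score parsing.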