PKU-Alignment/beaver-7b-v1.0-cost

@@ -20,7 +20,7 @@ library_name: safe-rlhf
 ## Model Details
-The Beaver Cost model is a preference model trained using the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) dataset.
+The Beaver cost model is a preference model trained using the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) dataset.
 It can play a role in the safe RLHF algorithm, helping the Beaver model become more safe and harmless.
 - **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
@@ -36,16 +36,17 @@ It can play a role in the safe RLHF algorithm, helping the Beaver model become m
 - **Reward Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-reward>
 - **Cost Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-cost>
 - **Dataset Paper:** <https://arxiv.org/abs/2307.04657>
-- **Paper:** *Coming soon...*
+- **Paper:** <https://arxiv.org/abs/2310.12773>
 ## How to Use the Cost Model
 ```python
+import torch
 from transformers import AutoTokenizer
 from safe_rlhf.models import AutoModelForScore
-model = AutoModelForScore.from_pretrained('PKU-Alignment/beaver-7b-v1.0-cost', device_map='auto')
-tokenizer = AutoTokenizer.from_pretrained('PKU-Alignment/beaver-7b-v1.0-cost', use_fast=False)
+model = AutoModelForScore.from_pretrained('PKU-Alignment/beaver-7b-v1.0-cost', torch_dtype=torch.bfloat16, device_map='auto')
+tokenizer = AutoTokenizer.from_pretrained('PKU-Alignment/beaver-7b-v1.0-cost')
 input = 'BEGINNING OF CONVERSATION: USER: hello ASSISTANT:Hello! How can I help you today?'
@@ -54,34 +55,45 @@ output = model(**input_ids)
 print(output)
 # ScoreModelOutput(
-#     scores=tensor([[[-19.6476],
-#         [-20.2238],
-#         [-21.4228],
-#         [-19.2506],
-#         [-20.2728],
-#         [-23.8799],
-#         [-22.6898],
-#         [-21.5825],
-#         [-21.0855],
-#         [-20.2068],
-#         [-23.8296],
-#         [-21.4940],
-#         [-21.9484],
-#         [-13.1220],
-#         [ -6.4499],
-#         [ -8.1982],
-#         [ -7.2492],
-#         [ -9.3377],
-#         [-13.5010],
-#         [-10.4932],
-#         [ -9.7837],
-#         [ -6.4540],
-#         [ -6.0084],
-#         [ -5.8093],
-#         [ -6.6134],
-#         [ -5.8995],
-#         [ -9.1505],
-#         [-11.3254]]], grad_fn=<ToCopyBackward0>),
-#     end_scores=tensor([[-11.3254]], grad_fn=<ToCopyBackward0>)
+#     scores=tensor([[[ -9.4375],
+#          [ -2.5156],
+#          [ -2.6562],
+#          [ -2.3594],
+#          [ -1.9375],
+#          [ -2.5781],
+#          [ -1.4766],
+#          [ -1.9922],
+#          [ -2.6562],
+#          [ -3.8125],
+#          [ -2.9844],
+#          [ -4.1875],
+#          [ -3.5938],
+#          [ -4.6562],
+#          [ -4.0000],
+#          [ -3.3438],
+#          [ -4.5625],
+#          [ -4.8438],
+#          [ -5.1875],
+#          [ -8.0000],
+#          [ -8.4375],
+#          [-10.5000],
+#          [-10.5000],
+#          [ -8.8750],
+#          [-10.1250],
+#          [-10.2500],
+#          [-11.5625],
+#          [-10.7500]]], grad_fn=<ToCopyBackward0>),
+#     end_scores=tensor([[-10.7500]], grad_fn=<ToCopyBackward0>),
+#     last_hidden_state=tensor([[[ 2.2812, -0.4219, -0.2832,  ...,  0.2715,  0.4277,  1.1875],
+#          [-0.3730, -0.2158,  1.2891,  ..., -1.3281,  0.6016,  0.7773],
+#          [ 0.2285, -1.2422,  1.0625,  ..., -1.3438,  1.1875,  1.1016],
+#          ...,
+#          [-0.8828, -2.6250,  0.9180,  ..., -0.2773,  1.7500,  0.7695],
+#          [ 2.0781, -4.1250, -0.1069,  ..., -0.8008,  0.4844,  0.4102],
+#          [ 2.9688, -1.6250,  1.1250,  ...,  0.3223,  0.0439, -2.3281]]],
+#        dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
+#     end_last_hidden_state=tensor([[ 2.9688, -1.6250,  1.1250,  ...,  0.3223,  0.0439, -2.3281]],
+#        dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
+#     end_index=tensor([27])
 # )
-```
+```
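
The model card describes the Beaver cost model as a preference model trained on PKU-SafeRLHF. The sketch below illustrates what "preference model" usually means. It computes a Bradley-Terry style pairwise loss on scalar costs, `-log(sigmoid(C_more_harmful - C_less_harmful))`. This loss is small when the more harmful response in a pair already receives the higher cost.

The function names are hypothetical. As we read the Safe RLHF paper linked in the card, its cost objective also adds a term on the sign of each individual cost. Take the exact objective from that paper and the `safe-rlhf` code, not from this sketch.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_cost_loss(cost_more_harmful: float, cost_less_harmful: float) -> float:
    """Hypothetical Bradley-Terry style loss for one preference pair.

    Computes -log(sigmoid(C_more_harmful - C_less_harmful)). It approaches 0
    when the more harmful response already has the higher cost and grows
    roughly linearly in the gap when the ordering is reversed.
    """
    return -log_sigmoid(cost_more_harmful - cost_less_harmful)


# Correctly ordered pair (harmful response costs more): small loss.
print(round(pairwise_cost_loss(2.0, 0.0), 6))
# Reversed pair: larger loss, so training would push the costs apart.
print(round(pairwise_cost_loss(0.0, 2.0), 6))
```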
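
The usage snippet scores a single string that packs one user turn and one assistant reply into the Beaver conversation format. The helper `build_beaver_input` below is hypothetical and is not part of `safe_rlhf`. Its template is inferred only from that example string. There is a space after `USER:` and after the user message, but none after `ASSISTANT:`. Check the template against the `safe-rlhf` source before scoring other dialogues. Multi-turn conversations are out of scope here.

```python
# Template pieces inferred from the example input in the model card;
# treat them as an assumption, not as the library's canonical constants.
PROMPT_BEGIN = 'BEGINNING OF CONVERSATION: '
PROMPT_USER = 'USER: {input} '
PROMPT_ASSISTANT = 'ASSISTANT:'


def build_beaver_input(user_message: str, assistant_reply: str) -> str:
    """Hypothetical helper: format one user turn plus one reply for the cost model."""
    return (
        PROMPT_BEGIN
        + PROMPT_USER.format(input=user_message)
        + PROMPT_ASSISTANT
        + assistant_reply
    )


text = build_beaver_input('hello', 'Hello! How can I help you today?')
print(text)
```

The resulting string can be passed to the tokenizer in the same way the snippet passes `input`.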
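
In the printed `ScoreModelOutput`, `scores` holds one value per token, while `end_index` and `end_scores` summarize the whole sequence. In the new output, `end_index=tensor([27])` selects the last of the 28 per-token scores, and that value (-10.75) equals `end_scores`. The pure-Python sketch below reproduces that lookup on the printed numbers. It assumes the end index is the last position whose attention mask is 1.

The helper names are hypothetical. The zero threshold in `flag_if_harmful` reflects our reading of the Safe RLHF paper linked in the card, where the cost model is trained so that positive costs indicate harmful responses. Treat it as a convention, not a calibrated probability.

```python
# Per-token scores copied from the bfloat16 example output (28 tokens).
SCORES = [
    -9.4375, -2.5156, -2.6562, -2.3594, -1.9375, -2.5781, -1.4766,
    -1.9922, -2.6562, -3.8125, -2.9844, -4.1875, -3.5938, -4.6562,
    -4.0000, -3.3438, -4.5625, -4.8438, -5.1875, -8.0000, -8.4375,
    -10.5000, -10.5000, -8.8750, -10.1250, -10.2500, -11.5625, -10.7500,
]


def end_index_from_mask(attention_mask: list[int]) -> int:
    """Index of the last real token, i.e. the last position where mask == 1."""
    return max(i for i, m in enumerate(attention_mask) if m == 1)


def end_score_from_scores(scores: list[float], attention_mask: list[int]) -> float:
    """Hypothetical helper: pick the sequence-level score at the end index."""
    return scores[end_index_from_mask(attention_mask)]


def flag_if_harmful(cost: float, threshold: float = 0.0) -> bool:
    """Hypothetical helper: treat a cost above the threshold as harmful (assumed convention)."""
    return cost > threshold


mask = [1] * len(SCORES)  # the example is a single, unpadded sequence
idx = end_index_from_mask(mask)
cost = end_score_from_scores(SCORES, mask)
print(idx, cost, flag_if_harmful(cost))
```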
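
The updated snippet loads the model with `torch_dtype=torch.bfloat16`. The new example output shows correspondingly coarse values such as -2.5156 and -9.4375. bfloat16 keeps the float32 exponent but only 8 significant bits, so the spacing between representable values is 1/64 in [2, 4) and 1/16 in [8, 16).

The standard-library sketch below can help check whether a printed score is consistent with bfloat16 precision. It rounds a value to float32 and then to the nearest bfloat16, using round-half-to-even on the 16 discarded bits. The function name `round_to_bfloat16` is ours, not a PyTorch API. The sketch does not special-case NaN or infinity, and inputs must fit in float32.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to float32, then to the nearest bfloat16 (ties to even).

    bfloat16 is the top 16 bits of an IEEE-754 float32, so rounding only has
    to decide whether to bump those upper bits based on the lower 16.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    lower = bits & 0xFFFF
    upper = bits >> 16
    # Round half to even on the 16 bits being discarded.
    if lower > 0x8000 or (lower == 0x8000 and (upper & 1)):
        upper += 1
    return struct.unpack('<f', struct.pack('<I', (upper << 16) & 0xFFFFFFFF))[0]


for value in (-2.515625, -9.4375, 1 / 3):
    print(value, '->', round_to_bfloat16(value))
```

Printed scores such as -2.5156 are four-decimal renderings of exactly representable bfloat16 values (here -2.515625).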