Dongfu Jiang committed
Commit · 096ad41
Parent(s): b9ac13b
Update README.md
README.md CHANGED

@@ -84,6 +84,62 @@ print(comparison_results)
**We still recommend using the llm-blender wrapper to use PairRM, as it implements many useful application functions for various scenarios, such as ranking, conversation comparison, and best-of-n sampling.**
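
For orientation, here is a minimal sketch of that wrapper workflow. It assumes the `llm_blender` package from the LLM-Blender repository and its `Blender.loadranker` / `Blender.rank` methods, so treat the exact signatures as assumptions and check the upstream documentation.

```python
# Minimal sketch (assumed llm_blender API, per the LLM-Blender repo):
# rank several candidate responses for each input with PairRM as the ranker.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load PairRM as the ranker checkpoint

inputs = ["Translate 'bonjour' into English."]
candidates = [["Hello.", "Good morning.", "It is French."]]

ranks = blender.rank(inputs, candidates, return_scores=False, batch_size=1)
print(ranks)  # e.g. [[1, 3, 2]] -- per-input candidate ranks (lower is better)
```

The wrapper also provides helpers for conversation comparison and best-of-n sampling.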

You can also easily compare two conversations directly, as shown below:
```python
from typing import List


def tokenize_conv_pair(convAs: List[List[dict]], convBs: List[List[dict]]):
    """Compare two conversations by taking the USER turns as inputs and the ASSISTANT turns as candidates.
    Multi-turn conversation comparison is also supported.

    A conversation has the format:
        [
            {"content": "hello", "role": "USER"},
            {"content": "hi", "role": "ASSISTANT"},
            ...
        ]

    Args:
        convAs (List[List[dict]]): List of conversations
        convBs (List[List[dict]]): List of conversations
    """
    # Check conversation correctness: strictly alternating USER/ASSISTANT turns
    for c in convAs + convBs:
        assert len(c) % 2 == 0, "Each conversation must have an even number of turns"
        assert all([c[i]['role'] == 'USER' for i in range(0, len(c), 2)]), "Each even turn must be USER"
        assert all([c[i]['role'] == 'ASSISTANT' for i in range(1, len(c), 2)]), "Each odd turn must be ASSISTANT"
    # The two sides must be aligned: same number of conversations, same number
    # of turns, and identical USER turns
    assert len(convAs) == len(convBs), "Number of conversations must be the same"
    for c_a, c_b in zip(convAs, convBs):
        assert len(c_a) == len(c_b), "Number of turns in each conversation must be the same"
        assert all([c_a[i]['content'] == c_b[i]['content'] for i in range(0, len(c_a), 2)]), "USER turns must be the same"

    # Flatten each conversation: USER turns form the input with <Response i>
    # placeholders, ASSISTANT turns form the two candidate responses
    instructions = ["Finish the following coversation in each i-th turn by filling in <Response i> with your response."] * len(convAs)
    inputs = [
        "\n".join([
            "USER: " + x[i]['content'] +
            f"\nAssistant: <Response {i//2+1}>" for i in range(0, len(x), 2)
        ]) for x in convAs
    ]
    cand1_texts = [
        "\n".join([
            f"<Response {i//2+1}>: " + x[i]['content'] for i in range(1, len(x), 2)
        ]) for x in convAs
    ]
    cand2_texts = [
        "\n".join([
            f"<Response {i//2+1}>: " + x[i]['content'] for i in range(1, len(x), 2)
        ]) for x in convBs
    ]
    inputs = [inst + inp for inst, inp in zip(instructions, inputs)]
    # `tokenize_pair` is the pairwise tokenization helper defined earlier in this README
    encodings = tokenize_pair(inputs, cand1_texts, cand2_texts)
    return encodings
```
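
As a usage illustration, the hedged sketch below feeds the resulting encodings to the model. It assumes a PairRM model handle named `model` as loaded in the earlier direct-use example (rename to match that example's variable) and the same convention as above, where a positive logit means the first candidate (the ASSISTANT turns of `convAs`) is preferred; the two toy conversations are made up.

```python
# Hypothetical example: identical USER turns, different ASSISTANT turns.
convAs = [[
    {"content": "hello", "role": "USER"},
    {"content": "Hi! How can I help you today?", "role": "ASSISTANT"},
]]
convBs = [[
    {"content": "hello", "role": "USER"},
    {"content": "hi", "role": "ASSISTANT"},
]]

encodings = tokenize_conv_pair(convAs, convBs)
outputs = model(**encodings)  # `model` = the PairRM model loaded earlier (assumed name)
comparison_results = outputs.logits > 0  # True where convAs' responses are preferred
print(comparison_results)
```

If you use the llm-blender wrapper instead, a single call such as `blender.compare_conversations(convAs, convBs)` covers this case (check the wrapper's documentation for the exact method name).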
# Pairwise Reward Model for LLMs (PairRM) from LLM-Blender