Commit 6cd7e6e (verified) by Delta-Vector and lbourdois
1 Parent(s): c966ed2

Improve language tag (#1)

- Improve language tag (5bd81f9c0dc8df900c77afaf0e7d64381fbdc14f)


Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1)
  1. README.md +80 -68
README.md CHANGED
@@ -1,69 +1,81 @@
- ---
- license: apache-2.0
- datasets:
- - Quest-AI/quest-270k-chunked-64-judgement
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-7B
- ---
- # Pretrained Reward Model Classifier
-
- ## Overview
-
- This is a specialized binary classifier that evaluates text chunks and predicts whether they would be "Chosen" (A) or "Rejected" (B).
-
- ## How It Works
-
- 1. Text is split into exact 64-token chunks using the Qwen 2.5 tokenizer
- 2. The model evaluates the preference between chunks in a specific format
- 3. Only token IDs [32, 33] have non-zero weights in the LM head (A=Chosen, B=Rejected)
-
- ## Input Format
-
- The model expects input in this precise format:
-
- ```
- [Original text from previous 64-token chunks]
-
- <<JUDGEMENT_REGION>>
- [Next 64-token chunk to evaluate]
- <</JUDGEMENT_REGION>>
-
- <<JUDGEMENT>>
- ```
-
- ## Example
-
- Original paragraph:
- ```
- The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown. Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
- ```
-
- Formatted for prediction:
- ```
- The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown.
-
- <<JUDGEMENT_REGION>>
- Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
- <</JUDGEMENT_REGION>>
-
- <<JUDGEMENT>>
- ```
-
- ## Output
-
- The model predicts whether the chunk in the JUDGEMENT_REGION would be:
- - A: Chosen (preferred content)
- - B: Rejected (less preferred content)
-
- The prediction is based on the relative probabilities between these two tokens only.
-
- ## Analysis
-
- For practical use, results should be aggregated by taking the mean of log ratios between the two probabilities:
- ```
- log_ratio = log(P(A) / P(B))
- ```
-
+ ---
+ license: apache-2.0
+ datasets:
+ - Quest-AI/quest-270k-chunked-64-judgement
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-7B
+ ---
+ # Pretrained Reward Model Classifier
+
+ ## Overview
+
+ This is a specialized binary classifier that evaluates text chunks and predicts whether they would be "Chosen" (A) or "Rejected" (B).
+
+ ## How It Works
+
+ 1. Text is split into exact 64-token chunks using the Qwen 2.5 tokenizer
+ 2. The model evaluates the preference between chunks in a specific format
+ 3. Only token IDs [32, 33] have non-zero weights in the LM head (A=Chosen, B=Rejected)
+
+ ## Input Format
+
+ The model expects input in this precise format:
+
+ ```
+ [Original text from previous 64-token chunks]
+
+ <<JUDGEMENT_REGION>>
+ [Next 64-token chunk to evaluate]
+ <</JUDGEMENT_REGION>>
+
+ <<JUDGEMENT>>
+ ```
+
+ ## Example
+
+ Original paragraph:
+ ```
+ The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown. Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
+ ```
+
+ Formatted for prediction:
+ ```
+ The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown.
+
+ <<JUDGEMENT_REGION>>
+ Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
+ <</JUDGEMENT_REGION>>
+
+ <<JUDGEMENT>>
+ ```
+
+ ## Output
+
+ The model predicts whether the chunk in the JUDGEMENT_REGION would be:
+ - A: Chosen (preferred content)
+ - B: Rejected (less preferred content)
+
+ The prediction is based on the relative probabilities between these two tokens only.
+
+ ## Analysis
+
+ For practical use, results should be aggregated by taking the mean of log ratios between the two probabilities:
+ ```
+ log_ratio = log(P(A) / P(B))
+ ```
+
  This log ratio approach provides a more stable and interpretable signal across multiple evaluations than using raw probabilities alone.
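
Reviewer note: the input format and log-ratio aggregation described in the README could be sketched roughly as below. This is a minimal illustration and not the author's code. The helper names (`split_into_chunks`, `build_prompt`, `mean_log_ratio`) are hypothetical, dropping a tail shorter than 64 tokens is an assumption, and the exact whitespace around the markers is inferred from the README template. Tokenization and the model call are left out; the functions work on token IDs and on probabilities you have already obtained for tokens A and B.

```python
import math
from typing import List, Sequence, Tuple

CHUNK_SIZE = 64  # chunk length in tokens, as stated in the README


def split_into_chunks(token_ids: Sequence[int], size: int = CHUNK_SIZE) -> List[List[int]]:
    """Split token IDs into consecutive full chunks of `size` tokens.

    Assumption: a trailing remainder shorter than `size` is dropped,
    since the README only mentions exact 64-token chunks.
    """
    full = len(token_ids) - len(token_ids) % size
    return [list(token_ids[i:i + size]) for i in range(0, full, size)]


def build_prompt(context: str, candidate: str) -> str:
    """Wrap the candidate chunk in the judgement markers from the README.

    Assumption: blank lines follow the README template and nothing is
    appended after the final <<JUDGEMENT>> marker.
    """
    return (
        f"{context}\n\n"
        f"<<JUDGEMENT_REGION>>\n{candidate}\n<</JUDGEMENT_REGION>>\n\n"
        f"<<JUDGEMENT>>"
    )


def log_ratio(p_a: float, p_b: float) -> float:
    """log(P(A) / P(B)), computed as a difference of logs."""
    return math.log(p_a) - math.log(p_b)


def mean_log_ratio(probs: Sequence[Tuple[float, float]]) -> float:
    """Mean of log ratios over (P(A), P(B)) pairs, one pair per evaluated chunk."""
    ratios = [log_ratio(p_a, p_b) for p_a, p_b in probs]
    return sum(ratios) / len(ratios)
```

A positive mean indicates that, on average, the evaluated chunks lean towards "Chosen" (A); a negative mean leans towards "Rejected" (B). Because the LM head only has non-zero weights for the two tokens, the same log ratio can also be read off directly as the difference between the two logits.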