Improve language tag (#1)

- Improve language tag (5bd81f9c0dc8df900c77afaf0e7d64381fbdc14f)

Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1)

README.md +80 -68

README.md CHANGED

@@ -1,69 +1,81 @@
----
-license: apache-2.0
-datasets:
-- Quest-AI/quest-270k-chunked-64-judgement
-language:
-- en
-base_model:
-- Qwen/Qwen2.5-7B
----
-# Pretrained Reward Model Classifier
-## Overview
-This is a specialized binary classifier that evaluates text chunks and predicts whether they would be "Chosen" (A) or "Rejected" (B).
-## How It Works
-1. Text is split into exact 64-token chunks using the Qwen 2.5 tokenizer
-2. The model evaluates the preference between chunks in a specific format
-3. Only token IDs [32, 33] have non-zero weights in the LM head (A=Chosen, B=Rejected)
-## Input Format
-The model expects input in this precise format:
-```
-[Original text from previous 64-token chunks]
-<<JUDGEMENT_REGION>>
-[Next 64-token chunk to evaluate]
-<</JUDGEMENT_REGION>>
-<<JUDGEMENT>>
-```
-## Example
-Original paragraph:
-```
-The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown. Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
-```
-Formatted for prediction:
-```
-The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown.
-<<JUDGEMENT_REGION>>
-Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
-<</JUDGEMENT_REGION>>
-<<JUDGEMENT>>
-```
-## Output
-The model predicts whether the chunk in the JUDGEMENT_REGION would be:
-- A: Chosen (preferred content)
-- B: Rejected (less preferred content)
-The prediction is based on the relative probabilities between these two tokens only.
-## Analysis
-For practical use, results should be aggregated by taking the mean of log ratios between the two probabilities:
-```
-log_ratio = log(P(A) / P(B))
-```
 This log ratio approach provides a more stable and interpretable signal across multiple evaluations than using raw probabilities alone.

+---
+license: apache-2.0
+datasets:
+- Quest-AI/quest-270k-chunked-64-judgement
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+base_model:
+- Qwen/Qwen2.5-7B
+---
+# Pretrained Reward Model Classifier
+## Overview
+This is a specialized binary classifier that evaluates text chunks and predicts whether they would be "Chosen" (A) or "Rejected" (B).
+## How It Works
+1. Text is split into exact 64-token chunks using the Qwen 2.5 tokenizer
+2. The model evaluates the preference between chunks in a specific format
+3. Only token IDs [32, 33] have non-zero weights in the LM head (A=Chosen, B=Rejected)
+## Input Format
+The model expects input in this precise format:
+```
+[Original text from previous 64-token chunks]
+<<JUDGEMENT_REGION>>
+[Next 64-token chunk to evaluate]
+<</JUDGEMENT_REGION>>
+<<JUDGEMENT>>
+```
+## Example
+Original paragraph:
+```
+The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown. Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
+```
+Formatted for prediction:
+```
+The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown.
+<<JUDGEMENT_REGION>>
+Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
+<</JUDGEMENT_REGION>>
+<<JUDGEMENT>>
+```
+## Output
+The model predicts whether the chunk in the JUDGEMENT_REGION would be:
+- A: Chosen (preferred content)
+- B: Rejected (less preferred content)
+The prediction is based on the relative probabilities between these two tokens only.
+## Analysis
+For practical use, results should be aggregated by taking the mean of log ratios between the two probabilities:
+```
+log_ratio = log(P(A) / P(B))
+```
 This log ratio approach provides a more stable and interpretable signal across multiple evaluations than using raw probabilities alone.
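
For reviewers, the input format and the log-ratio aggregation described in the README can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the helper names `build_prompt`, `log_ratio`, and `mean_log_ratio` are hypothetical, the newline separators between markers are inferred from the README example, and the probabilities P(A) and P(B) are assumed to have been read from the model's output for token IDs 32 and 33 beforehand.

```python
import math

def build_prompt(context: str, chunk: str) -> str:
    """Assemble the judgement prompt (hypothetical helper).

    Assumption: the markers sit on their own lines, separated by single
    newlines, as in the README example.
    """
    return "\n".join([
        context,
        "<<JUDGEMENT_REGION>>",
        chunk,
        "<</JUDGEMENT_REGION>>",
        "<<JUDGEMENT>>",
    ])

def log_ratio(p_a: float, p_b: float) -> float:
    """Return log(P(A) / P(B)) for one evaluated chunk."""
    return math.log(p_a) - math.log(p_b)

def mean_log_ratio(prob_pairs):
    """Average the log ratios over several (P(A), P(B)) pairs."""
    ratios = [log_ratio(p_a, p_b) for p_a, p_b in prob_pairs]
    if not ratios:
        raise ValueError("need at least one (P(A), P(B)) pair")
    return sum(ratios) / len(ratios)
```

A positive mean indicates that the evaluated chunks lean towards "Chosen" (A), a negative mean towards "Rejected" (B). If raw logits are available instead of probabilities, the same quantity is the difference between the logit for A and the logit for B, since the softmax normalizer cancels in the ratio.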