Update README.md
README.md
CHANGED
|
@@ -96,6 +96,36 @@ python ../scratch/run_qwen.py --model [PATH_TO_YOUR_MODEL] --no_bos --save [SAVE
|
|
| 96 |
```
|
| 97 |
Note the `--no_bos` option here.
|
| 98 |
|
| 99 |
### ALFWorld
|
| 100 |
|
| 101 |
This part requires [ALFWorld](https://github.com/alfworld/alfworld) to be installed.
|
|
@@ -120,4 +150,4 @@ You can use `--split eval_in_distribution` for seen environments.
|
|
| 120 |
year={2024},
|
| 121 |
url={https://api.semanticscholar.org/CorpusID:274965107}
|
| 122 |
}
|
| 123 |
-
```
|
|
|
|
| 96 |
```
|
| 97 |
Note the `--no_bos` option here.
|
| 98 |
|
| 99 |
+
The following script uses the OREO model with vLLM to solve a single math problem:
|
| 100 |
+
```python
|
| 101 |
+
from vllm import LLM, SamplingParams
|
| 102 |
+
from transformers import AutoTokenizer
|
| 103 |
+
|
| 104 |
+
model_path = "/mnt/data/ckpt/pcl/qwen_full_lr5e-6_beta0-03_rew01_actor-loss-dro_kl-reg-unbiased1e-2_plot-weights"
|
| 105 |
+
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
| 106 |
+
llm = LLM(model_path)
|
| 107 |
+
params = SamplingParams(temperature=0, max_tokens=2048)
|
| 108 |
+
|
| 109 |
+
message = [
|
| 110 |
+
{"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
|
| 111 |
+
{
|
| 112 |
+
"role": "user",
|
| 113 |
+
"content": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
|
| 114 |
+
},
|
| 115 |
+
]
|
| 116 |
+
prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
|
| 117 |
+
|
| 118 |
+
result = llm.generate(prompt, params)
|
| 119 |
+
print(result[0].outputs[0].text)
|
| 120 |
+
```
|
| 121 |
+
With greedy decoding (`temperature=0`), the output should look similar to the following:
|
| 122 |
+
```
|
| 123 |
+
First find the total number of eggs Janet has each day: $16$ eggs/day
|
| 124 |
+
Then subtract the number of eggs she eats for breakfast: $16-3=13$ eggs/day
|
| 125 |
+
Then subtract the number of eggs she bakes for her friends: $13-4=9$ eggs/day
|
| 126 |
+
Then multiply the number of eggs she sells by the price per egg to find her daily earnings: $9\cdot2=\boxed{18}$ dollars/day
|
| 127 |
+
```
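
If you want to check the generated text programmatically, a minimal sketch along these lines may help. It assumes the model follows the system prompt and wraps its final answer in `\boxed{}`. The `extract_boxed` helper is hypothetical and not part of this repository; the sample string is copied from the output shown above.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced, e.g. truncated generation


# Last line of the sample output above (hypothetical stand-in for result[0].outputs[0].text).
sample = r"Then multiply the number of eggs she sells by the price per egg to find her daily earnings: $9\cdot2=\boxed{18}$ dollars/day"

answer = extract_boxed(sample)
expected = (16 - 3 - 4) * 2  # eggs laid, minus eaten, minus baked, times price per egg
print(answer, answer is not None and int(answer) == expected)
```

Tracking brace depth instead of using a simple regular expression keeps the helper working for nested answers such as `\boxed{\frac{1}{2}}`.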
|
| 128 |
+
|
| 129 |
### ALFWorld
|
| 130 |
|
| 131 |
This part requires [ALFWorld](https://github.com/alfworld/alfworld) to be installed.
|
|
|
|
| 150 |
year={2024},
|
| 151 |
url={https://api.semanticscholar.org/CorpusID:274965107}
|
| 152 |
}
|
| 153 |
+
```
|