Add use cases in README
README.md
CHANGED
@@ -63,4 +63,52 @@ Detailed information including technical report will be released later.

|---|---|---|---|---|---|---|---|---|
||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|
|Average|67.08|50.95|54.97|77.82|54.66|79.55|54.78|78.66|
|Improvement||+31.65%|+22.02%|-13.80%|+22.72%|-15.68%|+22.45%|-14.73%|
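
The Improvement row appears to be the relative change of the first column's average (67.08) against each of the other columns, e.g. (67.08 − 50.95) / 50.95 ≈ +31.7%. This reading is an inference, since the header row naming the models sits above this hunk. A quick sketch to reproduce it:

```python
# Assumed reading of the Improvement row: relative change of the first
# column's average against each remaining column. (Interpretation inferred;
# the model-name header row is outside this diff hunk.)
ours = 67.08
baselines = [50.95, 54.97, 77.82, 54.66, 79.55, 54.78, 78.66]
print([f"{(ours - b) / b:+.2%}" for b in baselines])
# ['+31.66%', '+22.03%', '-13.80%', '+22.72%', '-15.68%', '+22.45%', '-14.72%']
# Matches the table to within 0.01 pp; the table seems to truncate rather
# than round, or was computed from unrounded averages.
```
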
## How to use in transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model; the repo ships custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,  # currently supports bf16 only, for efficiency
).cuda()

tokenizer = AutoTokenizer.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code=True,
)

# Build the chat prompt; enable_thinking toggles the reasoning trace (see outputs below).
query = "What is the capital city of South Korea?"
input_ids = tokenizer.apply_chat_template(
    [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': query},
    ],
    add_generation_prompt=True,
    enable_thinking=False,  # or True
    return_tensors='pt',
).cuda()

# Generate, then decode only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=1024, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=False)
print(output)
```
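
Note that `skip_special_tokens=False` is presumably what keeps the `<|endofturn|><|endoftext|>` markers visible in the samples below; for user-facing text you would normally pass `True`.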

### Outputs

```
# With enable_thinking=True, the model is FORCED to think.
Okay, the user is asking for the capital city of South Korea. Let me think. I know that South Korea's capital is Seoul. But wait, I should double-check to make sure I'm not mixing it up with other countries. For example, North Korea's capital is Pyongyang. So yes, South Korea's capital is definitely Seoul. I should just provide that as the answer.
</think>
The capital city of South Korea is **Seoul**.
<|endofturn|><|endoftext|>

# With enable_thinking=False, the model decides for itself whether to think; in this example, thinking is not worth it.
The capital city of South Korea is Seoul.
<|endofturn|><|endoftext|>
```
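
If you need the final answer separated from the reasoning trace, here is a minimal post-processing sketch. It assumes the trace is always terminated by a literal `</think>` and the turn by `<|endofturn|>`, as in the samples above; `split_thinking` is a hypothetical helper, not part of the repo:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a decoded completion into (thinking_trace, final_answer)."""
    thinking, sep, answer = text.partition("</think>")
    if not sep:               # no trace emitted, e.g. with enable_thinking=False
        thinking, answer = "", text
    for tok in ("<|endofturn|>", "<|endoftext|>"):
        answer = answer.replace(tok, "")  # drop end-of-turn markers
    return thinking.strip(), answer.strip()

trace, answer = split_thinking(output)  # `output` from the snippet above
print(answer)  # The capital city of South Korea is **Seoul**.
```
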
## How to use in vllm
TBD
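
Until official instructions land, here is an untested sketch of what offline inference might look like, assuming vLLM can load the repo's custom code via `trust_remote_code` (whether vLLM supports this architecture is an open question):

```python
# Untested sketch -- official vLLM support/instructions are still TBD.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code=True,   # assumes vLLM can load the repo's custom code
    dtype="bfloat16",         # matches the bf16-only note above
)

outputs = llm.chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of South Korea?"},
    ],
    SamplingParams(max_tokens=1024),
)
print(outputs[0].outputs[0].text)
```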