}'
```

### Chat Completions Usage Example

1. Using the chat completions endpoint

- Basic serving script (the same as for completions):

  `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code`

- Sampling parameters such as `top_k`, `temperature`, `top_p`, `repetition_penalty`, and `max_tokens` can be set freely.
- However, the `skip_special_tokens` and `stop` options must be set as shown below so that vLLM recognizes the model's stop signal and ceases generation.
- Request example
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "system", "content": "- The AI language model is named \"CLOVA X\" and was developed by NAVER.\n- Today is Friday, July 18, 2025."},
      {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."}
    ],
    "skip_special_tokens": false,
    "stop": [
      "<|im_end|><|endofturn|>",
      "<|im_end|><|stop|>"
    ],
    "top_k": -1,
    "temperature": 0.5,
    "top_p": 0.6,
    "repetition_penalty": 1.05,
    "max_tokens": 8192
  }'
```
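
For reference, the same request can be built and sent from Python using only the standard library. This is a minimal sketch, assuming the vLLM server started above is listening on `localhost:8000`; the `chat_completion` helper name is ours, not part of vLLM:

```python
import json
import urllib.request

# The curl request above, expressed as a Python payload. Note
# skip_special_tokens=False and the two stop strings, which vLLM
# needs to detect the model's end-of-turn signal.
payload = {
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
        {"role": "system", "content": '- The AI language model is named "CLOVA X" and was developed by NAVER.\n- Today is Friday, July 18, 2025.'},
        {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."},
    ],
    "skip_special_tokens": False,
    "stop": ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
    "top_k": -1,
    "temperature": 0.5,
    "top_p": 0.6,
    "repetition_penalty": 1.05,
    "max_tokens": 8192,
}

def chat_completion(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to the OpenAI-compatible endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```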

2. Tool call usage example

- Serving script
  - Add `--enable-auto-tool-choice --tool-call-parser hcx` to the basic serving script:

    `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx`

- Request example
  - The available tools you put in `tools` are applied to the `tool_list` part passed to the model.
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "user", "content": "Could you please tell me the current weather conditions in Boston, MA in Celsius?"}
    ],
    "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
    "max_tokens": 8192,
    "skip_special_tokens": false,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Retrieves the current weather conditions for a specified city and state.",
          "parameters": {
            "type": "dict",
            "required": ["location"],
            "properties": {
              "location": {
                "type": "string",
                "description": "The location for which to get the weather, in the format '\''City, State'\'' (e.g., '\''San Francisco, CA'\'') if the city has a state, or '\''City, Country'\'' if it does not. Use the short form for the state."
              },
              "unit": {
                "type": "string",
                "description": "The unit of temperature for the weather report.",
                "enum": ["celsius", "fahrenheit"],
                "default": "fahrenheit"
              }
            }
          }
        }
      }
    ]
  }'
```

- Response example
  - Parsed tool calls are returned in the `tool_calls` field.
  - If the model generates any response besides the tool call, it is returned in the `content` field; otherwise `content` is `null`.
```json
{
  "id": "chatcmpl-b9aad45639464c0ebf71861df13b4eb2",
  "object": "chat.completion",
  "created": 1753358351,
  "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-b9513352e4c64065a315e495b2613753",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": "<|im_end|><|stop|>"
    }
  ],
  "usage": {
    "prompt_tokens": 189,
    "total_tokens": 224,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
```
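
On the client side, the parsed tool call can be executed and its result handed back to the model. A minimal sketch (the `get_current_weather` implementation here is a made-up stand-in, not part of the model or vLLM):

```python
import json

# Hypothetical stand-in tool; a real application would call a weather service.
def get_current_weather(location: str, unit: str = "fahrenheit") -> dict:
    return {"location": location, "temperature": 22, "unit": unit}  # dummy data

# The tool_calls entry as returned by the server (copied from the example above).
tool_call = {
    "id": "chatcmpl-tool-b9513352e4c64065a315e495b2613753",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Boston, MA", "unit": "celsius"}',
    },
}

available_tools = {"get_current_weather": get_current_weather}

fn = available_tools[tool_call["function"]["name"]]
args = json.loads(tool_call["function"]["arguments"])  # "arguments" is a JSON-encoded string
result = fn(**args)

# To continue the conversation, append the result as a "tool" message and
# send the updated messages list back to /v1/chat/completions.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps(result),
}
```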

3. Reasoning usage example

- Serving script
  - Add `--enable-reasoning --reasoning-parser hcx` to the basic serving script:

    `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-reasoning --reasoning-parser hcx`

  - The `--enable-reasoning` option has been deprecated since vLLM v0.9.0. If you are using vLLM v0.9.0 or later, add only `--reasoning-parser hcx`, without `--enable-reasoning`.
  - The reasoning parser extracts the reasoning content from responses generated in reasoning mode. It does not by itself make the model operate in reasoning mode, nor does omitting it force non-reasoning operation.
- Request example
  - `"chat_template_kwargs": {"force_reasoning": true}` forces reasoning.
  - `"chat_template_kwargs": {"skip_reasoning": true}` forces non-reasoning.
  - If both are set to `true`, `force_reasoning: true` takes priority.
  - If neither is given, the model decides whether to reason.
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me the prime number closest to 1000."}
    ],
    "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
    "max_tokens": 8192,
    "skip_special_tokens": false,
    "chat_template_kwargs": {"force_reasoning": true}
  }'
```

- Response example
  - The reasoning part is returned in the `reasoning_content` field, and the assistant's final response is returned separately in the `content` field.
```json
{
  "id": "chatcmpl-157d282ebaca4333a9f04b1bdfa7eb8b",
  "object": "chat.completion",
  "created": 1753361336,
  "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Okay, so I need to find the prime number closest to 1000. Hmm, let's start by recalling what a prime number is. ... (omitted) ... But in this case, at least for the nearest, 3 versus 9, which gives 997 as the answer.\n\nTherefore, the correct answer would be 997.",
        "content": "\nThe prime number closest to 1000 is **997**. \n\nHere's the reasoning:\n1. **Check if 1000 is prime:** It's even, so not prime.\n2. **Find primes near 1000:**\n - **Below 1000:** Check numbers downward. \n - Upon verifying, 997 is prime (no divisors up to its square root).\n - **Above 1000:** Next prime after 997 is 1009, which is 9 units away from 1000.\n3. **Compare distances:**\n - \\( |1000 - 997| = 3 \\)\n - \\( |1009 - 1000| = 9 \\)\n \nSince 3 < 9, **997** is the closest prime. \n\n\\boxed{997}",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": "<|im_end|><|endofturn|>"
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "total_tokens": 3254,
    "completion_tokens": 3216,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
```
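
Client code can then treat the two fields independently, for example hiding the trace and showing only the answer. A minimal sketch over a shortened copy of the response above:

```python
# Separating the reasoning trace from the final answer in a chat.completion
# response (the dict below is an abbreviated copy of the example above).
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": "Okay, so I need to find the prime number closest to 1000. ...",
                "content": "\nThe prime number closest to 1000 is **997**. ...",
                "tool_calls": [],
            },
            "finish_reason": "stop",
        }
    ]
}

message = response["choices"][0]["message"]
thinking = message["reasoning_content"]  # extracted by --reasoning-parser hcx
answer = message["content"]              # the part to show to the user
```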

4. Reasoning + tool call usage example

- Serving script
  - To use both the reasoning parser and the tool call parser, combine the two serving scripts above:

    `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx --enable-reasoning --reasoning-parser hcx`
## License

The model is licensed under the [HyperCLOVA X SEED Model License Agreement](./LICENSE).