### Chat Completions Usage Example

1. Using the Chat Completions endpoint

- Basic serving script (same as for completions):

  `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code`

- Sampling parameters such as `top_k`, `temperature`, `top_p`, `repetition_penalty`, and `max_tokens` can be set freely.
- However, the `skip_special_tokens` and `stop` options must be set as shown below so that vLLM recognizes the model's stop signal and ceases generation.
- Request example:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "system", "content": "- The AI language model is named \"CLOVA X\" and was developed by NAVER.\n- Today is Friday, July 18, 2025."},
      {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."}
    ],
    "skip_special_tokens": false,
    "stop": [
      "<|im_end|><|endofturn|>",
      "<|im_end|><|stop|>"
    ],
    "top_k": -1,
    "temperature": 0.5,
    "top_p": 0.6,
    "repetition_penalty": 1.05,
    "max_tokens": 8192
  }'
```
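For readers driving the endpoint from Python rather than curl, the request above can be sketched with only the standard library. This is a minimal sketch, assuming the `vllm serve` command and local URL shown earlier; `build_chat_request` is a hypothetical helper name, not part of any API.

```python
import json

# Hypothetical helper: builds the same chat-completions payload as the curl
# example above. URL and model name assume the `vllm serve` command shown earlier.
VLLM_CHAT_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(messages, max_tokens=8192):
    """Build a payload with the stop settings this model requires."""
    return {
        "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
        "messages": messages,
        # Required so vLLM recognizes the model's stop signal:
        "skip_special_tokens": False,
        "stop": ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
        # Sampling parameters below can be tuned freely:
        "top_k": -1,
        "temperature": 0.5,
        "top_p": 0.6,
        "repetition_penalty": 1.05,
        "max_tokens": max_tokens,
    }

payload = build_chat_request([{"role": "user", "content": "Hello!"}])
body = json.dumps(payload)  # POST this to VLLM_CHAT_URL as application/json
```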
2. Tool call usage example

- Serving script
  - Add `--enable-auto-tool-choice --tool-call-parser hcx` to the existing script:

  `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx`

- Request example
  - If you list the available tools in `tools`, they are applied to the `tool_list` section passed to the model.

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "user", "content": "Could you please tell me the current weather conditions in Boston, MA in Celsius?"}
    ],
    "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
    "max_tokens": 8192,
    "skip_special_tokens": false,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Retrieves the current weather conditions for a specified city and state.",
          "parameters": {
            "type": "dict",
            "required": ["location"],
            "properties": {
              "location": {
                "type": "string",
                "description": "The location for which to get the weather, in the format of \"City, State\" (e.g., \"San Francisco, CA\") if the city has a state, or \"City, Country\" otherwise. Use the short form for the state."
              },
              "unit": {
                "type": "string",
                "description": "The unit of temperature for the weather report.",
                "enum": ["celsius", "fahrenheit"],
                "default": "fahrenheit"
              }
            }
          }
        }
      }
    ]
  }'
```
- Response example
  - Parsed tool calls are returned in the `tool_calls` field.
  - If the model generates a response beyond the tool call, it is returned in the `content` field; otherwise `content` is `null`.

```json
{
  "id": "chatcmpl-b9aad45639464c0ebf71861df13b4eb2",
  "object": "chat.completion",
  "created": 1753358351,
  "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-b9513352e4c64065a315e495b2613753",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": "<|im_end|><|stop|>"
    }
  ],
  "usage": {
    "prompt_tokens": 189,
    "total_tokens": 224,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
```
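A short sketch of consuming such a tool-call response on the client side, assuming the response shape shown above (the OpenAI-compatible schema vLLM returns). Note that `arguments` arrives as a JSON-encoded string, not a dict:

```python
import json

# Sample response trimmed to the fields a tool-dispatch loop needs, matching
# the tool-call response example above.
response = {
    "choices": [{
        "message": {
            "content": None,
            "tool_calls": [{
                "id": "chatcmpl-tool-b9513352e4c64065a315e495b2613753",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}",
                },
            }],
        },
        "finish_reason": "tool_calls",
    }],
}

message = response["choices"][0]["message"]
calls = []
for call in message.get("tool_calls") or []:
    # `arguments` is a JSON-encoded string and must be decoded before use:
    calls.append((call["function"]["name"], json.loads(call["function"]["arguments"])))

name, args = calls[0]
```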
3. Reasoning usage example

- Serving script
  - Add `--enable-reasoning --reasoning-parser hcx` to the existing script:

  `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-reasoning --reasoning-parser hcx`

  - The `--enable-reasoning` option has been deprecated since vLLM v0.9.0. If you are using vLLM v0.9.0 or later, add only `--reasoning-parser hcx` and omit `--enable-reasoning`.
  - The reasoning parser extracts the reasoning content from responses generated in reasoning mode. It does not by itself force the model into reasoning mode, nor does omitting it force non-reasoning operation.

- Request example
  - `"chat_template_kwargs": {"force_reasoning": true}` forces reasoning.
  - `"chat_template_kwargs": {"skip_reasoning": true}` forces non-reasoning.
  - If both are set to `true`, `force_reasoning: true` takes priority.
  - If neither is given, the model decides whether to reason.
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me the prime number closest to 1000."}
    ],
    "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
    "max_tokens": 8192,
    "skip_special_tokens": false,
    "chat_template_kwargs": {"force_reasoning": true}
  }'
```

- Response example
  - The reasoning is returned in the `reasoning_content` field, and the assistant's final response is returned separately in the `content` field.

```json
{
  "id": "chatcmpl-157d282ebaca4333a9f04b1bdfa7eb8b",
  "object": "chat.completion",
  "created": 1753361336,
  "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Okay, so I need to find the prime number closest to 1000. Hmm, let's start by recalling what a prime number is. ... (omitted) ... But in this case, at least for the nearest, 3 versus9, which gives997 as the answer.\n\nTherefore, the correct answer would be997.",
        "content": "\nThe prime number closest to 1000 is **997**. \n\nHere's the reasoning:\n1. **Check if 1000 is prime:** It's even, so not prime.\n2. **Find primes near 1000:**\n - **Below 1000:** Check numbers downward. \n - Upon verifying, 997 is prime (no divisors up to its square root).\n - **Above 1000:** Next prime after 997 is 1009, which is 9 units away from 1000.\n3. **Compare distances:**\n - \\( |1000 - 997| = 3 \\)\n - \\( |1009 - 1000| = 9 \\)\n \nSince 3 < 9, **997** is the closest prime. \n\n\\boxed{997}",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": "<|im_end|><|endofturn|>"
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "total_tokens": 3254,
    "completion_tokens": 3216,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
```
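On the client side, the two fields the reasoning parser produces can be handled separately, for example keeping only the final answer for display while logging the trace. A minimal sketch, assuming the message shape shown above; `split_reply` is a hypothetical helper name:

```python
# Hypothetical helper: separates the reasoning trace (`reasoning_content`)
# from the final answer (`content`), tolerating either field being absent/null.
def split_reply(message):
    """Return (reasoning, answer); either may be empty when absent."""
    return (message.get("reasoning_content") or "", message.get("content") or "")

# Message shaped like the reasoning response example above, abbreviated:
message = {
    "role": "assistant",
    "reasoning_content": "Okay, so I need to find the prime number closest to 1000...",
    "content": "The prime number closest to 1000 is **997**.",
}
reasoning, answer = split_reply(message)
```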
4. Reasoning + tool call usage example

- Serving script
  - To use both the reasoning parser and the tool call parser, combine the reasoning serving script and the tool call serving script:

  `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx --enable-reasoning --reasoning-parser hcx`
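A request against this combined setup can mix the two mechanisms, e.g. tool definitions plus the reasoning switch. This is a sketch under the assumptions of the earlier examples; the user question and the trimmed tool definition are illustrative only:

```python
# Illustrative request body combining the `tools` field from example 2 with
# the `chat_template_kwargs` reasoning switch from example 3.
payload = {
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
        {"role": "user", "content": "What is the current weather in Boston, MA in Celsius?"}
    ],
    "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
    "skip_special_tokens": False,
    "max_tokens": 8192,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Retrieves the current weather for a city.",
            "parameters": {
                "type": "dict",
                "required": ["location"],
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
            },
        },
    }],
    # Force reasoning mode alongside tool calling:
    "chat_template_kwargs": {"force_reasoning": True},
}
```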
## License

The model is licensed under the [HyperCLOVA X SEED Model License Agreement](./LICENSE).