zhichen/Llama3-Chinese

@@ -36,7 +36,7 @@
 **Github:** [https://github.com/seanzhang-zhichen/llama3-chinese](https://github.com/seanzhang-zhichen/llama3-chinese)
-![DEMO](./images/vllm_web_demo.png)
 ## Download Model
@@ -63,7 +63,6 @@ git clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3-8B.git
 ```bash
 git lfs install
 git clone https://www.modelscope.cn/seanzhang/Llama3-Chinese-Lora.git
 ```
 **From HuggingFace**
@@ -96,6 +95,48 @@ git lfs install
 git clone https://huggingface.co/zhichen/Llama3-Chinese
 ```
 ## VLLM WEB DEMO
@@ -131,7 +172,7 @@ If you used Llama3-Chinese in your research, cite it in the following format:
 ```latex
 @misc{Llama3-Chinese,
   title={Llama3-Chinese},
-  author={Zhichen Zhang},
   year={2024},
   howpublished={\url{https://github.com/seanzhang-zhichen/llama3-chinese}},
 }

 **Github:** [https://github.com/seanzhang-zhichen/llama3-chinese](https://github.com/seanzhang-zhichen/llama3-chinese)
+![DEMO](./images/web_demo.png)
 ## Download Model
 ```bash
 git lfs install
 git clone https://www.modelscope.cn/seanzhang/Llama3-Chinese-Lora.git
 ```
 **From HuggingFace**
 git clone https://huggingface.co/zhichen/Llama3-Chinese
 ```
+## Inference
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "zhichen/Llama3-Chinese"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "你好"},  # "Hello"
+]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=2048,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.95,
+)
+# Drop the prompt tokens so that only the newly generated reply is decoded
+response = outputs[0][input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
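
To make the `apply_chat_template` step less opaque, here is a minimal sketch of the prompt string it is expected to build. It assumes this model keeps the standard Llama 3 Instruct chat layout; the template that actually ships with the tokenizer is authoritative and may differ. `render_llama3_prompt` is a hypothetical helper written for illustration, not part of `transformers` or of this repository.

```python
# Hypothetical helper: reproduces the standard Llama 3 Instruct prompt layout by hand.
# Assumption: zhichen/Llama3-Chinese uses this layout; check tokenizer.chat_template to confirm.
def render_llama3_prompt(messages, add_generation_prompt=True):
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then an end-of-turn marker.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

With `add_generation_prompt=True` the string ends in an open assistant header, which is why the generated tokens that follow the prompt can be read as the assistant's reply.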
+## CLI DEMO
+```bash
+python cli_demo.py --model_path zhichen/Llama3-Chinese
+```
+## WEB DEMO
+```bash
+python web_demo.py --model_path zhichen/Llama3-Chinese
+```
 ## VLLM WEB DEMO
 ```latex
 @misc{Llama3-Chinese,
   title={Llama3-Chinese},
+  author={Zhichen Zhang and Xin LU and Long Chen},
   year={2024},
   howpublished={\url{https://github.com/seanzhang-zhichen/llama3-chinese}},
 }

README_CN.md

@@ -37,7 +37,7 @@
 **Github:** [https://github.com/seanzhang-zhichen/llama3-chinese](https://github.com/seanzhang-zhichen/llama3-chinese)
-![DEMO](./images/vllm_web_demo.png)
 ## Download Model
@@ -96,6 +96,47 @@ git clone https://huggingface.co/zhichen/Llama3-Chinese
 ```
 ## vLLM Web Inference
@@ -133,7 +174,7 @@ The Llama3-Chinese project code is licensed under [The Apache License 2.0](./LICENSE)
 ```latex
 @misc{Llama3-Chinese,
   title={Llama3-Chinese},
-  author={Zhichen Zhang},
   year={2024},
   howpublished={\url{https://github.com/seanzhang-zhichen/llama3-chinese}},
 }

 **Github:** [https://github.com/seanzhang-zhichen/llama3-chinese](https://github.com/seanzhang-zhichen/llama3-chinese)
+![DEMO](./images/web_demo.png)
 ## Download Model
 ```
+## Inference
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "zhichen/Llama3-Chinese"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "你好"},  # "Hello"
+]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=2048,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.95,
+)
+# Drop the prompt tokens so that only the newly generated reply is decoded
+response = outputs[0][input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
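
The `generate` call above enables sampling with `temperature=0.7` and `top_p=0.95`. As a rough illustration of what those two settings do to a next-token distribution, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the `transformers` implementation; `sample_top_p` and the toy logits are hypothetical and exist only for this example.

```python
import math
import random


# Hypothetical helper: a simplified picture of temperature + top-p sampling.
def sample_top_p(logits, temperature=0.7, top_p=0.95, rng=random):
    # Temperatures below 1 sharpen the distribution; above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose probabilities sum to at least top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [4.0, 3.0, 1.0, -2.0]  # made-up scores for a four-token vocabulary
rng = random.Random(0)
draws = [sample_top_p(toy_logits, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```

With these toy logits the two most likely tokens already cover more than 95% of the probability mass, so the two low-scoring tokens are never drawn.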
+## CLI Inference
+```bash
+python cli_demo.py --model_path zhichen/Llama3-Chinese
+```
+## Web Inference
+```bash
+python web_demo.py --model_path zhichen/Llama3-Chinese
+```
 ## vLLM Web Inference
 ```latex
 @misc{Llama3-Chinese,
   title={Llama3-Chinese},
+  author={Zhichen Zhang and Xin LU and Long Chen},
   year={2024},
   howpublished={\url{https://github.com/seanzhang-zhichen/llama3-chinese}},
 }