sorryhyun committed · verified · 1 Parent(s): dccd437
Commit e3208b6

Update README.md

Files changed (1): README.md +37 -30

README.md CHANGED
---
language:
- ko
- en
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
---

Converted to GGUF using llama.cpp.
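
The exact conversion command isn't recorded here, so the following is a minimal sketch under assumptions: llama.cpp cloned locally into `llama.cpp/`, the original Hugging Face checkpoint downloaded into a directory named after the model, and llama.cpp's `convert_hf_to_gguf.py` script with its `--outtype`/`--outfile` flags producing the bf16 file used below.

```python
# Minimal conversion sketch (paths are assumptions; convert_hf_to_gguf.py ships with llama.cpp).
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "HyperCLOVAX-SEED-Text-Instruct-1.5B",  # local HF model directory (assumed)
        "--outtype", "bf16",
        "--outfile", "HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-bf16.gguf",
    ],
    check=True,  # raise if the converter exits non-zero
)
```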

```python
from llama_cpp import Llama

llm = Llama(
    model_path="HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-bf16.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    main_gpu=0,
    n_ctx=2048,
)

output = llm(
    "Write me a fun story. It has to be at least 1000 characters long. Begin:",  # prompt (translated from Korean)
    max_tokens=2048,
    echo=True,  # include the prompt in the returned text
)
print(output)
```
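
`print(output)` dumps the whole OpenAI-style completion dict. To extract just the generated text, or to go through the model's chat template instead of a raw prompt, something like the following works with llama-cpp-python (the message content here is illustrative, not from the original):

```python
# Just the generated text from the completion dict above.
print(output["choices"][0]["text"])

# Chat-style usage: create_chat_completion applies the model's chat template.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write me a fun story."}],  # illustrative prompt
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```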

Tested on a GeForce RTX 3070; performance was as follows.

```text
bf16, peak: 4GB
llama_perf_context_print:        load time =     210.50 ms
llama_perf_context_print: prompt eval time =     210.42 ms /    19 tokens (   11.07 ms per token,    90.30 tokens per second)
llama_perf_context_print:        eval time =   17923.17 ms /  2028 runs   (    8.84 ms per token,   113.15 tokens per second)
llama_perf_context_print:       total time =   21307.79 ms /  2047 tokens
```