---
language:
- ko
- en
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
---

Converted to GGUF using llama.cpp.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-bf16.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    main_gpu=0,
    n_ctx=2048,
)

output = llm(
    "재미있는 이야기 하나 만들어줘. 1000자 이상이어야 해. 시작:",  # Prompt: "Make up a fun story, at least 1,000 characters. Begin:"
    max_tokens=2048,
    echo=True,  # include the prompt in the returned text
)
print(output)
```

Tested on a GeForce RTX 3070; performance was as follows.

```text
bf16, peak VRAM: 4GB
llama_perf_context_print:        load time =     210.50 ms
llama_perf_context_print: prompt eval time =     210.42 ms /    19 tokens (   11.07 ms per token,    90.30 tokens per second)
llama_perf_context_print:        eval time =   17923.17 ms /  2028 runs   (    8.84 ms per token,  113.15 tokens per second)
llama_perf_context_print:       total time =   21307.79 ms /  2047 tokens
```
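As a sanity check, the per-token latency and decode throughput reported by `llama_perf_context_print` can be recomputed from the raw eval timings in the log (a minimal sketch; the constants below are copied from the log output above):

```python
# Recompute decode-stage metrics from the llama_perf log values.
eval_ms = 17923.17  # total eval (decode) time in ms, from the log
eval_runs = 2028    # number of decoded tokens, from the log

ms_per_token = eval_ms / eval_runs                 # ~8.84 ms per token
tokens_per_second = eval_runs / (eval_ms / 1000.0)  # ~113.15 tokens/s

print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")
```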