---
datasets:
- yentinglin/zh_TW_c4
- yentinglin/traditional_chinese_instructions
inference: false
license: llama2
language:
- zh
model_creator: Yen-Ting Lin
model_link: https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0
model_name: Language Models for Taiwanese Culture 1.0
model_type: llama
quantized_by: weiren119
---
# Taiwan-LLaMa-v1.0-GPTQ
- Model creator: [Yen-Ting Lin](https://huggingface.co/yentinglin)
- Original model: [Language Models for Taiwanese Culture v1.0](https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0)
## Description
This repo contains GPTQ format model files for [Yen-Ting Lin's Language Models for Taiwanese Culture v1.0](https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0).
## Intro
- The 4-bit GPTQ model was converted from [Taiwan-LLaMa-v1.0 13B](https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0) with the [auto-gptq](https://github.com/PanQiWei/AutoGPTQ) package; a sketch of such a conversion is shown below
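For reference, a conversion along these lines follows the standard AutoGPTQ quantization flow. This is a minimal sketch, not the exact procedure used for this repo; the calibration text, group size, and output directory are illustrative assumptions.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "yentinglin/Taiwan-LLaMa-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# 4-bit weights; group_size=128 is a common default, assumed here.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Calibration samples (placeholders; the real calibration set is not published here).
examples = [tokenizer("台灣的夜市文化以多樣的小吃聞名。", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("Taiwan-LLaMa-v1.0-4bits-GPTQ", use_safetensors=True)
```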
## How to use the GPTQ model in Python
- Install the GPTQ package: `pip install auto-gptq`
- Example code:
```python
from transformers import AutoTokenizer, TextStreamer, TextIteratorStreamer
from auto_gptq import AutoGPTQForCausalLM


class TaiwanLLaMaGPTQ:
    def __init__(self, model_dir):
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
        self.model = AutoGPTQForCausalLM.from_quantized(model_dir,
                                                        trust_remote_code=True,
                                                        use_safetensors=True,
                                                        device_map="auto",
                                                        use_triton=False,
                                                        strict=False)
        self.chat_history = []
        self.system_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
        self.streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
        self.thread_streamer = TextIteratorStreamer(self.tokenizer, skip_special_tokens=True)

    def get_prompt(self, message: str, chat_history: list[tuple[str, str]]) -> str:
        # Llama-2 chat format: system prompt in <<SYS>> tags, each turn wrapped in [INST] ... [/INST].
        texts = [f'[INST] <<SYS>>\n{self.system_prompt}\n<</SYS>>\n\n']
        for user_input, response in chat_history:
            texts.append(f'{user_input.strip()} [/INST] {response.strip()} </s><s> [INST] ')
        texts.append(f'{message.strip()} [/INST]')
        return ''.join(texts)

    def generate(self, message: str):
        prompt = self.get_prompt(message, self.chat_history)
        tokens = self.tokenizer(prompt, return_tensors='pt').input_ids
        generate_ids = self.model.generate(input_ids=tokens.cuda(), max_new_tokens=4096, streamer=self.streamer)
        output = self.tokenizer.decode(generate_ids[0, len(tokens[0]):-1]).strip()
        self.chat_history.append([message, output])
        return output

    def thread_generate(self, message: str):
        from threading import Thread
        prompt = self.get_prompt(message, self.chat_history)
        inputs = self.tokenizer(prompt, return_tensors="pt")

        generation_kwargs = dict(
            inputs=inputs.input_ids.cuda(),
            attention_mask=inputs.attention_mask,
            temperature=0.1,
            max_new_tokens=1024,
            streamer=self.thread_streamer,
        )

        # Run generation on a separate thread to enable response streaming.
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()
        for new_text in self.thread_streamer:
            yield new_text

        thread.join()


inferencer = TaiwanLLaMaGPTQ("weiren119/Taiwan-LLaMa-v1.0-4bits-GPTQ")

while True:
    s = input("User: ")
    if s != '':
        print('Answer:')
        print(inferencer.generate(s))
        print('-' * 80)
```
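The loop above uses the blocking `generate` method; `thread_generate` instead yields tokens as they are produced. A short usage sketch with the same `inferencer` object (the prompt text is only an example):

```python
# Stream the answer incrementally instead of waiting for the full reply.
for chunk in inferencer.thread_generate("請介紹台灣的夜市文化"):
    print(chunk, end='', flush=True)
print()
```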
# Original model card: Yen-Ting Lin's Language Models for Taiwanese Culture v1.0
# Language Models for Taiwanese Culture
✍️ Online Demo • 🤗 HF Repo • 🐦 Twitter • 📃 [Paper Coming Soon] • 👨️ Yen-Ting Lin
Scores are calculated with ChatGPT as the baseline (100%); the other values show each model's performance relative to ChatGPT.
| Language Model | Relative Score (%) |
|-------------------------------------|--------------------|
| GPT-4 | 102.59% |
| ChatGPT | 100.00% |
| **Taiwan-LLaMa v1.0** | 76.76% |
| Claude-Instant-1.2 | 74.04% |
| Llama2_Traditional_Chinese_13b_Chat | 56.21% |
## How to deploy the model on my own machine?
We recommend hosting models with [🤗 Text Generation Inference](https://github.com/huggingface/text-generation-inference). Please see their [license](https://github.com/huggingface/text-generation-inference/blob/main/LICENSE) for details on usage and limitations.
```bash
bash run_text_generation_inference.sh "yentinglin/Taiwan-LLaMa" NUM_GPUS DIR_TO_SAVE_MODEL PORT MAX_INPUT_LEN MODEL_MAX_LEN
```
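Once the TGI server is running, you can query its REST endpoint. A minimal sketch, assuming the server listens on `localhost` at the `PORT` chosen above (8080 here):

```python
import requests

# Vicuna-v1.1 style prompt (see the template below).
prompt = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions. "
          "USER: 你好,請自我介紹。 ASSISTANT:")

resp = requests.post(
    "http://localhost:8080/generate",  # port is an assumption
    json={"inputs": prompt, "parameters": {"max_new_tokens": 256, "temperature": 0.7}},
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```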
Prompt format follows vicuna-v1.1 template:
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user} ASSISTANT:
```
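For multi-turn chats, prior exchanges are appended as alternating `USER:`/`ASSISTANT:` turns before the new question. A minimal helper sketch (the `</s>` turn separator follows the vicuna-v1.1 convention; the function name is ours):

```python
SYSTEM = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions.")

def build_vicuna_prompt(message: str, history: list[tuple[str, str]]) -> str:
    # Each finished assistant turn ends with '</s>' per vicuna-v1.1.
    parts = [SYSTEM]
    for user, assistant in history:
        parts.append(f" USER: {user} ASSISTANT: {assistant}</s>")
    parts.append(f" USER: {message} ASSISTANT:")
    return ''.join(parts)
```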
## Setup development environment
```bash
conda create -n taiwan-llama python=3.10 -y
conda activate taiwan-llama
pip install -r requirements.txt
```
## Citations
If you use our code, data, or models in your research, please cite this repository. You can use the following BibTeX entry:
```bibtex
@inproceedings{lin-chen-2023-llm,
title = "{LLM}-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models",
author = "Lin, Yen-Ting and Chen, Yun-Nung",
booktitle = "Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.nlp4convai-1.5",
pages = "47--58"
}
@misc{taiwanllama,
author={Lin, Yen-Ting and Chen, Yun-Nung},
title={Taiwanese-Aligned Language Models based on Meta-Llama2},
year={2023},
url={https://github.com/adamlin120/Taiwan-LLaMa},
note={Code and models available at https://github.com/adamlin120/Taiwan-LLaMa},
}
```
## Collaborate With Us
If you are interested in contributing to the development of Traditional Chinese language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.
## License
The code in this project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
The models included in this project are licensed under the LLAMA 2 Community License. See the [LLAMA2 License](https://github.com/facebookresearch/llama/blob/main/LICENSE) for full details.
## OpenAI Data Acknowledgment
The data included in this project were generated using OpenAI's models and are subject to OpenAI's Terms of Use. Please review [OpenAI's Terms of Use](https://openai.com/policies/terms-of-use) for details on usage and limitations.
## Acknowledgements
We thank [Meta LLaMA team](https://github.com/facebookresearch/llama) and [Vicuna team](https://github.com/lm-sys/FastChat) for their open-source efforts in democratizing large language models.