---
license: apache-2.0
language:
- zh
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- causal-lm
- bilingual
- chinese
- chat
- conversational
- llm
- pytorch
- rope
- gqa
metrics:
- accuracy
model-index:
- name: Kirim-V1-Base
  results: []
widget:
- text: 你好,请介绍一下自己
  example_title: Chinese Greeting
- text: Write a Python function to sort a list
  example_title: Code Generation
- text: 解释一下量子计算的基本原理
  example_title: Technical Explanation
---

# Kirim-V1-Base

<div align="center">

<img src="https://imgur.com/n5uQJXF.png" alt="Kirim-V1-Base Logo" width="70%"/>

</div>

---

<div align="center">

**A high-performance bilingual language model optimized for Chinese understanding with an English interface**

[中文文档](README_CN.md) | [Model Card](MODEL_CARD.md)

</div>

## Introduction

This release of Kirim-V1-Base is designed to deliver exceptional Chinese language understanding while maintaining strong English capabilities. The model addresses several key areas based on community feedback:

* **Language consistency**: Significantly reduced instances of mixed Chinese-English responses and eliminated abnormal character generation;
* **Reasoning capabilities**: Enhanced logical reasoning and step-by-step problem solving in both languages;
* **Code generation**: Improved code quality, with comments generated in the user's preferred language;
* **Context retention**: Better long-context understanding up to 32K tokens with optimized attention mechanisms.

The model employs an efficient architecture with Grouped Query Attention (GQA) and YaRN RoPE scaling, maintaining strong performance while keeping inference computationally efficient. Kirim-V1-Base excels at natural conversation, technical discussion, creative writing, and code generation in both Chinese and English.
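
For readers unfamiliar with GQA, the toy sketch below illustrates the core idea using the head counts from the architecture table further down (32 query heads sharing 8 key/value heads). It is an illustrative example, not the model's actual attention code:

```python
import torch

# Grouped Query Attention (GQA), illustrated: 32 query heads share
# 8 key/value heads, shrinking the KV cache 4x versus full multi-head.
batch, seq_len, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 32, 8
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Broadcast each KV head across its group of query heads
k = k.repeat_interleave(group, dim=1)  # (1, 32, 16, 128)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = torch.softmax(scores, dim=-1) @ v  # (1, 32, 16, 128)
```

Only the 8 KV heads need to be cached during generation, which is a large part of how 32K-token contexts stay affordable in memory.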

---

## How to Run Locally

### Installation

First, install the required dependencies:

```bash
pip install -r requirements.txt
```

Or install manually (quoting the version specifiers so the shell does not treat `>=` as a redirection):

```bash
pip install "torch>=2.0.0" "transformers>=4.36.0" accelerate sentencepiece
```
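
If installation succeeds but loading fails, it is worth confirming the minimum versions are actually met:

```python
import torch
import transformers

# This model card assumes torch >= 2.0.0 and transformers >= 4.36.0
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```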

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    trust_remote_code=True
)

# Prepare conversation
messages = [
    {"role": "system", "content": "You are Kirim, a helpful AI assistant proficient in both Chinese and English."},
    {"role": "user", "content": "介绍一下深度学习的基本原理"}  # "Explain the basic principles of deep learning"
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
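
For interactive use, the same generation call can stream tokens to stdout as they are produced. A minimal variant reusing `model`, `tokenizer`, and `inputs` from the Quick Start:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout instead of waiting for the full completion
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer
)
```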

### Using the Inference Script

We provide a convenient inference script for easy interaction:

```bash
# Interactive chat mode
python inference.py --model_path Kirim-ai/Kirim-V1-base --chat

# Single prompt generation
python inference.py --prompt "Explain quantum computing in simple terms"

# With 4-bit quantization (requires 12GB+ VRAM)
python inference.py --load_in_4bit --chat

# With 8-bit quantization (requires 16GB+ VRAM)
python inference.py --load_in_8bit --chat
```

### Deployment Options

**Full Precision (BF16)**
- Memory Required: ~24GB VRAM
- Best quality and performance

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

**8-bit Quantization**
- Memory Required: ~16GB VRAM
- Minimal quality loss

```python
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    load_in_8bit=True,
    device_map="auto"
)
```

**4-bit Quantization**
- Memory Required: ~12GB VRAM
- Good for consumer GPUs

```python
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    load_in_4bit=True,
    device_map="auto"
)
```
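
The `load_in_8bit`/`load_in_4bit` shortcuts require the bitsandbytes package, and recent transformers releases route them through an explicit quantization config instead. An equivalent 4-bit setup with `BitsAndBytesConfig` looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config (requires bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    quantization_config=bnb_config,
    device_map="auto"
)
```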

### Chat Template

The model uses the following chat template format:

```
<|begin_of_text|><|system|>
{system_message}
<|user|>
{user_message}
<|assistant|>
{assistant_response}
```
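
You can check that the tokenizer reproduces this layout by rendering a conversation without tokenizing (reusing the `tokenizer` from the Quick Start):

```python
messages = [
    {"role": "system", "content": "You are Kirim, a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]

# Render the chat template as plain text to inspect the special tokens
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```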

You can customize the system prompt to adjust the model's behavior:

```python
messages = [
    # System prompt: "You are a professional Python programming assistant; please answer in Chinese."
    {"role": "system", "content": "你是一个专业的Python编程助手,请用中文回答问题。"},
    # User prompt: "How can I optimize this code?"
    {"role": "user", "content": "如何优化这段代码?"}
]
```

---

## Model Architecture

| Parameter | Value |
|-----------|-------|
| Model Type | Causal Language Model |
| Architecture | Decoder-only Transformer |
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| KV Heads | 8 (Grouped Query Attention) |
| Vocabulary Size | 102,400 |
| Context Length | 32,768 tokens |
| Activation Function | SiLU |
| Position Encoding | RoPE with YaRN scaling (factor: 2.0) |
| Normalization | RMSNorm (eps: 1e-6) |
| Precision | BFloat16 |
| Total Parameters | ~13B |
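
These values can be cross-checked against the published configuration. A quick sketch, assuming the model's config class uses the common Llama-style attribute names:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Kirim-ai/Kirim-V1-base", trust_remote_code=True)

# Attribute names below assume the Llama-style convention
print(config.hidden_size)              # expected: 4096
print(config.num_hidden_layers)        # expected: 32
print(config.num_attention_heads)      # expected: 32
print(config.num_key_value_heads)      # expected: 8 (GQA)
print(config.max_position_embeddings)  # expected: 32768
print(config.rope_scaling)             # expected: YaRN scaling, factor 2.0
```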

---

## Features & Capabilities

### Bilingual Proficiency
- Native-level Chinese understanding and generation
- Fluent English communication
- Seamless code-switching when appropriate
- Cultural context awareness

### Code Generation
- Multi-language code generation (Python, JavaScript, Java, C++, etc.)
- Code explanation and debugging
- Algorithm implementation
- Best-practice and optimization suggestions

### Reasoning & Analysis
- Step-by-step problem solving
- Mathematical reasoning
- Logical deduction
- Critical thinking and analysis

### Creative Writing
- Story generation
- Poetry and creative content
- Content summarization
- Style adaptation

### Technical Knowledge
- Programming and software development
- Mathematics and science
- Technology and engineering
- Business and finance

---

## Limitations

- **No vision capabilities**: This model processes text only and cannot interpret images, diagrams, or visual content
- **Knowledge cutoff**: Training data extends up to October 2025
- **Potential hallucinations**: May occasionally generate plausible-sounding but incorrect information
- **Bias**: May reflect biases present in the training data
- **Arithmetic**: May struggle with complex calculations without step-by-step reasoning

---

## License

This model is released under the **Apache License 2.0**. See [LICENSE](LICENSE) for full details.

You are free to:
- Use the model commercially
- Modify and distribute the model
- Use the model for research

Under the following conditions:
- Provide attribution
- Include the license
- State any changes made

---

## Citation

If you use Kirim-V1-Base in your research or applications, please cite:

```bibtex
@misc{kirim2025v1base,
  title={Kirim-V1-Base: A High-Performance Bilingual Language Model},
  author={Kirim AI Team},
  year={2025},
  url={https://huggingface.co/Kirim-ai/Kirim-V1}
}
```