---
license: apache-2.0
language:
- zh
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- causal-lm
- bilingual
- chinese
- chat
- conversational
- llm
- pytorch
- rope
- gqa
metrics:
- accuracy
model-index:
- name: Kirim-V1-Base
  results: []
widget:
- text: 你好,请介绍一下自己
  example_title: Chinese Greeting
- text: Write a Python function to sort a list
  example_title: Code Generation
- text: 解释一下量子计算的基本原理
  example_title: Technical Explanation
---

# Kirim-V1-Base

<div align="center">

<img src="https://imgur.com/n5uQJXF.png" alt="Kirim-V1-Base Logo" width="70%"/>

</div>

---

<div align="center">

**A high-performance bilingual language model optimized for Chinese understanding with an English interface**

[中文文档](README_CN.md) | [Model Card](MODEL_CARD.md)

</div>

## Introduction

This release of Kirim-V1-Base is designed to deliver exceptional Chinese language understanding while maintaining strong English capabilities. The model addresses several key areas based on community feedback:

* **Language consistency**: Significantly reduced instances of mixed Chinese-English responses and eliminated abnormal character generation;
* **Reasoning capabilities**: Enhanced logical reasoning and step-by-step problem solving in both languages;
* **Code generation**: Improved code quality, with comments generated in the user's preferred language;
* **Context retention**: Better long-context understanding up to 32K tokens with optimized attention mechanisms.

The model employs an efficient architecture with Grouped Query Attention (GQA) and YaRN RoPE scaling, maintaining strong performance while keeping inference computationally efficient. Kirim-V1-Base excels at natural conversation, technical discussion, creative writing, and code generation in both Chinese and English.
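
For readers unfamiliar with GQA, the toy sketch below illustrates the core idea using the head counts from the architecture table further down (32 query heads sharing 8 key/value heads). It is an illustrative example, not the model's actual attention code:

```python
import torch

# Grouped Query Attention (GQA), illustrated: 32 query heads share
# 8 key/value heads, shrinking the KV cache 4x versus full multi-head.
batch, seq_len, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 32, 8
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Broadcast each KV head across its group of query heads
k = k.repeat_interleave(group, dim=1)  # (1, 32, 16, 128)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = torch.softmax(scores, dim=-1) @ v  # (1, 32, 16, 128)
```

Only the 8 KV heads need to be cached during generation, which is a large part of how 32K-token contexts stay affordable in memory.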

---

## How to Run Locally

### Installation

First, install the required dependencies:

```bash
pip install -r requirements.txt
```

Or install manually (quoting the version specifiers so the shell does not treat `>=` as a redirection):

```bash
pip install "torch>=2.0.0" "transformers>=4.36.0" accelerate sentencepiece
```
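
If installation succeeds but loading fails, it is worth confirming the minimum versions are actually met:

```python
import torch
import transformers

# This model card assumes torch >= 2.0.0 and transformers >= 4.36.0
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```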

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    trust_remote_code=True
)

# Prepare conversation
messages = [
    {"role": "system", "content": "You are Kirim, a helpful AI assistant proficient in both Chinese and English."},
    {"role": "user", "content": "介绍一下深度学习的基本原理"}  # "Explain the basic principles of deep learning"
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
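
For interactive use, the same generation call can stream tokens to stdout as they are produced. A minimal variant reusing `model`, `tokenizer`, and `inputs` from the Quick Start:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout instead of waiting for the full completion
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer
)
```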

### Using the Inference Script

We provide a convenient inference script for easy interaction:

```bash
# Interactive chat mode
python inference.py --model_path Kirim-ai/Kirim-V1-base --chat

# Single prompt generation
python inference.py --prompt "Explain quantum computing in simple terms"

# With 4-bit quantization (requires 12GB+ VRAM)
python inference.py --load_in_4bit --chat

# With 8-bit quantization (requires 16GB+ VRAM)
python inference.py --load_in_8bit --chat
```

### Deployment Options

**Full Precision (BF16)**
- Memory Required: ~24GB VRAM
- Best quality and performance

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

**8-bit Quantization**
- Memory Required: ~16GB VRAM
- Minimal quality loss

```python
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    load_in_8bit=True,
    device_map="auto"
)
```

**4-bit Quantization**
- Memory Required: ~12GB VRAM
- Good for consumer GPUs

```python
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    load_in_4bit=True,
    device_map="auto"
)
```
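
The `load_in_8bit`/`load_in_4bit` shortcuts require the bitsandbytes package, and recent transformers releases route them through an explicit quantization config instead. An equivalent 4-bit setup with `BitsAndBytesConfig` looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config (requires bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-V1-base",
    quantization_config=bnb_config,
    device_map="auto"
)
```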

### Chat Template

The model uses the following chat template format:

```
<|begin_of_text|><|system|>
{system_message}
<|user|>
{user_message}
<|assistant|>
{assistant_response}
```
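
You can check that the tokenizer reproduces this layout by rendering a conversation without tokenizing (reusing the `tokenizer` from the Quick Start):

```python
messages = [
    {"role": "system", "content": "You are Kirim, a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]

# Render the chat template as plain text to inspect the special tokens
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```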

You can customize the system prompt to adjust the model's behavior:

```python
messages = [
    # System prompt: "You are a professional Python programming assistant; please answer in Chinese."
    {"role": "system", "content": "你是一个专业的Python编程助手,请用中文回答问题。"},
    # User prompt: "How can I optimize this code?"
    {"role": "user", "content": "如何优化这段代码?"}
]
```

---

## Model Architecture

| Parameter | Value |
|-----------|-------|
| Model Type | Causal Language Model |
| Architecture | Decoder-only Transformer |
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| KV Heads | 8 (Grouped Query Attention) |
| Vocabulary Size | 102,400 |
| Context Length | 32,768 tokens |
| Activation Function | SiLU |
| Position Encoding | RoPE with YaRN scaling (factor: 2.0) |
| Normalization | RMSNorm (eps: 1e-6) |
| Precision | BFloat16 |
| Total Parameters | ~13B |
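
These values can be cross-checked against the published configuration. A quick sketch, assuming the model's config class uses the common Llama-style attribute names:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Kirim-ai/Kirim-V1-base", trust_remote_code=True)

# Attribute names below assume the Llama-style convention
print(config.hidden_size)              # expected: 4096
print(config.num_hidden_layers)        # expected: 32
print(config.num_attention_heads)      # expected: 32
print(config.num_key_value_heads)      # expected: 8 (GQA)
print(config.max_position_embeddings)  # expected: 32768
print(config.rope_scaling)             # expected: YaRN scaling, factor 2.0
```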

---

## Features & Capabilities

### Bilingual Proficiency
- Native-level Chinese understanding and generation
- Fluent English communication
- Seamless code-switching when appropriate
- Cultural context awareness

### Code Generation
- Multi-language code generation (Python, JavaScript, Java, C++, etc.)
- Code explanation and debugging
- Algorithm implementation
- Best-practice and optimization suggestions

### Reasoning & Analysis
- Step-by-step problem solving
- Mathematical reasoning
- Logical deduction
- Critical thinking and analysis

### Creative Writing
- Story generation
- Poetry and creative content
- Content summarization
- Style adaptation

### Technical Knowledge
- Programming and software development
- Mathematics and science
- Technology and engineering
- Business and finance

---

## Limitations

- **No vision capabilities**: This model processes text only and cannot interpret images, diagrams, or visual content
- **Knowledge cutoff**: Training data extends up to October 2025
- **Potential hallucinations**: May occasionally generate plausible-sounding but incorrect information
- **Bias**: May reflect biases present in the training data
- **Arithmetic**: May struggle with complex calculations without step-by-step reasoning

---

## License

This model is released under the **Apache License 2.0**. See [LICENSE](LICENSE) for full details.

You are free to:
- Use the model commercially
- Modify and distribute the model
- Use the model for research

Under the following conditions:
- Provide attribution
- Include the license
- State any changes made

---

## Citation

If you use Kirim-V1-Base in your research or applications, please cite:

```bibtex
@misc{kirim2025v1base,
  title={Kirim-V1-Base: A High-Performance Bilingual Language Model},
  author={Kirim AI Team},
  year={2025},
  url={https://huggingface.co/Kirim-ai/Kirim-V1}
}
```