---
license: apache-2.0
datasets:
- yentinglin/TaiwanChat
language:
- zh
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---

# SmolLM2‑135M‑Instruct‑TaiwanChat

SmolLM2‑135M‑Instruct fine‑tuned on the TaiwanChat dataset, optimized for multi‑turn conversational AI in Traditional Chinese.

---

## Model Description

- **Base model:** `HuggingFaceTB/SmolLM2-135M-Instruct`
- **Fine‑tuned on:** `yentinglin/TaiwanChat` (first 3,000 training samples)
- **Task:** Instruction‑following chat in Traditional Chinese (Mandarin, Taiwan)
- **Framework:** Hugging Face Transformers [`Trainer`]
- **Precision:**
  - BF16 on Intel XPU (if available)
  - FP16 on CUDA (if available)
  - Falls back to CPU otherwise
- **Memory optimizations:** Gradient checkpointing enabled

---

## How to Use

### 1. Install dependencies

```bash
pip install transformers datasets accelerate
```

### 2. Load & Generate

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Luigi/SmolLM2-135M-Instruct-TaiwanChat"

# 1) Select device (prefer XPU, then CUDA, then CPU)
device_str = "cpu"
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device_str = "xpu"
elif torch.cuda.is_available():
    device_str = "cuda"

# 2) Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device_str)

# 3) Set up the HF pipeline (recent transformers releases accept a
#    device string such as "cpu", "cuda", or "xpu" directly)
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=device_str,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
)

# 4) Inference example
prompt = "請問台北今天的天氣如何?"  # "What is the weather like in Taipei today?"
result = generator(prompt)
print(result[0]["generated_text"])
```

---

## Training Script

All training logic is contained in `SmolLM2-135M-Instruct-TaiwanChat.py`.

**Key settings** (hard‑coded at the top of the script):

```python
PROJECT_NAME = "SmolLM2-135M-Instruct-TaiwanChat"
BASE_MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"
DATASET_ID = "yentinglin/TaiwanChat"
N_SAMPLES = 3000
MAX_LEN = 512
```

**Trainer hyperparameters:**

- `per_device_train_batch_size=4`
- `learning_rate=5e-5`
- `num_train_epochs=3`
- `fp16` on CUDA, `bf16` on XPU
- `logging_steps=1000`
- `save_steps=5000`
- `gradient_checkpointing=True`
- `push_to_hub=True`

### Run training

```bash
python SmolLM2-135M-Instruct-TaiwanChat.py
```

The script will:

1. Auto‑detect and select **XPU**, **CUDA**, or **CPU**.
2. Load & preprocess the first 3,000 samples from TaiwanChat.
3. Fine‑tune the model with the chosen precision and logging settings.
4. Save the fine‑tuned model & tokenizer under `./SmolLM2-135M-Instruct-TaiwanChat`.
5. Push the checkpoint to `huggingface.co/Luigi/SmolLM2-135M-Instruct-TaiwanChat`.

---

## Limitations

- Trained on only a 3,000‑sample subset, so it may underperform on out‑of‑domain queries.
- The training script has no separate validation or evaluation loop.
- Generated responses may be incorrect or inconsistent; always verify them before production use.

---

## License

- **Code** is released under the [Apache 2.0 License](LICENSE).
- **Training data** (and any model weights derived from it) are licensed under [CC BY‑NC 4.0](LICENSE-CC-BY-NC-4.0), non‑commercial use only.

You may use and modify the code for any purpose, but any use of the dataset, or of models trained on it, must comply with the CC BY‑NC 4.0 terms.
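---

## Multi‑turn Chat Example

Since the model targets multi‑turn conversation, chat history is best rendered through the tokenizer's chat template rather than concatenated as raw text. Below is a minimal sketch, assuming the fine‑tuned checkpoint inherits the chat template of the base `SmolLM2-135M-Instruct` tokenizer; the conversation content is purely illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Luigi/SmolLM2-135M-Instruct-TaiwanChat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Multi-turn history as a list of role/content messages (illustrative)
messages = [
    {"role": "user", "content": "請介紹一下台北101。"},  # "Please introduce Taipei 101."
    {"role": "assistant", "content": "台北101是位於台北市信義區的摩天大樓。"},
    {"role": "user", "content": "它有多高?"},  # "How tall is it?"
]

# Render the history with the tokenizer's chat template and append
# the generation prompt for the assistant's next turn
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.8,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```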
## Citation

If you use this model, please cite:

```bibtex
@misc{SmolLM2TaiwanChat2025,
  title        = {SmolLM2-135M-Instruct-TaiwanChat},
  author       = {Luigi Liu},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Luigi/SmolLM2-135M-Instruct-TaiwanChat}}
}
```
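---

## Appendix: Trainer Configuration Sketch

For reference, the Trainer hyperparameters listed above correspond roughly to a configuration like the one below. This is a sketch reconstructed from the settings documented in this card, not the actual contents of `SmolLM2-135M-Instruct-TaiwanChat.py`; consult the script for the authoritative setup.

```python
import torch
from transformers import TrainingArguments

use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
use_cuda = torch.cuda.is_available()

args = TrainingArguments(
    output_dir="./SmolLM2-135M-Instruct-TaiwanChat",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=use_xpu,                   # BF16 on Intel XPU
    fp16=use_cuda and not use_xpu,  # FP16 on CUDA
    logging_steps=1000,
    save_steps=5000,
    gradient_checkpointing=True,
    push_to_hub=True,
    hub_model_id="Luigi/SmolLM2-135M-Instruct-TaiwanChat",
)

# The Trainer would then be built with the model and the tokenized dataset
# (hypothetical names, defined elsewhere in the script):
# trainer = Trainer(model=model, args=args, train_dataset=tokenized_dataset)
# trainer.train()
```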