---
language:
- ar
- en
license: apache-2.0
base_model: Qwen/Qwen3.5-0.8B
tags:
- arabic
- tawkeed
- edge
- on-device
- fine-tuned
---

# tawkeed-0.8b

**tawkeed-0.8b** is an Arabic-first language model built by [Tawkeed](https://huggingface.co/tawkeed-sa), fine-tuned for on-device and edge AI deployment.

Forked from [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) and fine-tuned on large-scale Arabic corpora, this model is optimized to run natively on Tawkeed devices, delivering fast, private, Arabic-language AI at the edge.

## Highlights

- **Arabic-first:** trained and tested on Arabic text across diverse domains
- **Edge-optimized:** sized and tuned to run efficiently on Tawkeed edge hardware
- **Production-ready:** validated on Tawkeed's Arabic benchmark suite for real-world accuracy
- **Bilingual:** retains strong English capability from the base model

## Model Details

| Property | Value |
|---|---|
| Base model | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| Parameters | 0.8B |
| Languages | Arabic (ar), English (en) |
| License | Apache 2.0 |
| Fine-tuning | Continued pretraining + SFT on Arabic data |
| Deployment | On-device / edge / cloud |

## Training

The model is fine-tuned through a multi-stage Arabic enhancement pipeline:

1. **Continued pretraining** on Arabic corpora (Wikipedia, CulturaX, OSCAR)
2. **Supervised fine-tuning (SFT)** on curated Arabic instruction datasets (OALL, Alpaca-GPT4-Arabic, Aya)
3. **Evaluation** on Tawkeed's Arabic benchmark suite, covering generation, comprehension, and reasoning tasks

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tawkeed-sa/tawkeed-0.8b")
tokenizer = AutoTokenizer.from_pretrained("tawkeed-sa/tawkeed-0.8b")

# "What is the capital of Saudi Arabia?"
messages = [{"role": "user", "content": "ما هي عاصمة المملكة العربية السعودية؟"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tawkeed Model Family

A complete suite of Arabic AI models, from compact edge models to large-scale MoE, all fine-tuned and tested for Arabic.

| Model | Size | Type |
|---|---|---|
| [tawkeed-sa/tawkeed-0.8b](https://huggingface.co/tawkeed-sa/tawkeed-0.8b) | 0.8B | Arabic LLM |
| [tawkeed-sa/tawkeed-2b](https://huggingface.co/tawkeed-sa/tawkeed-2b) | 2B | Arabic LLM |
| [tawkeed-sa/tawkeed-4b](https://huggingface.co/tawkeed-sa/tawkeed-4b) | 4B | Arabic LLM |
| [tawkeed-sa/tawkeed-9b](https://huggingface.co/tawkeed-sa/tawkeed-9b) | 9B | Arabic LLM |
| [tawkeed-sa/tawkeed-27b](https://huggingface.co/tawkeed-sa/tawkeed-27b) | 27B | Arabic LLM |
| [tawkeed-sa/tawkeed-40b](https://huggingface.co/tawkeed-sa/tawkeed-40b) | 40B | Arabic LLM |
| [tawkeed-sa/tawkeed-27b-MLX](https://huggingface.co/tawkeed-sa/tawkeed-27b-MLX) | 27B, 8-bit | LLM for Apple Silicon (MLX) |
| [tawkeed-sa/tawkeed-27b-GGUF](https://huggingface.co/tawkeed-sa/tawkeed-27b-GGUF) | 27B, Q8_0 | LLM for Ollama / llama.cpp |
| [tawkeed-sa/tawkeed-ocr](https://huggingface.co/tawkeed-sa/tawkeed-ocr) | — | OCR |
| [tawkeed-sa/tawkeed-embedding](https://huggingface.co/tawkeed-sa/tawkeed-embedding) | — | Embedding |

## About Tawkeed

Tawkeed builds Arabic-native AI that runs on the edge.
Every model in the family is fine-tuned for Arabic, tested on Arabic benchmarks, and optimized for deployment on Tawkeed devices.

Built by [Tawkeed](https://huggingface.co/tawkeed-sa).
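
The quantized variants in the family table can also be run locally with common edge runtimes. A minimal sketch, not an official quickstart: it assumes a recent Ollama (which can pull GGUF repos from the Hub via the `hf.co/` prefix), a recent `llama.cpp` build, and the `mlx-lm` package on Apple Silicon; exact flags vary by tool version.

```shell
# Ollama: pull and chat with the Q8_0 GGUF build straight from the Hub
ollama run hf.co/tawkeed-sa/tawkeed-27b-GGUF

# llama.cpp: -hf fetches the GGUF repo from Hugging Face
# Prompt: "What is the capital of Saudi Arabia?"
llama-cli -hf tawkeed-sa/tawkeed-27b-GGUF -p "ما هي عاصمة المملكة العربية السعودية؟"

# Apple Silicon: generate with the 8-bit MLX build via mlx-lm
mlx_lm.generate --model tawkeed-sa/tawkeed-27b-MLX --prompt "مرحبا"
```

Q8_0 keeps quality close to the full-precision weights; for tighter memory budgets, smaller family members (0.8B–9B) are the intended edge targets.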