---
language:
- ar
- en
license: apache-2.0
base_model: Qwen/Qwen3.5-0.8B
tags:
- arabic
- tawkeed
- edge
- on-device
- fine-tuned
---

# tawkeed-0.8b

**tawkeed-0.8b** is an Arabic-first language model built by [Tawkeed](https://huggingface.co/tawkeed-sa), fine-tuned for on-device and edge AI deployment.

Forked from [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) and fine-tuned on large-scale Arabic corpora, this model is optimized to run natively on Tawkeed devices, delivering fast, private, Arabic-language AI at the edge.

## Highlights

- **Arabic-first:** trained and tested on Arabic text across diverse domains
- **Edge-optimized:** sized and tuned to run efficiently on Tawkeed edge hardware
- **Production-ready:** validated on Tawkeed's Arabic benchmark suite for real-world accuracy
- **Bilingual:** retains strong English capability from the base model

## Model Details

| Property | Value |
|---|---|
| Base model | [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) |
| Parameters | 0.8B |
| Languages | Arabic (ar), English (en) |
| License | Apache 2.0 |
| Fine-tuning | Continued pretraining + SFT on Arabic data |
| Deployment | On-device / edge / cloud |

## Training

The model is fine-tuned through a multi-stage Arabic enhancement pipeline:

1. **Continued pretraining** on Arabic corpora (Wikipedia, CulturaX, OSCAR)
2. **Supervised fine-tuning (SFT)** on curated Arabic instruction datasets (OALL, Alpaca-GPT4-Arabic, Aya)
3. **Evaluation** on Tawkeed's Arabic benchmark suite, covering generation, comprehension, and reasoning tasks

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tawkeed-sa/tawkeed-0.8b")
tokenizer = AutoTokenizer.from_pretrained("tawkeed-sa/tawkeed-0.8b")

# "What is the capital of Saudi Arabia?"
messages = [{"role": "user", "content": "ما هي عاصمة المملكة العربية السعودية؟"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tawkeed Model Family

A complete suite of Arabic AI models, from compact edge models to large-scale MoE, all fine-tuned and tested for Arabic.

| Model | Size | Type |
|---|---|---|
| [tawkeed-sa/tawkeed-0.8b](https://huggingface.co/tawkeed-sa/tawkeed-0.8b) | 0.8B | Arabic LLM |
| [tawkeed-sa/tawkeed-2b](https://huggingface.co/tawkeed-sa/tawkeed-2b) | 2B | Arabic LLM |
| [tawkeed-sa/tawkeed-4b](https://huggingface.co/tawkeed-sa/tawkeed-4b) | 4B | Arabic LLM |
| [tawkeed-sa/tawkeed-9b](https://huggingface.co/tawkeed-sa/tawkeed-9b) | 9B | Arabic LLM |
| [tawkeed-sa/tawkeed-27b](https://huggingface.co/tawkeed-sa/tawkeed-27b) | 27B | Arabic LLM |
| [tawkeed-sa/tawkeed-40b](https://huggingface.co/tawkeed-sa/tawkeed-40b) | 40B | Arabic LLM |
| [tawkeed-sa/tawkeed-27b-MLX](https://huggingface.co/tawkeed-sa/tawkeed-27b-MLX) | 27B, 8-bit | LLM for Apple Silicon (MLX) |
| [tawkeed-sa/tawkeed-27b-GGUF](https://huggingface.co/tawkeed-sa/tawkeed-27b-GGUF) | 27B, Q8_0 | LLM for Ollama / llama.cpp |
| [tawkeed-sa/tawkeed-ocr](https://huggingface.co/tawkeed-sa/tawkeed-ocr) | — | OCR |
| [tawkeed-sa/tawkeed-embedding](https://huggingface.co/tawkeed-sa/tawkeed-embedding) | — | Embedding |

## About Tawkeed

Tawkeed builds Arabic-native AI that runs on the edge.
Every model in the family is fine-tuned for Arabic, tested on Arabic benchmarks, and optimized for deployment on Tawkeed devices.

Built by [Tawkeed](https://huggingface.co/tawkeed-sa).
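
The quantized variants in the family table can also be run locally with common edge runtimes. A minimal sketch, not an official quickstart: it assumes a recent Ollama (which can pull GGUF repos from the Hub via the `hf.co/` prefix), a recent `llama.cpp` build, and the `mlx-lm` package on Apple Silicon; exact flags vary by tool version.

```shell
# Ollama: pull and chat with the Q8_0 GGUF build straight from the Hub
ollama run hf.co/tawkeed-sa/tawkeed-27b-GGUF

# llama.cpp: -hf fetches the GGUF repo from Hugging Face
# Prompt: "What is the capital of Saudi Arabia?"
llama-cli -hf tawkeed-sa/tawkeed-27b-GGUF -p "ما هي عاصمة المملكة العربية السعودية؟"

# Apple Silicon: generate with the 8-bit MLX build via mlx-lm
mlx_lm.generate --model tawkeed-sa/tawkeed-27b-MLX --prompt "مرحبا"
```

Q8_0 keeps quality close to the full-precision weights; for tighter memory budgets, smaller family members (0.8B–9B) are the intended edge targets.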