☕ Qehwa — Pashto's First LLM

The first Pakistani Pashto large language model — trained specifically on the Peshawari dialect.

Built by a solo developer as a free and open resource for 60+ million Pashto speakers worldwide.

⚠️ This model performs best on Pakistani/Peshawari Pashto; performance may be lower on the Afghan Pashto dialect.


🌟 Model Description

Qehwa is a fully instruction-tuned Pashto language model built on top of Qwen2.5-7B. It is the result of a two-stage training process:

  1. Continued Pre-Training (CPT) on 3.4 million clean Pakistani Pashto documents
  2. Supervised Fine-Tuning (SFT) on 126,519 high-quality Peshawari Pashto instruction-response pairs

This is the first dedicated Pakistani Pashto LLM — no comparable model exists publicly. It specifically targets the Peshawari/KPK dialect rather than generic or Afghan Pashto.

This repo contains the fully merged model — ready to use with standard transformers, no additional libraries required.


✨ Capabilities

  • ✅ Answers questions in pure Peshawari Pashto
  • ✅ Responds to English instructions in Pashto
  • ✅ Responds to Urdu instructions in Pashto
  • ✅ Natural Pashto conversation
  • ✅ Pashto creative writing and poetry
  • ✅ Islamic topics in Pashto
  • ✅ KPK history, culture, and geography
  • ✅ Pashtunwali traditions and ethics
  • ✅ Pashto grammar correction
  • ✅ English to Pashto translation
  • ✅ Correct Pashto-specific characters: ښ ږ ټ ډ ړ ځ

📊 Evaluation Results

Qehwa was evaluated on a custom benchmark of 150 tests across 15 categories — the first ever comprehensive Pashto LLM benchmark. Since no standard Pashto benchmark exists publicly, this evaluation was designed specifically for Pakistani Pashto.

Top Performing Categories

| Category | Score |
| --- | --- |
| English → Pashto | 90% 🔥🔥 |
| Urdu → Pashto | 84% 🔥🔥 |
| Health & Daily Life in Pashto | 90% 🔥🔥 |
| Culture & History | 90% 🔥 |
| Geography & Nature | 90% 🔥 |

Overall Average Accuracy across all 15 benchmark categories: 85.3%

Evaluation Methodology

  • 150 custom Pashto prompts across 15 categories
  • Evaluated on A100 40GB GPU
  • Human review of all outputs for fluency, accuracy, and dialect correctness
  • No existing Pashto benchmark was available — this is the first Pashto LLM benchmark
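The per-category scores above reduce to a simple pass-rate aggregation over the human judgments. A minimal sketch (the actual review-sheet format is not published; the data shape below is hypothetical):

```python
from collections import defaultdict

def category_scores(judgments):
    """Aggregate human pass/fail judgments into per-category accuracy.

    `judgments` is a list of (category, passed) tuples, one entry per
    reviewed prompt. Returns {category: accuracy in percent}.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in judgments:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: 100.0 * passes[c] / totals[c] for c in totals}

# 10 prompts per category, as in the 150-prompt / 15-category benchmark
judgments = [("English → Pashto", i < 9) for i in range(10)]
print(category_scores(judgments))  # {'English → Pashto': 90.0}
```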

💻 Installation

pip install transformers accelerate torch

For faster inference:

pip install unsloth

For running locally on a CPU or a small GPU:

pip install transformers accelerate bitsandbytes

🚀 How to Use

✅ Method 1 — Transformers (Recommended)

Best for: Research, production, standard usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.bfloat16,
    device_map  = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow the model's device placement (GPU or CPU)

    outputs = model.generate(
        **inputs,
        max_new_tokens     = 500,
        temperature        = 0.7,
        do_sample          = True,
        repetition_penalty = 1.1,
        pad_token_id       = tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

# Pashto input
print(generate("د پیښور تاریخ راته ووایه"))

# English input
print(generate("Tell me about Pashtunwali"))

# Urdu input
print(generate("پشاور کے بارے میں بتاؤ"))

✅ Method 2 — 4-bit Quantization (Low VRAM)

Best for: GPUs with 8GB VRAM or less

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "junaid008/qehwa-pashto-llm"

bnb_config = BitsAndBytesConfig(
    load_in_4bit              = True,
    bnb_4bit_quant_type       = "nf4",
    bnb_4bit_compute_dtype    = torch.bfloat16,
    bnb_4bit_use_double_quant = True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map          = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow the model's device placement

    outputs = model.generate(
        **inputs,
        max_new_tokens     = 500,
        temperature        = 0.7,
        do_sample          = True,
        repetition_penalty = 1.1,
        pad_token_id       = tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

print(generate("پښتونولي تشریح کړه"))

✅ Method 3 — Unsloth (2x Faster Inference)

Best for: Speed-optimized usage, Colab, A100/H100

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "junaid008/qehwa-pashto-llm",
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = False,
)
FastLanguageModel.for_inference(model)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

import torch
inputs = tokenizer(
    ALPACA_TEMPLATE.format("د پیښور تاریخ راته ووایه", ""),
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens     = 500,
    temperature        = 0.7,
    do_sample          = True,
    repetition_penalty = 1.1,
    pad_token_id       = tokenizer.eos_token_id,  # eos as pad, consistent with the other methods
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())

✅ Method 4 — CPU Only (No GPU)

Best for: Testing on a laptop with no GPU available (slow but works)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float32,  # float32 for CPU
    device_map  = "cpu",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("پښتو ژبه د چا ده؟", ""),
    return_tensors = "pt",
)

outputs = model.generate(
    **inputs,
    max_new_tokens = 200,
    do_sample      = False,   # greedy for CPU speed
    pad_token_id   = tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())

✅ Method 5 — Google Colab (Free)

Best for: Trying without any local setup

Open in Colab and run:

# Install
!pip install transformers accelerate -q

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("junaid008/qehwa-pashto-llm")
model     = AutoModelForCausalLM.from_pretrained(
    "junaid008/qehwa-pashto-llm",
    torch_dtype = torch.bfloat16,
    device_map  = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs  = tokenizer(ALPACA_TEMPLATE.format(prompt, ""), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7,
                             do_sample=True, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(generate("Tell me about Peshawar"))
print(generate("پښتونولي تشریح کړه"))
print(generate("پشاور کا مشہور کھانا کیا ہے؟"))

⚙️ Hardware Requirements

| Method | VRAM | Speed |
| --- | --- | --- |
| bfloat16 full | 16GB+ | ✅ Fast |
| 4-bit quantized | 8GB+ | ✅ Good |
| Unsloth | 16GB+ | 🔥 2x Faster |
| CPU only | No GPU | ⚠️ Slow |
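The table can be collapsed into a simple selection rule. A minimal sketch (the thresholds mirror the table; `pick_loading_method` is a hypothetical helper, not part of the model repo):

```python
def pick_loading_method(vram_gb):
    """Map available VRAM (GB) to a loading strategy from the table above."""
    if not vram_gb:
        return "cpu-float32"    # no GPU: slow but works
    if vram_gb >= 16:
        return "bfloat16-full"  # full precision, fastest with plain transformers
    if vram_gb >= 8:
        return "4bit-nf4"       # BitsAndBytes 4-bit quantization
    return "cpu-float32"        # too little VRAM: fall back to CPU

print(pick_loading_method(40))  # bfloat16-full (e.g. A100 40GB)
print(pick_loading_method(8))   # 4bit-nf4
```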

📊 Training Details

Stage 1 — Continued Pre-Training (CPT)

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-7B |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 5,000 |
| Final CPT loss | ~1.8 |
| Dataset size | 3,400,000 documents |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
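On a single A100 40GB, an effective batch size of 32 is typically reached through gradient accumulation. A hedged sketch of the arithmetic (the per-device batch size of 4 is an assumption; the card only states the effective size):

```python
def grad_accum_steps(effective_batch, per_device_batch, num_devices=1):
    """Gradient-accumulation steps needed to reach an effective batch size."""
    assert effective_batch % (per_device_batch * num_devices) == 0
    return effective_batch // (per_device_batch * num_devices)

# Single A100, assumed per-device batch of 4 -> accumulate over 8 steps
print(grad_accum_steps(32, 4))  # 8
```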

Stage 2 — Supervised Fine-Tuning (SFT)

| Parameter | Value |
| --- | --- |
| Base model | junaid008/pashto-qwen2.5-7b-v3 (CPT) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 7,908 |
| Final SFT loss | 0.455 |
| Dataset size | 126,519 pairs |
| Epochs | 2 |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
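The SFT hyperparameters above might map onto a peft-style setup roughly as follows. This is a sketch under assumptions (target modules, LoRA alpha, and per-device batch size are guesses; the author's actual training script is unpublished):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA rank 64 as reported in the card; target modules are an assumption
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,  # assumption: alpha equal to rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qehwa-sft",
    learning_rate=5e-5,
    bf16=True,                      # bfloat16 precision
    per_device_train_batch_size=4,  # assumption
    gradient_accumulation_steps=8,  # 4 x 8 = effective batch size 32
    num_train_epochs=2,             # ~7,908 steps over 126,519 pairs
)
```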

📚 Dataset

CPT Dataset

  • 3.4 million Pakistani Pashto documents
  • Sources: news, books, religious texts, Wikipedia, web crawl
  • Custom cleaned with Pashto-specific Unicode normalization
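"Pashto-specific Unicode normalization" typically means folding Arabic/Urdu letter variants onto their canonical code points. A hedged sketch (the exact mapping used for Qehwa is not published; the pairs below are common normalizations for Arabic-script text):

```python
import unicodedata

# Fold common Arabic-script variants; the actual Qehwa mapping is unpublished
VARIANT_MAP = str.maketrans({
    "\u0643": "\u06A9",  # Arabic kaf ك  -> keheh ک
    "\u064A": "\u06CC",  # Arabic yeh ي  -> Farsi yeh ی
    "\u0649": "\u06CC",  # alef maksura ى -> Farsi yeh ی
    "\u0640": None,      # strip tatweel (kashida)
})

def normalize_pashto(text):
    """NFC-normalize and fold Arabic letter variants to canonical forms."""
    return unicodedata.normalize("NFC", text).translate(VARIANT_MAP)

print(normalize_pashto("كتاب"))  # کتاب (kaf folded to keheh)
```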

SFT Dataset — 126,519 pairs:

| Type | Description |
| --- | --- |
| Pashto → Pashto | Questions and answers in pure Peshawari Pashto |
| English → Pashto | English instructions with Pashto responses |
| Urdu → Pashto | Urdu instructions with Pashto responses |
| Conversation | Natural dialogue in Peshawari dialect |
| Islamic topics | Religious knowledge in Pashto |
| Creative writing | Pashto poetry, stories, descriptions |
| Grammar | Pashto language correction and explanation |
| Translation | English to Pashto translation pairs |
| Cultural | Pashtunwali, traditions, history, geography |
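Each pair type is serialized into the same Alpaca-style template used at inference, with the response slot filled during training instead of left empty. A minimal sketch (the EOS string is the Qwen2.5 base default and is an assumption here):

```python
ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def format_pair(instruction, response, eos="<|endoftext|>"):
    """Serialize one SFT pair; the EOS token teaches the model where to stop."""
    return ALPACA_TEMPLATE.format(instruction, response) + eos

sample = format_pair("Tell me about Peshawar", "پیښور ...")
print(sample.splitlines()[0])  # Below is an instruction in Pashto or English. Write a detailed response in Pashto.
```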

🗣️ Example Outputs

Pashto input:

Prompt : د پیښور تاریخ راته ووایه  (Tell me the history of Peshawar)
Output : پیښور د نړۍ یو له تر ټولو پخوانیو ښارونو څخه دی چې تاریخ یې زرګونه
         کاله پخوا ته رسیږي. د دې ښار پخوانی نوم پوشکلاوتي او بیا پروشاپور و.
         د پیښور د بالاحصار کلا د مغلو او سکهانو د وخت یادګار دی...

English input:

Prompt : Tell me about Pashtunwali
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او سپېڅلی ضابطه اخلاق ده چې د دوی
         ټولنیز ژوند یې تنظیم کړی دی. په دې کې مېلمه پالنه، غیرت، او د کمزورو
         ساتنه تر ټولو لوی ارزښتونه ګڼل کېږي...

Urdu input:

Prompt : پشتونوالی کیا ہے؟  (Urdu: What is Pashtunwali?)
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او زرین ضابطه اخلاق ده...

👤 Author

Junaid Aslam

  • HuggingFace: junaid008
  • Built independently as a contribution to Pashto NLP

📜 License

Apache 2.0 — free to use, modify, and distribute with attribution.


🤝 Citation

@misc{qehwa-pashto-llm,
  author    = {Junaid Aslam},
  title     = {Qehwa — Pashto's First LLM},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/junaid008/qehwa-pashto-llm}
}