☕ Qehwa — Pashto's First LLM
The first Pakistani Pashto large language model, trained specifically on the Peshawari dialect.
Built by a solo developer as a free and open resource for 60+ million Pashto speakers worldwide.
⚠️ This model performs best on Pakistani/Peshawari Pashto. Performance may be lower on Afghan Pashto dialect.
🌟 Model Description
Qehwa is a fully instruction-tuned Pashto language model built on top of Qwen2.5-7B. It is the result of a two-stage training process:
- Continued Pre-Training (CPT) on 3.4 million clean Pakistani Pashto documents
- Supervised Fine-Tuning (SFT) on 126,519 high-quality Peshawari Pashto instruction-response pairs
This is the first dedicated Pakistani Pashto LLM — no comparable model exists publicly. It specifically targets the Peshawari/KPK dialect rather than generic or Afghan Pashto.
This repo contains the fully merged model — ready to use with standard transformers, no additional libraries required.
✨ Capabilities
- ✅ Answers questions in pure Peshawari Pashto
- ✅ Responds to English instructions in Pashto
- ✅ Responds to Urdu instructions in Pashto
- ✅ Natural Pashto conversation
- ✅ Pashto creative writing and poetry
- ✅ Islamic topics in Pashto
- ✅ KPK history, culture, and geography
- ✅ Pashtunwali traditions and ethics
- ✅ Pashto grammar correction
- ✅ English to Pashto translation
- ✅ Correct Pashto-specific characters: ښ ږ ټ ډ ړ ځ
📊 Evaluation Results
Qehwa was evaluated on a custom benchmark of 150 tests across 15 categories, the first comprehensive Pashto LLM benchmark. Since no standard Pashto benchmark is publicly available, this evaluation was designed specifically for Pakistani Pashto.
Top Performing Categories
| Category | Score |
|---|---|
| English → Pashto | 90% 🔥🔥 |
| Urdu → Pashto | 84% 🔥🔥 |
| Health & Daily Life in Pashto | 90% 🔥🔥 |
| Culture & History | 90% 🔥 |
| Geography & Nature | 90% 🔥 |
Overall Average Accuracy across all 15 benchmark categories: 85.3%
Evaluation Methodology
- 150 custom Pashto prompts across 15 categories
- Evaluated on A100 40GB GPU
- Outputs human-reviewed for fluency, accuracy, and dialect correctness
- No existing Pashto benchmark was available — this is the first Pashto LLM benchmark
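The grading code itself is not published; as a minimal sketch, per-category pass/fail judgments could be aggregated into the percentages above roughly like this (the function name and toy data here are illustrative, not the real evaluation harness or results):

```python
def category_accuracy(results):
    """results maps category name -> list of pass/fail judgments
    (human-reviewed, as described above). Returns per-category
    percentages and their unweighted mean."""
    per_cat = {cat: 100 * sum(r) / len(r) for cat, r in results.items()}
    overall = sum(per_cat.values()) / len(per_cat)
    return per_cat, overall

# Toy example with 10 judgments per category:
per_cat, overall = category_accuracy({
    "English -> Pashto": [True] * 9 + [False],      # 90.0
    "Urdu -> Pashto":    [True] * 8 + [False] * 2,  # 80.0
})
print(per_cat, overall)
```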
💻 Installation
```bash
pip install transformers accelerate torch
```

For faster inference with Unsloth:

```bash
pip install unsloth
```

For running locally on a CPU or a small GPU (4-bit quantization):

```bash
pip install transformers accelerate bitsandbytes
```
🚀 How to Use
✅ Method 1 — Transformers (Recommended)
Best for: Research, production, standard usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    # Send inputs to wherever device_map="auto" placed the model
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

# Pashto input
print(generate("د پیښور تاریخ راته ووایه"))

# English input
print(generate("Tell me about Pashtunwali"))

# Urdu input
print(generate("پشاور کے بارے میں بتاؤ"))
```
✅ Method 2 — 4-bit Quantization (Low VRAM)
Best for: GPUs with 8GB VRAM or less
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "junaid008/qehwa-pashto-llm"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

print(generate("پښتونولي تشریح کړه"))
✅ Method 3 — Unsloth (2x Faster Inference)
Best for: Speed-optimized usage, Colab, A100/H100
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="junaid008/qehwa-pashto-llm",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("د پیښور تاریخ راته ووایه", ""),
    return_tensors="pt",
).to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```
✅ Method 4 — CPU Only (No GPU)
Best for: Testing on laptop, no GPU available (slow but works)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # float32 for CPU
    device_map="cpu",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("پښتو ژبه د چا ده؟", ""),
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,  # greedy decoding keeps CPU runs fast
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```
✅ Method 5 — Google Colab (Free)
Best for: Trying without any local setup
Open in Colab and run:
```python
# Install
!pip install transformers accelerate -q

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("junaid008/qehwa-pashto-llm")
model = AutoModelForCausalLM.from_pretrained(
    "junaid008/qehwa-pashto-llm",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(ALPACA_TEMPLATE.format(prompt, ""), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(generate("Tell me about Peshawar"))
print(generate("پښتونولي تشریح کړه"))
print(generate("پشاور کا مشہور کھانا کیا ہے؟"))
```
⚙️ Hardware Requirements
| Method | VRAM | Speed |
|---|---|---|
| bfloat16 full | 16GB+ | ✅ Fast |
| 4-bit quantized | 8GB+ | ✅ Good |
| Unsloth | 16GB+ | 🔥 2x Faster |
| CPU only | No GPU | ⚠️ Slow |
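As a back-of-envelope check on the VRAM column, the weights alone for a 7B model take roughly parameters × bytes-per-param; runtime overhead for activations and the KV cache comes on top, which is why the table says "16GB+" and "8GB+". A minimal sketch (the function is illustrative, not part of the model's tooling):

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Weight-only footprint in decimal GB; excludes activations,
    KV cache, and framework overhead."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(7e9, 2.0))  # bf16: 14.0 GB of weights -> the 16GB+ row
print(weight_memory_gb(7e9, 0.5))  # nf4 4-bit: 3.5 GB of weights -> fits in 8GB
```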
📊 Training Details
Stage 1 — Continued Pre-Training (CPT)
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 5,000 |
| Final CPT loss | ~1.8 |
| Dataset size | 3,400,000 documents |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
Stage 2 — Supervised Fine-Tuning (SFT)
| Parameter | Value |
|---|---|
| Base model | junaid008/pashto-qwen2.5-7b-v3 (CPT) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 7,908 |
| Final SFT loss | 0.455 |
| Dataset size | 126,519 pairs |
| Epochs | 2 |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
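Both stages report LoRA rank 64. The exact adapter configuration is not published, but in `peft` terms it would look roughly like the following; note that alpha, dropout, and the target modules here are assumptions, not the actual training values:

```python
from peft import LoraConfig

# Hypothetical reconstruction -- only r=64 is confirmed by the tables above.
lora_config = LoraConfig(
    r=64,                     # LoRA rank (from the training tables)
    lora_alpha=16,            # assumed; not published
    lora_dropout=0.0,         # assumed; not published
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```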
📚 Dataset
CPT Dataset
- 3.4 million Pakistani Pashto documents
- Sources: news, books, religious texts, Wikipedia, web crawl
- Custom cleaned with Pashto-specific Unicode normalization
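The cleaning pipeline is not published; a minimal sketch of the kind of character folding commonly applied when normalizing Pashto/Urdu text, where visually identical Arabic codepoints are mapped to their Pashto equivalents (the mapping below is illustrative, not the actual pipeline):

```python
import unicodedata

# Hypothetical mapping: Arabic letter variants folded to the
# codepoints conventionally used for Pashto/Urdu.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # Arabic kaf  -> keheh (ک)
    "\u064A": "\u06CC",  # Arabic yeh  -> Farsi yeh (ی)
    "\u0649": "\u06CC",  # alef maksura -> Farsi yeh
})

def normalize_pashto(text: str) -> str:
    """Apply Unicode NFC normalization, then fold Arabic variants."""
    return unicodedata.normalize("NFC", text).translate(CHAR_MAP)

# Arabic kaf folded to the Pashto/Urdu kaf: كتاب -> کتاب
print(normalize_pashto("\u0643\u062a\u0627\u0628"))
```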
SFT Dataset — 126,519 pairs:
| Type | Description |
|---|---|
| Pashto → Pashto | Questions and answers in pure Peshawari Pashto |
| English → Pashto | English instructions with Pashto responses |
| Urdu → Pashto | Urdu instructions with Pashto responses |
| Conversation | Natural dialogue in Peshawari dialect |
| Islamic topics | Religious knowledge in Pashto |
| Creative writing | Pashto poetry, stories, descriptions |
| Grammar | Pashto language correction and explanation |
| Translation | English to Pashto translation pairs |
| Cultural | Pashtunwali, traditions, history, geography |
🗣️ Example Outputs
Pashto input:

```
Prompt : د پیښور تاریخ راته ووایه
Output : پیښور د نړۍ یو له تر ټولو پخوانیو ښارونو څخه دی چې تاریخ یې زرګونه
کاله پخوا ته رسیږي. د دې ښار پخوانی نوم پوشکلاوتي او بیا پروشاپور و.
د پیښور د بالاحصار کلا د مغلو او سکهانو د وخت یادګار دی...
```

English input:

```
Prompt : Tell me about Pashtunwali
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او سپېڅلی ضابطه اخلاق ده چې د دوی
ټولنیز ژوند یې تنظیم کړی دی. په دې کې مېلمه پالنه، غیرت، او د کمزورو
ساتنه تر ټولو لوی ارزښتونه ګڼل کېږي...
```

Urdu input:

```
Prompt : پشتونوالی کیا ہے؟
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او زرین ضابطه اخلاق ده...
```
👤 Author
Junaid Aslam
- HuggingFace: junaid008
- Built independently as a contribution to Pashto NLP
📜 License
Apache 2.0 — free to use, modify, and distribute with attribution.
🤝 Citation
```bibtex
@misc{qehwa-pashto-llm,
  author    = {Junaid Aslam},
  title     = {Qehwa — Pashto's First LLM},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/junaid008/qehwa-pashto-llm}
}
```