Model Details

Model Description

This model provides a compact representation model for Persian text, obtained by fine-tuning the ALBERT base architecture on the high-quality Persian Naab corpus with ZWNJ-aware (zero-width non-joiner, U+200C) tokenization. The model can be used for tasks such as:

  • Masked token prediction (fill-mask) in Persian text
  • Feature extraction / embedding generation for downstream tasks such as classification, ranking, and clustering (see the sketch after this list)
  • Pre-training backbone for further fine-tuning on task-specific Persian NLP tasks
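
For the feature-extraction use case, the checkpoint can be loaded without the MLM head and its hidden states pooled into sentence vectors. A minimal sketch, assuming mean pooling over non-padding tokens (the example sentences and the pooling strategy are illustrative choices, not part of the model):

import torch
from transformers import AlbertTokenizer, AutoModel

tokenizer = AlbertTokenizer.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")
encoder = AutoModel.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")

# "I go to school.", "The weather is nice today."
sentences = ["من به مدرسه می‌روم.", "هوا امروز خوب است."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over real tokens only, using the attention mask
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # one vector per sentence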

How to Use

import torch
from transformers import AlbertTokenizer, AutoModelForMaskedLM

tokenizer = AlbertTokenizer.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")
model = AutoModelForMaskedLM.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")

# Example usage: fill-mask
input_text = "من به مدرسه [MASK] می‌روم."  # "I go to school." with one masked token
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Pick the highest-scoring token at the masked position and decode it
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
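
Alternatively, the high-level pipeline API handles masking and decoding in one call:

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")
for candidate in fill_mask("من به مدرسه [MASK] می‌روم."):
    print(candidate["token_str"], candidate["score"])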

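For the pre-training-backbone use case, the checkpoint can be loaded under a task-specific head and fine-tuned as usual. A minimal sketch, assuming a hypothetical three-class classification task (num_labels is a placeholder, not something the model prescribes):

from transformers import AutoModelForSequenceClassification

# The classification head is freshly initialized and must be fine-tuned,
# e.g. with the Trainer API or a custom PyTorch loop.
classifier = AutoModelForSequenceClassification.from_pretrained(
    "shekar-ai/albert-base-v2-persian-zwnj-naab-mlm",
    num_labels=3,  # placeholder for a hypothetical downstream task
)
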
Citation

BibTeX:

@article{Amirivojdan2025Shekar,
  author  = {Amirivojdan, Ahmad},
  doi     = {10.21105/joss.09128},
  journal = {Journal of Open Source Software},
  month   = oct,
  number  = {114},
  pages   = {9128},
  title   = {{Shekar: A Python Toolkit for Persian Natural Language Processing}},
  url     = {https://joss.theoj.org/papers/10.21105/joss.09128},
  volume  = {10},
  year    = {2025}
}