## Model Details
- Developed by: Ahmad Amirivojdan
- Language(s) (NLP): Persian
- Repository: Shekar - Open Source Persian NLP Toolkit
- Paper: Shekar: A Python Toolkit for Persian Natural Language Processing
- License: MIT
## Model Description
This model is a compact representation model for Persian text, built by fine-tuning the ALBERT base architecture on Naab, a high-quality Persian corpus, with ZWNJ-aware tokenization. It can be used for tasks such as:
- Masked token prediction (fill-mask) in Persian text
- Feature extraction / embedding generation for downstream tasks such as classification, ranking, and clustering (see the embedding sketch under "How to Use")
- Backbone for further fine-tuning on downstream Persian NLP tasks
## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")
model = AutoModelForMaskedLM.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")

# Example usage: fill-mask. The sentence reads roughly "I go to school [MASK]."
# Note the ZWNJ in "می‌روم", which this model's tokenizer is trained to handle.
input_text = "من به مدرسه [MASK] می‌روم."
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Locate the [MASK] position and decode the top-5 predicted tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```
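Beyond fill-mask, the model can also serve as an encoder for feature extraction. The sketch below is one common recipe rather than an API documented by this card: it loads the encoder with `AutoModel` and mean-pools the last hidden state into fixed-size sentence embeddings; the `embed` helper and the pooling choice are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")
encoder = AutoModel.from_pretrained("shekar-ai/albert-base-v2-persian-zwnj-naab-mlm")

def embed(sentences):
    # Hypothetical helper: mean-pool the last hidden state over non-padding tokens.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# "Hello world", "Persian natural language processing"
vectors = embed(["سلام دنیا", "پردازش زبان طبیعی فارسی"])
print(vectors.shape)  # e.g. torch.Size([2, 768]) for an ALBERT-base encoder
```

Mean pooling is a simple default; CLS pooling or task-specific fine-tuning often works better for classification and ranking.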
## Citation
BibTeX:
```bibtex
@article{Amirivojdan2025Shekar,
  author  = {Amirivojdan, Ahmad},
  doi     = {10.21105/joss.09128},
  journal = {Journal of Open Source Software},
  month   = oct,
  number  = {114},
  pages   = {9128},
  title   = {{Shekar: A Python Toolkit for Persian Natural Language Processing}},
  url     = {https://joss.theoj.org/papers/10.21105/joss.09128},
  volume  = {10},
  year    = {2025}
}
```