---
license: mit
datasets:
- yiyic/oscar_arb_Arab_train
- yiyic/oscar_arb_Arab_test
- SaiedAlshahrani/Arabic_Wikipedia_20230101_bots
- ClusterlabAi/101_billion_arabic_words_dataset
language:
- ar
metrics:
- f1
- exact_match
base_model:
- answerdotai/ModernBERT-base
tags:
- Embedding
- Arabic
- Sentiment_Analysis
- QA
- NER
---
# Model Card: ModernAraBERT

## Summary

- Arabic encoder adapted from `answerdotai/ModernBERT-base` via continued pretraining on ~9.8 GB of Arabic corpora.
- Strong results on sentiment analysis, NER (Macro-F1), and QA exact match compared with AraBERT, mBERT, and MARBERT.
- License: MIT · Paper: LREC 2026 · Hub: `gizadatateam/ModernAraBERT`
## Intended Uses

- Masked language modeling, feature extraction, and transfer learning for Arabic tasks.
- Downstream: sentiment analysis, NER, extractive QA, and general classification/labeling.
## How to use

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "gizadatateam/ModernAraBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
```
## Training data and recipe (brief)

- Corpora: OSIAN, Arabic Billion Words, Arabic Wikipedia, OSCAR Arabic
- Tokenizer: ModernBERT vocabulary extended with 80K Arabic tokens
- Objective: MLM (3 epochs; sequence length 128→512)
- Hardware: A100 40 GB; framework: PyTorch + Transformers + Accelerate
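The MLM objective above selects a fraction of input tokens and trains the model to recover them. As a rough illustration of the standard BERT-style scheme (15% of positions selected; of those, 80% replaced by a mask token, 10% by a random token, 10% left unchanged), here is a minimal sketch. It assumes plain token lists for clarity; the actual pipeline uses the extended ModernBERT tokenizer and the Transformers data collator.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mlm_prob=0.15, seed=0):
    """BERT-style masking sketch. Returns (masked_tokens, labels):
    labels holds the original token at selected positions and -100
    (ignored by the loss) everywhere else."""
    rng = random.Random(seed)
    masked, labels = [], []
    vocab = list(set(tokens))
    for tok in tokens:
        if rng.random() < mlm_prob:
            labels.append(tok)  # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)          # 80%: mask token
            elif r < 0.9:
                masked.append(rng.choice(vocab))   # 10%: random token
            else:
                masked.append(tok)                 # 10%: unchanged
        else:
            labels.append(-100)  # position does not contribute to the loss
            masked.append(tok)
    return masked, labels
```

In practice this is handled by `DataCollatorForLanguageModeling(mlm_probability=0.15)` from Transformers; the sketch only makes the selection logic explicit.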
## Evaluation (from paper)

### Sentiment Analysis — Macro-F1 (%)
| Model | LABR | HARD | AJGT |
|---|---|---|---|
| AraBERTv1 | 45.35 | 72.65 | 58.01 |
| AraBERTv2 | 45.79 | 67.10 | 53.59 |
| mBERT | 44.18 | 71.70 | 61.55 |
| MARBERT | 45.54 | 67.39 | 60.63 |
| ModernAraBERT | 56.45 | 89.37 | 70.54 |
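Macro-F1, used in the SA and NER tables, is the unweighted mean of per-class F1 scores, so minority classes count as much as majority ones. A minimal pure-Python sketch of the metric (illustrative only; the reported numbers come from standard evaluation tooling such as scikit-learn's `f1_score` with `average="macro"`):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```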
### NER — Macro-F1 (%)
| Model | Macro-F1 |
|---|---|
| AraBERTv1 | 13.46 |
| AraBERTv2 | 16.77 |
| mBERT | 12.15 |
| MARBERT | 7.42 |
| ModernAraBERT | 28.23 |
### QA (ARCD test) — Exact Match (%)
| Model | EM |
|---|---|
| AraBERT | 25.36 |
| AraBERTv2 | 26.08 |
| mBERT | 25.12 |
| MARBERT | 23.58 |
| ModernAraBERT | 27.10 |
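Exact match (EM) counts a prediction as correct only if it equals a gold answer after normalization. A minimal sketch, with an illustrative Arabic-aware normalization (stripping diacritics, punctuation, and extra whitespace); the official ARCD/SQuAD evaluation script may normalize differently:

```python
import re

def normalize(text):
    """Illustrative normalization: drop Arabic diacritics (tashkeel),
    punctuation, and redundant whitespace before comparison."""
    text = re.sub(r"[\u064B-\u0652]", "", text)  # remove harakat marks
    text = re.sub(r"[^\w\s]", "", text)          # remove punctuation
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1 if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return int(any(pred == normalize(g) for g in gold_answers))
```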
## Citation

```bibtex
@inproceedings{<paper_id>,
  title={Efficient Adaptation of English Language Models for Low-Resource and Morphologically Rich Languages: The Case of Arabic},
  author={Maher, Eldamaty, Ashraf, ElShawi, Mostafa},
  booktitle={Proceedings of <conference_name>},
  year={2025},
  organization={<conference_name>}
}
```