---
license: mit
datasets:
  - yiyic/oscar_arb_Arab_train
  - yiyic/oscar_arb_Arab_test
  - SaiedAlshahrani/Arabic_Wikipedia_20230101_bots
  - ClusterlabAi/101_billion_arabic_words_dataset
language:
  - ar
metrics:
  - f1
  - exact_match
base_model:
  - answerdotai/ModernBERT-base
tags:
  - Embedding
  - Arabic
  - Sentiment_Analysis
  - QA
  - NER
---

# Model Card: ModernAraBERT

## Summary

- Arabic encoder adapted from `answerdotai/ModernBERT-base` via continued pretraining on ~9.8 GB of Arabic corpora.
- Outperforms AraBERT, mBERT, and MARBERT on sentiment analysis (Macro-F1), NER (Macro-F1), and extractive QA (exact match); see Evaluation below.
- License: MIT · Paper: LREC 2026 · Hub: `gizadatateam/ModernAraBERT`

## Intended Uses

- Masked language modeling, feature extraction, and transfer learning for Arabic.
- Downstream tasks: sentiment analysis, named entity recognition, extractive question answering, and general text classification/labeling.
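For the feature-extraction use, one common recipe is to mean-pool the encoder's last hidden states over non-padding tokens. The `mean_pool` and `embed` helpers below are illustrative choices of this sketch, not an API provided by the model:

```python
import torch
from transformers import AutoModel, AutoTokenizer

def mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def embed(sentences, name="gizadatateam/ModernAraBERT"):
    """Encode a list of sentences into one fixed-size vector each."""
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
    return mean_pool(hidden, batch["attention_mask"])
```

The resulting vectors can be compared with cosine similarity for retrieval or clustering.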

## How to use

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "gizadatateam/ModernAraBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
```
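To query the masked-LM head directly, you can locate the mask position and take the argmax over the vocabulary. The `top_mask_prediction` helper and the example sentence are illustrative, not part of the model card:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def top_mask_prediction(logits, input_ids, mask_token_id):
    """Return the highest-scoring token id at the first mask position."""
    pos = (input_ids == mask_token_id).nonzero()[0, 1]
    return logits[0, pos].argmax().item()

def fill_one_mask(template, name="gizadatateam/ModernAraBERT"):
    """Fill a single `{mask}` slot in `template` with the model's top token."""
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)
    text = template.format(mask=tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    top_id = top_mask_prediction(logits, inputs["input_ids"], tokenizer.mask_token_id)
    return tokenizer.decode([top_id]).strip()

# e.g. fill_one_mask("عاصمة مصر هي {mask}.")
```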

## Training data and recipe (brief)

- Corpora: OSIAN, Arabic Billion Words, Arabic Wikipedia, OSCAR Arabic
- Tokenizer: ModernBERT vocabulary extended with 80K Arabic tokens
- Objective: masked language modeling (3 epochs; sequence length staged from 128 to 512)
- Hardware: NVIDIA A100 40 GB; framework: PyTorch + Transformers + Accelerate

## Evaluation (from paper)

### Sentiment Analysis (Macro-F1, %)

| Model | LABR | HARD | AJGT |
|---|---|---|---|
| AraBERTv1 | 45.35 | 72.65 | 58.01 |
| AraBERTv2 | 45.79 | 67.10 | 53.59 |
| mBERT | 44.18 | 71.70 | 61.55 |
| MARBERT | 45.54 | 67.39 | 60.63 |
| ModernAraBERT | 56.45 | 89.37 | 70.54 |

### NER (Macro-F1, %)

| Model | Macro-F1 |
|---|---|
| AraBERTv1 | 13.46 |
| AraBERTv2 | 16.77 |
| mBERT | 12.15 |
| MARBERT | 7.42 |
| ModernAraBERT | 28.23 |

### QA on ARCD test (EM, %)

| Model | EM |
|---|---|
| AraBERT | 25.36 |
| AraBERTv2 | 26.08 |
| mBERT | 25.12 |
| MARBERT | 23.58 |
| ModernAraBERT | 27.10 |
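Exact match (EM) counts a prediction as correct only if it equals a gold answer after normalization. A minimal sketch, assuming a simple normalization scheme (the paper's exact scheme may differ):

```python
import re

def normalize(text: str) -> str:
    """Strip tatweel, punctuation, and extra whitespace (illustrative choices)."""
    text = text.replace("\u0640", "")    # tatweel (kashida)
    text = re.sub(r"[^\w\s]", "", text)  # punctuation (\w matches Arabic letters)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers) -> int:
    """EM = 1 if the normalized prediction equals any normalized gold answer."""
    return int(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

For example, `exact_match("القاهرة.", ["القاهرة"])` returns `1` because the trailing period is stripped before comparison.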

## Citation

```bibtex
@inproceedings{<paper_id>,
  title={Efficient Adaptation of English Language Models for Low-Resource and Morphologically Rich Languages: The Case of Arabic},
  author={Maher, Eldamaty, Ashraf, ElShawi, Mostafa},
  booktitle={Proceedings of <conference_name>},
  year={2025},
  organization={<conference_name>}
}
```