---
language:
- ar
pipeline_tag: audio-classification
library_name: speechbrain
tags:
- DIalectID
- ADI
- ADI-17
- speechbrain
- Identification
- pytorch
- embeddings
datasets:
- ADI-17
metrics:
- f1
- precision
- recall
- accuracy
arxiv: 2511.10090
---

## Install Requirements

### Clone the ADI-20 GitHub repository:
```bash
git clone https://github.com/elyadata/ADI-20
cd ADI-20
pip install -r requirements.txt
```

### Note on SpeechBrain
The `requirements.txt` file in the ADI-20 GitHub repository installs the PyPI release of SpeechBrain. Alternatively, you can install SpeechBrain from source with the following command:

```bash
pip install git+https://github.com/speechbrain/speechbrain.git@develop
```
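After installing, it can be useful to confirm which SpeechBrain build is active in the current environment. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and not part of the ADI-20 repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the SpeechBrain version found in the active environment, if any.
sb_version = installed_version("speechbrain")
print(f"speechbrain: {sb_version or 'not installed'}")
```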

## Perform Arabic Dialect Identification
```python
from inference.classifier_attention_pooling import WhisperDialectClassifier

dialect_id = WhisperDialectClassifier.from_hparams(
    source="Elyadata/ADI-whisper-ADI17",
    hparams_file="hyperparams.yaml",
    savedir="pretrained_DID",
    run_opts={"device": "cuda"},  # GPU recommended; use "cpu" if no GPU is available.
)

out_prob, score, index, text_lab = dialect_id.classify_file("your_file.wav")
print(f"Predicted dialect: {text_lab[0]}")
print("-" * 15)
print(f"Dialect index: {index}")
print(f"Score: {score}")
print(f"Output log probs: {out_prob}")
print("-" * 15)
```
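The example above prints the raw log probabilities. To inspect the runner-up dialects as well, the scores can be converted to probabilities and ranked. The following is a minimal sketch in plain Python: the helper `top_k_dialects`, the placeholder labels, and the numeric values are all made up for illustration and do not come from the model.

```python
import math


def top_k_dialects(log_probs, labels, k=3):
    """Rank labels by probability, given one log probability per label."""
    if len(log_probs) != len(labels):
        raise ValueError("log_probs and labels must have the same length")
    # Subtract the maximum before exponentiating for numerical stability,
    # then renormalise so the probabilities sum to 1.
    max_lp = max(log_probs)
    exps = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores and placeholder labels, for illustration only.
example_log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
example_labels = ["dialect_a", "dialect_b", "dialect_c"]

for label, prob in top_k_dialects(example_log_probs, example_labels, k=2):
    print(f"{label}: {prob:.2f}")
```

With the real classifier, and assuming `out_prob` is a tensor holding one row of log probabilities per input file, you would pass something like `out_prob[0].tolist()` together with the label list in the model's index order.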

## NADI 2025
We also used the [ADI-20 version of this model](https://huggingface.co/Elyadata/ADI-whisper-ADI20) for the dialect identification task of the [NADI 2025](https://nadi.dlnlp.ai/2025/) challenge, where it ranked first:

| Rank | Codabench Username                                   | Accuracy | Cost   |
|------|------------------------------------------------------|----------|--------|
| 🥇   | harounelleuch (***ADI-20 version of this model***)   | 0.7983   | 0.1788 |
| 🥈   | badr_alabsi                                          | 0.7640   | 0.2265 |
| 🥉   | rafiulbiswas                                         | 0.616    | 0.3068 |
| 4    | gahmed92                                             | 0.612    | 0.3477 |
| 5    | ADI Baseline                                         | 0.6109   | 0.3422 |

For more information on how we used the model, you can refer to:
- Our system paper: [arXiv](https://arxiv.org/abs/2511.10090), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.105/)
- NADI findings paper: [arXiv](https://arxiv.org/abs/2509.02038), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.99/)



## Citations
If you use this work, please cite:
```bibtex
@inproceedings{elleuch25_interspeech,
  title     = {{ADI-20: Arabic Dialect Identification dataset and models}},
  author    = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {2775--2779},
  doi       = {10.21437/Interspeech.2025-884},
  issn      = {2958-1796},
}

@inproceedings{elleuch-etal-2025-elyadata,
    title = "{ELYADATA} {\&} {LIA} at {NADI} 2025: {ASR} and {ADI} Subtasks",
    author = "Elleuch, Haroun  and
      Saidi, Youssef  and
      Mdhaffar, Salima  and
      Est{\`e}ve, Yannick  and
      Bougares, Fethi",
    booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.arabicnlp-sharedtasks.105/",
    doi = "10.18653/v1/2025.arabicnlp-sharedtasks.105",
    pages = "762--766",
    ISBN = "979-8-89176-356-2",
}
```