---
language:
- ar
pipeline_tag: audio-classification
library_name: speechbrain
tags:
- DIalectID
- ADI
- ADI-17
- speechbrain
- Identification
- pytorch
- embeddings
datasets:
- ADI-17
metrics:
- f1
- precision
- recall
- accuracy
arxiv: 2511.10090
---

## Install Requirements

### Clone the ADI-20 GitHub repository:

```bash
git clone https://github.com/elyadata/ADI-20
cd ADI-20
pip install -r requirements.txt
```

### Note on SpeechBrain

The `requirements.txt` in the ADI-20 GitHub repository installs the PyPI release of SpeechBrain. You may instead install SpeechBrain from source with the following command:

```bash
pip install git+https://github.com/speechbrain/speechbrain.git@develop
```

## Perform Arabic Dialect Identification

```python
from inference.classifier_attention_pooling import WhisperDialectClassifier

# Download the pretrained model and build the classifier.
dialect_id = WhisperDialectClassifier.from_hparams(
    source="Elyadata/ADI-whisper-ADI17",
    hparams_file="hyperparams.yaml",
    savedir="pretrained_DID",
    run_opts={"device": "cuda"},  # If using a GPU (recommended).
)

# Classify a single audio file.
out_prob, score, index, text_lab = dialect_id.classify_file("your_file.wav")

print(f"Predicted dialect: {text_lab[0]}")
print("-" * 15)
print(f"Dialect index: {index}")
print(f"Score: {score}")
print(f"Output log probs: {out_prob}")
print("-" * 15)
```

## NADI 2025

We also used the [ADI-20 version of this model](https://huggingface.co/Elyadata/ADI-whisper-ADI20) for the dialect identification task of the [NADI 2025](https://nadi.dlnlp.ai/2025/) challenge, where it ranked first:

| Rank | Codabench Username | Accuracy | Cost |
|------|---------------------|----------|--------|
| 🥇 | harounelleuch (***this model***) | 0.7983 | 0.1788 |
| 🥈 | badr_alabsi | 0.7640 | 0.2265 |
| 🥉 | rafiulbiswas | 0.616 | 0.3068 |
| 4 | gahmed92 | 0.612 | 0.3477 |
| 5 | ADI Baseline | 0.6109 | 0.3422 |

For more information on how we used the model, refer to:

- Our system paper: [arXiv](https://arxiv.org/abs/2511.10090), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.105/)
- NADI findings paper: [arXiv](https://arxiv.org/abs/2509.02038), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.99/)

## Citations

If you use this work, please cite:

```bibtex
@inproceedings{elleuch25_interspeech,
  title     = {{ADI-20: Arabic Dialect Identification dataset and models}},
  author    = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {2775--2779},
  doi       = {10.21437/Interspeech.2025-884},
  issn      = {2958-1796},
}

@inproceedings{elleuch-etal-2025-elyadata,
  title     = "{ELYADATA} {\&} {LIA} at {NADI} 2025: {ASR} and {ADI} Subtasks",
  author    = "Elleuch, Haroun and Saidi, Youssef and Mdhaffar, Salima and Est{\`e}ve, Yannick and Bougares, Fethi",
  booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks",
  month     = nov,
  year      = "2025",
  address   = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2025.arabicnlp-sharedtasks.105/",
  doi       = "10.18653/v1/2025.arabicnlp-sharedtasks.105",
  pages     = "762--766",
  ISBN      = "979-8-89176-356-2",
}
```
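
## Batch Classification Sketch

The single-file call shown under "Perform Arabic Dialect Identification" can be extended to a whole folder of recordings. The helper below is a minimal sketch, not part of the ADI-20 repository: `classify_directory` is a hypothetical name, and it assumes only that the classifier exposes `classify_file(path)` returning the four values `(out_prob, score, index, text_lab)` used in that example.

```python
from pathlib import Path


def classify_directory(classifier, audio_dir, pattern="*.wav"):
    """Return {file name: predicted dialect label} for each matching file.

    `classifier` is any object with a `classify_file(path)` method that
    returns `(out_prob, score, index, text_lab)`, as in the example above.
    This helper is hypothetical and not part of the ADI-20 repository.
    """
    results = {}
    # Sort the paths so the output order is deterministic.
    for path in sorted(Path(audio_dir).glob(pattern)):
        _out_prob, _score, _index, text_lab = classifier.classify_file(str(path))
        # Keep only the top label, mirroring `text_lab[0]` in the example.
        results[path.name] = text_lab[0]
    return results
```

With the `dialect_id` object created earlier, a call such as `classify_directory(dialect_id, "recordings/")` (the folder name is a placeholder) would return one predicted label per `.wav` file in that folder.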