
AfroLID

AfroLID is a neural language identification (LID) toolkit for 517 African languages and varieties. It is trained on a manually curated, multi-domain web dataset spanning 14 language families and five orthographic systems. AfroLID is described in this paper: AfroLID: A Neural Language Identification Tool for African Languages.

What's New in AfroLID v1.5?

  • Fine-tuned on SERENGETI, a massively multilingual language model covering 517 African languages and language varieties.
  • Enhanced model performance, improving macro-F1 from 95.95 to 97.41.
  • Built on Hugging Face Transformers for seamless integration.
  • Optimized for easy use with the Hugging Face pipeline.
  • Better efficiency and accuracy, making it more robust for African language identification.
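
For readers unfamiliar with the metric quoted above, macro-F1 is the unweighted mean of the per-language F1 scores, so every language counts equally regardless of how many test samples it has. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions; the evaluation setup behind the reported 95.95 and 97.41 figures is not described in this card.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up predictions (labels are illustrative only)
gold = ["dip", "dip", "yor", "swh"]
pred = ["dip", "yor", "yor", "swh"]
print(f"macro-F1: {macro_f1(gold, pred) * 100:.2f}")
```

Because each label contributes equally, a model that does well only on high-resource languages scores poorly on macro-F1, which is why it is a common choice for evaluating LID across many low-resource languages.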

How to use AfroLID v1.5?

from transformers import pipeline


afrolid = pipeline("text-classification", model='UBC-NLP/afrolid_1.5')

input_text="6Acï looi aya në wuöt dït kɔ̈k yiic ku lɔ wuöt tɔ̈u tëmec piny de Manatha ku Eparaim ku Thimion , ku ɣään mec tɔ̈u të lɔ rut cï Naptali"

result = afrolid(input_text)

# Extract the label and score from the first result
language = result[0]['label']
score = result[0]['score']

print(f"detected language: {language}\tscore: {round(score*100, 2)}")

Output:

detected language: dip	score: 99.99
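
The same pipeline object can also be called with a list of strings, in which case it returns one `{"label", "score"}` dictionary per input. The sketch below wraps that call in a small helper that also hides low-confidence predictions. The helper names `load_afrolid` and `identify_languages` and the `min_score` threshold of 0.9 are hypothetical choices made for illustration; they are not part of AfroLID or Transformers.

```python
def load_afrolid():
    """Load the AfroLID v1.5 pipeline (downloads the model on first use)."""
    from transformers import pipeline
    return pipeline("text-classification", model="UBC-NLP/afrolid_1.5")


def identify_languages(classifier, texts, min_score=0.9):
    """Classify several texts at once.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": str, "score": float} dicts, such as the pipeline above.
    Predictions scoring below `min_score` (an assumed cut-off) get
    language=None instead of a label.
    """
    texts = list(texts)
    results = classifier(texts)
    rows = []
    for text, res in zip(texts, results):
        label = res["label"] if res["score"] >= min_score else None
        rows.append({
            "text": text,
            "language": label,
            "score": round(res["score"] * 100, 2),  # percentage, as above
        })
    return rows
```

A typical call would be `identify_languages(load_afrolid(), [text_a, text_b])`. Passing the classifier in as an argument keeps the helper easy to test with a stand-in function and avoids reloading the model for every batch.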

Supported languages

Please refer to suported-languages

Citation

If you use the AfroLID v1.5 model in a scientific publication, or find the resources in this repository useful, please cite our papers as follows:

AfroLID's Paper

@inproceedings{adebara2022afrolid,
    title = "{AfroLID}: A Neural Language Identification Tool for {A}frican Languages",
    author = "Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = dec,
    year = "2022",
}

Serengeti's Paper

@inproceedings{adebara-etal-2023-serengeti,
    title = "{SERENGETI}: Massively Multilingual Language Models for {A}frica",
    author = "Adebara, Ife  and
      Elmadany, AbdelRahim  and
      Abdul-Mageed, Muhammad  and
      Alcoba Inciarte, Alcides",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.97",
    doi = "10.18653/v1/2023.findings-acl.97",
    pages = "1498--1537",
}