---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---
# Moroccan Darija to English Translation Model (Fine-Tuned mBART)
This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs, adapted specifically for Moroccan Darija to English translation. It was fine-tuned on a dataset of Darija–English translation pairs to produce accurate translations of Darija text.
## Model Overview
- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: The model was fine-tuned on a custom dataset containing Moroccan Darija to English translation pairs.
## Model Details
The mBART model is a transformer-based sequence-to-sequence model, designed to handle multiple languages. It is particularly useful for tasks such as translation, text generation, and summarization.
For this specific task, the model has been fine-tuned to accurately translate text from **Moroccan Darija** to **English**, making it suitable for applications involving the translation of conversational and informal text from Morocco.
## Intended Use
This model can be used to:
- Translate sentences from Moroccan Darija to English.
## How to Use the Model
You can easily load the model and tokenizer using the Hugging Face `transformers` library. Here's an example:
```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and tokenizer
model_name = "echarif/mBART_for_darija_transaltion"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translated output
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```
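For larger workloads it is usually faster to translate several sentences in one forward pass. The helper below is a minimal sketch of batch translation: the `translate_batch` name and the beam-search settings are illustrative choices, not part of this model's published API.

```python
def translate_batch(sentences, model, tokenizer, num_beams=5, max_length=128):
    """Translate a list of Moroccan Darija sentences to English in one batch.

    The beam-search defaults here are illustrative; tune them for your use case.
    """
    # Tokenize all sentences together, padding to the longest one in the batch
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Beam search generally yields better translations than greedy decoding
    generated = model.generate(
        **inputs,
        num_beams=num_beams,
        max_length=max_length,
        early_stopping=True,
    )

    # Decode every sequence in the batch, dropping special tokens
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in the snippet above, calling `translate_batch(["...", "..."], model, tokenizer)` returns one English translation per input sentence.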