---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---
# Moroccan Darija to English Translation Model (Fine-Tuned mBART)

This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs. It has been fine-tuned on a Moroccan Darija dataset to translate text from Moroccan Darija to English.
## Model Overview

- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: The model was fine-tuned on a custom dataset containing Moroccan Darija to English translation pairs.
## Model Details

mBART is a transformer-based sequence-to-sequence model designed to handle multiple languages. It is particularly useful for tasks such as translation, text generation, and summarization.

For this task, the model has been fine-tuned to translate text from **Moroccan Darija** to **English**, making it well suited to applications that involve translating conversational and informal text from Morocco.
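To make the multilingual setup more concrete, the base mBART checkpoint marks the source and target languages with dedicated language-code tokens. The snippet below is a minimal sketch against the base `facebook/mbart-large-cc25` tokenizer (not this fine-tuned checkpoint), assuming Darija is mapped to the Arabic code `ar_AR` and English to `en_XX`; whether the fine-tuned model depends on these codes is determined by how it was trained.

```python
from transformers import MBartTokenizer

# Minimal sketch of how the *base* mBART checkpoint marks languages.
# Assumption: Darija is mapped to the Arabic code "ar_AR"; the fine-tuned
# checkpoint may or may not rely on these codes.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="ar_AR", tgt_lang="en_XX"
)

encoded = tokenizer("your Moroccan Darija sentence here", return_tensors="pt")

# The source sentence is suffixed with </s> followed by its language code.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()))

# Each language code has a dedicated token id; the target code ("en_XX")
# tells the decoder which language to generate in.
print(tokenizer.lang_code_to_id["en_XX"])
```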
## Intended Use

This model can be used to:

- Translate sentences from Moroccan Darija to English (see the quick pipeline sketch below).
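For quick experiments, the `transformers` translation pipeline can wrap the same checkpoint. The snippet below is a sketch, assuming the tokenizer shipped with the model exposes the standard mBART language codes (`ar_AR` for the Darija side, `en_XX` for English); adjust or drop those arguments if they do not match how the model was fine-tuned.

```python
from transformers import pipeline

# Sketch only: assumes the checkpoint's tokenizer understands the
# standard mBART language codes (ar_AR / en_XX).
translator = pipeline(
    "translation",
    model="echarif/mBART_for_darija_transaltion",
    src_lang="ar_AR",
    tgt_lang="en_XX",
)

print(translator("your Moroccan Darija sentence here")[0]["translation_text"])
```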
## How to Use the Model

You can easily load the model and tokenizer using the Hugging Face `transformers` library. Here's an example:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the pre-trained model and tokenizer
model_name = 'echarif/mBART_for_darija_transaltion'
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translated output
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```