---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---
# Moroccan Darija to English Translation Model (Fine-Tuned mBART)

This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs. It has been fine-tuned on a Moroccan Darija dataset to translate text from Moroccan Darija to English.
## Model Overview

- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: The model was fine-tuned on a custom dataset containing Moroccan Darija to English translation pairs.
## Model Details

mBART is a transformer-based sequence-to-sequence model designed to handle multiple languages. It is particularly useful for tasks such as translation, text generation, and summarization.

For this task, the model has been fine-tuned to translate text from **Moroccan Darija** to **English**, making it well suited to applications that involve translating conversational and informal text from Morocco.
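To make the multilingual setup more concrete, the base mBART checkpoint marks the source and target languages with dedicated language-code tokens. The snippet below is a minimal sketch against the base `facebook/mbart-large-cc25` tokenizer (not this fine-tuned checkpoint), assuming Darija is mapped to the Arabic code `ar_AR` and English to `en_XX`; whether the fine-tuned model depends on these codes is determined by how it was trained.

```python
from transformers import MBartTokenizer

# Minimal sketch of how the *base* mBART checkpoint marks languages.
# Assumption: Darija is mapped to the Arabic code "ar_AR"; the fine-tuned
# checkpoint may or may not rely on these codes.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="ar_AR", tgt_lang="en_XX"
)

encoded = tokenizer("your Moroccan Darija sentence here", return_tensors="pt")

# The source sentence is suffixed with </s> followed by its language code.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()))

# Each language code has a dedicated token id; the target code ("en_XX")
# tells the decoder which language to generate in.
print(tokenizer.lang_code_to_id["en_XX"])
```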
## Intended Use

This model can be used to:

- Translate sentences from Moroccan Darija to English (see the quick pipeline sketch below).
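For quick experiments, the `transformers` translation pipeline can wrap the same checkpoint. The snippet below is a sketch, assuming the tokenizer shipped with the model exposes the standard mBART language codes (`ar_AR` for the Darija side, `en_XX` for English); adjust or drop those arguments if they do not match how the model was fine-tuned.

```python
from transformers import pipeline

# Sketch only: assumes the checkpoint's tokenizer understands the
# standard mBART language codes (ar_AR / en_XX).
translator = pipeline(
    "translation",
    model="echarif/mBART_for_darija_transaltion",
    src_lang="ar_AR",
    tgt_lang="en_XX",
)

print(translator("your Moroccan Darija sentence here")[0]["translation_text"])
```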
## How to Use the Model

You can easily load the model and tokenizer using the Hugging Face `transformers` library. Here's an example:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the pre-trained model and tokenizer
model_name = 'echarif/mBART_for_darija_transaltion'
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translated output
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```