We release the suite of models trained as part of our work on the scaling laws of decoder-only models for multilingual machine translation. This work was published at WMT 2024 and is available here.
These models have been trained on a mixture of general and financial sentences covering 11 language directions. They support 8 languages (English, French, German, Italian, Spanish, Dutch, Swedish and Portuguese) and 9 domains (general plus 8 financial subdomains). They are not tailored for document-level translation.
A running demo of these models is available on our dedicated space.
## Evaluation
The table below details the performance of our models on general-domain translation.
| Model | BLEU | COMET | COMET-Kiwi | 
|---|---|---|---|
| FinTranslate-70M | 29.62 | 81.31 | 80.72 | 
| FinTranslate-160M | 32.43 | 84.00 | 83.45 | 
| FinTranslate-410M | 33.60 | 84.81 | 84.14 | 
| FinTranslate-Bronze | 34.08 | 85.10 | 84.35 | 
| FinTranslate-Silver | 34.42 | 85.10 | 84.33 | 
| FinTranslate-Gold | 36.07 | 85.88 | 84.82 | 
| Llama 3.1 8B | 30.43 | 84.82 | 84.47 | 
| Mistral 7B | 23.26 | 80.08 | 82.29 | 
| Tower 7B | 33.50 | 85.91 | 85.02 | 
The table below details the performance of our models on financial-domain translation.
| Model | BLEU | COMET | COMET-Kiwi | 
|---|---|---|---|
| FinTranslate-70M | 44.63 | 86.95 | 80.88 | 
| FinTranslate-160M | 49.02 | 88.27 | 81.80 | 
| FinTranslate-410M | 50.85 | 88.64 | 81.73 | 
| FinTranslate-Bronze | 52.00 | 88.85 | 81.71 | 
| FinTranslate-Silver | 53.28 | 89.98 | 81.61 | 
| FinTranslate-Gold | 58.34 | 89.62 | 81.35 | 
| Llama 3.1 8B | 34.99 | 84.42 | 81.75 | 
| Mistral 7B | 38.93 | 76.52 | 76.17 | 
| Tower 7B | 38.93 | 86.49 | 82.66 | 
## How to use it
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Supported languages and domains. Domain display names are mapped to the
# short identifiers used in the models' special domain tokens.
LANGUAGES = ["en", "de", "es", "fr", "it", "nl", "sv", "pt"]
DOMAINS = {
    "Asset management": "am",
    "Annual report": "ar",
    "Corporate action": "corporateAction",
    "Equity research": "equi",
    "Fund fact sheet": "ffs",
    "Kiid": "kiid",  # Key Investor Information Document
    "Life insurance": "lifeInsurance",
    "Regulatory": "regulatory",
    "General": "general",
}


def language_token(lang):
    return f"<lang_{lang}>"


def domain_token(dom):
    return f"<dom_{dom}>"


def format_input(src, tgt_lang, src_lang, domain):
    assert tgt_lang in LANGUAGES
    tgt_lang_token = language_token(tgt_lang)
    # Please read our paper to understand why the input must be prefixed with <eos>
    base_input = f"<eos>{src}</src>{tgt_lang_token}"
    if src_lang is None:
        return base_input
    else:
        assert src_lang in LANGUAGES
        src_lang_token = language_token(src_lang)
        base_input = f"{base_input}{src_lang_token}"
    if domain is None:
        return base_input
    else:
        # Unknown domain names fall back to the general domain
        domain = DOMAINS.get(domain, "general")
        dom_token = domain_token(domain)
        base_input = f"{base_input}{dom_token}"
    return base_input
```
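For illustration, the helpers above wrap the source text and append the target-language token, followed by the optional source-language and domain tokens:

```python
format_input("Dragon LLM est une entreprise française.", "en", "fr", "General")
# '<eos>Dragon LLM est une entreprise française.</src><lang_en><lang_fr><dom_general>'
```

The model and tokenizer can then be loaded from the Hub and used with the standard `generate` API: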
```python
model_id = "DragonLLM/FinTranslate-410M"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Translate a French sentence into English, using the general domain
source_sentence = "Dragon LLM est une entreprise française spécialisée dans le domaine de l'IA générative."
formatted_sentence = format_input(source_sentence, "en", "fr", "General")

inputs = tokenizer(formatted_sentence, return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, i.e. the translation
input_size = inputs["input_ids"].size(1)
translated_sentence = tokenizer.decode(
    outputs[0, input_size:], skip_special_tokens=True
)
print(translated_sentence)
# Dragon LLM is a French company specialized in the field of generative AI.
```
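The same pipeline applies to the financial subdomains. The helper below is a minimal sketch (the `translate` function and the example sentence are ours, not part of the released code) that wraps the steps above so a domain name such as "Annual report" can be passed directly:

```python
def translate(src, tgt_lang, src_lang=None, domain=None, max_new_tokens=64):
    """Translate a single sentence with FinTranslate (illustrative helper)."""
    prompt = format_input(src, tgt_lang, src_lang, domain)
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the generated continuation, i.e. the translation
    return tokenizer.decode(
        outputs[0, inputs["input_ids"].size(1):], skip_special_tokens=True
    )


# Example: English -> German in the "Annual report" financial subdomain
print(translate(
    "Net revenues increased by 12% compared to the previous fiscal year.",
    tgt_lang="de",
    src_lang="en",
    domain="Annual report",
))
```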
## Citing this work
If you use this model in your work, please cite it as:
```bibtex
@inproceedings{caillaut-etal-2024-scaling,
    title = "Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task",
    author = {Caillaut, Ga{\"e}tan  and
      Nakhl{\'e}, Mariam  and
      Qader, Raheel  and
      Liu, Jingshu  and
      Barth{\'e}lemy, Jean-Gabriel},
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Ninth Conference on Machine Translation",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.wmt-1.124/",
    doi = "10.18653/v1/2024.wmt-1.124",
    pages = "1318--1331"
}
```