QLoRA Model: WMT25-EuroLLM-9B-CPO
This repository contains QLoRA weights for the utter-project/EuroLLM-9B-Instruct model.
This adapter was trained as part of the research for Laniqo's submission to the WMT25 General MT Shared Task.
Paper: TBA
Authors: Kamil Guttmann, Zofia Rostek, Adrian Charkiewicz, Antoni Solarski, Mikołaj Pokrywka, Artur Nowakowski
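A minimal loading sketch, assuming the standard transformers + peft + bitsandbytes pattern for applying a QLoRA adapter on top of the 4-bit quantized base model. The prompt wording below is only an example; consult the base model card for its exact chat format.

```python
# Illustrative loading sketch (not an official usage snippet from the authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "utter-project/EuroLLM-9B-Instruct"
adapter_id = "laniqo/WMT25-EuroLLM-9B-CPO"

# 4-bit NF4 quantization, matching the usual QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Example translation prompt; the exact instruction format is an assumption.
messages = [
    {
        "role": "user",
        "content": "Translate the following text from English into Czech.\n"
                   "English: The weather is nice today.\nCzech:",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```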
Model Description
This QLoRA adapter improves the base model's multilingual translation quality by aligning its outputs more closely with neural MT quality metrics, which are known to correlate highly with human preferences.
It was trained with Contrastive Preference Optimization (CPO) on a synthetic preference dataset. The dataset was generated by applying Minimum Bayes Risk (MBR) decoding to 10,000 examples per language pair from the NewsPALM dataset, covering translation from English into Korean, Japanese, Ukrainian, Czech, Chinese, Russian, and Estonian.
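The MBR step can be illustrated roughly as follows: for each source sentence, sample several candidate translations, score each candidate against the others with a neural MT quality metric, and keep the candidate with the highest expected utility as the preferred ("chosen") translation, with a low-scoring candidate serving as the "rejected" side of a preference pair. The sketch below is schematic; the metric, sampling setup, and pair-construction details of the actual pipeline are not specified in this card.

```python
# Schematic MBR-based preference-pair construction (illustrative only).
# `pairwise_metric(hyp, ref)` stands in for a neural MT quality metric; the real
# pipeline's metric, candidate count, and selection rules may differ.
from typing import Callable, List, Tuple

def mbr_preference_pair(
    candidates: List[str],
    pairwise_metric: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Return (chosen, rejected) candidates ranked by expected utility."""
    def expected_utility(hyp: str) -> float:
        # Average quality of `hyp` with every other candidate as a pseudo-reference.
        others = [c for c in candidates if c is not hyp]
        return sum(pairwise_metric(hyp, ref) for ref in others) / max(len(others), 1)

    ranked = sorted(candidates, key=expected_utility, reverse=True)
    return ranked[0], ranked[-1]  # best = chosen, worst = rejected
```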
Training Details
- Base Model: utter-project/EuroLLM-9B-Instruct
- Training Method: Contrastive Preference Optimization (CPO) with QLoRA.
- Dataset: A synthetic preference dataset created from the NewsPALM corpus via MBR decoding.
Key Hyperparameters
| Parameter Category | Value/Description |
|---|---|
| QLoRA Configuration | |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.0 |
| CPO Objective Configuration | |
| Loss Type | Sigmoid |
| Beta | 0.7 |
| Label Smoothing | 0.15 |
| CPO Alpha | 1.0 |
| General Training Configuration | |
| Per-Device Batch Size | 4 |
| Gradient Accumulation Steps | 12 |
| Effective Global Batch Size | 48 |
| Learning Rate | 5e-7 |
| LR Scheduler Type | Cosine |
| Warm-up Steps | 100 |
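As a rough illustration, the hyperparameters above map onto a peft `LoraConfig` and a trl `CPOConfig` along the following lines. Target modules, the output path, and any setting not listed in the table are assumptions, not the authors' exact configuration.

```python
# Illustrative mapping of the table above onto peft/trl configuration objects.
from peft import LoraConfig
from trl import CPOConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated in the card
)

cpo_config = CPOConfig(
    output_dir="wmt25-eurollm-9b-cpo",  # placeholder path
    loss_type="sigmoid",
    beta=0.7,
    label_smoothing=0.15,
    cpo_alpha=1.0,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=12,     # 4 x 12 = 48 effective examples per update
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
)
```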
Model tree for laniqo/WMT25-EuroLLM-9B-CPO
- Base model: utter-project/EuroLLM-9B