QLoRA Model: WMT25-EuroLLM-9B-CPO

This repository contains QLoRA weights for the utter-project/EuroLLM-9B-Instruct model.

This adapter was trained as part of the research for Laniqo's submission to the WMT25 General MT Shared Task.

Paper: TBA

Authors: Kamil Guttmann, Zofia Rostek, Adrian Charkiewicz, Antoni Solarski, Mikołaj Pokrywka, Artur Nowakowski

Model Description

This QLoRA adapter improves the multilingual translation quality of the base model by aligning its outputs more closely with neural MT quality metrics, which are known to correlate highly with human preferences.

It was trained using the Contrastive Preference Optimization (CPO) method on a synthetic preference dataset. This dataset was generated by applying Minimum Bayes Risk (MBR) decoding to 10,000 examples per language pair from the NewsPALM dataset, covering translations from English to Korean, Japanese, Ukrainian, Czech, Chinese, Russian, and Estonian.
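
The adapter weights can be attached to the base model with PEFT. The snippet below is a minimal usage sketch, not taken from the paper: the 4-bit quantization settings, the prompt wording, and the generation parameters are assumptions, so adjust them to your setup.

```python
# Minimal usage sketch (assumed settings, not from the model card): load the base
# model in 4-bit, attach this QLoRA adapter with PEFT, and translate one sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "utter-project/EuroLLM-9B-Instruct"
adapter_id = "laniqo/WMT25-EuroLLM-9B-CPO"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# The prompt format is an assumption; use the chat template expected by the base model.
messages = [
    {"role": "user",
     "content": "Translate the following English text to Czech:\nThe weather is nice today."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```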

Training Details

  • Base Model: utter-project/EuroLLM-9B-Instruct
  • Training Method: Contrastive Preference Optimization with QLoRA.
  • Dataset: A synthetic preference dataset created from the NewsPALM corpus via MBR decoding (see the sketch after this list).
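
For context, MBR decoding selects, from a pool of sampled candidate translations, the one with the highest expected utility against the other candidates under a neural MT metric. The sketch below illustrates the general idea with a placeholder `utility` function (e.g., a COMET-style score); it is an assumption about the procedure, not the exact pipeline used to build the dataset.

```python
# Illustrative MBR decoding sketch (assumed procedure): pick the candidate whose
# average utility against all other candidates is highest.
from typing import Callable, List

def mbr_select(source: str, candidates: List[str],
               utility: Callable[[str, str, str], float]) -> str:
    """utility(source, hypothesis, pseudo_reference) -> score from a neural MT metric."""
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        score = sum(
            utility(source, hyp, ref) for ref in candidates if ref is not hyp
        ) / max(len(candidates) - 1, 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

# A chosen/rejected preference pair for CPO can then be formed from the best- and
# worst-scoring candidates (an assumption about how the pairs were constructed).
```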

Key Hyperparameters

| Parameter Category            | Parameter                   | Value/Description |
|-------------------------------|-----------------------------|-------------------|
| QLoRA Configuration           | LoRA Rank (r)               | 16                |
|                               | LoRA Alpha                  | 32                |
|                               | LoRA Dropout                | 0.0               |
| CPO Objective Configuration   | Loss Type                   | Sigmoid           |
|                               | Beta                        | 0.7               |
|                               | Label Smoothing             | 0.15              |
|                               | CPO Alpha                   | 1.0               |
| General Training Configuration| Per-Device Batch Size       | 4                 |
|                               | Gradient Accumulation Steps | 12                |
|                               | Effective Global Batch Size | 48                |
|                               | Learning Rate               | 5e-7              |
|                               | LR Scheduler Type           | Cosine            |
|                               | Warm-up Steps               | 100               |
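
To make the CPO objective parameters concrete, the sketch below shows one common way a sigmoid preference loss with label smoothing is combined with an NLL term on the preferred translation weighted by `cpo_alpha` (in the spirit of the CPO formulation); it is an illustrative assumption, not the exact training code.

```python
# Hedged sketch of a CPO-style objective using the hyperparameters above.
import torch
import torch.nn.functional as F

def cpo_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor,
             chosen_nll: torch.Tensor,
             beta: float = 0.7, label_smoothing: float = 0.15,
             cpo_alpha: float = 1.0) -> torch.Tensor:
    """chosen_logps / rejected_logps: sequence log-probabilities of the preferred /
    dispreferred translations under the policy; chosen_nll: NLL of the preferred one."""
    margin = beta * (chosen_logps - rejected_logps)
    # Sigmoid preference loss, smoothed over the preference direction.
    preference_loss = (
        -F.logsigmoid(margin) * (1.0 - label_smoothing)
        - F.logsigmoid(-margin) * label_smoothing
    )
    # CPO adds a behavior-cloning style NLL term on the preferred output.
    return (preference_loss + cpo_alpha * chosen_nll).mean()
```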
