Qwen2.5-3B-Instruct-hsb-dsb

This model is the TartuNLP submission to the WMT25 Shared Task on Limited Resource Slavic Languages, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data, plus general instruction-tuning datasets.

The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, achieving the top rank in the shared task.

⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.

Example usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

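# Build a chat-style prompt; the system message sets the task and translation direction.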
messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

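# Generate, then keep only the newly generated tokens (drop the echoed prompt).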
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
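
The same pipeline covers both Sorbian languages and both tasks; only the system instruction changes. The helper below is a minimal sketch that reuses the model and tokenizer loaded above; the exact instruction wording the model expects for each task is an assumption, not documented behaviour.

def generate_response(system_prompt, user_text, max_new_tokens=512):
    # Reuses the model and tokenizer loaded above.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# German to Lower Sorbian: same pattern, different instruction (wording is assumed).
print(generate_response(
    "Translate the following text from German to Lower Sorbian.",
    "Wie lange willst du noch bleiben?"
))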

Shared task results

Results shared by the organizers (source).

Upper Sorbian:

| System   | DE-HSB | points | HSB-QA | points | final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 86.33  | 4      | 58.10  | 4      | 8            |
| NRC      | 87.20  | 4      | 29.05  | 1      | 5            |
| SDKM     | 75.73  | 2      | 55.24  | 3      | 5            |
| baseline | 13.88  | 1      | 42.86  | 2      | 3            |

Lower Sorbian:

| System   | DE-DSB | points | DSB-QA | points | final points |
|----------|--------|--------|--------|--------|--------------|
| TartuNLP | 78.20  | 4      | 57.56  | 4      | 8            |
| NRC      | 78.24  | 4      | 32.20  | 1      | 5            |
| SDKM     | 64.34  | 2      | 51.71  | 3      | 5            |
| baseline | 12.21  | 1      | 45.85  | 2      | 3            |

Training details

  • Total training tokens: ~1.2B
  • Sequence length: 4096
  • Training hardware: LUMI supercomputer (AMD MI250x GPUs)
  • Training time: ~139 GPU-hours
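
For a rough sense of scale, the figures above imply the following throughput (simple arithmetic on the listed numbers, not an official statistic):

tokens = 1.2e9      # ~1.2B training tokens
gpu_hours = 139     # ~139 GPU-hours on MI250x
print(f"{tokens / gpu_hours / 1e6:.1f}M tokens per GPU-hour")  # ≈ 8.6M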

Citation info

To be announced.
