# Qwen2.5-3B-Instruct-hsb-dsb
This model is the TartuNLP submission to the WMT25 Shared Task on Limited Resource Slavic Languages, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data, plus general instruction-tuning datasets.
The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, achieving the top rank in the shared task.
⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.
## Example usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a translation prompt with the chat template
messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Keep only the newly generated tokens and decode them
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
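Question answering uses the same chat-template workflow; only the prompt changes. The sketch below reuses `model` and `tokenizer` from the snippet above. The system instruction wording is an illustrative assumption (the exact QA prompt format used in the shared task is not documented here), and the placeholder should be replaced with a real Upper Sorbian question.

```python
# Hypothetical QA-style prompt; reuses `model` and `tokenizer` from the example above.
# The system instruction wording is an assumption, not a documented prompt format.
qa_messages = [
    {"role": "system", "content": "Answer the following question in Upper Sorbian."},
    {"role": "user", "content": "<question in Upper Sorbian>"}  # replace with a real question
]
qa_text = tokenizer.apply_chat_template(
    qa_messages,
    tokenize=False,
    add_generation_prompt=True
)
qa_inputs = tokenizer([qa_text], return_tensors="pt").to(model.device)
qa_ids = model.generate(**qa_inputs, max_new_tokens=256)

# Decode only the tokens generated after the prompt
qa_answer = tokenizer.batch_decode(
    [qa_ids[0][qa_inputs.input_ids.shape[1]:]],
    skip_special_tokens=True
)[0]
print(qa_answer)
```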
## Shared task results
Results shared by the organizers (source). In each table, the final points are the sum of the points awarded for the translation and QA tracks.
Upper Sorbian:
| Team | DE-HSB | points | HSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 86.33 | 4 | 58.10 | 4 | 8 |
| NRC | 87.20 | 4 | 29.05 | 1 | 5 |
| SDKM | 75.73 | 2 | 55.24 | 3 | 5 |
| baseline | 13.88 | 1 | 42.86 | 2 | 3 |
Lower Sorbian:
| Team | DE-DSB | points | DSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 78.20 | 4 | 57.56 | 4 | 8 |
| NRC | 78.24 | 4 | 32.20 | 1 | 5 |
| SDKM | 64.34 | 2 | 51.71 | 3 | 5 |
| baseline | 12.21 | 1 | 45.85 | 2 | 3 |
## Training details
- Total training tokens: ~1.2B
- Sequence length: 4096
- Training hardware: LUMI supercomputer (AMD MI250x GPUs)
- Training time: ~139 GPU-hours
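As a rough sanity check on these figures, the implied throughput can be estimated as below (a back-of-the-envelope sketch, assuming both numbers are totals for the full run):

```python
# Back-of-the-envelope throughput from the approximate figures above.
total_tokens = 1.2e9   # ~1.2B training tokens
gpu_hours = 139        # ~139 GPU-hours on AMD MI250x
print(f"{total_tokens / gpu_hours:,.0f} tokens per GPU-hour")  # roughly 8.6M
```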
## Citation info
To be announced.