# Qwen2.5-3B-Instruct-hsb-dsb
This model is the TartuNLP submission to the WMT25 Shared Task on Limited Resource Slavic Languages, covering Upper Sorbian (hsb) and Lower Sorbian (dsb).
It is based on Qwen2.5-3B-Instruct and adapted through continued pretraining on Sorbian monolingual and parallel data, plus general instruction-tuning datasets.
The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, achieving the top rank in the shared task.
⚠️ Note: This model is research-focused and has not been tested for general usage. Use at your own risk.
## Example usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a translation prompt with the chat template
messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Keep only the newly generated tokens and decode them
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
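Question answering uses the same chat-template workflow; only the prompt changes. The sketch below reuses `model` and `tokenizer` from the snippet above. The system instruction wording is an illustrative assumption (the exact QA prompt format used in the shared task is not documented here), and the placeholder should be replaced with a real Upper Sorbian question.

```python
# Hypothetical QA-style prompt; reuses `model` and `tokenizer` from the example above.
# The system instruction wording is an assumption, not a documented prompt format.
qa_messages = [
    {"role": "system", "content": "Answer the following question in Upper Sorbian."},
    {"role": "user", "content": "<question in Upper Sorbian>"}  # replace with a real question
]
qa_text = tokenizer.apply_chat_template(
    qa_messages,
    tokenize=False,
    add_generation_prompt=True
)
qa_inputs = tokenizer([qa_text], return_tensors="pt").to(model.device)
qa_ids = model.generate(**qa_inputs, max_new_tokens=256)

# Decode only the tokens generated after the prompt
qa_answer = tokenizer.batch_decode(
    [qa_ids[0][qa_inputs.input_ids.shape[1]:]],
    skip_special_tokens=True
)[0]
print(qa_answer)
```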
## Shared task results
Results shared by the organizers (source). In each table, the final points are the sum of the points awarded for the translation and QA tracks.
Upper Sorbian:
| Team | DE-HSB | points | HSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 86.33 | 4 | 58.10 | 4 | 8 |
| NRC | 87.20 | 4 | 29.05 | 1 | 5 |
| SDKM | 75.73 | 2 | 55.24 | 3 | 5 |
| baseline | 13.88 | 1 | 42.86 | 2 | 3 |
Lower Sorbian:
| Team | DE-DSB | points | DSB-QA | points | final points |
|---|---|---|---|---|---|
| TartuNLP | 78.20 | 4 | 57.56 | 4 | 8 |
| NRC | 78.24 | 4 | 32.20 | 1 | 5 |
| SDKM | 64.34 | 2 | 51.71 | 3 | 5 |
| baseline | 12.21 | 1 | 45.85 | 2 | 3 |
## Training details
- Total training tokens: ~1.2B
- Sequence length: 4096
- Training hardware: LUMI supercomputer (AMD MI250x GPUs)
- Training time: ~139 GPU-hours
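As a rough sanity check on these figures, the implied throughput can be estimated as below (a back-of-the-envelope sketch, assuming both numbers are totals for the full run):

```python
# Back-of-the-envelope throughput from the approximate figures above.
total_tokens = 1.2e9   # ~1.2B training tokens
gpu_hours = 139        # ~139 GPU-hours on AMD MI250x
print(f"{total_tokens / gpu_hours:,.0f} tokens per GPU-hour")  # roughly 8.6M
```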
## Citation info
To be announced.