---
library_name: transformers
language:
- hsb
- dsb
datasets:
- HuggingFaceFW/fineweb-2
- CohereLabs/aya_dataset
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- OpenAssistant/oasst2
- ai2-adapt-dev/flan_v2_converted
- utter-project/EuroBlocks-SFT-Synthetic-1124
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5-3B-Instruct-hsb-dsb

This model is the TartuNLP submission to the **WMT25 Shared Task on Limited Resource Slavic Languages**, covering **Upper Sorbian** (hsb) and **Lower Sorbian** (dsb). It is based on **Qwen2.5-3B-Instruct** and was adapted through continued pretraining on Sorbian monolingual and parallel data, followed by tuning on general instruction datasets. The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages and achieved the top rank in the shared task.

⚠️ **Note:** This model is research-focused and has not been tested for general usage. Use at your own risk.

## Example usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The task is specified in the system prompt; the user message carries the input text.
messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

For question answering, only the prompt changes; a sketch is provided at the end of this card.

## Shared task results

Results as shared by the organizers ([source](https://github.com/TUM-NLP/llms-limited-resources2025/blob/main/results.md)). The DE-HSB and DE-DSB columns report German→Sorbian translation scores; the HSB-QA and DSB-QA columns report question-answering scores.

**Upper Sorbian:**

|              | DE-HSB    | points | HSB-QA    | points | final points |
|--------------|-----------|--------|-----------|--------|--------------|
| **TartuNLP** | 86.33     | 4      | **58.10** | 4      | 8            |
| NRC          | **87.20** | 4      | 29.05     | 1      | 5            |
| SDKM         | 75.73     | 2      | 55.24     | 3      | 5            |
| baseline     | 13.88     | 1      | 42.86     | 2      | 3            |

**Lower Sorbian:**

|              | DE-DSB    | points | DSB-QA    | points | final points |
|--------------|-----------|--------|-----------|--------|--------------|
| **TartuNLP** | 78.20     | 4      | **57.56** | 4      | 8            |
| NRC          | **78.24** | 4      | 32.20     | 1      | 5            |
| SDKM         | 64.34     | 2      | 51.71     | 3      | 5            |
| baseline     | 12.21     | 1      | 45.85     | 2      | 3            |

## Training details

- Total training tokens: ~1.2B
- Sequence length: 4096
- Training hardware: LUMI supercomputer (AMD MI250x GPUs)
- Training time: ~139 GPU-hours

## Citation info

To be announced.
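
## Question answering example

The QA track uses the same chat interface as translation; only the prompt differs. The snippet below is a minimal sketch that reuses the `model` and `tokenizer` loaded in the example above. The system prompt wording and the placeholder question are illustrative assumptions, not necessarily the exact QA prompt format used in the shared task.

```python
# Minimal QA sketch: reuses `model` and `tokenizer` from "Example usage" above.
# NOTE: the system prompt is an assumption for illustration; the exact QA prompt
# used during training/evaluation is not documented on this card.
qa_messages = [
    {"role": "system", "content": "Answer the following question in Upper Sorbian."},
    {"role": "user", "content": "<your Upper Sorbian question here>"},  # placeholder input
]
text = tokenizer.apply_chat_template(
    qa_messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Decode only the newly generated tokens, as in the translation example.
answer = tokenizer.batch_decode(
    [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)],
    skip_special_tokens=True
)[0]
print(answer)
```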