---
library_name: transformers
language:
- hsb
- dsb
datasets:
- HuggingFaceFW/fineweb-2
- CohereLabs/aya_dataset
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- OpenAssistant/oasst2
- ai2-adapt-dev/flan_v2_converted
- utter-project/EuroBlocks-SFT-Synthetic-1124
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Qwen2.5-3B-Instruct-hsb-dsb

This model is the TartuNLP submission to the **WMT25 Shared Task on Limited Resource Slavic Languages**, covering **Upper Sorbian** (hsb) and **Lower Sorbian** (dsb). It is based on **Qwen2.5-3B-Instruct** and adapted through continued pretraining on Sorbian monolingual and parallel data, plus general instruction-tuning datasets. The model jointly supports machine translation (MT) and question answering (QA) for both Sorbian languages, achieving the top rank in the shared task.

**📄 More details in [the shared task paper](https://aclanthology.org/2025.wmt-1.88/).**

⚠️ **Note:** This model is research-focused and has not been tested for general usage. Use at your own risk.
## Example usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Task instruction goes in the system message, input text in the user message
messages = [
    {"role": "system", "content": "Translate the following text from German to Upper Sorbian."},
    {"role": "user", "content": "Wie lange willst du noch bleiben?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the generated continuation is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Shared task results

Results shared by the organizers ([source](https://github.com/TUM-NLP/llms-limited-resources2025/blob/main/results.md)).
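The same chat structure (task instruction as the system message, input text as the user message) can be reused for the other supported directions and for QA. The system-prompt wordings and the sample Upper Sorbian question below are illustrative assumptions, not the exact prompts used in the shared task submission:

```python
# Build the message list expected by tokenizer.apply_chat_template.
# Only the structure mirrors the translation example above; the prompt
# wordings here are assumptions, not the prompts used during training.
def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Return a system+user message pair for one request."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

# German -> Lower Sorbian translation (prompt wording assumed)
mt_dsb = build_messages(
    "Translate the following text from German to Lower Sorbian.",
    "Wie lange willst du noch bleiben?",
)

# Upper Sorbian QA (prompt wording and placeholder question assumed)
qa_hsb = build_messages(
    "Answer the following question in Upper Sorbian.",
    "Što je stolica Hornjeje Łužicy?",
)
```

Each resulting list can be passed to `tokenizer.apply_chat_template` and `model.generate` exactly as in the example above.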
**Upper Sorbian:**

|              | DE-HSB    | points | HSB-QA    | points | final points |
|--------------|-----------|--------|-----------|--------|--------------|
| **TartuNLP** | 86.33     | 4      | **58.10** | 4      | 8            |
| NRC          | **87.20** | 4      | 29.05     | 1      | 5            |
| SDKM         | 75.73     | 2      | 55.24     | 3      | 5            |
| baseline     | 13.88     | 1      | 42.86     | 2      | 3            |

**Lower Sorbian:**

|              | DE-DSB    | points | DSB-QA    | points | final points |
|--------------|-----------|--------|-----------|--------|--------------|
| **TartuNLP** | 78.20     | 4      | **57.56** | 4      | 8            |
| NRC          | **78.24** | 4      | 32.20     | 1      | 5            |
| SDKM         | 64.34     | 2      | 51.71     | 3      | 5            |
| baseline     | 12.21     | 1      | 45.85     | 2      | 3            |

## Training details

- Total training tokens: ~1.2B
- Sequence length: 4096
- Training hardware: LUMI supercomputer (AMD MI250x GPUs)
- Training time: ~139 GPU-hours

## Citation info

```
@inproceedings{purason-fishel-2025-tartunlp,
    title = "{T}artu{NLP} at {WMT}25 {LLM}s with Limited Resources for {S}lavic Languages Shared Task",
    author = "Purason, Taido and Fishel, Mark",
    editor = "Haddow, Barry and Kocmi, Tom and Koehn, Philipp and Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.88/",
    doi = "10.18653/v1/2025.wmt-1.88",
    pages = "1143--1150",
    ISBN = "979-8-89176-341-8",
    abstract = "This paper describes the TartuNLP submission to the Upper Sorbian (hsb) and Lower Sorbian (dsb) tracks of the WMT25 LLMs with Limited Resources for Slavic Languages shared task, which jointly targets machine translation (MT) and question answering (QA). We develop a single multilingual model based on Qwen2.5-3B-Instruct by continuing pretraining on Sorbian monolingual and parallel data together with general instruction datasets, combining language acquisition and instruction-following in a single step.
The resulting model delivers substantial improvements over the baseline Qwen2.5-3B-Instruct model and also achieves the highest ranking for both tasks in the hsb and dsb shared task tracks."
}
```