---
library_name: transformers
language:
- en
- de
- fr
license: other
base_model: straker/tiri-tahi-3b-base-pt-bf16
tags:
- translation
- sft
- seq2seq
- fuzzy-match
pipeline_tag: translation
---

# Tiri Tahi 3B - Genesis SFT (EN-DE/FR)

A supervised fine-tuned version of [straker/tiri-tahi-3b-base-pt-bf16](https://huggingface.co/straker/tiri-tahi-3b-base-pt-bf16) for machine translation with translation memory (fuzzy match) augmentation.

## Model Details

- **Base model:** Tiri Tahi 3B (MADLAD-400 architecture, T5-based encoder-decoder)
- **Task:** Machine translation with fuzzy match context
- **Language pairs:** English-German (EN-DE), English-French (EN-FR)
- **Parameters:** ~3B

## Training Data

The model was fine-tuned on 72,230 translation pairs, with a further 4,012 pairs held out for validation:

| Language Pair | Training Samples |
|---|---|
| EN-DE | 44,592 |
| EN-FR | 27,638 |

Each training example includes up to 2 fuzzy matches from translation memory, providing the model with reference translations at varying similarity scores to improve output quality.

### Input Format

The model uses the MADLAD-400 `<2xx>` prefix format, with fuzzy match context prepended to the source text:

```
<2de>source text to translate
```

When fuzzy matches are available, they are prepended as context to help guide the translation; a hedged construction sketch is given under Example Code below.

## Training Procedure

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Batch size | 32 |
| Epochs | 5 |
| Weight decay | 0.01 |
| Label smoothing | 0.05 |
| Max source length | 1024 tokens |
| Max target length | 256 tokens |
| Precision | bf16 |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (fused) |

### Training Results

| Metric | Value |
|---|---|
| Final train loss | 0.49 |
| Training time | ~2.5 hours (across resumed runs) |
| Train samples/sec | 79.18 |

## Intended Uses

- Machine translation for the EN-DE and EN-FR language pairs
- Translation memory-augmented machine translation (leveraging fuzzy matches)
- CAT (Computer-Assisted Translation) tool integration

## Limitations

- Trained only on EN-DE and EN-FR; other language pairs may produce lower-quality output
- Performance depends on the quality and relevance of the provided fuzzy matches
- Not evaluated on standard MT benchmarks (BLEU, COMET) in this release

## Framework Versions

- Transformers 4.57.6
- PyTorch 2.11.0+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2
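## Example Code

The exact repository id of this fine-tuned checkpoint and the delimiter used to join fuzzy matches to the source segment are not stated above, so the sketches below mark both as assumptions.

First, a minimal input-construction sketch. Only the `<2xx>` prefix and the "up to 2 fuzzy matches prepended as context" behaviour are confirmed by this card; the `build_input` helper and its newline-separated layout are hypothetical.

```python
# Hypothetical helper: the newline-separated fuzzy-match layout is an
# assumption; only the <2xx> prefix itself is documented in this card.
def build_input(source: str, fuzzy_matches: list[str], target_lang: str = "de") -> str:
    prefix = f"<2{target_lang}>"
    context = fuzzy_matches[:2]  # the card states up to 2 matches per example
    if context:
        return "\n".join(context) + "\n" + prefix + source
    return prefix + source

# With no fuzzy matches this reduces to the documented format:
print(build_input("The contract ends in June.", []))
# -> <2de>The contract ends in June.
```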
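For inference, a standard Transformers seq2seq generation sketch. `straker/tiri-tahi-3b-genesis-sft` is a placeholder model id, not a name confirmed by this card, and the length settings simply mirror the 1024/256 limits from the hyperparameter table.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "straker/tiri-tahi-3b-genesis-sft"  # placeholder id; substitute the real repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

text = "<2fr>The invoice is due at the end of the month."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
outputs = model.generate(**inputs, max_new_tokens=256, num_beams=4)  # beam search is a common choice, not documented here
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```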
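Finally, the hyperparameter table maps onto standard `Seq2SeqTrainingArguments` roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' published training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tiri-tahi-3b-genesis-sft",  # hypothetical output path
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    per_device_train_batch_size=32,  # assumes "batch size 32" is per-device on one GPU
    num_train_epochs=5,
    weight_decay=0.01,
    label_smoothing_factor=0.05,
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
)
```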