---
language:
- lus
- en
library_name: peft
license: mit
pipeline_tag: translation
tags:
- mizo
- bible
- nllb
- lora
- seq2seq
- translation
base_model: facebook/nllb-200-distilled-600M
---

# Mizo Bible LoRA (NLLB-200-distilled-600M)

This is a **LoRA adapter** for Mizo, trained in **two stages** on top of `facebook/nllb-200-distilled-600M`:

1. **Stage 1 – Dictionary**
   - Data: Mizo dictionary pairs (English headword → Mizo explanation/definition).
   - Purpose: give the model strong coverage of modern Mizo vocabulary and word senses.
2. **Stage 2 – Bible (Eng → Mizo)**
   - Data: English → Mizo verse-level alignment of the Bible.
   - Training continues from the Stage 1 dictionary LoRA, so the Bible fine-tuning sits “on top” of the dictionary knowledge (see the training sketch at the end of this card).
   - The archaic use of the conjunction **“Tin”** in the Mizo targets is down-weighted by cleaning it from the training text, so the model does not overuse “Tin” in ordinary sentences.

## Base model

- [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M)

This repository contains **only the LoRA adapter weights**, not the full base model.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO = "frankiethiak/nllb-mizo-bible-lora"  # update if different

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    src_lang="eng_Latn",
    tgt_lang="lus_Latn",
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)
model.eval()

text = "In the beginning God created the heaven and the earth."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    gen = model.generate(
        **inputs,
        # NLLB needs the target language forced as the first decoder token,
        # otherwise it may generate in the wrong language.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("lus_Latn"),
        max_new_tokens=80,
        num_beams=4,
    )

print(tokenizer.decode(gen[0], skip_special_tokens=True))
```
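
## Merging the adapter (optional)

If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload()`. This is a minimal sketch; the output directory name is just an example.

```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO = "frankiethiak/nllb-mizo-bible-lora"

base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)

# Fold the LoRA deltas into the base weights and drop the PEFT wrappers,
# leaving a plain NLLB checkpoint that loads with AutoModelForSeq2SeqLM alone.
merged = model.merge_and_unload()

OUT_DIR = "nllb-mizo-merged"  # example output path
merged.save_pretrained(OUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUT_DIR)
```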
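
## Training sketch

The training code itself is not part of this repository. As a rough illustration of the two-stage setup described above, the sketch below continues training from a Stage 1 adapter using `peft` and `Seq2SeqTrainer`. The adapter path, dataset contents, and hyperparameters are all placeholders, not the values actually used.

```python
from datasets import Dataset
from peft import PeftModel
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

BASE_MODEL = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL, src_lang="eng_Latn", tgt_lang="lus_Latn"
)
base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)

# Resume from the Stage 1 (dictionary) adapter; is_trainable=True keeps the
# LoRA weights unfrozen so Stage 2 updates them further. The path is a placeholder.
model = PeftModel.from_pretrained(
    base, "path/to/stage1-dictionary-lora", is_trainable=True
)

# Dummy one-pair corpus for illustration only; in practice this is the
# verse-aligned English → Mizo Bible data.
raw = Dataset.from_dict(
    {"en": ["example English verse"], "lus": ["example Mizo verse"]}
)

def preprocess(batch):
    # text_target= tokenizes the Mizo side as labels using tgt_lang.
    return tokenizer(
        batch["en"], text_target=batch["lus"], truncation=True, max_length=256
    )

train_ds = raw.map(preprocess, batched=True, remove_columns=["en", "lus"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="stage2-bible-lora",  # placeholder
        learning_rate=2e-4,
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Saves only the adapter weights, which is what this repo publishes.
model.save_pretrained("stage2-bible-lora")
```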