---
language:
- lus
- en
library_name: peft
license: mit
pipeline_tag: translation
tags:
- mizo
- bible
- nllb
- lora
- seq2seq
- translation
base_model: facebook/nllb-200-distilled-600M
---

# Mizo Bible LoRA (NLLB-200-distilled-600M)

This is a **LoRA adapter** for Mizo, trained in **two stages** on top of `facebook/nllb-200-distilled-600M`:

1. **Stage 1 – Dictionary**
   - Data: Mizo dictionary pairs (English headword → Mizo explanation/definition).
   - Purpose: give the model strong coverage of modern Mizo vocabulary and word senses.
2. **Stage 2 – Bible (Eng → Mizo)**
   - Data: English → Mizo verse-level alignment of the Bible.
   - Training continues from the Stage 1 dictionary LoRA, so the Bible fine-tuning sits “on top” of the dictionary knowledge (see the training sketch at the end of this card).
   - The archaic use of the conjunction **“Tin”** in the Mizo targets is down-weighted by cleaning it from the training text, so the model does not overuse “Tin” in ordinary sentences.

## Base model

- [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M)

This repository contains **only the LoRA adapter weights**, not the full base model.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO = "frankiethiak/nllb-mizo-bible-lora"  # update if different

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    src_lang="eng_Latn",
    tgt_lang="lus_Latn",
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)
model.eval()

text = "In the beginning God created the heaven and the earth."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    gen = model.generate(
        **inputs,
        # NLLB needs the target language forced as the first decoder token,
        # otherwise it may generate in the wrong language.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("lus_Latn"),
        max_new_tokens=80,
        num_beams=4,
    )

print(tokenizer.decode(gen[0], skip_special_tokens=True))
```
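
## Merging the adapter (optional)

If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload()`. This is a minimal sketch; the output directory name is just an example.

```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO = "frankiethiak/nllb-mizo-bible-lora"

base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)

# Fold the LoRA deltas into the base weights and drop the PEFT wrappers,
# leaving a plain NLLB checkpoint that loads with AutoModelForSeq2SeqLM alone.
merged = model.merge_and_unload()

OUT_DIR = "nllb-mizo-merged"  # example output path
merged.save_pretrained(OUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUT_DIR)
```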
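
## Training sketch

The training code itself is not part of this repository. As a rough illustration of the two-stage setup described above, the sketch below continues training from a Stage 1 adapter using `peft` and `Seq2SeqTrainer`. The adapter path, dataset contents, and hyperparameters are all placeholders, not the values actually used.

```python
from datasets import Dataset
from peft import PeftModel
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

BASE_MODEL = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL, src_lang="eng_Latn", tgt_lang="lus_Latn"
)
base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)

# Resume from the Stage 1 (dictionary) adapter; is_trainable=True keeps the
# LoRA weights unfrozen so Stage 2 updates them further. The path is a placeholder.
model = PeftModel.from_pretrained(
    base, "path/to/stage1-dictionary-lora", is_trainable=True
)

# Dummy one-pair corpus for illustration only; in practice this is the
# verse-aligned English → Mizo Bible data.
raw = Dataset.from_dict(
    {"en": ["example English verse"], "lus": ["example Mizo verse"]}
)

def preprocess(batch):
    # text_target= tokenizes the Mizo side as labels using tgt_lang.
    return tokenizer(
        batch["en"], text_target=batch["lus"], truncation=True, max_length=256
    )

train_ds = raw.map(preprocess, batched=True, remove_columns=["en", "lus"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="stage2-bible-lora",  # placeholder
        learning_rate=2e-4,
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Saves only the adapter weights, which is what this repo publishes.
model.save_pretrained("stage2-bible-lora")
```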