OmniTranslate 1.0 LoRA
This is the LoRA version of OmniTranslate.
If you don't want to download the full merged model, you can download this LoRA and run it with the base Qwen 3 0.6B.
Otherwise, stick to the full merged version. The model card starts below.
OmniTranslate 1.0
OmniTranslate is a massively multilingual machine translation model supporting over 500 languages. Fine-tuned from Qwen 3 0.6B (with Unsloth), this model is designed for translation tasks on any device!
Features
- 500+ Languages Supported: The broadest coverage of languages supported for a translation model that's under 1 billion parameters!
- Tiny Size: Far faster and lighter on memory than any large translation model, and small enough to run on modest hardware!
Issues
- Accuracy on Common Languages: Accuracy on common languages in the dataset (like Spanish, Chinese, and Romanian) is generally very good! Still, OmniTranslate can occasionally hiccup. Examples are roșă and ami when translating to Romanian.
- Accuracy on Rare Languages: Accuracy on rare languages in the dataset (like Toki Pona) isn't as good as on common languages!
As such, OmniTranslate 1.0 is an experimental model and shouldn't be used for tasks where accurate translations matter.
Notes
Providing the ISO code (e.g. ron_Latn) instead of the language name can improve the results a lot.
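For example, here is the prompt format with an ISO code versus a plain language name (a minimal sketch; `build_prompt` is a hypothetical helper I wrote for illustration, not part of this repo):

```python
# Build the chat-style prompt the model expects.
# `build_prompt` is a hypothetical helper, not part of the model repo.
def build_prompt(target_lang: str, text: str) -> str:
    return (
        "<|im_start|>user\n"
        f"Translate to {target_lang}: {text}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )

# ISO code form (usually more reliable) vs. plain language name:
print(build_prompt("ron_Latn", "Hello, world!"))
print(build_prompt("Romanian", "Hello, world!"))
```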
Usage
Code is by Gemini 3 Flash:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# 1. Load from your Hugging Face Repo
model_id = "MihaiPopa-1/OmniTranslate-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # Standard for CPU
    device_map="cpu",           # Forces CPU usage
)

# 2. Translate (replace ron_Latn with your language here)
prompt = "<|im_start|>user\nTranslate to ron_Latn: OmniTranslate is a massively multilingual machine translation model supporting over 500 languages!<|im_end|>\n<|im_start|>assistant\n<think>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cpu")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,   # required for temperature to take effect
        temperature=0.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
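The decoded text still contains the prompt and the model's thinking block. A small post-processing sketch, assuming the model closes its reasoning with a `</think>` tag (the helper name is mine, not part of the repo):

```python
def extract_translation(decoded: str) -> str:
    """Return only the text after the assistant's </think> tag.

    Assumes the model emits a `</think>` marker before the translation;
    if it doesn't, the decoded text is returned unchanged.
    """
    marker = "</think>"
    if marker in decoded:
        return decoded.split(marker, 1)[1].strip()
    return decoded.strip()

# Example: keep only what follows the closing think tag.
print(extract_translation("prompt...<think>\nreasoning\n</think>\nTranslated text here"))
```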
Data Used
I used my own OmniSurgical 1.0 dataset, which is itself an extract of HF's FineTranslations.
120 sentences per language (60 per language pair).
Uploaded model
- Developed by: MihaiPopa-1
- License: apache-2.0
- Finetuned from model: unsloth/qwen3-0.6b-unsloth-bnb-4bit
This qwen3 model was trained 2x faster with Unsloth
