---
license: apache-2.0
license_link: https://huggingface.co/Qihoo360/Light-MT-7B/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B
tags:
- machine-translation
- multilingual
- qwen2
library_name: transformers
---

# Light-MT-7B

## Introduction

Light-MT-7B is a machine-translation-focused variant of Qwen2.5-7B developed by 360 AI Research. The model follows the Multilingual Translation Policy Optimization (MtPO) pipeline introduced in the paper "Extending Foundation Models to Low-Resource Languages" and targets Southeast Asian and other under-served languages while preserving general instruction-following ability.

**This repo contains the machine-translation-specialized 7B model**, which has the following features:

- Type: Causal Language Model for Machine Translation
- Training Stage: Continued pretraining, curriculum SFT, and MtPO reinforcement learning
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 7.61B (6.53B non-embedding)
- Number of Layers: 28
- Number of Attention Heads (GQA): 28 for Q and 4 for KV
- Context Length: Up to 131,072 tokens
- Vocabulary Size: 180,736 tokens with MtPO vocabulary expansion

## Model Highlights

Key outcomes from the MtPO recipe:

- 2.1x-5.4x compression gains on FLORES-Plus corpora across Khmer, Lao, Myanmar, Thai, Tibetan, and other scripts through targeted tokenizer expansion.
- Curriculum supervised fine-tuning over a 7M-sample mixture progressing from general instructions to ASEAN-focused translation prompts.
- MtPO reinforcement learning that maintains entropy during decoding via asymmetric clipping, temperature consistency, and microbatch-normalized advantages.
- Reinforcement Learning with Verifiable Rewards (RLVR) to enforce length ratios, structural tokens, language targeting, and code-mixing checks for reliable outputs.
- 200B continued pretraining tokens plus 60k MtPO steps, preserving BBH, CMMLU, HellaSwag, and MMLU performance while lifting translation quality.

## Requirements

The code of Light-MT-7B is compatible with the latest Hugging Face `transformers` library, and we recommend using the latest version. With `transformers<4.37.0`, you will encounter the following error:

```
KeyError: 'qwen2'
```

## Quickstart

The following code snippet shows how to load the tokenizer and model for machine translation tasks.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/Light-MT-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example translation prompt
prompt = "Translate the following English text to Chinese: Hello, how are you today?"
messages = [
    {"role": "system", "content": "You are a professional translator. Translate the given text accurately and naturally."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Training Pipeline (MtPO)

MtPO runs in four stages, from tokenizer expansion to reinforcement learning alignment.

- **Stage 1 - Vocabulary expansion:** Extend the Qwen2.5 tokenizer with 3k-4k tokens per target language (Khmer, Lao, Mongolian, Myanmar, Tamil, Thai, Tibetan, Uyghur). FLORES-Plus diagnostics show 2.1x-5.4x compression gains, cutting Khmer token counts from 402 to 103 for representative passages.
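The compression and speedup figures quoted for Stage 1 follow from simple per-passage arithmetic. The sketch below shows how they can be computed; the helper names are ours for illustration, not part of the release.

```python
def compression_ratio(num_chars: int, num_tokens: int) -> float:
    """Characters encoded per token; higher means fewer tokens per sentence."""
    return num_chars / num_tokens

def decoding_speedup(old_tokens: int, new_tokens: int) -> float:
    """Relative reduction in autoregressive decoding steps for the same text."""
    return old_tokens / new_tokens

# Running example from this card: a Khmer passage that tokenized to 402
# tokens with the base vocabulary and 103 tokens after expansion.
print(f"{decoding_speedup(402, 103):.2f}x fewer decoding steps")  # → 3.90x fewer decoding steps
```

Fewer tokens per passage directly shortens the autoregressive loop, which is where the latency and memory savings on low-resource scripts come from.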
- **Stage 2 - Balanced continued pretraining:** Continue training on 200B tokens with a 1:1 mix of English and the expanded low-resource corpus to preserve high-resource coverage while materially improving low-resource fluency.
- **Stage 3 - Curriculum SFT:** Train on a 7M-sample blend (5:1 general instructions vs. multilingual data) that progresses from base instruction-following to ASEAN translation and mixed-format prompts.
- **Stage 4 - MtPO reinforcement learning:** Optimize with entropy-tempered policy updates that keep sampling temperature consistent, apply asymmetric ratio clipping, and normalize advantages at the microbatch level to avoid length bias or entropy collapse.

## Verifiable Reward Guardrails

Reinforcement Learning with Verifiable Rewards (RLVR) combines the translation reward model with deterministic validators. During MtPO we sample K candidates per prompt, score them with RLVR, and keep the top-G diverse outputs for gradient updates. Each candidate is checked for:

- Length-ratio safety relative to the source (default bounds 0.5-2.0, with soft penalties outside the range)
- Structural token preservation for HTML, Markdown, and code blocks using lightweight parsers
- Target-language verification via a confidence-gated language ID classifier
- Code-mixing penalties that suppress unintended language drift

These verifiable rewards are added to the semantic score, so bad outputs receive immediate negative credit while high-quality candidates remain eligible for optimization.

## Data and Training Budget

Summary of resources and evaluation suites used during MtPO development.

- Continued pretraining: 200B tokens with adaptive sampling over English, ASEAN, Tibetan, Mongolian, Tamil, and Uyghur corpora
- MtPO reinforcement learning: 60k steps, batch size 128, top-G candidate selection with RLVR filtering
- Reward model: Preference data spans ten error categories (accuracy, fluency, terminology, formatting, code-mixing, etc.)
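As a rough illustration of how the verifiable-reward guardrails combine with the semantic score, the sketch below implements a soft length-ratio penalty plus pass/fail validator terms. All function names and penalty magnitudes here are illustrative assumptions, not the released implementation.

```python
def length_ratio_reward(src_len: int, hyp_len: int, lo: float = 0.5, hi: float = 2.0) -> float:
    """Zero inside the default 0.5-2.0 length-ratio bounds, soft penalty outside."""
    ratio = hyp_len / src_len
    if lo <= ratio <= hi:
        return 0.0
    # penalty grows with the distance to the nearest bound
    return -min(abs(ratio - lo), abs(ratio - hi))

def total_reward(semantic_score: float, src: str, hyp: str,
                 target_lang_ok: bool, format_ok: bool) -> float:
    """Semantic reward-model score plus deterministic validator terms (sketch)."""
    reward = semantic_score
    reward += length_ratio_reward(len(src.split()), len(hyp.split()))
    if not target_lang_ok:   # confidence-gated language ID classifier failed
        reward -= 1.0
    if not format_ok:        # HTML/Markdown/code structural tokens not preserved
        reward -= 1.0
    return reward
```

Because the validator terms are deterministic, a degenerate candidate (wrong language, broken markup, runaway length) receives immediate negative credit regardless of its semantic score.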
- Benchmarks: FLORES-Plus (90 directions), BBH, CMMLU, HellaSwag, MMLU

## Model Details

- **Model Type**: Qwen2-based Causal Language Model
- **Language(s)**: Multilingual (English, Chinese, Khmer, Lao, Myanmar, Thai, Tibetan, Mongolian, Tamil, Malay, Indonesian, Filipino, Vietnamese, Uyghur, etc.)
- **License**: Apache 2.0
- **Finetuned from**: Qwen/Qwen2.5-7B
- **Model Size**: 7.61B parameters
- **Context Length**: 131,072 tokens

## Usage

This model is specifically designed for machine translation tasks. It can handle various translation scenarios, including:

- English <-> Chinese translation
- Multilingual translation tasks
- Professional document translation
- Conversational translation

## Evaluation

### Translation and General Benchmarks

Light-MT-7B-MtPO is evaluated on FLORES-Plus (90 directions) and standard instruction-following benchmarks. Scores below use sacreBLEU (higher is better) and zero-shot accuracy (percentage).

| Model | Group | xx->en | en->xx | xx->xx | Avg. | BBH | CMMLU | HellaSwag | MMLU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma3-27B-IT | Multilingual chat | **36.8** | 30.7 | 22.3 | 24.7 | 55.9 | 55.9 | 55.9 | **56.0** |
| Qwen3-8B | Multilingual chat | 31.1 | 23.3 | 14.4 | 16.9 | **63.8** | 60.8 | 26.0 | 51.3 |
| Qwen2.5-7B-Instruct | Multilingual chat | 24.8 | 17.4 | 9.2 | 11.6 | 54.4 | **64.1** | **85.2** | 40.9 |
| Apertus-8B-Instruct | Multilingual chat | 32.5 | 25.7 | 15.6 | 18.3 | 49.2 | 45.3 | 64.2 | 45.2 |
| Tower-Plus-9B | Multilingual chat | 28.2 | 18.3 | 9.8 | 12.5 | 40.4 | 57.2 | 73.1 | 42.1 |
| Qwen-MT-Plus | Translation-focused | 34.0 | 29.6 | 19.6 | 22.1 | - | - | - | - |
| Seed-X-PPO-7B | Translation-focused | 25.9 | 22.6 | 10.5 | 13.3 | - | - | - | - |
| Hunyuan-MT-7B | Translation-focused | 24.6 | 23.4 | 14.8 | 16.6 | - | - | - | - |
| Light-TLLM-7B-SFT | Our models | 35.4 | 32.0 | 22.7 | 24.3 | 59.6 | 61.4 | 83.7 | 47.2 |
| **Light-TLLM-7B-MtPO** | Our models | 36.1 | **32.7** | **23.1** | **24.9** | 60.9 | 63.2 | **85.2** | 48.5 |

- en->xx directions gain +1.1 BLEU over the next best 7B system while preserving reasoning accuracy (+1.3 MMLU over SFT).
- Average BLEU across all FLORES-Plus directions rises to 24.9 despite the compact 7B footprint.

### Tokenizer Efficiency

Vocabulary expansion provides substantial compression on targeted scripts (a higher compression ratio means fewer tokens per sentence).

| Language | Added tokens | Old compression ratio | New compression ratio | Speedup |
| --- | --- | --- | --- | --- |
| Khmer | 3712 | 0.85 | 3.49 | 4.09x |
| Lao | 3359 | 0.85 | 3.05 | 3.59x |
| Myanmar | 3226 | 0.69 | 2.87 | 4.17x |
| Thai | 2958 | 1.79 | 2.97 | 1.66x |
| Tibetan | 3920 | 0.75 | 4.03 | 5.39x |

- Khmer passages shrink from 402 tokens to 103 tokens in the running example used in the paper.
- Compression gains translate into lower latency and memory cost during decoding for low-resource scripts.

### Constraint Reliability (RLVR)

RLVR introduces deterministic checks that reduce failure modes compared with general chat models and MT baselines.

| Model | Language targeting | Length control | Format preservation | Code mixing | Overall |
| --- | --- | --- | --- | --- | --- |
| **Light-TLLM-7B-MtPO** | **97.8** | 99.2 | **92.15** | 92.3 | **95.3** |
| Qwen2.5-7B-Instruct | 92.0 | 97.0 | 51.8 | 62.8 | 75.9 |
| Gemma3-27B-IT | 97.4 | 91.6 | 42.1 | 90.9 | 80.5 |
| Qwen-MT-Plus | 97.6 | **99.8** | 82.5 | 94.8 | 93.6 |
| Seed-X-PPO-7B | 97.6 | 79.8 | 79.0 | 90.3 | 86.6 |
| DeepSeek-V3 | 95.4 | 95.7 | 67.6 | 95.0 | 88.4 |
| Hunyuan-MT-7B | 91.8 | 90.7 | 71.1 | **96.2** | 87.4 |

- Format retention jumps to 92.15 percent versus 51.8 percent for Qwen2.5-7B-Instruct, mitigating HTML or Markdown corruption.
- Language targeting stays above 97 percent, while MtPO avoids verbosity by normalizing advantages at the microbatch level.
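The microbatch-level advantage normalization mentioned above can be sketched as a standard whitening step applied within each microbatch, so that no single group of long or short candidates dominates the policy update. The function name and the zero-variance guard below are our own illustrative choices.

```python
import statistics

def microbatch_normalized_advantages(rewards: list[float]) -> list[float]:
    """Whiten rewards within one microbatch: subtract the mean and divide by
    the standard deviation, so the resulting advantages sum to zero and no
    candidate group biases the update through its raw reward scale."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]
```

Because every microbatch is centered independently, systematically longer (and hence often lower-reward) outputs cannot drag the whole batch's advantages negative, which is one way to avoid the length bias and entropy collapse the pipeline description warns about.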
- Overall pass rate reaches 95.3 percent, surpassing Qwen2.5-7B-Instruct by 19.4 points, DeepSeek-V3 by 6.9 points, and Qwen-MT-Plus by 1.7 points under identical constraint settings.

### Per-Language FLORES Highlights

- **English->Thai:** 34.1 BLEU, +1.5 over Qwen-MT-Plus.
- **English->Myanmar:** 12.9 BLEU with stable length control.
- **English->Filipino:** 35.4 BLEU after MtPO, combining instruction fidelity and translation quality.
- **Khmer->English:** 44.7 BLEU, reflecting gains from tokenizer expansion.
- **Vietnamese->English:** 37.6 BLEU, with consistent improvements across ASEAN language pairs.

## Citation

If you find our work helpful, please cite it as:

```
@inproceedings{liu2026mtpo,
  title     = {Light-MT-7B},
  author    = {Light-MT Team},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://huggingface.co/qihoo360/Light-MT-7B}
}
```

## Disclaimer

This model is provided for research and educational purposes. Please ensure responsible use and compliance with applicable laws and regulations when using this model.