metadata
language:
- en
- hi
- ta
- te
- kn
- bn
- ml
- es
- fr
- ja
- zh
license: cc-by-nc-4.0
tags:
- translation
- news
- multilingual
- nllb
- journalism
- media
pipeline_tag: translation
🌍 Multilingual News Translator
Translate news articles from ANY source into 10 languages instantly!
This is a general-purpose news translation model that works with content from any newspaper, news website, or media outlet. It was not trained on data from any specific news source; it is a pre-trained multilingual model suitable for translating English journalistic content into the languages listed below.
✨ Key Features
- 🌐 Universal: Works with ANY news source (BBC, Reuters, local newspapers, blogs, etc.)
- 🚀 Fast: Instant translations
- 🎯 Accurate: Optimized for formal news language
- 📰 Journalistic: Handles news terminology well
- 🆓 Free: Open for non-commercial use
🎯 Supported Languages
- 🇮🇳 Hindi (हिन्दी)
- 🇮🇳 Telugu (తెలుగు)
- 🇮🇳 Tamil (தமிழ்)
- 🇮🇳 Kannada (ಕನ್ನಡ)
- 🇮🇳 Bengali (বাংলা)
- 🇮🇳 Malayalam (മലയാളം)
- 🇪🇸 Spanish (Español)
- 🇫🇷 French (Français)
- 🇯🇵 Japanese (日本語)
- 🇨🇳 Chinese (中文)
🚀 Quick Start
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model
model_name = "YOUR_USERNAME/multilingual-news-translator"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Translate to Hindi
text = "Global markets showed strong growth today"
tokenizer.src_lang = "eng_Latn"
inputs = tokenizer(text, return_tensors="pt")

# Force the first generated token to be the target language code.
# convert_tokens_to_ids is used here because lang_code_to_id is not
# available in recent transformers releases.
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),
    max_length=512
)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
📖 Language Codes Reference
| Language | Code | Script |
|---|---|---|
| English | eng_Latn | Source language |
| Hindi | hin_Deva | देवनागरी |
| Telugu | tel_Telu | తెలుగు |
| Tamil | tam_Taml | தமிழ் |
| Kannada | kan_Knda | ಕನ್ನಡ |
| Bengali | ben_Beng | বাংলা |
| Malayalam | mal_Mlym | മലയാളം |
| Spanish | spa_Latn | Latin |
| French | fra_Latn | Latin |
| Japanese | jpn_Jpan | 日本語 |
| Chinese | zho_Hans | 简体中文 |
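The codes in the table can be wrapped in a small lookup so that callers pass a language name instead of remembering each code. This is a minimal sketch: `LANGUAGE_CODES` and `resolve_lang_code` are hypothetical names introduced for illustration and are not part of the model or of the transformers library.

```python
# Language names and NLLB codes copied from the reference table.
LANGUAGE_CODES = {
    "english": "eng_Latn",
    "hindi": "hin_Deva",
    "telugu": "tel_Telu",
    "tamil": "tam_Taml",
    "kannada": "kan_Knda",
    "bengali": "ben_Beng",
    "malayalam": "mal_Mlym",
    "spanish": "spa_Latn",
    "french": "fra_Latn",
    "japanese": "jpn_Jpan",
    "chinese": "zho_Hans",
}


def resolve_lang_code(name: str) -> str:
    """Return the NLLB code for a language name (hypothetical helper)."""
    cleaned = name.strip()
    # Accept a value that is already a code from the table, e.g. "hin_Deva".
    if cleaned in LANGUAGE_CODES.values():
        return cleaned
    key = cleaned.lower()
    if key not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unsupported language {name!r}; expected one of: {supported}")
    return LANGUAGE_CODES[key]


print(resolve_lang_code("Hindi"))       # hin_Deva
print(resolve_lang_code(" tam_Taml "))  # tam_Taml
```

Raising `ValueError` with the list of supported names gives a clearer message than the bare `KeyError` a plain dictionary lookup would produce.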
💡 Complete Example
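from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


class NewsTranslator:
    """Translate English news text into the supported target languages."""

    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        # Map language names to the codes from the reference table
        self.languages = {
            'hindi': 'hin_Deva',
            'telugu': 'tel_Telu',
            'tamil': 'tam_Taml',
            'kannada': 'kan_Knda',
            'bengali': 'ben_Beng',
            'malayalam': 'mal_Mlym',
            'spanish': 'spa_Latn',
            'french': 'fra_Latn',
            'japanese': 'jpn_Jpan',
            'chinese': 'zho_Hans'
        }

    def translate(self, text, target_lang):
        self.tokenizer.src_lang = "eng_Latn"
        # Truncate inputs to the 512-token limit noted under Limitations
        inputs = self.tokenizer(
            text, return_tensors="pt", truncation=True, max_length=512
        )
        target_code = self.languages[target_lang]
        outputs = self.model.generate(
            **inputs,
            # Force the first generated token to be the target language code
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(target_code),
            max_length=512,
            num_beams=5
        )
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage
translator = NewsTranslator("YOUR_USERNAME/multilingual-news-translator")
result = translator.translate("Breaking news from around the world", "hindi")
print(result)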
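When many headlines need translating, passing them to the model in padded batches is usually more efficient than calling it once per string. The function below is a sketch under stated assumptions: `translate_batch` is a hypothetical helper, the batch size of 8 is an arbitrary default, and `model` and `tokenizer` are expected to be the objects loaded in the examples above.

```python
def translate_batch(model, tokenizer, texts, target_code, batch_size=8, max_length=512):
    """Translate a list of English texts in fixed-size batches (illustrative sketch)."""
    tokenizer.src_lang = "eng_Latn"
    # Look up the vocabulary id of the target language code token.
    target_id = tokenizer.convert_tokens_to_ids(target_code)
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest item in the batch and cut anything over the limit.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=target_id,
            max_length=max_length,
        )
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

With the `translator` object from the example above, a call might look like `translate_batch(translator.model, translator.tokenizer, headlines, "hin_Deva")`, where `headlines` is a list of English strings. Results are returned in the same order as the inputs.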
🎯 Use Cases
- News Aggregators: Translate content from multiple sources
- Media Monitoring: Track news in multiple languages
- Research: Analyze global news coverage
- Personal Use: Read international news in your language
- Journalism: Cross-language reporting
- Education: Study comparative journalism
📊 Model Information
- Base Model: NLLB-200 (600M parameters)
- Architecture: Transformer-based sequence-to-sequence
- Training: Pre-trained on multilingual web data
- Languages: 200+ languages (10 optimized for news)
- Framework: PyTorch / TensorFlow compatible
- Size: ~2.5GB
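As a rough sanity check on the size figure, the sketch below multiplies the stated 600M parameter count by an assumed 4 bytes per parameter (32-bit floats). `estimate_weights_gb` is a hypothetical helper, and the half-precision line is an assumption for comparison, not a published checkpoint size.

```python
def estimate_weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 600M parameters is the figure stated in this card.
print(estimate_weights_gb(600e6, 4))  # float32: 2.4
print(estimate_weights_gb(600e6, 2))  # float16 (assumed): 1.2
```

The float32 estimate lands close to the ~2.5GB listed above. The small gap is expected, since 600M is a rounded parameter count.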
⚠️ Limitations
- Optimized for formal news content and journalistic language
- Best with complete sentences and proper grammar
- May not handle extreme slang or very informal language well
- Long texts should be split into paragraphs (max 512 tokens)
- Translation quality depends on content complexity
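The 512-token limit above means a long article has to be split before translation. One possible approach, sketched here, splits on blank lines and breaks any oversized paragraph at sentence ends. `split_for_translation` is a hypothetical helper, the sentence regex is deliberately naive, and `count_tokens` is a caller-supplied function so the sketch does not depend on a particular tokenizer.

```python
import re


def split_for_translation(text, count_tokens, max_tokens=512):
    """Split text into paragraphs, breaking oversized ones at sentence ends."""
    pieces = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if count_tokens(paragraph) <= max_tokens:
            pieces.append(paragraph)
            continue
        # Oversized paragraph: pack sentences greedily up to the limit.
        # A single sentence longer than max_tokens is kept whole and would
        # still need truncation at translation time.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and count_tokens(candidate) > max_tokens:
                pieces.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            pieces.append(current)
    return pieces


# Demo with a whitespace word count standing in for a real tokenizer.
article = "One two three.\n\nA b c d. E f g h. I j k l."
print(split_for_translation(article, lambda s: len(s.split()), max_tokens=5))
```

With the tokenizer from the Quick Start, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`, which counts the special tokens as well. Each returned piece can then be translated separately and the results joined in order.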
📜 License & Legal
- License: CC-BY-NC-4.0 (Non-commercial use)
- Base Model: Meta's NLLB-200 (released under CC-BY-NC-4.0)
- Data: Pre-trained on public multilingual web data
- Usage: Free for research, personal, and non-commercial applications
⚠️ Important: This model does NOT contain data from any specific news organization. It is a general-purpose translation model trained on public multilingual data. Users are responsible for respecting copyright when translating content from specific sources.
🙏 Credits
Built using Meta's NLLB-200 (No Language Left Behind) model
Made with ❤️ for the global news community