---
language:
- en
- hi
- ta
- te
- kn
- bn
- ml
- es
- fr
- ja
- zh
license: cc-by-nc-4.0
tags:
- translation
- news
- multilingual
- nllb
- journalism
- media
pipeline_tag: translation
---

# 🌍 Multilingual News Translator

**Translate English news articles from ANY source into 10 languages!**

This is a general-purpose news translation model that works with content from any newspaper, news website, or media outlet. It was not trained on data from any specific outlet; it is a pre-trained multilingual model suited to translating journalistic content.

## ✨ Key Features

- 🌐 **Universal**: Works with ANY news source (BBC, Reuters, local newspapers, blogs, etc.)
- 🚀 **Fast**: A single 600M-parameter model handles all 10 target languages
- 🎯 **Accurate**: Optimized for formal news language
- 📰 **Journalistic**: Handles news terminology well
- 🆓 **Free**: Open for non-commercial use

## 🎯 Supported Languages

- 🇮🇳 **Hindi** (हिन्दी)
- 🇮🇳 **Telugu** (తెలుగు)
- 🇮🇳 **Tamil** (தமிழ்)
- 🇮🇳 **Kannada** (ಕನ್ನಡ)
- 🇮🇳 **Bengali** (বাংলা)
- 🇮🇳 **Malayalam** (മലയാളം)
- 🇪🇸 **Spanish** (Español)
- 🇫🇷 **French** (Français)
- 🇯🇵 **Japanese** (日本語)
- 🇨🇳 **Chinese** (中文)

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "YOUR_USERNAME/multilingual-news-translator"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Translate English to Hindi
text = "Global markets showed strong growth today"
tokenizer.src_lang = "eng_Latn"  # source language code
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    # Force the first generated token to be the target language code.
    # convert_tokens_to_ids replaces lang_code_to_id, which newer
    # transformers releases no longer provide.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),
    max_length=512,
)

translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```

## 📖 Language Codes Reference

| Language | Code | Script |
|----------|------|--------|
| English | `eng_Latn` | Latin (source language) |
| Hindi | `hin_Deva` | देवनागरी |
| Telugu | `tel_Telu` | తెలుగు |
| Tamil | `tam_Taml` | தமிழ் |
| Kannada | `kan_Knda` | ಕನ್ನಡ |
| Bengali | `ben_Beng` | বাংলা |
| Malayalam | `mal_Mlym` | മലയാളം |
| Spanish | `spa_Latn` | Latin |
| French | `fra_Latn` | Latin |
| Japanese | `jpn_Jpan` | 日本語 |
| Chinese | `zho_Hans` | 简体中文 |

## 💡 Complete Example

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


class NewsTranslator:
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        # Friendly names mapped to NLLB language codes; extend with
        # other codes from the reference table as needed.
        self.languages = {
            'hindi': 'hin_Deva',
            'tamil': 'tam_Taml',
            'spanish': 'spa_Latn',
            'french': 'fra_Latn'
        }

    def translate(self, text, target_lang):
        self.tokenizer.src_lang = "eng_Latn"
        # truncation=True cuts inputs that exceed the model's maximum length
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        target_code = self.languages[target_lang]

        outputs = self.model.generate(
            **inputs,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(target_code),
            max_length=512,
            num_beams=5
        )

        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage
translator = NewsTranslator("YOUR_USERNAME/multilingual-news-translator")
result = translator.translate("Breaking news from around the world", "hindi")
print(result)
```

## 🎯 Use Cases

- **News Aggregators**: Translate content from multiple sources
- **Media Monitoring**: Track news in multiple languages
- **Research**: Analyze global news coverage
- **Personal Use**: Read international news in your language
- **Journalism**: Cross-language reporting
- **Education**: Study comparative journalism

## 📊 Model Information

- **Base Model**: NLLB-200 (600M parameters)
- **Architecture**: Transformer-based sequence-to-sequence
- **Training**: Pre-trained on multilingual web data
- **Languages**: 200+ languages (10 optimized for news)
- **Framework**: PyTorch / TensorFlow compatible
- **Size**: ~2.5GB

## ⚠️ Limitations

- Optimized for formal news content and journalistic language
- Best with complete sentences and proper grammar
- May not handle extreme slang or very informal language well
- Long texts should be split into paragraphs (max 512 tokens each)
- Translation quality depends on content complexity

## 📜 License & Legal

- **License**: CC-BY-NC-4.0 (Non-commercial use)
- **Base Model**: Meta's NLLB-200 (Open source)
- **Data**: Pre-trained on public multilingual web data
- **Usage**: Free for research, personal, and non-commercial applications

⚠️ **Important**: This model does NOT contain data from any specific news organization. It is a general-purpose translation model trained on public multilingual data. Users are responsible for respecting copyright when translating content from specific sources.

## 🙏 Credits

Built using Meta's NLLB-200 (No Language Left Behind) model

---

**Made with ❤️ for the global news community**
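
The Limitations section advises splitting long texts into paragraphs of at most 512 tokens. Below is a minimal sketch of one way to do that, not part of the model itself. The helper name `split_for_translation` and its `count_tokens` parameter are assumptions introduced for illustration, and the default counter uses whitespace-separated words as a rough stand-in for real token counts.

```python
import re
from typing import Callable, List


def split_for_translation(
    article: str,
    max_tokens: int = 512,
    # Hypothetical default: word count as a rough proxy for token count.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Split an article into chunks that each fit within max_tokens.

    Paragraphs (separated by blank lines) are kept whole when they fit.
    Longer paragraphs are packed greedily, sentence by sentence.
    A single sentence longer than max_tokens is returned unchanged.
    """
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", article.strip()):
        paragraph = " ".join(paragraph.split())  # collapse line breaks and extra spaces
        if not paragraph:
            continue
        if count_tokens(paragraph) <= max_tokens:
            chunks.append(paragraph)
            continue
        # Paragraph is too long: split after sentence-ending punctuation
        # and pack sentences until the budget would be exceeded.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and count_tokens(candidate) > max_tokens:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


if __name__ == "__main__":
    article = "Markets rose today. Analysts were surprised.\n\nRain is expected tomorrow."
    for chunk in split_for_translation(article, max_tokens=4):
        print(chunk)
```

For counts that match the model, pass a counter built on the loaded tokenizer, for example `count_tokens=lambda s: len(tokenizer(s)["input_ids"])`, then translate each chunk separately and join the results with blank lines.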