---
license: cc-by-4.0
language:
- asm
- mni
- kha
- lus
- grt
- trp
- njz
- brx
- nag
- eng
- hin
tags:
- ocr
- northeast-india
- doctr
- vitstr
- mizo
- garo
- khasi
- nyishi
- kokborok
- nagamese
- bodo
- meitei
---

MWire Labs Logo

# NE-OCR

### High-Accuracy OCR for Northeast Indian Scripts

[![Technical Report](https://img.shields.io/badge/Technical_Report-PDF-blue)](https://mwirelabs.com/wp-content/uploads/2026/03/NE_OCR_Technical_Report.pdf) [![License](https://img.shields.io/badge/License-CC--BY--4.0-green)](https://creativecommons.org/licenses/by/4.0/) [![Benchmark](https://img.shields.io/badge/Benchmark-26k_Samples-orange)](#benchmark-test-set)

**Purpose-built OCR for Northeast India with 94.99% average character accuracy across 12 language–script pairs.** NE-OCR outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of 12 language–script pairs, delivering fast inference and strong performance where general-purpose OCR systems fail.

Developed by **MWire Labs, Shillong, Meghalaya**.

NE-OCR Architecture Diagram

NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across Latin, Bengali, Devanagari, and Meitei Mayek scripts, outputting text from a 1,056-character multilingual vocabulary.

## Model Details

- **Architecture:** DocTR ViTSTR-Base (86M parameters)
- **Vocab size:** 1,056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
- **Input:** 32×128 RGB image crops (word/line level, ≤32 chars)
- **Training data:** ~988k deduplicated samples across 12 languages
- **Trained by:** MWire Labs

## Inference Speed

Measured on an NVIDIA A40 (batch size = 1):

NE-OCR Latency Comparison

- **NE-OCR:** 17.2 ms/image
- EasyOCR: 37.2 ms
- TrOCR-large: 92.1 ms
- Tesseract 5: 166.1 ms
- Chandra (VLM): 313 ms

NE-OCR is:

- 2× faster than EasyOCR
- 9× faster than Tesseract 5
- 18× faster than VLM-based OCR systems

## Benchmark Comparison — Character Accuracy (ChA%)

Evaluated on a fixed 26,000-sample benchmark (2,000 per language–script pair). Higher is better.

| Language | Script | **NE-OCR** | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|----------|--------|------------|---------|-------------|-------------|---------|
| Assamese | Bengali | **97.46%** | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | **83.38%** | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | **91.30%** |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | **94.15%** |
| Hindi | Devanagari | **97.69%** | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | **98.85%** | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | **97.59%** | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | **97.09%** | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | **95.56%** | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | **95.96%** | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | **97.91%** | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | **94.50%** | 69.56% | 69.92% | 87.23% | 91.85% |
| **Average** | — | **94.99%** | 59.87% | 51.77% | 53.06% | 77.29% |

## Benchmark Test Set

A public benchmark test set is available in the `benchmark/` folder of this repository for reproducing the evaluation results and comparing against other OCR models.
- **Combined:** `benchmark/ne_ocr_benchmark.parquet` — 26,000 samples across all 12 languages
- **Per-language:** `benchmark/{lang}_test.parquet` — 2,000 samples each
- **Format:** Parquet with columns `image_path`, `text`, `lang`
- **Filter:** all samples ≤32 characters (word/line-level crops)

Results reported in this model card are computed on this exact test set.

## Usage

````python
import json

import numpy as np
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download the checkpoint and vocabulary
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load the vocabulary, skipping the first entry (reserved, e.g. for the CTC blank)
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load the model
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference on a word/line crop (max 32 chars)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])  # predicted text
````

## Notes

- The model is designed for **word/line-level crops** (≤32 characters), not full pages.
- For full-page OCR, run a text detection model first (e.g. DBNet) to extract crops.
- Bodo accuracy is lower due to limited training data; an improvement is planned for V2.

## License

CC-BY-4.0 — MWire Labs
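## Appendix: Computing ChA%

This card does not spell out the exact ChA% formula. A common convention for character accuracy is `max(0, 1 − EditDistance(pred, ref) / len(ref))`; the sketch below implements that assumed definition in plain Python (the `char_accuracy` and `levenshtein` helpers are illustrative, not part of this repository), so scores from other OCR systems can be compared on the same benchmark splits.

````python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,          # deletion
                curr[j - 1] + 1,      # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def char_accuracy(pred: str, ref: str) -> float:
    """Character accuracy in [0, 1], assumed to match the ChA% convention."""
    if not ref:
        return float(pred == ref)
    return max(0.0, 1.0 - levenshtein(pred, ref) / len(ref))


# Example: one substitution over a 5-character reference -> 0.8
print(char_accuracy("Khasl", "Khasi"))
````

Per-language ChA% is then the mean of `char_accuracy` over the 2,000 samples in each `benchmark/{lang}_test.parquet` split, expressed as a percentage.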