---
license: cc-by-4.0
language:
- asm
- mni
- kha
- lus
- grt
- trp
- njz
- brx
- nag
- eng
- hin
tags:
- ocr
- northeast-india
- doctr
- vitstr
- mizo
- garo
- khasi
- nyishi
- kokborok
- nagamese
- bodo
- meitei
---

MWire Labs Logo

# NE-OCR

### High-Accuracy OCR for Northeast Indian Scripts

[![Technical Report](https://img.shields.io/badge/Technical_Report-PDF-blue)](https://mwirelabs.com/wp-content/uploads/2026/03/NE_OCR_Technical_Report.pdf) [![License](https://img.shields.io/badge/License-CC--BY--4.0-green)](https://creativecommons.org/licenses/by/4.0/) [![Benchmark](https://img.shields.io/badge/Benchmark-26k_Samples-orange)](#benchmark-test-set)

**Purpose-built OCR for Northeast India with 94.99% average character accuracy across 12 language–script pairs.** NE-OCR outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of 12 language–script pairs, delivering fast inference and strong performance where general-purpose OCR systems fail.

Developed by **MWire Labs, Shillong, Meghalaya**.

NE-OCR Architecture Diagram

NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across Latin, Bengali, Devanagari, and Meitei Mayek scripts, outputting text from a 1,056-character multilingual vocabulary.

## Model Details

- **Architecture:** DocTR ViTSTR-Base (86M parameters)
- **Vocab size:** 1,056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
- **Input:** 32×128 RGB image crops (word/line level, ≤32 chars)
- **Training data:** ~988k deduplicated samples across 12 languages
- **Trained by:** MWire Labs

## Inference Speed

Measured on an NVIDIA A40 (batch size = 1):

NE-OCR Latency Comparison

- **NE-OCR:** 17.2 ms/image
- EasyOCR: 37.2 ms
- TrOCR-large: 92.1 ms
- Tesseract 5: 166.1 ms
- Chandra (VLM): 313 ms

NE-OCR is:

- 2× faster than EasyOCR
- 9× faster than Tesseract 5
- 18× faster than VLM-based OCR systems

## Benchmark Comparison — Character Accuracy (ChA%)

Evaluated on a fixed 26,000-sample benchmark (2,000 per language–script pair). Higher is better.

| Language | Script | **NE-OCR** | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|----------|--------|------------|---------|-------------|-------------|---------|
| Assamese | Bengali | **97.46%** | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | **83.38%** | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | **91.30%** |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | **94.15%** |
| Hindi | Devanagari | **97.69%** | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | **98.85%** | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | **97.59%** | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | **97.09%** | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | **95.56%** | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | **95.96%** | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | **97.91%** | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | **94.50%** | 69.56% | 69.92% | 87.23% | 91.85% |
| **Average** | — | **94.99%** | 59.87% | 51.77% | 53.06% | 77.29% |

## Benchmark Test Set

A public benchmark test set is available in the `benchmark/` folder of this repository for reproducing the evaluation results and comparing against other OCR models.
- **Combined:** `benchmark/ne_ocr_benchmark.parquet` — 26,000 samples across all 12 languages
- **Per-language:** `benchmark/{lang}_test.parquet` — 2,000 samples each
- **Format:** Parquet with columns `image_path`, `text`, `lang`
- **Filter:** all samples ≤32 characters (word/line-level crops)

Results reported in this model card are computed on this exact test set.

## Usage

````python
import json

import numpy as np
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download the checkpoint and vocabulary
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load the vocabulary, skipping the first entry (reserved, e.g. for the CTC blank)
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load the model
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference on a word/line crop (max 32 chars)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])  # predicted text
````

## Notes

- The model is designed for **word/line-level crops** (≤32 characters), not full pages.
- For full-page OCR, run a text detection model first (e.g. DBNet) to extract crops.
- Bodo accuracy is lower due to limited training data; an improvement is planned for V2.

## License

CC-BY-4.0 — MWire Labs
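## Appendix: Computing ChA%

This card does not spell out the exact ChA% formula. A common convention for character accuracy is `max(0, 1 − EditDistance(pred, ref) / len(ref))`; the sketch below implements that assumed definition in plain Python (the `char_accuracy` and `levenshtein` helpers are illustrative, not part of this repository), so scores from other OCR systems can be compared on the same benchmark splits.

````python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,          # deletion
                curr[j - 1] + 1,      # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def char_accuracy(pred: str, ref: str) -> float:
    """Character accuracy in [0, 1], assumed to match the ChA% convention."""
    if not ref:
        return float(pred == ref)
    return max(0.0, 1.0 - levenshtein(pred, ref) / len(ref))


# Example: one substitution over a 5-character reference -> 0.8
print(char_accuracy("Khasl", "Khasi"))
````

Per-language ChA% is then the mean of `char_accuracy` over the 2,000 samples in each `benchmark/{lang}_test.parquet` split, expressed as a percentage.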