language:
  - en
  - hi
  - bn
  - ta
  - te
  - gu
  - kn
  - ml
  - mr
  - or
  - pa
  - ur
  - as
  - brx
  - doi
  - gom
  - kas
  - mai
  - mni
  - ne
  - sa
  - sat
  - sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
  - vision
  - multilingual
  - indic-languages
  - gguf
  - quantized
  - translation
  - document-understanding
  - llama-cpp
datasets:
  - ai4bharat/BPCC
  - ai4bharat/Pralekha
  - ai4bharat/indicdlp
  - lmms-lab/DocVQA

Sarvam-1-VL-4B-Instruct - GGUF (Quantized)

Model Description

GGUF quantized build of Sarvam-1-VL-4B-Instruct for CPU and edge deployment with llama.cpp. Ships a Q4_K_M quantization for a good size/quality balance.

Files

  • qwen3-vl-4b-instruct.Q4_K_M.gguf - Quantized language model (4-bit, Q4_K_M)
  • qwen3-vl-4b-instruct.BF16-mmproj.gguf - Multimodal projector (vision weights, BF16); required for image input
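
Both files can be fetched programmatically with `huggingface_hub` — a minimal sketch; the repo id in the usage comment is a placeholder, not this repository's actual name:

```python
# Sketch: fetch both GGUF files needed for inference.
FILES = [
    "qwen3-vl-4b-instruct.Q4_K_M.gguf",       # quantized language model
    "qwen3-vl-4b-instruct.BF16-mmproj.gguf",  # vision projector (mmproj)
]

def download_all(repo_id: str) -> list[str]:
    """Download both GGUF files and return their local paths."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return [hf_hub_download(repo_id=repo_id, filename=f) for f in FILES]

# Usage (hypothetical repo id):
# paths = download_all("your-org/sarvam-1-vl-4b-gguf")
```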

Training Details

  • Base Model: Qwen/Qwen3-VL-4B-Instruct
  • Quantization: Q4_K_M
  • Fine-tuning (original full-precision model): 2,000 steps, final loss 6.25

Datasets

Trained on 4 datasets covering:

  • Translation (40%): BPCC - 22 Indic languages ↔ English
  • Instruction Following (20%): Pralekha - 11 language pairs
  • Document Layout (30%): IndicDLP - Document understanding
  • Visual QA (10%): DocVQA - Question answering
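
The 40/20/30/10 mix above can be sketched as a weighted sampler — illustrative only; the actual training pipeline's sampling code is not published here:

```python
import random

# Training-mix weights from the dataset list above.
MIX = {
    "BPCC (translation)": 0.40,
    "Pralekha (instruction following)": 0.20,
    "IndicDLP (document layout)": 0.30,
    "DocVQA (visual QA)": 0.10,
}

def sample_source(rng: random.Random) -> str:
    """Pick the dataset for the next training example, proportional to MIX."""
    names, weights = zip(*MIX.items())
    return rng.choices(names, weights=weights, k=1)[0]

# Quick sanity check: sampled frequencies should track the weights.
rng = random.Random(0)
counts = {name: 0 for name in MIX}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
```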

Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

Usage with llama.cpp

# Run inference
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -p "Translate this to Hindi:" \
  --image document.jpg
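
For persistent serving instead of one-shot CLI runs, llama.cpp's `llama-server` can load the same pair of files behind an OpenAI-compatible HTTP API — a sketch; host and port are arbitrary choices:

```shell
# Serve the model over HTTP (llama.cpp server with multimodal support)
llama-server \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  --host 127.0.0.1 --port 8080
```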

Memory Requirements

  • Q4_K_M: ~2.5GB RAM
  • With mmproj: ~3GB RAM total
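
The ~2.5GB figure is consistent with a back-of-the-envelope estimate — assuming Q4_K_M averages roughly 4.8 bits per weight (an approximation; the exact average depends on the tensor mix):

```python
# Rough model-size estimate for a 4B-parameter model at Q4_K_M.
params = 4e9
bits_per_weight = 4.8  # assumed average for Q4_K_M
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB on disk")  # runtime buffers push RAM use toward ~2.5 GB
```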

Performance

  • Speed: Interactive inference speeds on modern CPUs
  • Quality: Minimal quality loss versus the FP16 original, as is typical for Q4_K_M
  • Deployment: Suitable for edge and resource-constrained devices

License

Apache 2.0