---
language:
- en
- hi
- bn
- ta
- te
- gu
- kn
- ml
- mr
- or
- pa
- ur
- as
- brx
- doi
- gom
- kas
- mai
- mni
- ne
- sa
- sat
- sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- multilingual
- indic-languages
- gguf
- quantized
- translation
- document-understanding
- llama-cpp
datasets:
- ai4bharat/BPCC
- ai4bharat/Pralekha
- ai4bharat/indicdlp
- lmms-lab/DocVQA
---

# Sarvam-1-VL-4B-Instruct - GGUF (Quantized)
## Model Description

A GGUF-quantized build of Sarvam-1-VL-4B-Instruct for CPU and edge deployment with llama.cpp. The Q4_K_M quantization offers a good balance between file size and output quality.
## Files

- `qwen3-vl-4b-instruct.Q4_K_M.gguf` - Quantized language model (4-bit)
- `qwen3-vl-4b-instruct.BF16-mmproj.gguf` - Multimodal (vision) projector (BF16)
## Training Details
- Base Model: Qwen/Qwen3-VL-4B-Instruct
- Quantization: Q4_K_M
- Original Training: 2,000 steps, loss 6.25
## Datasets
Trained on 4 datasets covering:
- Translation (40%): BPCC - 22 Indic languages ↔ English
- Instruction Following (20%): Pralekha - 11 language pairs
- Document Layout (30%): IndicDLP - Document understanding
- Visual QA (10%): DocVQA - Question answering
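The mixing ratios above can be sketched as a weighted sampler. This is an illustration of the stated 40/20/30/10 proportions only, not the authors' actual training pipeline:

```python
import random

# Mixing ratios taken from the dataset list above (illustrative only;
# the real training code is not published in this card).
MIX = {
    "ai4bharat/BPCC": 0.40,      # translation
    "ai4bharat/Pralekha": 0.20,  # instruction following
    "ai4bharat/indicdlp": 0.30,  # document layout
    "lmms-lab/DocVQA": 0.10,     # visual QA
}

def sample_dataset(rng: random.Random) -> str:
    """Pick the source dataset for the next training example."""
    names = list(MIX)
    return rng.choices(names, weights=[MIX[n] for n in names], k=1)[0]

# Over many draws the empirical counts track the 40/20/30/10 split.
rng = random.Random(0)
counts = {n: 0 for n in MIX}
for _ in range(10_000):
    counts[sample_dataset(rng)] += 1
```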
## Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English
## Usage with llama.cpp

```bash
# Run inference
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -p "Translate this to Hindi:" \
  --image document.jpg
```
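For batch processing, the same invocation can be wrapped from Python via `subprocess`. This is a sketch: it assumes `llama-mtmd-cli` is on `PATH`, and flags may differ across llama.cpp versions:

```python
import shutil
import subprocess

MODEL = "qwen3-vl-4b-instruct.Q4_K_M.gguf"
MMPROJ = "qwen3-vl-4b-instruct.BF16-mmproj.gguf"

def build_cmd(image_path: str, prompt: str) -> list[str]:
    """Assemble the llama-mtmd-cli command shown above."""
    return [
        "llama-mtmd-cli",
        "-m", MODEL,
        "--mmproj", MMPROJ,
        "-p", prompt,
        "--image", image_path,
    ]

def translate_image(image_path: str, prompt: str = "Translate this to Hindi:") -> str:
    """Run one inference and return the raw stdout from the CLI."""
    if shutil.which("llama-mtmd-cli") is None:
        raise RuntimeError("llama-mtmd-cli not found; build llama.cpp first")
    result = subprocess.run(build_cmd(image_path, prompt),
                            capture_output=True, text=True, check=True)
    return result.stdout
```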
## Memory Requirements

- Q4_K_M: ~2.5 GB RAM
- With mmproj: ~3 GB RAM total
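The ~2.5 GB figure is roughly what 4B parameters at Q4_K_M's effective bit-rate implies. A back-of-the-envelope estimate (actual usage adds KV cache and runtime overhead):

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight footprint in GB.

    ~4.5 bits/weight is a commonly cited effective rate for Q4_K_M
    (4-bit values plus per-block scales); exact sizes vary with the
    model's tensor mix.
    """
    return n_params * bits_per_weight / 8 / 1e9

weights_gb = quantized_weight_gb(4e9)  # ~2.25 GB of weights for a 4B model
# plus KV cache and the vision projector, giving the totals listed above
```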
## Performance

- Speed: fast CPU inference via llama.cpp
- Quality: minimal degradation vs. FP16
- Deployment: suitable for edge devices
## License
Apache 2.0