---
language:
- en
- hi
- bn
- ta
- te
- gu
- kn
- ml
- mr
- or
- pa
- ur
- as
- brx
- doi
- gom
- kas
- mai
- mni
- ne
- sa
- sat
- sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- multilingual
- indic-languages
- gguf
- quantized
- translation
- document-understanding
- llama-cpp
datasets:
- ai4bharat/BPCC
- ai4bharat/Pralekha
- ai4bharat/indicdlp
- lmms-lab/DocVQA
---

# Sarvam-1-VL-4B-Instruct - GGUF (Quantized)

## Model Description

GGUF-quantized version of the model for CPU and edge deployment with llama.cpp. Ships with Q4_K_M quantization for a good size/quality trade-off.

## Files

- `qwen3-vl-4b-instruct.Q4_K_M.gguf` - Quantized language model (4-bit, Q4_K_M)
- `qwen3-vl-4b-instruct.BF16-mmproj.gguf` - Vision projector (mmproj, BF16)

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Quantization:** Q4_K_M
- **Original Training:** 2,000 steps, final loss 6.25

## Datasets

Fine-tuned on 4 datasets covering:

- **Translation** (40%): BPCC - 22 Indic languages ↔ English
- **Instruction Following** (20%): Pralekha - 11 language pairs
- **Document Layout** (30%): IndicDLP - document understanding
- **Visual QA** (10%): DocVQA - question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage with llama.cpp

```bash
# Run multimodal inference with the llama.cpp CLI
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -p "Translate this to Hindi:" \
  --image document.jpg
```

## Memory Requirements

- **Q4_K_M model:** ~2.5 GB RAM
- **With mmproj:** ~3 GB RAM total

## Performance

- **Speed:** fast CPU inference
- **Quality:** minimal degradation vs. FP16
- **Deployment:** suitable for edge devices

## License

Apache 2.0
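
## Appendix: Estimating the Q4_K_M Footprint

As a rough sanity check on the figures in the Memory Requirements section, the Q4_K_M file size can be estimated from the parameter count and an average bits-per-weight figure. This is a minimal sketch; the ~4.85 bits-per-weight average for Q4_K_M mixed quantization is an assumption based on commonly reported llama.cpp values, not a measurement of this specific file:

```python
# Back-of-the-envelope size estimate for a 4B-parameter model at Q4_K_M.
# ASSUMPTION: ~4.85 bits per weight on average (Q4_K_M mixes 4- and 6-bit blocks).
PARAMS = 4e9
BITS_PER_WEIGHT = 4.85

size_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9  # bits -> bytes -> GB
print(f"Estimated Q4_K_M size: {size_gb:.2f} GB")
```

The result lands near the ~2.5 GB RAM figure above; actual usage is somewhat higher once the KV cache and the vision projector are loaded.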