---
language:
- en
- hi
- bn
- ta
- te
- gu
- kn
- ml
- mr
- or
- pa
- ur
- as
- brx
- doi
- gom
- kas
- mai
- mni
- ne
- sa
- sat
- sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- multilingual
- indic-languages
- gguf
- quantized
- translation
- document-understanding
- llama-cpp
datasets:
- ai4bharat/BPCC
- ai4bharat/Pralekha
- ai4bharat/indicdlp
- lmms-lab/DocVQA
---

# Sarvam-1-VL-4B-Instruct - GGUF (Quantized)

## Model Description

GGUF-quantized version of the model for CPU and edge deployment with llama.cpp. Ships with Q4_K_M quantization for a good size/quality trade-off.

## Files

- `qwen3-vl-4b-instruct.Q4_K_M.gguf` - Quantized language model (4-bit, Q4_K_M)
- `qwen3-vl-4b-instruct.BF16-mmproj.gguf` - Vision projector (mmproj, BF16)

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Quantization:** Q4_K_M
- **Original Training:** 2,000 steps, final loss 6.25

## Datasets

Fine-tuned on 4 datasets covering:

- **Translation** (40%): BPCC - 22 Indic languages ↔ English
- **Instruction Following** (20%): Pralekha - 11 language pairs
- **Document Layout** (30%): IndicDLP - document understanding
- **Visual QA** (10%): DocVQA - question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage with llama.cpp

```bash
# Run multimodal inference with the llama.cpp CLI
llama-mtmd-cli \
  -m qwen3-vl-4b-instruct.Q4_K_M.gguf \
  --mmproj qwen3-vl-4b-instruct.BF16-mmproj.gguf \
  -p "Translate this to Hindi:" \
  --image document.jpg
```

## Memory Requirements

- **Q4_K_M model:** ~2.5 GB RAM
- **With mmproj:** ~3 GB RAM total

## Performance

- **Speed:** fast CPU inference
- **Quality:** minimal degradation vs. FP16
- **Deployment:** suitable for edge devices

## License

Apache 2.0
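
## Appendix: Estimating the Q4_K_M Footprint

As a rough sanity check on the figures in the Memory Requirements section, the Q4_K_M file size can be estimated from the parameter count and an average bits-per-weight figure. This is a minimal sketch; the ~4.85 bits-per-weight average for Q4_K_M mixed quantization is an assumption based on commonly reported llama.cpp values, not a measurement of this specific file:

```python
# Back-of-the-envelope size estimate for a 4B-parameter model at Q4_K_M.
# ASSUMPTION: ~4.85 bits per weight on average (Q4_K_M mixes 4- and 6-bit blocks).
PARAMS = 4e9
BITS_PER_WEIGHT = 4.85

size_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9  # bits -> bytes -> GB
print(f"Estimated Q4_K_M size: {size_gb:.2f} GB")
```

The result lands near the ~2.5 GB RAM figure above; actual usage is somewhat higher once the KV cache and the vision projector are loaded.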