
Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF

This repository contains high-quality GGUF quantizations for the prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX model.

Highlights

  • Unredacted & MAX: Maximum performance version without restrictive filters.
  • Full Vision Support: Includes multiple versions of the vision projector (mmproj) for different hardware needs.
  • Optimized: Compatible with the latest llama.cpp and other GGUF-supported backends.

Files Included

1. Model Weights (LLM)

| Filename | Quant Method | Description |
|----------|--------------|-------------|
| Q4_K_M.gguf | Q4_K_M | Recommended. Best balance of speed and intelligence. |
| Q8_0.gguf | Q8_0 | High quality, nearly identical to the original weights. |
| Q6_K.gguf | Q6_K | Very high quality, slightly slower than Q4. |
| Q5_K_M.gguf | Q5_K_M | Good balance between Q4 and Q6. |
| Q3_K_M.gguf | Q3_K_M | Low size, moderate quality loss. |
| Q2_K.gguf | Q2_K | Smallest possible size, significant quality loss. |
| F16.gguf | F16 | Baseline reference quality. |
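As a rough sanity check when picking a quant, file size scales with bits per weight: size ≈ params × bits / 8. The bits-per-weight figures below are approximate averages (an assumption; the K-quants mix bit widths, and real files also contain metadata), so treat the results as estimates rather than exact file sizes:

```shell
# Estimate file sizes for an 8B-parameter model at common GGUF quant levels.
# Bits-per-weight values are approximate averages, not exact format constants.
awk 'BEGIN {
    params = 8e9
    split("Q2_K:2.6 Q3_K_M:3.9 Q4_K_M:4.8 Q5_K_M:5.7 Q6_K:6.6 Q8_0:8.5 F16:16", q, " ")
    for (i = 1; i <= 7; i++) {
        split(q[i], kv, ":")
        printf "%-7s ~ %.1f GB\n", kv[1], params * kv[2] / 8 / 1e9
    }
}'
```

For example, Q4_K_M at roughly 4.8 bits/weight lands near 4.8 GB for 8B parameters, which is why it is the usual speed/quality sweet spot.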

2. Vision Projectors (mmproj)

Required for image recognition tasks.

| Filename | Type | Description |
|----------|------|-------------|
| mmproj-f32.gguf | F32 | Absolute maximum precision (~2.3 GB). |
| mmproj-f16.gguf | F16 | Industry standard for high-quality vision. |
| mmproj-bf16.gguf | BF16 | Optimized for modern NVIDIA GPUs (Ampere and newer). |
| mmproj-q8_0.gguf | Q8_0 | Saves VRAM without losing recognition detail. |
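If you want to confirm which projector variant a downloaded file actually contains, the `gguf` Python package (the `gguf-py` tooling that ships with llama.cpp) provides a `gguf-dump` script that prints a file's metadata and tensor types; the filename below is illustrative:

```shell
# Assumes: pip install gguf (llama.cpp's gguf-py package).
# Dumps GGUF key/value metadata and tensor dtypes, so you can verify
# whether a projector is F32, F16, BF16, or Q8_0 before loading it.
gguf-dump mmproj-f16.gguf
```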

Usage

To use vision capabilities in llama.cpp, pass both the model weights and an mmproj file to the multimodal CLI (recent llama.cpp builds expose this as llama-mtmd-cli):

./llama-mtmd-cli -m Qwen3-VL-8B-Instruct-Unredacted-MAX.Q4_K_M.gguf \
                 --mmproj Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-f16.gguf \
                 --image path/to/your/image.jpg \
                 -p "Describe this image"
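To fetch the files, one option is `huggingface-cli` from the `huggingface_hub` package. The repository id matches this repo; the filenames below are taken from the usage command above, but confirm the exact names on the Files tab before downloading:

```shell
# Assumes: pip install -U huggingface_hub.
# Downloads the recommended quant and the F16 vision projector into the
# current directory; filenames are illustrative, verify them in the repo.
huggingface-cli download KuroTo4ka/Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF \
    Qwen3-VL-8B-Instruct-Unredacted-MAX.Q4_K_M.gguf \
    Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-f16.gguf \
    --local-dir .
```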
Model size: 8B params
Architecture: qwen3vl