
Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF

This repository contains high-quality GGUF quantizations for the prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX model.

Highlights

  • Unredacted & MAX: Maximum performance version without restrictive filters.
  • Full Vision Support: Includes multiple versions of the vision projector (mmproj) for different hardware needs.
  • Optimized: Compatible with the latest llama.cpp and other GGUF-supported backends.

Files Included

1. Model Weights (LLM)

| Filename | Quant Method | Description |
|----------|--------------|-------------|
| Q4_K_M.gguf | Q4_K_M | Recommended. Best balance of speed and intelligence. |
| Q8_0.gguf | Q8_0 | High quality, nearly identical to the original weights. |
| Q6_K.gguf | Q6_K | Very high quality, slightly slower than Q4. |
| Q5_K_M.gguf | Q5_K_M | Good balance between Q4 and Q6. |
| Q3_K_M.gguf | Q3_K_M | Low size, moderate quality loss. |
| Q2_K.gguf | Q2_K | Smallest possible size, significant quality loss. |
| F16.gguf | F16 | Baseline reference quality. |
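As a rough sanity check when picking a quant, file size scales with bits per weight: size ≈ params × bits / 8. The bits-per-weight figures below are approximate averages (an assumption; the K-quants mix bit widths, and real files also contain metadata), so treat the results as estimates rather than exact file sizes:

```shell
# Estimate file sizes for an 8B-parameter model at common GGUF quant levels.
# Bits-per-weight values are approximate averages, not exact format constants.
awk 'BEGIN {
    params = 8e9
    split("Q2_K:2.6 Q3_K_M:3.9 Q4_K_M:4.8 Q5_K_M:5.7 Q6_K:6.6 Q8_0:8.5 F16:16", q, " ")
    for (i = 1; i <= 7; i++) {
        split(q[i], kv, ":")
        printf "%-7s ~ %.1f GB\n", kv[1], params * kv[2] / 8 / 1e9
    }
}'
```

For example, Q4_K_M at roughly 4.8 bits/weight lands near 4.8 GB for 8B parameters, which is why it is the usual speed/quality sweet spot.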

2. Vision Projectors (mmproj)

Required for image recognition tasks.

| Filename | Type | Description |
|----------|------|-------------|
| mmproj-f32.gguf | F32 | Absolute maximum precision (~2.3 GB). |
| mmproj-f16.gguf | F16 | Industry standard for high-quality vision. |
| mmproj-bf16.gguf | BF16 | Optimized for modern NVIDIA GPUs (Ampere and newer). |
| mmproj-q8_0.gguf | Q8_0 | Saves VRAM without losing recognition detail. |
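If you want to confirm which projector variant a downloaded file actually contains, the `gguf` Python package (the `gguf-py` tooling that ships with llama.cpp) provides a `gguf-dump` script that prints a file's metadata and tensor types; the filename below is illustrative:

```shell
# Assumes: pip install gguf (llama.cpp's gguf-py package).
# Dumps GGUF key/value metadata and tensor dtypes, so you can verify
# whether a projector is F32, F16, BF16, or Q8_0 before loading it.
gguf-dump mmproj-f16.gguf
```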

Usage

To use vision capabilities in llama.cpp, pass both the model weights and an mmproj file to the multimodal CLI (recent llama.cpp builds expose this as llama-mtmd-cli):

./llama-mtmd-cli -m Qwen3-VL-8B-Instruct-Unredacted-MAX.Q4_K_M.gguf \
                 --mmproj Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-f16.gguf \
                 --image path/to/your/image.jpg \
                 -p "Describe this image"
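To fetch the files, one option is `huggingface-cli` from the `huggingface_hub` package. The repository id matches this repo; the filenames below are taken from the usage command above, but confirm the exact names on the Files tab before downloading:

```shell
# Assumes: pip install -U huggingface_hub.
# Downloads the recommended quant and the F16 vision projector into the
# current directory; filenames are illustrative, verify them in the repo.
huggingface-cli download KuroTo4ka/Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF \
    Qwen3-VL-8B-Instruct-Unredacted-MAX.Q4_K_M.gguf \
    Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-f16.gguf \
    --local-dir .
```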
Model size: 8B params
Architecture: qwen3vl