# Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF
This repository contains high-quality GGUF quantizations of the prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX model.
## Highlights
- Unredacted & MAX: Maximum-performance version without restrictive filters.
- Full Vision Support: Includes multiple versions of the vision projector (`mmproj`) for different hardware needs.
- Optimized: Compatible with the latest `llama.cpp` and other GGUF-supported backends.
## Files Included
### 1. Model Weights (LLM)
| Filename | Quant Method | Description |
|---|---|---|
| `Q4_K_M.gguf` | Q4_K_M | Recommended. Best balance of speed and intelligence. |
| `Q8_0.gguf` | Q8_0 | High quality, nearly identical to the original weights. |
| `Q6_K.gguf` | Q6_K | Very high quality, slightly slower than Q4. |
| `Q5_K_M.gguf` | Q5_K_M | Good balance between Q4 and Q6. |
| `Q3_K_M.gguf` | Q3_K_M | Low size, moderate quality loss. |
| `Q2_K.gguf` | Q2_K | Smallest possible size, significant quality loss. |
| `F16.gguf` | F16 | Baseline reference quality. |
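As a rough guide for choosing a quant, file size scales with the average bits stored per weight. A minimal sketch of that arithmetic follows; the bits-per-weight figures are approximate community estimates for these quant types, not values measured from the files in this repository:

```python
# Rough file-size estimate: parameters * bits_per_weight / 8 bytes.
# Bits-per-weight values are approximations, not measured from this repo.
PARAMS = 8e9  # Qwen3-VL-8B has roughly 8 billion parameters

BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def approx_size_gb(quant: str) -> float:
    """Estimated weight-file size in GB for a given quant type."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{approx_size_gb(quant):.1f} GB")
```

Comparing the estimate against your available VRAM (leaving headroom for the KV cache and the `mmproj` file) is a reasonable way to pick between the quants above.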
### 2. Vision Projectors (mmproj)
Required for image recognition tasks.
| Filename | Type | Description |
|---|---|---|
| `mmproj-f32.gguf` | F32 | Absolute maximum precision (2.3GB). |
| `mmproj-f16.gguf` | F16 | Industry standard for high-quality vision. |
| `mmproj-bf16.gguf` | BF16 | Optimized for modern NVIDIA GPUs (Ampere+). |
| `mmproj-q8_0.gguf` | Q8_0 | Best for saving VRAM without losing recognition detail. |
## Usage
To use vision capabilities in `llama.cpp`, run the following command:

```shell
./llama-cli -m Qwen3-VL-8B-Instruct-Unredacted-MAX.Q4_K_M.gguf \
  --mmproj Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-f16.gguf \
  --image path/to/your/image.jpg \
  -p "Describe this image"
```
## Model Tree

- Repository: KuroTo4ka/Qwen3-VL-8B-Instruct-Unredacted-MAX-Quants-GGUF
- Base model: Qwen/Qwen3-VL-8B-Instruct