# Qwen3.6-35B-A3B-heretic GGUF

This repository contains GGUF format quantizations of the tvall43/Qwen3.6-35B-A3B-heretic model.

These quantizations are compatible with modern GGML-based inference engines such as llama.cpp.

## Available Quantizations

The following quantization methods are provided to suit various hardware capabilities, from VRAM-constrained setups to high-end rigs.

| Format | Characteristics | Recommended Use |
|--------|-----------------|-----------------|
| Q3_K_M | Smallest, highest perplexity loss | Maximum space savings; quality degradation is noticeable. |
| Q4_K_S | Small, slightly higher perplexity loss | Good balance for tight VRAM limits. |
| Q4_K_M | Medium, optimal balance | **Recommended** - Excellent balance of size, speed, and quality. |
| Q5_K_M | Large, very low perplexity loss | High quality, requires more VRAM. |
| Q6_K | Very large, near-lossless | Premium quality for high-VRAM systems. |
| Q8_0 | Largest, practically lossless | Best for compute-heavy setups; requires massive VRAM. |
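As a rough way to pick a format for your hardware, file size (and the VRAM needed to fully offload the model) scales with parameter count times bits per weight. The sketch below uses the card's 35B parameter count; the bits-per-weight figures are approximations I am assuming for illustration (real GGUF files mix tensor precisions and add metadata, so actual sizes differ):

```shell
# Rough size estimate per quantization: params * bits-per-weight / 8.
# Bits-per-weight values below are ASSUMED approximations, not exact
# figures for these files.
for entry in "Q3_K_M 3.9" "Q4_K_S 4.3" "Q4_K_M 4.8" "Q5_K_M 5.7" "Q6_K 6.6" "Q8_0 8.5"; do
  set -- $entry   # $1 = format name, $2 = approx bits per weight
  awk -v q="$1" -v bpw="$2" \
    'BEGIN { printf "%-7s ~%.1f GB\n", q, 35e9 * bpw / 8 / 1e9 }'
done
```

Comparing the printed estimate against your available VRAM (minus a few GB for the KV cache and runtime overhead) gives a quick first filter before downloading.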

## Usage with llama.cpp

You can run these models with a recent build of llama.cpp. Note that llama.cpp now uses the CMake build system.
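A typical CMake build looks like the following (assuming a Linux or macOS shell with `git`, `cmake`, and a C/C++ toolchain installed; add backend flags such as `-DGGML_CUDA=ON` if you want GPU offload):

```shell
# Clone and build llama.cpp with CMake.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```

The resulting binaries (including `llama-cli` used below) land in `build/bin/`.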

### CLI Example

To run inference via the command line:

```shell
./build/bin/llama-cli -m Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf \
  -p "Explain the concept of quantum entanglement." -n 512
```
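If a download was interrupted or corrupted, llama.cpp will refuse to load the file. A quick sanity check is that every valid GGUF file begins with the 4-byte magic `GGUF`; the filename below is just an example:

```shell
# Check the 4-byte GGUF magic at the start of the file.
# The filename is an example; substitute the file you downloaded.
if [ "$(head -c 4 Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf)" = "GGUF" ]; then
  echo "looks like a GGUF file"
else
  echo "not a GGUF file (truncated or wrong download?)" >&2
fi
```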
## Model Details

- Model size: 35B parameters
- Architecture: qwen35moe
