# Qwen3.6-35B-A3B-heretic GGUF

This repository contains GGUF format quantizations of the tvall43/Qwen3.6-35B-A3B-heretic model.

These quantizations are compatible with modern GGML-based inference engines such as llama.cpp.

## Available Quantizations

The following quantization methods are provided to suit various hardware capabilities, from VRAM-constrained setups to high-end rigs.

| Format | Characteristics | Recommended Use |
|--------|-----------------|-----------------|
| Q3_K_M | Smallest, highest perplexity loss | Maximum space savings; quality degradation is noticeable. |
| Q4_K_S | Small, slightly higher perplexity loss | Good balance for tight VRAM limits. |
| Q4_K_M | Medium, optimal balance | **Recommended** - Excellent balance of size, speed, and quality. |
| Q5_K_M | Large, very low perplexity loss | High quality, requires more VRAM. |
| Q6_K | Very large, near-lossless | Premium quality for high-VRAM systems. |
| Q8_0 | Largest, practically lossless | Best for compute-heavy setups; requires massive VRAM. |
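As a rough way to pick a format for your hardware, file size (and the VRAM needed to fully offload the model) scales with parameter count times bits per weight. The sketch below uses the card's 35B parameter count; the bits-per-weight figures are approximations I am assuming for illustration (real GGUF files mix tensor precisions and add metadata, so actual sizes differ):

```shell
# Rough size estimate per quantization: params * bits-per-weight / 8.
# Bits-per-weight values below are ASSUMED approximations, not exact
# figures for these files.
for entry in "Q3_K_M 3.9" "Q4_K_S 4.3" "Q4_K_M 4.8" "Q5_K_M 5.7" "Q6_K 6.6" "Q8_0 8.5"; do
  set -- $entry   # $1 = format name, $2 = approx bits per weight
  awk -v q="$1" -v bpw="$2" \
    'BEGIN { printf "%-7s ~%.1f GB\n", q, 35e9 * bpw / 8 / 1e9 }'
done
```

Comparing the printed estimate against your available VRAM (minus a few GB for the KV cache and runtime overhead) gives a quick first filter before downloading.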

## Usage with llama.cpp

You can run these models with a recent build of llama.cpp. Note that llama.cpp now uses the CMake build system.
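A typical CMake build looks like the following (assuming a Linux or macOS shell with `git`, `cmake`, and a C/C++ toolchain installed; add backend flags such as `-DGGML_CUDA=ON` if you want GPU offload):

```shell
# Clone and build llama.cpp with CMake.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```

The resulting binaries (including `llama-cli` used below) land in `build/bin/`.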

### CLI Example

To run inference via the command line:

```shell
./build/bin/llama-cli -m Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf \
  -p "Explain the concept of quantum entanglement." -n 512
```
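If a download was interrupted or corrupted, llama.cpp will refuse to load the file. A quick sanity check is that every valid GGUF file begins with the 4-byte magic `GGUF`; the filename below is just an example:

```shell
# Check the 4-byte GGUF magic at the start of the file.
# The filename is an example; substitute the file you downloaded.
if [ "$(head -c 4 Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf)" = "GGUF" ]; then
  echo "looks like a GGUF file"
else
  echo "not a GGUF file (truncated or wrong download?)" >&2
fi
```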
## Model Details

- Model size: 35B parameters
- Architecture: qwen35moe
