Qwen3.6-35B-A3B
This repository contains GGUF format quantizations of the tvall43/Qwen3.6-35B-A3B-heretic model.
These files were converted and quantized for broad compatibility with modern GGML-based inference engines such as llama.cpp.
The following quantization methods are provided to suit various hardware capabilities, from VRAM-constrained setups to high-end rigs.
| Format | Characteristics | Recommended Use |
|---|---|---|
| Q3_K_M | Smallest, highest perplexity loss | Maximum space savings; quality degradation is noticeable. |
| Q4_K_S | Small, slightly higher perplexity than Q4_K_M | Good balance for tight VRAM limits. |
| Q4_K_M | Medium, optimal balance | Recommended: excellent balance of size, speed, and quality. |
| Q5_K_M | Large, very low perplexity loss | High quality, requires more VRAM. |
| Q6_K | Very large, near-lossless | Premium quality for high-VRAM systems. |
| Q8_0 | Largest, practically lossless | Maximum quality; requires the most VRAM. |
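To fetch a single quantization rather than cloning the whole repository, the Hugging Face CLI can download individual files. The repository ID below is a placeholder for this GGUF repo; substitute the actual path shown on the model page:

```bash
# Sketch: download one GGUF file with the Hugging Face CLI
# (pip install -U huggingface_hub). The repo ID is a placeholder.
huggingface-cli download <your-namespace>/Qwen3.6-35B-A3B-heretic-GGUF \
  Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf --local-dir .
```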
You can run these models with a recent build of llama.cpp. Note that current versions of llama.cpp use the CMake build system.
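A typical build looks like the following minimal sketch; consult the llama.cpp README for platform-specific options such as GPU backends:

```bash
# Minimal CMake build of llama.cpp (backend flags, e.g. -DGGML_CUDA=ON,
# depend on your hardware and llama.cpp version)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```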
To run inference via the command line:
./build/bin/llama-cli -m Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf -p "Explain the concept of quantum entanglement." -n 512
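Depending on available VRAM, you may also want to offload layers to the GPU and set an explicit context size. The flags below exist in current llama.cpp builds, but the values shown are illustrative, not tuned recommendations:

```bash
# Sketch: same prompt, with GPU offload and an explicit context window.
# -ngl sets the number of layers offloaded to the GPU, -c the context size.
./build/bin/llama-cli -m Qwen3.6-35B-A3B-heretic-Q4_K_M.gguf \
  -p "Explain the concept of quantum entanglement." \
  -n 512 -c 4096 -ngl 99
```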