This is an experimental 4-bit quantization of the dense Qwen3.5-27B, built with the unsloth imatrix data but with the following per-tensor overrides applied:

IQ4_NL script:

```bash
QUANT="IQ4_NL"
llama-quantize \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  --tensor-type attn_qkv=q8_0 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=q8_0 \
  --tensor-type attn_output=q8_0 \
  --tensor-type attn_gate=q8_0 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=q8_0 \
  --imatrix Qwen3.5-27B-imatrix.gguf_file \
  Qwen3.5-27B-BF16-00001-of-00002.gguf \
  Qwen3.5-27B.${QUANT}.gguf \
  ${QUANT}
```
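The override rules above can be mirrored in a few lines of Python, which is handy for sanity-checking which type a given tensor should end up as after quantization. This is only an illustrative sketch of the IQ4_NL rules as listed here, not part of llama-quantize itself; note that llama.cpp matches `--tensor-type` patterns by its own logic, so the simple first-substring-match below (where `attn_qkv` must be checked before `attn_q`) is an assumption for illustration.

```python
# Per-tensor overrides, transcribed from the IQ4_NL script above.
# Checked in order: attn_qkv must precede attn_q, since "attn_q" is
# a substring of "attn_qkv".
OVERRIDES = [
    ("attn_qkv", "q8_0"),
    ("attn_k", "bf16"),
    ("attn_v", "bf16"),
    ("attn_q", "q8_0"),
    ("attn_output", "q8_0"),
    ("attn_gate", "q8_0"),
    ("ssm_ba", "bf16"),
    ("ssm_beta", "bf16"),
    ("ssm_alpha", "bf16"),
    ("ssm_out", "q8_0"),
]

def expected_type(tensor_name: str, default: str = "iq4_nl") -> str:
    """Return the quant type a tensor should get under the rules above."""
    for pattern, qtype in OVERRIDES:
        if pattern in tensor_name:
            return qtype
    return default  # everything unmatched falls back to the base quant
```

For example, `expected_type("blk.0.attn_v.weight")` returns `bf16`, while an FFN tensor like `blk.0.ffn_up.weight` falls through to the base `iq4_nl`.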

IQ4_XS script:

```bash
QUANT="IQ4_XS"
llama-quantize \
  --output-tensor-type Q6_K \
  --token-embedding-type Q6_K \
  --tensor-type attn_qkv=q8_0 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=Q6_K \
  --tensor-type attn_output=q8_0 \
  --tensor-type attn_gate=q8_0 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=q8_0 \
  --tensor-type ffn_down=Q5_K \
  --imatrix Qwen3.5-27B-imatrix.gguf_file \
  BF16/Qwen3.5-27B-BF16-00001-of-00002.gguf \
  Qwen3.5-27B.${QUANT}.gguf \
  ${QUANT}
```

BONUS TRACK: For users of ik_llama.cpp, I've added an iq4_k version as well:

```bash
QUANT="iq4_k"
llama-quantize \
  --output-tensor-type iq6_k \
  --token-embedding-type iq6_k \
  --custom-q attn_qkv=iq6_k \
  --custom-q attn_k=bf16 \
  --custom-q attn_v=bf16 \
  --custom-q attn_q=iq6_k \
  --custom-q attn_output=iq6_k \
  --custom-q attn_gate=iq6_k \
  --custom-q ssm_ba=bf16 \
  --custom-q ssm_beta=bf16 \
  --custom-q ssm_alpha=bf16 \
  --custom-q ssm_out=q8_0 \
  --custom-q ffn_down=iq5_k \
  --imatrix Qwen3.5-27B-imatrix.dat \
  BF16/Qwen3.5-27B-BF16-00001-of-00002.gguf \
  Qwen3.5-27B.${QUANT}.ik.gguf \
  ${QUANT}
```