# Huihui-Qwen3.5-27B-abliterated – AWQ W4A16 (text-only + MTP)

AWQ 4-bit quantization of huihui-ai/Huihui-Qwen3.5-27B-abliterated using AutoAWQ with Qwen3.5 support patches.

Text-only: the vision encoder has been removed. The MTP (multi-token prediction) head is preserved for speculative decoding.
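For reference, stripping a vision tower from a multimodal checkpoint typically amounts to filtering its weight prefix out of the state dict. A minimal sketch; the `visual.` key prefix is an assumption (it matches Qwen-VL-style checkpoints) and is not confirmed for this model:

```python
# Hypothetical sketch: drop vision-encoder weights from a checkpoint shard.
# The "visual." prefix is an assumption, not confirmed for this model.
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")
text_only = {k: v for k, v in state.items() if not k.startswith("visual.")}
save_file(text_only, "model-text-only.safetensors")
```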

## Specs

| Property | Value |
|---|---|
| Quantization | AWQ W4A16 (`group_size=128`, `zero_point=True`) |
| Size on disk | 18.6 GB |
| MTP head | Included (BF16, 0.85 GB) |
| Vision encoder | Removed (-0.92 GB) |
| Calibration | 128 samples from the Pile validation set |
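The recipe below is a minimal sketch of the quantization step using the standard AutoAWQ API. It assumes the Qwen3.5 support patches mentioned above expose the stock `AutoAWQForCausalLM` interface; unpatched AutoAWQ does not know this architecture, so treat paths and config as illustrative. Note that AutoAWQ's defaults (the `pileval` calibration set, 128 samples) match the calibration listed in the table.

```python
# Illustrative quantization recipe (requires the patched AutoAWQ fork;
# stock AutoAWQ does not support this architecture).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "huihui-ai/Huihui-Qwen3.5-27B-abliterated"
quant_path = "Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16"

# Matches the specs table: 4-bit weights, group_size=128, zero_point=True.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AutoAWQ defaults to 128 samples from the "pileval" calibration set,
# matching the calibration listed above.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```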

## Important: MTP acceptance caveat

When served with MTP speculative decoding, this AWQ quantization shows a lower MTP acceptance rate (31%) than the GPTQ W4A16 variant (50%), which translates to roughly 48% lower single-request throughput. For MTP-enabled serving, the GPTQ variant is recommended: j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-GPTQ-W4A16.
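To see why acceptance rate matters, here is a rough back-of-the-envelope model: if each of the `k` drafted tokens is accepted independently with probability `a` (a simplification; real acceptance is position-dependent), the expected number of tokens emitted per verification step is `1 + a + ... + a^k`. Under this model the acceptance gap alone predicts roughly a 26% throughput drop; the rest of the observed ~48% gap presumably comes from other differences between the quantization backends.

```python
# Expected tokens emitted per verification step with k speculative tokens,
# assuming each drafted token is accepted independently with probability a.
# (A simplification: real acceptance rates are position-dependent.)
def expected_tokens_per_step(a: float, k: int = 5) -> float:
    # 1 guaranteed token from the target model + expected accepted drafts
    return sum(a**i for i in range(k + 1))

print(expected_tokens_per_step(0.50))  # ~1.97 (GPTQ, 50% acceptance)
print(expected_tokens_per_step(0.31))  # ~1.45 (this AWQ, 31% acceptance)
# Ratio ~0.74: acceptance alone predicts ~26% lower throughput.
```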

## Benchmarks (RTX 5090, MTP=5)

| Metric | GPTQ W4A16 | This AWQ |
|---|---|---|
| Single request, 256 tokens | 148.8 tok/s | 76.7 tok/s |
| MTP acceptance rate | 50% | 31% |
| Batch=4 aggregate | 410 tok/s | 313 tok/s |
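A quick way to sanity-check the single-request number against a running server; the endpoint URL and served model name assume the launch command in the next section:

```python
# Rough single-request throughput check against a running vLLM server.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

t0 = time.perf_counter()
resp = client.completions.create(
    model="qwen3.5-27b",
    prompt="Explain speculative decoding in one paragraph.",
    max_tokens=256,
    temperature=0.0,
)
dt = time.perf_counter() - t0
print(f"{resp.usage.completion_tokens / dt:.1f} tok/s")
```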

## Usage with vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
    --model j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16 \
    --served-model-name qwen3.5-27b \
    --dtype float16 \
    --quantization awq_marlin \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}'
```
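Once running, the server speaks the OpenAI-compatible API. For example, with the official `openai` Python client (the localhost URL assumes vLLM's default port 8000):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; api_key is unused unless --api-key is set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="qwen3.5-27b",  # matches --served-model-name
    messages=[{"role": "user", "content": "Summarize AWQ quantization in two sentences."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```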

## Model tree

Base model: Qwen/Qwen3.5-27B (this repo is one of its quantized derivatives).