# Huihui-Qwen3.5-27B-abliterated – AWQ W4A16 (text-only + MTP)

AWQ 4-bit quantization of huihui-ai/Huihui-Qwen3.5-27B-abliterated using AutoAWQ with Qwen3.5 support patches.

Text-only: the vision encoder has been removed. The MTP (multi-token prediction) head is preserved for speculative decoding.
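For reference, stripping a vision tower from a multimodal checkpoint typically amounts to filtering its weight prefix out of the state dict. A minimal sketch; the `visual.` key prefix is an assumption (it matches Qwen-VL-style checkpoints) and is not confirmed for this model:

```python
# Hypothetical sketch: drop vision-encoder weights from a checkpoint shard.
# The "visual." prefix is an assumption, not confirmed for this model.
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")
text_only = {k: v for k, v in state.items() if not k.startswith("visual.")}
save_file(text_only, "model-text-only.safetensors")
```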

## Specs

| Property | Value |
|---|---|
| Quantization | AWQ W4A16 (`group_size=128`, `zero_point=True`) |
| Size on disk | 18.6 GB |
| MTP head | Included (BF16, 0.85 GB) |
| Vision encoder | Removed (-0.92 GB) |
| Calibration | 128 samples from the Pile validation set |
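The recipe below is a minimal sketch of the quantization step using the standard AutoAWQ API. It assumes the Qwen3.5 support patches mentioned above expose the stock `AutoAWQForCausalLM` interface; unpatched AutoAWQ does not know this architecture, so treat paths and config as illustrative. Note that AutoAWQ's defaults (the `pileval` calibration set, 128 samples) match the calibration listed in the table.

```python
# Illustrative quantization recipe (requires the patched AutoAWQ fork;
# stock AutoAWQ does not support this architecture).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "huihui-ai/Huihui-Qwen3.5-27B-abliterated"
quant_path = "Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16"

# Matches the specs table: 4-bit weights, group_size=128, zero_point=True.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AutoAWQ defaults to 128 samples from the "pileval" calibration set,
# matching the calibration listed above.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```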

## Important: MTP acceptance caveat

When served with MTP speculative decoding, this AWQ quantization shows a lower MTP acceptance rate (31%) than the GPTQ W4A16 variant (50%), which translates to roughly 48% lower single-request throughput. For MTP-enabled serving, the GPTQ variant is recommended: j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-GPTQ-W4A16.
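To see why acceptance rate matters, here is a rough back-of-the-envelope model: if each of the `k` drafted tokens is accepted independently with probability `a` (a simplification; real acceptance is position-dependent), the expected number of tokens emitted per verification step is `1 + a + ... + a^k`. Under this model the acceptance gap alone predicts roughly a 26% throughput drop; the rest of the observed ~48% gap presumably comes from other differences between the quantization backends.

```python
# Expected tokens emitted per verification step with k speculative tokens,
# assuming each drafted token is accepted independently with probability a.
# (A simplification: real acceptance rates are position-dependent.)
def expected_tokens_per_step(a: float, k: int = 5) -> float:
    # 1 guaranteed token from the target model + expected accepted drafts
    return sum(a**i for i in range(k + 1))

print(expected_tokens_per_step(0.50))  # ~1.97 (GPTQ, 50% acceptance)
print(expected_tokens_per_step(0.31))  # ~1.45 (this AWQ, 31% acceptance)
# Ratio ~0.74: acceptance alone predicts ~26% lower throughput.
```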

## Benchmarks (RTX 5090, MTP=5)

| Metric | GPTQ W4A16 | This AWQ |
|---|---|---|
| Single request, 256 tokens | 148.8 tok/s | 76.7 tok/s |
| MTP acceptance rate | 50% | 31% |
| Batch=4 aggregate | 410 tok/s | 313 tok/s |
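A quick way to sanity-check the single-request number against a running server; the endpoint URL and served model name assume the launch command in the next section:

```python
# Rough single-request throughput check against a running vLLM server.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

t0 = time.perf_counter()
resp = client.completions.create(
    model="qwen3.5-27b",
    prompt="Explain speculative decoding in one paragraph.",
    max_tokens=256,
    temperature=0.0,
)
dt = time.perf_counter() - t0
print(f"{resp.usage.completion_tokens / dt:.1f} tok/s")
```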

## Usage with vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
    --model j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16 \
    --served-model-name qwen3.5-27b \
    --dtype float16 \
    --quantization awq_marlin \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}'
```
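Once running, the server speaks the OpenAI-compatible API. For example, with the official `openai` Python client (the localhost URL assumes vLLM's default port 8000):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; api_key is unused unless --api-key is set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="qwen3.5-27b",  # matches --served-model-name
    messages=[{"role": "user", "content": "Summarize AWQ quantization in two sentences."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```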

## Model tree

Base model: Qwen/Qwen3.5-27B (this repo is one of its quantized derivatives).