# Huihui-Qwen3.5-27B-abliterated – AWQ W4A16 (text-only + MTP)
AWQ 4-bit quantization of huihui-ai/Huihui-Qwen3.5-27B-abliterated using AutoAWQ with Qwen3.5 support patches.
Text-only (vision encoder removed). MTP head preserved for speculative decoding.
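For reference, a minimal AutoAWQ recipe matching the settings below might look like the following. This is a sketch, not the exact script used: it assumes the Qwen3.5 support patches let AutoAWQ load the checkpoint as a standard causal LM, and the output path is illustrative.

```python
# Sketch of an AutoAWQ W4A16 quantization run (illustrative; assumes the
# Qwen3.5 patches make this architecture loadable by AutoAWQ).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "huihui-ai/Huihui-Qwen3.5-27B-abliterated"
quant_path = "Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16"

# Matches the specs table: 4-bit weights, group_size=128, zero_point enabled.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AutoAWQ's stock calibration defaults (128 samples from the Pile validation
# split) line up with the specs table, so no custom calib_data is passed.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```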
## Specs
| Property | Value |
|---|---|
| Quantization | AWQ W4A16 (group_size=128, zero_point=True) |
| Size on disk | 18.6 GB |
| MTP head | Included (BF16, 0.85 GB) |
| Vision encoder | Removed (-0.92 GB) |
| Calibration | 128 samples from the Pile validation set |
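The text-only export amounts to a state-dict filter: drop the vision-tower tensors and keep the MTP head in BF16. A hypothetical sketch follows; the `visual.` and `mtp.` key prefixes are assumptions for illustration, not confirmed Qwen3.5 tensor names, and a real 27B checkpoint would be sharded across several files.

```python
# Hypothetical text-only export: remove vision weights, keep MTP head in BF16.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")  # sharded checkpoints would loop over files

text_only = {
    k: v for k, v in state.items()
    if not k.startswith("visual.")  # drop the vision encoder (~0.92 GB)
}

# The MTP head stays BF16 so speculative drafting is unaffected by W4A16.
for k in list(text_only):
    if k.startswith("mtp."):
        text_only[k] = text_only[k].to(torch.bfloat16)

save_file(text_only, "model-text-only.safetensors")
```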
## Important: MTP acceptance caveat
When used with MTP speculative decoding, this AWQ quantization shows a lower MTP acceptance rate (31%) than GPTQ W4A16 (50%), which translates to ~48% lower single-request throughput. For MTP-enabled serving, GPTQ is recommended; see j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-GPTQ-W4A16.
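To see why acceptance rate dominates single-request speed, consider a simplified model where each of the k=5 draft tokens must be accepted in sequence, each with independent probability p; the expected tokens emitted per decode step is then 1 + p + p² + … + p^k. This idealized sketch ignores verification cost and kernel-speed differences, so it understates the measured gap, but it shows the direction of the effect:

```python
# Back-of-envelope: expected tokens emitted per decode step with k draft
# tokens and per-token acceptance probability p (independence assumption;
# vLLM's actual MTP accounting differs in detail).
def expected_tokens_per_step(p: float, k: int = 5) -> float:
    # Drafts are accepted until the first rejection; the verifier always
    # emits at least one token, hence the i = 0 term.
    return sum(p ** i for i in range(k + 1))

print(expected_tokens_per_step(0.50))  # ~1.97 tokens/step (GPTQ-like)
print(expected_tokens_per_step(0.31))  # ~1.45 tokens/step (this AWQ)
```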
## Benchmarks (RTX 5090, MTP=5)
| Metric | GPTQ W4A16 | This AWQ |
|---|---|---|
| Single request, 256 tokens | 148.8 tok/s | 76.7 tok/s |
| MTP acceptance rate | 50% | 31% |
| Batch=4 aggregate throughput | 410 tok/s | 313 tok/s |
## Usage with vLLM
```bash
python -m vllm.entrypoints.openai.api_server \
    --model j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16 \
    --served-model-name qwen3.5-27b \
    --dtype float16 \
    --quantization awq_marlin \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}'
```
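Once the server is up it exposes the standard OpenAI-compatible API, so any OpenAI client works. A minimal sketch, assuming vLLM's default host and port:

```python
# Query the vLLM OpenAI-compatible endpoint started above.
# localhost:8000 is vLLM's default; any api_key string works locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="qwen3.5-27b",  # matches --served-model-name
    messages=[{"role": "user", "content": "Explain AWQ quantization in two sentences."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```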
## Model tree

- Base model: Qwen/Qwen3.5-27B
- Abliterated finetune: huihui-ai/Huihui-Qwen3.5-27B-abliterated
- This model: j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-AWQ-W4A16