Qwen3.6-27B
An NVFP4-quantized version of Qwen/Qwen3.6-27B by Abiray, using custom Blackwell NVFP4 GEMM kernels.
55.6 GB → 19.7 GB (0.35x), with the vision tower preserved in BF16.
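For intuition on where the compression comes from: NVFP4 stores each weight in the 4-bit FP4 E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit), so every weight can only take one of 15 distinct values, rescaled per block by an FP8 scale. A minimal sketch (illustrative only, not part of this checkpoint's tooling) enumerating the non-negative E2M1 values:

```python
def e2m1_values():
    """Enumerate the non-negative values representable in FP4 E2M1
    (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1)."""
    vals = set()
    for exp in range(4):          # 2 exponent bits -> codes 0..3
        for man in range(2):      # 1 mantissa bit  -> codes 0..1
            if exp == 0:
                # Subnormal range: value = mantissa * 0.5
                vals.add(man * 0.5)
            else:
                # Normal range: value = (1 + mantissa/2) * 2^(exp - 1)
                vals.add((1 + man * 0.5) * 2 ** (exp - 1))
    return sorted(vals)

print(e2m1_values())  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Per-block FP8 scales then map this small grid onto each block's dynamic range, which is why the overall ratio (19.7 / 55.6 ≈ 0.35) sits slightly above a pure 4-bit packing: scales, the BF16 vision tower, and ignored layers keep their full precision.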
| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen3.6-27B |
| Quantization | NVFP4 (W4A4 — weights FP4, activations FP4, scales FP8) |
| Format | compressed-tensors (native vLLM support) |
| Tool | vllm-project/llm-compressor + blackwell-geforce-nvfp4-gemm |
| Size | 19.7 GB (single safetensors shard) |
| Requires | NVIDIA Blackwell GPU (SM 120), vLLM >= 0.19 |
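Since the checkpoint is in the compressed-tensors format, vLLM picks up the quantization scheme from the model config automatically. A minimal loading sketch (the repo id below is a placeholder for this checkpoint's actual Hugging Face path, and a Blackwell GPU is assumed):

```python
# Sketch only: requires an SM 120 (Blackwell) GPU and vLLM >= 0.19.
from vllm import LLM, SamplingParams

# Placeholder repo id -- substitute the actual path of this checkpoint.
llm = LLM(model="Abiray/Qwen3.6-27B-NVFP4")  # quantization auto-detected

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```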
```yaml
QuantizationModifier:
  targets: [Linear]
  ignore: [lm_head, 're:.*visual.*', 're:.*mlp.gate$', 're:.*mlp.shared_expert_gate$']
  scheme: NVFP4
```
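The recipe above maps directly onto llm-compressor's Python API. A hedged sketch of how such a checkpoint is typically produced (the `output_dir` name is illustrative, and a one-shot run of this size needs substantial GPU memory):

```python
# Sketch of a one-shot NVFP4 quantization run with llm-compressor;
# assumes a recent llm-compressor release with the NVFP4 scheme.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    # Keep the LM head, the vision tower, and MoE gating layers in full precision.
    ignore=[
        "lm_head",
        "re:.*visual.*",
        "re:.*mlp.gate$",
        "re:.*mlp.shared_expert_gate$",
    ],
)

oneshot(
    model="Qwen/Qwen3.6-27B",
    recipe=recipe,
    output_dir="Qwen3.6-27B-NVFP4",  # illustrative output path
)
```

The `ignore` patterns matter for quality: MoE router gates and the vision tower are small but sensitive, so excluding them costs little size while avoiding most of the accuracy loss.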