Qwen3.6-27B-NVFP4

An NVFP4-quantized version of Qwen/Qwen3.6-27B by Abiray, produced with custom Blackwell NVFP4 GEMM kernels.

55.6 GB → 19.7 GB (0.35×), with the vision tower preserved in BF16.

NVFP4 Quantization Details

Base model: Qwen/Qwen3.6-27B
Quantization: NVFP4 (W4A4 — weights FP4, activations FP4, scales FP8)
Format: compressed-tensors (native vLLM support)
Tool: vllm-project/llm-compressor + blackwell-geforce-nvfp4-gemm
Size: 19.7 GB (single safetensors shard)
Requires: NVIDIA Blackwell GPU (SM 120), vLLM >= 0.19
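To give a feel for what the W4A4 scheme above does numerically, here is a minimal, illustrative sketch of NVFP4-style block quantization: values are snapped to the FP4 E2M1 grid with one shared scale per 16-element block. The block size, grid values, and amax-based scaling are assumptions drawn from the NVFP4 format description, not the actual llm-compressor or GEMM-kernel implementation.

```python
# Representable magnitudes of FP4 E2M1 (sign is handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Snap one block of floats to the FP4 grid with a shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the block's max magnitude onto FP4's max (6.0)
    def snap(x):
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        return mag if x >= 0 else -mag
    codes = [snap(x) for x in block]  # the stored 4-bit codes, as grid values
    return codes, scale

def dequantize_block(codes, scale):
    return [c * scale for c in codes]

# One 16-element weight block (NVFP4 micro-blocks are 16 elements wide).
weights = [0.12, -0.03, 0.25, 0.6, -0.48, 0.0, 0.9, -0.9,
           0.1, 0.2, -0.2, 0.33, 0.05, -0.6, 0.75, 0.45]
codes, scale = quantize_block(weights)
recon = dequantize_block(codes, scale)
err = max(abs(a - b) for a, b in zip(weights, recon))
```

In the real format the per-block scale is itself stored in FP8 (E4M3), which is what keeps the overall footprint near the 0.35× ratio quoted above.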

Recipe

QuantizationModifier:
  targets: [Linear]
  ignore: [lm_head, 're:.*visual.*', 're:.*mlp.gate$', 're:.*mlp.shared_expert_gate$']
  scheme: NVFP4
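The `ignore` list in the recipe above mixes exact module names with `re:`-prefixed regex patterns, so the lm_head, the vision tower, and the MoE router gates stay unquantized. A small sketch of how such patterns select modules (the matching helper is illustrative, not llm-compressor's internal implementation):

```python
import re

# Patterns copied from the recipe: "re:" entries are regexes,
# everything else is an exact module-name match.
IGNORE = ["lm_head", "re:.*visual.*", "re:.*mlp.gate$",
          "re:.*mlp.shared_expert_gate$"]

def is_ignored(module_name):
    """Return True if a module name is excluded from quantization."""
    for pattern in IGNORE:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif module_name == pattern:
            return True
    return False
```

For example, `model.layers.3.mlp.gate` (a router gate) is skipped, while `model.layers.3.mlp.gate_proj` (an expert projection) is still quantized, since the `$` anchor in `re:.*mlp.gate$` only matches names ending in `gate`.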
