Qwen3.6-27B-NVFP4

An NVFP4-quantized version of Qwen/Qwen3.6-27B by Abiray, produced with custom Blackwell NVFP4 GEMM kernels.

55.6 GB → 19.7 GB (0.35×), with the vision tower preserved in BF16.

NVFP4 Quantization Details

Base model: Qwen/Qwen3.6-27B
Quantization: NVFP4 (W4A4 — weights FP4, activations FP4, scales FP8)
Format: compressed-tensors (native vLLM support)
Tool: vllm-project/llm-compressor + blackwell-geforce-nvfp4-gemm
Size: 19.7 GB (single safetensors shard)
Requires: NVIDIA Blackwell GPU (SM 120), vLLM >= 0.19
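To give a feel for what the W4A4 scheme above does numerically, here is a minimal, illustrative sketch of NVFP4-style block quantization: values are snapped to the FP4 E2M1 grid with one shared scale per 16-element block. The block size, grid values, and amax-based scaling are assumptions drawn from the NVFP4 format description, not the actual llm-compressor or GEMM-kernel implementation.

```python
# Representable magnitudes of FP4 E2M1 (sign is handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Snap one block of floats to the FP4 grid with a shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the block's max magnitude onto FP4's max (6.0)
    def snap(x):
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        return mag if x >= 0 else -mag
    codes = [snap(x) for x in block]  # the stored 4-bit codes, as grid values
    return codes, scale

def dequantize_block(codes, scale):
    return [c * scale for c in codes]

# One 16-element weight block (NVFP4 micro-blocks are 16 elements wide).
weights = [0.12, -0.03, 0.25, 0.6, -0.48, 0.0, 0.9, -0.9,
           0.1, 0.2, -0.2, 0.33, 0.05, -0.6, 0.75, 0.45]
codes, scale = quantize_block(weights)
recon = dequantize_block(codes, scale)
err = max(abs(a - b) for a, b in zip(weights, recon))
```

In the real format the per-block scale is itself stored in FP8 (E4M3), which is what keeps the overall footprint near the 0.35× ratio quoted above.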

Recipe

QuantizationModifier:
  targets: [Linear]
  ignore: [lm_head, 're:.*visual.*', 're:.*mlp.gate$', 're:.*mlp.shared_expert_gate$']
  scheme: NVFP4
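The `ignore` list in the recipe above mixes exact module names with `re:`-prefixed regex patterns, so the lm_head, the vision tower, and the MoE router gates stay unquantized. A small sketch of how such patterns select modules (the matching helper is illustrative, not llm-compressor's internal implementation):

```python
import re

# Patterns copied from the recipe: "re:" entries are regexes,
# everything else is an exact module-name match.
IGNORE = ["lm_head", "re:.*visual.*", "re:.*mlp.gate$",
          "re:.*mlp.shared_expert_gate$"]

def is_ignored(module_name):
    """Return True if a module name is excluded from quantization."""
    for pattern in IGNORE:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif module_name == pattern:
            return True
    return False
```

For example, `model.layers.3.mlp.gate` (a router gate) is skipped, while `model.layers.3.mlp.gate_proj` (an expert projection) is still quantized, since the `$` anchor in `re:.*mlp.gate$` only matches names ending in `gate`.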
