Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512

Post-training quantized checkpoint of Qwen/Qwen3-30B-A3B produced by the pex/baselines pipeline as part of the PEX paper baselines.

Quantization

Knob	Value
Method	AWQ
Scheme	`W4`
Group size	`128`
Producer tool	`autoawq`
Format	`awq`

Calibration

Corpus: datablations/c4-subsets (100m/c4_100m.jsonl)
Samples: 512 × 512 tokens
Seed: 0
Recipe fingerprint: d141fefd74a6f339

Skipped modules

Qwen3 MoE: skip lm_head, router (mlp.gate), and shared-expert gate. All MLP up/down/gate-proj inside each expert ARE quantized.

Serving with vLLM

vllm serve morriszjm/Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 32768

vLLM auto-detects the quantization format from config.json. For AWQ, you may pass --quantization awq_marlin for the fastest kernel.

Reproducing

The exact producer recipe (including the calibration hash above) is in meta.json next to the weights.

Reference

This checkpoint is one of three quantization baselines (RTN / GPTQ / AWQ) used to anchor the Pareto plots in the PEX paper. Not a SOTA release — it is an out-of-the-box reference produced with each method's paper-default recipe to enable fair method-vs-method comparison.

Downloads last month: 18

Safetensors

Model size

31B params

Tensor type

I32

BF16

Model tree for morriszjm/Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512

Base model

Qwen/Qwen3-30B-A3B-Base

Finetuned

Qwen/Qwen3-30B-A3B

Quantized

(118)

this model