Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512

Post-training quantized checkpoint of Qwen/Qwen3-30B-A3B produced by the pex/baselines pipeline as part of the PEX paper baselines.

Quantization

Knob Value
Method AWQ
Scheme W4
Group size 128
Producer tool autoawq
Format awq

Calibration

  • Corpus: datablations/c4-subsets (100m/c4_100m.jsonl)
  • Samples: 512 × 512 tokens
  • Seed: 0
  • Recipe fingerprint: d141fefd74a6f339

Skipped modules

Qwen3 MoE: skip lm_head, router (mlp.gate), and shared-expert gate. All MLP up/down/gate-proj inside each expert ARE quantized.

Serving with vLLM

vllm serve morriszjm/Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 32768

vLLM auto-detects the quantization format from config.json. For AWQ, you may pass --quantization awq_marlin for the fastest kernel.

Reproducing

The exact producer recipe (including the calibration hash above) is in meta.json next to the weights.

Reference

This checkpoint is one of three quantization baselines (RTN / GPTQ / AWQ) used to anchor the Pareto plots in the PEX paper. Not a SOTA release — it is an out-of-the-box reference produced with each method's paper-default recipe to enable fair method-vs-method comparison.

Downloads last month
18
Safetensors
Model size
31B params
Tensor type
I32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for morriszjm/Qwen3-30B-A3B-AWQ-W4-g128-c4-512x512

Quantized
(118)
this model