---
license: apache-2.0
base_model: Jackrong/Qwopus3.5-27B-v3
tags:
- hlwq
- quantized
- gptq
- int4
- polarquant
- vllm
- marlin
pipeline_tag: text-generation
model-index:
- name: Qwopus3.5-27B-v3-PolarQuant-v7-GPTQ
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - name: pass@1 (thinking)
      type: pass@1
      value: 78.66
    - name: pass@1 (standard)
      type: pass@1
      value: 55.49
---
> **Naming notice (2026-04-10).** The "PolarQuant" technique used in this model is being rebranded to **HLWQ** (Hadamard-Lloyd Weight Quantization). Only the name changes; the algorithm and the weights in this repository are unchanged.
>
> The rebrand resolves a name collision with an unrelated, earlier KV-cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and a Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV-cache quantization with a random polar rotation. The two methods are technically distinct.
>
> Existing loaders that reference this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
>
> Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).
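To make the distinction concrete, the two ingredients named in the notice can be sketched in isolation. This is an illustrative toy, not the production quantizer: the function names are ours, grouping/packing and the GPTQ error-compensation loop are omitted, and 16 levels is chosen to correspond to 4-bit codes.

```python
import math

def hadamard_rotate(v):
    """Normalized Walsh-Hadamard transform (an orthonormal rotation);
    len(v) must be a power of two."""
    v = list(v)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return [x / math.sqrt(n) for x in v]

def lloyd_max_codebook(xs, levels=16, iters=50):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid
    updates (16 levels corresponds to 4-bit codes)."""
    lo, hi = min(xs), max(xs)
    c = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for x in xs:
            buckets[min(range(levels), key=lambda k: (x - c[k]) ** 2)].append(x)
        c = [sum(b) / len(b) if b else c[k] for k, b in enumerate(buckets)]
    return c

def quantize(xs, codebook):
    """Map each value to its nearest codebook entry."""
    return [min(codebook, key=lambda q: (x - q) ** 2) for x in xs]
```

The rotation spreads weight outliers across a group before the scalar codebook is fit, which is what lets a small codebook cover the whole group well.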
# Qwopus3.5-27B-v3 PolarQuant v7 GPTQ

## 27B Reasoning Model in 19 GB
| Metric | INT4 (ours) | BF16 (base) | Delta |
|---|---|---|---|
| HumanEval (thinking) | 78.66% | 97.56% (Jackrong) | -18.9 pp |
| HumanEval (standard) | 55.49% | not measured | n/a |
| Download size | 19.2 GB | 54.7 GB | -65% |
| BPW | 4.475 | 16 | 3.6x smaller |
| Kernel | Marlin | n/a | native vLLM |
**Note on the thinking-mode gap:** the 18.9 pp delta from BF16 (97.56% → 78.66%) is a real quality impact of INT4 quantization on chain-of-thought code generation at 27B scale. Users who need maximum thinking-mode quality should consider the BF16 base model from Jackrong.
## About This Model
This is the 27B reasoning model from the Qwopus3.5 series, quantized with our proven PolarQuant v7 config. The model uses `<think>` tags for chain-of-thought reasoning before answering.

It uses the same GPTQ gs64 + FOEM config as our 9B v7 release, where that config narrowly edged out BF16 on HumanEval standard mode (67.07% vs 66.87%). At 27B thinking-mode scale, however, there is a measurable gap from the BF16 baseline (see results below).
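Since generations begin with a `<think>` block, downstream code usually separates the reasoning from the final answer. A minimal sketch (the helper name and regex are ours, assuming one well-formed tag pair per generation):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around <think>...</think>."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>A one-liner with sorted() suffices.</think>\ndef f(xs):\n    return sorted(xs)"
)
```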
## Quick Start

### vLLM (recommended)
```python
from vllm import LLM, SamplingParams

model = LLM(
    "caiovicentino1/Qwopus3.5-27B-v3-PolarQuant-v7-GPTQ",
    trust_remote_code=True,
    language_model_only=True,
    gpu_memory_utilization=0.75,
)

output = model.generate(
    ["Write a Python function to sort a list:"],
    SamplingParams(max_tokens=4096, temperature=0.0),
)
print(output[0].outputs[0].text)
```
### vLLM Server

```bash
vllm serve caiovicentino1/Qwopus3.5-27B-v3-PolarQuant-v7-GPTQ \
  --trust-remote-code --language-model-only \
  --gpu-memory-utilization 0.75 --max-model-len 16384
```
## Quantization Config
```python
# Same config as our 9B v7: bits=4, group_size=64, FOEM
from gptqmodel import GPTQModel
from gptqmodel.quantization import QuantizeConfig
from gptqmodel.quantization.config import FOEMConfig

quantize_config = QuantizeConfig(
    bits=4,
    group_size=64,
    sym=True,
    desc_act=True,
    foem=FOEMConfig(alpha=0.25, beta=0.2, device="auto"),
)
```
- Quantizer: GPTQModel v6.0.3
- Calibration: 512 samples from neuralmagic/LLM_compression_calibration
- Kernel: Marlin (native vLLM, zero overhead)
- Time: 33 min on RTX PRO 6000 Blackwell (102 GB)
## HumanEval Results
| Mode | Score | Method |
|---|---|---|
| Thinking (chat template) | 78.66% | 129/164, automated `exec()` |
| Standard (lm-eval) | 55.49% | `lm_eval --tasks humaneval` |
| BF16 Thinking (Jackrong) | 97.56% | reported by base model author |
### About the 18.9 pp gap from BF16 thinking
The drop from 97.56% to 78.66% reflects a real quality cost of INT4 quantization on chain-of-thought code generation at 27B scale. Thinking mode is more sensitive to weight precision than single-shot generation because small numerical errors can compound across reasoning steps.

Part of the gap may also come from differences in evaluation harness (we use automated `exec()`-based checking), but we have not isolated this component through a direct comparison.

**Recommendation:** if you need maximum thinking-mode quality, use the BF16 base model. If you need fast, low-VRAM vLLM serving and can tolerate this gap, v7-GPTQ is the right pick.
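For reference, "automated `exec()` checking" of the kind used for the thinking-mode number can be sketched as follows. This is our illustration of the general approach, not the exact harness used for the reported scores:

```python
def passes_humaneval(completion: str, test_code: str, entry_point: str) -> bool:
    """Run a generated solution against a HumanEval-style `check` function
    in a fresh namespace; any exception (or failed assert) counts as a fail."""
    ns: dict = {}
    try:
        exec(completion, ns)           # defines the candidate function
        exec(test_code, ns)            # defines check(candidate)
        ns["check"](ns[entry_point])   # raises AssertionError on failure
        return True
    except Exception:
        return False

# Toy task in the HumanEval shape (hypothetical, not from the benchmark):
solution = "def add(a, b):\n    return a + b\n"
tests = "def check(candidate):\n    assert candidate(2, 3) == 5\n"
ok = passes_humaneval(solution, tests, "add")
```

With greedy decoding (one sample per task), pass@1 is simply the fraction of tasks that pass, e.g. 129/164 ≈ 78.66% here. Real harnesses also sandbox and time-limit the `exec()` calls, which this sketch omits.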
## Technical Details
| Parameter | Value |
|---|---|
| Base Model | Jackrong/Qwopus3.5-27B-v3 |
| Architecture | Qwen3.5 (48 linear_attn + 16 full_attn) |
| Hidden Size | 5120 |
| Layers | 64 |
| Bits | 4 |
| Group Size | 64 |
| FOEM | alpha=0.25, beta=0.2 |
| BPW | 4.475 |
| Format | GPTQ v1 (Marlin compatible) |
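As a back-of-envelope check on the BPW figure, assuming one fp16 scale per 64-weight group (an assumption; the exact GPTQ v1 accounting for `g_idx`, packing, and tensors left in higher precision is not reproduced here):

```python
bits, group_size, scale_bits = 4, 64, 16

# Quantized-weight payload alone: 4 + 16/64 = 4.25 bits per weight.
base_bpw = bits + scale_bits / group_size

# The reported 4.475 BPW is slightly higher, presumably due to per-group
# metadata and unquantized tensors (embeddings, norms).
reported_bpw = 4.475
compression = 16 / reported_bpw  # vs BF16's 16 bits per weight, ~3.6x
```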
## Links
- 9B version (67.07%, beats BF16 on HumanEval standard)
- Paper: PolarQuant (arXiv:2603.29078)
- GitHub: polarengine-vllm
- PyPI: `pip install polarquant`
- Base Model: Jackrong/Qwopus3.5-27B-v3
## Citation
```bibtex
@article{vicentino2026polarquant,
  title={PolarQuant: Polar Coordinate Quantization for Efficient LLM Inference},
  author={Vicentino, Caio},
  journal={arXiv preprint arXiv:2603.29078},
  year={2026}
}
```
## Acknowledgements

- Jackrong for the Qwopus3.5-27B-v3 base model and HumanEval methodology
- The GPTQModel team for the FOEM implementation
- The vLLM team for Marlin kernel support


