GPTQ + KV TQ4 local results: HumanEval ~ 67.68%

#2
by shawnqiu - opened

Hi, thanks a lot for releasing this model. Great work.

I have a quick question regarding the HumanEval pass@1 comparison and test alignment.

Qwopus3.5‑9B‑v3 (base) reports a pass@1 = 87.80% (144 / 164), which is very strong. In my own local tests of the GPTQ variant (using KV quantization = TQ4, HumanEval setup on my side), I get pass@1 ≈ 67.68%. This is still very high quality in practice, but noticeably lower than the reported base-model result.
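
For reference, here is a minimal sketch of the arithmetic behind these two numbers. It assumes the local run covered the same 164 HumanEval problems as the reported result. The solved count for the GPTQ run is inferred from the 67.68% figure, not taken from an evaluation log, and the names `TOTAL` and `pass_at_1` are only illustrative.

```python
TOTAL = 164  # HumanEval problem count, as in the reported 144 / 164


def pass_at_1(passed: int, total: int = TOTAL) -> float:
    """Return pass@1 as a percentage for one sample per problem."""
    return 100.0 * passed / total


base_passed = 144  # reported for the base model
# Assumption: back out the solved count from the 67.68% figure.
gptq_passed = round(67.68 / 100 * TOTAL)

base = pass_at_1(base_passed)
gptq = pass_at_1(gptq_passed)

print(f"base: {base_passed}/{TOTAL} = {base:.2f}%")
print(f"GPTQ + KV TQ4: {gptq_passed}/{TOTAL} = {gptq:.2f}%")
print(f"gap: {base_passed - gptq_passed} problems, {base - gptq:.2f} points")
```

Under that assumption the gap corresponds to a whole number of problems, which makes it easier to compare per-problem failures between the two setups.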

I appreciate your work!

These are still great results, but far from the baseline. Sorry if I have misinterpreted anything.