GPTQ + KV TQ4 local results: HumanEval ~ 67.68%

by shawnqiu - opened 5 days ago

Hi, thanks a lot for releasing this model — great work.

I have a quick question regarding the HumanEval pass@1 comparison and test alignment.

Qwopus3.5‑9B‑v3 (base) reports pass@1 = 87.80% (144/164), which is very strong. In my own local tests of the GPTQ variant (using KV quantization = TQ4, HumanEval setup on my side), I get pass@1 ≈ 67.68%. This is still very high quality in practice, but about 20 percentage points lower than the reported base-model result.
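To make the gap concrete, here is a rough sketch that converts both percentages back into solved-problem counts. It assumes both runs were scored on the full 164-problem HumanEval set and that 67.68% is a figure rounded to two decimals. The post does not state the local solve count, so the helper name `solved_from_rate` and the recovered count are illustrative only.

```python
TOTAL = 164  # HumanEval problem count, taken from the 144/164 figure above


def solved_from_rate(rate_percent, total=TOTAL):
    """Return every integer solve count whose pass@1 rounds to rate_percent.

    Assumption: rate_percent was rounded to two decimal places.
    """
    return [
        k for k in range(total + 1)
        if abs(100 * k / total - rate_percent) < 0.005
    ]


base_candidates = solved_from_rate(87.80)   # reported base-model result
quant_candidates = solved_from_rate(67.68)  # local GPTQ + KV TQ4 result

# Each percentage should map to exactly one count if the assumptions hold.
base_solved = base_candidates[0]
quant_solved = quant_candidates[0]

gap_problems = base_solved - quant_solved
gap_points = 100 * gap_problems / TOTAL

print(f"base: {base_solved}/{TOTAL}, quantized: {quant_solved}/{TOTAL}")
print(f"gap: {gap_problems} problems ({gap_points:.2f} percentage points)")
```

Under these assumptions the local result corresponds to 111 of 164 problems, i.e. 33 fewer solved problems than the reported base run. If the local setup used a different problem subset or prompt format, the counts would not be directly comparable.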

caiovicentino1

Owner 5 days ago

I appreciate your work!

These are still great results, but far from the baseline. Sorry for any misinterpretation.
