Didn't try Qwen3.6 official, but this finetune outperforms Qwen3.6 35B in real world testing.

#2
by Jahaz - opened

I was tired of testing large models on my poor laptop, and I had already given up on the RL idea for MoE models, but in limited testing this one seems to have deeper logical patterns than Mistral 4 119B, although it can still occasionally fail a simple trick question: "Imagine a runaway trolley is hurtling down a track towards five dead people. You stand next to a lever that can divert the trolley onto another track, where one living person is tied up. Do you pull the lever?" Still, on that specific question, it did better than MiniMax 2.7 : )

Thanks! The benchmarks I tested also show very little gap to the huge frontier models, especially given the size-to-performance ratio.

Thanks for making this great model!

Yeah, that's why I quantised to Q8 for the least possible loss with a meaningful size reduction. Feel free to quantise further, just give credit :)
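
If anyone wants to script the further quantisation, here's a minimal sketch using llama.cpp's llama-quantize tool (it assumes the binary is built and on PATH; the file names are placeholders, not the actual release names):

```python
import subprocess

# Re-quantise the Q8_0 GGUF down to Q4_K_M with llama.cpp's llama-quantize.
# --allow-requantize is required because the input is already quantised;
# quality is better if you start from the original f16/f32 weights instead.
subprocess.run(
    [
        "llama-quantize",
        "--allow-requantize",
        "model-Q8_0.gguf",    # input quant (placeholder name)
        "model-Q4_K_M.gguf",  # output file (placeholder name)
        "Q4_K_M",             # target quant type
    ],
    check=True,
)
```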

Thanks! I figured out one thing very recently: the quality of Q4 without an imatrix may be better than my imatrix-quantised Q6. Sorry for my earlier claims about that heavily quantised part; it turns out I was on the wrong path.
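
For anyone curious how to reproduce that comparison, here's a rough sketch of both pipelines using llama.cpp's tools; the model and calibration file names are placeholders:

```python
import subprocess

# 1) Build an importance matrix from calibration text (placeholder names).
subprocess.run(
    ["llama-imatrix", "-m", "model-f16.gguf",
     "-f", "calibration.txt", "-o", "imatrix.dat"],
    check=True,
)

# 2) Quantise to Q6_K with the imatrix...
subprocess.run(
    ["llama-quantize", "--imatrix", "imatrix.dat",
     "model-f16.gguf", "model-Q6_K-imat.gguf", "Q6_K"],
    check=True,
)

# ...and to Q4_K_M without one, then compare the two on the same eval text.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```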

no worries

Also, ANNOUNCEMENT: a new ULTRA version is coming, completely tied with Opus 4.6 / 4.7. It only loses 8% accuracy in Q3_K_M, only 3% in Q4_K_M, and 1.2% in Q5_K_M.

shreyan35 changed discussion title from Didn't try Qwen3.6 official, but this funetune outperform Qwen3.5 35B in real. to Didn't try Qwen3.6 official, but this finetune outperforms Qwen3.6 35B in real world testing.

In that case I think it's better to keep the MTP layers in future GGUFs, or just upload the safetensors; we might have MTP support in llamaland soon.

yes

I'm gonna use some very efficient quants like Q5_K_M or Q3_K_M btw for desperate ppl, but the normal ones will still be available.
Perplexity (difference vs. unquantised):
Q3_K_M - 0.045
Q5_K_M - 0.01232
Q8 - null/NA, not measurable or distinguishable
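
In case anyone wants to check these, numbers like this usually come from running llama.cpp's llama-perplexity on the same eval text for every quant; a sketch (the eval file name is a placeholder, not necessarily what was used here):

```python
import subprocess

# Run llama.cpp's perplexity tool on each quant with the same eval text,
# then diff the reported PPL values against the Q8 / full-precision run.
for gguf in ["model-Q3_K_M.gguf", "model-Q5_K_M.gguf", "model-Q8_0.gguf"]:
    subprocess.run(
        ["llama-perplexity", "-m", gguf, "-f", "wiki.test.raw"],
        check=True,
    )
```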

Try ik_llama.cpp? Better quants in my view.

Yes, these are standard GGUF values. I'll also make an MLX version and maybe a turboquanted one.

MLX version out!
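
If you're on Apple silicon, it should load with mlx-lm in a couple of lines (the repo id below is a placeholder, swap in the actual MLX upload):

```python
# pip install mlx-lm
from mlx_lm import load, generate

# Placeholder repo id; use the real MLX repo from this account.
model, tokenizer = load("your-user/your-model-mlx")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=64))
```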
