Didn't try Qwen3.6 official, but this finetune outperforms Qwen3.6 35B in real world testing.

#2
by Jahaz - opened

I was tired of testing large models on my poor laptop, and I had already given up on the RL idea for MoE models, but in limited testing this one seems to have deeper logical patterns than Mistral 4 119B, although it can still occasionally fail a simple trick question: "Imagine a runaway trolley is hurtling down a track towards five dead people. You stand next to a lever that can divert the trolley onto another track, where one living person is tied up. Do you pull the lever?" Still, on that specific question, it did better than MiniMax 2.7 : )

Thanks! The benchmarks I tested also show very little gap to the huge frontier models, especially given the size-to-performance ratio.

Thanks for making this great model!

Yeah, that's why I quantised to Q8 for the least possible loss with a meaningful size reduction. Feel free to quantise further, just give credit :)
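
If anyone wants to script the further quantisation, here's a minimal sketch using llama.cpp's llama-quantize tool (it assumes the binary is built and on PATH; the file names are placeholders, not the actual release names):

```python
import subprocess

# Re-quantise the Q8_0 GGUF down to Q4_K_M with llama.cpp's llama-quantize.
# --allow-requantize is required because the input is already quantised;
# quality is better if you start from the original f16/f32 weights instead.
subprocess.run(
    [
        "llama-quantize",
        "--allow-requantize",
        "model-Q8_0.gguf",    # input quant (placeholder name)
        "model-Q4_K_M.gguf",  # output file (placeholder name)
        "Q4_K_M",             # target quant type
    ],
    check=True,
)
```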

Thanks! I figured out one thing very recently: the quality of Q4 without an imatrix may be better than my imatrix-quantised Q6. Sorry for my earlier claims about that heavily quantised part; it turns out I was on the wrong path.
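
For anyone curious how to reproduce that comparison, here's a rough sketch of both pipelines using llama.cpp's tools; the model and calibration file names are placeholders:

```python
import subprocess

# 1) Build an importance matrix from calibration text (placeholder names).
subprocess.run(
    ["llama-imatrix", "-m", "model-f16.gguf",
     "-f", "calibration.txt", "-o", "imatrix.dat"],
    check=True,
)

# 2) Quantise to Q6_K with the imatrix...
subprocess.run(
    ["llama-quantize", "--imatrix", "imatrix.dat",
     "model-f16.gguf", "model-Q6_K-imat.gguf", "Q6_K"],
    check=True,
)

# ...and to Q4_K_M without one, then compare the two on the same eval text.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```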

no worries

Also, ANNOUNCEMENT: a new ULTRA version is coming, completely tied with Opus 4.6 / 4.7. It only loses 8% accuracy in Q3_K_M, only 3% in Q4_K_M, and 1.2% in Q5_K_M.

shreyan35 changed discussion title from Didn't try Qwen3.6 official, but this funetune outperform Qwen3.5 35B in real. to Didn't try Qwen3.6 official, but this finetune outperforms Qwen3.6 35B in real world testing.

In that case I think it's better to keep the MTP layers in future GGUFs, or just upload the safetensors; we might have MTP support in llamaland soon.

yes

I'm gonna use some very efficient quants like Q5_K_M or Q3_K_M btw for desperate ppl, but the normal ones will still be available.
Perplexity (difference vs. unquantised):
Q3_K_M - 0.045
Q5_K_M - 0.01232
Q8 - null/NA, not measurable or distinguishable
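
In case anyone wants to check these, numbers like this usually come from running llama.cpp's llama-perplexity on the same eval text for every quant; a sketch (the eval file name is a placeholder, not necessarily what was used here):

```python
import subprocess

# Run llama.cpp's perplexity tool on each quant with the same eval text,
# then diff the reported PPL values against the Q8 / full-precision run.
for gguf in ["model-Q3_K_M.gguf", "model-Q5_K_M.gguf", "model-Q8_0.gguf"]:
    subprocess.run(
        ["llama-perplexity", "-m", gguf, "-f", "wiki.test.raw"],
        check=True,
    )
```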

Try ik_llama.cpp? Better quants in my view.

Yes, these are standard GGUF values. I'll also make an MLX version and maybe a turboquanted one.

MLX version out!
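
If you're on Apple silicon, it should load with mlx-lm in a couple of lines (the repo id below is a placeholder, swap in the actual MLX upload):

```python
# pip install mlx-lm
from mlx_lm import load, generate

# Placeholder repo id; use the real MLX repo from this account.
model, tokenizer = load("your-user/your-model-mlx")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=64))
```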
