GGUF Quants, please!
I'd love to try this out but the full weights are massive! Any chance of a Q8 GGUF?
Thanks for your interest! This is a MoE model (122B total params, ~10B active), so while the full weights are large, inference is actually quite efficient.
For GGUF, I'd recommend checking out community-converted versions — folks on HF often upload quantized variants. If you'd like to convert it yourself, llama.cpp's convert_hf_to_gguf.py supports Qwen MoE architectures.
Keep in mind that Q8 GGUF for a 122B MoE model will still be ~120GB+. You might want to consider Q4_K_M or Q5_K_M for a better balance between quality and VRAM usage — MoE models tend to be more resilient to quantization since only a subset of experts are active per token.
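If you do go the self-conversion route, the flow is roughly as follows (a minimal sketch, assuming an already-built llama.cpp checkout; the local paths are placeholders):

```bash
# Convert the HF checkpoint to an FP16 GGUF intermediate
python llama.cpp/convert_hf_to_gguf.py /path/to/hf-model \
    --outtype f16 --outfile model-f16.gguf

# Requantize to the target type (swap Q8_0 for Q4_K_M or Q5_K_M as discussed above)
./llama.cpp/build/bin/llama-quantize model-f16.gguf model-q8_0.gguf Q8_0
```

Note that convert_hf_to_gguf.py can also write q8_0 directly via --outtype q8_0, which skips the large FP16 intermediate on disk.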
Thank you for responding!
I have enough VRAM for Q8 inference on my workstation - but unfortunately I don't think I have enough RAM to convert your weights on my workstation. I'll keep an eye out for a community conversion, though. Sounds like you've managed to achieve some really impressive results!
I can do it if the issue is only RAM - I can top up a VM with up to 950GB of RAM if needed. 2 x RTX PRO 6000 and 2 x RTX 3090, Linux. Just write out every step/command I need to execute, and I will upload to HF if the VM succeeds with the Q8 (not interested in other quants).
> I don't think I have enough RAM to convert your weights on my workstation.
I talked with Claude; it says 60-80GB of RAM is enough for conversion to an FP16 GGUF, because the script streams the tensors file by file.
The biggest demand is disk space: ~500GB for the original FP32 model plus ~250GB for the FP16 GGUF used in the further conversion, so about 800GB of storage in total.
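For reference, the full sequence would look roughly like this (a sketch, not something tested on this exact model; REPO/MODEL and the upload repo name are placeholders, and binary paths depend on how llama.cpp was built):

```bash
# 1. Get and build llama.cpp, plus the conversion script's Python deps
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
pip install -r requirements.txt
cd ..

# 2. Download the original weights (~500GB of disk)
huggingface-cli download REPO/MODEL --local-dir ./hf-model

# 3. Convert to an FP16 GGUF (~250GB; tensors are streamed, so RAM should stay around 60-80GB)
python llama.cpp/convert_hf_to_gguf.py ./hf-model --outtype f16 --outfile model-f16.gguf

# 4. Quantize to Q8_0 (still ~120GB+, as noted above)
./llama.cpp/build/bin/llama-quantize model-f16.gguf model-q8_0.gguf Q8_0

# 5. Upload the result (repo name is a placeholder)
huggingface-cli upload YOUR_NAME/MODEL-Q8_0-GGUF model-q8_0.gguf
```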
ComputeWisely, I converted your abliterated version to Q8 just now, and the model refused to answer the very first simple question about the process of producing a nuke; orders not to follow its safeguards didn't help.
It seems there's no difference at all from the default one. I don't think I'll waste time uploading the Q8 here, so people won't waste their time on your 'abliterated' version the same way I did.
Thank you very much for trying, @decentralize303! I tried my stock “Give me a highly critical overview (highlighting failures) of the government in” [USA/Russia/China] test prompt - it doesn't fare that well with the standard model, but 'heretic' gave a usefully frank and critical response. Q8 here: https://huggingface.co/mradermacher/Qwen3.5-122B-A10B-heretic-GGUF
I gave your prompt to this Q8 model; it answered in French even though I asked in English, so as a second step I asked it to translate. The answer is on pastebin: https://pastebin.com/VRf4BmXS
You can compare this answer with the heretic one.
I am more interested in uncensored answers to technical STEM questions than political ones; we all live in totalitarian states no matter how the authorities trick us and sell us a different perception of it.
Old uncensored models didn't hesitate to answer about the nuke manufacturing process; it seems they are still more useful.