**Update:** this is now an imatrix quant. The first upload was static because the imatrix tool appeared to be broken with Gemma 4 (>100 PPL), and I wanted to verify correctness before updating.

I made a custom imatrix dataset by combining random columns from several popular datasets on Hugging Face and formatting them with the official jinja template. Compared to the unstructured bartowski dataset, PPL went from multiple thousands to single digits, so I think it should be good now. Just in case, I mirrored the old static quant to https://huggingface.co/Beinsezii/gemma-4-31B-it-GGUF-5.05BPW-static
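For reference, assembling a dataset like this mostly comes down to wrapping raw text samples in the model's chat turn format before passing the file to the imatrix tool. The turn markers below are an assumption carried over from earlier Gemma releases; check the model's official jinja template for the real tokens.

```python
# Sketch: wrap plain-text samples in Gemma-style chat turns so the
# imatrix activations reflect the chat format the model actually sees.
# The <start_of_turn>/<end_of_turn> markers are assumed from prior
# Gemma releases, not confirmed for this model.
def format_turns(samples):
    out = []
    for user_text, model_text in samples:
        out.append(
            f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
            f"<start_of_turn>model\n{model_text}<end_of_turn>\n"
        )
    return "".join(out)

corpus = format_turns([
    ("Explain K-quants briefly.", "K-quants group weights into blocks with shared scales."),
])
print(corpus.splitlines()[0])  # <start_of_turn>user
```

The resulting text file can then be fed to the imatrix tool in place of an unstructured dump.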
5.05 bpw, a mixture of Q5_K and Q4_K
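As a rough sanity check, 5.05 bpw is consistent with a weighted mix of Q5_K (~5.5 bits/weight) and Q4_K (~4.5 bits/weight). The split below is purely illustrative, not the actual per-tensor recipe.

```python
# Illustrative only: approximate bits-per-weight for llama.cpp K-quants,
# including block-scale overhead. Not the actual quant recipe.
Q5_K_BPW = 5.5
Q4_K_BPW = 4.5

def mixed_bpw(frac_q5):
    """Average bpw when frac_q5 of weights are Q5_K and the rest Q4_K."""
    return frac_q5 * Q5_K_BPW + (1 - frac_q5) * Q4_K_BPW

# A ~55/45 split lands right at the quoted figure.
print(round(mixed_bpw(0.55), 2))  # 5.05
```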
This is a VRAM hog that barely fits ~32k context on a 24 GiB GPU. I'm not willing to go lower on the quant and risk compromising capability, so for long-context agentic tasks I'd instead recommend quantizing the K/V cache or offloading a couple of layers to system RAM. Otherwise I'd just use https://huggingface.co/Beinsezii/gemma-4-26B-A4B-it-GGUF-6.52BPW instead.
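To see why K/V quantization buys so much headroom at long context, here is the standard back-of-envelope KV-cache sizing formula. The layer and head numbers below are placeholders, not the real architecture of this model.

```python
# Rough KV-cache sizing:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes_per_elt
# The shapes below are hypothetical placeholders, not the actual
# gemma-4 architecture.
def kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per_elt=2):
    total = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elt
    return total / 2**30

# fp16 cache vs an 8-bit (1 byte/element) cache at 32k context:
fp16 = kv_cache_gib(48, 8, 256, 32768)
q8 = kv_cache_gib(48, 8, 256, 32768, bytes_per_elt=1)
print(f"fp16: {fp16:.1f} GiB, 8-bit: {q8:.1f} GiB")  # fp16: 12.0 GiB, 8-bit: 6.0 GiB
```

Halving the cache element size halves the cache footprint, which is often the difference between fitting 32k context and not.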
Base model: google/gemma-4-26B-A4B-it