AWQ version of gemma-4-26B-A4B-it
#1
by ankandrew - opened
Hi, thanks for making this one available so fast! Could you also quantize https://huggingface.co/google/gemma-4-26B-A4B-it? It would be great, thanks!
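For context, producing an AWQ build of a checkpoint like this is typically done with the AutoAWQ library. A minimal sketch, assuming AutoAWQ supports this architecture; the output directory and the calibration defaults are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "google/gemma-4-26B-A4B-it"
quant_path = "gemma-4-26B-A4B-it-AWQ"  # hypothetical output directory

# Common 4-bit AWQ settings: zero-point, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on AutoAWQ's default dataset and quantize the weights.
model.quantize(tokenizer, quant_config=quant_config)

# Write out the quantized checkpoint for serving.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The calibration step runs a small default dataset through the model, so the machine still needs enough memory to hold the full-precision weights.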
Thank you for your interest :) The AWQ version of google/gemma-4-26B-A4B-it has already been quantized, but there is a weight-loading mismatch between the quantized model and vLLM. I am manually reworking the model weight names as we speak and trying to make it load in vLLM.
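For anyone curious what that kind of fix involves, below is a rough sketch of renaming tensor keys in safetensors shards so they match what a loader expects. The substitutions shown are hypothetical; the real mapping depends on the tensor names vLLM's loader uses for this architecture:

```python
import glob
import json
from safetensors.torch import load_file, save_file

CKPT_DIR = "gemma-4-26B-A4B-it-AWQ"  # hypothetical local checkpoint directory

def rename_key(key: str) -> str:
    # Hypothetical substitutions; the real mapping depends on the tensor
    # names vLLM expects for this architecture.
    return (key
            .replace(".mlp.experts.gate_up_proj", ".mlp.experts.w13_weight")
            .replace(".mlp.experts.down_proj", ".mlp.experts.w2_weight"))

# Rewrite every shard with the remapped tensor names.
for shard in glob.glob(f"{CKPT_DIR}/*.safetensors"):
    tensors = load_file(shard)
    save_file({rename_key(k): v for k, v in tensors.items()}, shard,
              metadata={"format": "pt"})

# Keep the shard index in sync with the renamed keys.
index_path = f"{CKPT_DIR}/model.safetensors.index.json"
with open(index_path) as f:
    index = json.load(f)
index["weight_map"] = {rename_key(k): v for k, v in index["weight_map"].items()}
with open(index_path, "w") as f:
    json.dump(index, f, indent=2)
```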
Sorry to ask this, but how does the performance of the models you quantize compare with quantized models from other orgs like Red Hat or NVIDIA? I'm not that deep into quantization, but I'd like to understand the performance differences, since I'll be deploying these for the company I work for.
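As a general rule, well-calibrated 4-bit AWQ builds tend to lose only a small amount of accuracy relative to the BF16 baseline, but the differences depend on the calibration data and on your workload, so it is worth benchmarking yourself. A minimal sketch using lm-evaluation-harness with a vLLM backend; the model paths and the task choice are placeholders, and in practice you would run each engine in its own process to avoid GPU memory conflicts:

```python
import lm_eval

# Compare the BF16 baseline against the AWQ build on the same tasks.
# Paths and the task choice are placeholders; pick tasks close to your workload.
for name, model_args in {
    "bf16": "pretrained=google/gemma-4-26B-A4B-it",
    "awq": "pretrained=./gemma-4-26B-A4B-it-AWQ,quantization=awq",
}.items():
    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=model_args,
        tasks=["gsm8k"],
    )
    print(name, results["results"]["gsm8k"])
```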
Can this be served with SGLang?
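SGLang can generally load AWQ checkpoints (for example via `python -m sglang.launch_server --model-path <path> --quantization awq`), provided your SGLang version supports the model architecture. A minimal sketch using SGLang's offline Engine API; the checkpoint path is a placeholder, and passing `quantization` here is assumed to work the same way as the server CLI flag:

```python
import sglang as sgl

# Offline engine pointed at the AWQ checkpoint (path is a placeholder).
llm = sgl.Engine(model_path="./gemma-4-26B-A4B-it-AWQ", quantization="awq")

# Single-prompt generation; sampling params are passed as a plain dict.
out = llm.generate("Write a haiku about quantization.",
                   {"temperature": 0.7, "max_new_tokens": 64})
print(out["text"])

llm.shutdown()
```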