AWQ version of gemma-4-26B-A4B-it

#1 opened by ankandrew

Hi, thanks for making this one available so fast! Could you also quantize https://huggingface.co/google/gemma-4-26B-A4B-it? It would be great, thanks!

cyankiwi org

Thank you for your interest :) The AWQ version of google/gemma-4-26B-A4B-it has already been quantized, but there is a weight-name mismatch between the quantized checkpoint and vLLM's loader. I am manually remapping the model weight names as we speak and trying to get it working with vLLM.
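For illustration, here is a minimal sketch of this kind of key remapping, assuming safetensors weights; the `model.model.` prefix below is a hypothetical placeholder, not the actual mismatch in this checkpoint:

```python
# Sketch: rename mismatched weight keys in a quantized checkpoint so a
# loader like vLLM can find them. The old/new prefixes are assumptions;
# the real mismatch depends on what names the quantizer emitted.
from safetensors.torch import load_file, save_file

state_dict = load_file("model.safetensors")

renamed = {}
for name, tensor in state_dict.items():
    # Hypothetical remap: strip a duplicated "model." prefix.
    new_name = name.replace("model.model.", "model.", 1)
    renamed[new_name] = tensor

save_file(renamed, "model-remapped.safetensors")
```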

cyankiwi org

cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit is out now :) Please enjoy
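For reference, a minimal sketch of loading it with vLLM's offline API; vLLM usually auto-detects the quantization from the checkpoint config, so the explicit `quantization` argument is an assumption here rather than a requirement:

```python
# Sketch: offline inference with the AWQ checkpoint via vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```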

Sorry to ask this, but how do your quantized models perform compared to those from other orgs like Red Hat or NVIDIA? I'm not that familiar with quantization, but I'd like to understand the performance differences, since I'll be deploying these for the company I work for.

Great, thanks @cpatonn! Btw, what calibration dataset was used to quantize this model? Was it also calibrated with multimodal data? Thanks.

Can this be served with SGLang?
