AWQ version of gemma-4-26B-A4B-it
#1
by ankandrew - opened
Hi, thanks for making this one available so fast! Could you also quantize https://huggingface.co/google/gemma-4-26B-A4B-it? It would be great, thanks!
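For context, producing an AWQ build of a checkpoint like this is typically done with the AutoAWQ library. A minimal sketch, assuming AutoAWQ supports this architecture; the output directory and the calibration defaults are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "google/gemma-4-26B-A4B-it"
quant_path = "gemma-4-26B-A4B-it-AWQ"  # hypothetical output directory

# Common 4-bit AWQ settings: zero-point, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on AutoAWQ's default dataset and quantize the weights.
model.quantize(tokenizer, quant_config=quant_config)

# Write out the quantized checkpoint for serving.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The calibration step runs a small default dataset through the model, so the machine still needs enough memory to hold the full-precision weights.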
Thank you for your interest :) The AWQ version of google/gemma-4-26B-A4B-it has already been quantized, but there is a weight-loading mismatch between the quantized model and vLLM. I am manually reworking the model weight names as we speak and trying to make it load in vLLM.
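For anyone curious what that kind of fix involves, below is a rough sketch of renaming tensor keys in safetensors shards so they match what a loader expects. The substitutions shown are hypothetical; the real mapping depends on the tensor names vLLM's loader uses for this architecture:

```python
import glob
import json
from safetensors.torch import load_file, save_file

CKPT_DIR = "gemma-4-26B-A4B-it-AWQ"  # hypothetical local checkpoint directory

def rename_key(key: str) -> str:
    # Hypothetical substitutions; the real mapping depends on the tensor
    # names vLLM expects for this architecture.
    return (key
            .replace(".mlp.experts.gate_up_proj", ".mlp.experts.w13_weight")
            .replace(".mlp.experts.down_proj", ".mlp.experts.w2_weight"))

# Rewrite every shard with the remapped tensor names.
for shard in glob.glob(f"{CKPT_DIR}/*.safetensors"):
    tensors = load_file(shard)
    save_file({rename_key(k): v for k, v in tensors.items()}, shard,
              metadata={"format": "pt"})

# Keep the shard index in sync with the renamed keys.
index_path = f"{CKPT_DIR}/model.safetensors.index.json"
with open(index_path) as f:
    index = json.load(f)
index["weight_map"] = {rename_key(k): v for k, v in index["weight_map"].items()}
with open(index_path, "w") as f:
    json.dump(index, f, indent=2)
```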
Sorry to ask this, but how does the performance of the models you quantize compare with quantized models from other orgs like Red Hat or NVIDIA? I'm not that deep into quantization, but I'd like to understand the performance differences, since I'll be deploying these for the company I work for.
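As a general rule, well-calibrated 4-bit AWQ builds tend to lose only a small amount of accuracy relative to the BF16 baseline, but the differences depend on the calibration data and on your workload, so it is worth benchmarking yourself. A minimal sketch using lm-evaluation-harness with a vLLM backend; the model paths and the task choice are placeholders, and in practice you would run each engine in its own process to avoid GPU memory conflicts:

```python
import lm_eval

# Compare the BF16 baseline against the AWQ build on the same tasks.
# Paths and the task choice are placeholders; pick tasks close to your workload.
for name, model_args in {
    "bf16": "pretrained=google/gemma-4-26B-A4B-it",
    "awq": "pretrained=./gemma-4-26B-A4B-it-AWQ,quantization=awq",
}.items():
    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=model_args,
        tasks=["gsm8k"],
    )
    print(name, results["results"]["gsm8k"])
```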
Can this be served with SGLang?
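SGLang can generally load AWQ checkpoints (for example via `python -m sglang.launch_server --model-path <path> --quantization awq`), provided your SGLang version supports the model architecture. A minimal sketch using SGLang's offline Engine API; the checkpoint path is a placeholder, and passing `quantization` here is assumed to work the same way as the server CLI flag:

```python
import sglang as sgl

# Offline engine pointed at the AWQ checkpoint (path is a placeholder).
llm = sgl.Engine(model_path="./gemma-4-26B-A4B-it-AWQ", quantization="awq")

# Single-prompt generation; sampling params are passed as a plain dict.
out = llm.generate("Write a haiku about quantization.",
                   {"temperature": 0.7, "max_new_tokens": 64})
print(out["text"])

llm.shutdown()
```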