Gemma 4 E4B IT AutoRound AWQ 4-bit

This repository contains an AutoRound AWQ 4-bit quantization of google/gemma-4-E4B-it.

Quantization summary

  • Method: AutoRound -> AWQ
  • Bit-width: 4-bit
  • Group size: 128
  • Iterations: 500
  • Quantized block: model.language_model.layers
  • Preserved in higher precision: vision_tower, audio_tower, embed_vision, embed_audio, lm_head
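To make the bullet points above concrete, here is a minimal, illustrative sketch of group-wise 4-bit asymmetric quantization with group size 128 (all names hypothetical). Real AutoRound additionally *tunes* the rounding and clipping values by signed gradient descent over the stated 500 iterations; this sketch shows only the static quantize/dequantize round trip that the exported AWQ weights encode.

```python
import random

GROUP_SIZE = 128          # matches the group size in the summary above
BITS = 4
QMAX = (1 << BITS) - 1    # 15: largest 4-bit code

def quantize_group(weights):
    """Quantize one group of floats to 4-bit codes plus (scale, zero-point)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX or 1.0          # guard against a constant group
    zero = round(-lo / scale)                # integer zero-point
    q = [max(0, min(QMAX, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from 4-bit codes."""
    return [(v - zero) * scale for v in q]

def quantize_row(row):
    """Split a weight row into groups of GROUP_SIZE, each with its own scale."""
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]

# Round-trip check: error stays within about one quantization step per group.
random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(256)]
groups = quantize_row(row)
recon = [x for g in groups for x in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, recon))
```

Each group of 128 weights carries its own scale and zero-point, which is why smaller group sizes trade extra metadata for lower quantization error.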

Validation

This checkpoint was smoke-tested with the Transformers AWQ loader and generated the expected response to a simple text prompt.

Loader note

Use the Transformers AWQ loader. The validated loading path is:

from transformers import AutoModelForCausalLM, AutoProcessor

# Transformers selects the AWQ backend from the quantization_config
# embedded in the checkpoint's config.json.
model = AutoModelForCausalLM.from_pretrained(
    "Chunity/gemma-4-E4B-it-AWQ-4bit",
    dtype="auto",             # keep non-quantized modules in their stored dtype
    low_cpu_mem_usage=False,
)
processor = AutoProcessor.from_pretrained("Chunity/gemma-4-E4B-it-AWQ-4bit")

Size

Approximate on-disk size: 9.9 GB

Caveat

This is a mixed-precision multimodal checkpoint: the language-model blocks are AWQ 4-bit, while the vision/audio towers and other listed modules remain in higher precision. Runtime compatibility therefore depends on the loader honoring modules_to_not_convert in the quantization config.
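For reference, the quantization_config carried by such a mixed checkpoint typically has the following shape. This is an illustrative sketch, not copied verbatim from this repository's config.json; the field values follow the quantization summary above.

```json
{
  "quantization_config": {
    "quant_method": "awq",
    "bits": 4,
    "group_size": 128,
    "version": "gemm",
    "modules_to_not_convert": [
      "vision_tower",
      "audio_tower",
      "embed_vision",
      "embed_audio",
      "lm_head"
    ]
  }
}
```

A loader that ignores modules_to_not_convert would try to treat the higher-precision modules as quantized layers, which is the compatibility risk this caveat refers to.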

Safetensors details

  • Model size: ~5B params
  • Tensor types: I32 (packed 4-bit weights), BF16