Gemma 4 E4B IT AutoRound AWQ 4-bit

This repository contains an AutoRound AWQ 4-bit quantization of google/gemma-4-E4B-it.

Quantization summary

  • Method: AutoRound -> AWQ
  • Bit-width: 4-bit
  • Group size: 128
  • Iterations: 500
  • Quantized block: model.language_model.layers
  • Preserved in higher precision: vision_tower, audio_tower, embed_vision, embed_audio, lm_head
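To make the bullet points above concrete, here is a minimal, illustrative sketch of group-wise 4-bit asymmetric quantization with group size 128 (all names hypothetical). Real AutoRound additionally *tunes* the rounding and clipping values by signed gradient descent over the stated 500 iterations; this sketch shows only the static quantize/dequantize round trip that the exported AWQ weights encode.

```python
import random

GROUP_SIZE = 128          # matches the group size in the summary above
BITS = 4
QMAX = (1 << BITS) - 1    # 15: largest 4-bit code

def quantize_group(weights):
    """Quantize one group of floats to 4-bit codes plus (scale, zero-point)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX or 1.0          # guard against a constant group
    zero = round(-lo / scale)                # integer zero-point
    q = [max(0, min(QMAX, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from 4-bit codes."""
    return [(v - zero) * scale for v in q]

def quantize_row(row):
    """Split a weight row into groups of GROUP_SIZE, each with its own scale."""
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]

# Round-trip check: error stays within about one quantization step per group.
random.seed(0)
row = [random.uniform(-1.0, 1.0) for _ in range(256)]
groups = quantize_row(row)
recon = [x for g in groups for x in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, recon))
```

Each group of 128 weights carries its own scale and zero-point, which is why smaller group sizes trade extra metadata for lower quantization error.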

Validation

This checkpoint was smoke-tested with the Transformers AWQ loader and generated the expected response to a simple text prompt.

Loader note

Use the Transformers AWQ loader. The validated loading path is:

from transformers import AutoModelForCausalLM, AutoProcessor

# Transformers selects the AWQ backend from the quantization_config
# embedded in the checkpoint's config.json.
model = AutoModelForCausalLM.from_pretrained(
    "Chunity/gemma-4-E4B-it-AWQ-4bit",
    dtype="auto",             # keep non-quantized modules in their stored dtype
    low_cpu_mem_usage=False,
)
processor = AutoProcessor.from_pretrained("Chunity/gemma-4-E4B-it-AWQ-4bit")

Size

Approximate on-disk size: 9.9 GB

Caveat

This is a mixed-precision multimodal checkpoint: the language-model blocks are AWQ 4-bit, while the vision/audio towers and other listed modules remain in higher precision. Runtime compatibility therefore depends on the loader honoring modules_to_not_convert in the quantization config.
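For reference, the quantization_config carried by such a mixed checkpoint typically has the following shape. This is an illustrative sketch, not copied verbatim from this repository's config.json; the field values follow the quantization summary above.

```json
{
  "quantization_config": {
    "quant_method": "awq",
    "bits": 4,
    "group_size": 128,
    "version": "gemm",
    "modules_to_not_convert": [
      "vision_tower",
      "audio_tower",
      "embed_vision",
      "embed_audio",
      "lm_head"
    ]
  }
}
```

A loader that ignores modules_to_not_convert would try to treat the higher-precision modules as quantized layers, which is the compatibility risk this caveat refers to.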

Safetensors details

  • Model size: ~5B params
  • Tensor types: I32 (packed 4-bit weights), BF16