# Gemma 4 E4B IT AutoRound AWQ 4-bit
This repository contains an AutoRound AWQ 4-bit quantization of google/gemma-4-E4B-it.
## Quantization summary
- Method: AutoRound -> AWQ
- Bit-width: 4-bit
- Group size: 128
- Iterations: 500
- Quantized block: `model.language_model.layers`
- Preserved in higher precision: `vision_tower`, `audio_tower`, `embed_vision`, `embed_audio`, `lm_head`
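For reference, the settings above map onto an AWQ-style quantization config of roughly this shape. This is a sketch using common AWQ config field names; the `config.json` shipped in this repository is authoritative and its exact keys may differ.

```python
# Illustrative AWQ-style quantization config matching the summary above.
# Field names follow common Transformers AWQ conventions; the repo's own
# config.json is the source of truth.
quantization_config = {
    "quant_method": "awq",
    "bits": 4,
    "group_size": 128,
    # Modules kept in higher precision (not converted to 4-bit):
    "modules_to_not_convert": [
        "vision_tower",
        "audio_tower",
        "embed_vision",
        "embed_audio",
        "lm_head",
    ],
}
```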
## Validation
This checkpoint was smoke-tested with the Transformers AWQ loader and generated the expected response to a simple text prompt.
## Loader note
Use the Transformers AWQ loader. The validated loading path is:
```python
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "Chunity/gemma-4-E4B-it-AWQ-4bit",
    dtype="auto",
    low_cpu_mem_usage=False,
)
processor = AutoProcessor.from_pretrained("Chunity/gemma-4-E4B-it-AWQ-4bit")
```
## Size
Approximate on-disk size: 9.9 GB.
## Caveat
This is a mixed FP/AWQ multimodal checkpoint. Runtime compatibility depends on the loader honoring `modules_to_not_convert` in the quantization config.
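To illustrate what honoring `modules_to_not_convert` means in practice, the check a loader performs amounts to a name filter over the model's modules. This is a minimal sketch, not the actual Transformers quantizer code; the helper name is hypothetical.

```python
def should_quantize(module_name: str, modules_to_not_convert: list[str]) -> bool:
    """Return False when the module name matches any preserved module pattern.

    Hypothetical helper illustrating the filter a loader applies: modules
    whose names contain an entry of modules_to_not_convert stay in higher
    precision; everything else is replaced with a quantized implementation.
    """
    return not any(skip in module_name for skip in modules_to_not_convert)


skip_list = ["vision_tower", "audio_tower", "embed_vision", "embed_audio", "lm_head"]

# Language-model layers are quantized:
should_quantize("model.language_model.layers.0.self_attn.q_proj", skip_list)  # True
# Vision-tower layers are preserved in higher precision:
should_quantize("model.vision_tower.encoder.layers.0", skip_list)  # False
```

A loader that ignores this list would try to wrap every linear layer, including the FP vision and audio towers, and fail to load or silently corrupt this checkpoint.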