gemma-4-26B-A4B-it-qat-ptq-NVFP4

This repository contains an NVFP4 post-training quantized (PTQ) version of the Gemma 4 26B A4B instruction-tuned Mixture-of-Experts (MoE) model, created from the QAT checkpoint google/gemma-4-26B-A4B-it-qat-q4_0-unquantized.

The model was quantized with LLM Compressor using the NVFP4 scheme. Calibration was data-driven, using the neuralmagic/calibration dataset (20 samples, maximum sequence length 8192), and both weights and activations were quantized with the aim of preserving inference quality. The language modeling head, embedding layers, MoE router layers, and vision tower components were excluded from compression, following the official Gemma 4 NVFP4 workflow. MoE expert calibration was handled automatically through the SequentialGemma4TextExperts pipeline, which is intended to keep expert routing behavior intact and to keep the checkpoint compatible with compressed-tensors inference runtimes.

The resulting model is stored in compressed-tensors format. It is intended for efficient deployment with reduced memory consumption and faster inference, while retaining the multimodal instruction-following, reasoning, coding, and long-context capabilities of the original Gemma 4 26B A4B architecture.
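To give a feel for what the NVFP4 scheme does to a tensor, here is a minimal pure-Python sketch of block-scaled 4-bit quantization. It is an illustration under stated assumptions, not the code LLM Compressor runs: it assumes 16-element blocks and the FP4 E2M1 value grid, and it keeps each block scale as an ordinary Python float, whereas the real format stores block scales in FP8 (E4M3) and adds a per-tensor scale. The function names are hypothetical.

```python
# Illustrative sketch of NVFP4-style block quantization (not llm-compressor code).
# Assumptions: 16-element blocks, FP4 E2M1 magnitudes, one scale per block.
# Simplification: scales stay as Python floats instead of FP8 (E4M3).

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes); each code is a signed value on the E2M1 grid."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a block scale and its codes."""
    return [scale * c for c in codes]


def fake_quantize(values):
    """Quantize then dequantize a flat list of floats, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    weights = [i / 10 for i in range(-16, 16)]
    approx = fake_quantize(weights)
    worst = max(abs(a - b) for a, b in zip(weights, approx))
    print(f"largest round-trip error: {worst:.4f}")
```

Because each block carries its own scale, an outlier only degrades the resolution of the 16 values that share its block, which is the main reason block-scaled formats hold up better than a single per-tensor scale.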

recipe.yaml

| Setting | Value |
| --- | --- |
| Modifier | QuantizationModifier |
| Targets | Linear |
| Scheme | NVFP4 |
| Ignore Layers | `lm_head`, `re:.*embed.*`, `re:.*router.*`, `re:.*vision_tower.*` |
| Bypass Divisibility Checks | false |
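The ignore entries in the recipe mix a literal module name (lm_head) with regex patterns marked by a `re:` prefix. The sketch below shows one plausible way such a list selects modules. It is a simplified stand-in for the matching done inside the library, which may differ in detail, and the module names in `examples` are hypothetical, chosen only to exercise each rule.

```python
import re

# Ignore list copied from the recipe settings above.
IGNORE = ["lm_head", "re:.*embed.*", "re:.*router.*", "re:.*vision_tower.*"]


def is_ignored(module_name, ignore=IGNORE):
    """Simplified matcher: 're:' entries are regexes, other entries are exact names."""
    for entry in ignore:
        if entry.startswith("re:"):
            # Strip the 're:' prefix and match from the start of the name.
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            return True
    return False


# Hypothetical module names; the real Gemma 4 module tree may be named differently.
examples = [
    "lm_head",
    "model.language_model.embed_tokens",
    "model.language_model.layers.0.router.proj",
    "model.vision_tower.encoder.layers.0.mlp.fc1",
    "model.language_model.layers.0.self_attn.q_proj",
]

for name in examples:
    print(name, "->", "skipped" if is_ignored(name) else "quantized")
```

Under these assumptions only the last name, an attention projection, would be quantized. The other four stay in their original precision.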

memory footprint

| Model | Memory Footprint |
| --- | --- |
| Original (BF16) | ~49 GB |
| NVFP4 | ~16.5 GB |

| Metric | Value |
| --- | --- |
| Compression | ~3.0× |
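The compression figure can be checked from the two footprints above. The sketch below also compares it with a rough upper bound for a fully quantized tensor. That bound rests on an assumption about the format rather than on a measurement of this checkpoint: NVFP4 is taken to store 4-bit values plus one 8-bit scale per 16-element block, so about 4.5 bits per weight.

```python
# Approximate footprints from the table above, in GB.
bf16_gb = 49.0
nvfp4_gb = 16.5

observed_ratio = bf16_gb / nvfp4_gb

# Assumption: 4-bit values plus one 8-bit scale shared by each 16-element block.
bits_per_weight = 4 + 8 / 16
ideal_ratio = 16 / bits_per_weight  # BF16 uses 16 bits per weight

print(f"observed: {observed_ratio:.2f}x")
print(f"bound for fully quantized tensors: {ideal_ratio:.2f}x")
```

The observed ratio (about 2.97×) is below the roughly 3.56× bound. That is consistent with the lm_head, embedding, router, and vision tower layers being left unquantized, although the exact breakdown is not stated here.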

llm-compressor

An open-source library developed by the vLLM team for optimizing Large Language Models (LLMs) for production deployment: https://github.com/vllm-project/llm-compressor

Safetensors model size: 15B params
Tensor types: F32 · BF16 · F8_E4M3 · U8