Gemma 4 26B A4B IT QAT Assistant MTP Q8_0 GGUF

This repository contains a GGUF conversion of the official Google Gemma 4 26B A4B IT QAT assistant/drafter checkpoint.

  • Source checkpoint: google/gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant
  • Output file: gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
  • Quantization: Q8_0
  • Format: GGUF
  • Intended runtime: llama.cpp with Gemma 4 MTP / draft-model support

This is not a standalone chat model. It is an assistant / drafter / MTP head intended to be used together with a matching Gemma 4 26B A4B IT QAT target model for speculative decoding.

File

| File | Description |
| --- | --- |
| gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf | Q8_0 GGUF conversion of the Gemma 4 26B A4B QAT assistant checkpoint |

Source

Converted from the official Google checkpoint:

google/gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant

Usage

This GGUF is a draft model (assistant / MTP head), not a standalone chat model. Pass it to llama.cpp as the draft model alongside a matching Gemma 4 26B A4B IT QAT target model.

llama-server example:

llama-server \
  -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf \
  --model-draft gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf \
  --spec-type draft-mtp \
  --spec-draft-n-max 4
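
Because the drafter only works alongside a target model, it can help to confirm that both files are present before launching. The following is a minimal sketch, not part of the original setup: the function names check_models and start_server and the TARGET, DRAFT, and N_MAX variables are hypothetical, and the llama-server flags are copied from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to start unless both GGUF files exist.
# File names default to the ones used in the llama-server example.
TARGET="${TARGET:-gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf}"
DRAFT="${DRAFT:-gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf}"

# check_models FILE...: fail on the first path that is not a regular file.
check_models() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing model file: $f" >&2
      return 1
    fi
  done
}

# start_server: launch llama-server only if both models are present.
# N_MAX is an assumed override for the draft length (default 4).
start_server() {
  check_models "$TARGET" "$DRAFT" || return 1
  llama-server \
    -m "$TARGET" \
    --model-draft "$DRAFT" \
    --spec-type draft-mtp \
    --spec-draft-n-max "${N_MAX:-4}"
}
```

After sourcing the script, calling start_server launches the server, or prints the missing path and returns a non-zero status.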

Conversion

Converted with llama.cpp using Gemma 4 assistant / MTP support:

python convert_hf_to_gguf.py \
  gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant \
  --outfile gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf \
  --outtype q8_0
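
A converted file can be sanity-checked before use. GGUF files start with the four ASCII bytes GGUF, so a short shell check is enough to catch a truncated file or a wrong path. This is only a sketch: the helper name is_gguf is hypothetical, and passing the check does not validate the tensors or metadata.

```shell
#!/usr/bin/env bash
# is_gguf FILE: succeed if FILE exists and begins with the ASCII magic "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

f="gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf"
if is_gguf "$f"; then
  echo "$f: GGUF magic found"
else
  echo "$f: missing or not a GGUF file" >&2
fi
```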

Local testing

In local testing with a Gemma 4 26B A4B IT QAT target GGUF and upstream llama.cpp Gemma 4 MTP support, --spec-draft-n-max 4 gave a good balance of throughput and draft acceptance. The best value may differ with hardware and workload.
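
To repeat this kind of comparison on other hardware, one option is to generate one launch command per candidate draft length and benchmark each in turn. The sketch below only prints the commands and does not start anything. The sweep_commands helper and the candidate values 2, 4, 6, and 8 are illustrative assumptions, and the flags are the ones used in the Usage section.

```shell
#!/usr/bin/env bash
# Print one llama-server command per candidate --spec-draft-n-max value.
TARGET="gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf"
DRAFT="gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf"

# sweep_commands N...: emit a launch command for each draft length N.
sweep_commands() {
  local n
  for n in "$@"; do
    printf 'llama-server -m %s --model-draft %s --spec-type draft-mtp --spec-draft-n-max %s\n' \
      "$TARGET" "$DRAFT" "$n"
  done
}

# Candidate values are arbitrary examples, not recommendations.
sweep_commands 2 4 6 8
```

Each printed command can then be run with the same prompt set, comparing throughput and draft acceptance between runs.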

Notes

This file was created to make the official QAT assistant/drafter checkpoint usable with llama.cpp's Gemma 4 MTP / speculative decoding path.

This model is intended for users who already have a compatible Gemma 4 26B A4B IT QAT target GGUF and want to enable speculative decoding with the matching QAT assistant head.

License and terms

This model is a converted derivative of Google's official Gemma 4 QAT assistant checkpoint.

Gemma 4 is released under the Apache 2.0 license. Users should also review the original Google model page and comply with any applicable terms associated with the source checkpoint:

google/gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant

GGUF metadata

  • Model size: 0.4B params
  • Architecture: gemma4-assistant