These are miscellaneous GGUF quantizations of Google's instruct-tuned Gemma 4 series of models.

For more information about Gemma, refer to the original model cards.

The chat template baked into these GGUFs is technically outdated. However, inference in llama.cpp should still work correctly, thanks to these fixes:

llama.cpp#21704: common : better align to the updated official gemma4 template
llama.cpp#21760: common/gemma4 : handle parsing edge cases
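
As a rough illustration of running one of these files with the baked-in template, here is a minimal sketch that assembles a `llama-server` command line. It assumes a llama.cpp build recent enough to include both fixes listed above. The helper name `build_llama_server_cmd` and the file name `gemma-4-it.gguf` are placeholders for illustration, not names taken from this repository. The optional `--chat-template-file` override is shown only as a possibility; the fixes above are what make the embedded template usable as-is.

```python
import shlex
from typing import List, Optional


def build_llama_server_cmd(model_path: str,
                           template_override: Optional[str] = None) -> List[str]:
    """Assemble an argument list for llama.cpp's llama-server (hypothetical helper)."""
    # --jinja asks llama.cpp to use the Jinja chat template stored in the GGUF.
    cmd = ["llama-server", "-m", model_path, "--jinja"]
    if template_override is not None:
        # Optional: point llama.cpp at a different template file instead.
        cmd += ["--chat-template-file", template_override]
    return cmd


if __name__ == "__main__":
    # Placeholder file name; substitute the quantization you actually downloaded.
    print(shlex.join(build_llama_server_cmd("gemma-4-it.gguf")))
```

The resulting list can be passed directly to `subprocess.run` if you prefer to launch the server from Python.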

GGUF
Model size: 25B params
Architecture: gemma4

Model tree for ddh0/gemma-4-it-GGUF
Base model: google/gemma-4-26B-A4B-it
Quantized (91): this model