Gemma 4
This repository contains GGUF format quantizations for DavidAU's Gemma-4-E4B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking.
These GGUF files enable efficient local inference on CPUs, Apple Silicon, and VRAM-constrained GPUs using llama.cpp and compatible frontends.
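As an example, a typical way to run one of these files with llama.cpp's bundled CLI looks like the sketch below. The GGUF filename is a placeholder for whichever quant you download, and llama.cpp must already be built or installed:

```shell
# Interactive chat with llama.cpp's CLI.
# The model filename below is a placeholder -- substitute the quant you downloaded.
# -c sets the context size; -ngl offloads layers to the GPU when one is available.
llama-cli \
  -m ./Gemma-4-E4B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking-Q4_K_M.gguf \
  -c 4096 \
  -ngl 99 \
  -p "Hello"
```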
The base model is a specialized, heavily fine-tuned version of Google's Gemma 4 E4B (Effective 4B parameters).
This repository provides multiple levels of quantization to help you balance VRAM/RAM usage, generation speed, and model fidelity.
| Quant Type | Bits | Recommended Use |
|---|---|---|
| Q3_K_M | 3-bit | Ultra-low RAM usage. Noticeable perplexity degradation but runs on very constrained hardware. |
| Q4_K_M | 4-bit | Recommended. The sweet spot for local LLMs. Great balance of speed, low memory footprint, and quality. |
| Q5_K_M | 5-bit | Higher precision. Use this if you have the memory to spare and want slightly better reasoning. |
| Q6_K | 6-bit | Very high fidelity. Close to unquantized performance with a larger memory footprint. |
| Q8_0 | 8-bit | Extremely close to FP16 baseline. Best for users who want maximum precision and have ample RAM/VRAM. |
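As a rough rule of thumb (an assumption for illustration, not an official sizing table), a quantized model's file size is approximately `parameter_count × bits_per_weight / 8` bytes, plus overhead for embeddings and metadata. A minimal sketch estimating the footprint of each quant level for a ~4B-parameter model:

```python
# Rough GGUF size estimate: params * effective_bits / 8 bytes.
# The 4e9 parameter count and per-quant effective bit widths are assumptions
# for illustration; real K-quants mix bit widths across tensors, so actual
# files will differ somewhat.
PARAMS = 4e9

QUANT_BITS = {
    "Q3_K_M": 3.9,  # K-quants average slightly more than their nominal bits
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K":   6.6,
    "Q8_0":   8.5,
}

def estimate_gib(params: float, bits: float) -> float:
    """Approximate file size in GiB for a given effective bit width."""
    return params * bits / 8 / 1024**3

if __name__ == "__main__":
    for name, bits in QUANT_BITS.items():
        print(f"{name}: ~{estimate_gib(PARAMS, bits):.1f} GiB")
```

This makes the table's tradeoff concrete: each step up in bit width buys fidelity at the cost of roughly another 0.5 GiB of RAM/VRAM at this model size.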
Base model: `google/gemma-4-E4B-it`