# Gemma-4-E4B-SABER GGUF
This repository contains GGUF conversions of GestaltLabs/Gemma-4-E4B-SABER for llama.cpp-compatible runtimes.
The source model is a Gemma 4 E4B instruction model modified with the SABER/refusal-ablation workflow. These files preserve the source tokenizer and chat template metadata in GGUF form.
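The SABER specifics are not documented here, but refusal ablation in the representation-engineering literature is commonly implemented by projecting an estimated "refusal direction" out of weight matrices that write into the residual stream. A minimal NumPy sketch of that projection; all names and shapes are illustrative, not the actual SABER code:

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component along direction r from the outputs of W.

    W: (d_out, d_in) weight matrix writing into the residual stream.
    r: (d_out,) estimated refusal direction in the residual stream.
    """
    r_hat = r / np.linalg.norm(r)            # unit refusal direction
    return W - np.outer(r_hat, r_hat @ W)    # W' = (I - r_hat r_hat^T) W

# Toy check: after ablation, W can no longer write along r.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
x = rng.standard_normal(4)
print(abs((r / np.linalg.norm(r)) @ (W_abl @ x)))  # effectively zero
```

The direction `r` is typically estimated as the difference of mean activations between prompts the model refuses and prompts it answers; that estimation step is omitted here.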
## Files

| File | Quantization | Approx. size |
|---|---|---|
| Gemma-4-E4B-SABER-BF16.gguf | BF16 | 13.92 GiB |
| Gemma-4-E4B-SABER-Q8_0.gguf | Q8_0 | 7.43 GiB |
| Gemma-4-E4B-SABER-Q6_K.gguf | Q6_K | 5.75 GiB |
| Gemma-4-E4B-SABER-Q5_K_M.gguf | Q5_K_M | 5.33 GiB |
| Gemma-4-E4B-SABER-Q4_K_M.gguf | Q4_K_M | 4.94 GiB |
| Gemma-4-E4B-SABER-Q3_K_M.gguf | Q3_K_M | 4.49 GiB |
| Gemma-4-E4B-SABER-Q2_K.gguf | Q2_K | 4.08 GiB |
## Quantization Notes
The BF16 GGUF was converted from the original Hugging Face safetensors checkpoint using a local llama.cpp build with Gemma 4 support. The quantized GGUF files were produced from that BF16 GGUF using llama-quantize.
Recommended starting points:

- Q8_0: highest-quality quantized option.
- Q6_K: strong quality/size tradeoff.
- Q4_K_M: compact general-purpose option.
- Q2_K and Q3_K_M: smallest files, with larger quality tradeoffs.
No importance matrix was used.
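The pipeline described above corresponds roughly to the following commands, run from a llama.cpp checkout with Gemma 4 support; input paths are placeholders:

```shell
# 1) Hugging Face safetensors -> BF16 GGUF
python convert_hf_to_gguf.py /path/to/Gemma-4-E4B-SABER \
    --outtype bf16 \
    --outfile Gemma-4-E4B-SABER-BF16.gguf

# 2) BF16 GGUF -> quantized GGUF, one llama-quantize call per quant type
for q in Q8_0 Q6_K Q5_K_M Q4_K_M Q3_K_M Q2_K; do
    ./llama-quantize Gemma-4-E4B-SABER-BF16.gguf \
        "Gemma-4-E4B-SABER-${q}.gguf" "$q"
done
```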
## Example

```bash
llama-cli \
  -m Gemma-4-E4B-SABER-Q4_K_M.gguf \
  -p "Explain quantum computing in simple terms."
```

Use a current llama.cpp build with Gemma 4 support.
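For an OpenAI-compatible HTTP endpoint instead of the CLI, the same file works with llama.cpp's `llama-server`; a sketch (port and payload are illustrative):

```shell
# Serve the model over an OpenAI-compatible HTTP API
llama-server -m Gemma-4-E4B-SABER-Q4_K_M.gguf --port 8080

# Then, from another shell:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain quantum computing in simple terms."}]}'
```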
## Source

- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
## Conversion Details

- Source format: Hugging Face safetensors, BF16.
- GGUF converter: llama.cpp `convert_hf_to_gguf.py`.
- GGUF quantizer: llama.cpp `llama-quantize`.
- Quant types: BF16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K.
## Python (llama-cpp-python)

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="GestaltLabs/Gemma-4-E4B-SABER-GGUF",
    filename="Gemma-4-E4B-SABER-Q4_K_M.gguf",  # any file from the table above
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
```