Tags: Image-Text-to-Text · GGUF · llama.cpp · gemma4 · saber · abliteration · refusal-ablation · representation-engineering · conversational
How to use from Pi

Configure the model in Pi

```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the model to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Gemma-4-E4B-SABER-GGUF"
        }
      ]
    }
  }
}
```

Run Pi

```shell
# Start Pi in your project directory:
pi
```
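If you manage several local models, the `models.json` entry above can also be generated programmatically. A minimal sketch in Python; the provider name, base URL, and model id simply mirror the example config and should be adjusted if your server listens elsewhere:

```python
import json

# Mirror of the ~/.pi/agent/models.json fragment shown above.
# All values are taken from that example; change baseUrl if your
# llama.cpp server runs on a different host or port.
config = {
    "providers": {
        "llama-cpp": {
            "baseUrl": "http://localhost:8080/v1",
            "api": "openai-completions",
            "apiKey": "none",
            "models": [{"id": "Gemma-4-E4B-SABER-GGUF"}],
        }
    }
}

# Serialize with indentation so the file stays hand-editable.
text = json.dumps(config, indent=2)
print(text)
```

Writing `text` to `~/.pi/agent/models.json` yields the same configuration as the hand-written example.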
Gemma-4-E4B-SABER GGUF
This repository contains GGUF conversions of GestaltLabs/Gemma-4-E4B-SABER for llama.cpp-compatible runtimes.
The source model is a Gemma 4 E4B instruction model modified with the SABER/refusal-ablation workflow. These files preserve the source tokenizer and chat template metadata in GGUF form.
Files
| File | Quantization | Approx. size |
|---|---|---|
| Gemma-4-E4B-SABER-BF16.gguf | BF16 | 13.92 GiB |
| Gemma-4-E4B-SABER-Q8_0.gguf | Q8_0 | 7.43 GiB |
| Gemma-4-E4B-SABER-Q6_K.gguf | Q6_K | 5.75 GiB |
| Gemma-4-E4B-SABER-Q5_K_M.gguf | Q5_K_M | 5.33 GiB |
| Gemma-4-E4B-SABER-Q4_K_M.gguf | Q4_K_M | 4.94 GiB |
| Gemma-4-E4B-SABER-Q3_K_M.gguf | Q3_K_M | 4.49 GiB |
| Gemma-4-E4B-SABER-Q2_K.gguf | Q2_K | 4.08 GiB |
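The table implies rough compression ratios relative to the BF16 file; a quick arithmetic check using the sizes listed above:

```python
# Approximate file sizes in GiB, copied from the table above.
sizes_gib = {
    "BF16": 13.92,
    "Q8_0": 7.43,
    "Q6_K": 5.75,
    "Q5_K_M": 5.33,
    "Q4_K_M": 4.94,
    "Q3_K_M": 4.49,
    "Q2_K": 4.08,
}

# Each quant's size as a fraction of the BF16 file.
ratios = {q: round(s / sizes_gib["BF16"], 2) for q, s in sizes_gib.items()}
for quant, ratio in ratios.items():
    print(f"{quant}: {ratio:.2f}x of BF16")
```

So Q8_0 is roughly half the BF16 size and Q4_K_M a bit over a third, which matches the starting-point recommendations below.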
Quantization Notes
The BF16 GGUF was converted from the original Hugging Face safetensors checkpoint using a local llama.cpp build with Gemma 4 support. The quantized GGUF files were produced from that BF16 GGUF using llama-quantize.
Recommended starting points:

- Q8_0: highest-quality quantized option.
- Q6_K: strong quality/size tradeoff.
- Q4_K_M: compact general-purpose option.
- Q2_K and Q3_K_M: smallest files, with larger quality tradeoffs.
No importance matrix was used.
Example
```shell
llama-cli \
  -m Gemma-4-E4B-SABER-Q4_K_M.gguf \
  -p "Explain quantum computing in simple terms."
```
Use a current llama.cpp build with Gemma 4 support.
Source
- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
Conversion Details
- Source format: Hugging Face safetensors, BF16.
- GGUF converter: llama.cpp convert_hf_to_gguf.py.
- GGUF quantizer: llama.cpp llama-quantize.
- Quant types: BF16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K.
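The conversion pipeline above can be sketched as shell commands. The paths are placeholders and the flags follow current llama.cpp tooling, so check your local build's `--help` output before running:

```shell
# Convert the Hugging Face safetensors checkpoint to a BF16 GGUF
# (convert_hf_to_gguf.py ships with llama.cpp):
python convert_hf_to_gguf.py /path/to/Gemma-4-E4B-SABER \
  --outtype bf16 \
  --outfile Gemma-4-E4B-SABER-BF16.gguf

# Produce a quantized GGUF from the BF16 file:
llama-quantize Gemma-4-E4B-SABER-BF16.gguf Gemma-4-E4B-SABER-Q4_K_M.gguf Q4_K_M
```

Repeating the `llama-quantize` step with the other quant type names yields the rest of the files in this repository.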
Model tree for GestaltLabs/Gemma-4-E4B-SABER-GGUF

- Base model: google/gemma-4-E4B
- Finetuned: google/gemma-4-E4B-it
- Finetuned: GestaltLabs/Gemma-4-E4B-SABER
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf GestaltLabs/Gemma-4-E4B-SABER-GGUF:
```
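Once llama-server is running, any OpenAI-compatible client can talk to it. A minimal sketch of the request payload in Python; the endpoint and model id are taken from the Pi configuration earlier in this card, and the actual request only succeeds while the server is up, so it is left commented out:

```python
import json
import urllib.request

# Endpoint and model id mirror the Pi configuration above.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "Gemma-4-E4B-SABER-GGUF",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
}

# Build the OpenAI-style chat completion request.
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    url, data=body, headers={"Content-Type": "application/json"}
)

# Uncomment once llama-server is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```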