Tags: Image-Text-to-Text · MLX · Safetensors · gemma4 · saber · abliteration · refusal-ablation · representation-engineering · conversational · 3-bit
How to use from Pi
Configure the model in Pi (start the local MLX server first; see "Start the MLX server" below):
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Gemma-4-E4B-SABER MLX Quantized
This repository contains an MLX quantized conversion of GestaltLabs/Gemma-4-E4B-SABER.
The model was converted with MLX-LM's Gemma 4 support and retains the source model's tokenizer, generation config, and chat template.
Variants
The repository name indicates the quantization level:
- GestaltLabs/Gemma-4-E4B-SABER-MLX-8bit: approximately 8.5 bits per weight, about 7.4 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-6bit: approximately 6.5 bits per weight, about 5.6 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit: approximately 4.5 bits per weight, about 3.9 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit: approximately 3.5 bits per weight, about 3.0 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-2bit: approximately 2.5 bits per weight, about 2.2 GiB of weights.
Recommended starting points:
- 8-bit or 6-bit for quality-sensitive use.
- 4-bit for a smaller general-purpose build.
- 3-bit and 2-bit for memory-constrained experiments.
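As a rough sizing rule, weight size ≈ parameter count × bits per weight / 8. A quick sketch of that arithmetic (the ~7.5B raw parameter count is an assumption inferred from the 8-bit variant's size above, not a figure from the source model):

# Rough weight-size estimate per variant.
# NOTE: params is an assumption back-calculated from the 8-bit size above.
params = 7.5e9
for bpw in (8.5, 6.5, 4.5, 3.5, 2.5):
    gib = params * bpw / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bpw} bpw ≈ {gib:.1f} GiB")

Running this reproduces the sizes in the variant list, so the per-variant memory cost scales linearly with bits per weight.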
Usage
mlx_lm.generate \
--model GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit \
--prompt "Explain quantum computing in simple terms."
Replace the repo name with the desired quantized variant.
Use a recent MLX-LM release with Gemma 4 support.
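The same models can be used from Python via the mlx-lm API. A minimal sketch, assuming a recent mlx-lm release; the chat template bundled with the repo is applied when present:

from mlx_lm import load, generate

# Download (if needed) and load the quantized weights plus tokenizer/chat template.
model, tokenizer = load("GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit")

prompt = "Explain quantum computing in simple terms."
if tokenizer.chat_template is not None:
    # Wrap the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)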
Source
- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
Conversion Details
- Source format: Hugging Face safetensors, BF16.
- Target format: MLX safetensors.
- Quantization mode: MLX affine quantization.
- Group size: 64.
- Quantized variants: 8-bit, 6-bit, 4-bit, 3-bit, 2-bit.
- Conversion tool: MLX-LM with Gemma 4 support.
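To reproduce a variant, the conversion can be driven from Python. An illustrative sketch, assuming mlx-lm's convert API (argument names may differ across releases); q_bits and q_group_size mirror the settings listed above:

from mlx_lm import convert

# Re-quantize the BF16 source weights with MLX affine quantization.
convert(
    hf_path="GestaltLabs/Gemma-4-E4B-SABER",
    mlx_path="Gemma-4-E4B-SABER-MLX-3bit",
    quantize=True,
    q_bits=3,
    q_group_size=64,
)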
Model tree for GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit
- google/gemma-4-E4B (base model)
- google/gemma-4-E4B-it (finetuned)
- GestaltLabs/Gemma-4-E4B-SABER (finetuned; source of this conversion)
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit"
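Once the server is running, a quick sanity check against the OpenAI-compatible endpoint; a sketch using the requests package, hitting the same /v1/chat/completions route the Pi config above points at:

import requests  # pip install requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])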