# Gemma-4-E4B-SABER MLX Quantized
This repository contains an MLX quantized conversion of GestaltLabs/Gemma-4-E4B-SABER.
The model was converted using MLX-LM's Gemma 4 model support and retains the source tokenizer, generation config, and chat template.
## Variant
See the repository name for the quantization level:
- GestaltLabs/Gemma-4-E4B-SABER-MLX-8bit: approximately 8.5 bits per weight, about 7.4 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-6bit: approximately 6.5 bits per weight, about 5.6 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit: approximately 4.5 bits per weight, about 3.9 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit: approximately 3.5 bits per weight, about 3.0 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-2bit: approximately 2.5 bits per weight, about 2.2 GiB of weights.
Recommended starting points:
- 8-bit or 6-bit for quality-sensitive use.
- 4-bit for a smaller general-purpose build.
- 3-bit and 2-bit for memory-constrained experiments.
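The sizes listed above follow roughly from bits-per-weight times parameter count. A minimal sketch of that arithmetic (the ~7B quantized-parameter count is an assumption for illustration, not an exact figure; actual sizes also depend on which layers are kept at higher precision):

```python
def estimated_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size of quantized weights in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

# Assuming roughly 7e9 quantized parameters (illustrative only):
for bpw in (8.5, 6.5, 4.5, 3.5, 2.5):
    print(f"{bpw} bpw -> ~{estimated_size_gib(7e9, bpw):.1f} GiB")
```

The effective bits per weight (e.g. 4.5 rather than 4.0) are higher than the nominal bit width because per-group quantization metadata is stored alongside the packed weights.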
## Usage
```shell
mlx_lm.generate \
  --model GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit \
  --prompt "Explain quantum computing in simple terms."
```
Replace the repo name with the desired quantized variant.
Use a recent MLX-LM release with Gemma 4 support.
## Source
- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
## Conversion Details
- Source format: Hugging Face safetensors, BF16.
- Target format: MLX safetensors.
- Quantization mode: MLX affine quantization.
- Group size: 64.
- Quantized variants: 8-bit, 6-bit, 4-bit, 3-bit, 2-bit.
- Conversion tool: MLX-LM with Gemma 4 support.
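Affine quantization with a group size of 64 stores, for each group of 64 consecutive weights, a scale and an offset, and maps each weight to a small integer. A minimal pure-Python sketch of the idea (illustrative only, not the actual MLX kernel):

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of weights to `bits`-bit codes.

    Returns (codes, scale, zero_point) such that each original
    weight is approximately code * scale + zero_point.
    """
    levels = 2**bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # guard against all-equal groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize_group(codes, scale, zero_point):
    return [c * scale + zero_point for c in codes]

# Example: one group of 64 weights, 4-bit quantization.
group = [i / 63 - 0.5 for i in range(64)]  # values spanning [-0.5, 0.5]
codes, scale, zp = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, zp)
# Round-trip error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(group, restored))
assert max_err <= scale / 2 + 1e-9
```

The per-group scale and offset are what push the effective cost above the nominal bit width: two 16-bit values amortized over 64 weights add about 0.5 bits per weight, matching the approximate figures in the variant list.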