Gemma-4-E4B-SABER MLX Quantized

This repository contains an MLX-quantized conversion of GestaltLabs/Gemma-4-E4B-SABER.

The model was converted with MLX-LM's Gemma 4 support and retains the source model's tokenizer, generation config, and chat template.

Variants

The repository name indicates the quantization level (a size-estimate sketch follows the list):

  • GestaltLabs/Gemma-4-E4B-SABER-MLX-8bit: approximately 8.5 bits per weight, about 7.4 GiB of weights.
  • GestaltLabs/Gemma-4-E4B-SABER-MLX-6bit: approximately 6.5 bits per weight, about 5.6 GiB of weights.
  • GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit: approximately 4.5 bits per weight, about 3.9 GiB of weights.
  • GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit: approximately 3.5 bits per weight, about 3.0 GiB of weights.
  • GestaltLabs/Gemma-4-E4B-SABER-MLX-2bit: approximately 2.5 bits per weight, about 2.2 GiB of weights.
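
The effective bits per weight exceed the nominal bit width because, with affine quantization at group size 64 (see Conversion Details), each group of 64 weights also stores a 16-bit scale and a 16-bit bias, adding 32/64 = 0.5 bits per weight. A minimal sketch of the arithmetic, assuming roughly 7.5B quantized parameters (a hypothetical figure, chosen because it approximately reproduces the sizes listed above):

# Rough weight-size estimate: nominal bits plus 0.5 bits/weight of
# group-64 scale/bias overhead, times an assumed parameter count.
PARAMS = 7.5e9  # assumption, not an official count

for nominal in (8, 6, 4, 3, 2):
    bpw = nominal + 0.5          # affine quantization, group size 64
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{nominal}-bit: ~{bpw} bits/weight, ~{gib:.1f} GiB")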

Recommended starting points:

  • 8-bit or 6-bit for quality-sensitive use.
  • 4-bit for a smaller general-purpose build.
  • 3-bit and 2-bit for memory-constrained experiments.

Usage

mlx_lm.generate \
  --model GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit \
  --prompt "Explain quantum computing in simple terms."

Replace the repository name with the desired quantized variant.

Use a recent MLX-LM release with Gemma 4 support.
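
For use from Python rather than the CLI, a minimal sketch with the mlx_lm API (the variant and generation settings here are illustrative; the preserved chat template is applied before generation):

from mlx_lm import load, generate

# Load any of the quantized variants by repository name.
model, tokenizer = load("GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit")

# The conversion keeps the source chat template, so format the
# user message through it before generating.
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))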

Source

  • Source model: GestaltLabs/Gemma-4-E4B-SABER
  • Base model: google/gemma-4-E4B-it
  • License: Gemma license; see the source model and base model for the full license terms.

Conversion Details

  • Source format: Hugging Face safetensors, BF16.
  • Target format: MLX safetensors.
  • Quantization mode: MLX affine quantization.
  • Group size: 64.
  • Quantized variants: 8-bit, 6-bit, 4-bit, 3-bit, 2-bit.
  • Conversion tool: MLX-LM with Gemma 4 support (a conversion sketch follows this list).
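
A minimal sketch of how one variant could be reproduced with the mlx_lm Python API; the output path is illustrative, and exact keyword names may vary across MLX-LM versions:

from mlx_lm import convert

# Quantize the BF16 source weights with affine quantization at
# group size 64; repeat with q_bits in {8, 6, 4, 3, 2} to produce
# each listed variant.
convert(
    hf_path="GestaltLabs/Gemma-4-E4B-SABER",
    mlx_path="Gemma-4-E4B-SABER-MLX-4bit",  # illustrative local output path
    quantize=True,
    q_bits=4,
    q_group_size=64,
)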