Tags: Image-Text-to-Text · MLX · Safetensors · gemma4 · saber · abliteration · refusal-ablation · representation-engineering · conversational · 3-bit
How to use from Pi
Configure the model in Pi (start the local MLX server first; see "Start the MLX server" below):
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Gemma-4-E4B-SABER MLX Quantized
This repository contains an MLX quantized conversion of GestaltLabs/Gemma-4-E4B-SABER.
The model was converted with MLX-LM's Gemma 4 support and retains the source model's tokenizer, generation config, and chat template.
Variants
The repository name indicates the quantization level:
- GestaltLabs/Gemma-4-E4B-SABER-MLX-8bit: approximately 8.5 bits per weight, about 7.4 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-6bit: approximately 6.5 bits per weight, about 5.6 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit: approximately 4.5 bits per weight, about 3.9 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit: approximately 3.5 bits per weight, about 3.0 GiB of weights.
- GestaltLabs/Gemma-4-E4B-SABER-MLX-2bit: approximately 2.5 bits per weight, about 2.2 GiB of weights.
Recommended starting points:
- 8-bit or 6-bit for quality-sensitive use.
- 4-bit for a smaller general-purpose build.
- 3-bit and 2-bit for memory-constrained experiments.
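As a rough sizing rule, weight size ≈ parameter count × bits per weight / 8. A quick sketch of that arithmetic (the ~7.5B raw parameter count is an assumption inferred from the 8-bit variant's size above, not a figure from the source model):

# Rough weight-size estimate per variant.
# NOTE: params is an assumption back-calculated from the 8-bit size above.
params = 7.5e9
for bpw in (8.5, 6.5, 4.5, 3.5, 2.5):
    gib = params * bpw / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{bpw} bpw ≈ {gib:.1f} GiB")

Running this reproduces the sizes in the variant list, so the per-variant memory cost scales linearly with bits per weight.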
Usage
mlx_lm.generate \
--model GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit \
--prompt "Explain quantum computing in simple terms."
Replace the repo name with the desired quantized variant.
Use a recent MLX-LM release with Gemma 4 support.
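The same models can be used from Python via the mlx-lm API. A minimal sketch, assuming a recent mlx-lm release; the chat template bundled with the repo is applied when present:

from mlx_lm import load, generate

# Download (if needed) and load the quantized weights plus tokenizer/chat template.
model, tokenizer = load("GestaltLabs/Gemma-4-E4B-SABER-MLX-4bit")

prompt = "Explain quantum computing in simple terms."
if tokenizer.chat_template is not None:
    # Wrap the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)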
Source
- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
Conversion Details
- Source format: Hugging Face safetensors, BF16.
- Target format: MLX safetensors.
- Quantization mode: MLX affine quantization.
- Group size: 64.
- Quantized variants: 8-bit, 6-bit, 4-bit, 3-bit, 2-bit.
- Conversion tool: MLX-LM with Gemma 4 support.
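To reproduce a variant, the conversion can be driven from Python. An illustrative sketch, assuming mlx-lm's convert API (argument names may differ across releases); q_bits and q_group_size mirror the settings listed above:

from mlx_lm import convert

# Re-quantize the BF16 source weights with MLX affine quantization.
convert(
    hf_path="GestaltLabs/Gemma-4-E4B-SABER",
    mlx_path="Gemma-4-E4B-SABER-MLX-3bit",
    quantize=True,
    q_bits=3,
    q_group_size=64,
)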
Model tree for GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit
- google/gemma-4-E4B (base model)
- google/gemma-4-E4B-it (finetuned)
- GestaltLabs/Gemma-4-E4B-SABER (finetuned; source of this conversion)
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit"
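Once the server is running, a quick sanity check against the OpenAI-compatible endpoint; a sketch using the requests package, hitting the same /v1/chat/completions route the Pi config above points at:

import requests  # pip install requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "GestaltLabs/Gemma-4-E4B-SABER-MLX-3bit",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])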