# Gemma 4 E4B SABER
This is a SABER-edited derivative of google/gemma-4-E4B-it.
SABER is my take on abliteration: an entanglement-aware refusal-ablation method that reduces refusal behavior while limiting behavioral drift from the base model. I did not invent refusal ablation; this work builds on prior refusal-direction and community abliteration methods, including Arditi et al., Maxime Labonne / FailSpy-style abliteration recipes, Jim Lai's projected and norm-preserving ablation work, Pliny / OBLITERATUS, Heretic, Jiunsong's SuperGemma releases, and spectral-cleaning / surgical-refusal-ablation style work.
## Method Summary
SABER treats refusal ablation as a constrained multi-objective editing problem.
Instead of using a single refusal direction with one uniform ablation strength, SABER combines:
- multi-direction refusal subspaces extracted from candidate layers,
- FDR / separability-based ranking of directions and layers,
- capability-entanglement scoring,
- differential ablation strengths for refusal-dominant vs. capability-entangled directions,
- Pareto evaluation over refusal rate and behavioral drift.
The goal is not simply to remove refusals at all costs. The goal is to suppress refusal behavior while measuring and limiting drift from the original model.
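The differential-strength projection step can be sketched as follows. This is a minimal illustration, not the SABER implementation: the function name, shapes, and the hard entanglement threshold are assumptions; the default values mirror the hyperparameters of the released checkpoint.

```python
import numpy as np

def ablate_activations(h, directions, entanglement,
                       alpha_base=0.825, alpha_entangled=0.03, threshold=0.55):
    """Project refusal directions out of hidden states, ablating
    capability-entangled directions far more gently.

    h            : (batch, d_model) hidden states
    directions   : iterable of unit-norm refusal directions, each (d_model,)
    entanglement : per-direction capability-entanglement scores in [0, 1];
                   above `threshold` a direction is treated as entangled
    """
    h = np.asarray(h, dtype=np.float64).copy()
    for v, e in zip(directions, entanglement):
        v = np.asarray(v, dtype=np.float64)
        alpha = alpha_entangled if e > threshold else alpha_base
        # subtract alpha times the projection of h onto v
        h -= alpha * np.outer(h @ v, v)
    return h
```

With `alpha_base = 1.0` this reduces to full directional ablation; the released checkpoint instead uses a partial strength of 0.825 for refusal-dominant directions and a near-zero 0.03 for entangled ones.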
## Released Checkpoint
This checkpoint corresponds to the current aggressive Gemma 4 E4B SABER point:
- Run: gemma4_e4b_auto_svd_nh60_a825_g14
- Base: google/gemma-4-E4B-it
- Extraction: SVD
- Directions per layer: 4
- Layer strategy: top-k
- Global top-k: 14
- Alpha base: 0.825
- Alpha entangled: 0.03
- Entanglement threshold: 0.55
- Max iterations: 4
- Probe set: 49 harmful / 49 harmless / 30 capability prompts
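The hyperparameters above can be collected into a single config object. The key names below are illustrative, not the actual SABER config schema:

```python
# Hypothetical config mirroring the released checkpoint's hyperparameters;
# key names are illustrative and do not reflect SABER's real schema.
SABER_CONFIG = {
    "run": "gemma4_e4b_auto_svd_nh60_a825_g14",
    "base_model": "google/gemma-4-E4B-it",
    "extraction": "svd",
    "directions_per_layer": 4,
    "layer_strategy": "top_k",
    "global_top_k": 14,             # (layer, direction) pairs kept overall
    "alpha_base": 0.825,            # strength for refusal-dominant directions
    "alpha_entangled": 0.03,        # gentle strength for entangled directions
    "entanglement_threshold": 0.55,
    "max_iterations": 4,
    "probes": {"harmful": 49, "harmless": 49, "capability": 30},
}
```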
## Evaluation Snapshot
Measured against the local SABER evaluation harness:
| metric | value |
|---|---|
| keyword refusal rate | 0.00% |
| mean KLD vs base | 0.4327 |
| residual refusal proxy | 3.9394 |
| score used in local sweep | 0.0432 |
This is the most aggressive Pareto point in the current sweep: it eliminates keyword refusals on the local test set, at the cost of higher behavioral drift than the balanced SABER point.
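For readers unfamiliar with the two headline metrics, here is a minimal sketch of how they are typically computed. The keyword list is illustrative (the harness's actual list is not published here), and mean KLD vs base is the average of per-position KL divergences between the base and edited models' next-token distributions:

```python
import math

# Illustrative refusal keywords; an assumption, not the SABER harness list.
REFUSAL_KEYWORDS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def keyword_refusal_rate(completions):
    """Fraction of completions containing any refusal keyword (case-insensitive)."""
    hits = sum(any(k in c.lower() for k in REFUSAL_KEYWORDS) for c in completions)
    return hits / len(completions)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Averaging `kl_divergence(base_dist, edited_dist)` over sampled token positions yields a drift proxy comparable to the "mean KLD vs base" figure above.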
Balanced comparison point from the same sweep:
| run | keyword refusal | mean KLD |
|---|---|---|
| gemma4_e4b_auto_svd_nh60_a825_g14 | 0.00% | 0.4327 |
| gemma4_e4b_auto_svd_a825_g14 | 2.04% | 0.3164 |
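One way to read this table: neither run dominates the other, so both sit on the refusal/drift Pareto front. A minimal non-dominated filter, assuming lower is better on both axes (function name and tuple layout are my own):

```python
def pareto_front(points):
    """Keep non-dominated (name, refusal_rate, kld) points; lower is better on both axes."""
    front = []
    for name, r, k in points:
        dominated = any(
            r2 <= r and k2 <= k and (r2 < r or k2 < k)
            for _, r2, k2 in points
        )
        if not dominated:
            front.append((name, r, k))
    return front
```

Feeding in the two sweep points above returns both: the aggressive run wins on refusal rate, the balanced run wins on KLD.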
## Key Finding
Increasing the refusal/harmless probe count changed the FDR-selected layer pattern. The larger probe set eliminated refusal in the local eval, but increased KLD. This suggests probe count is a real hyperparameter for refusal/KLD tradeoffs, not just an implementation detail.
## Intended Use
This checkpoint is intended for research into representation editing, refusal mechanisms, and behavior-preserving model editing. It is not a safety system and not a guarantee of harmless behavior.
## Limitations
- Evaluation is local and limited; broader benchmark and qualitative testing are still needed.
- Keyword refusal rate is not the same as a full safety evaluation.
- Lower refusal can increase dual-use risk.
- KLD is a proxy for behavioral drift, not a complete measure of capability preservation.
- This is an aggressive point on the frontier; users who prefer lower drift may prefer the balanced SABER point instead.
## Provenance and Credit
This work belongs in the abliteration lineage. It was inspired by earlier methods and community experimentation showing that refusal behavior can be edited in activation space. In particular, Jiunsong's SuperGemma work was an important inspiration because it showed that abliteration in the Gemma family could improve practical behavior rather than merely removing refusals.
SABER's contribution is the specific combination of separability-ranked multi-direction extraction, capability-entanglement-aware scaling, and Pareto-style refusal/drift evaluation.