Instructions to use Alfaxad/wild-gemma-4-E4B-it-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Alfaxad/wild-gemma-4-E4B-it-GGUF",
	filename="wild-gemma-4-E4B-it.Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Alfaxad/wild-gemma-4-E4B-it-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Alfaxad/wild-gemma-4-E4B-it-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Ollama
How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with Ollama:
```
ollama run hf.co/Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
```

Unsloth Studio

How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Alfaxad/wild-gemma-4-E4B-it-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Alfaxad/wild-gemma-4-E4B-it-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Alfaxad/wild-gemma-4-E4B-it-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with Docker Model Runner:
```
docker model run hf.co/Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M
```

Lemonade

How to use Alfaxad/wild-gemma-4-E4B-it-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Alfaxad/wild-gemma-4-E4B-it-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.wild-gemma-4-E4B-it-GGUF-Q4_K_M

List all available models

lemonade list

Wild Gemma 4 E4B IT GGUF / Ollama

This repository contains the working Ollama-compatible GGUF export for Alfaxad/wild-gemma-4-E4B-it, the Savanna Sentinel fine-tune of Gemma 4 E4B IT.

The final model file is:

wild-gemma-4-E4B-it.Q4_K_M.gguf

It is a single combined Gemma 4 GGUF containing the language model tensors plus the multimodal vision/projector tensors. The first split text-GGUF plus mmproj export path did not load correctly for this custom Gemma 4 model in Ollama during validation, so the final artifact was rebuilt as a combined GGUF using the official Gemma 4 metadata layout and then smoke-tested with image+text prompts.

What This Model Does

Wild Gemma 4 E4B IT is specialized for Savanna Sentinel camera-trap workflows:

Classify Serengeti camera-trap events as blank or non-blank
Identify likely species from one to three frame bursts
Return structured JSON for event interpretation
Route uncertain events for review
Support tool-agent/report-generation style JSON tasks

This GGUF is intended for local Ollama inference and deployment testing. For the full merged Transformers model, use Alfaxad/wild-gemma-4-E4B-it.

File Details

Field	Value
Architecture	Gemma 4
Quantization	Q4_K_M
Context length	131,072
GGUF size	6,325,644,864 bytes
Modalities evaluated	Image + text
Audio evaluated	No
Base model	`Alfaxad/wild-gemma-4-E4B-it`

Ollama Usage

The published Ollama model is intended to be used as:

ollama run alfaxad/wild-gemma4:e4b

For local creation from this repository:

hf download Alfaxad/wild-gemma-4-E4B-it-GGUF wild-gemma-4-E4B-it.Q4_K_M.gguf
cat > Modelfile <<'EOF'
FROM ./wild-gemma-4-E4B-it.Q4_K_M.gguf
RENDERER gemma4
PARSER gemma4
PARAMETER temperature 1
PARAMETER top_p 0.95
PARAMETER top_k 64
SYSTEM "You are Savanna Sentinel. Return only valid JSON."
EOF
ollama create wild-gemma4:e4b -f Modelfile
ollama run wild-gemma4:e4b

The model is configured to follow the Gemma 4/Ollama defaults used in evaluation:

temperature = 1.0
top_p = 0.95
top_k = 64

For multimodal prompts, place images before text. This matches the Gemma 4 and Ollama guidance used during validation.

Thinking Mode

Gemma 4 supports thinking mode. In Ollama, enable thinking through the runtime support or by starting the system prompt with:

<|think|>

For non-thinking schema production, omit that token and request strict JSON. When thinking is enabled, strip thought-channel content and validate only the final JSON. Do not put prior thought content into multi-turn history.

Evaluation Snapshot

These are diagnostic evals from the corrected Ollama/GGUF export:

Mode	Rows	JSON valid	Species exact	Species overlap	Blank correct	Review correct
Non-thinking	40	0.725	0.364	0.364	0.889	1.000
Thinking	24	0.792	0.500	0.500	1.000	1.000

The metrics are useful for regression checks and export validation, not as a final scientific benchmark. Full metrics and predictions are included under metrics/.

Metrics Files

metrics/combined_gguf_officialmeta_status.json
metrics/combined_gguf_status.json
metrics/manual_lora_gguf_status.json
metrics/evaluation_ollama_manual_combined_q4_officialmeta_redo.json
metrics/predictions_ollama_manual_combined_q4_officialmeta_redo_direct.jsonl
metrics/predictions_ollama_manual_combined_q4_officialmeta_redo_thinking.jsonl

Prompting Pattern

Use strict, schema-first prompts:

You are Savanna Sentinel. Return only valid JSON.

Classify this Serengeti camera-trap capture event. Use the image burst first, then the metadata. Return JSON matching savanna_sentinel_event_v1.

Example target shape:

{
  "schema_version": "savanna_sentinel_event_v1",
  "capture_event_id": "ASG...",
  "blank": false,
  "detections": [
    {
      "species": "zebra",
      "count_bin": "3",
      "behaviors": {
        "standing": false,
        "resting": false,
        "moving": true,
        "eating": false,
        "interacting": false
      },
      "young_present": false,
      "confidence": "high",
      "evidence": {
        "visual_basis": "Striped equids visible across the image burst.",
        "frames_used": [1, 2, 3]
      }
    }
  ],
  "review": {
    "review_needed": false,
    "reasons": []
  }
}

Limitations

The Q4_K_M export is smaller and faster than the merged HF model, but quantization can change behavior.
JSON validity is not guaranteed; callers should parse and validate outputs.
Tool/report tasks remain weaker than the core event/review tasks in the diagnostic evals.
Audio support from Gemma 4 E4B was not evaluated in this Savanna Sentinel export.
This model is specialized for Snapshot Serengeti-style camera-trap data and should be validated before use on other regions or camera systems.