Use from the llama-cpp-python library:
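# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="ZeZZm/aero-deuce-GGUF",
	filename="aero-deuce-q4km.gguf",
	n_ctx=4096,  # match the model's 4,096-token context length
)

response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

# The reply text is in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])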

Aero-Deuce - GGUF Q4_K_M

A fine-tuned Gemma 4 12B instruction-following model. This is the GGUF quantized version (~7 GB) that runs locally on CPU or GPU with no Python required.

Download

Click the Files and versions tab above and download aero-deuce-q4km.gguf. That's the only file you need.
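
For a scripted download, the same file can be fetched from Python. The sketch below uses only the standard library and assumes the repository is public and the file sits on the `main` branch, following the Hugging Face `resolve/main` URL pattern. `resolve_url` and `download_gguf` are illustrative helper names, not part of any library.

```python
import shutil
import urllib.request
from pathlib import Path

REPO_ID = "ZeZZm/aero-deuce-GGUF"
FILENAME = "aero-deuce-q4km.gguf"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_gguf(dest_dir: str = ".") -> Path:
    """Download the GGUF file into dest_dir, skipping work if it is already there."""
    target = Path(dest_dir) / FILENAME
    if target.exists():
        return target

    # Write to a temporary ".part" file first so an interrupted download
    # is never mistaken for a complete model file.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(resolve_url(REPO_ID, FILENAME)) as response:
        with open(partial, "wb") as out:
            # Copy in chunks; the file is about 7 GB, so never hold it in memory.
            shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

Calling `download_gguf()` returns the local path of the model file, which can then be passed to any of the tools in the Quick Start section.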

Which format should I use?

| Format | Best for | Link |
|---|---|---|
| GGUF ← you are here | Local inference, llama.cpp, LM Studio, GPT4All | This repo |
| MLX 4-bit | Apple Silicon (Mac) | ZeZZm/aero-deuce-MLX |
| LoRA Adapter | Merging with base model, further fine-tuning | ZeZZm/aero-deuce |

Quick Start

LM Studio (easiest, GUI app):

  1. Download LM Studio
  2. Search for ZeZZm/aero-deuce-GGUF
  3. Click download, then chat

llama.cpp:

# Download
wget https://huggingface.co/ZeZZm/aero-deuce-GGUF/resolve/main/aero-deuce-q4km.gguf

# Run
llama-cli -m aero-deuce-q4km.gguf -c 4096 --conversation

Ollama:

# After downloading the GGUF file:
echo 'FROM ./aero-deuce-q4km.gguf
SYSTEM "You are Aero-Deuce, developed by the Aero-Deuce team."
PARAMETER stop "<|end_of_turn>"
PARAMETER stop "<|start_of_turn>"' > Modelfile

ollama create aero-deuce -f Modelfile
ollama run aero-deuce
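
Once `ollama create` has registered the model, it can also be queried from a script. The sketch below assumes an Ollama server is running locally on its default port (11434) and calls the `/api/chat` endpoint with streaming disabled. `build_chat_payload` and `chat` are hypothetical helper names used only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt: str, model: str = "aero-deuce") -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }


def chat(prompt: str, model: str = "aero-deuce") -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

For example, `chat("What is the capital of France?")` sends one user message to the `aero-deuce` model created above and returns the assistant's reply as a string.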

GPT4All:

  1. Download GPT4All
  2. File → Open → select aero-deuce-q4km.gguf
  3. Start chatting

Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-4-12b-it (12B params) |
| Training Method | QLoRA + Muon optimizer |
| Training Data | 30K instruction-following samples |
| Training Steps | 2,000 |
| Quantization | Q4_K_M (~4.95 bits per weight) |
| File Size | ~7 GB |
| Context Length | 4,096 tokens |
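
The ~7 GB file size follows from the parameter count and the bits per weight listed above. A rough check, assuming the nominal 12 billion parameters and a uniform 4.95 bits per weight (real GGUF files mix tensor precisions and carry metadata, so the actual size differs somewhat):

```python
def estimated_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size: parameters x bits per weight, converted to bytes."""
    return n_params * bits_per_weight / 8


size = estimated_size_bytes(12e9, 4.95)
print(f"{size / 1e9:.1f} GB")     # 7.4 GB (decimal gigabytes)
print(f"{size / 2**30:.1f} GiB")  # 6.9 GiB (binary gibibytes)
```

Either figure rounds to about 7 GB, which is also a reasonable lower bound on the RAM or VRAM needed to load the weights, before the context cache is counted.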

System Prompt

A system prompt identifying the model as Aero-Deuce is embedded in the GGUF chat template. It works automatically in most frontends. For llama-cli, pass -sys "You are Aero-Deuce." for best results.
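
To start llama-cli from a script with the system prompt applied, the flags mentioned in this README can be assembled as shown below. This is a minimal sketch that assumes `llama-cli` is on the `PATH` and the GGUF file is in the working directory. `build_llama_cli_command` and `run_chat` are illustrative helpers, not part of llama.cpp.

```python
import subprocess


def build_llama_cli_command(
    model_path: str = "aero-deuce-q4km.gguf",
    context_length: int = 4096,
    system_prompt: str = "You are Aero-Deuce.",
) -> list[str]:
    """Assemble the llama-cli arguments used in this README as an argv list."""
    return [
        "llama-cli",
        "-m", model_path,            # path to the GGUF file
        "-c", str(context_length),   # context window in tokens
        "--conversation",            # interactive chat mode
        "-sys", system_prompt,       # explicit system prompt
    ]


def run_chat() -> None:
    """Launch an interactive llama-cli session in the current terminal."""
    # Passing a list (not a shell string) avoids quoting issues in the prompt.
    subprocess.run(build_llama_cli_command(), check=True)
```

Calling `run_chat()` opens the same interactive session as the llama.cpp command in Quick Start, with the system prompt passed explicitly.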

License

Apache 2.0
