gemma-e4b-firefly

A LoRA fine-tune of Gemma 4 E4B (7.52B dense) for C/C++ vulnerability classification. Given one C or C++ function, it returns a JSON object with a binary label (clean | vulnerable) and a list of CWE identifiers.

This model is not a reasoner. Disable the Gemma 4 thinking channel (chat_template_kwargs.enable_thinking=false); otherwise the JSON is absorbed into a <think> block and the visible response comes back empty.

Prompt format

System prompt (copy verbatim):

You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly "clean" or "vulnerable".

User message:

Project: <project-name>
Language: C/C++
Determine whether this function is vulnerable.

```c
<function source>
```

Expected response:

```json
{"label":"vulnerable","cwe_ids":["CWE-125"]}
```

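Downstream code should confirm a response actually matches this schema before trusting it. A minimal validator (the helper name is illustrative, not part of the model's tooling):

```python
import json
import re

def validate_response(text: str) -> dict:
    """Parse a model response and sanity-check it against the expected schema."""
    obj = json.loads(text)
    assert obj.get("label") in ("clean", "vulnerable"), "label must be clean|vulnerable"
    ids = obj.get("cwe_ids")
    assert isinstance(ids, list), "cwe_ids must be a list"
    assert all(re.fullmatch(r"CWE-\d+", c) for c in ids), "malformed CWE id"
    return obj
```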
Inference — llama.cpp

```bash
llama-server -m gemma-e4b-firefly-q4_k_m.gguf \
    --host 127.0.0.1 --port 8080 \
    -c 4096 --temp 0 --top-k 1 --top-p 1 -n 256
```

```bash
curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly \"clean\" or \"vulnerable\"."},
            {"role": "user", "content": "Project: core\nLanguage: C/C++\nDetermine whether this function is vulnerable.\n\n```c\n<paste function here>\n```"}
        ],
        "temperature": 0.0,
        "max_tokens": 128,
        "chat_template_kwargs": {"enable_thinking": false}
    }'
```
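Rather than hand-escaping quotes inside the curl body, the same request can be assembled programmatically. A sketch (payload fields copied from the curl example; `urllib` is just one client choice, and the function names are illustrative):

```python
import json
import urllib.request

SYSTEM = ('You are a security reviewer. Return JSON only with keys label and '
          'cwe_ids. The label field must be exactly "clean" or "vulnerable".')

def build_request(func_src: str, project: str = "core") -> bytes:
    """Assemble the chat-completions payload for one C/C++ function."""
    user = (f"Project: {project}\nLanguage: C/C++\n"
            "Determine whether this function is vulnerable.\n\n"
            f"```c\n{func_src}\n```")
    payload = {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user},
        ],
        "temperature": 0.0,
        "max_tokens": 128,
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return json.dumps(payload).encode()

# With llama-server running on the port shown above:
# req = urllib.request.Request("http://127.0.0.1:8080/v1/chat/completions",
#                              data=build_request("int f(void) { return 0; }"),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```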

Inference — MLX (Apple Silicon)

```python
import json
from mlx_lm import load, generate

model, tokenizer = load("trevon/gemma-e4b-firefly/mlx/gemma-e4b-firefly-4bit")

messages = [
    {"role": "system", "content": "You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly \"clean\" or \"vulnerable\"."},
    {"role": "user", "content": "Project: core\nLanguage: C/C++\nDetermine whether this function is vulnerable.\n\n```c\n<function source>\n```"},
]

prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
    enable_thinking=False,
)
out = generate(model, tokenizer, prompt=prompt, max_tokens=128)
# Model may wrap JSON in a ```json fence. Strip before json.loads if present.
print(out)
```
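The fence-stripping mentioned in the comment can be a small helper (the name is illustrative):

```python
import json

def parse_output(out: str) -> dict:
    """Strip an optional ```json fence from a model response, then parse it."""
    text = out.strip()
    if text.startswith("```"):
        lines = text.splitlines()[1:]          # drop the opening ```json line
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]                 # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)

print(parse_output('```json\n{"label":"clean","cwe_ids":[]}\n```'))
# → {'label': 'clean', 'cwe_ids': []}
```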

Evaluation

Six held-out gates from an internal C/C++ vulnerability benchmark, greedy decode at T=0.0.

| gate | tuned label_acc | Δ vs base | tuned CWE top-1 |
|------|-----------------|-----------|-----------------|
| source_50_a | 0.680 | +0.04 | 0.103 |
| source_50_b | 0.660 | +0.10 | 0.069 |
| source_50_c | 0.700 | +0.08 | 0.000 |
| source_200_a | 0.635 | +0.06 | 0.043 |
| source_200_b | 0.605 | +0.07 | 0.026 |
| source_200_c | 0.665 | +0.075 | 0.026 |

Mean 200-row Δ: +0.068. No parse failures, no empty labels.
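Both metrics can be recomputed from (prediction, gold) pairs. A sketch under the usual definitions (that CWE top-1 is scored only over gold rows carrying CWE ids is an assumption, not stated by the benchmark):

```python
def label_accuracy(preds, golds):
    """Fraction of rows where the binary label matches the gold label."""
    return sum(p["label"] == g["label"] for p, g in zip(preds, golds)) / len(golds)

def cwe_top1(preds, golds):
    """Fraction of gold rows with CWE ids whose first predicted CWE is in the gold set."""
    pairs = [(p, g) for p, g in zip(preds, golds) if g["cwe_ids"]]
    hits = sum(bool(p["cwe_ids"]) and p["cwe_ids"][0] in g["cwe_ids"] for p, g in pairs)
    return hits / len(pairs)
```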

Files

| file | size | sha256 / notes |
|------|------|----------------|
| gemma-e4b-firefly-bf16.gguf | 14 GB | 27cd72a50756bf384724dd3c4590e184bee60162e9343d62e90151875f4eb69c |
| gemma-e4b-firefly-q8_0.gguf | 7.5 GB | 1dea37d5b796f7771a4a5b12eea55e78d504f18605aa1acba729bb5289b1afbc |
| gemma-e4b-firefly-q4_k_m.gguf | 5.0 GB | 0a1b5e91c9cef35add47b82033f7196f9a5774176e62e8ef382abab793a7a60e |
| mlx/gemma-e4b-firefly-bf16/ | 14 GB | MLX bf16 |
| mlx/gemma-e4b-firefly-mxfp8/ | 7.9 GB | MLX 8-bit (group size 32) |
| mlx/gemma-e4b-firefly-4bit/ | 4.0 GB | MLX 4-bit (group size 64) |

Q4_K_M is the recommended quant for laptops and consumer GPUs.

Limitations

  • C/C++ only. Not evaluated on other languages.
  • Label accuracy ≈ 0.65. Research adapter, not a production classifier — use it as a ranking signal, not a verdict.
  • Weak CWE top-1 (0.03–0.10). The model often picks a plausible but wrong CWE from the same family.
  • No reasoning traces. JSON-only training means no explanations or follow-up Q&A.
  • ~2k training context, so functions longer than ~1500 LoC are OOD.
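Given that context limit, a caller may want to screen inputs before inference. A minimal sketch (the ~1500-line cutoff comes from the note above; the helper name is illustrative):

```python
def in_distribution(func_src: str, max_loc: int = 1500) -> bool:
    """Return False for functions longer than the training data covers (OOD)."""
    loc = sum(1 for line in func_src.splitlines() if line.strip())
    return loc <= max_loc
```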

Training

  • Base: google/gemma-4-e4b-it (mlx snapshot mlx-community/gemma-4-e4b-it-bf16)
  • LoRA: r=8, α=2 on q/k/v/o + gate/up/down, all 42 blocks, 500 iters, step 100 selected.
  • Corpus: 48,734 rows, PrimeVul + BigVul with strict {"label","cwe_ids"} targets, deduplicated against the eval benchmark by code_sha256.
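The code_sha256 dedup presumably hashes the function text itself; a minimal sketch of that scheme (the exact normalization before hashing is an assumption):

```python
import hashlib

def code_sha256(func_src: str) -> str:
    """Content hash used to deduplicate training rows against the eval benchmark."""
    return hashlib.sha256(func_src.encode("utf-8")).hexdigest()

# Drop any training row whose hash collides with a benchmark row.
train_hashes = {code_sha256(s) for s in ["int a(){return 1;}", "int b(){return 2;}"]}
eval_hashes = {code_sha256("int a(){return 1;}")}
deduped = train_hashes - eval_hashes
```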

License

Inherits the Gemma Terms of Use.
