gemma-e4b-firefly

A LoRA fine-tune of Gemma 4 E4B (7.52B dense) for C/C++ vulnerability classification. Given one C or C++ function, it returns a JSON object with a binary label (clean | vulnerable) and a list of CWE identifiers.

This model is not a reasoner. Disable the Gemma 4 thinking channel (chat_template_kwargs.enable_thinking=false); otherwise the JSON is absorbed into a <think> block and the visible response comes back empty.

Prompt format

System prompt (copy verbatim):

You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly "clean" or "vulnerable".

User message:

Project: <project-name>
Language: C/C++
Determine whether this function is vulnerable.

```c
<function source>
```

Expected response:

```json
{"label":"vulnerable","cwe_ids":["CWE-125"]}
```

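Downstream code should confirm a response actually matches this schema before trusting it. A minimal validator (the helper name is illustrative, not part of the model's tooling):

```python
import json
import re

def validate_response(text: str) -> dict:
    """Parse a model response and sanity-check it against the expected schema."""
    obj = json.loads(text)
    assert obj.get("label") in ("clean", "vulnerable"), "label must be clean|vulnerable"
    ids = obj.get("cwe_ids")
    assert isinstance(ids, list), "cwe_ids must be a list"
    assert all(re.fullmatch(r"CWE-\d+", c) for c in ids), "malformed CWE id"
    return obj
```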
Inference — llama.cpp

```bash
llama-server -m gemma-e4b-firefly-q4_k_m.gguf \
    --host 127.0.0.1 --port 8080 \
    -c 4096 --temp 0 --top-k 1 --top-p 1 -n 256
```

```bash
curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly \"clean\" or \"vulnerable\"."},
            {"role": "user", "content": "Project: core\nLanguage: C/C++\nDetermine whether this function is vulnerable.\n\n```c\n<paste function here>\n```"}
        ],
        "temperature": 0.0,
        "max_tokens": 128,
        "chat_template_kwargs": {"enable_thinking": false}
    }'
```
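Rather than hand-escaping quotes inside the curl body, the same request can be assembled programmatically. A sketch (payload fields copied from the curl example; `urllib` is just one client choice, and the function names are illustrative):

```python
import json
import urllib.request

SYSTEM = ('You are a security reviewer. Return JSON only with keys label and '
          'cwe_ids. The label field must be exactly "clean" or "vulnerable".')

def build_request(func_src: str, project: str = "core") -> bytes:
    """Assemble the chat-completions payload for one C/C++ function."""
    user = (f"Project: {project}\nLanguage: C/C++\n"
            "Determine whether this function is vulnerable.\n\n"
            f"```c\n{func_src}\n```")
    payload = {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user},
        ],
        "temperature": 0.0,
        "max_tokens": 128,
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return json.dumps(payload).encode()

# With llama-server running on the port shown above:
# req = urllib.request.Request("http://127.0.0.1:8080/v1/chat/completions",
#                              data=build_request("int f(void) { return 0; }"),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```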

Inference — MLX (Apple Silicon)

```python
import json
from mlx_lm import load, generate

model, tokenizer = load("trevon/gemma-e4b-firefly/mlx/gemma-e4b-firefly-4bit")

messages = [
    {"role": "system", "content": "You are a security reviewer. Return JSON only with keys label and cwe_ids. The label field must be exactly \"clean\" or \"vulnerable\"."},
    {"role": "user", "content": "Project: core\nLanguage: C/C++\nDetermine whether this function is vulnerable.\n\n```c\n<function source>\n```"},
]

prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
    enable_thinking=False,
)
out = generate(model, tokenizer, prompt=prompt, max_tokens=128)
# Model may wrap JSON in a ```json fence. Strip before json.loads if present.
print(out)
```
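The fence-stripping mentioned in the comment can be a small helper (the name is illustrative):

```python
import json

def parse_output(out: str) -> dict:
    """Strip an optional ```json fence from a model response, then parse it."""
    text = out.strip()
    if text.startswith("```"):
        lines = text.splitlines()[1:]          # drop the opening ```json line
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]                 # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)

print(parse_output('```json\n{"label":"clean","cwe_ids":[]}\n```'))
# → {'label': 'clean', 'cwe_ids': []}
```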

Evaluation

Six held-out gates from an internal C/C++ vulnerability benchmark, greedy decode at T=0.0.

| gate | tuned label_acc | Δ vs base | tuned CWE top-1 |
|------|-----------------|-----------|-----------------|
| source_50_a | 0.680 | +0.04 | 0.103 |
| source_50_b | 0.660 | +0.10 | 0.069 |
| source_50_c | 0.700 | +0.08 | 0.000 |
| source_200_a | 0.635 | +0.06 | 0.043 |
| source_200_b | 0.605 | +0.07 | 0.026 |
| source_200_c | 0.665 | +0.075 | 0.026 |

Mean 200-row Δ: +0.068. No parse failures, no empty labels.
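Both metrics can be recomputed from (prediction, gold) pairs. A sketch under the usual definitions (that CWE top-1 is scored only over gold rows carrying CWE ids is an assumption, not stated by the benchmark):

```python
def label_accuracy(preds, golds):
    """Fraction of rows where the binary label matches the gold label."""
    return sum(p["label"] == g["label"] for p, g in zip(preds, golds)) / len(golds)

def cwe_top1(preds, golds):
    """Fraction of gold rows with CWE ids whose first predicted CWE is in the gold set."""
    pairs = [(p, g) for p, g in zip(preds, golds) if g["cwe_ids"]]
    hits = sum(bool(p["cwe_ids"]) and p["cwe_ids"][0] in g["cwe_ids"] for p, g in pairs)
    return hits / len(pairs)
```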

Files

| file | size | sha256 / notes |
|------|------|----------------|
| gemma-e4b-firefly-bf16.gguf | 14 GB | 27cd72a50756bf384724dd3c4590e184bee60162e9343d62e90151875f4eb69c |
| gemma-e4b-firefly-q8_0.gguf | 7.5 GB | 1dea37d5b796f7771a4a5b12eea55e78d504f18605aa1acba729bb5289b1afbc |
| gemma-e4b-firefly-q4_k_m.gguf | 5.0 GB | 0a1b5e91c9cef35add47b82033f7196f9a5774176e62e8ef382abab793a7a60e |
| mlx/gemma-e4b-firefly-bf16/ | 14 GB | MLX bf16 |
| mlx/gemma-e4b-firefly-mxfp8/ | 7.9 GB | MLX 8-bit (group size 32) |
| mlx/gemma-e4b-firefly-4bit/ | 4.0 GB | MLX 4-bit (group size 64) |

Q4_K_M is the recommended quant for laptops and consumer GPUs.

Limitations

  • C/C++ only. Not evaluated on other languages.
  • Label accuracy ≈ 0.65. Research adapter, not a production classifier — use it as a ranking signal, not a verdict.
  • Weak CWE top-1 (0.03–0.10). The model often picks a plausible but wrong CWE from the same family.
  • No reasoning traces. JSON-only training means no explanations or follow-up Q&A.
  • ~2k training context, so functions longer than ~1500 LoC are OOD.
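Given that context limit, a caller may want to screen inputs before inference. A minimal sketch (the ~1500-line cutoff comes from the note above; the helper name is illustrative):

```python
def in_distribution(func_src: str, max_loc: int = 1500) -> bool:
    """Return False for functions longer than the training data covers (OOD)."""
    loc = sum(1 for line in func_src.splitlines() if line.strip())
    return loc <= max_loc
```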

Training

  • Base: google/gemma-4-e4b-it (mlx snapshot mlx-community/gemma-4-e4b-it-bf16)
  • LoRA: r=8, α=2 on q/k/v/o + gate/up/down, all 42 blocks, 500 iters, step 100 selected.
  • Corpus: 48,734 rows, PrimeVul + BigVul with strict {"label","cwe_ids"} targets, deduplicated against the eval benchmark by code_sha256.
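The code_sha256 dedup presumably hashes the function text itself; a minimal sketch of that scheme (the exact normalization before hashing is an assumption):

```python
import hashlib

def code_sha256(func_src: str) -> str:
    """Content hash used to deduplicate training rows against the eval benchmark."""
    return hashlib.sha256(func_src.encode("utf-8")).hexdigest()

# Drop any training row whose hash collides with a benchmark row.
train_hashes = {code_sha256(s) for s in ["int a(){return 1;}", "int b(){return 2;}"]}
eval_hashes = {code_sha256("int a(){return 1;}")}
deduped = train_hashes - eval_hashes
```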

License

Inherits the Gemma Terms of Use.
