qwen3-4b-cybersec-GGUF — Cybersecurity Fine-Tuned Language Model

A Qwen3-4B model fine-tuned on the Voidreaper2026/cybersec-master-dataset and quantised to GGUF Q8_0 for local deployment. The training corpus spans 1.8 million deduplicated records from NVD, OSV, GitHub Advisory Database, ExploitDB, MITRE ATT&CK, CISA KEV, Security Stack Exchange, Kali Linux tooling, and Vulners vulnerability intelligence.

This model is designed to operate as the fast extraction and classification layer in a grounded triage pipeline — not as a standalone severity oracle. That distinction matters, and the rest of this card explains why.


Quickstart

llama.cpp

# Install
brew install llama.cpp          # macOS
winget install llama.cpp        # Windows

# Run as OpenAI-compatible server
llama-server -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0

# Or run directly in terminal
llama-cli -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0
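
Once llama-server is running, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library and assumes the server is listening on its default address, http://localhost:8080. The helper names build_payload and ask_server and the system prompt are illustrative, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(user_prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [
            # Illustrative system prompt; adjust to your extraction schema.
            {"role": "system", "content": "Return JSON only. Use null for unknown fields."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.1,
        "max_tokens": max_tokens,
    }


def ask_server(user_prompt: str, url: str = SERVER_URL) -> str:
    """POST the request to the local server and return the assistant text."""
    body = json.dumps(build_payload(user_prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask_server("List the CWE IDs associated with CVE-2023-44487."))
```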

Ollama

ollama run hf.co/Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Voidreaper2026/qwen3-4b-cybersec-GGUF",
    filename="model-Q8_0.gguf",
    n_ctx=4096
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": """Extract the following fields from your knowledge of CVE-2023-44487.
Return as JSON only. Return null for any field you cannot confirm with certainty.

Fields: cve_id, cwe_ids, affected_products, attack_vector,
privileges_required, patch_available, cisa_kev, mitre_attack_technique"""
        }
    ],
    temperature=0.1,
    max_tokens=512
)

print(response["choices"][0]["message"]["content"])
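
The prompt asks for JSON only, but small models do not always comply: the reply may be wrapped in a code fence, and Qwen3-family models may prepend a <think>...</think> block. A minimal sketch of a tolerant parser follows. The name parse_extraction is hypothetical, and dropping keys that were not requested (for example a volunteered score) is a design choice made here so that the extraction layer cannot leak severity values downstream.

```python
import json
import re

# The fields requested in the extraction prompt.
EXPECTED_FIELDS = (
    "cve_id", "cwe_ids", "affected_products", "attack_vector",
    "privileges_required", "patch_available", "cisa_kev",
    "mitre_attack_technique",
)

# Matches an optional reasoning block emitted before the answer.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_extraction(raw: str) -> dict:
    """Turn raw model text into a dict holding exactly the expected fields."""
    text = THINK_BLOCK.sub("", raw)
    # Keep the span from the first "{" to the last "}" to skip fences or chatter.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # raises ValueError on malformed JSON
    # Missing fields become None; unrequested keys are dropped.
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Passing response["choices"][0]["message"]["content"] to this function yields a dict that can be handed directly to the retrieval layer.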

LM Studio / Jan

Search for Voidreaper2026/qwen3-4b-cybersec-GGUF directly in the app.


Training Data

Source                    | Records   | Description
--------------------------+-----------+-----------------------------------
NVD                       |   500,935 | CVE database back to 2002
OSV                       |   754,273 | Multi-ecosystem vulnerability DB
GitHub Advisory DB        |   328,525 | Security advisories, CC-BY 4.0
Cybersec Causal Reasoning |    99,870 | Reasoning triples
Security Stack Exchange   |    55,930 | Real-world Q&A
ExploitDB                 |    46,457 | Public exploit database
Vulners                   |    87,063 | Exploit and advisory intelligence
MITRE ATT&CK              |     2,205 | Techniques, mitigations, groups
CISA KEV                  |     1,587 | Known Exploited Vulnerabilities
Kali Linux Tools          |       790 | Tool descriptions and flags
Total (deduplicated)      | 1,807,941 |

The Problem This Pipeline Solves

LLMs tend to inflate CVE severity when they generate scores from their weights. This is a field-wide problem, not a model-specific one.

Fine-tuning data quality is not the cause. The problem is structural:

  • Pre-training data is skewed by nature. The internet massively over-represents Critical and High CVEs. Nobody publishes a detailed breakdown of a CVSS 4.2. Every LLM inherits this bias from pre-training, before any fine-tuning happens.

  • NVD base scores are worst-case by design. CVSS base scores describe a vulnerability's intrinsic characteristics and take no account of mitigating controls or the deployment environment. A legitimate 9.8 in the database might realistically be a 4.0 in most real deployments.

  • Instruction tuning pushes toward caution. RLHF rewards thorough, safety-conscious answers. In a security context that trains a bias toward worst-case severity framing.

The pipeline below bypasses this entirely by ensuring severity scores are always retrieved from source data, never generated from model weights.
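
One way to keep numbers out of the model entirely is to carry only the retrieved vector string through the pipeline and derive the score from it deterministically. The sketch below implements the CVSS v3.1 base-score equations as an illustration. It covers base metrics only (no temporal or environmental metrics), base_score is a hypothetical helper name, and the result is best treated as a consistency check against the score stored with the vector, not as a replacement for it.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# The Privileges Required weight depends on whether Scope is changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    parts = dict(item.split(":") for item in vector.split("/"))
    scope = parts["S"]
    av, ac, ui = (WEIGHTS[k][parts[k]] for k in ("AV", "AC", "UI"))
    c, i, a = (WEIGHTS[k][parts[k]] for k in ("C", "I", "A"))
    pr = PR_WEIGHTS[scope][parts["PR"]]

    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    if impact <= 0:
        return 0.0
    total = impact + 8.22 * av * ac * pr * ui
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))
```

For the vector CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H this returns 9.8. Any mismatch between the computed value and the retrieved score is a signal that the retrieved record is corrupt or uses a different CVSS version.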


Recommended Architecture: Grounded Triage Pipeline

User Query
    |
    v
+------------------------------------------------------------------+
|           qwen3-4b-cybersec  (Extraction Layer)                  |
|                                                                  |
|  Fast, cheap, runs fully local on CPU or AMD/NVIDIA GPU.         |
|  Extracts CVE IDs, CWE types, affected products, attack surface. |
|  Does NOT output severity scores or CVSS values.                 |
+-----------------------------+------------------------------------+
                              |  Structured: CVE IDs, CWEs, products
                              v
+------------------------------------------------------------------+
|                    RAG Retrieval Layer                           |
|                                                                  |
|  Vector search over embedded cybersec-master-dataset.            |
|  Returns verbatim CVSS vectors, KEV status, ATT&CK mappings.     |
+-----------------------------+------------------------------------+
                              |
                              |  If CVE not in index:
                              v
+------------------------------------------------------------------+
|                 Web Search Fallback (No-RAG Path)                |
|                                                                  |
|  Live lookups: NVD API, CISA KEV, CVE.mitre.org, vendor          |
|  advisories. Output tagged source: web_search_backed.            |
+-----------------------------+------------------------------------+
                              |  Retrieved context
                              v
+------------------------------------------------------------------+
|             Large Model  (Triage and Synthesis Layer)            |
|                                                                  |
|  Operates on retrieved context only — never on weights.          |
|  Contextualises severity for the user's actual environment.      |
|  Flags confidence: rag_backed / web_search_backed /              |
|  model_generated (treat with caution).                           |
+------------------------------------------------------------------+

Why each component earns its place

qwen3-4b-cybersec is the economical workhorse. Entity extraction, CWE classification, and query structuring are exactly what a fine-tuned 4B model excels at. It is fast and cheap to run, works fully locally (including on AMD GPUs via llama.cpp), and is kept out of the scoring loop entirely.

RAG retrieval is the score source. Retrieving CVSS vectors verbatim from the source dataset completely bypasses the inflation problem regardless of which LLM you use.

Web search covers the temporal gap. Zero-days and post-training CVEs get live NVD API lookups, tagged so downstream systems know the data was not RAG-backed.

The large model synthesises, never invents. Given grounded context, it contextualises risk for the user's environment without ever recalling a score from weights.
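
The control flow between the layers can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: ground_cves and format_context are hypothetical names, and the two lookup callables stand in for whatever RAG index and web fallback you deploy. Each lookup takes a CVE ID and returns a dict of retrieved fields, or None on a miss.

```python
from typing import Callable, Optional

# A lookup takes a CVE ID and returns retrieved fields, or None on a miss.
Lookup = Callable[[str], Optional[dict]]


def ground_cves(cve_ids: list, rag_lookup: Lookup, web_lookup: Lookup) -> list:
    """Attach retrieved data and a source flag to each extracted CVE ID."""
    results = []
    for cve_id in cve_ids:
        record = rag_lookup(cve_id)
        source = "rag_backed"
        if record is None:
            # Not in the index: fall back to a live lookup.
            record = web_lookup(cve_id)
            source = "web_search_backed"
        if record is None:
            # Nothing retrieved: pass the ID through without any score.
            record, source = {}, "model_generated"
        # Our own cve_id and source always win over keys in the record.
        results.append({**record, "cve_id": cve_id, "source": source})
    return results


def format_context(grounded: list) -> str:
    """Render grounded records as plain-text context for the synthesis model."""
    lines = []
    for item in grounded:
        lines.append(
            f"{item['cve_id']} [{item['source']}] "
            f"score={item.get('cvss_score')} vector={item.get('cvss_vector')}"
        )
    return "\n".join(lines)
```

The string returned by format_context is what the large model sees; because scores only ever enter through the lookups, a missing score shows up as None instead of a recalled number.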


Confidence Flagging

Flag              | Meaning                                            | Trust level
------------------+----------------------------------------------------+---------------------
rag_backed        | Score retrieved verbatim from dataset index        | High
web_search_backed | Score fetched live from NVD API or vendor advisory | High
model_generated   | No retrieval source found; model inference only    | Low: verify manually
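
Downstream consumers can enforce these flags in code instead of relying on convention. A minimal sketch, assuming each finding is a dict carrying a source key and an optional cvss_score key; the names Source and usable_score are hypothetical.

```python
from enum import Enum
from typing import Optional


class Source(str, Enum):
    RAG_BACKED = "rag_backed"
    WEB_SEARCH_BACKED = "web_search_backed"
    MODEL_GENERATED = "model_generated"


# Only retrieval-backed sources are trusted for numeric severity.
TRUSTED = {Source.RAG_BACKED, Source.WEB_SEARCH_BACKED}


def usable_score(finding: dict) -> Optional[float]:
    """Return the score only if it came from a retrieval source, else None."""
    try:
        source = Source(finding.get("source", "model_generated"))
    except ValueError:
        # Unknown or misspelled flags are treated as untrusted.
        return None
    if source in TRUSTED:
        return finding.get("cvss_score")
    return None
```

A workflow that feeds severity into SLA enforcement would call usable_score and route any None result to manual review.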

RAG Implementation Notes

Embedding the dataset

from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("Voidreaper2026/cybersec-master-dataset", split="train")
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed at record level to preserve CVSS vector coherence
def get_embed_text(record):
    convs = record["conversations"]
    assistant_turn = next((c["value"] for c in convs if c["from"] == "gpt"), "")
    return f"{record.get('cve_id', '')} {assistant_turn}"
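
Vector search can return a near neighbour (a different CVE in the same product) when the requested one is missing. Because CVE IDs are exact keys, one option is to put an exact-match index in front of the vector index and only fall through to similarity search for free-text queries. The sketch below assumes the same conversations schema used by get_embed_text; build_cve_index is a hypothetical helper, and scanning the full dataset this way is slow but only needs to happen once.

```python
import re
from collections import defaultdict

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def build_cve_index(records) -> dict:
    """Map each CVE ID to the positions of the records that mention it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        # Join every turn so IDs in either the question or the answer count.
        text = " ".join(
            turn.get("value", "") for turn in record.get("conversations", [])
        )
        # A set avoids listing the same record twice for one ID.
        for cve_id in {match.upper() for match in CVE_PATTERN.findall(text)}:
            index[cve_id].append(position)
    return dict(index)
```

With index = build_cve_index(ds), the expression index.get("CVE-2023-44487", []) gives the row positions to read verbatim, and an empty list is the signal to try the web fallback.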

NVD API fallback

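import httpx

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

async def nvd_lookup(cve_id: str) -> dict:
    """Fetch one CVE from the NVD 2.0 API and return its CVSS v3.x data verbatim."""
    async with httpx.AsyncClient() as client:
        # Pass the ID as a query parameter so it is URL-encoded correctly
        r = await client.get(NVD_URL, params={"cveId": cve_id}, timeout=10)
        r.raise_for_status()
        data = r.json()

    vulns = data.get("vulnerabilities", [])
    if not vulns:
        return {"source": "web_search_backed", "found": False, "cve_id": cve_id}

    cve = vulns[0]["cve"]
    metrics = cve.get("metrics", {})
    # Prefer CVSS v3.1, then v3.0; `or` also guards against an empty list
    entries = metrics.get("cvssMetricV31") or metrics.get("cvssMetricV30") or [{}]
    cvss_data = entries[0].get("cvssData", {})
    # Prefer the English description; fall back to the first one listed
    descriptions = cve.get("descriptions") or [{}]
    description = next(
        (d.get("value", "") for d in descriptions if d.get("lang") == "en"),
        descriptions[0].get("value", ""),
    )
    return {
        "source": "web_search_backed",
        "found": True,
        "cve_id": cve_id,
        # These stay None when the record carries no CVSS v3.x metrics
        "cvss_score": cvss_data.get("baseScore"),
        "cvss_vector": cvss_data.get("vectorString"),
        "severity": cvss_data.get("baseSeverity"),
        "description": description,
        "published": cve.get("published"),
    }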
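
The fallback layer also lists CISA KEV as a live source. A minimal sketch of a KEV check follows, using the standard library synchronously for brevity. The feed URL and the field names cveID, dateAdded, and dueDate are assumptions based on the public KEV JSON feed and should be verified before use; fetch_kev_catalog and kev_status are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: CISA publishes the KEV catalog as JSON at this URL; verify before use.
KEV_URL = (
    "https://www.cisa.gov/sites/default/files/feeds/"
    "known_exploited_vulnerabilities.json"
)


def fetch_kev_catalog(url: str = KEV_URL) -> dict:
    """Download the full KEV catalog (cache the result; it changes slowly)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def kev_status(cve_id: str, catalog: dict) -> dict:
    """Report whether a CVE appears in an already-downloaded KEV catalog."""
    wanted = cve_id.upper()
    result = {"source": "web_search_backed", "cve_id": wanted, "cisa_kev": False}
    for entry in catalog.get("vulnerabilities", []):
        if entry.get("cveID", "").upper() == wanted:
            result.update(
                cisa_kev=True,
                date_added=entry.get("dateAdded"),
                due_date=entry.get("dueDate"),
            )
            break
    return result
```

Calling kev_status(cve_id, fetch_kev_catalog()) once per batch, with the catalog held in memory, avoids re-downloading the feed for every CVE.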

Intended Use

  • SOC L1/L2 assistant tooling within the pipeline architecture above
  • Structured CVE entity extraction as a preprocessing step
  • Vulnerability report drafting and summarisation
  • Security awareness training content generation
  • CTF hint generation and write-up assistance

Out of Scope

  • Standalone authoritative CVSS scoring from model output alone
  • Automated patch prioritisation without RAG retrieval or NVD API verification
  • Any workflow where model-generated severity feeds directly into SLA enforcement

These constraints apply equally to all LLMs used for CVE scoring.


Licence

Apache 2.0. Training data sources retain their individual licences — see the dataset card for full attribution.
