qwen3-4b-cybersec-GGUF — Cybersecurity Fine-Tuned Language Model

A Qwen3-4B model fine-tuned on the Voidreaper2026/cybersec-master-dataset and quantised to GGUF Q8_0 for local deployment. The training corpus spans 1.8 million deduplicated records from NVD, OSV, GitHub Advisory Database, ExploitDB, MITRE ATT&CK, CISA KEV, Security Stack Exchange, Kali Linux tooling, and Vulners vulnerability intelligence.

This model is designed to operate as the fast extraction and classification layer in a grounded triage pipeline — not as a standalone severity oracle. That distinction matters, and the rest of this card explains why.

Quickstart

llama.cpp

# Install
brew install llama.cpp          # macOS
winget install llama.cpp        # Windows

# Run as OpenAI-compatible server
llama-server -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0

# Or run directly in terminal
llama-cli -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0
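
Once `llama-server` is running, any OpenAI-compatible client can talk to it. The following is a minimal stdlib-only sketch, assuming the server listens on its default port 8080 on the same machine and exposes `/v1/chat/completions`. The helper names `build_chat_request` and `chat` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable on its default port on this machine.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, temperature: float = 0.1,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the server up, `chat("Which CWE IDs are commonly associated with SQL injection?")` returns the reply as a string. Building the request separately from sending it keeps the payload easy to inspect and test offline.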

Ollama

ollama run hf.co/Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Voidreaper2026/qwen3-4b-cybersec-GGUF",
    filename="model-Q8_0.gguf",
    n_ctx=4096
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": """Extract the following fields from your knowledge of CVE-2023-44487.
Return as JSON only. Return null for any field you cannot confirm with certainty.

Fields: cve_id, cwe_ids, affected_products, attack_vector,
privileges_required, patch_available, cisa_kev, mitre_attack_technique"""
        }
    ],
    temperature=0.1,
    max_tokens=512
)

print(response["choices"][0]["message"]["content"])
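
The reply is plain text, so it still has to be parsed before the next layer can use it. The following is a hedged sketch of that step, assuming the model may wrap its answer in Qwen3-style `<think>...</think>` reasoning or a Markdown code fence. `parse_extraction` and `EXPECTED_FIELDS` are hypothetical names introduced here for illustration.

```python
import json
import re

# The fields requested in the prompt above; anything else is discarded.
EXPECTED_FIELDS = (
    "cve_id", "cwe_ids", "affected_products", "attack_vector",
    "privileges_required", "patch_available", "cisa_kev",
    "mitre_attack_technique",
)


def parse_extraction(raw: str) -> dict:
    """Turn the model's reply into a dict with exactly EXPECTED_FIELDS."""
    # Assumption: reasoning, if present, is wrapped in <think>...</think>.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Take the outermost {...} span, which also skips Markdown code fences.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    # Missing fields become None; unexpected keys (e.g. a score) are dropped.
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Dropping unexpected keys is a deliberate choice in this sketch: if the model volunteers a severity score, it never reaches downstream code.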

LM Studio / Jan

Search for Voidreaper2026/qwen3-4b-cybersec-GGUF directly in the app.

Training Data

Source	Records	Description
NVD	500,935	CVE database back to 2002
OSV	754,273	Multi-ecosystem vulnerability DB
GitHub Advisory DB	328,525	Security advisories, CC-BY 4.0
Cybersec Causal Reasoning	99,870	Reasoning triples
Security Stack Exchange	55,930	Real-world Q&A
ExploitDB	46,457	Public exploit database
Vulners	87,063	Exploit and advisory intelligence
MITRE ATT&CK	2,205	Techniques, mitigations, groups
CISA KEV	1,587	Known Exploited Vulnerabilities
Kali Linux Tools	790	Tool descriptions and flags
Total (deduplicated)	1,807,941
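
The per-source counts add up to more than the deduplicated total. Assuming the per-source figures are counts taken before deduplication (the table does not say so explicitly), the overlap can be checked with a few lines of arithmetic:

```python
# Per-source record counts copied from the table above.
SOURCE_COUNTS = {
    "NVD": 500_935,
    "OSV": 754_273,
    "GitHub Advisory DB": 328_525,
    "Cybersec Causal Reasoning": 99_870,
    "Security Stack Exchange": 55_930,
    "ExploitDB": 46_457,
    "Vulners": 87_063,
    "MITRE ATT&CK": 2_205,
    "CISA KEV": 1_587,
    "Kali Linux Tools": 790,
}
DEDUPLICATED_TOTAL = 1_807_941

raw_total = sum(SOURCE_COUNTS.values())
removed = raw_total - DEDUPLICATED_TOTAL

print(f"raw total: {raw_total:,}")                             # raw total: 1,877,635
print(f"removed:   {removed:,} ({removed / raw_total:.1%})")   # removed:   69,694 (3.7%)
```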

The Problem This Pipeline Solves

LLMs tend to inflate CVE severity scores when they generate them from weights. This is a field-wide problem, not a model-specific one.

The cause is not the quality of the fine-tuning data. It is structural:

Pre-training data is skewed by nature. The internet massively over-represents Critical and High CVEs. Nobody publishes a detailed breakdown of a CVSS 4.2. Every LLM inherits this bias from pre-training, before any fine-tuning happens.
NVD base scores are worst-case by design. CVSS base scores assume no mitigating controls, full network exposure, and a worst-case environment. A legitimate 9.8 in the database may realistically warrant something closer to a 4.0 in a given deployment once mitigating controls and actual exposure are taken into account.
Instruction tuning pushes toward caution. RLHF rewards thorough, safety-conscious answers. In a security context, that trains a bias toward worst-case severity framing.

The pipeline below sidesteps this by retrieving severity scores from source data rather than generating them from model weights. Any output that could not be backed by retrieval is flagged as such.

Recommended Architecture: Grounded Triage Pipeline

User Query
    |
    v
+------------------------------------------------------------------+
|           qwen3-4b-cybersec  (Extraction Layer)                  |
|                                                                  |
|  Fast, cheap, runs fully local on CPU or AMD/NVIDIA GPU.         |
|  Extracts CVE IDs, CWE types, affected products, attack surface. |
|  Does NOT output severity scores or CVSS values.                 |
+-----------------------------+------------------------------------+
                              |  Structured: CVE IDs, CWEs, products
                              v
+------------------------------------------------------------------+
|                    RAG Retrieval Layer                           |
|                                                                  |
|  Vector search over embedded cybersec-master-dataset.            |
|  Returns verbatim CVSS vectors, KEV status, ATT&CK mappings.     |
+-----------------------------+------------------------------------+
                              |
                              |  If CVE not in index:
                              v
+------------------------------------------------------------------+
|                 Web Search Fallback (No-RAG Path)                |
|                                                                  |
|  Live lookups: NVD API, CISA KEV, CVE.mitre.org, vendor          |
|  advisories. Output tagged source: web_search_backed.            |
+-----------------------------+------------------------------------+
                              |  Retrieved context
                              v
+------------------------------------------------------------------+
|             Large Model  (Triage and Synthesis Layer)            |
|                                                                  |
|  Operates on retrieved context only — never on weights.          |
|  Contextualises severity for the user's actual environment.      |
|  Flags confidence: rag_backed / web_search_backed /              |
|  model_generated (treat with caution).                           |
+------------------------------------------------------------------+
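
The routing between the two retrieval layers in the diagram can be sketched as follows. Everything here is illustrative: `rag_lookup` and `web_lookup` are hypothetical callables standing in for the vector index and the live-lookup layer, and the record fields are assumptions.

```python
from typing import Callable, Optional

# A lookup takes a CVE ID and returns a record dict, or None if nothing is found.
Lookup = Callable[[str], Optional[dict]]


def retrieve_grounded(cve_id: str, rag_lookup: Lookup, web_lookup: Lookup) -> dict:
    """Return a record tagged with the layer that produced it."""
    record = rag_lookup(cve_id)
    if record is not None:
        return {**record, "cve_id": cve_id, "source": "rag_backed"}
    record = web_lookup(cve_id)
    if record is not None:
        return {**record, "cve_id": cve_id, "source": "web_search_backed"}
    # Nothing retrieved: carry no score, so the synthesis layer cannot quote one.
    return {"cve_id": cve_id, "source": "model_generated", "cvss_score": None}


# Toy usage with a dict as the "index" (placeholder ID and score, not real data).
index = {"CVE-0000-0001": {"cvss_score": 7.5}}
print(retrieve_grounded("CVE-0000-0001", index.get, lambda _id: None)["source"])
# rag_backed
```

Passing the lookups in as callables keeps the routing logic independent of whichever vector store or HTTP client is used.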

Why each component earns its place

qwen3-4b-cybersec is the economical workhorse. Entity extraction, CWE classification, and query structuring are exactly what a fine-tuned 4B model excels at. It runs fast, cheaply, and fully locally, including on AMD GPUs via llama.cpp, and it is kept out of the scoring loop entirely.

RAG retrieval is the score source. Retrieving CVSS vectors verbatim from the source dataset completely bypasses the inflation problem regardless of which LLM you use.

Web search covers the temporal gap. Zero-days and post-training CVEs get live NVD API lookups, tagged so downstream systems know the data was not RAG-backed.

The large model synthesises, never invents. Given grounded context, it contextualises risk for the user's environment without ever recalling a score from weights.
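
Because retrieval returns both a vector string and a score, the two can be cross-checked deterministically before they reach the synthesis layer. The following is a sketch of the CVSS v3.1 base-score equations (base metrics only, no temporal or environmental metrics). This check is a suggestion rather than part of the pipeline described above, and the function names are hypothetical.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:.../...' vector string."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

A mismatch between the recomputed value and the retrieved score suggests a corrupted record or a vector from a different CVSS version, and is a reason to fall back to a live lookup.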

Confidence Flagging

Flag	Meaning	Trust level
`rag_backed`	Score retrieved verbatim from dataset index	High
`web_search_backed`	Score fetched live from NVD API or vendor advisory	High
`model_generated`	No retrieval source found — model inference only	Low — verify manually
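
Downstream code can use the flag to decide whether a severity value is safe to act on automatically. The following is a minimal sketch, assuming each record carries its flag in a `source` field; the `Confidence` enum and `needs_manual_review` helper are hypothetical names.

```python
from enum import Enum


class Confidence(str, Enum):
    RAG_BACKED = "rag_backed"
    WEB_SEARCH_BACKED = "web_search_backed"
    MODEL_GENERATED = "model_generated"


# Only retrieval-backed flags are trusted for automated decisions.
TRUSTED = {Confidence.RAG_BACKED, Confidence.WEB_SEARCH_BACKED}


def needs_manual_review(record: dict) -> bool:
    """True when the record's severity should not be acted on automatically."""
    try:
        flag = Confidence(record.get("source"))
    except ValueError:
        return True  # unknown or missing flag: fail closed
    return flag not in TRUSTED
```

Failing closed on a missing or unrecognised flag means a record that lost its provenance somewhere in the pipeline is treated like `model_generated`.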

RAG Implementation Notes

Embedding the dataset

from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("Voidreaper2026/cybersec-master-dataset", split="train")
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed at record level to preserve CVSS vector coherence
def get_embed_text(record):
    convs = record["conversations"]
    assistant_turn = next((c["value"] for c in convs if c["from"] == "gpt"), "")
    return f"{record.get('cve_id', '')} {assistant_turn}"
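
On the lookup side, a CVE ID in the query is an exact key, and embeddings tend to be poor at telling near-identical IDs apart, so it is reasonable to try an exact match before any vector search. The following is a stdlib-only sketch, assuming each record is a dict with a `cve_id` and an `embedding` stored as a plain list of floats (for example from `encoder.encode(texts).tolist()`). The `search` and `cosine` helpers are hypothetical.

```python
import math
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, query_vec: list[float], records: list[dict],
           k: int = 3) -> list[dict]:
    """Exact CVE-ID match first; otherwise top-k records by cosine similarity."""
    ids = {match.upper() for match in CVE_RE.findall(query)}
    exact = [r for r in records if (r.get("cve_id") or "").upper() in ids]
    if exact:
        return exact
    ranked = sorted(records,
                    key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return ranked[:k]
```

The linear scan is only for illustration; at 1.8 million records a proper vector index would be needed, but the exact-match-first ordering carries over unchanged.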

NVD API fallback

import httpx

async def nvd_lookup(cve_id: str) -> dict:
    url = f"https://services.nvd.nist.gov/rest/json/cves/2.0?cveId={cve_id}"
    async with httpx.AsyncClient() as client:
        r = await client.get(url, timeout=10)
        r.raise_for_status()
        data = r.json()
        vulns = data.get("vulnerabilities", [])
        if not vulns:
            return {"source": "web_search_backed", "found": False, "cve_id": cve_id}
        cve = vulns[0]["cve"]
        metrics = cve.get("metrics", {})
        # Prefer CVSS v3.1, then v3.0. `or [{}]` also covers keys that are
        # present but hold an empty list, which would otherwise raise IndexError.
        cvss_data = (
            (metrics.get("cvssMetricV31") or [{}])[0].get("cvssData", {})
            or (metrics.get("cvssMetricV30") or [{}])[0].get("cvssData", {})
        )
        return {
            "source": "web_search_backed",
            "found": True,
            "cve_id": cve_id,
            "cvss_score": cvss_data.get("baseScore"),
            "cvss_vector": cvss_data.get("vectorString"),
            "severity": cvss_data.get("baseSeverity"),
            "description": (cve.get("descriptions") or [{}])[0].get("value", ""),
            "published": cve.get("published"),
        }
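
The lookup above reads only CVSS v3.x metrics and the first description. Older CVEs may carry only v2 metrics and newer ones may carry v4.0, so the parsing step can be factored into pure functions that are testable without network access. This is a sketch; the key names (`cvssMetricV40`, `cvssMetricV2`, `type`, `lang`) are assumptions based on the NVD CVE API 2.0 schema and should be checked against the current documentation.

```python
# Metric keys in order of preference. Assumption: the newest CVSS version wins.
METRIC_KEYS = ("cvssMetricV40", "cvssMetricV31", "cvssMetricV30", "cvssMetricV2")


def pick_cvss(metrics: dict) -> dict:
    """Return version, score, vector and severity from the first populated list."""
    for key in METRIC_KEYS:
        entries = metrics.get(key) or []  # tolerate missing keys and empty lists
        if not entries:
            continue
        # Prefer the entry marked "Primary", else take the first one.
        entry = next((e for e in entries if e.get("type") == "Primary"), entries[0])
        data = entry.get("cvssData", {})
        return {
            "cvss_version": data.get("version"),
            "cvss_score": data.get("baseScore"),
            "cvss_vector": data.get("vectorString"),
            # v2 entries carry baseSeverity on the metric, not inside cvssData.
            "severity": data.get("baseSeverity") or entry.get("baseSeverity"),
        }
    return {"cvss_version": None, "cvss_score": None,
            "cvss_vector": None, "severity": None}


def english_description(cve: dict) -> str:
    """Return the English description, or an empty string if there is none."""
    for desc in cve.get("descriptions", []):
        if desc.get("lang") == "en":
            return desc.get("value", "")
    return ""
```

The lookup itself is a coroutine, so from synchronous code it is called as `asyncio.run(nvd_lookup("CVE-2023-44487"))`.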

Intended Use

SOC L1/L2 assistant tooling within the pipeline architecture above
Structured CVE entity extraction as a preprocessing step
Vulnerability report drafting and summarisation
Security awareness training content generation
CTF hint generation and write-up assistance

Out of Scope

Standalone authoritative CVSS scoring from model output alone
Automated patch prioritisation without RAG retrieval or NVD API verification
Any workflow where model-generated severity feeds directly into SLA enforcement

These constraints apply equally to all LLMs used for CVE scoring.

Licence

Apache 2.0. Training data sources retain their individual licences — see the dataset card for full attribution.
