qwen3-4b-cybersec-GGUF — Cybersecurity Fine-Tuned Language Model
A Qwen3-4B model fine-tuned on the
Voidreaper2026/cybersec-master-dataset
and quantised to GGUF Q8_0 for local deployment. The training corpus spans 1.8 million
deduplicated records from NVD, OSV, GitHub Advisory Database, ExploitDB, MITRE ATT&CK,
CISA KEV, Security Stack Exchange, Kali Linux tooling, and Vulners vulnerability
intelligence.
This model is designed to operate as the fast extraction and classification layer in a grounded triage pipeline, not as a standalone severity oracle. The rest of this card explains why that distinction matters.
Quickstart
llama.cpp
# Install
brew install llama.cpp # macOS
winget install llama.cpp # Windows
# Run as OpenAI-compatible server
llama-server -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0
# Or run directly in terminal
llama-cli -hf Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0
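Once `llama-server` is running, any OpenAI-style client can talk to it. The following is a minimal standard-library sketch, assuming the server listens on `http://localhost:8080` (the llama.cpp default) and serves `/v1/chat/completions`. `build_chat_request` and `ask` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, temperature: float = 0.1,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a single user prompt."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(ask("List the CWE IDs associated with CVE-2023-44487. Return JSON only."))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.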
Ollama
ollama run hf.co/Voidreaper2026/qwen3-4b-cybersec-GGUF:Q8_0
llama-cpp-python
from llama_cpp import Llama

# Downloads the GGUF file from the Hub on first use (requires huggingface-hub).
llm = Llama.from_pretrained(
    repo_id="Voidreaper2026/qwen3-4b-cybersec-GGUF",
    filename="model-Q8_0.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": """Extract the following fields from your knowledge of CVE-2023-44487.
Return as JSON only. Return null for any field you cannot confirm with certainty.
Fields: cve_id, cwe_ids, affected_products, attack_vector,
privileges_required, patch_available, cisa_kev, mitre_attack_technique""",
        }
    ],
    temperature=0.1,  # low temperature keeps extraction output stable
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
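The reply is free text that should contain a JSON object, so it is worth parsing defensively before passing it downstream. The sketch below is one way to do that, under two assumptions that are not confirmed by this card: the model may wrap its answer in a Markdown code fence, and it may emit a `<think>...</think>` block first. `parse_extraction` and `EXPECTED_FIELDS` are hypothetical helper names.

```python
import json
import re

# The fields requested in the extraction prompt above.
EXPECTED_FIELDS = (
    "cve_id", "cwe_ids", "affected_products", "attack_vector",
    "privileges_required", "patch_available", "cisa_kev",
    "mitre_attack_technique",
)


def parse_extraction(raw: str) -> dict:
    """Parse a model reply into a dict containing exactly the expected fields."""
    empty = {field: None for field in EXPECTED_FIELDS}
    # Assumption: reasoning, if present, is wrapped in <think>...</think>.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Keep the outermost {...} span so code fences and stray prose are ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Drop anything outside the schema, including any severity the model volunteers.
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Filtering to the requested fields means a score the model adds on its own initiative never reaches the rest of the pipeline.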
LM Studio / Jan
Search for Voidreaper2026/qwen3-4b-cybersec-GGUF directly in the app.
Training Data
| Source | Records | Description |
|---|---|---|
| NVD | 500,935 | CVE database back to 2002 |
| OSV | 754,273 | Multi-ecosystem vulnerability DB |
| GitHub Advisory DB | 328,525 | Security advisories, CC-BY 4.0 |
| Cybersec Causal Reasoning | 99,870 | Reasoning triples |
| Security Stack Exchange | 55,930 | Real-world Q&A |
| ExploitDB | 46,457 | Public exploit database |
| Vulners | 87,063 | Exploit and advisory intelligence |
| MITRE ATT&CK | 2,205 | Techniques, mitigations, groups |
| CISA KEV | 1,587 | Known Exploited Vulnerabilities |
| Kali Linux Tools | 790 | Tool descriptions and flags |
| Total (deduplicated) | 1,807,941 | |
The Problem This Pipeline Solves
In practice, LLMs tend to inflate CVE severity scores. This is a field-wide tendency, not something specific to this model.
It has nothing to do with training data quality. It is structural:
- Pre-training data is skewed by nature. The internet massively over-represents Critical and High CVEs. Nobody publishes a detailed breakdown of a CVSS 4.2. Every LLM inherits this bias from pre-training, before any fine-tuning happens.
- NVD base scores are worst-case by design. CVSS base scores assume no mitigating controls, full network exposure, and worst-case environment. A legitimate 9.8 in the database might realistically be a 4.0 in most real deployments.
- Instruction tuning pushes toward caution. RLHF rewards thorough, safety-conscious answers. In a security context that trains a bias toward worst-case severity framing.
The pipeline below bypasses this entirely by ensuring severity scores are always retrieved from source data, never generated from model weights.
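A CVSS v3.1 base score is a deterministic function of its vector string, which is why it can be retrieved or recomputed rather than recalled. The sketch below follows the base-score equations and metric weights of the FIRST CVSS v3.1 specification. `base_score` is an illustrative helper, it ignores temporal and environmental metrics, and it should be checked against the spec before being relied on.

```python
import math

# Metric weights for CVSS v3.1 base metrics.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:.../...' vector string."""
    # Skip the 'CVSS:3.1' prefix and map each metric name to its value.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total, 10) if changed else min(total, 10))
```

With this helper, `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H` scores 9.8, while the same vector with `AV:L` scores 8.4. Changing the attack vector is only a stand-in here for how deployment context moves a score. The specification's environmental metrics are the proper mechanism for deployment-specific adjustment.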
Recommended Architecture: Grounded Triage Pipeline
User Query
|
v
+------------------------------------------------------------------+
| qwen3-4b-cybersec (Extraction Layer) |
| |
| Fast, cheap, runs fully local on CPU or AMD/NVIDIA GPU. |
| Extracts CVE IDs, CWE types, affected products, attack surface. |
| Does NOT output severity scores or CVSS values. |
+-----------------------------+------------------------------------+
| Structured: CVE IDs, CWEs, products
v
+------------------------------------------------------------------+
| RAG Retrieval Layer |
| |
| Vector search over embedded cybersec-master-dataset. |
| Returns verbatim CVSS vectors, KEV status, ATT&CK mappings. |
+-----------------------------+------------------------------------+
|
| If CVE not in index:
v
+------------------------------------------------------------------+
| Web Search Fallback (No-RAG Path) |
| |
| Live lookups: NVD API, CISA KEV, CVE.mitre.org, vendor |
| advisories. Output tagged source: web_search_backed. |
+-----------------------------+------------------------------------+
| Retrieved context
v
+------------------------------------------------------------------+
| Large Model (Triage and Synthesis Layer) |
| |
| Operates on retrieved context only — never on weights. |
| Contextualises severity for the user's actual environment. |
| Flags confidence: rag_backed / web_search_backed / |
| model_generated (treat with caution). |
+------------------------------------------------------------------+
Why each component earns its place
qwen3-4b-cybersec is the economical workhorse. Entity extraction, CWE classification, and query structuring are exactly what a fine-tuned 4B model excels at. It runs fast, cheap, and fully local, including on AMD GPUs via llama.cpp, and it is kept out of the scoring loop entirely.
RAG retrieval is the score source. Retrieving CVSS vectors verbatim from the source dataset completely bypasses the inflation problem regardless of which LLM you use.
Web search covers the temporal gap. Zero-days and post-training CVEs get live NVD API lookups, tagged so downstream systems know the data was not RAG-backed.
The large model synthesises, never invents. Given grounded context, it contextualises risk for the user's environment without ever recalling a score from weights.
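The routing between these components can be sketched in a few lines. This is a minimal illustration, not the author's implementation: `rag_lookup` and `web_lookup` are hypothetical callables standing in for the vector index and the live NVD lookup, each returning a dict with `cvss_vector` and `cvss_score`, or `None` on a miss.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A lookup takes a CVE ID and returns retrieved data, or None if nothing was found.
Lookup = Callable[[str], Optional[dict]]


@dataclass
class GroundedRecord:
    cve_id: str
    source: str  # rag_backed, web_search_backed, or model_generated
    cvss_vector: Optional[str]
    cvss_score: Optional[float]


def ground_cve(cve_id: str, rag_lookup: Lookup, web_lookup: Lookup) -> GroundedRecord:
    """Resolve severity data for one CVE ID, preferring the local index."""
    for source, lookup in (("rag_backed", rag_lookup),
                           ("web_search_backed", web_lookup)):
        hit = lookup(cve_id)
        if hit is not None:
            return GroundedRecord(cve_id, source,
                                  hit.get("cvss_vector"), hit.get("cvss_score"))
    # No retrieval source: leave the score empty rather than letting a model guess.
    return GroundedRecord(cve_id, "model_generated", None, None)
```

The synthesis model then receives only `GroundedRecord` values, so any score it discusses carries a source tag that downstream systems can check.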
Confidence Flagging
| Flag | Meaning | Trust level |
|---|---|---|
| rag_backed | Score retrieved verbatim from dataset index | High |
| web_search_backed | Score fetched live from NVD API or vendor advisory | High |
| model_generated | No retrieval source found, model inference only | Low, verify manually |
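Downstream consumers can use the flag as a gate. The sketch below assumes each record is a dict with a `source` key holding one of the three flags and a `cvss_score` key. `usable_for_automation` and `split_by_trust` are hypothetical names chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Trust level per confidence flag, mirroring the table above.
TRUST = {
    "rag_backed": "high",
    "web_search_backed": "high",
    "model_generated": "low",
}


def usable_for_automation(record: dict) -> bool:
    """True only when a score is present and came from a retrieval source."""
    trusted = TRUST.get(record.get("source"), "low") == "high"
    return trusted and record.get("cvss_score") is not None


def split_by_trust(records: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate records safe for automated handling from those needing review."""
    automated, manual = [], []
    for record in records:
        (automated if usable_for_automation(record) else manual).append(record)
    return automated, manual
```

Unknown or missing flags are treated as low trust, so a malformed record falls into the manual queue instead of an automated workflow.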
RAG Implementation Notes
Embedding the dataset
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("Voidreaper2026/cybersec-master-dataset", split="train")
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed at record level to preserve CVSS vector coherence
def get_embed_text(record):
    convs = record["conversations"]
    assistant_turn = next((c["value"] for c in convs if c["from"] == "gpt"), "")
    return f"{record.get('cve_id', '')} {assistant_turn}"
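Once records are embedded, retrieval is a nearest-neighbour search over those vectors. The following is a brute-force cosine-similarity sketch in pure Python, using toy vectors in place of real encoder output. `best_match` is a hypothetical helper, and the `min_score` threshold of 0.5 is an arbitrary placeholder that would need tuning. A production index would use a vector database or an approximate-nearest-neighbour library instead.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query: Sequence[float],
               index: List[Tuple[str, Sequence[float]]],
               min_score: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (record_id, score) for the closest vector, or None below the threshold."""
    best_id, best_score = None, -1.0
    for record_id, vector in index:
        score = cosine(query, vector)
        if score > best_score:
            best_id, best_score = record_id, score
    if best_id is None or best_score < min_score:
        return None  # treat as "not in index" and fall through to the web lookup
    return best_id, best_score
```

Returning `None` for weak matches matters here: a low-similarity hit would attach the wrong record's CVSS vector to a query, which is worse than falling back to a live lookup.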
NVD API fallback
import httpx


async def nvd_lookup(cve_id: str) -> dict:
    """Fetch one CVE from the NVD API 2.0 and return its CVSS v3.x data."""
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    async with httpx.AsyncClient() as client:
        r = await client.get(url, params={"cveId": cve_id}, timeout=10)
        r.raise_for_status()
        data = r.json()

    vulns = data.get("vulnerabilities", [])
    if not vulns:
        return {"source": "web_search_backed", "found": False, "cve_id": cve_id}

    cve = vulns[0]["cve"]
    metrics = cve.get("metrics", {})
    # Prefer CVSS v3.1, then v3.0; either list may be missing or empty.
    entries = metrics.get("cvssMetricV31") or metrics.get("cvssMetricV30") or [{}]
    cvss_data = entries[0].get("cvssData", {})
    # Prefer the English description; fall back to the first one listed.
    descriptions = cve.get("descriptions") or [{}]
    description = next(
        (d.get("value", "") for d in descriptions if d.get("lang") == "en"),
        descriptions[0].get("value", ""),
    )
    return {
        "source": "web_search_backed",
        "found": True,
        "cve_id": cve_id,
        "cvss_score": cvss_data.get("baseScore"),
        "cvss_vector": cvss_data.get("vectorString"),
        "severity": cvss_data.get("baseSeverity"),
        "description": description,
        "published": cve.get("published"),
    }
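Live lookups against the NVD API are rate limited, so a batch of fallbacks needs pacing. The sketch below is a small rolling-window limiter. The default of 5 calls per 30 seconds reflects NVD's published limit for requests without an API key at the time of writing and should be verified against current NVD documentation. `RollingWindowLimiter` is a hypothetical helper, not part of httpx or any NVD client.

```python
import time
from collections import deque
from typing import Callable, Deque


class RollingWindowLimiter:
    """Allow at most max_calls within any window_s-second rolling window."""

    def __init__(self, max_calls: int = 5, window_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Discard timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self._clock())
```

In an async caller, this would typically be used as `await asyncio.sleep(limiter.wait_time())` before each `nvd_lookup` call, followed by `limiter.record()`.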
Intended Use
- SOC L1/L2 assistant tooling within the pipeline architecture above
- Structured CVE entity extraction as a preprocessing step
- Vulnerability report drafting and summarisation
- Security awareness training content generation
- CTF hint generation and write-up assistance
Out of Scope
- Standalone authoritative CVSS scoring from model output alone
- Automated patch prioritisation without RAG retrieval or NVD API verification
- Any workflow where model-generated severity feeds directly into SLA enforcement
These constraints apply equally to all LLMs used for CVE scoring.
Licence
Apache 2.0. Training data sources retain their individual licences — see the dataset card for full attribution.