Qwen2.5-1.5B — Slips IDS Unified Security Analyst (v2)

Model Description

A fine-tuned version of Qwen2.5-1.5B-Instruct specialized for three complementary security analysis tasks on network incidents from Slips IDS — all in a single adapter:

  1. Summarization: translating technical Slips DAG alert logs into clear, human-readable incident summaries with per-event severity labels (CRITICAL / HIGH / MEDIUM / LOW / INFO)
  2. Cause Analysis: identifying the likely cause (malicious activity, misconfiguration, or legitimate behavior) with structured reasoning
  3. Risk Assessment: producing a calibrated risk level, business impact, likelihood of malicious activity, and investigation priority

Slips is a network intrusion detection system that generates DAG-structured alert logs — chains of related security events per source IP per time window. This unified model handles the full analyst pipeline in one inference call or as separate targeted queries.

This model merges the capabilities of stratosphere/qwen2.5-1.5b-slips-immune-summarization and stratosphere/qwen2.5-1.5b-slips-immune-risk into a single fine-tuned adapter trained jointly on all three tasks.


Quick Start

Ollama (Recommended)

ollama run harpomaxx/qwen2.5-1.5b-slips-immune-unified-v2

Python (Transformers)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "harpomaxx/qwen2.5-1.5b-slips-immune-unified-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# --- Task 1: Summarization ---
summary_prompt = """You are a security analyst. Your task is to translate technical security events into clear, concise, human-readable summaries and assess their severity.

INCIDENT METADATA:
- Incident ID: {incident_id}
- Source IP: {source_ip}
- Timewindow: {timewindow}
- Accumulated Threat Level: {threat_level}
- Time Range: {start} to {end}
- Total Events: {count}

RAW EVENTS:
{dag_analysis}

YOUR TASK:
1. Transform technical event descriptions into clear, readable summaries
2. Group identical or similar events
3. Assess severity (CRITICAL/HIGH/MEDIUM/LOW/INFO)
4. Calculate overall severity breakdown

OUTPUT FORMAT:
============================================================
Incident: <incident_id>
Source IP: <source_ip> | Timewindow: <timewindow>
Timeline: <start> to <end>
Threat Level: <threat_level> | Events: <count>

• HH:MM-HH:MM - [Your clear grouped summary] [SEVERITY]
• HH:MM - [Your clear summary] [SEVERITY]

Total Evidence: <count> events
Severity breakdown: [e.g., "High: 5, Medium: 3, Info: 2"]"""

# --- Task 2: Cause Analysis ---
cause_prompt = """You are a cybersecurity analyst. Analyze the following network security incident and provide a structured analysis of possible causes.

INCIDENT METADATA:
- Incident ID: {incident_id}
- Source IP: {source_ip}
- Accumulated Threat Level: {threat_level}

SECURITY EVIDENCE:
{dag_analysis}

Output Requirements:
- Respond with ONLY the analysis content

**Possible Causes:**

**1. Malicious Activity:**
• [Specific attack technique]

**2. Legitimate Activity:**
• [Benign operational cause]

**3. Misconfigurations:**
• [Technical misconfigurations]

**Conclusion:** [Assessment of most likely cause category]"""

# --- Task 3: Risk Assessment ---
risk_prompt = """You are a cybersecurity analyst. Analyze the following network security incident and provide a structured risk assessment.

INCIDENT METADATA:
- Incident ID: {incident_id}
- Source IP: {source_ip}
- Accumulated Threat Level: {threat_level}

SECURITY EVIDENCE:
{dag_analysis}

**Risk Level:** [Critical/High/Medium/Low]

**Justification:** [Technical justification]

**Business Impact:** [Single clear sentence describing business effect]

**Likelihood of Malicious Activity:** [High/Medium/Low] - [Brief rationale]

**Investigation Priority:** [Immediate/High/Medium/Low] - [Brief justification]"""

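def run_task(prompt, max_new_tokens=512):
    """Run one filled-in task prompt and return only the generated text."""
    messages = [{"role": "user", "content": prompt}]
    # return_dict=True returns input_ids together with the attention mask
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True, return_dict=True
    ).to(model.device)
    # Greedy decoding keeps the structured output deterministic
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Skip the prompt tokens so only the new answer is decoded
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)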
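
The templates above contain `{placeholder}` fields that have to be filled in before a prompt is passed to `run_task`. The following is a minimal sketch of one way to do that, assuming the incident data is available as a plain dict. The helper `fill_prompt`, the short demo template, and all sample values are hypothetical and not part of the released code.

```python
from string import Formatter


def fill_prompt(template: str, incident: dict) -> str:
    """Fill a prompt template, failing loudly if a field is missing."""
    # Collect every {field} name the template expects
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - incident.keys()
    if missing:
        raise KeyError(f"missing incident fields: {sorted(missing)}")
    # Keys in `incident` that the template does not use are ignored
    return template.format(**incident)


# Hypothetical incident values, for illustration only
incident = {
    "incident_id": "example-001",
    "source_ip": "192.0.2.10",
    "threat_level": "15.2",
    "dag_analysis": "<Slips DAG text goes here>",
}

# Short stand-in template so the sketch runs without the full prompts
demo_template = (
    "Incident {incident_id} from {source_ip} "
    "(threat {threat_level}):\n{dag_analysis}"
)
demo_prompt = fill_prompt(demo_template, incident)
print(demo_prompt)
```

With the model loaded, a single task would then be run as `run_task(fill_prompt(cause_prompt, incident))`. The summarization template additionally expects `timewindow`, `start`, `end`, and `count`.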

Training Details

Dataset

  • Source: 750 incidents from real Slips IDS network captures (675 train / 75 eval incidents)
  • Tasks: Three tasks per incident — summarization (S), cause analysis (A), risk assessment (B) — interleaved
  • Responses: 4 model responses per incident per task (GPT-4o, GPT-4o-mini, Qwen2.5 3B, Qwen2.5 1.5B)
  • Selection: Best-of-N — highest-scoring response selected via LLM-as-judge
  • Filtering: Responses with judge score < 4 discarded
  • Split: 2195 train / 225 eval records (augmented with 85 risk-only extra samples, seed=42)
  • Dataset: stratosphere/immune-unified-sft-dataset
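
The selection and filtering steps above can be sketched as follows. This is an illustrative reconstruction, not the project's actual pipeline code: the record keys (`incident_id`, `task`, `model`, `score`, `text`) and the tie-breaking rule (first response seen wins) are assumptions.

```python
MIN_JUDGE_SCORE = 4  # responses scoring below this are discarded


def select_best(responses):
    """Keep the top-scoring response per (incident, task), dropping low scores."""
    best = {}
    for r in responses:
        if r["score"] < MIN_JUDGE_SCORE:
            continue  # filtered out regardless of rank
        key = (r["incident_id"], r["task"])
        # Strict '>' means the first response seen wins a tie (assumption)
        if key not in best or r["score"] > best[key]["score"]:
            best[key] = r
    return best


# Hypothetical judge scores for two incidents
candidates = [
    {"incident_id": "i1", "task": "S", "model": "gpt-4o", "score": 7, "text": "..."},
    {"incident_id": "i1", "task": "S", "model": "gpt-4o-mini", "score": 8, "text": "..."},
    {"incident_id": "i2", "task": "B", "model": "qwen2.5-3b", "score": 3, "text": "..."},
]
selected = select_best(candidates)
for (incident_id, task), r in selected.items():
    print(incident_id, task, r["model"], r["score"])
```

In this sample, incident `i2` contributes no training record for task B because its only candidate falls below the score threshold.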

Training Procedure

Parameter         Value
Base Model        unsloth/Qwen2.5-1.5B-Instruct
Training Method   SFT (Supervised Fine-Tuning)
Framework         Unsloth + TRL SFTTrainer
LoRA Rank (r)     128
LoRA Alpha        128
LoRA Dropout      0.0
RSLoRA            Enabled (required at r=64)
LoRA Targets      q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Sequence Length   4096
Batch Size        1 (effective: 16 via gradient accumulation)
Learning Rate     2e-5
LR Scheduler      Cosine
Warmup Steps      30
Weight Decay      0.01
Epochs            2
Optimizer         adamw_8bit
Precision         BF16
Quantization      4-bit (QLoRA)
Hardware          A100 80GB (MIG 20GB slice)
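
Two of the settings above follow from simple arithmetic, shown here as a small worked example. Rank-stabilized LoRA scales the adapter update by alpha / sqrt(r) instead of the classic alpha / r. The gradient accumulation figure assumes a single device, which the card does not state explicitly.

```python
import math

r = 128                # LoRA rank from the table
alpha = 128            # LoRA alpha from the table
per_device_batch = 1   # batch size from the table
effective_batch = 16   # effective batch size from the table

standard_scale = alpha / r           # classic LoRA scaling factor
rslora_scale = alpha / math.sqrt(r)  # rank-stabilized LoRA scaling factor

# Assumption: one device, so accumulation alone makes up the effective batch
grad_accum_steps = effective_batch // per_device_batch

print(f"standard LoRA scale: {standard_scale:.2f}")
print(f"RSLoRA scale:        {rslora_scale:.2f}")
print(f"gradient accumulation steps: {grad_accum_steps}")
```

At rank 128 the classic factor is 1.0, while the rank-stabilized factor is about 11.31, so the adapter update is not shrunk as the rank grows.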

Training Results

Step  Epoch  Eval Loss
50    0.57   0.8047
100   1.12   0.7594
150   1.69   0.7426
200   2.25   0.7327
250   2.82   0.7293 ← best

Eval loss decreased monotonically across all checkpoints with no sign of overfitting.


Evaluation Results

Summarization Task

Evaluated on 47 held-out Slips IDS incidents using gpt-oss-120b as an independent LLM-as-judge.

Rank  Model                      Avg Score  Win Rate
1     GPT-4o-mini                6.89/10    42.6%
2     GPT-4o                     5.87/10    29.8%
3     Qwen2.5-1.5B (finetuned)   4.70/10    19.1%
4     Qwen2.5 3B (baseline)      4.57/10    8.5%
5     Qwen2.5 1.5B (baseline)    3.36/10    0.0%

The finetuned 1.5B model outscores both untuned baselines and achieves a 19.1% win rate, more than double the 3B baseline's 8.5%.
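
The card does not define how the average score and win rate are computed. One plausible reading, sketched here purely as an assumption, is that each incident is scored per model by the judge and the model with the highest score on that incident counts as the winner. The function name, input layout, and tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def leaderboard(scores):
    """scores: {incident_id: {model: judge_score}} -> [(model, avg, win_rate)]."""
    totals = defaultdict(float)
    counts = Counter()
    wins = Counter()
    for per_model in scores.values():
        for model, score in per_model.items():
            totals[model] += score
            counts[model] += 1
        # Assumption: one winner per incident; ties go to the first model listed
        wins[max(per_model, key=per_model.get)] += 1
    n_incidents = len(scores)
    rows = [(m, totals[m] / counts[m], wins[m] / n_incidents) for m in totals]
    # Rank by average score, highest first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical judge scores for three incidents and two models
sample_scores = {
    "i1": {"model_a": 7, "model_b": 5},
    "i2": {"model_a": 4, "model_b": 6},
    "i3": {"model_a": 8, "model_b": 3},
}
rows = leaderboard(sample_scores)
for model, avg, win_rate in rows:
    print(f"{model}: avg={avg:.2f} win_rate={win_rate:.1%}")
```

Under this definition the win rates of all models sum to 100%, which matches the tables in this section.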

Cause Analysis & Risk Assessment Tasks

Evaluated on 67 held-out Slips IDS incidents.

Rank  Model                      Avg Cause Score  Avg Risk Score  Win Rate
1     GPT-4o                     15.33            11.99           40.3%
2     Qwen2.5-1.5B (finetuned)   15.58            10.27           37.3%
3     GPT-4o-mini                15.31            11.63           19.4%
4     Qwen2.5 1.5B (baseline)    9.15             8.79            3.0%
5     Qwen2.5 3B (baseline)      7.40             9.61            0.0%

Key Finding: The finetuned model is nearly tied with GPT-4o on win rate (37.3% vs 40.3%) and edges it out on cause analysis (15.58 vs 15.33), at a fraction of the inference cost.


Known Limitations

  • Context window: Performance degrades on large incidents (≥500 events) whose DAG token count exceeds the 4096-token sequence length; such inputs are truncated.
  • Risk calibration: The model is stronger at identifying causes than calibrating risk levels (cause score 15.58 vs risk score 10.27).
  • Normal traffic: Summarization accuracy on normal (benign) traffic is lower than on incident traffic.
  • Domain: Trained exclusively on Slips IDS logs — not suitable for other IDS formats or general security tasks.
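
Given the context-window limitation, it can help to check a prompt's token count before inference. The sketch below is one possible guard, not part of the released code. It assumes the 4096-token training length is the budget to respect and reserves the 512 generated tokens used in the Quick Start. The stand-in tokenizer exists only so the example runs without downloading a model; in practice the real `tokenizer` would be passed in, and the chat template adds a few extra tokens that this check ignores.

```python
MAX_SEQ_LEN = 4096    # training sequence length from the model card
MAX_NEW_TOKENS = 512  # generation budget used in the Quick Start


def fits_context(prompt, tokenizer, max_seq_len=MAX_SEQ_LEN, reserve=MAX_NEW_TOKENS):
    """Return (fits, n_tokens) for a prompt under the given token budget."""
    n_tokens = len(tokenizer(prompt)["input_ids"])
    return n_tokens + reserve <= max_seq_len, n_tokens


class WhitespaceTokenizer:
    """Hypothetical stand-in: one token per whitespace-separated word."""

    def __call__(self, text):
        return {"input_ids": text.split()}


stub = WhitespaceTokenizer()
short_ok, short_len = fits_context("scan on port 22 detected", stub)
long_ok, long_len = fits_context("event " * 4000, stub)
print(short_ok, short_len)
print(long_ok, long_len)
```

Incidents that fail the check could be split by time window or have their DAG shortened before prompting, rather than being silently truncated.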

Intended Use

  • Automated triage of Slips IDS alerts for security analysts
  • Full pipeline: summarize → analyze cause → assess risk, in a single model
  • First-pass analysis of network incident logs as input to downstream reporting or ticketing workflows
  • Edge/on-premises deployment (RPi5, low-resource servers) via GGUF quantization
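
For the downstream reporting or ticketing use case, the risk assessment text has to be turned into structured fields. A minimal sketch of such a parser follows, assuming the model reproduces the bolded field headers from the risk prompt exactly. The function name and the sample response are hypothetical; real outputs may deviate from the format and should be validated.

```python
import re

FIELDS = (
    "Risk Level",
    "Justification",
    "Business Impact",
    "Likelihood of Malicious Activity",
    "Investigation Priority",
)


def parse_risk_assessment(text: str) -> dict:
    """Extract the bolded fields of a risk assessment into a dict."""
    result = {}
    for field in FIELDS:
        # Capture text after '**Field:**' up to the next bold header or the end
        pattern = rf"\*\*{re.escape(field)}:\*\*\s*(.*?)(?=\n\s*\*\*|\Z)"
        match = re.search(pattern, text, flags=re.DOTALL)
        # Missing fields are reported as None rather than raising
        result[field] = match.group(1).strip() if match else None
    return result


# Hypothetical model response, for illustration only
sample_response = """**Risk Level:** High

**Justification:** Repeated connections to a blacklisted IP.

**Business Impact:** Possible data exfiltration from the affected host.

**Likelihood of Malicious Activity:** High - pattern matches known C2 beaconing

**Investigation Priority:** Immediate - active outbound communication
"""

parsed = parse_risk_assessment(sample_response)
for name, value in parsed.items():
    print(f"{name}: {value}")
```

Returning `None` for absent fields lets a ticketing integration flag incomplete responses for manual review.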

Out-of-Scope Use

  • General-purpose chat or instruction following
  • Security domains outside Slips IDS / network intrusion detection
  • Non-English inputs

Model Details

  • Model Size: 1.5B parameters
  • Tensor Type: BF16
  • License: Apache-2.0

Citation

@misc{qwen2.5-1.5b-slips-unified,
  title        = {Qwen2.5-1.5B fine-tuned for unified Slips IDS security analysis},
  author       = {Stratosphere Laboratory, CTU Prague},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/harpomaxx/qwen2.5-1.5b-slips-immune-unified-v2}}
}

Acknowledgments

Supported by the NLnet Foundation as part of the IMMUNE project, promoting open internet standards and open source software.
