---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- content-safety
- content-moderation
- safety
- lora
- fine-tuned
- nvidia-aegis
- text-classification
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
metrics:
- perplexity
- accuracy
library_name: transformers
pipeline_tag: text-classification
---
# Qwen3-0.6B Fine-tuned on Aegis AI Content Safety
## Model Description
This is a fine-tuned version of Qwen3-0.6B-Base for content safety classification and moderation tasks. Qwen3 is a family of language models developed by Alibaba Cloud; the 0.6B base variant is small enough for efficient deployment.
This model was fine-tuned using **LoRA (Low-Rank Adaptation)** on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which contains diverse examples of safe and unsafe content across multiple categories.
## Model Details
- **Base Model**: [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: NVIDIA Aegis AI Content Safety Dataset 2.0
- **Training Samples**: 2,000 carefully selected samples
- **Language**: English
- **License**: Apache 2.0
## Capabilities
- Content safety classification
- Toxic content detection
- Harmful content identification
- Safety-aware text generation
- Content moderation assistance
## Intended Use Cases
- Content moderation systems
- Chat application safety filters
- User-generated content screening
- Educational content filtering
- Social media safety monitoring
## Training Configuration
### LoRA Parameters
- **Rank (r)**: 16
- **Alpha**: 32
- **Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
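
The parameters above correspond to keyword arguments of `peft.LoraConfig`. The following is a minimal sketch under stated assumptions, not the training script used for this model: `task_type="CAUSAL_LM"` is not stated in this card, and the import is guarded so the snippet still runs where `peft` is not installed.

```python
# LoRA settings from the list above, written as keyword arguments
# for peft.LoraConfig.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in this card
}

# With standard LoRA scaling, the low-rank update is multiplied by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {lora_scaling}")

try:
    from peft import LoraConfig
    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    # peft is not installed; the dict above still documents the setup.
    lora_config = None
```

With rank 16 and alpha 32 the scaling factor is 2, so the adapter update is weighted twice as strongly as it would be with alpha equal to the rank.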
### Training Hyperparameters
- **Learning Rate**: 2e-4
- **Batch Size**: 4 (per device)
- **Gradient Accumulation Steps**: 4
- **Effective Batch Size**: 16
- **Epochs**: 3
- **Optimizer**: AdamW (8-bit paged)
- **LR Scheduler**: Cosine with warmup
- **Warmup Ratio**: 0.1
- **FP16 Training**: Yes
- **Max Sequence Length**: 512
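
The relationship between these hyperparameters can be checked with a little arithmetic. The sketch below assumes a single device and that all 2,000 samples were used for training (the card does not say how the held-out split was made); rounding the warmup step count up is also an assumption.

```python
import math

num_samples = 2000         # "Training Samples"; assumes all are used for training
per_device_batch_size = 4
grad_accum_steps = 4
num_devices = 1            # assumption: single device
epochs = 3
warmup_ratio = 0.1

# Number of samples that contribute to each optimizer update.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices

# Optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Steps spent ramping the learning rate up before cosine decay begins.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size, steps_per_epoch, total_steps, warmup_steps)
# prints: 16 125 375 38
```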
## Usage
### Installation
```bash
pip install transformers torch peft
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95,
    )

# The decoded text contains the prompt followed by the model's response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
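
Because the decoded output contains the prompt as well as the completion, it can help to wrap the prompt format in small helpers. This is a sketch only: `build_prompt` and `extract_response` are hypothetical names, and the template simply mirrors the instruction/response layout used in the example above.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\nAnalyze this content for safety: '{text}'\n\n### Response:\n"
)
RESPONSE_MARKER = "### Response:\n"


def build_prompt(text: str) -> str:
    """Wrap the text to be checked in the instruction/response template."""
    return PROMPT_TEMPLATE.format(text=text)


def extract_response(prompt: str, decoded: str) -> str:
    """Return only the model's completion from decoded prompt + completion text."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback: keep whatever follows the first response marker, if any.
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


prompt = build_prompt("Hello, how are you?")
print(extract_response(prompt, prompt + "safe\n"))
```

In the example above, `extract_response(prompt, response)` would be applied to the string returned by `tokenizer.decode`.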
### Advanced Usage with Pipeline
```python
import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
)
print(result[0]['generated_text'])
```
## Evaluation
The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset. Key metrics include:
- Perplexity on validation set
- Content safety classification accuracy
- False positive/negative rates for harmful content detection
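
For reference, the metrics listed above can be computed as follows. This is an illustrative sketch, not the evaluation code used for this model; it assumes "unsafe" is the positive class, and the labels in the example are made up for demonstration rather than taken from any evaluation run.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, false positive rate and false negative rate.

    y_true / y_pred are sequences of bools where True means "unsafe".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy labels for demonstration only.
y_true = [True, True, False, False, False]
y_pred = [True, False, False, True, False]
print(classification_metrics(y_true, y_pred))
```

For a moderation filter, the false negative rate (unsafe content passed as safe) is usually the costlier error, so it is worth tracking separately from overall accuracy.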
## Limitations
- The model is primarily trained on English language content
- Performance may vary on domain-specific or highly technical content
- Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
- May require fine-tuning for specific use cases or content domains
- The model's outputs should be reviewed by human moderators for critical applications
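
One way to keep a human in the loop is to act automatically only on unambiguous outputs and send everything else to review. The sketch below is hypothetical: the `route` function and its three outcomes are illustrative, and it assumes the model's response begins with the word "safe" or "unsafe", which this card does not document.

```python
import re


def route(response: str) -> str:
    """Map a free-text model response to 'safe', 'unsafe' or 'needs_review'.

    Assumption: a usable response starts with the label itself.
    Anything else is treated as ambiguous and escalated to a human.
    """
    match = re.match(r"\s*([A-Za-z]+)", response)
    first_word = match.group(1).lower() if match else ""
    if first_word == "unsafe":
        return "unsafe"
    if first_word == "safe":
        return "safe"
    return "needs_review"


for text in ["Unsafe: contains threats", "safe", "not safe", ""]:
    print(repr(text), "->", route(text))
```

Matching only the first word is deliberately conservative: a response such as "not safe" is escalated rather than misread as "safe".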
## Ethical Considerations
- This model is designed to assist in content safety and moderation tasks
- It should not be used to censor legitimate speech or suppress diverse viewpoints
- Decisions about content moderation should involve human oversight
- The model may reflect biases present in the training data
- Users should implement appropriate safeguards and appeal processes
## Training Data
The model was fine-tuned on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which includes:
- Diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Balanced representation of safe content
- Real-world scenarios and edge cases
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{qwen3_0.6b_aegis_safety,
author = {ahczhg},
title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
```
## Acknowledgments
- Base model by the Qwen team
- Dataset provided by NVIDIA
- Fine-tuning performed using HuggingFace Transformers and PEFT libraries
## Contact
For questions, issues, or feedback, please visit the [model repository](https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora).
## Model Card Authors
- ahczhg
## Model Card Contact
- https://huggingface.co/ahczhg
[](https://ko-fi.com/ahczhg)