---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- content-safety
- content-moderation
- safety
- lora
- fine-tuned
- nvidia-aegis
- text-classification
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
metrics:
- perplexity
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# Qwen3-0.6B Fine-tuned on Aegis AI Content Safety

## Model Description

This is a fine-tuned version of Qwen3-0.6B-Base, optimized for content safety classification and moderation tasks. Qwen3-0.6B is a compact language model from Alibaba Cloud's Qwen team, designed for efficient deployment while retaining strong performance.

This model was fine-tuned using **LoRA (Low-Rank Adaptation)** on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which contains diverse examples of safe and unsafe content across multiple categories.

## Model Details

- **Base Model**: [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: NVIDIA Aegis AI Content Safety Dataset 2.0
- **Training Samples**: 2,000 carefully selected samples
- **Language**: English
- **License**: Apache 2.0

## Capabilities

- Content safety classification
- Toxic content detection
- Harmful content identification
- Safety-aware text generation
- Content moderation assistance

## Intended Use Cases

- Content moderation systems
- Chat application safety filters
- User-generated content screening
- Educational content filtering
- Social media safety monitoring

## Training Configuration

### LoRA Parameters
- **Rank (r)**: 16
- **Alpha**: 32
- **Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
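The parameters above map directly onto a PEFT `LoraConfig`. A minimal sketch (the original training script is not published, so this is a reconstruction from the values listed):

```python
from peft import LoraConfig

# LoRA configuration matching the parameters listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```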

### Training Hyperparameters
- **Learning Rate**: 2e-4
- **Batch Size**: 4 (per device)
- **Gradient Accumulation Steps**: 4
- **Effective Batch Size**: 16
- **Epochs**: 3
- **Optimizer**: AdamW (8-bit paged)
- **LR Scheduler**: Cosine with warmup
- **Warmup Ratio**: 0.1
- **FP16 Training**: Yes
- **Max Sequence Length**: 512
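These hyperparameters correspond to a `TrainingArguments` setup along the following lines (a sketch, assuming the standard HuggingFace `Trainer` workflow; the `output_dir` name is illustrative):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-aegis-lora",   # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size: 4 x 4 = 16
    num_train_epochs=3,
    optim="paged_adamw_8bit",        # 8-bit paged AdamW
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
)
```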

## Usage

### Installation

```bash
pip install transformers torch peft
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
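The instruction template used above can be wrapped in a small helper so callers don't rebuild it by hand (a sketch; `build_safety_prompt` is a hypothetical name, not part of the model or library):

```python
def build_safety_prompt(text: str) -> str:
    """Wrap user text in the instruction template shown in the usage example."""
    return (
        "### Instruction:\n"
        f"Analyze this content for safety: '{text}'\n\n"
        "### Response:\n"
    )

prompt = build_safety_prompt("Hello, how are you?")
print(prompt)
```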

### Advanced Usage with Pipeline

```python
from transformers import pipeline
import torch  # needed for torch.float16 below

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

print(result[0]['generated_text'])
```

## Evaluation

The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset. Key metrics include:

- Perplexity on validation set
- Content safety classification accuracy
- False positive/negative rates for harmful content detection
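For reference, perplexity is the exponential of the mean token-level negative log-likelihood, so it can be computed from per-token losses independently of any model (a minimal sketch):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Example: a uniform NLL of ln(2) per token gives perplexity 2
print(round(perplexity([math.log(2)] * 4), 6))  # → 2.0
```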

## Limitations

- The model is primarily trained on English language content
- Performance may vary on domain-specific or highly technical content
- Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
- May require fine-tuning for specific use cases or content domains
- The model's outputs should be reviewed by human moderators for critical applications

## Ethical Considerations

- This model is designed to assist in content safety and moderation tasks
- It should not be used to censor legitimate speech or suppress diverse viewpoints
- Decisions about content moderation should involve human oversight
- The model may reflect biases present in the training data
- Users should implement appropriate safeguards and appeal processes

## Training Data

The model was fine-tuned on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which includes:

- Diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Balanced representation of safe content
- Real-world scenarios and edge cases

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{qwen3_0.6b_aegis_safety,
  author = {ahczhg},
  title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
  note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
```

## Acknowledgments

- Base model by the Qwen team at Alibaba Cloud
- Dataset provided by NVIDIA
- Fine-tuning performed using HuggingFace Transformers and PEFT libraries

## Contact

For questions, issues, or feedback, please visit the [model repository](https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora).

## Model Card Authors

- ahczhg

## Model Card Contact

- https://huggingface.co/ahczhg

[![Support me on Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/ahczhg)