---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- content-safety
- content-moderation
- safety
- lora
- fine-tuned
- nvidia-aegis
- text-classification
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
metrics:
- perplexity
- accuracy
library_name: transformers
pipeline_tag: text-classification
---
# Qwen3-0.6B Fine-tuned on Aegis AI Content Safety
## Model Description
This model is a fine-tuned version of Qwen3-0.6B-Base, adapted for content safety classification and moderation tasks. Qwen3-0.6B-Base is a compact base language model from the Qwen team at Alibaba Cloud; its small size makes it practical to deploy where compute is limited.
This model was fine-tuned using **LoRA (Low-Rank Adaptation)** on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which contains diverse examples of safe and unsafe content across multiple categories.
## Model Details
- **Base Model**: [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: NVIDIA Aegis AI Content Safety Dataset 2.0
- **Training Samples**: 2,000 carefully selected samples
- **Language**: English
- **License**: Apache 2.0
## Capabilities
- Content safety classification
- Toxic content detection
- Harmful content identification
- Safety-aware text generation
- Content moderation assistance
## Intended Use Cases
- Content moderation systems
- Chat application safety filters
- User-generated content screening
- Educational content filtering
- Social media safety monitoring
## Training Configuration
### LoRA Parameters
- **Rank (r)**: 16
- **Alpha**: 32
- **Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
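
The parameters above map onto a PEFT `LoraConfig`. The following is a minimal sketch, not the author's training script: `task_type="CAUSAL_LM"` is an assumption (the card does not state it), and `LORA_KWARGS` and `build_lora_config` are hypothetical names used only for illustration.

```python
# Values copied from the "LoRA Parameters" list above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Assumption: not stated in the card; a common choice for causal LM fine-tuning.
    "task_type": "CAUSAL_LM",
}


def build_lora_config():
    """Build a peft.LoraConfig from the values above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft
    return LoraConfig(**LORA_KWARGS)


# LoRA scales the low-rank update by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(f"LoRA scaling factor: {scaling}")
```

With rank 16 and alpha 32, the adapter update is scaled by a factor of 2.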
### Training Hyperparameters
- **Learning Rate**: 2e-4
- **Batch Size**: 4 (per device)
- **Gradient Accumulation Steps**: 4
- **Effective Batch Size**: 16
- **Epochs**: 3
- **Optimizer**: AdamW (8-bit paged)
- **LR Scheduler**: Cosine with warmup
- **Warmup Ratio**: 0.1
- **FP16 Training**: Yes
- **Max Sequence Length**: 512
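
To make the schedule concrete, the sketch below derives the step counts implied by these hyperparameters and evaluates a cosine-with-warmup learning rate. It rests on two assumptions the card does not confirm: that all 2,000 listed samples are used for training, and that training ran on a single device. The helper `lr_at` is a hypothetical name, and its formula is the standard linear-warmup plus cosine-decay shape rather than a copy of the actual training code.

```python
import math

NUM_SAMPLES = 2_000    # assumption: every listed sample is used for training
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1        # assumption: single-device run
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under those assumptions the run has 375 optimizer steps, of which the first 38 are warmup.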
## Usage
### Installation
```bash
pip install transformers torch peft accelerate
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

# The decoded text contains the prompt followed by the model's response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
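
Because `generate` returns the prompt tokens followed by the new tokens, the decoded string in the example above begins with the prompt itself. The sketch below shows one way to build the prompt from the template used in this card and to strip it from the decoded output. `build_prompt` and `extract_response` are hypothetical helper names, not part of any library.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Analyze this content for safety: '{text}'\n\n"
    "### Response:\n"
)
RESPONSE_MARKER = "### Response:"


def build_prompt(text: str) -> str:
    """Fill the instruction template with the content to analyze."""
    return PROMPT_TEMPLATE.format(text=text)


def extract_response(decoded: str, prompt: str) -> str:
    """Return only the model's continuation from a decoded generation."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback: keep everything after the first response marker, if present.
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()
```

In the example above, this would be called as `extract_response(response, prompt)` after decoding.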
### Advanced Usage with Pipeline
```python
import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)
print(result[0]['generated_text'])
```
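
The model answers in free text, so a moderation pipeline typically needs to reduce the response to a label. The card does not document the output format, so the sketch below is a keyword heuristic under the assumption that responses mention "safe" or "unsafe". `parse_safety_label` is a hypothetical helper, and anything it cannot classify is returned as `"unknown"` so that it can be routed to human review.

```python
import re

# Assumption: the response states its verdict with one of these phrases.
UNSAFE_PATTERN = re.compile(r"\b(unsafe|not safe)\b")
SAFE_PATTERN = re.compile(r"\bsafe\b")


def parse_safety_label(response: str) -> str:
    """Map a free-text response to 'safe', 'unsafe', or 'unknown'."""
    text = response.lower()
    # Check the unsafe phrases first so "not safe" is not read as "safe".
    if UNSAFE_PATTERN.search(text):
        return "unsafe"
    if SAFE_PATTERN.search(text):
        return "safe"
    return "unknown"
```

For reproducible labels, greedy decoding (`do_sample=False`) may be preferable to sampling; this is a suggestion rather than something the card specifies.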
## Evaluation
The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset 2.0. Key metrics include:
- Perplexity on validation set
- Content safety classification accuracy
- False positive/negative rates for harmful content detection
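
The metrics listed above can be computed as in the following sketch. It is not the author's evaluation script: the function names are hypothetical, "unsafe" is assumed to be the positive class, and the labels in the usage lines are made-up illustrative values, not results for this model.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll)


def classification_metrics(y_true, y_pred, positive="unsafe"):
    """Accuracy, false positive rate, and false negative rate for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of safe items wrongly flagged as unsafe.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of unsafe items that were missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Illustrative labels only (not measurements from this model).
y_true = ["unsafe", "unsafe", "safe", "safe", "safe"]
y_pred = ["unsafe", "safe", "safe", "unsafe", "safe"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For a moderation filter, the false negative rate (unsafe content that slips through) and the false positive rate (safe content that is blocked) usually matter more than raw accuracy, since the two error types carry different costs.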
## Limitations
- The model is primarily trained on English language content
- Performance may vary on domain-specific or highly technical content
- Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
- May require fine-tuning for specific use cases or content domains
- The model's outputs should be reviewed by human moderators for critical applications
## Ethical Considerations
- This model is designed to assist in content safety and moderation tasks
- It should not be used to censor legitimate speech or suppress diverse viewpoints
- Decisions about content moderation should involve human oversight
- The model may reflect biases present in the training data
- Users should implement appropriate safeguards and appeal processes
## Training Data
The model was fine-tuned on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which includes:
- Diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Balanced representation of safe content
- Real-world scenarios and edge cases
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{qwen3_0.6b_aegis_safety,
author = {ahczhg},
title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
```
## Acknowledgments
- Base model by the Qwen team at Alibaba Cloud
- Dataset provided by NVIDIA
- Fine-tuning performed using HuggingFace Transformers and PEFT libraries
## Contact
For questions, issues, or feedback, please visit the [model repository](https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora).
## Model Card Authors
- ahczhg
## Model Card Contact
- https://huggingface.co/ahczhg
[![Support me on Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/ahczhg)