---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- content-safety
- content-moderation
- safety
- lora
- fine-tuned
- nvidia-aegis
- text-classification
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
metrics:
- perplexity
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# Qwen3-0.6B Fine-tuned on Aegis AI Content Safety

## Model Description

This model is a fine-tuned version of Qwen3-0.6B-Base, adapted for content safety classification and moderation tasks. Qwen3-0.6B-Base is a compact language model from the Qwen3 family developed by Alibaba Cloud, small enough for efficient deployment.

It was fine-tuned using **LoRA (Low-Rank Adaptation)** on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which contains diverse examples of safe and unsafe content across multiple categories.

## Model Details

- **Base Model**: [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: NVIDIA Aegis AI Content Safety Dataset 2.0
- **Training Samples**: 2,000 samples selected from the dataset
- **Language**: English
- **License**: Apache 2.0

## Capabilities

- Content safety classification
- Toxic content detection
- Harmful content identification
- Safety-aware text generation
- Content moderation assistance

## Intended Use Cases

- Content moderation systems
- Chat application safety filters
- User-generated content screening
- Educational content filtering
- Social media safety monitoring

## Training Configuration

### LoRA Parameters
- **Rank (r)**: 16
- **Alpha**: 32
- **Dropout**: 0.05
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

### Training Hyperparameters
- **Learning Rate**: 2e-4
- **Batch Size**: 4 (per device)
- **Gradient Accumulation Steps**: 4
- **Effective Batch Size**: 16
- **Epochs**: 3
- **Optimizer**: AdamW (8-bit paged)
- **LR Scheduler**: Cosine with warmup
- **Warmup Ratio**: 0.1
- **FP16 Training**: Yes
- **Max Sequence Length**: 512

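The hyperparameters above imply a short training schedule. The sketch below works through the arithmetic under two assumptions this card does not state: a single training device, and all 2,000 samples being used for training. Rounding the warmup step count up with `math.ceil` is likewise an assumption about how the ratio is converted into steps.

```python
import math

# Values taken from this card.
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 3
warmup_ratio = 0.1
num_train_samples = 2_000

# Assumption: one device (the card does not state the device count).
num_devices = 1

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)

# Optimizer steps per epoch and over the whole run.
steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

# Steps spent ramping the learning rate up before cosine decay begins.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
print(f"warmup steps:         {warmup_steps}")
```

Under these assumptions the run is 125 optimizer steps per epoch and 375 steps in total, with the first 38 used for warmup.
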
## Usage

### Installation

```bash
pip install transformers torch peft
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Content safety check
prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        top_p=0.95
    )

# Decode the full sequence (the prompt followed by the generated response)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Advanced Usage with Pipeline

```python
import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate safety analysis
result = generator(
    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

# The generated text includes the prompt followed by the model's response
print(result[0]['generated_text'])
```

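Both examples above print the prompt together with the generated continuation. The helpers below are a minimal sketch for building the instruction prompt and keeping only the model's answer. The names `build_prompt` and `extract_response` are hypothetical, and the template is inferred from the examples above rather than from a documented training format.

```python
RESPONSE_MARKER = "### Response:\n"


def build_prompt(text: str) -> str:
    """Wrap user text in the instruction template used in the examples above."""
    return (
        "### Instruction:\n"
        f"Analyze this content for safety: '{text}'\n\n"
        f"{RESPONSE_MARKER}"
    )


def extract_response(generated_text: str, prompt: str) -> str:
    """Return the generated continuation without the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: decoding can alter whitespace, so split on the first marker.
    _, marker, after = generated_text.partition(RESPONSE_MARKER.strip())
    return after.strip() if marker else generated_text.strip()


prompt = build_prompt("Hello, how are you?")
# Made-up stand-in for decoded model output; not a real generation.
simulated_output = prompt + "This content is safe.\n"
print(extract_response(simulated_output, prompt))
```

In the pipeline example, passing `return_full_text=False` to the `generator(...)` call is an alternative that returns only the continuation.
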
## Evaluation

The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset 2.0. The metrics tracked were:

- Perplexity on the validation set
- Content safety classification accuracy
- False positive and false negative rates for harmful content detection

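For reference, the metrics listed above can be computed as follows. This is a minimal sketch, not the evaluation script used for this model. It assumes binary labels where `True` means unsafe (the positive class) and per-token negative log-likelihoods measured in nats. The function names and sample inputs are illustrative only.

```python
import math
from typing import Sequence


def classification_rates(y_true: Sequence[bool], y_pred: Sequence[bool]) -> dict:
    """Accuracy, false positive rate, and false negative rate (True = unsafe)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Safe content wrongly flagged as unsafe.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Unsafe content wrongly passed as safe.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def perplexity(token_nlls: Sequence[float]) -> float:
    """Exponential of the mean per-token negative log-likelihood."""
    if not token_nlls:
        raise ValueError("token_nlls must be non-empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Illustrative inputs only; these are not results for this model.
labels = [True, True, False, False, False]
predictions = [True, False, False, True, False]
rates = classification_rates(labels, predictions)
print(rates)
print(perplexity([2.0, 2.0, 2.0]))
```

For a moderation filter, the false negative rate (unsafe content that passes as safe) is usually the figure to watch most closely, which is why it is reported separately from overall accuracy.
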
## Limitations

- The model was trained primarily on English-language content
- Performance may vary on domain-specific or highly technical content
- It should be used as part of a comprehensive content moderation system, not as the sole decision-maker
- It may require further fine-tuning for specific use cases or content domains
- Outputs should be reviewed by human moderators in critical applications

## Ethical Considerations

- This model is designed to assist in content safety and moderation tasks
- It should not be used to censor legitimate speech or suppress diverse viewpoints
- Decisions about content moderation should involve human oversight
- The model may reflect biases present in the training data
- Users should implement appropriate safeguards and appeal processes

## Training Data

The model was fine-tuned on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which includes:

- Diverse examples of safe and unsafe content
- Multiple categories of potentially harmful content
- Balanced representation of safe content
- Real-world scenarios and edge cases

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{qwen3_0.6b_aegis_safety,
  author = {ahczhg},
  title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
  note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
}
```

## Acknowledgments

- Base model by its original authors, the Qwen team
- Dataset provided by NVIDIA
- Fine-tuning performed using the HuggingFace Transformers and PEFT libraries

## Contact

For questions, issues, or feedback, please visit the [model repository](https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora).

## Model Card Authors

- ahczhg

## Model Card Contact

- https://huggingface.co/ahczhg
