	---
	language:
	- en
	license: apache-2.0
	base_model: Qwen/Qwen3-0.6B-Base
	tags:
	- content-safety
	- content-moderation
	- safety
	- lora
	- fine-tuned
	- nvidia-aegis
	- text-classification
	datasets:
	- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
	metrics:
	- perplexity
	- accuracy
	library_name: transformers
	pipeline_tag: text-classification
	---

	# Qwen3-0.6B Fine-tuned on Aegis AI Content Safety

	## Model Description

	This is a fine-tuned version of Qwen3-0.6B-Base for content safety classification and moderation tasks. Qwen3-0.6B-Base is the 0.6B-parameter pretrained model in the Qwen3 family from the Qwen team at Alibaba Cloud; its small size makes it practical to deploy where memory or latency is constrained.

	This model was fine-tuned using LoRA (Low-Rank Adaptation) on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which contains diverse examples of safe and unsafe content across multiple categories.

	## Model Details

	- Base Model: [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
	- Training Samples: 2,000 samples selected from the dataset
	- Language: English
	- License: Apache 2.0

	## Capabilities

	- Content safety classification
	- Toxic content detection
	- Harmful content identification
	- Safety-aware text generation
	- Content moderation assistance

	## Intended Use Cases

	- Content moderation systems
	- Chat application safety filters
	- User-generated content screening
	- Educational content filtering
	- Social media safety monitoring

	## Training Configuration

	### LoRA Parameters
	- Rank (r): 16
	- Alpha: 32
	- Dropout: 0.05
	- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
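
For a sense of scale, the adapter size implied by these settings can be estimated by hand. LoRA adds two low-rank matrices per adapted weight, A (r × d_in) and B (d_out × r), so each target module contributes r · (d_in + d_out) trainable parameters, and the learned update is scaled by alpha / r. The sketch below assumes the layer shapes from the public Qwen3-0.6B configuration (hidden size 1024, 28 layers, 16 query heads and 8 key/value heads of dimension 128, MLP width 3072). Those shapes are not stated in this card, so treat the totals as an estimate.

```python
# Rough estimate of LoRA adapter size for the settings listed above.
# ASSUMPTION: layer shapes come from the public Qwen3-0.6B config,
# not from this model card.
R = 16
ALPHA = 32

HIDDEN = 1024
NUM_LAYERS = 28
Q_DIM = 16 * 128    # query heads * head_dim
KV_DIM = 8 * 128    # key/value heads * head_dim
MLP_DIM = 3072

# (d_in, d_out) for each target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R

print(f"scaling factor alpha/r: {scaling}")
print(f"trainable LoRA parameters per layer: {per_layer:,}")
print(f"trainable LoRA parameters in total: {total:,}")
```

Under these assumptions the adapter has roughly 10 million trainable parameters, a small fraction of the 0.6B-parameter base model.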

	### Training Hyperparameters
	- Learning Rate: 2e-4
	- Batch Size: 4 (per device)
	- Gradient Accumulation Steps: 4
	- Effective Batch Size: 16
	- Epochs: 3
	- Optimizer: AdamW (8-bit paged)
	- LR Scheduler: Cosine with warmup
	- Warmup Ratio: 0.1
	- FP16 Training: Yes
	- Max Sequence Length: 512
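
The step counts implied by these hyperparameters can be worked out directly. The sketch below assumes a single-device run in which all 2,000 samples are used for training; the card states neither, and a held-out split would lower the numbers. The `lr_at` function is a hypothetical helper that shows the usual shape of a linear-warmup, cosine-decay schedule, and is not necessarily the exact scheduler implementation used for this model.

```python
import math

# Values from the hyperparameter list above.
TRAIN_SAMPLES = 2_000     # ASSUMPTION: every sample is used for training
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1           # ASSUMPTION: single-device run
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"warmup steps: {warmup_steps}")
print(f"learning rate at the end of warmup: {lr_at(warmup_steps):.1e}")
```

With these assumptions, training runs for 375 optimizer steps, the first 38 of which are warmup.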

	## Usage

	### Installation

	```bash
	pip install transformers torch peft
	```

	### Basic Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Load model and tokenizer
	model_name = "ahczhg/qwen3-0.6b-aegis-safety-lora"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Example: content safety check using the instruction/response template
	prompt = "### Instruction:\nAnalyze this content for safety: 'Your text here'\n\n### Response:\n"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	# Generate without tracking gradients (inference only)
	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=128,
	        temperature=0.7,
	        do_sample=True,
	        top_p=0.95,
	    )

	# The decoded text contains the prompt followed by the generated response
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
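
Because `generate` returns the prompt tokens followed by the new tokens, the decoded string in the example above starts with the prompt itself. A minimal sketch of two helpers for building the prompt and keeping only the generated part is shown below. The names `build_prompt` and `extract_response` are hypothetical, and the sample output string is a stand-in rather than real model output; the card does not document the exact wording of the model's responses.

```python
RESPONSE_MARKER = "### Response:\n"


def build_prompt(text: str) -> str:
    """Wrap user text in the instruction template used in the example above."""
    return (
        "### Instruction:\n"
        f"Analyze this content for safety: '{text}'\n\n"
        f"{RESPONSE_MARKER}"
    )


def extract_response(decoded: str, prompt: str) -> str:
    """Return only the generated text, without the echoed prompt."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback: cut at the first response marker if the prefix does not match
    _, sep, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if sep else decoded.strip()


prompt = build_prompt("Hello, how are you?")
# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True)
decoded = prompt + "This content is safe."
print(extract_response(decoded, prompt))
```

Matching on the prompt prefix first avoids cutting in the wrong place when the user text itself happens to contain the response marker.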

	### Advanced Usage with Pipeline

	```python
	import torch
	from transformers import pipeline

	# Create text generation pipeline
	generator = pipeline(
	    "text-generation",
	    model="ahczhg/qwen3-0.6b-aegis-safety-lora",
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Generate safety analysis
	result = generator(
	    "### Instruction:\nIs this content safe? 'Hello, how are you?'\n\n### Response:\n",
	    max_new_tokens=128,
	    temperature=0.7,
	    do_sample=True,
	)

	# generated_text contains the prompt followed by the model's response
	print(result[0]['generated_text'])
	```

	## Evaluation

	The model was evaluated on a held-out test set from the Aegis AI Content Safety Dataset 2.0. Key metrics include:

	- Perplexity on validation set
	- Content safety classification accuracy
	- False positive/negative rates for harmful content detection
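
As a reference for how these metrics are defined, the sketch below computes them from scratch. It assumes binary `safe` / `unsafe` labels with `unsafe` as the positive class, which this card does not specify, and it runs on toy labels invented purely for illustration; they are not results for this model.

```python
import math


def classification_metrics(y_true, y_pred, positive="unsafe"):
    """Accuracy, false positive rate and false negative rate for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of safe items wrongly flagged as unsafe
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of unsafe items that were missed
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy data for illustration only
y_true = ["safe", "unsafe", "unsafe", "safe", "safe", "unsafe"]
y_pred = ["safe", "unsafe", "safe", "safe", "unsafe", "unsafe"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
print(perplexity([math.log(4.0)] * 3))
```

For a moderation filter, the false negative rate (unsafe content that slips through) and the false positive rate (safe content that gets blocked) usually matter more than accuracy alone, since the two kinds of error carry different costs.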

	## Limitations

	- The model was fine-tuned primarily on English-language content
	- Performance may vary on domain-specific or highly technical content
	- Should be used as part of a comprehensive content moderation system, not as the sole decision-maker
	- May require fine-tuning for specific use cases or content domains
	- The model's outputs should be reviewed by human moderators for critical applications

	## Ethical Considerations

	- This model is designed to assist in content safety and moderation tasks
	- It should not be used to censor legitimate speech or suppress diverse viewpoints
	- Decisions about content moderation should involve human oversight
	- The model may reflect biases present in the training data
	- Users should implement appropriate safeguards and appeal processes

	## Training Data

	The model was fine-tuned on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0), which includes:

	- Diverse examples of safe and unsafe content
	- Multiple categories of potentially harmful content
	- Balanced representation of safe content
	- Real-world scenarios and edge cases

	## Citation

	If you use this model in your research or applications, please cite:

	```bibtex
	@misc{qwen3_0.6b_aegis_safety,
	author = {ahczhg},
	title = {Qwen3-0.6B Fine-tuned on Aegis AI Content Safety},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora}},
	note = {Fine-tuned on NVIDIA Aegis AI Content Safety Dataset 2.0}
	}
	```

	## Acknowledgments

	- Base model by the Qwen team at Alibaba Cloud
	- Dataset provided by NVIDIA
	- Fine-tuning performed using HuggingFace Transformers and PEFT libraries

	## Contact

	For questions, issues, or feedback, please visit the [model repository](https://huggingface.co/ahczhg/qwen3-0.6b-aegis-safety-lora).

	## Model Card Authors

	- ahczhg

	## Model Card Contact

	- https://huggingface.co/ahczhg

	[![Support me on Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/ahczhg)