---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- heretic
- uncensored
- abliteration
- qwen
base_model: Qwen/Qwen3.5-9B
---

## Model Description

This is an uncensored version of **Qwen3.5-9B**, processed using the **Heretic** method to remove the model's built-in refusal/censorship mechanisms through neural direction ablation.

## Residual Visualization

PaCMAP projections showing the mixing of harmless (blue) and harmful (red) prompts:

| Layer 12 | Layer 17 |
|----------|----------|
| ![Layer 12](layer_012.png) | ![Layer 17](layer_017.png) |

| Layer 22 | Layer 28 |
|----------|----------|
| ![Layer 22](layer_022.png) | ![Layer 28](layer_028.png) |

These plots show successful removal of refusal behavior: harmless and harmful prompts are well-mixed across layers.

### Core Metrics

| Metric | Original Model | This Model | Description |
|--------|----------------|------------|-------------|
| **Refusal Rate** | 92.0% | **4.0%** | Tested on 100 harmful prompts |
| **KL Divergence** | - | **0.0583** | Per-token average |
| **Model Size** | 9B | 9B | Architecture unchanged |

### KL Divergence Rating

KL divergence measures the degree of model modification:

| KL Range | Rating | Description |
|----------|--------|-------------|
| < 0.05 | ⭐⭐⭐⭐⭐ | Extremely Low - Model virtually unchanged |
| **0.05 - 0.10** | **⭐⭐⭐⭐** | **Low - Minor modification, capabilities well preserved** |
| 0.10 - 0.20 | ⭐⭐⭐ | Moderate - Acceptable modification range |
| 0.20 - 0.50 | ⭐⭐ | High - Possible noticeable capability loss |
| > 0.50 | ⭐ | Too High - Model may be severely compromised |

**This model:** KL: 0.0583, Refusal Rate: 4/100, NLL: 3.37%

### Heretic Approach

This model uses the **Heretic** method for neural direction ablation:

1. **Identify Refusal Direction** - Compute residual vectors from harmful vs. harmless prompts
2. **Direction Extraction** - Extract the "refusal vector" from the difference of means
3. **Ablative Removal** - Apply LoRA-based modification to subtract this direction from model weights

This method only modifies model weights without changing the architecture or adding inference overhead.

> For detailed technical principles, refer to: [Heretic GitHub](https://github.com/p-e-w/heretic)

---

## Intended Use Cases

### ✅ Recommended Uses

- Uncensored content creation
- Research and analysis of sensitive topics
- Safety testing and red-teaming exercises
- Academic research on model alignment

### ❌ Not Recommended For

- Production environments requiring content moderation
- Applications targeting minors
- Scenarios with potential legal risks

---

## Limitations

1. **No Safety Filtering** - The model will directly answer all questions, including harmful or dangerous content
2. **User Discretion Required** - Users must independently judge the appropriateness of generated outputs
3. **Minor Capability Loss** - Some performance degradation on complex tasks may occur

---

## Disclaimer

⚠️ **Important**: This model is intended for research and educational purposes only.

- This model has had its censorship mechanisms removed and may generate harmful, dangerous, or inappropriate content
- Users assume all risks associated with usage
- Do not use this model for illegal activities, harming others, or any inappropriate purposes
- The model authors are not liable for any indirect, incidental, or consequential damages

---

## Acknowledgments

- Original Model: [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
- Heretic Method: [p-e-w/heretic](https://github.com/p-e-w/heretic)
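
## Appendix: Direction Ablation Sketch

The three steps listed under "Heretic Approach" (difference of means, direction extraction, subtraction from the weights) can be illustrated with a small NumPy sketch. This is not Heretic's implementation: the function names `refusal_direction` and `ablation_delta`, the synthetic "residual" data, and the choice to express the update as a rank-1 pair of factors (in the spirit of a LoRA-style delta) are all assumptions made for illustration. Refer to the Heretic repository for the actual procedure.

```python
import numpy as np

def refusal_direction(harmful, harmless):
    """Unit-norm difference of mean residual vectors, shape [d].

    Hypothetical helper: rows are per-prompt residual vectors.
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablation_delta(weight, direction, scale=1.0):
    """Rank-1 factors (B, A) such that weight + B @ A no longer
    writes along `direction` (when scale == 1.0).

    Hypothetical helper: `weight` has shape [d, k] and maps a k-dim
    input into the d-dim residual stream.
    """
    r = direction.reshape(-1, 1)   # [d, 1]
    B = -scale * r                 # [d, 1]
    A = r.T @ weight               # [1, k]
    return B, A

# Synthetic stand-in data (assumption): "harmful" residuals are shifted
# along the first coordinate relative to "harmless" ones.
rng = np.random.default_rng(0)
d, k, n = 16, 8, 200
shift = np.zeros(d)
shift[0] = 4.0
harmless = rng.normal(size=(n, d))
harmful = rng.normal(size=(n, d)) + shift

r = refusal_direction(harmful, harmless)

# Toy weight matrix standing in for a layer that writes to the residual stream.
W = rng.normal(size=(d, k))
B, A = ablation_delta(W, r)
W_ablated = W + B @ A              # equals (I - r r^T) @ W

# Component of the ablated weights along r; should be numerically zero.
leftover = r @ W_ablated
print("max |r^T W_ablated| =", np.abs(leftover).max())
```

Because the update is a product of a `[d, 1]` and a `[1, k]` matrix, it can be merged into the original weights, which is consistent with the card's statement that the method changes weights only and adds no inference overhead.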