---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- heretic
- uncensored
- abliteration
- qwen
base_model: Qwen/Qwen3.5-9B
---
## Model Description

This is an uncensored version of **Qwen3.5-9B**, processed using the **Heretic** method to remove the model's built-in refusal/censorship mechanisms through neural direction ablation.

## Residual Visualization

PaCMAP projections showing the mixing of harmless (blue) and harmful (red) prompts:

| Layer 12 | Layer 17 |
|----------|----------|
| ![Layer 12](layer_012.png) | ![Layer 17](layer_017.png) |

| Layer 22 | Layer 28 |
|----------|----------|
| ![Layer 22](layer_022.png) | ![Layer 28](layer_028.png) |

These plots suggest that refusal behavior was removed: harmless and harmful prompts are well mixed across the sampled layers.

### Core Metrics

| Metric | Original Model | This Model | Description |
|--------|----------------|-------------------|-------------|
| **Refusal Rate** | 92.0% | **4.0%** | Tested on 100 harmful prompts |
| **KL Divergence** | - | **0.0583** | Per-token average |
| **Model Size** | 9B | 9B | Architecture unchanged |
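
As a rough illustration of how a refusal rate such as 4/100 could be tallied, the sketch below counts responses that contain a refusal phrase. The marker list and the function names (`REFUSAL_MARKERS`, `is_refusal`, `refusal_rate`) are hypothetical and are not taken from this card or from Heretic; the evaluation behind the table may have used a different classifier.

```python
# Hypothetical refusal markers; the real evaluation may use a different list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(is_refusal(r) for r in responses) / len(responses)


# Toy responses, made up for illustration only.
sample_responses = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview of the topic.",
    "Here are the steps you asked about.",
    "The answer depends on the context.",
]
sample_rate = refusal_rate(sample_responses)
```

Substring matching is cheap but coarse: it can miss refusals phrased differently and can flag compliant answers that happen to contain a marker.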

### KL Divergence Rating

KL divergence measures the degree of model modification:

| KL Range | Rating | Description |
|----------|--------|-------------|
| < 0.05 | ⭐⭐⭐⭐⭐ | Extremely Low - Model virtually unchanged |
| **0.05 - 0.10** | **⭐⭐⭐⭐** | **Low - Minor modification, capabilities well preserved** |
| 0.10 - 0.20 | ⭐⭐⭐ | Moderate - Acceptable modification range |
| 0.20 - 0.50 | ⭐⭐ | High - Possible noticeable capability loss |
| > 0.50 | ⭐ | Too High - Model may be severely compromised |

**This model:** KL divergence: 0.0583, refusal rate: 4/100, NLL: 3.37%
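
A minimal sketch of how a per-token average KL divergence could be computed and mapped onto the rating bands above. It assumes the divergence is taken as KL(original || modified) over next-token distributions and that boundary values fall into the higher band; neither detail is stated in this card, and the toy logits are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def token_kl(original_logits, modified_logits):
    """KL(original || modified) for a single token position."""
    log_p = log_softmax(original_logits)
    log_q = log_softmax(modified_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_token_kl(original, modified):
    """Average the per-position KL over all token positions."""
    per_token = [token_kl(o, m) for o, m in zip(original, modified)]
    return sum(per_token) / len(per_token)


def kl_rating(kl):
    """Map a KL value to (stars, label) using the bands in the table above."""
    bands = [(0.05, 5, "Extremely Low"), (0.10, 4, "Low"),
             (0.20, 3, "Moderate"), (0.50, 2, "High")]
    for upper, stars, label in bands:
        if kl < upper:
            return stars, label
    return 1, "Too High"


# Toy logits for two token positions over a three-word vocabulary.
original = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
modified = [[1.9, 1.1, 0.1], [0.4, 0.6, 3.0]]
toy_kl = mean_token_kl(original, modified)
model_rating = kl_rating(0.0583)
```

With the reported value of 0.0583, `kl_rating` lands in the four-star "Low" band, matching the highlighted table row.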

### Heretic Approach

This model uses the **Heretic** method for neural direction ablation:

1. **Identify Refusal Direction** - Compute residual vectors from harmful vs. harmless prompts
2. **Direction Extraction** - Extract the "refusal vector" from the difference of means
3. **Ablative Removal** - Apply LoRA-based modification to subtract this direction from model weights

This method modifies only the model weights; it does not change the architecture or add inference overhead.
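
The three steps above can be sketched with NumPy on toy data. This is an illustration under stated assumptions, not Heretic's actual implementation: the helper names (`refusal_direction`, `ablation_delta`), the `[hidden, in]` weight layout, and the single `scale` parameter are all hypothetical. The sketch does show why a LoRA-style update fits: subtracting a direction `r` from a weight matrix `W` is the rank-one change `-scale * r (r^T W)`.

```python
import numpy as np


def refusal_direction(harmful, harmless):
    """Unit vector along the difference of mean residuals (inputs: [n, hidden])."""
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablation_delta(weight, direction, scale=1.0):
    """Rank-one factors (B, A) such that B @ A = -scale * r r^T W.

    Assumes `weight` has shape [hidden, in] and writes into the residual stream.
    """
    r = direction[:, None]      # [hidden, 1]
    lora_b = -scale * r         # [hidden, 1]
    lora_a = r.T @ weight       # [1, in]
    return lora_b, lora_a


# Toy residuals: harmful prompts are shifted along the first hidden axis.
rng = np.random.default_rng(0)
hidden, d_in = 8, 6
harmless = rng.normal(size=(32, hidden))
harmful = rng.normal(size=(32, hidden)) + 2.0 * np.eye(hidden)[0]

direction = refusal_direction(harmful, harmless)
weight = rng.normal(size=(hidden, d_in))

# Merging the rank-one update leaves the weight shape unchanged.
lora_b, lora_a = ablation_delta(weight, direction)
ablated = weight + lora_b @ lora_a

# After full ablation, the weight no longer writes along `direction`.
leftover = direction @ ablated
```

Because the update is merged back into the weights, the ablated matrix has the same shape as the original, which is consistent with the unchanged architecture noted above.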

> For detailed technical principles, refer to: [Heretic GitHub](https://github.com/p-e-w/heretic)

---

## Intended Use Cases

### ✅ Recommended Uses

- Uncensored content creation
- Research and analysis of sensitive topics
- Safety testing and red-teaming exercises
- Academic research on model alignment

### ❌ Not Recommended For

- Production environments requiring content moderation
- Applications targeting minors
- Scenarios with potential legal risks

---

## Limitations

1. **No Safety Filtering** - The model answers nearly all questions directly (4% measured refusal rate), including requests for harmful or dangerous content
2. **User Discretion Required** - Users must independently judge the appropriateness of generated outputs
3. **Minor Capability Loss** - Some performance degradation on complex tasks may occur

---

## Disclaimer

⚠️ **Important**: This model is intended for research and educational purposes only.

- This model has had its censorship mechanisms removed and may generate harmful, dangerous, or inappropriate content
- Users assume all risks associated with usage
- Do not use this model for illegal activities, harming others, or any inappropriate purposes
- The model authors are not liable for any indirect, incidental, or consequential damages

---

## Acknowledgments

- Original Model: [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
- Heretic Method: [p-e-w/heretic](https://github.com/p-e-w/heretic)