---
base_model: DJLougen/Ornstein3.6-35B-A3B-RYS
base_model_relation: finetune
tags:
- transformers
- safetensors
- qwen3_5_moe
- mixture-of-experts
- text-generation
- qwen3.6
- rys
- saber
- refusal-ablation
- uncensored
language:
- en
license: apache-2.0
pipeline_tag: text-generation
---

# Ornstein3.6-35B-A3B-RYS-SABER

A fully uncensored version of [DJLougen/Ornstein3.6-35B-A3B-RYS](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS), processed with **SABER (Spectral Analysis-Based Entanglement Resolution)** — a refusal ablation method that surgically removes refusal behavior while preserving model capabilities.

> **See also:** [Ornstein3.6-35B-A3B](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B) (base) | [Ornstein3.6-35B-A3B-RYS](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS) (RYS layer duplication) | [Ornstein3.6-35B-A3B-RYS-SABER-GGUF](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER-GGUF) (GGUF quants)

## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

**[Support on Ko-fi](https://ko-fi.com/djlougen)**

---

## What is SABER?

SABER is a multi-stage refusal ablation pipeline that goes beyond simple direction removal. Where prior methods (Arditi et al. 2024, Gabliteration) find and remove a single "refusal direction," SABER introduces three key innovations:

1. **Entanglement-aware ablation** — SABER quantifies how much each refusal direction overlaps with capability-critical representations. Directions that are "pure refusal" are fully removed; directions entangled with useful capabilities receive proportionally reduced ablation. This is why SABER preserves model quality where blunt methods degrade it.

2. **Fisher discriminant layer selection** — Instead of guessing which layers to target, SABER uses Fisher Discriminant Ratios to identify the layers where refusal representations are most cleanly separable from normal behavior, focusing the surgery where it matters most.

3. **Hydra-aware iterative refinement** — After each ablation pass, SABER re-probes the model to catch "hydra" features: dormant refusal circuits that activate to compensate for removed ones. Iterative passes with decaying strength ensure thorough removal without overcorrection.
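
SABER's implementation is not public, so the following is only a minimal sketch of the entanglement-scaled projection removal described above. The function name and the max-overlap entanglement score are illustrative assumptions, not SABER's actual code:

```python
import numpy as np

def ablate_refusal(W, refusal_dirs, capability_dirs):
    """Project refusal directions out of a weight matrix W (shape: out x in),
    scaling each ablation by the direction's entanglement with capabilities.
    Illustrative sketch only; not SABER's actual implementation."""
    W = W.copy()
    for r in refusal_dirs:
        r = r / np.linalg.norm(r)
        # Entanglement score (assumed): max cosine overlap with any capability direction.
        entanglement = max(abs(r @ (c / np.linalg.norm(c))) for c in capability_dirs)
        # A pure refusal direction (overlap ~ 0) is fully removed; an entangled
        # direction receives proportionally weaker ablation.
        strength = 1.0 - entanglement
        W -= strength * np.outer(W @ r, r)  # scaled projection removal
    return W
```

With `strength` fixed at 1 this reduces to standard directional ablation (Arditi et al. 2024); the scaling term is what spares directions shared with useful capabilities.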

## SABER Results

| Metric | Value |
|---|---|
| **Selected layers** | 24, 25, 26, 27, 28, 29, 30, 31, 32 |
| **Total directions ablated** | 54 |
| **Iterations to convergence** | 2 |
| **Final residual refusal** | 0.4012 |
| **Capability preservation** | 100% |
| **Extraction method** | Fisher LDA |
| **Layer selection** | Elbow (FDR-based) |

The ablation converged in just 2 iterations, removing 54 refusal directions across 9 layers (24-32) in the upper-middle portion of the network. Capability preservation remained at 100% — no measurable degradation in general model quality.
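
The FDR-based elbow selection reported in the table can be illustrated on synthetic activations. The exact statistic SABER uses is not published; this diagonal-covariance Fisher ratio and largest-gap elbow rule are assumptions for illustration:

```python
import numpy as np

def select_layers_by_fdr(refusal_acts, harmless_acts):
    """Rank layers by a Fisher Discriminant Ratio between refusal-triggering
    and harmless activations, then keep the layers above the largest drop
    (elbow). Inputs have shape (layers, samples, hidden).
    Assumed diagonal-covariance variant; not SABER's published formula."""
    mu_r, mu_h = refusal_acts.mean(axis=1), harmless_acts.mean(axis=1)
    var_r, var_h = refusal_acts.var(axis=1), harmless_acts.var(axis=1)
    # Per-layer FDR: between-class separation over within-class spread,
    # averaged across hidden dimensions.
    fdr = ((mu_r - mu_h) ** 2 / (var_r + var_h + 1e-8)).mean(axis=1)
    order = np.argsort(fdr)[::-1]    # layers by descending FDR
    gaps = -np.diff(fdr[order])      # drop between consecutive ranks
    elbow = int(np.argmax(gaps)) + 1 # cut at the largest drop
    return sorted(order[:elbow].tolist())
```

On activations where only a contiguous band of layers separates the two prompt sets, the elbow cut recovers exactly that band — analogous to the 24-32 band selected for this model.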

## Model Lineage

```
Qwen 3.6 35B-A3B (base)
└── Ornstein3.6-35B-A3B (DDM-curated reasoning fine-tune, 799 examples)
    └── Ornstein3.6-35B-A3B-RYS (layer 10 duplicated via RYS brain scan, +49% reasoning)
        └── Ornstein3.6-35B-A3B-RYS-SABER (refusal ablation, this model)
```

## Details

- **Developed by:** DJLougen
- **Architecture:** `Qwen3_5MoeForCausalLM` — Qwen 3.6 MoE with interleaved linear and full attention
- **Parameters:** 34.66B total, ~3B active (256 experts, 8 active per token)
- **Hidden size / layers:** 2048 / 41 (40 original + 1 RYS-duplicated)
- **Context length:** 262,144 tokens
- **License:** Apache 2.0

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DJLougen/Ornstein3.6-35B-A3B-RYS-SABER"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Your question here"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Disclaimer

This model has had its refusal training removed. It will comply with requests that the base model would refuse. The user assumes full responsibility for how this model is used. This release is intended for research, creative, and educational purposes.

## License

Apache 2.0 — inherited from the Qwen 3.6 base release.