DJLougen committed on
Commit 183f39b · verified · 1 Parent(s): 8ba8acc

Upload README.md with huggingface_hub

---
base_model: DJLougen/Ornstein3.6-35B-A3B-RYS
base_model_relation: finetune
tags:
- transformers
- safetensors
- qwen3_5_moe
- mixture-of-experts
- text-generation
- qwen3.6
- rys
- saber
- refusal-ablation
- uncensored
language:
- en
license: apache-2.0
pipeline_tag: text-generation
---

![Ornstein3.6-35B-A3B-RYS-SABER](ornstein3.6RYS-SABER.jpeg)

# Ornstein3.6-35B-A3B-RYS-SABER

A fully uncensored version of [DJLougen/Ornstein3.6-35B-A3B-RYS](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS), processed with **SABER (Spectral Analysis-Based Entanglement Resolution)** — a novel refusal ablation method that surgically removes refusal behavior while preserving model capabilities.

> **See also:** [Ornstein3.6-35B-A3B](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B) (base) | [Ornstein3.6-35B-A3B-RYS](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS) (RYS layer duplication) | [Ornstein3.6-35B-A3B-RYS-SABER-GGUF](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER-GGUF) (GGUF quants)

## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded, balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

**[Support on Ko-fi](https://ko-fi.com/djlougen)**

---

## What is SABER?

SABER is a multi-stage refusal ablation pipeline that goes beyond simple direction removal. Where prior methods (Arditi et al. 2024; Gabliteration) find and remove a single "refusal direction," SABER introduces three key innovations:

1. **Entanglement-aware ablation** — SABER quantifies how much each refusal direction overlaps with capability-critical representations. Directions that are "pure refusal" get fully removed; directions entangled with useful capabilities receive proportionally reduced ablation. This is why SABER preserves model quality where blunt methods degrade it.

2. **Fisher discriminant layer selection** — Instead of guessing which layers to target, SABER uses Fisher Discriminant Ratios to identify layers where refusal representations are most cleanly separable from normal behavior. This focuses the surgery where it matters most.

3. **Hydra-aware iterative refinement** — After each ablation pass, SABER re-probes the model to catch "hydra" features: dormant refusal circuits that activate to compensate for removed ones. Iterative passes with decaying strength ensure thorough removal without overcorrection.

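The actual SABER implementation is not published in this card, but the entanglement-aware scaling in point 1 can be sketched. The following is a minimal NumPy illustration under stated assumptions: unit-norm directions, a single weight matrix, and a simple max-cosine entanglement score; all function names here are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means refusal direction over hidden-state activations."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def entanglement_scaled_ablation(W, refusal_dir, capability_dirs):
    """Project a refusal direction out of weight matrix W, with projection
    strength reduced by the direction's overlap with capability-critical
    directions (all directions assumed unit-norm)."""
    # Entanglement score: largest |cosine| with any capability direction.
    entanglement = max(abs(float(refusal_dir @ c)) for c in capability_dirs)
    strength = 1.0 - entanglement        # pure refusal -> full removal
    d = refusal_dir[:, None]             # (hidden, 1) column vector
    return W - strength * (d @ (d.T @ W))
```

Under this sketch, a "pure refusal" direction (zero capability overlap) gets strength 1.0, a full orthogonal projection, while a fully entangled direction gets strength 0.0 and is left untouched, matching the proportional-reduction idea described above.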
## SABER Results

| Metric | Value |
|---|---|
| **Selected layers** | 24–32 |
| **Total directions ablated** | 54 |
| **Iterations to convergence** | 2 |
| **Final residual refusal** | 0.4012 |
| **Capability preservation** | 100% |
| **Extraction method** | Fisher LDA |
| **Layer selection** | Elbow (FDR-based) |

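For intuition, the "Elbow (FDR-based)" layer selection in the table can be sketched as follows. This is an illustrative NumPy version, not the actual SABER code: the exact FDR formula and elbow heuristic are assumptions.

```python
import numpy as np

def fisher_discriminant_ratio(harmful_acts, harmless_acts):
    """Per-layer FDR: between-class separation over within-class scatter.
    Higher means refusal activations separate more cleanly at this layer."""
    mu_h = harmful_acts.mean(axis=0)
    mu_n = harmless_acts.mean(axis=0)
    between = float(np.sum((mu_h - mu_n) ** 2))
    within = float(harmful_acts.var(axis=0).sum()
                   + harmless_acts.var(axis=0).sum())
    return between / (within + 1e-8)

def select_layers_by_elbow(fdr_scores):
    """Keep the layers before the largest drop in the sorted FDR curve."""
    order = np.argsort(fdr_scores)[::-1]    # layer indices, best first
    ranked = np.asarray(fdr_scores, dtype=float)[order]
    gaps = ranked[:-1] - ranked[1:]         # drop between neighbours
    cutoff = int(np.argmax(gaps)) + 1       # elbow position
    return sorted(int(i) for i in order[:cutoff])
```

Scoring one FDR per layer and cutting at the elbow keeps only the layers where refusal is cleanly separable, analogous to the 9 contiguous layers (24–32) reported above.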
The ablation converged in just 2 iterations, removing 54 refusal directions across 9 layers (24–32) in the upper-middle portion of the network. Capability preservation remained at 100% — no measurable degradation in general model quality.

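The two-iteration convergence could come from a loop like this sketch: ablate, re-probe for newly active "hydra" directions, and decay the strength each round until the residual refusal score stops falling. Every name, threshold, and the probe/measure callbacks here are hypothetical, not SABER's real interface.

```python
import numpy as np

def iterative_refinement(W, probe_directions, measure_refusal,
                         threshold=0.45, decay=0.5, max_iters=5):
    """Hydra-aware refinement: ablate, re-probe, repeat with decaying strength.

    probe_directions(W) -> list of unit-norm refusal directions currently
        detectable in the weights (catches compensating "hydra" circuits).
    measure_refusal(W)  -> scalar residual refusal score.
    """
    strength = 1.0
    for it in range(1, max_iters + 1):
        for d in probe_directions(W):
            col = d[:, None]
            W = W - strength * (col @ (col.T @ W))  # partial projection
        residual = measure_refusal(W)
        if residual < threshold:                    # converged
            return W, it, residual
        strength *= decay                           # avoid overcorrection
    return W, max_iters, measure_refusal(W)
```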
## Model Lineage

```
Qwen 3.6 35B-A3B (base)
└── Ornstein3.6-35B-A3B (DDM-curated reasoning fine-tune, 799 examples)
    └── Ornstein3.6-35B-A3B-RYS (layer 10 duplicated via RYS brain scan, +49% reasoning)
        └── Ornstein3.6-35B-A3B-RYS-SABER (refusal ablation, this model)
```

## Details

- **Developed by:** DJLougen
- **Architecture:** `Qwen3_5MoeForCausalLM` — Qwen 3.6 MoE with linear + full attention interleaved
- **Parameters:** 34.66B total, ~3B active (256 experts, 8 active per token)
- **Hidden size / layers:** 2048 / 41 (40 original + 1 RYS-duplicated)
- **Context length:** 262,144 tokens
- **License:** Apache 2.0

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DJLougen/Ornstein3.6-35B-A3B-RYS-SABER"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Your question here"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Disclaimer

This model has had its refusal training removed. It will comply with requests that the base model would refuse. The user assumes full responsibility for how this model is used. This release is intended for research, creative, and educational purposes.

## License

Apache 2.0 — inherited from the Qwen 3.6 base release.