---
base_model: DJLougen/Ornstein3.6-35B-A3B-RYS-SABER
tags:
- gguf
- qwen3_5_moe
- mixture-of-experts
- text-generation
- qwen3.6
- rys
- saber
- refusal-ablation
- uncensored
- quantized
language:
- en
license: apache-2.0
pipeline_tag: text-generation
---

![Ornstein3.6-35B-A3B-RYS-SABER](ornstein3.6RYS-SABER.jpeg)

# Ornstein3.6-35B-A3B-RYS-SABER-GGUF

GGUF quantizations of [DJLougen/Ornstein3.6-35B-A3B-RYS-SABER](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER), the fully uncensored, RYS-enhanced Ornstein fine-tune with SABER refusal ablation applied.

> **Full-precision model:** [DJLougen/Ornstein3.6-35B-A3B-RYS-SABER](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER) | **Censored version:** [DJLougen/Ornstein3.6-35B-A3B-RYS](https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-RYS)

## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded, balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
**[Support on Ko-fi](https://ko-fi.com/djlougen)**

---

## Available Quantizations

| Quantization | Use Case |
|---|---|
| **Q8_0** | Best quality, highest memory |
| **Q6_K** | Near-lossless, good for 48GB+ VRAM |
| **Q5_K_M** | Excellent quality/size balance |
| **Q5_K_S** | Slightly smaller Q5 |
| **Q5_0** | Legacy Q5 format |
| **Q4_K_M** | Recommended default for 24GB VRAM |
| **Q4_K_S** | Smaller Q4 variant |
| **Q4_0** | Legacy Q4 format |
| **Q3_K_L** | Low memory, acceptable quality |
| **Q3_K_M** | Lower memory |
| **Q3_K_S** | Aggressive 3-bit |
| **Q2_K** | Minimum viable quality |

## Model Lineage

```
Qwen 3.6 35B-A3B → Ornstein3.6 (DDM fine-tune) → RYS (layer 10 dup, +49%) → SABER (refusal ablated)
```

## Model Details

- **Architecture:** Qwen 3.6 MoE (35B total, ~3B active per token)
- **Layers:** 41 (40 original + 1 RYS-duplicated layer 10)
- **Context:** 262,144 tokens
- **SABER:** 54 refusal directions ablated across layers 24-32, 100% capability preserved

## Usage

### llama.cpp

```bash
llama-cli -m Ornstein3.6-35B-A3B-RYS-SABER-Q4_K_M.gguf -p "Your prompt here" -ngl 99
```

### Ollama

```
ollama run hf.co/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER-GGUF:Q4_K_M
```

## Disclaimer

This model has had its refusal training removed. It will comply with requests that the base model would refuse. The user assumes full responsibility for how this model is used. This release is intended for research, creative, and educational purposes.

## License

Apache 2.0
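
## Example: Resolving a File Name from a Quantization Tag

The llama.cpp command under Usage hard-codes the Q4_K_M file. To switch between the tags listed under Available Quantizations, a small helper can build the file name. This is a minimal sketch, not part of the release: `gguf_name` is a hypothetical function, and it assumes every file follows the same naming pattern as the Q4_K_M file shown in the llama.cpp example, so check the repository's file list before relying on it.

```bash
#!/usr/bin/env bash
# Quantization tags copied from the "Available Quantizations" table.
QUANTS="Q8_0 Q6_K Q5_K_M Q5_K_S Q5_0 Q4_K_M Q4_K_S Q4_0 Q3_K_L Q3_K_M Q3_K_S Q2_K"

# gguf_name (hypothetical helper): print the GGUF file name for a tag.
# ASSUMPTION: all files are named like the Q4_K_M file in the llama.cpp
# example, i.e. Ornstein3.6-35B-A3B-RYS-SABER-<TAG>.gguf.
gguf_name() {
  local quant="$1" q
  for q in $QUANTS; do
    if [ "$q" = "$quant" ]; then
      printf 'Ornstein3.6-35B-A3B-RYS-SABER-%s.gguf\n' "$quant"
      return 0
    fi
  done
  # Reject tags that are not in the table instead of guessing a file name.
  printf 'unknown quantization: %s\n' "$quant" >&2
  return 1
}
```

With the function sourced, the llama.cpp invocation becomes `llama-cli -m "$(gguf_name Q5_K_M)" -p "Your prompt here" -ngl 99`. A tag that is not in the table makes the helper return a non-zero status and print an error to stderr.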