OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-GGUF

Overview

This repository contains GGUF quantizations of the OpenYourMind Qwen3.5 122B-A10B model (Mixture of Experts, ~10B active parameters), produced in three steps:

  1. Refusal Ablation (extreme) — Direct tensor modification targeting refusal directions across attention and MLP projections, tuned more aggressively than our standard kuato pipeline (hence the extreme suffix).
  2. Healing via DPO / KTO / SFT — Post-ablation retraining on a privately generated healing dataset combining preference pairs (DPO), unpaired desirable/undesirable signals (KTO), and curated SFT completions. The mix is used to repair coherence, reduce hedging, and stabilize long-context behavior that ablation tends to perturb.
  3. Vision Restoration — Original Qwen3.5 vision tower / projector reattached after healing so the model retains multimodal (image + text) functionality. Shipped as a separate mmproj sidecar.
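
The ablation step can be illustrated with a minimal sketch: given an estimated refusal direction r, the rank-one component along r is projected out of each targeted weight matrix. This is a toy illustration of directional ablation in general, not the actual kuato pipeline; the random direction and matrix below are placeholders for the real refusal direction and projection weights.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's output along direction r.

    W: (d_out, d_in) projection weight; r: direction in output space (d_out,).
    Equivalent to left-multiplying W by (I - r r^T) with r normalized.
    """
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # placeholder projection weight
r = rng.standard_normal(8)        # placeholder "refusal" direction

W_ablated = ablate_direction(W, r)

# After ablation, the output of W has no component along r.
r_unit = r / np.linalg.norm(r)
print(np.allclose(r_unit @ W_ablated, 0.0))  # True
```

In the real pipeline this kind of projection is applied across many attention and MLP matrices at once, which is exactly the coherence damage the healing stage then repairs.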

Key Results:

  • 0 refusals on HarmBench
  • Substantially reduced hedging vs. earlier kuato releases
  • Multimodal pipeline preserved end-to-end

Available Quantizations

  • OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme.Q4_K_M.gguf — 4-bit GGUF quant, 74.2 GB
  • mmproj-OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-F16.gguf — vision projector / tower sidecar, 908.7 MB
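
Both files can be fetched programmatically with huggingface_hub; a minimal sketch using hf_hub_download (repo ID and filenames as listed above, the fetch_all helper is just for illustration):

```python
REPO_ID = "OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-GGUF"
FILES = [
    "OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme.Q4_K_M.gguf",
    "mmproj-OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-F16.gguf",
]

def fetch_all() -> list[str]:
    """Download both GGUF files (~75 GB total) and return local paths."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return [hf_hub_download(repo_id=REPO_ID, filename=f) for f in FILES]

print(len(FILES))  # 2
```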

Usage Examples

llama.cpp

./llama-server \
  -m OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme.Q4_K_M.gguf \
  --mmproj mmproj-OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-F16.gguf \
  --jinja \
  --image-min-tokens 1024
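
Once llama-server is running (default port 8080), it exposes an OpenAI-compatible /v1/chat/completions endpoint. A minimal Python sketch for building and sending a multimodal request with an inline base64 image; the payload shape follows the OpenAI chat schema, and the host, prompt, and placeholder image bytes are illustrative:

```python
import base64
import json
from urllib import request

def build_vision_payload(prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
        "max_tokens": 256,
    }

def send(payload: dict, host: str = "http://127.0.0.1:8080") -> dict:
    """POST the payload to the server's OpenAI-compatible endpoint."""
    req = request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder bytes; pass real image data in practice.
payload = build_vision_payload("Describe this image.", b"\x89PNG...")
print(payload["messages"][0]["content"][0]["type"])  # text
```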

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-GGUF",
    filename="OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme.Q4_K_M.gguf",
)

llm.create_chat_completion(messages=[...])
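
A concrete text-only messages list, following the OpenAI-style schema that create_chat_completion accepts (the system and user content here are only illustrative):

```python
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain Mixture-of-Experts routing in two sentences."},
]

# With the model loaded as `llm` above:
# response = llm.create_chat_completion(messages=messages, max_tokens=256)
# reply = response["choices"][0]["message"]["content"]
print(len(messages))  # 2
```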

Quant Selection

  • Q4_K_M — 4-bit variant suitable for high-VRAM single-node or multi-GPU inference. ~74 GB on disk; because only ~10B of the 122B parameters are active per token, it runs comfortably on workstation-class hardware.
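
As a rough sanity check when choosing a quant, GGUF disk footprint is approximately total_params × bits_per_weight / 8. A sketch using typical effective bits-per-weight figures for common llama.cpp quants (the bpw values are approximations, not exact per-file numbers):

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

PARAMS = 122e9
# Typical effective bits/weight for llama.cpp quants (approximate).
BPW = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.50}

for name, bpw in BPW.items():
    print(f"{name}: ~{approx_size_gb(PARAMS, bpw):.1f} GB")
```

Q4_K_M comes out near 74 GB, consistent with the listed 74.2 GB file; small differences arise because GGUF files mix tensor precisions internally.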

Higher-precision quants (Q6_K / Q8_0) may be released later — open an issue or join the Discord if you need them.

Notes

  • Healing Dataset: Privately generated (DPO + KTO + SFT mix). Not released.
  • License: Other
  • Model Architecture: Qwen3 MoE (Mixture of Experts, ~10B active / 122B total)
  • Base Model: Qwen/Qwen3.5-122B-A10B
  • Modality: Text + Vision (via mmproj sidecar)

Disclaimer

Use of this model is the sole responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.
