OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit

Overview

This repository contains an MLX 4-bit quantization of the OpenYourMind Qwen3.5 122B-A10B model (Mixture of Experts, ~10B active parameters), produced through three steps:

  1. Refusal Ablation (extreme) — Direct tensor modification targeting refusal directions across the attention and MLP projections, tuned more aggressively than our standard kuato pipeline (hence the extreme suffix). The core operation is sketched after this list.
  2. Healing via DPO / KTO / SFT — Post-ablation retraining on a privately generated healing dataset combining preference pairs (DPO), unpaired desirable/undesirable signals (KTO), and curated SFT completions. This mix repairs coherence, reduces hedging, and stabilizes the long-context behavior that ablation tends to perturb; the DPO objective is also sketched after this list.
  3. Vision Restoration — The original Qwen3.5 vision tower / projector is reattached after healing, so the model retains multimodal (image + text) functionality. It is preserved natively inside the MLX model graph (no separate sidecar required).
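
Conceptually, the ablation step removes a learned "refusal direction" from the output space of selected weight matrices. A minimal sketch of that single operation (a hypothetical helper, not the actual kuato pipeline; it assumes a unit-norm direction already extracted, e.g. from activation differences between refused and answered prompts):

import mlx.core as mx

def ablate_direction(weight: mx.array, refusal_dir: mx.array) -> mx.array:
    # Project the refusal direction out of the matrix's output space:
    # W' = (I - r rᵀ) W, so no input maps to an output component along r.
    r = refusal_dir / mx.linalg.norm(refusal_dir)
    correction = mx.expand_dims(r, 1) * mx.expand_dims(r @ weight, 0)  # outer(r, rᵀW)
    return weight - correction

For the healing stage, the DPO component optimizes the standard preference objective over sequence log-probabilities. A generic sketch (illustrative only; the actual training code, hyperparameters, and dataset are private):

import mlx.core as mx

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Reward the policy for widening its chosen-vs-rejected log-prob margin
    # relative to the frozen reference model's margin.
    margins = beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected))
    return -mx.mean(mx.log(mx.sigmoid(margins)))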

Key Results:

  • 0 refusals on HarmBench
  • Substantially reduced hedging vs. earlier kuato releases
  • Multimodal pipeline preserved end-to-end
  • MTP (Multi-Token Prediction) head also carried forward as a frozen artifact for forward compatibility

Available Quantizations

  • model-*-of-00014.safetensors (~65 GB): 4-bit MLX language model (affine, group size 64, ≈ 4.544 bits/weight) plus the Qwen3-VL vision tower (BF16, unquantized).
  • mtp/model-mtp.safetensors (4.7 GB): Frozen BF16 MTP weights (785 tensors). Not loaded by current mlx-vlm/mlx-lm; preserved unmodified for when MLX adds an MTP inference path on qwen3_5_moe.

Total on disk: ~70 GB.
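
The quantization parameters are recorded in the snapshot's config.json, so a download can be sanity-checked before loading (the "quantization" key follows the layout MLX converters write; treat the exact keys as an assumption):

import json

# Inspect the quantization block in the downloaded snapshot directory.
with open("config.json") as f:
    cfg = json.load(f)
print(cfg.get("quantization"))  # expected: {"group_size": 64, "bits": 4}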

Usage Examples

LM Studio (Apple Silicon)

lmstudio://open_from_hf?model=OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit

Requires LM Studio with an MLX backend that supports the qwen3_5_moe architecture (mlx-lm engine ≥ 1.5.0; bundled transformers ≥ 5.5).

mlx-vlm (Python)

Install mlx-vlm from git main (the 0.5.0 PyPI release does not yet include the qwen3_5_moe config fixes shipped on 2026-05-09):

pip install -U "git+https://github.com/Blaizzy/mlx-vlm.git"

CLI:

python -m mlx_vlm.generate \
  --model OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 512

Programmatic:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

# Load the 4-bit model and its processor (fetched from the Hub on first use).
model, processor = load("OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit")
config = model.config

# Build a chat-formatted prompt that reserves one image slot.
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
output = generate(model, processor, prompt, image=["path/to/image.jpg"], max_tokens=512)
print(output)

Quant Selection

  • MLX-4bit (affine, group size 64): 4-bit variant for Apple Silicon Macs with unified memory. ~70 GB on disk; runs comfortably on 128 GB Macs and is loadable on 96 GB Macs with care for context length.

Higher-precision MLX quants (6-bit / 8-bit / mixed) may be released later — open an issue or join the Discord if you need them.

Hardware

  • Recommended: Apple Silicon Mac with ≥ 96 GB unified memory. 128 GB Macs run comfortably with full multimodal context.
  • MLX's lazy / mmap-based loading allows smaller machines to attempt loading, but expect heavy swap once context grows. A quick memory check is sketched below.
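
Before attempting a ~70 GB load, it is worth confirming the machine's unified memory (macOS-specific; sysctl ships with the OS):

import subprocess

# Total unified memory on macOS, in bytes.
mem = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
print(f"Unified memory: {mem / 2**30:.0f} GiB")
if mem / 2**30 < 96:
    print("Below the recommended 96 GB; expect heavy swap at long context.")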

Conversion Notes

Three non-default steps were needed to produce a clean conversion:

  1. mlx-vlm installed from git main rather than PyPI 0.5.0 — to pick up commits ee4f949c and b7176c44 (2026-05-09) that fix Qwen3.5 / Qwen3-VL config deserialization.
  2. qwen3_5_moe.sanitize patched — the upstream checkpoint stores separate per-expert weights (...mlp.experts.{i}.{gate,up,down}_proj.weight), but mlx-vlm assumed a fused experts.gate_up_proj tensor and crashed with a KeyError. The sanitizer was extended to detect the per-expert layout and stack the tensors into the expected switch_mlp.{gate,up,down}_proj.weight of shape [num_experts, intermediate_size, hidden_size] before quantization; the stacking is sketched after this list.
  3. image_processor_type field updated in preprocessor_config.json and processor_config.json from Qwen2VLImageProcessorFast / Qwen3VLImageProcessor to Qwen2VLImageProcessor — the legacy "Fast" / "Qwen3VL" class names are not registered as image processors in transformers ≥ 5.x; Qwen2VLImageProcessor is the canonical name used by qwen3_5_moe. A small patch script also follows this list.
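
The sanitizer extension in step 2 amounts to gathering the per-expert tensors and stacking them into the fused layout. An illustrative reconstruction (tensor-name prefixes follow the description above and are assumptions, not the exact upstream code):

import mlx.core as mx

def fuse_experts(weights: dict, layer: int, num_experts: int) -> dict:
    # Stack ...mlp.experts.{i}.{gate,up,down}_proj.weight into the fused
    # switch_mlp layout mlx-vlm expects:
    # [num_experts, out_features, in_features].
    prefix = f"model.layers.{layer}.mlp"
    for proj in ("gate_proj", "up_proj", "down_proj"):
        per_expert = [
            weights.pop(f"{prefix}.experts.{i}.{proj}.weight")
            for i in range(num_experts)
        ]
        weights[f"{prefix}.switch_mlp.{proj}.weight"] = mx.stack(per_expert)
    return weights

Step 3 is a small metadata fix; something like the following (run from the snapshot directory) applies the rename in place:

import json

# Point both processor configs at the canonical image-processor class name.
for path in ("preprocessor_config.json", "processor_config.json"):
    with open(path) as f:
        cfg = json.load(f)
    cfg["image_processor_type"] = "Qwen2VLImageProcessor"
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)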

Notes

  • Healing Dataset: Privately generated (DPO + KTO + SFT mix). Not released.
  • License: Other
  • Model Architecture: Qwen3 MoE (Mixture of Experts, ~10B active / 122B total) + Qwen3-VL vision tower + MTP head
  • Base Model: Qwen/Qwen3.5-122B-A10B
  • Modality: Text + Vision (image / video)
  • MTP: Carried forward as mtp/model-mtp.safetensors. Not runnable on current mlx-vlm/mlx-lm, though the file can be inspected directly (see below).
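
Although nothing consumes the MTP head yet, the file is plain safetensors and loads as a flat name-to-tensor dictionary:

import mlx.core as mx

# Load the frozen MTP checkpoint without touching the main model.
mtp = mx.load("mtp/model-mtp.safetensors")
print(len(mtp), "tensors")  # 785 per the file list above
name, tensor = next(iter(mtp.items()))
print(name, tensor.dtype, tensor.shape)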


Disclaimer

Use of this model is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.
