OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit

Overview

This repository contains an MLX 4-bit quantization of the OpenYourMind Qwen3.5 122B-A10B model (Mixture of Experts, ~10B active parameters), produced through three steps:

  1. Refusal Ablation (extreme) — Direct tensor modification targeting refusal directions across the attention and MLP projections, tuned more aggressively than our standard kuato pipeline (hence the extreme suffix). The core operation is sketched after this list.
  2. Healing via DPO / KTO / SFT — Post-ablation retraining on a privately generated healing dataset combining preference pairs (DPO), unpaired desirable/undesirable signals (KTO), and curated SFT completions. This mix repairs coherence, reduces hedging, and stabilizes the long-context behavior that ablation tends to perturb; the DPO objective is also sketched after this list.
  3. Vision Restoration — The original Qwen3.5 vision tower / projector is reattached after healing, so the model retains multimodal (image + text) functionality. It is preserved natively inside the MLX model graph (no separate sidecar required).
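
Conceptually, the ablation step removes a learned "refusal direction" from the output space of selected weight matrices. A minimal sketch of that single operation (a hypothetical helper, not the actual kuato pipeline; it assumes a unit-norm direction already extracted, e.g. from activation differences between refused and answered prompts):

import mlx.core as mx

def ablate_direction(weight: mx.array, refusal_dir: mx.array) -> mx.array:
    # Project the refusal direction out of the matrix's output space:
    # W' = (I - r rᵀ) W, so no input maps to an output component along r.
    r = refusal_dir / mx.linalg.norm(refusal_dir)
    correction = mx.expand_dims(r, 1) * mx.expand_dims(r @ weight, 0)  # outer(r, rᵀW)
    return weight - correction

For the healing stage, the DPO component optimizes the standard preference objective over sequence log-probabilities. A generic sketch (illustrative only; the actual training code, hyperparameters, and dataset are private):

import mlx.core as mx

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Reward the policy for widening its chosen-vs-rejected log-prob margin
    # relative to the frozen reference model's margin.
    margins = beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected))
    return -mx.mean(mx.log(mx.sigmoid(margins)))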

Key Results:

  • 0 refusals on HarmBench
  • Substantially reduced hedging vs. earlier kuato releases
  • Multimodal pipeline preserved end-to-end
  • MTP (Multi-Token Prediction) head also carried forward as a frozen artifact for forward compatibility

Available Quantizations

  • model-*-of-00014.safetensors (~65 GB): 4-bit MLX language model (affine, group size 64, ≈ 4.544 bits/weight) plus the Qwen3-VL vision tower (BF16, unquantized).
  • mtp/model-mtp.safetensors (4.7 GB): Frozen BF16 MTP weights (785 tensors). Not loaded by current mlx-vlm/mlx-lm; preserved unmodified for when MLX adds an MTP inference path on qwen3_5_moe.

Total on disk: ~70 GB.
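
The quantization parameters are recorded in the snapshot's config.json, so a download can be sanity-checked before loading (the "quantization" key follows the layout MLX converters write; treat the exact keys as an assumption):

import json

# Inspect the quantization block in the downloaded snapshot directory.
with open("config.json") as f:
    cfg = json.load(f)
print(cfg.get("quantization"))  # expected: {"group_size": 64, "bits": 4}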

Usage Examples

LM Studio (Apple Silicon)

lmstudio://open_from_hf?model=OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit

Requires LM Studio with an MLX backend that supports the qwen3_5_moe architecture (mlx-lm engine ≥ 1.5.0; bundled transformers ≥ 5.5).

mlx-vlm (Python)

Install mlx-vlm from git main (the 0.5.0 PyPI release does not yet include the qwen3_5_moe config fixes shipped on 2026-05-09):

pip install -U "git+https://github.com/Blaizzy/mlx-vlm.git"

CLI:

python -m mlx_vlm.generate \
  --model OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 512

Programmatic:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

# Load the 4-bit model and its processor (fetched from the Hub on first use).
model, processor = load("OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit")
config = model.config

# Build a chat-formatted prompt that reserves one image slot.
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
output = generate(model, processor, prompt, image=["path/to/image.jpg"], max_tokens=512)
print(output)

Quant Selection

  • MLX-4bit (affine, group size 64): 4-bit variant for Apple Silicon Macs with unified memory. ~70 GB on disk; runs comfortably on 128 GB Macs and is loadable on 96 GB Macs with care for context length.

Higher-precision MLX quants (6-bit / 8-bit / mixed) may be released later — open an issue or join the Discord if you need them.

Hardware

  • Recommended: Apple Silicon Mac with ≥ 96 GB unified memory. 128 GB Macs run comfortably with full multimodal context.
  • MLX's lazy / mmap-based loading allows smaller machines to attempt loading, but expect heavy swap once context grows. A quick memory check is sketched below.
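
Before attempting a ~70 GB load, it is worth confirming the machine's unified memory (macOS-specific; sysctl ships with the OS):

import subprocess

# Total unified memory on macOS, in bytes.
mem = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
print(f"Unified memory: {mem / 2**30:.0f} GiB")
if mem / 2**30 < 96:
    print("Below the recommended 96 GB; expect heavy swap at long context.")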

Conversion Notes

Three non-default steps were needed to produce a clean conversion:

  1. mlx-vlm installed from git main rather than PyPI 0.5.0 — to pick up commits ee4f949c and b7176c44 (2026-05-09) that fix Qwen3.5 / Qwen3-VL config deserialization.
  2. qwen3_5_moe.sanitize patched — the upstream checkpoint stores separate per-expert weights (...mlp.experts.{i}.{gate,up,down}_proj.weight), but mlx-vlm assumed a fused experts.gate_up_proj tensor and crashed with a KeyError. The sanitizer was extended to detect the per-expert layout and stack the tensors into the expected switch_mlp.{gate,up,down}_proj.weight of shape [num_experts, intermediate_size, hidden_size] before quantization; the stacking is sketched after this list.
  3. image_processor_type field updated in preprocessor_config.json and processor_config.json from Qwen2VLImageProcessorFast / Qwen3VLImageProcessor to Qwen2VLImageProcessor — the legacy "Fast" / "Qwen3VL" class names are not registered as image processors in transformers ≥ 5.x; Qwen2VLImageProcessor is the canonical name used by qwen3_5_moe. A small patch script also follows this list.
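
The sanitizer extension in step 2 amounts to gathering the per-expert tensors and stacking them into the fused layout. An illustrative reconstruction (tensor-name prefixes follow the description above and are assumptions, not the exact upstream code):

import mlx.core as mx

def fuse_experts(weights: dict, layer: int, num_experts: int) -> dict:
    # Stack ...mlp.experts.{i}.{gate,up,down}_proj.weight into the fused
    # switch_mlp layout mlx-vlm expects:
    # [num_experts, out_features, in_features].
    prefix = f"model.layers.{layer}.mlp"
    for proj in ("gate_proj", "up_proj", "down_proj"):
        per_expert = [
            weights.pop(f"{prefix}.experts.{i}.{proj}.weight")
            for i in range(num_experts)
        ]
        weights[f"{prefix}.switch_mlp.{proj}.weight"] = mx.stack(per_expert)
    return weights

Step 3 is a small metadata fix; something like the following (run from the snapshot directory) applies the rename in place:

import json

# Point both processor configs at the canonical image-processor class name.
for path in ("preprocessor_config.json", "processor_config.json"):
    with open(path) as f:
        cfg = json.load(f)
    cfg["image_processor_type"] = "Qwen2VLImageProcessor"
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)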

Notes

  • Healing Dataset: Privately generated (DPO + KTO + SFT mix). Not released.
  • License: Other
  • Model Architecture: Qwen3 MoE (Mixture of Experts, ~10B active / 122B total) + Qwen3-VL vision tower + MTP head
  • Base Model: Qwen/Qwen3.5-122B-A10B
  • Modality: Text + Vision (image / video)
  • MTP: Carried forward as mtp/model-mtp.safetensors. Not runnable on current mlx-vlm/mlx-lm, though the file can be inspected directly (see below).
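
Although nothing consumes the MTP head yet, the file is plain safetensors and loads as a flat name-to-tensor dictionary:

import mlx.core as mx

# Load the frozen MTP checkpoint without touching the main model.
mtp = mx.load("mtp/model-mtp.safetensors")
print(len(mtp), "tensors")  # 785 per the file list above
name, tensor = next(iter(mtp.items()))
print(name, tensor.dtype, tensor.shape)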


Disclaimer

Use of this model is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.
