Instructions for using OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- MLX
How to use OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit with MLX:
```python
# Make sure mlx-vlm is installed:
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit")
config = load_config("OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit"
```
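Before wiring up a client, you can optionally confirm the server is responding. This is a minimal sketch using only Python's standard library, assuming the server is running on `mlx_lm.server`'s default port, 8080.

```python
# Quick sanity check against the local OpenAI-compatible server (stdlib only).
# Assumption: mlx_lm.server is running on its default port, 8080.
import json
import urllib.request

payload = {
    "model": "OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 8,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```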
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit" }
      ]
    }
  }
}
```

Run Pi

```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-UNSTABLE-MLX-4bit
```
Run Hermes
```bash
hermes
```
# OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit

## Overview
This repository contains an MLX 4-bit quantization for the OpenYourMind Qwen 3.5 122B-A10B (Mixture of Experts, ~10B active) model, modified through:
- Refusal Ablation (extreme) — A custom mix of direct tensor modifications targeting refusal directions across attention and MLP projections, tuned more aggressively than our standard kuato pipeline; hence the `extreme` suffix (see the sketch after this list).
- Healing via DPO / KTO / SFT — Post-ablation retraining on a privately generated healing dataset combining preference pairs (DPO), unpaired desirable/undesirable signals (KTO), and curated SFT completions. The mix is used to repair coherence, reduce hedging, and stabilize long-context behavior that ablation tends to perturb.
- Vision Restoration — Original Qwen3.5 vision tower / projector reattached after healing so the model retains multimodal (image + text) functionality. Preserved natively inside the MLX model graph (no separate sidecar required).
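For intuition only: in its simplest published form, refusal ablation removes a learned "refusal direction" from targeted weight matrices by orthogonal projection. The sketch below is a generic NumPy illustration, not the kuato pipeline; the refusal direction `r`, the toy shapes, and the single-direction assumption are all illustrative.

```python
# Generic refusal-direction ablation sketch (NumPy) -- NOT the kuato pipeline.
# Projects a weight matrix onto the subspace orthogonal to a refusal
# direction r, so the layer can no longer write output along that direction.
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Apply W <- (I - r r^T) W for a unit refusal direction r (illustrative)."""
    r = r / np.linalg.norm(r)          # normalize the refusal direction
    return W - np.outer(r, r @ W)      # remove the component along r

# Toy example: an 8x8 "projection" and a random hypothetical refusal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
r = rng.normal(size=8)
W_abl = ablate_direction(W, r)
r_hat = r / np.linalg.norm(r)
print(np.allclose(r_hat @ W_abl, 0.0))  # True: outputs along r are zeroed
```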
Key Results:
- 0 refusals on HarmBench
- Substantially reduced hedging vs. earlier `kuato` releases
- Multimodal pipeline preserved end-to-end
- MTP (Multi-Token Prediction) head also carried forward as a frozen artifact for forward compatibility
## Available Quantizations

| File | Description | Size |
|---|---|---|
| `model-*-of-00014.safetensors` | 4-bit MLX language model (affine, group size 64, ≈ 4.544 bits/weight) + Qwen3-VL vision tower (BF16, unquantized) | ~65 GB |
| `mtp/model-mtp.safetensors` | Frozen BF16 MTP weights (785 tensors). Not loaded by current mlx-vlm / mlx-lm; preserved unmodified for when MLX adds an MTP inference path for `qwen3_5_moe`. | 4.7 GB |
Total on disk: ~70 GB.
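As a rough sanity check on the "≈ 4.544 bits/weight" figure above: affine quantization stores, for each group of 64 four-bit weights, one fp16 scale and one fp16 bias, which works out to 4.5 bits/weight for purely quantized tensors. The assumption here is that the remaining ~0.044 bits reflects tensors kept at higher precision.

```python
# Per-weight storage cost of affine 4-bit quantization with group size 64.
# Assumption: one fp16 scale + one fp16 bias per group (standard affine scheme);
# the extra ~0.044 bits in the reported 4.544 would come from tensors stored
# at higher precision.
group_size = 64
bits = 4
overhead_bits = 16 + 16  # fp16 scale + fp16 bias per group
effective = bits + overhead_bits / group_size
print(effective)  # 4.5 bits/weight for fully quantized tensors
```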
## Usage Examples

### LM Studio (Apple Silicon)
Requires LM Studio with an MLX backend that supports the `qwen3_5_moe` architecture (mlx-lm engine ≥ 1.5.0; bundled transformers ≥ 5.5).
### mlx-vlm (Python)
Install mlx-vlm from git main (the 0.5.0 PyPI release does not yet include the qwen3_5_moe config fixes shipped on 2026-05-09):
```bash
pip install -U "git+https://github.com/Blaizzy/mlx-vlm.git"
```
CLI:
```bash
mlx_vlm.generate \
  --model OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 512
```
Programmatic:
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("OpenYourMind/OpenYourMind-Qwen3.5-122B-A10B-kuato-abliterated-uncensored-extreme-MLX-4bit")
config = model.config
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
output = generate(model, processor, prompt, image=["path/to/image.jpg"], max_tokens=512)
print(output)
```
## Quant Selection

| Quant | Use Case |
|---|---|
| MLX-4bit (affine, group 64) | 4-bit variant for Apple Silicon Macs with unified memory. ~70 GB on disk; runs comfortably on 128 GB Macs and is loadable on 96 GB Macs with care for context length. |
Higher-precision MLX quants (6-bit / 8-bit / mixed) may be released later — open an issue or join the Discord if you need them.
## Hardware
- Recommended: Apple Silicon Mac with ≥ 96 GB unified memory. 128 GB Macs run comfortably with full multimodal context.
- MLX's lazy / mmap-based loading allows smaller machines to attempt loading, but expect heavy swap once context grows.
## Conversion Notes

Three non-default steps were needed to produce a clean conversion:

- mlx-vlm installed from git main rather than PyPI 0.5.0 — to pick up commits `ee4f949c` and `b7176c44` (2026-05-09) that fix Qwen3.5 / Qwen3-VL config deserialization.
- `qwen3_5_moe.sanitize` patched — the upstream checkpoint stores separate per-expert weights (`...mlp.experts.{i}.{gate,up,down}_proj.weight`), but mlx-vlm assumed a fused `experts.gate_up_proj` tensor and crashed with a `KeyError`. The sanitizer was extended to detect the per-expert layout and stack the tensors into the expected `switch_mlp.{gate,up,down}_proj.weight` of shape `[num_experts, intermediate_size, hidden_size]` before quantization (see the sketch after this list).
- `image_processor_type` updated in `preprocessor_config.json` and `processor_config.json` from `Qwen2VLImageProcessorFast` / `Qwen3VLImageProcessor` to `Qwen2VLImageProcessor` — the legacy "Fast" / "Qwen3VL" class names are not registered as image processors in transformers ≥ 5.x; `Qwen2VLImageProcessor` is the canonical name used by `qwen3_5_moe`.
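The stacking step amounts to popping each expert's projection out of the weight dictionary and stacking along a new leading axis. A minimal sketch, using NumPy in place of MLX arrays and simplified key names:

```python
# Sketch of the per-expert -> fused stacking (NumPy; simplified key names).
# Upstream layout:  "...mlp.experts.{i}.gate_proj.weight"  (one tensor per expert)
# Expected layout:  "...mlp.switch_mlp.gate_proj.weight" with shape
#                   [num_experts, intermediate_size, hidden_size]
import numpy as np

def stack_experts(weights: dict, prefix: str, proj: str, num_experts: int) -> dict:
    per_expert = [
        weights.pop(f"{prefix}.experts.{i}.{proj}.weight")
        for i in range(num_experts)
    ]
    weights[f"{prefix}.switch_mlp.{proj}.weight"] = np.stack(per_expert, axis=0)
    return weights

# Toy example with 4 experts and small dimensions.
w = {f"mlp.experts.{i}.gate_proj.weight": np.zeros((11, 7)) for i in range(4)}
w = stack_experts(w, "mlp", "gate_proj", num_experts=4)
print(w["mlp.switch_mlp.gate_proj.weight"].shape)  # (4, 11, 7)
```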
## Notes
- Healing Dataset: Privately generated (DPO + KTO + SFT mix). Not released.
- License: Other
- Model Architecture: Qwen3 MoE (Mixture of Experts, ~10B active / 122B total) + Qwen3-VL vision tower + MTP head
- Base Model: Qwen/Qwen3.5-122B-A10B
- Modality: Text + Vision (image / video)
- MTP: Carried forward as `mtp/model-mtp.safetensors`. Not runnable on current mlx-vlm / mlx-lm.
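To verify the frozen MTP artifact without loading any model, the `safetensors` library can enumerate its tensors lazily (the path below assumes you are in the repo root):

```python
# List the tensors in the frozen MTP artifact without loading a model.
# Requires: pip install safetensors numpy
from safetensors import safe_open

with safe_open("mtp/model-mtp.safetensors", framework="numpy") as f:
    keys = list(f.keys())
    print(len(keys))   # expected: 785 tensors
    print(keys[:5])    # a few example tensor names
```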
## Support & Community
- Discord: https://discord.gg/rhUZY5GEZr
- Bitcoin Donations: `bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv`
## Disclaimer

Use of this model is the user's responsibility. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.