Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit

This is an MLX release of an abliterated version of Qwen's Qwen3.6-35B-A3B.

Heretic's ablation pipeline was applied to the text-side MoE stack to remove the base model's refusal behavior at the weight level. This release preserves the Qwen3.6-35B-A3B reasoning and instruction-following profile in Apple MLX format for local deployment on Apple Silicon.
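The core idea behind weight-level refusal removal can be sketched with plain NumPy: project an estimated "refusal direction" out of a weight matrix's output space so the layer can no longer write along that direction. This is a minimal illustration of directional ablation in general, not Heretic's actual pipeline; the matrix shapes and the way `r` is obtained here are assumptions.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the direction r out of weight matrix W's output space.

    W maps inputs to d_model-dim outputs (rows live in output space);
    after ablation, W's outputs have no component along r.
    """
    r = r / np.linalg.norm(r)          # unit "refusal" direction
    return W - np.outer(r, r) @ W      # W' = (I - r r^T) W

# Toy check: outputs of the ablated matrix are orthogonal to r.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
x = rng.standard_normal(8)
print(abs(np.dot(r / np.linalg.norm(r), W_abl @ x)))  # ~0
```

Real abliteration pipelines estimate `r` from activation differences between refused and complied prompts and apply the projection across many layers; the arithmetic per weight matrix is the same rank-one projection shown above.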

Quick Benchmarks

Check                              Original Qwen3.6-35B-A3B   Abliterated Heretic MLX
Official 25-prompt refusal check   22/25 refusals             2/25 refusals
Archived Heretic KL divergence     -                          0.010655362159013748

Methodology & Model Notes

Qwen3.6-35B-A3B is a sparse MoE model in the qwen3_5_moe family. The abliterated BF16 source checkpoint was produced with a Heretic MPOA/SOMA-style sibling-transfer workflow and finalized with an input-side split-MoE intervention that brought the official 25-prompt refusal marker suite down to 1/25.

This MLX release was built directly from the published BF16 Heretic checkpoint using a layer-aware quantization policy rather than a flat per-weight pass.

  • quant target: 8-bit
  • quant build: 8-bit tuned layer-aware quantization
  • source checkpoint: Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
  • published variant: Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit

The layer-aware policy keeps more precision on sensitive projections in the early, late, and selected middle layers so the quant stays cleaner than a naive flat conversion.
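The exact per-layer bit assignments for this build aren't published. As a hypothetical illustration, a layer-aware policy can be expressed as a function mapping each weight's name to quantization parameters, similar in spirit to the mixed-quantization predicates supported by mlx_lm's conversion tooling. The layer count, edge width, and group sizes below are illustrative assumptions, not the release's actual configuration.

```python
import re

# Hypothetical layer-aware policy: keep sensitive projections in the first
# and last few layers (and the embedding/head) at finer group sizes, and
# quantize everything else at the flat 8-bit target. Values are illustrative.
NUM_LAYERS = 48            # assumed depth, not from the published config
HIGH_PRECISION_EDGE = 4    # layers at each end kept cleaner

def quant_params(weight_path: str) -> dict:
    """Return group size / bit width for a given weight path."""
    if "embed" in weight_path or "lm_head" in weight_path:
        return {"group_size": 32, "bits": 8}
    m = re.search(r"layers\.(\d+)\.", weight_path)
    if m:
        layer = int(m.group(1))
        edge = layer < HIGH_PRECISION_EDGE or layer >= NUM_LAYERS - HIGH_PRECISION_EDGE
        if edge and ("q_proj" in weight_path or "o_proj" in weight_path):
            return {"group_size": 32, "bits": 8}   # finer groups on the edges
    return {"group_size": 64, "bits": 8}           # flat 8-bit elsewhere

print(quant_params("model.layers.0.self_attn.q_proj.weight"))
print(quant_params("model.layers.20.mlp.experts.3.up_proj.weight"))
```

Smaller group sizes reduce quantization error on the projections that most influence output quality, at a modest cost in file size, which is why edge layers and the embedding/head are common candidates for finer treatment.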

Validation

This published MLX variant passed:

  • the official 25-prompt refusal marker check in standard thinking-enabled chat format: 2/25 refusals
  • the local smoke suite for general chat, short reasoning, and short code output: all_looks_ok=true
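A refusal marker check of the kind described above can be sketched as a simple scan for known refusal phrases near the start of each response. The marker list and sample responses here are illustrative, not the official 25-prompt suite or its actual marker set.

```python
# Hypothetical sketch of a refusal-marker check: count responses that
# contain a known refusal phrase near the beginning.
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry", "as an ai",
)

def is_refusal(response: str) -> bool:
    head = response.strip().lower()[:120]   # markers typically appear early
    return any(m in head for m in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> str:
    hits = sum(is_refusal(r) for r in responses)
    return f"{hits}/{len(responses)} refusals"

print(refusal_rate([
    "I'm sorry, but I can't help with that.",
    "Sure! Here's a short overview...",
]))  # → 1/2 refusals
```

Marker scans of this kind are a cheap proxy rather than a full safety evaluation, which is why the score is reported alongside a separate smoke suite for general quality.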

Running

from mlx_lm import load, generate

# Download (or load from cache) the quantized weights and tokenizer.
model, tokenizer = load("Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit")

messages = [{"role": "user", "content": "Write a short Python function that reverses a string."}]
# Render the chat template as a string, appending the assistant turn
# header so the model starts generating the reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)

Files

The repo root contains the complete 8-bit MLX export for this variant:

  • config.json
  • model.safetensors.index.json
  • split model-*.safetensors shards
  • tokenizer and generation files
  • README.md

Disclaimer

This model has had refusal behavior removed at the weight level. It will answer prompts that the base model would normally refuse. You are responsible for how you use it.
