Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit

This is an MLX release of an abliterated version of Qwen's Qwen3.6-35B-A3B.

Heretic's ablation pipeline was applied to the text-side MoE stack to remove the base model's refusal behavior at the weight level. This release preserves the Qwen3.6-35B-A3B reasoning and instruction-following profile in Apple MLX format for local deployment on Apple Silicon.
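The core idea behind weight-level refusal removal can be sketched with plain NumPy: project an estimated "refusal direction" out of a weight matrix's output space so the layer can no longer write along that direction. This is a minimal illustration of directional ablation in general, not Heretic's actual pipeline; the matrix shapes and the way `r` is obtained here are assumptions.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the direction r out of weight matrix W's output space.

    W maps inputs to d_model-dim outputs (rows live in output space);
    after ablation, W's outputs have no component along r.
    """
    r = r / np.linalg.norm(r)          # unit "refusal" direction
    return W - np.outer(r, r) @ W      # W' = (I - r r^T) W

# Toy check: outputs of the ablated matrix are orthogonal to r.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
x = rng.standard_normal(8)
print(abs(np.dot(r / np.linalg.norm(r), W_abl @ x)))  # ~0
```

Real abliteration pipelines estimate `r` from activation differences between refused and complied prompts and apply the projection across many layers; the arithmetic per weight matrix is the same rank-one projection shown above.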

Quick Benchmarks

Check                              Original Qwen3.6-35B-A3B   Abliterated Heretic MLX
Official 25-prompt refusal check   22/25 refusals             2/25 refusals
Archived Heretic KL divergence     -                          0.010655362159013748

Methodology & Model Notes

Qwen3.6-35B-A3B is a sparse MoE model in the qwen3_5_moe family. The abliterated BF16 source checkpoint was produced with a Heretic MPOA/SOMA-style sibling-transfer workflow and finalized with an input-side split-MoE intervention that brought the official 25-prompt refusal marker suite down to 1/25.

This MLX release was built directly from the published BF16 Heretic checkpoint using a layer-aware quantization policy rather than a flat per-weight pass.

  • quant target: 8-bit
  • quant build: 8-bit tuned layer-aware quantization
  • source checkpoint: Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
  • published variant: Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit

The layer-aware policy keeps more precision on sensitive projections in the early, late, and selected middle layers so the quant stays cleaner than a naive flat conversion.
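The exact per-layer bit assignments for this build aren't published. As a hypothetical illustration, a layer-aware policy can be expressed as a function mapping each weight's name to quantization parameters, similar in spirit to the mixed-quantization predicates supported by mlx_lm's conversion tooling. The layer count, edge width, and group sizes below are illustrative assumptions, not the release's actual configuration.

```python
import re

# Hypothetical layer-aware policy: keep sensitive projections in the first
# and last few layers (and the embedding/head) at finer group sizes, and
# quantize everything else at the flat 8-bit target. Values are illustrative.
NUM_LAYERS = 48            # assumed depth, not from the published config
HIGH_PRECISION_EDGE = 4    # layers at each end kept cleaner

def quant_params(weight_path: str) -> dict:
    """Return group size / bit width for a given weight path."""
    if "embed" in weight_path or "lm_head" in weight_path:
        return {"group_size": 32, "bits": 8}
    m = re.search(r"layers\.(\d+)\.", weight_path)
    if m:
        layer = int(m.group(1))
        edge = layer < HIGH_PRECISION_EDGE or layer >= NUM_LAYERS - HIGH_PRECISION_EDGE
        if edge and ("q_proj" in weight_path or "o_proj" in weight_path):
            return {"group_size": 32, "bits": 8}   # finer groups on the edges
    return {"group_size": 64, "bits": 8}           # flat 8-bit elsewhere

print(quant_params("model.layers.0.self_attn.q_proj.weight"))
print(quant_params("model.layers.20.mlp.experts.3.up_proj.weight"))
```

Smaller group sizes reduce quantization error on the projections that most influence output quality, at a modest cost in file size, which is why edge layers and the embedding/head are common candidates for finer treatment.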

Validation

This published MLX variant passed:

  • the official 25-prompt refusal marker check in standard thinking-enabled chat format: 2/25 refusals
  • the local smoke suite for general chat, short reasoning, and short code output: all_looks_ok=true
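A refusal marker check of the kind described above can be sketched as a simple scan for known refusal phrases near the start of each response. The marker list and sample responses here are illustrative, not the official 25-prompt suite or its actual marker set.

```python
# Hypothetical sketch of a refusal-marker check: count responses that
# contain a known refusal phrase near the beginning.
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry", "as an ai",
)

def is_refusal(response: str) -> bool:
    head = response.strip().lower()[:120]   # markers typically appear early
    return any(m in head for m in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> str:
    hits = sum(is_refusal(r) for r in responses)
    return f"{hits}/{len(responses)} refusals"

print(refusal_rate([
    "I'm sorry, but I can't help with that.",
    "Sure! Here's a short overview...",
]))  # → 1/2 refusals
```

Marker scans of this kind are a cheap proxy rather than a full safety evaluation, which is why the score is reported alongside a separate smoke suite for general quality.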

Running

from mlx_lm import load, generate

# Download (or load from cache) the quantized weights and tokenizer.
model, tokenizer = load("Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit")

messages = [{"role": "user", "content": "Write a short Python function that reverses a string."}]
# Render the chat template as a string, appending the assistant turn
# header so the model starts generating the reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)

Files

The repo root contains the complete 8-bit MLX export for this variant:

  • config.json
  • model.safetensors.index.json
  • split model-*.safetensors shards
  • tokenizer and generation files
  • README.md

Disclaimer

This model has had refusal behavior removed at the weight level. It will answer prompts that the base model would normally refuse. You are responsible for how you use it.
