# Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit
This is an MLX release of an abliterated version of Qwen's Qwen3.6-35B-A3B.
Heretic's ablation pipeline was applied to the text-side MoE stack to remove the base model's refusal behavior at the weight level. This release preserves the Qwen3.6-35B-A3B reasoning and instruction-following profile in Apple MLX format for local deployment on Apple Silicon hardware.
## Quick Benchmarks
| Check | Original Qwen3.6-35B-A3B | Abliterated Heretic MLX |
|---|---|---|
| Official 25-prompt refusal check | 22/25 refusals | 2/25 refusals |
| Archived Heretic KL divergence | - | 0.010655362159013748 |
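The KL divergence row measures how far the abliterated model's next-token distribution drifts from the original's; a small value indicates the intervention left general behavior largely intact. A minimal sketch of that computation, using hypothetical toy distributions rather than anything from the actual Heretic run:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
p = [0.70, 0.20, 0.05, 0.05]  # original model
q = [0.68, 0.21, 0.06, 0.05]  # abliterated model

drift = kl_divergence(p, q)  # small positive number for nearby distributions
```

In practice this is averaged over many prompts and full vocabularies, but the per-position quantity is exactly this sum.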
## Methodology & Model Notes
Qwen3.6-35B-A3B is a sparse MoE model in the qwen3_5_moe family. The abliterated BF16 source checkpoint was produced with a Heretic MPOA/SOMA-style sibling-transfer workflow and finalized with an input-side split-MoE intervention that reduced the official 25-prompt refusal marker suite to 1/25 refusals.
This MLX release was built directly from the published BF16 Heretic checkpoint using a high-quality layer-aware quantization policy instead of a flat per-weight pass.
- quant target: 8-bit
- quant build: 8-bit tuned layer-aware quantization
- source checkpoint: Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
- published variant: Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit
The layer-aware policy keeps more precision on sensitive projections in the early, late, and selected middle layers so the quant stays cleaner than a naive flat conversion.
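A layer-aware policy of this kind can be expressed as a per-layer bit-width rule. The sketch below is illustrative only: the layer counts, band positions, and bit widths are assumptions, not the exact policy used for this export.

```python
def bits_for_layer(layer_idx, num_layers=48, default_bits=8, boosted_bits=16):
    """Hypothetical layer-aware quantization rule.

    Keeps extra precision on the first and last few layers plus a narrow
    band of middle layers, where quantization error tends to hurt output
    quality most; everything else gets the default bit width.
    """
    early = layer_idx < 2
    late = layer_idx >= num_layers - 2
    middle = num_layers // 2 - 1 <= layer_idx <= num_layers // 2 + 1
    return boosted_bits if (early or late or middle) else default_bits

# Build a per-layer quantization plan for a 48-layer stack.
plan = {i: bits_for_layer(i) for i in range(48)}
```

A flat conversion would map every layer to the same bits; the gain here comes purely from where the extra precision is spent.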
## Validation
This published MLX variant passed:
- the official 25-prompt refusal marker check in standard thinking-enabled chat format: 2/25 refusals
- the local smoke suite for general chat, short reasoning, and short code output: all_looks_ok=true
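A refusal marker check of this kind can be approximated by scanning model responses for stock refusal phrases. The marker list and helper names below are illustrative, not the official suite:

```python
# Hypothetical refusal phrases; the official marker list may differ.
REFUSAL_MARKERS = [
    "i can't help with",
    "i cannot assist",
    "i'm sorry, but",
    "as an ai",
]

def is_refusal(response: str) -> bool:
    """Flag a response if it contains any known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    """Return (refused, total) over a batch of responses, e.g. (2, 25)."""
    refused = sum(is_refusal(r) for r in responses)
    return refused, len(responses)
```

Marker matching is a coarse proxy, which is why the smoke suite above additionally checks general chat, reasoning, and code output by hand.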
## Running
```python
from mlx_lm import load, generate

# Load the 8-bit MLX weights and tokenizer from the Hub.
model, tokenizer = load("Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-8bit")

messages = [{"role": "user", "content": "Write a short Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```
## Files
The repo root contains the complete 8-bit MLX export for this variant:
- `config.json`
- `model.safetensors.index.json`
- split `model-*.safetensors` shards
- tokenizer and generation files
- `README.md`
## Credits
- Base model: Qwen/Qwen3.6-35B-A3B
- BF16 abliterated source: Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
- Apple Silicon runtime: mlx-lm
- Refusal-removal pipeline: Heretic
## Disclaimer
This model has had refusal behavior removed at the weight level. It will answer prompts that the base model would normally refuse. You are responsible for how you use it.