Qwen3.6-35B-A3B-abliterated-MNN

MNN-format 4-bit quantization of the Heretic-abliterated Qwen3.6-35B-A3B multimodal MoE, packaged for the TokForge Android MNN-fork runtime.

What this is

  • Source (upstream abliteration): Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
  • Upstream of that: Qwen/Qwen3.6-35B-A3B
  • Abliteration methodology: Heretic input-side split-MoE transfer (MPOA/SOMA-style). Upstream reports a 1/25 refusal rate on the official 25-prompt harmful-behaviors check (vs. 22/25 for the base model) and a KL divergence of 0.0107 relative to the base model.
  • Architecture: qwen3_5_moe (40 layers, 256 experts, 8 active per token, hybrid linear + full attention, DeepStack vision).

Bundle contents (parity with taobao-mnn/Qwen3.6-35B-A3B-MNN)

  • config.json – MNN LLM geometry (model_type qwen3_5_moe, jinja chat template, MRoPE, vision offsets)
  • llm_config.json – backend defaults (cpu, thread_num=4, precision=low, memory=low)
  • llm.mnn / llm.mnn.weight – quantized MNN graph + external weight blob (Q4, block 64, HQQ)
  • embeddings_bf16.bin – BF16 embedding table (kept separate from the quantized weights via --seperate_embed)
  • tokenizer.mtok – MNN binary tokenizer
  • visual.mnn / visual.mnn.weight – vision transformer (DeepStack VLM)

Quantization scheme

Identical to the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle:

Flag              Value
--quant_bit       4
--quant_block     64
--lm_quant_bit    4
--lm_quant_block  64
--embed_bit       16
--hqq             enabled
--seperate_embed  enabled

This parity means the abliterated variant should behave identically to the base Qwen3.6 bundle in terms of load time, memory footprint, and decode throughput on TokForge-supported devices.
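
For reference, these flags correspond to an llmexport.py invocation roughly like the sketch below. The --path / --export / --dst_path arguments and all paths are illustrative; the exact CLI may differ between upstream llmexport.py and the TokForge fork.

python llmexport.py \
    --path /path/to/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn \
    --quant_bit 4 --quant_block 64 \
    --lm_quant_bit 4 --lm_quant_block 64 \
    --embed_bit 16 --hqq --seperate_embed \
    --dst_path /path/to/Qwen3.6-35B-A3B-abliterated-MNN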

VLM capability (added post-release, issue #217)

This bundle now ships with visual.mnn + visual.mnn.weight – the DeepStack vision transformer for image input. llm_config.json sets is_visual: true and carries the full vision config block (image_mean, image_norm, image_size: 420, vision_start, vision_end, image_pad, num_grid_per_side: 48, has_deepstack: true).
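
An abridged view of the vision-related fields in llm_config.json (structure simplified; only values stated above are shown, and image_mean, image_norm, vision_start, vision_end, and image_pad are present in the file but omitted here):

{
  "is_visual": true,
  "image_size": 420,
  "num_grid_per_side": 48,
  "has_deepstack": true
}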

The vision tower in Qwen3.6-35B-A3B is architecturally identical across the base Qwen release and the Heretic-abliterated variant (abliteration targets only the LLM decoder MLP layers, verified by structural comparison of all 333 *.visual.* weight keys in both safetensors). The visual assets here are therefore drop-in compatible with taobao-mnn/Qwen3.6-35B-A3B-MNN and byte-identical to those in the base bundle.

Attention-stack fields (attention_type: mix, sliding_window: 4, layer_nums: 40) match the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle exactly.

Original conversion note

The first upload of this repo (pre-#217) shipped text-only: the ONNX visual export ran during the initial llmexport.py run, but the final MNNConvert step did not emit visual.mnn / visual.mnn.weight into the output dir. The exporter has since been patched with a --visual_only flag in the TokForge MNN fork to allow re-emitting vision assets without a full re-conversion. See the upstream issue thread for details.
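
With that patch, re-emitting only the vision assets should look roughly like the sketch below. Only --visual_only is confirmed in the issue thread; the remaining arguments and paths mirror the conversion sketch above and are illustrative.

python llmexport.py \
    --path /path/to/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn --visual_only \
    --dst_path /path/to/Qwen3.6-35B-A3B-abliterated-MNN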

Target runtime

TokForge Android – an MNN fork with the .mtok tokenizer, DeepStack VLM support, and the libMNN cherry-pick that lets the base Qwen3.6-35B-A3B load on 24 GB devices (verified on a RedMagic, SM8850, at ~6.66 tok/s cold / ~8.32 tok/s warm).

Usage with upstream MNN llm_demo

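# Build upstream MNN with the LLM demo and low-memory options enabled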
git clone https://github.com/alibaba/MNN.git
cd MNN && mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j

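# Point llm_demo at this bundle's config.json plus a plain-text prompt file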
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt
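
For image input, MNN's multimodal demos conventionally take the image reference inline in the prompt via an <img>...</img> marker; treat the exact marker syntax as an assumption and check it against the MNN LLM documentation for your build:

echo "<img>/path/to/photo.jpg</img>Describe this image." > prompt.txt
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt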

License & safety

  • Apache-2.0 (inherited from Qwen and the Youssofal abliteration).
  • This is a safety-reduced / uncensored variant. It refuses far less often than the base model on the MPOA/SOMA refusal benchmark. Deploy with appropriate user-facing controls and local policy.
  • Export pipeline: alibaba/MNN llmexport (tq-merged branch, TokForge fork).