# Qwen3.6-35B-A3B-abliterated-MNN
MNN-format 4-bit quantization of the Heretic-abliterated Qwen3.6-35B-A3B multimodal MoE, packaged for the TokForge Android MNN-fork runtime.
## What this is

- Source (upstream abliteration): Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
- Upstream of that: Qwen/Qwen3.6-35B-A3B
- Abliteration methodology: Heretic input-side split-MoE transfer (MPOA/SOMA-style). Upstream reports a 1/25 refusal rate on the official 25-prompt harmful-behaviors check (vs. 22/25 for the base model), with a KL divergence of 0.0107 vs. the base.
- Architecture: `qwen3_5_moe` (40 layers, 256 experts, 8 active per token, hybrid linear + full attention, DeepStack vision).
## Bundle contents (parity with taobao-mnn/Qwen3.6-35B-A3B-MNN)

- `config.json`: MNN LLM geometry (`model_type: qwen3_5_moe`, jinja chat template, MRoPE, vision offsets)
- `llm_config.json`: backend defaults (cpu, `thread_num=4`, `precision=low`, `memory=low`; see the sketch after this list)
- `llm.mnn` / `llm.mnn.weight`: quantized MNN graph + external weight blob (Q4, block 64, HQQ)
- `embeddings_bf16.bin`: BF16 embedding table (kept separate from the quantized weights via `--seperate_embed`)
- `tokenizer.mtok`: MNN binary tokenizer
- `visual.mnn` / `visual.mnn.weight`: vision transformer (DeepStack VLM)
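For orientation, the backend-defaults portion of `llm_config.json` looks roughly like the sketch below. Key names assume the standard MNN LLM config schema; the shipped file also carries the vision and attention fields described later in this card.

```json
{
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```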
## Quantization scheme
Identical to the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle:
| Flag | Value |
|---|---|
| `--quant_bit` | 4 |
| `--quant_block` | 64 |
| `--lm_quant_bit` | 4 |
| `--lm_quant_block` | 64 |
| `--embed_bit` | 16 |
| `--hqq` | enabled |
| `--seperate_embed` | enabled |
This parity means the abliterated variant should behave identically to the base Qwen3.6 bundle in terms of load time, memory footprint, and decode throughput on TokForge-supported devices.
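Assuming the bundle was produced with the stock `llmexport.py` pipeline, the flags in the table above correspond to an invocation along these lines; the source path and output directory are illustrative placeholders, not the actual ones used:

```bash
# Hedged sketch of the export flags from the table above.
# --path and --dst_path are illustrative, not the real paths.
python llmexport.py \
    --path Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn \
    --quant_bit 4 --quant_block 64 \
    --lm_quant_bit 4 --lm_quant_block 64 \
    --embed_bit 16 \
    --hqq --seperate_embed \
    --dst_path ./Qwen3.6-35B-A3B-abliterated-MNN
```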
## VLM capability (added post-release, issue #217)

This bundle now ships with `visual.mnn` + `visual.mnn.weight`: the DeepStack vision transformer for image input. `llm_config.json` has `is_visual: true` along with the full vision config block (`image_mean`, `image_norm`, `image_size: 420`, `vision_start`, `vision_end`, `image_pad`, `num_grid_per_side: 48`, `has_deepstack: true`).
The vision tower in Qwen3.6-35B-A3B is architecturally identical across the base Qwen release and the Heretic-abliterated variant (abliteration targets only the LLM decoder MLP layers, verified by structural comparison of all 333 *.visual.* weight keys in both safetensors). The visual assets here are therefore drop-in compatible with taobao-mnn/Qwen3.6-35B-A3B-MNN and byte-identical to those in the base bundle.
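One way to sanity-check the byte-identical claim locally is to hash the visual assets in both bundles (paths are illustrative; adjust to wherever the bundles are downloaded):

```bash
# Byte-identical files yield pairwise-matching digests.
sha256sum Qwen3.6-35B-A3B-MNN/visual.mnn \
          Qwen3.6-35B-A3B-abliterated-MNN/visual.mnn
sha256sum Qwen3.6-35B-A3B-MNN/visual.mnn.weight \
          Qwen3.6-35B-A3B-abliterated-MNN/visual.mnn.weight
```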
Attention-stack fields (`attention_type: mix`, `sliding_window: 4`, `layer_nums: 40`) match the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle exactly.
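Put together, the vision and attention fields quoted above amount to a partial `llm_config.json` sketch like this; fields whose values are not stated in this card (e.g. `image_mean`, `image_norm`, `vision_start`, `vision_end`, `image_pad`) are omitted:

```json
{
  "is_visual": true,
  "image_size": 420,
  "num_grid_per_side": 48,
  "has_deepstack": true,
  "attention_type": "mix",
  "sliding_window": 4,
  "layer_nums": 40
}
```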
## Original conversion note

The first upload of this repo (pre-#217) shipped text-only: the ONNX visual export ran during the initial `llmexport.py` run, but the final MNNConvert step did not emit `visual.mnn` / `visual.mnn.weight` into the output dir. The exporter has since been patched with a `--visual_only` flag in the TokForge MNN fork so the vision assets can be re-emitted without a full re-conversion. See the upstream issue thread for details.
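The exact re-emit invocation is not documented here; under the assumption that `--visual_only` composes with the normal export flags, it would look something like:

```bash
# Hypothetical invocation of the TokForge fork's --visual_only flag;
# only the flag name comes from the issue thread, the rest is assumed.
python llmexport.py \
    --path Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn \
    --visual_only \
    --dst_path ./Qwen3.6-35B-A3B-abliterated-MNN
```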
## Target runtime

TokForge Android: an MNN fork with `.mtok` tokenizer support, DeepStack VLM support, and the libMNN cherry-pick that lets the base Qwen3.6-35B-A3B load on 24 GB devices (RedMagic SM8850 verified at ~6.66 tok/s cold / ~8.32 tok/s warm).
## Usage with upstream MNN llm_demo

```bash
git clone https://github.com/alibaba/MNN.git
cd MNN && mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt
```
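`llm_demo` reads prompts from the text-file argument; as a quick smoke test (assuming the upstream one-prompt-per-line format):

```bash
# Minimal smoke test; each line of prompt.txt is treated as one prompt.
echo "Describe this model bundle in one sentence." > prompt.txt
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt
```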
## License & safety
- Apache-2.0 (inherited from Qwen and the Youssofal abliteration).
- This is a safety-reduced / uncensored variant. It refuses far less than the base model on the MPOA/SOMA refusal benchmark. Deploy with appropriate user-facing controls and local policy.
- Export pipeline: alibaba/MNN llmexport (tq-merged branch, TokForge fork).