# Qwen3.6-35B-A3B-abliterated-MNN
MNN-format 4-bit quantization of the Heretic-abliterated Qwen3.6-35B-A3B multimodal MoE, packaged for the TokForge Android MNN-fork runtime.
## What this is

- Source (upstream abliteration): Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
- Upstream of that: Qwen/Qwen3.6-35B-A3B
- Abliteration methodology: Heretic input-side split-MoE transfer (MPOA/SOMA-style). Upstream reports a 1/25 refusal rate on the official 25-prompt harmful-behaviors check (vs. 22/25 for the base model), with a KL divergence of 0.0107 vs. the base.
- Architecture: `qwen3_5_moe` (40 layers, 256 experts, 8 active per token, hybrid linear + full attention, DeepStack vision).
## Bundle contents (parity with taobao-mnn/Qwen3.6-35B-A3B-MNN)

- `config.json`: MNN LLM geometry (`model_type: qwen3_5_moe`, jinja chat template, MRoPE, vision offsets)
- `llm_config.json`: backend defaults (cpu, `thread_num=4`, `precision=low`, `memory=low`; see the sketch after this list)
- `llm.mnn` / `llm.mnn.weight`: quantized MNN graph + external weight blob (Q4, block 64, HQQ)
- `embeddings_bf16.bin`: BF16 embedding table (kept separate from the quantized weights via `--seperate_embed`)
- `tokenizer.mtok`: MNN binary tokenizer
- `visual.mnn` / `visual.mnn.weight`: vision transformer (DeepStack VLM)
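For orientation, the backend-defaults portion of `llm_config.json` looks roughly like the sketch below. Key names assume the standard MNN LLM config schema; the shipped file also carries the vision and attention fields described later in this card.

```json
{
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low"
}
```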
## Quantization scheme
Identical to the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle:
| Flag | Value |
|---|---|
| `--quant_bit` | 4 |
| `--quant_block` | 64 |
| `--lm_quant_bit` | 4 |
| `--lm_quant_block` | 64 |
| `--embed_bit` | 16 |
| `--hqq` | enabled |
| `--seperate_embed` | enabled |
This parity means the abliterated variant should behave identically to the base Qwen3.6 bundle in terms of load time, memory footprint, and decode throughput on TokForge-supported devices.
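Assuming the bundle was produced with the stock `llmexport.py` pipeline, the flags in the table above correspond to an invocation along these lines; the source path and output directory are illustrative placeholders, not the actual ones used:

```bash
# Hedged sketch of the export flags from the table above.
# --path and --dst_path are illustrative, not the real paths.
python llmexport.py \
    --path Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn \
    --quant_bit 4 --quant_block 64 \
    --lm_quant_bit 4 --lm_quant_block 64 \
    --embed_bit 16 \
    --hqq --seperate_embed \
    --dst_path ./Qwen3.6-35B-A3B-abliterated-MNN
```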
## VLM capability (added post-release, issue #217)

This bundle now ships with `visual.mnn` + `visual.mnn.weight`: the DeepStack vision transformer for image input. `llm_config.json` has `is_visual: true` along with the full vision config block (`image_mean`, `image_norm`, `image_size: 420`, `vision_start`, `vision_end`, `image_pad`, `num_grid_per_side: 48`, `has_deepstack: true`).
The vision tower in Qwen3.6-35B-A3B is architecturally identical across the base Qwen release and the Heretic-abliterated variant (abliteration targets only the LLM decoder MLP layers, verified by structural comparison of all 333 *.visual.* weight keys in both safetensors). The visual assets here are therefore drop-in compatible with taobao-mnn/Qwen3.6-35B-A3B-MNN and byte-identical to those in the base bundle.
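One way to sanity-check the byte-identical claim locally is to hash the visual assets in both bundles (paths are illustrative; adjust to wherever the bundles are downloaded):

```bash
# Byte-identical files yield pairwise-matching digests.
sha256sum Qwen3.6-35B-A3B-MNN/visual.mnn \
          Qwen3.6-35B-A3B-abliterated-MNN/visual.mnn
sha256sum Qwen3.6-35B-A3B-MNN/visual.mnn.weight \
          Qwen3.6-35B-A3B-abliterated-MNN/visual.mnn.weight
```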
Attention-stack fields (`attention_type: mix`, `sliding_window: 4`, `layer_nums: 40`) match the base taobao-mnn/Qwen3.6-35B-A3B-MNN bundle exactly.
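Put together, the vision and attention fields quoted above amount to a partial `llm_config.json` sketch like this; fields whose values are not stated in this card (e.g. `image_mean`, `image_norm`, `vision_start`, `vision_end`, `image_pad`) are omitted:

```json
{
  "is_visual": true,
  "image_size": 420,
  "num_grid_per_side": 48,
  "has_deepstack": true,
  "attention_type": "mix",
  "sliding_window": 4,
  "layer_nums": 40
}
```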
## Original conversion note

The first upload of this repo (pre-#217) shipped text-only: the ONNX visual export ran during the initial `llmexport.py` run, but the final MNNConvert step did not emit `visual.mnn` / `visual.mnn.weight` into the output dir. The exporter has since been patched with a `--visual_only` flag in the TokForge MNN fork so the vision assets can be re-emitted without a full re-conversion. See the upstream issue thread for details.
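The exact re-emit invocation is not documented here; under the assumption that `--visual_only` composes with the normal export flags, it would look something like:

```bash
# Hypothetical invocation of the TokForge fork's --visual_only flag;
# only the flag name comes from the issue thread, the rest is assumed.
python llmexport.py \
    --path Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16 \
    --export mnn \
    --visual_only \
    --dst_path ./Qwen3.6-35B-A3B-abliterated-MNN
```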
## Target runtime

TokForge Android: an MNN fork with `.mtok` tokenizer support, DeepStack VLM support, and the libMNN cherry-pick that lets the base Qwen3.6-35B-A3B load on 24 GB devices (RedMagic SM8850 verified at ~6.66 tok/s cold / ~8.32 tok/s warm).
## Usage with upstream MNN llm_demo

```bash
git clone https://github.com/alibaba/MNN.git
cd MNN && mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
         -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt
```
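`llm_demo` reads prompts from the text-file argument; as a quick smoke test (assuming the upstream one-prompt-per-line format):

```bash
# Minimal smoke test; each line of prompt.txt is treated as one prompt.
echo "Describe this model bundle in one sentence." > prompt.txt
./llm_demo /path/to/Qwen3.6-35B-A3B-abliterated-MNN/config.json prompt.txt
```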
## License & safety
- Apache-2.0 (inherited from Qwen and the Youssofal abliteration).
- This is a safety-reduced / uncensored variant. It refuses far less than the base model on the MPOA/SOMA refusal benchmark. Deploy with appropriate user-facing controls and local policy.
- Export pipeline: alibaba/MNN llmexport (tq-merged branch, TokForge fork).