td_moe_qwen3_30b_a3b_match_mobe_ep_whiten_both

TD-MoE Tucker decomposition of Qwen3-30B-A3B, with compression matched to MoBE/Sparse-MoBE b32 (37.5% removed per gate/up matrix, 62.5% kept). Expert-preserving variant with both input and output whitening (the full method).

Method: td_moe
Base model: Qwen/Qwen3-30B-A3B
Compressed: gate_proj, up_proj over layers 0-47 (all 48 decoder layers; down_proj kept dense)
Settings: TD-MoE Tucker, compression_ratio=0.375 (removed), expert_mode=preserve
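
To make the 37.5% / 62.5% budget concrete, here is a minimal sketch of the parameter arithmetic for one layer. It assumes that expert_mode=preserve means the expert mode of the stacked weight tensor is left unfactorized (a Tucker-2 form with shared input and output factors), and that ranks scale proportionally with each dimension. The shapes below are assumed, not taken from this card, and the helper names are hypothetical. The actual rank allocation is whatever config.toml specifies.

```python
# Hypothetical shapes for one MoE layer; check the base model's config.json before relying on them.
NUM_EXPERTS = 128  # assumed number of experts per layer
D_OUT = 768        # assumed expert intermediate size (rows of gate_proj / up_proj)
D_IN = 2048        # assumed hidden size (columns of gate_proj / up_proj)
TARGET_KEPT = 1.0 - 0.375  # compression_ratio=0.375 is the removed fraction


def tucker2_params(num_experts, d_out, d_in, r_out, r_in):
    """Parameter count of a Tucker-2 factorization with the expert mode left uncompressed."""
    core = num_experts * r_out * r_in       # one r_out x r_in core slice per expert
    factors = d_out * r_out + d_in * r_in   # shared output and input factor matrices
    return core + factors


def kept_fraction(num_experts, d_out, d_in, r_out, r_in):
    """Fraction of the dense parameter count that the factorization stores."""
    dense = num_experts * d_out * d_in
    return tucker2_params(num_experts, d_out, d_in, r_out, r_in) / dense


def ranks_for_target(num_experts, d_out, d_in, target_kept):
    """Largest proportionally scaled ranks whose kept fraction stays within the budget."""
    best = None
    for r_out in range(1, d_out + 1):
        r_in = max(1, round(r_out * d_in / d_out))
        if kept_fraction(num_experts, d_out, d_in, r_out, r_in) <= target_kept:
            best = (r_out, r_in)
        else:
            break  # kept fraction grows with the ranks, so later ranks also exceed the budget
    return best


r_out, r_in = ranks_for_target(NUM_EXPERTS, D_OUT, D_IN, TARGET_KEPT)
frac = kept_fraction(NUM_EXPERTS, D_OUT, D_IN, r_out, r_in)
print(f"r_out={r_out}, r_in={r_in}, kept={frac:.4f}")
```

Because the core tensor dominates the count when there are many experts, the kept fraction behaves roughly like the product of the two rank ratios, with a small overhead for the shared factor matrices.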

See config.toml and training.json for the full run configuration and per-layer reconstruction stats.

Evaluation

Task	sample_len	acc,none	acc_stderr,none	acc_norm,none	acc_norm_stderr,none
arc_challenge	1172	0.3464	0.0139	0.3686	0.0141
arc_easy	2376	0.6014	0.0100	0.5623	0.0102
winogrande	1267	0.6235	0.0136	-	-
piqa	1838	0.6855	0.0108	0.6959	0.0107
hellaswag	10042	0.3973	0.0049	0.5232	0.0050
openbookqa	500	0.2620	0.0197	0.3360	0.0211
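
When comparing these scores against other compressed variants, the standard errors matter as much as the point values. The sketch below, using a few rows copied from the results above, turns a reported accuracy and standard error into an approximate 95% interval and cross-checks the standard error against the plain binomial formula sqrt(p(1-p)/n). The normal approximation and the helper names are assumptions made for illustration, not part of the evaluation pipeline.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def confidence_interval(acc, stderr, z=Z_95):
    """Normal-approximation interval around a reported accuracy."""
    return acc - z * stderr, acc + z * stderr


def binomial_stderr(acc, n):
    """Standard error of a proportion estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


# (acc, reported stderr, sample_len) copied from the evaluation results
results = {
    "arc_challenge": (0.3464, 0.0139, 1172),
    "winogrande": (0.6235, 0.0136, 1267),
    "hellaswag": (0.3973, 0.0049, 10042),
}

for task, (acc, stderr, n) in results.items():
    low, high = confidence_interval(acc, stderr)
    check = binomial_stderr(acc, n)
    print(f"{task}: acc={acc:.4f} CI=({low:.4f}, {high:.4f}) binomial_stderr={check:.4f}")
```

Two models whose intervals overlap heavily on a task should not be ranked on that task alone.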

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-both-fix"

# trust_remote_code=True lets transformers load the custom model class bundled with the repo.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

The factored experts are reconstructed at runtime via the bundled Qwen3TDMoEForCausalLM.
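
As a rough illustration of what reconstructing a factored expert can look like, the toy sketch below rebuilds one expert's weight matrix as U_out · G_e · U_in^T from a shared output factor, a per-expert core slice, and a shared input factor. This is an assumed Tucker-style layout written in plain Python for clarity. The function names are hypothetical, whitening is left out, and the bundled Qwen3TDMoEForCausalLM may organize its tensors differently.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def reconstruct_expert(u_out, core_e, u_in):
    """Rebuild a d_out x d_in expert weight from Tucker-style factors.

    u_out:  d_out x r_out factor shared by all experts (assumed layout)
    core_e: r_out x r_in core slice belonging to one expert
    u_in:   d_in x r_in factor shared by all experts (assumed layout)
    """
    return matmul(matmul(u_out, core_e), transpose(u_in))


# Toy shapes: d_out=3, d_in=4, r_out=r_in=2
u_out = [[1, 0], [0, 1], [0, 0]]
u_in = [[1, 0], [0, 1], [0, 0], [0, 0]]
core_expert_0 = [[2, 3], [4, 5]]

weight = reconstruct_expert(u_out, core_expert_0, u_in)
for row in weight:
    print(row)
```

In a real forward pass the dense weight would usually not be materialized. Applying the three factors to the activations in sequence gives the same result with less memory.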

Model size: 23B params (Safetensors)
Tensor type: BF16
