td_moe_qwen3_30b_a3b_match_mobe_ep

TD-MoE Tucker decomposition of Qwen3-30B-A3B, with compression matched to MoBE/Sparse-MoBE b32: 37.5% of each gate/up matrix is removed and 62.5% is kept. Expert-preserving (expert_mode=preserve).

Method: td_moe
Base model: Qwen/Qwen3-30B-A3B
Compressed: gate_proj, up_proj over layers 0-48 (down_proj kept dense)
Settings: TD-MoE Tucker, compression_ratio=0.375 (removed), expert_mode=preserve
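
As a rough check on what compression_ratio=0.375 implies, the following sketch counts the gate/up parameters removed. The architecture numbers (48 layers, 128 experts per layer, hidden size 2048, expert intermediate size 768) are assumptions taken from the published Qwen3-30B-A3B configuration, not from this run's config.toml, and the helper name matrix_budget is hypothetical.

```python
from fractions import Fraction

# Assumed Qwen3-30B-A3B architecture values (not read from this run's config.toml).
NUM_LAYERS = 48
NUM_EXPERTS = 128
HIDDEN_SIZE = 2048
MOE_INTERMEDIATE_SIZE = 768

# From the settings above: fraction of each gate/up matrix that is removed.
COMPRESSION_RATIO = Fraction(3, 8)  # 0.375


def matrix_budget(rows, cols, removed_fraction):
    """Return (dense, kept, removed) parameter counts for one matrix."""
    dense = rows * cols
    removed = int(dense * removed_fraction)
    return dense, dense - removed, removed


dense, kept, removed = matrix_budget(
    MOE_INTERMEDIATE_SIZE, HIDDEN_SIZE, COMPRESSION_RATIO
)

# gate_proj and up_proj are compressed in every expert of every layer;
# down_proj stays dense and is not counted here.
compressed_matrices = NUM_LAYERS * NUM_EXPERTS * 2
total_removed = removed * compressed_matrices

print(f"per matrix: dense={dense:,} kept={kept:,} removed={removed:,}")
print(f"total removed: {total_removed / 1e9:.2f}B parameters")
```

Under these assumptions about 7.25B parameters are removed in total, which is roughly in line with the 23B model size reported for this checkpoint.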

See config.toml and training.json for the full run configuration and per-layer reconstruction stats.

Evaluation

Task	Metric	Value
arc_challenge	sample_len	1172.0000
arc_challenge	acc,none	0.2167
arc_challenge	acc_stderr,none	0.0120
arc_challenge	acc_norm,none	0.2483
arc_challenge	acc_norm_stderr,none	0.0126
arc_easy	sample_len	2376.0000
arc_easy	acc,none	0.3363
arc_easy	acc_stderr,none	0.0097
arc_easy	acc_norm,none	0.3287
arc_easy	acc_norm_stderr,none	0.0096
winogrande	sample_len	1267.0000
winogrande	acc,none	0.5099
winogrande	acc_stderr,none	0.0140
piqa	sample_len	1838.0000
piqa	acc,none	0.5615
piqa	acc_stderr,none	0.0116
piqa	acc_norm,none	0.5560
piqa	acc_norm_stderr,none	0.0116
hellaswag	sample_len	10042.0000
hellaswag	acc,none	0.3186
hellaswag	acc_stderr,none	0.0046
hellaswag	acc_norm,none	0.3871
hellaswag	acc_norm_stderr,none	0.0049
openbookqa	sample_len	500.0000
openbookqa	acc,none	0.1980
openbookqa	acc_stderr,none	0.0178
openbookqa	acc_norm,none	0.3220
openbookqa	acc_norm_stderr,none	0.0209
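
To read the accuracies above against their standard errors, the sketch below computes approximate 95% intervals (acc ± 1.96 × stderr) and compares them with a random-guess baseline. The chance levels are assumptions based on the usual number of answer options per task (two for winogrande and piqa, four for the others; a few ARC items have a different count), and the classify helper is hypothetical.

```python
# (acc, acc_stderr) copied from the table above; the third value is an
# assumed random-guess baseline, not something reported by the evaluation.
RESULTS = {
    "arc_challenge": (0.2167, 0.0120, 0.25),
    "arc_easy": (0.3363, 0.0097, 0.25),
    "winogrande": (0.5099, 0.0140, 0.50),
    "piqa": (0.5615, 0.0116, 0.50),
    "hellaswag": (0.3186, 0.0046, 0.25),
    "openbookqa": (0.1980, 0.0178, 0.25),
}

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def interval(acc, stderr, z=Z_95):
    """Return the (low, high) bounds of acc +/- z * stderr."""
    return acc - z * stderr, acc + z * stderr


def classify(acc, stderr, chance):
    """Describe where the chance level falls relative to the interval."""
    low, high = interval(acc, stderr)
    if low <= chance <= high:
        return "consistent with chance"
    return "above chance" if low > chance else "below chance"


for task, (acc, stderr, chance) in RESULTS.items():
    low, high = interval(acc, stderr)
    verdict = classify(acc, stderr, chance)
    print(f"{task:14s} acc={acc:.4f} 95% CI=[{low:.4f}, {high:.4f}] {verdict}")
```

The same function can be applied to the acc_norm rows by swapping in those values and their standard errors.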

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-out-h100"

# trust_remote_code=True lets transformers load the bundled Qwen3TDMoEForCausalLM class.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

The factored experts are reconstructed at runtime via the bundled Qwen3TDMoEForCausalLM.
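
To illustrate what reconstructing factored experts can look like, here is a minimal NumPy sketch of a Tucker-style reconstruction over a stack of expert matrices. It is not the code in Qwen3TDMoEForCausalLM: the shapes, ranks, and the function name reconstruct_experts are hypothetical, and it assumes that expert_mode=preserve means the expert axis is left unfactored, so only the output and input axes carry shared factor matrices.

```python
import numpy as np


def reconstruct_experts(core, u_out, u_in):
    """Rebuild dense expert weights from a Tucker-style factorization.

    core:  (num_experts, r_out, r_in)  per-expert core slices (expert axis unfactored)
    u_out: (d_out, r_out)              shared factor for the output dimension
    u_in:  (d_in, r_in)                shared factor for the input dimension
    returns: (num_experts, d_out, d_in)
    """
    # W[e] = u_out @ core[e] @ u_in.T, computed for all experts at once.
    return np.einsum("or,ers,is->eoi", u_out, core, u_in)


# Toy sizes chosen for illustration only; they are not the model's dimensions.
rng = np.random.default_rng(0)
num_experts, d_out, d_in = 4, 6, 8
r_out, r_in = 3, 5

core = rng.standard_normal((num_experts, r_out, r_in))
u_out = rng.standard_normal((d_out, r_out))
u_in = rng.standard_normal((d_in, r_in))

weights = reconstruct_experts(core, u_out, u_in)
print(weights.shape)

# Compare stored parameters (core + factors) with the dense stack.
dense_params = weights.size
factored_params = core.size + u_out.size + u_in.size
print(f"stored fraction: {factored_params / dense_params:.3f}")
```

In a sketch like this, the ranks r_out and r_in control how much of each matrix is kept; the actual ranks and any whitening used for this checkpoint are recorded in config.toml and training.json rather than here.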

Model size: 23B params (Safetensors)
Tensor type: BF16