td_moe_qwen3_30b_a3b_match_mobe_ep

TD-MoE Tucker decomposition applied to Qwen3-30B-A3B, with the compression ratio matched to MoBE/Sparse-MoBE b32: 37.5% of each gate/up matrix is removed and 62.5% is kept. The run uses the expert-preserving mode (expert_mode=preserve).

  • Method: td_moe
  • Base model: Qwen/Qwen3-30B-A3B
  • Compressed: gate_proj, up_proj over layers 0-48 (down_proj kept dense)
  • Settings: TD-MoE Tucker, compression_ratio=0.375 (removed), expert_mode=preserve
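As a rough sanity check on the settings above, the removed parameter budget can be estimated with a few lines of arithmetic. This is only a sketch: the layer count, expert count, and matrix dimensions below are assumptions taken from the public Qwen3-30B-A3B configuration, not from this card, and the 30.5B total is the approximate figure reported for the base model.

```python
# Assumed dimensions from the public Qwen3-30B-A3B config (not stated in this card).
HIDDEN_SIZE = 2048
MOE_INTERMEDIATE_SIZE = 768
NUM_EXPERTS = 128
NUM_LAYERS = 48
TOTAL_PARAMS = 30.5e9  # approximate total reported for the base model


def removed_params(ratio_num=3, ratio_den=8):
    """Parameters removed when 3/8 = 37.5% of every gate/up matrix is dropped."""
    per_matrix = HIDDEN_SIZE * MOE_INTERMEDIATE_SIZE
    # Two compressed matrices (gate_proj, up_proj) per expert; down_proj stays dense.
    gate_up_total = 2 * per_matrix * NUM_EXPERTS * NUM_LAYERS
    return gate_up_total * ratio_num // ratio_den


removed = removed_params()
print(f"removed:   {removed / 1e9:.2f}B")
print(f"remaining: {(TOTAL_PARAMS - removed) / 1e9:.2f}B")
```

Under these assumptions about 7.2B parameters are removed, which leaves roughly 23B in the compressed checkpoint.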

See config.toml and training.json for the full run configuration and per-layer reconstruction stats.

Evaluation

| Task | Metric | Value |
|---|---|---|
| arc_challenge | sample_len | 1172.0000 |
| arc_challenge | acc,none | 0.2167 |
| arc_challenge | acc_stderr,none | 0.0120 |
| arc_challenge | acc_norm,none | 0.2483 |
| arc_challenge | acc_norm_stderr,none | 0.0126 |
| arc_easy | sample_len | 2376.0000 |
| arc_easy | acc,none | 0.3363 |
| arc_easy | acc_stderr,none | 0.0097 |
| arc_easy | acc_norm,none | 0.3287 |
| arc_easy | acc_norm_stderr,none | 0.0096 |
| winogrande | sample_len | 1267.0000 |
| winogrande | acc,none | 0.5099 |
| winogrande | acc_stderr,none | 0.0140 |
| piqa | sample_len | 1838.0000 |
| piqa | acc,none | 0.5615 |
| piqa | acc_stderr,none | 0.0116 |
| piqa | acc_norm,none | 0.5560 |
| piqa | acc_norm_stderr,none | 0.0116 |
| hellaswag | sample_len | 10042.0000 |
| hellaswag | acc,none | 0.3186 |
| hellaswag | acc_stderr,none | 0.0046 |
| hellaswag | acc_norm,none | 0.3871 |
| hellaswag | acc_norm_stderr,none | 0.0049 |
| openbookqa | sample_len | 500.0000 |
| openbookqa | acc,none | 0.1980 |
| openbookqa | acc_stderr,none | 0.0178 |
| openbookqa | acc_norm,none | 0.3220 |
| openbookqa | acc_norm_stderr,none | 0.0209 |
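The metric names above (acc,none, acc_norm,none) follow the lm-evaluation-harness output format, so numbers of this kind could plausibly be reproduced along the following lines. This is a hedged sketch, not the author's evaluation script: the batch size and the default few-shot setting are assumptions, and the helper names run_eval and flatten_results are hypothetical.

```python
def run_eval(repo_id, tasks):
    """Evaluate `repo_id` with lm-evaluation-harness (needs `pip install lm-eval` and a GPU)."""
    import lm_eval  # imported lazily so flatten_results works without the package

    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={repo_id},trust_remote_code=True,dtype=bfloat16",
        tasks=tasks,
        batch_size=8,  # assumption: the card does not state the batch size
    )
    return out["results"]


def flatten_results(results):
    """Turn {task: {metric: value}} into (task, metric, value) rows, skipping non-numeric fields."""
    return [
        (task, metric, value)
        for task, metrics in results.items()
        for metric, value in metrics.items()
        if isinstance(value, (int, float))
    ]
```

A call such as flatten_results(run_eval(repo_id, ["arc_challenge", "arc_easy", "winogrande", "piqa", "hellaswag", "openbookqa"])) would then yield rows in the same Task / Metric / Value layout.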

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-out-h100"

# trust_remote_code=True lets transformers load the custom model class bundled with the repo.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

The factored experts are reconstructed at runtime via the bundled Qwen3TDMoEForCausalLM.
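To illustrate what reconstructing factored experts means, here is a minimal NumPy sketch of a truncated Tucker decomposition (computed via higher-order SVD) over a stack of expert matrices, followed by the reconstruction step. It is an illustration under assumed toy shapes and ranks, not the code of the bundled Qwen3TDMoEForCausalLM; the function names hosvd and reconstruct are hypothetical.

```python
import numpy as np


def hosvd(tensor, ranks):
    """Truncated higher-order SVD of a 3-way tensor: returns (core, [U_e, U_o, U_i])."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Unfold along `mode`: rows index that mode, columns index the other two.
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u[:, :rank])
    # Project the tensor onto the three factor subspaces to obtain the core.
    core = np.einsum("eoi,ea,ob,ic->abc", tensor, *factors)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (experts, out_features, in_features) stack from core and factors."""
    u_e, u_o, u_i = factors
    return np.einsum("abc,ea,ob,ic->eoi", core, u_e, u_o, u_i)


rng = np.random.default_rng(0)
experts = rng.standard_normal((4, 6, 5))           # toy stack: 4 experts of shape 6x5
core, factors = hosvd(experts, ranks=(4, 3, 3))    # keep all 4 expert-mode components
approx = reconstruct(core, factors)
rel_err = np.linalg.norm(approx - experts) / np.linalg.norm(experts)
print(approx.shape, f"relative error: {rel_err:.3f}")
```

With full ranks in every mode the reconstruction is exact; lowering the ranks trades reconstruction error for fewer stored parameters in the core and factor matrices.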

Model size: 23B params (Safetensors, BF16)