td_moe_qwen3_30b_a3b_match_mobe_ep_whiten_both
TD-MoE Tucker decomposition applied to Qwen3-30B-A3B, with compression matched to MoBE/Sparse-MoBE b32 (37.5% removed per gate/up matrix, 62.5% kept). This is the expert-preserving variant with both input and output whitening (the full method).
- Method: td_moe
- Base model: Qwen/Qwen3-30B-A3B
- Compressed: gate_proj, up_proj over layers 0-48 (down_proj kept dense)
- Settings: TD-MoE Tucker, compression_ratio=0.375 (removed), expert_mode=preserve
See config.toml and training.json for the full run configuration and per-layer reconstruction stats.
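As a rough illustration of what compression_ratio=0.375 implies, the sketch below counts parameters for a Tucker factorization of one layer's stacked gate_proj (or up_proj) experts and solves for ranks that keep at most 62.5% of the dense count. Several things here are assumptions rather than facts from this run: the shapes (128 experts, 768 x 2048 per matrix) are taken to be those of Qwen3-30B-A3B and should be checked against the base model's config; expert_mode=preserve is read as "the expert mode is not truncated"; and scaling both ranks by one common factor is a hypothetical rule, not necessarily how TD-MoE picks ranks. The function names are illustrative only.

```python
import math


def tucker_params(num_experts, d_out, d_in, r_out, r_in):
    """Parameter count of a Tucker factorization with an untruncated expert mode.

    Assumed layout: core of shape (num_experts, r_out, r_in) plus two shared
    factor matrices of shape (d_out, r_out) and (d_in, r_in).
    """
    core = num_experts * r_out * r_in
    factors = d_out * r_out + d_in * r_in
    return core + factors


def ranks_for_kept_fraction(num_experts, d_out, d_in, kept):
    """Pick proportional ranks (s * d_out, s * d_in) that fit the budget.

    Solves dense * s^2 + (d_out^2 + d_in^2) * s = kept * dense for s, then
    rounds the ranks down so the parameter count never exceeds the budget.
    The proportional-rank rule is an assumption made for this sketch.
    """
    dense = num_experts * d_out * d_in
    a = dense
    b = d_out ** 2 + d_in ** 2
    c = -kept * dense
    s = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return int(s * d_out), int(s * d_in)


# Assumed shapes for one Qwen3-30B-A3B MoE layer; verify against the config.
NUM_EXPERTS, D_OUT, D_IN = 128, 768, 2048
KEPT = 1.0 - 0.375  # compression_ratio is the removed fraction

r_out, r_in = ranks_for_kept_fraction(NUM_EXPERTS, D_OUT, D_IN, KEPT)
dense_params = NUM_EXPERTS * D_OUT * D_IN
kept_fraction = tucker_params(NUM_EXPERTS, D_OUT, D_IN, r_out, r_in) / dense_params
print(f"r_out={r_out}, r_in={r_in}, kept fraction={kept_fraction:.4f}")
```

The ranks actually used per layer are recorded in config.toml and training.json, which remain the authoritative source.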
Evaluation
| Task | Metric | Value |
|---|---|---|
| arc_challenge | sample_len | 1172 |
| arc_challenge | acc,none | 0.3464 |
| arc_challenge | acc_stderr,none | 0.0139 |
| arc_challenge | acc_norm,none | 0.3686 |
| arc_challenge | acc_norm_stderr,none | 0.0141 |
| arc_easy | sample_len | 2376 |
| arc_easy | acc,none | 0.6014 |
| arc_easy | acc_stderr,none | 0.0100 |
| arc_easy | acc_norm,none | 0.5623 |
| arc_easy | acc_norm_stderr,none | 0.0102 |
| winogrande | sample_len | 1267 |
| winogrande | acc,none | 0.6235 |
| winogrande | acc_stderr,none | 0.0136 |
| piqa | sample_len | 1838 |
| piqa | acc,none | 0.6855 |
| piqa | acc_stderr,none | 0.0108 |
| piqa | acc_norm,none | 0.6959 |
| piqa | acc_norm_stderr,none | 0.0107 |
| hellaswag | sample_len | 10042 |
| hellaswag | acc,none | 0.3973 |
| hellaswag | acc_stderr,none | 0.0049 |
| hellaswag | acc_norm,none | 0.5232 |
| hellaswag | acc_norm_stderr,none | 0.0050 |
| openbookqa | sample_len | 500 |
| openbookqa | acc,none | 0.2620 |
| openbookqa | acc_stderr,none | 0.0197 |
| openbookqa | acc_norm,none | 0.3360 |
| openbookqa | acc_norm_stderr,none | 0.0211 |
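To help read the stderr rows, here is a small sketch that recomputes a standard error from an accuracy and a sample count, and turns it into an approximate 95% interval. It assumes each example is an independent correct/incorrect trial (a plain binomial model) and uses a normal approximation with z = 1.96; the evaluation harness may use a slightly different estimator, so agreement is expected only to the reported precision. The helper names are illustrative.

```python
import math


def binomial_stderr(acc, n):
    """Standard error of an accuracy estimated from n independent 0/1 outcomes."""
    return math.sqrt(acc * (1.0 - acc) / n)


def confidence_interval(acc, stderr, z=1.96):
    """Normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    return acc - z * stderr, acc + z * stderr


# arc_challenge row from the table: acc = 0.3464 over 1172 samples.
se = binomial_stderr(0.3464, 1172)
low, high = confidence_interval(0.3464, se)
print(f"stderr={se:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

For small sets such as openbookqa (500 samples), the interval is about four points wide in each direction, so differences of a point or two against another compressed model on that task should be treated with caution.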
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-both-fix",
trust_remote_code=True,
torch_dtype="bfloat16",
device_map="auto",
)
tok = AutoTokenizer.from_pretrained("AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-both-fix", trust_remote_code=True)
The factored experts are reconstructed at runtime by the bundled Qwen3TDMoEForCausalLM class, which is why trust_remote_code=True is required when loading.
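To make "reconstructed at runtime" concrete, the sketch below factors a small stack of expert matrices with a truncated higher-order SVD that leaves the expert mode untouched, then rebuilds one expert from the factors. This is a generic NumPy illustration under stated assumptions, not the code inside Qwen3TDMoEForCausalLM: the input and output whitening steps are omitted, the tensor layout (experts, d_out, d_in) is assumed, and the function names are hypothetical.

```python
import numpy as np


def tucker2_expert_preserving(weights, r_out, r_in):
    """Truncated HOSVD over the two weight modes; the expert mode is kept whole.

    weights: array of shape (E, d_out, d_in).
    Returns core (E, r_out, r_in), U_out (d_out, r_out), U_in (d_in, r_in).
    """
    num_experts, d_out, d_in = weights.shape
    # Unfold along the output dimension and keep the leading left singular vectors.
    unfold_out = weights.transpose(1, 0, 2).reshape(d_out, num_experts * d_in)
    u_out = np.linalg.svd(unfold_out, full_matrices=False)[0][:, :r_out]
    # Same for the input dimension.
    unfold_in = weights.transpose(2, 0, 1).reshape(d_in, num_experts * d_out)
    u_in = np.linalg.svd(unfold_in, full_matrices=False)[0][:, :r_in]
    # Project every expert onto the shared bases: core[e] = U_out^T W_e U_in.
    core = np.einsum("eoi,or,is->ers", weights, u_out, u_in)
    return core, u_out, u_in


def reconstruct_expert(core, u_out, u_in, expert_index):
    """Rebuild one dense expert matrix: W_e ~= U_out @ core[e] @ U_in^T."""
    return u_out @ core[expert_index] @ u_in.T


# Toy stack of 4 experts built to have exact multilinear rank (3, 5).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((4, 3, 5))
basis_out = rng.standard_normal((8, 3))
basis_in = rng.standard_normal((10, 5))
stack = np.einsum("ers,or,is->eoi", true_core, basis_out, basis_in)

core, u_out, u_in = tucker2_expert_preserving(stack, r_out=3, r_in=5)
rebuilt = reconstruct_expert(core, u_out, u_in, expert_index=2)
print("max abs error:", np.abs(rebuilt - stack[2]).max())
```

Because the toy tensor has exactly the chosen ranks, the error is at floating-point level. On real weights the truncation is lossy, which is what the per-layer reconstruction stats in training.json report.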