td_moe_qwen3_30b_a3b_match_mobe_ep
TD-MoE Tucker decomposition on Qwen3-30B-A3B, compression matched to MoBE/Sparse-MoBE b32 (37.5% removed per gate/up matrix, 62.5% kept). Expert-preserving.
- Method: td_moe
- Base model: Qwen/Qwen3-30B-A3B
- Compressed: gate_proj, up_proj over layers 0-48 (down_proj kept dense)
- Settings: TD-MoE Tucker, compression_ratio=0.375 (removed), expert_mode=preserve
See config.toml and training.json for the full run configuration and per-layer reconstruction stats.
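To make the 37.5% removed / 62.5% kept figures concrete, here is a minimal sketch of how a Tucker parameter budget can be counted for a stack of expert matrices. It assumes the experts of one projection are stacked into a 3-way tensor of shape (experts, d_out, d_in) and that both the core and the factor matrices count toward the kept parameters. The function names and the toy shape are hypothetical and not taken from this repository; the actual ranks and accounting used in the run are in config.toml.

```python
def tucker_param_count(shape, ranks):
    """Parameters in a Tucker factorization: core tensor plus one factor matrix per mode."""
    core = 1
    for r in ranks:
        core *= r
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors


def kept_fraction(shape, ranks):
    """Share of the dense parameter count that the factorization keeps."""
    dense = 1
    for n in shape:
        dense *= n
    return tucker_param_count(shape, ranks) / dense


def removed_fraction(shape, ranks):
    """Share of the dense parameter count that is removed (the sense used by compression_ratio here)."""
    return 1.0 - kept_fraction(shape, ranks)


def meets_budget(shape, ranks, removed=0.375):
    """True if the rank choice removes at least the requested share of parameters."""
    return removed_fraction(shape, ranks) >= removed


# Toy example (hypothetical shape): 4 experts, each a 6 x 8 matrix.
# The expert mode is left at full rank (4), mirroring an expert-preserving setup.
toy_shape = (4, 6, 8)
toy_ranks = (4, 3, 4)
print(tucker_param_count(toy_shape, toy_ranks), kept_fraction(toy_shape, toy_ranks))
```

Keeping the first rank equal to the number of experts is only one reading of expert_mode=preserve; the helper itself works for any rank triple.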
Evaluation
| Task | Metric | Value |
|---|---|---|
| arc_challenge | sample_len | 1172.0000 |
| arc_challenge | acc,none | 0.2167 |
| arc_challenge | acc_stderr,none | 0.0120 |
| arc_challenge | acc_norm,none | 0.2483 |
| arc_challenge | acc_norm_stderr,none | 0.0126 |
| arc_easy | sample_len | 2376.0000 |
| arc_easy | acc,none | 0.3363 |
| arc_easy | acc_stderr,none | 0.0097 |
| arc_easy | acc_norm,none | 0.3287 |
| arc_easy | acc_norm_stderr,none | 0.0096 |
| winogrande | sample_len | 1267.0000 |
| winogrande | acc,none | 0.5099 |
| winogrande | acc_stderr,none | 0.0140 |
| piqa | sample_len | 1838.0000 |
| piqa | acc,none | 0.5615 |
| piqa | acc_stderr,none | 0.0116 |
| piqa | acc_norm,none | 0.5560 |
| piqa | acc_norm_stderr,none | 0.0116 |
| hellaswag | sample_len | 10042.0000 |
| hellaswag | acc,none | 0.3186 |
| hellaswag | acc_stderr,none | 0.0046 |
| hellaswag | acc_norm,none | 0.3871 |
| hellaswag | acc_norm_stderr,none | 0.0049 |
| openbookqa | sample_len | 500.0000 |
| openbookqa | acc,none | 0.1980 |
| openbookqa | acc_stderr,none | 0.0178 |
| openbookqa | acc_norm,none | 0.3220 |
| openbookqa | acc_norm_stderr,none | 0.0209 |
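The stderr rows can be sanity-checked against the accuracy and sample_len rows. The sketch below assumes each stderr is the usual standard error of a mean of 0/1 outcomes computed with the sample variance (dividing by n - 1); that is an assumption about how the evaluation harness produced these numbers, not something stated in this card. The helper names are hypothetical.

```python
import math


def binomial_stderr(acc, n):
    """Standard error of the mean of n 0/1 outcomes, using the sample variance (n - 1)."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def ci95(acc, stderr):
    """Normal-approximation 95% confidence interval around an accuracy."""
    half_width = 1.96 * stderr
    return acc - half_width, acc + half_width


# (task, acc, sample_len, reported stderr) copied from the table above.
rows = [
    ("arc_challenge", 0.2167, 1172, 0.0120),
    ("hellaswag", 0.3186, 10042, 0.0046),
]
for task, acc, n, reported in rows:
    recomputed = binomial_stderr(acc, n)
    low, high = ci95(acc, reported)
    print(f"{task}: recomputed stderr {recomputed:.4f}, reported {reported:.4f}, "
          f"95% CI [{low:.4f}, {high:.4f}]")
```

Under that assumption the recomputed values agree with the reported ones to the four decimals shown for these two tasks.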
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AverageMetaheuristicsEnjoyer/qwen3-30b-a3b-td-moe-ep-whiten-out-h100"

# trust_remote_code=True is needed because the model class ships with the repository.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
```
The factored experts are reconstructed at runtime via the bundled Qwen3TDMoEForCausalLM.
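As an illustration of what rebuilding a factored expert can look like, the following NumPy sketch reconstructs one expert's dense weight from a Tucker core and three factor matrices. It is not the code inside Qwen3TDMoEForCausalLM: the function names, the tensor layout (experts, d_out, d_in), and the use of a truncated higher-order SVD to obtain the factors are all assumptions made for the example, and it ignores any whitening the real run may apply.

```python
import numpy as np


def unfold(t, mode):
    """Matricize tensor t so that rows index the chosen mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def hosvd(t, ranks):
    """Truncated higher-order SVD: one simple way to get Tucker factors and a core."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])  # leading r left singular vectors of this mode
    # Project the tensor onto the factor bases to obtain the core.
    core = np.einsum("eoi,ea,ob,ic->abc", t, *factors)
    return core, factors


def reconstruct_expert(core, factors, e):
    """Rebuild the dense (d_out, d_in) weight of expert e from the shared factors."""
    u_expert, u_out, u_in = factors
    return np.einsum("abc,a,ob,ic->oi", core, u_expert[e], u_out, u_in)


# Toy stack of 4 experts (6 x 8 each) with exact multilinear rank (4, 3, 4),
# so the truncated factorization reproduces it up to floating-point error.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((4, 3, 4))
a = rng.standard_normal((4, 4))
b = rng.standard_normal((6, 3))
c = rng.standard_normal((8, 4))
stack = np.einsum("abc,ea,ob,ic->eoi", true_core, a, b, c)

core, factors = hosvd(stack, ranks=(4, 3, 4))
w2 = reconstruct_expert(core, factors, e=2)
print(np.allclose(w2, stack[2]))
```

In this layout only the row of the expert factor differs between experts, while the core and the two weight-mode factors are shared across the stack.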