# Qwen3-30B-A3B-Instruct-2507-HCSMoE

This model is a compressed version of Qwen/Qwen3-30B-A3B-Instruct-2507, obtained by reducing the number of experts in each MoE layer from 128 to 96 with the HCSMoE baseline method described in https://bknyaz.github.io/blog/2026/moe/. The compressed model has 23B parameters (44GB) instead of the original's 31B (57GB), cutting storage and GPU memory requirements by roughly 25%. At the same time, it retains at least 93% of the original model's average score across a variety of benchmarks (see the Results section below). Further efficiency optimizations (e.g., quantization) can be applied in the same way as for the original model.
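To illustrate what reducing the expert count involves structurally, below is a minimal sketch of pruning a Qwen3-style MoE block down to a target number of experts. The attribute names (`mlp.experts`, `mlp.gate`, `config.num_experts`) follow the transformers Qwen3-MoE implementation, and the weight-norm importance score is a placeholder for illustration only; it is not the HCSMoE criterion, which is described in the blog post linked above.

```python
import torch
import torch.nn as nn

def prune_moe_block(moe_block: nn.Module, keep: int) -> None:
    """Keep the `keep` highest-scoring experts of an MoE block, in place."""
    experts = moe_block.experts  # nn.ModuleList of expert MLPs
    # Placeholder importance score: total L2 norm of each expert's weights.
    scores = torch.tensor(
        [sum(p.norm().item() for p in e.parameters()) for e in experts]
    )
    kept = torch.topk(scores, keep).indices.sort().values  # keep original order
    moe_block.experts = nn.ModuleList([experts[int(i)] for i in kept])
    # The router emits one logit per expert, so its rows must be sliced
    # to match the surviving experts.
    gate = moe_block.gate
    new_gate = nn.Linear(gate.in_features, keep, bias=gate.bias is not None)
    new_gate.weight.data = gate.weight.data[kept].clone()
    if gate.bias is not None:
        new_gate.bias.data = gate.bias.data[kept].clone()
    moe_block.gate = new_gate

# Hypothetical usage over all decoder layers:
# for layer in model.model.layers:
#     prune_moe_block(layer.mlp, keep=96)
# model.config.num_experts = 96
```

Because each token is still routed to the same number of active experts, only the pool of available experts shrinks; the per-token compute path is unchanged.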

See Qwen3-30B-A3B-Instruct-2507-REAM for additional details.
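Since the compressed checkpoint keeps the original architecture (just with fewer experts per layer), it should load like any other Qwen3-MoE causal LM. A minimal usage sketch, assuming a transformers version with Qwen3-MoE support:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-HCSMoE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```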

## Results

| Model | Winogrande | ARC-C | ARC-E | BoolQ | HellaSwag | MMLU | OpenBookQA | RTE | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 73.2 | 60.7 | 85.1 | 88.7 | 61.2 | 80.1 | 32.4 | 76.5 | 69.7 |
| Qwen3-30B-A3B-Instruct-2507-HCSMoE | 71.7 | 53.4 | 78.5 | 87.1 | 52.1 | 69.5 | 27.4 | 79.1 | 64.9 |

| Model | IFEval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 90.4 | 56.7 | 89.3 | 47.0 | 93.3 | 48.6 | 70.9 |
| Qwen3-30B-A3B-Instruct-2507-HCSMoE | 89.8 | 46.7 | 85.4 | 37.4 | 93.9 | 44.5 | 66.3 |

## License

Please refer to the license of the original model, Qwen/Qwen3-30B-A3B-Instruct-2507.
