# Qwen3-30B-A3B-Instruct-2507-HCSMoE

This model is a compressed version of Qwen/Qwen3-30B-A3B-Instruct-2507, obtained by reducing the number of experts in each MoE layer from 128 to 96 with the HCSMoE baseline method described in https://bknyaz.github.io/blog/2026/moe/. The compressed model has 23B parameters (44GB) instead of the original's 31B (57GB), cutting storage and GPU memory requirements by roughly 25%. At the same time, it retains at least 93% of the original model's average score across a variety of benchmarks (see the Results section below). Further efficiency optimizations (e.g., quantization) can be applied in the same way as for the original model.
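To illustrate what reducing the expert count involves structurally, below is a minimal sketch of pruning a Qwen3-style MoE block down to a target number of experts. The attribute names (`mlp.experts`, `mlp.gate`, `config.num_experts`) follow the transformers Qwen3-MoE implementation, and the weight-norm importance score is a placeholder for illustration only; it is not the HCSMoE criterion, which is described in the blog post linked above.

```python
import torch
import torch.nn as nn

def prune_moe_block(moe_block: nn.Module, keep: int) -> None:
    """Keep the `keep` highest-scoring experts of an MoE block, in place."""
    experts = moe_block.experts  # nn.ModuleList of expert MLPs
    # Placeholder importance score: total L2 norm of each expert's weights.
    scores = torch.tensor(
        [sum(p.norm().item() for p in e.parameters()) for e in experts]
    )
    kept = torch.topk(scores, keep).indices.sort().values  # keep original order
    moe_block.experts = nn.ModuleList([experts[int(i)] for i in kept])
    # The router emits one logit per expert, so its rows must be sliced
    # to match the surviving experts.
    gate = moe_block.gate
    new_gate = nn.Linear(gate.in_features, keep, bias=gate.bias is not None)
    new_gate.weight.data = gate.weight.data[kept].clone()
    if gate.bias is not None:
        new_gate.bias.data = gate.bias.data[kept].clone()
    moe_block.gate = new_gate

# Hypothetical usage over all decoder layers:
# for layer in model.model.layers:
#     prune_moe_block(layer.mlp, keep=96)
# model.config.num_experts = 96
```

Because each token is still routed to the same number of active experts, only the pool of available experts shrinks; the per-token compute path is unchanged.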

See Qwen3-30B-A3B-Instruct-2507-REAM for additional details.
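Since the compressed checkpoint keeps the original architecture (just with fewer experts per layer), it should load like any other Qwen3-MoE causal LM. A minimal usage sketch, assuming a transformers version with Qwen3-MoE support:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-HCSMoE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```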

## Results

| Model | Winogrande | ARC-C | ARC-E | BoolQ | HellaSwag | MMLU | OpenBookQA | RTE | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 73.2 | 60.7 | 85.1 | 88.7 | 61.2 | 80.1 | 32.4 | 76.5 | 69.7 |
| Qwen3-30B-A3B-Instruct-2507-HCSMoE | 71.7 | 53.4 | 78.5 | 87.1 | 52.1 | 69.5 | 27.4 | 79.1 | 64.9 |

| Model | IFEval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct-2507 | 90.4 | 56.7 | 89.3 | 47.0 | 93.3 | 48.6 | 70.9 |
| Qwen3-30B-A3B-Instruct-2507-HCSMoE | 89.8 | 46.7 | 85.4 | 37.4 | 93.9 | 44.5 | 66.3 |

## License

Please refer to the license of the original model, Qwen/Qwen3-30B-A3B-Instruct-2507.
