---
library_name: transformers
tags:
- compression
- expert-merging
- moe
license: apache-2.0
base_model:
- Qwen/Qwen3-235B-A22B-Instruct-2507
---

# Qwen3-235B-A22B-Instruct-2507-REAP

This model is a compressed version of [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507). 
It is obtained by reducing the number of experts in each MoE layer from 128 to 96 using the REAP baseline method as described in https://bknyaz.github.io/blog/2026/moe/.
The compressed model has 180B params (350GB) instead of 235B (470GB) of the original model, 
reducing storage and GPU memory requirements by roughly 25%. At the same time, 
the model retains >=99% of the original model's performance on a variety of benchmarks (see Results section below).
Additional efficiency optimization (e.g., quantization) can be added similarly to the original model.

See additional details at [Qwen3-30B-A3B-Instruct-2507-REAM](https://huggingface.co/SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-REAM).

### Results

| Model                              | IFeval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG   |
|------------------------------------|--------|--------|-------|--------|-----------|---------------|-------|
| Qwen3-235B-A22B-Instruct-2507      | 93.3   | 66.7   | 89.4  | 48.5   | 95.1      | 46.4          | 73.2  | 
| Qwen3-235B-A22B-Instruct-2507-REAP | 92.0   | 63.3   | 88.8  | 46.0   | 94.5      | 53.1          | 72.9  |

## License

Please refer to the license of the original model [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507).