rdtand committed · verified · Commit 478dd97 · Parent(s): bfd62d7

Upload README.md with huggingface_hub

Files changed (1): README.md (+2 -2)
README.md CHANGED

@@ -65,7 +65,7 @@ Both the format and the prune set are priced in the same knapsack via REAP-style
 
 $$S_j = \frac{1}{T_{\text{cal}}} \sum_t g_j(t) \cdot \lVert f_j(t) \rVert_2^2$$
 
-This is the **dropout-loss estimate** from the [REAP paper](https://arxiv.org/abs/2410.21271): how much the layer's output norm drops in expectation when expert `j` is removed, weighted by the gradient signal flowing through that expert. Sum across experts and you get a per-(router, expert) score in Δloss units, directly comparable to the quantization Δloss.
+This is the **dropout-loss estimate** from the REAP family of MoE expert-importance scores: how much the layer's output norm drops in expectation when expert `j` is removed, weighted by the gradient signal flowing through that expert. Sum across experts and you get a per-(router, expert) score in Δloss units, directly comparable to the quantization Δloss.
 
 Per-layer prune candidates emit `floor(R · num_experts)` lowest-S experts at each ratio R; the DP picks (R, format) jointly. After the pareto sweep, prismaquant produces a **uniform-kept** prune manifest so vLLM's MoE kernel sees a single `num_local_experts` per layer (this artifact: 176 of 256 kept everywhere).
@@ -215,7 +215,7 @@ Full source + reproduction notes: <https://github.com/RobTand/prismaquant>
 
 - [MiniMaxAI](https://huggingface.co/MiniMaxAI) — source model.
 - [vLLM](https://github.com/vllm-project/vllm) — compressed-tensors serving stack with native NVFP4 + FP8 MoE kernels.
-- [REAP (Lasby et al. 2025)](https://arxiv.org/abs/2410.21271) — per-expert dropout-loss saliency formulation.
+- REAP-style per-expert dropout-loss saliency.
 - HAQ / HAWQ-V1/V2/V3 (Wang, Dong, Yao, et al.) — mixed-precision allocation foundations.
 - GPTQ (Frantar et al. 2022), AutoRound — per-Linear quantizer building blocks.
 
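For context on the changed paragraph, here is a minimal sketch of the per-expert saliency score `S_j` defined above. Everything in it is an assumption made for illustration: the array shapes, the reading of `g` as a per-token scalar gradient signal, and the helper names `expert_saliency` / `prune_candidates` are not taken from prismaquant.

```python
# Minimal sketch, not prismaquant's implementation: shapes and the form of
# the gradient signal g are assumptions made for illustration.
import numpy as np

def expert_saliency(f: np.ndarray, g: np.ndarray) -> np.ndarray:
    """S_j = (1 / T_cal) * sum_t g_j(t) * ||f_j(t)||_2^2.

    f : (T_cal, num_experts, hidden), expert outputs f_j(t) on calibration tokens
    g : (T_cal, num_experts), scalar gradient signal g_j(t) per token and expert
    Returns per-expert scores in Δloss units, shape (num_experts,).
    """
    sq_norms = np.einsum("ted,ted->te", f, f)  # ||f_j(t)||_2^2 for every (t, j)
    return (g * sq_norms).mean(axis=0)         # mean over t == the 1/T_cal sum

def prune_candidates(scores: np.ndarray, ratio: float) -> np.ndarray:
    """Indices of the floor(R * num_experts) lowest-S experts at ratio R."""
    k = int(ratio * len(scores))               # int() is floor for ratio >= 0
    return np.argsort(scores)[:k]
```

Because `S_j` lands in Δloss units, the cost of dropping an expert set is directly comparable to a format's measured quantization Δloss, which is what lets a single knapsack price both choices.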
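The "DP picks (R, format) jointly" step can be read as a multiple-choice knapsack over per-layer options. The sketch below is one way to implement that reading, not prismaquant's code: the option triples, integer-bucketed byte costs, toy numbers, and the name `mc_knapsack` are all assumptions.

```python
# Hedged sketch of a multiple-choice knapsack DP over per-layer
# (prune ratio, quant format) options; illustrative, not prismaquant's code.
import math

def mc_knapsack(options_per_layer, budget):
    """Pick exactly one (label, dloss, cost) option per layer, minimizing
    total Δloss subject to sum(cost) <= budget. Costs are non-negative ints
    (e.g. layer bytes bucketed into fixed-size steps)."""
    INF = math.inf
    best = [0.0] * (budget + 1)   # best[b]: min Δloss with total cost <= b
    picks = []                    # per-layer backtracking tables
    for opts in options_per_layer:
        nxt = [INF] * (budget + 1)
        pick = [None] * (budget + 1)
        for b in range(budget + 1):
            for label, dloss, cost in opts:
                if cost <= b and best[b - cost] + dloss < nxt[b]:
                    nxt[b] = best[b - cost] + dloss
                    pick[b] = (label, b - cost)  # remember choice + leftover
        best = nxt
        picks.append(pick)
    # Backtrack the chosen (R, format) per layer; assumes budget is feasible.
    plan, b = [], budget
    for pick in reversed(picks):
        label, b = pick[b]
        plan.append(label)
    return list(reversed(plan)), best[budget]

# Toy usage: two layers, labels are (R, format) pairs, Δloss/cost made up.
opts = [
    [((0.0, "fp8"), 0.00, 4), ((0.3125, "nvfp4"), 0.02, 2)],  # layer 0
    [((0.0, "fp8"), 0.00, 4), ((0.25,   "nvfp4"), 0.05, 3)],  # layer 1
]
plan, total_dloss = mc_knapsack(opts, budget=6)
# plan == [(0.3125, "nvfp4"), (0.0, "fp8")], total_dloss == 0.02
```

A uniform-kept post-pass, as described in the changed passage, would then snap every layer to the same keep count so vLLM's MoE kernel sees a single `num_local_experts`.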