Nemotron Super NVFP4 Count-REAP Keep95
This repository contains a structurally pruned variant of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
Pruning details
- Metric: count
- Keep ratio: 0.95
- Routed experts kept per MoE layer: 487 / 512
- Experts removed per MoE layer: 25 / 512
- Usage source: merged representative imo-answerbench final count dumps (pid61 + pid63)
- Materialization script: compress/reap/scripts/materialize_pruned_nemotron_model.py
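The kept and removed counts above are consistent with the 0.95 keep ratio if the kept count is rounded up. The following is a minimal sketch of that arithmetic, not the materialization script itself: `experts_kept` is a hypothetical helper, and ceiling rounding is an assumption inferred from the numbers, since the script's actual rounding rule is not stated here.

```python
import math


def experts_kept(total_experts: int, keep_ratio: float) -> int:
    # Assumption: the kept count is rounded up to the next whole expert.
    # The real script's rounding rule is not documented in this card.
    return math.ceil(total_experts * keep_ratio)


total = 512
kept = experts_kept(total, 0.95)
removed = total - kept
print(kept, removed)
```

Under this assumption, 512 × 0.95 = 486.4 rounds up to 487 kept experts, leaving 25 removed per MoE layer.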
Notes
- This is count-based REAP, not score-based REAP.
- reap_prune_plan.json records the exact per-layer expert remapping.
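To illustrate what a count-based selection and the resulting remapping could look like for one layer, here is a rough sketch. It assumes "count-based" means ranking experts by how often the router selected them in the usage dumps. The function `plan_layer`, the tie-breaking rule, and the dictionary layout are all hypothetical and do not reflect the actual schema of reap_prune_plan.json.

```python
from typing import Dict, List


def plan_layer(counts: List[int], num_keep: int) -> Dict[int, int]:
    """Return {old_expert_index: new_expert_index} for the experts that are kept."""
    # Rank experts by routing count, highest first.
    # Assumption: ties are broken in favor of the lower expert index.
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    # Keep the top num_keep experts, preserving their original relative order.
    kept = sorted(ranked[:num_keep])
    return {old: new for new, old in enumerate(kept)}


# Toy example: 6 experts, keep 4 (the real layers keep 487 of 512).
counts = [120, 3, 87, 0, 45, 9]
remap = plan_layer(counts, num_keep=4)
print(remap)
```

In this toy case the two least-used experts (indices 1 and 3) are dropped, and the remaining experts are renumbered contiguously, which is the kind of old-to-new mapping a per-layer prune plan needs to record.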