Nemotron Super NVFP4 Count-REAP Keep90
This repository contains a structurally pruned variant of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
Pruning details
- Metric: count
- Keep ratio: 0.90
- Routed experts kept per MoE layer: 461 / 512
- Experts removed per MoE layer: 51 / 512
- Usage source: merged representative imo-answerbench final count dumps (pid61+pid63)
- Materialization script: compress/reap/scripts/materialize_pruned_nemotron_model.py
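For intuition, the numbers above are consistent with keeping the most frequently routed experts in each layer: 512 x 0.90 = 460.8, which gives 461 kept and 51 removed when rounded to nearest (rounding up gives the same result). The sketch below is illustrative only and is not the materialization script. The function names, the rounding rule, and the tie-breaking order are assumptions.

```python
def experts_to_keep(num_experts: int, keep_ratio: float) -> int:
    """Number of routed experts retained per layer.

    Assumption: round-to-nearest. The real script may round differently.
    """
    return round(num_experts * keep_ratio)


def select_by_count(counts: list[int], keep_ratio: float) -> list[int]:
    """Return the sorted indices of the most frequently routed experts.

    `counts[i]` is how often expert i was selected by the router in the
    usage dumps for one MoE layer (hypothetical input format).
    """
    k = experts_to_keep(len(counts), keep_ratio)
    # Rank by descending count; ties go to the lower expert index (assumption).
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:k])


kept_per_layer = experts_to_keep(512, 0.90)
removed_per_layer = 512 - kept_per_layer
print(kept_per_layer, removed_per_layer)
```

With 512 experts and a keep ratio of 0.90, this reproduces the 461 / 51 split listed above.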
Notes
- This is count-based REAP, not score-based REAP.
- reap_prune_plan.json records the exact per-layer expert remapping.
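The schema of reap_prune_plan.json is not described here, so the following is only a sketch of what a per-layer expert remapping typically expresses: each kept expert's original index is mapped to a new, contiguous index in the pruned layer. The function name `build_remap` and the example indices are hypothetical and are not taken from the plan file.

```python
def build_remap(kept: list[int]) -> dict[int, int]:
    """Map original expert indices to contiguous indices after pruning.

    Hypothetical helper: kept experts preserve their relative order.
    """
    return {old: new for new, old in enumerate(sorted(kept))}


# Toy layer with 4 experts where expert 1 was removed.
remap = build_remap([0, 2, 3])
print(remap)
```

In this toy case expert 2 becomes expert 1 and expert 3 becomes expert 2, while expert 0 keeps its index.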