Nemotron Super NVFP4 Count-REAP Keep90

This repository contains a structurally pruned variant of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.

Pruning details

  • Metric: count
  • Keep ratio: 0.90
  • Routed experts kept per MoE layer: 461 / 512
  • Experts removed per MoE layer: 51 / 512
  • Usage source: final expert-usage count dumps from representative imo-answerbench runs, merged (pid61 + pid63)
  • Materialization script: compress/reap/scripts/materialize_pruned_nemotron_model.py
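
The numbers above are consistent with a simple count-based selection: 0.90 × 512 = 460.8, which rounds to 461 kept and leaves 51 removed. The following is a minimal sketch of that idea, not the contents of the materialization script. The function names, the sum-based merge of the two dumps, the rounding rule, and the tie-breaking order are all assumptions made for illustration.

```python
from collections import Counter


def merge_counts(*dumps):
    """Sum per-expert routing counts across dumps (assumed merge rule)."""
    total = Counter()
    for dump in dumps:
        total.update(dump)  # Counter.update adds counts rather than replacing them
    return total


def select_experts(counts, num_experts=512, keep_ratio=0.90):
    """Keep the most frequently routed experts of one MoE layer.

    Assumption: the number kept is round(num_experts * keep_ratio);
    for 512 and 0.90 this is round(460.8) = 461.
    """
    k = round(num_experts * keep_ratio)
    # Rank by count, highest first; ties go to the lower expert index (assumed).
    ranked = sorted(range(num_experts), key=lambda e: (-counts.get(e, 0), e))
    kept = sorted(ranked[:k])
    removed = sorted(ranked[k:])
    return kept, removed


# Toy example with 4 experts and two hypothetical count dumps.
merged = merge_counts({0: 5, 1: 1}, {1: 1, 2: 9})
kept, removed = select_experts(merged, num_experts=4, keep_ratio=0.5)
```

In the toy example, experts 2 and 0 have the highest merged counts, so they are kept and experts 1 and 3 are removed. Selection is done independently per MoE layer, so each layer can keep a different set of 461 experts.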

Notes

  • This is count-based REAP, not score-based REAP: experts are selected by usage count rather than by a saliency score.
  • reap_prune_plan.json records the exact per-layer expert remapping.
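
To illustrate what a per-layer expert remapping means, here is a small sketch that assigns the kept experts of one layer new contiguous indices. The schema of reap_prune_plan.json is not described in this card, so the function name and the dictionary layout below are hypothetical and should not be read as the file's actual format.

```python
def build_remap(kept):
    """Map original expert index -> new contiguous index for one layer.

    Hypothetical layout: the real plan file may store this differently.
    """
    return {old: new for new, old in enumerate(sorted(kept))}


# Toy layer where experts 0, 2 and 5 survive pruning.
remap = build_remap([5, 0, 2])
```

Here original experts 0, 2 and 5 become experts 0, 1 and 2 in the pruned layer. The router output for that layer has to be reduced to the same kept indices in the same order, so that each remaining logit still points at the right expert.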