aimosprite/nemotron-super-120b-a12b-nvfp4-count-reap-keep95

Tags: Text Generation, 8-bit precision


Nemotron Super NVFP4 Count-REAP Keep95

This repository contains a structurally pruned variant of nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.

Pruning details

Metric: count
Keep ratio: 0.95
Routed experts kept per MoE layer: 487 / 512
Experts removed per MoE layer: 25 / 512
Usage source: merged representative imo-answerbench final count dumps (pid61 + pid63)
Materialization script: compress/reap/scripts/materialize_pruned_nemotron_model.py
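
The numbers above can be reproduced with a small sketch of count-based selection. This is an illustration, not the repository's materialization script: the function name `select_experts_by_count`, the tie-breaking rule, and the made-up counts are all assumptions. Rounding up is also an assumption, chosen because ceil(0.95 × 512) = 487 matches the kept count stated above, whereas rounding to nearest would give 486.

```python
import math

def select_experts_by_count(counts, keep_ratio):
    """Return sorted indices of the experts to keep, ranked by routing count.

    Hypothetical helper for illustration only.
    """
    n = len(counts)
    # Assumption: round up, which yields 487 of 512 at keep_ratio 0.95.
    n_keep = math.ceil(keep_ratio * n)
    # Rank by count, highest first; ties go to the lower expert index (assumption).
    ranked = sorted(range(n), key=lambda i: (-counts[i], i))
    return sorted(ranked[:n_keep])

# Made-up usage counts for one MoE layer with 512 routed experts.
counts = [(i * 37) % 512 for i in range(512)]
kept = select_experts_by_count(counts, 0.95)
print(len(kept), 512 - len(kept))
```

With these synthetic counts, the 25 least-used experts are the ones dropped in this layer. In practice each MoE layer would have its own counts and therefore its own kept set.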

Notes

This is count-based REAP, not score-based REAP.
reap_prune_plan.json records the exact per-layer expert remapping.
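
To show what a per-layer expert remapping means in practice, here is a minimal sketch. The schema of reap_prune_plan.json is not shown in this card, so the dictionary layout, the helper names `build_remap` and `prune_rows`, and the toy data below are hypothetical. The idea is only that surviving experts are renumbered contiguously, and any per-expert tensor (for example, router rows) is sliced in the same order.

```python
def build_remap(kept_old_indices):
    """Map each kept original expert index to a new contiguous index."""
    return {old: new for new, old in enumerate(sorted(kept_old_indices))}

def prune_rows(rows, remap):
    """Keep only the rows of surviving experts, ordered by new index."""
    ordered_old = sorted(remap, key=remap.get)
    return [rows[old] for old in ordered_old]

# Toy layer: 7 experts, of which experts 2 and 5 are pruned (made-up data).
kept = [0, 1, 3, 4, 6]
remap = build_remap(kept)

# One stand-in "router row" per original expert.
router_rows = [[float(e)] * 2 for e in range(7)]
pruned = prune_rows(router_rows, remap)

print(remap)
print(len(pruned))
```

Recording the mapping from old to new indices keeps the pruned checkpoint traceable back to the base model's expert numbering.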


Safetensors

Model size: 71B params
Tensor types: F32 · BF16 · F8_E4M3 · U8

Model tree for aimosprite/nemotron-super-120b-a12b-nvfp4-count-reap-keep95

Base model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
Quantized (3): this model