---
pipeline_tag: text-generation
license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.7/blob/main/LICENSE
base_model:
- MiniMaxAI/MiniMax-M2.7
tags:
- minimax_m2
- llama.cpp
- gguf
- nvfp4
---
These are some quants I use depending on memory availability. I also added NVFP4 in the hope that custom kernels emerge for it in the future. I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants.
## KLD
Due to hardware restrictions I have to use the Q8 version as the KLD baseline. However, it is quantized the same way as the original model, which also uses 8 bits for the expert weights, so the difference should be small.
Sadly I am getting weird outputs (NaN values from llama-perplexity) in some KLD runs, so take this with a salt lake.
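For reference, the Mean KLD column is the average per-token KL divergence between the baseline's and the quant's next-token distributions. A minimal sketch of the per-token computation (an illustration of the metric, not llama.cpp's actual implementation):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(base_logits, quant_logits):
    """KL(base || quant) for one token position over the vocabulary."""
    p = softmax(base_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give (numerically) zero divergence;
# any distortion from quantization pushes the value above zero.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_divergence([0.0, 5.0], [5.0, 0.0]))
```

The reported Mean KLD averages this quantity over all token positions in the evaluation text.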
| Provider | Quant | Size (GB) | Mean PPL | Mean KLD | Same top p |
|---|---|---|---|---|---|
| KS | Q8 | | 7.0266 ± 0.05210 | baseline | baseline |
| KS | IQ4XS-Q5K | 135.5 | | | 90.720 ± 0.077 % |
| KS | IQ4XS | 123.8 | 7.153799 ± 0.053213 | 0.086127 ± 0.001029 | 89.425 ± 0.082 % |
| KS | IQ4XS-Q4K | 126.1 | | | 89.205 ± 0.083 % |
| KS | NVFP4 | 130.8 | 7.177182 ± 0.053324 | 0.105053 ± 0.001034 | 88.154 ± 0.086 % |
| unsloth | UD-Q4_K_XL | 141 | | | 86.990 ± 0.090 % |
| KS | Q3K-IQ4XS | 108.6 | 7.297092 ± 0.054489 | 0.140361 ± 0.001216 | 86.387 ± 0.091 % |