---
pipeline_tag: text-generation
license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.7/blob/main/LICENSE
base_model:
- MiniMaxAI/MiniMax-M2.7
tags:
- minimax_m2
- llama.cpp
- gguf
- nvfp4
---
These are some quants I use depending on how much memory is available.
I also added an NVFP4 quant in the hope that custom kernels for it emerge in the future.
I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants.
# KLD
Due to hardware restrictions, I have to use the Q8 version as the baseline for the KLD runs.
However, it is quantized in the same way as the original model, which also uses 8 bits for the expert weights, so the difference should not be big.
Sadly, I am getting weird outputs (NaN floats from llama-perplexity) in some KLD runs, so take these numbers with a lake of salt.
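For reference, Mean KLD is the average per-token KL divergence between the Q8 baseline's next-token distribution and the quant's, and Same top p is the share of positions where both pick the same most-likely token. The table should be reproducible with llama.cpp's two-step KL-divergence workflow, roughly like this (model and data file names are placeholders):

```sh
# Step 1: run the baseline quant over the eval text and save its logits.
./llama-perplexity -m MiniMax-M2.7-Q8.gguf -f eval.txt \
    --kl-divergence-base logits-q8.bin

# Step 2: run a smaller quant against the saved baseline logits to get
# Mean PPL, Mean KLD, Same top p, etc.
./llama-perplexity -m MiniMax-M2.7-IQ4XS.gguf \
    --kl-divergence-base logits-q8.bin --kl-divergence
```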
|Provider |Quant      |Size (GB) |Mean PPL            |Mean KLD            |Same top p       |
|---------|-----------|----------|--------------------|--------------------|-----------------|
|KS       |Q8         |          |7.0266 ± 0.05210    |baseline            |baseline         |
|KS       |IQ4XS-Q5K  |135.5     |                    |                    |90.720 ± 0.077 % |
|KS       |IQ4XS      |123.8     |7.153799 ± 0.053213 |0.086127 ± 0.001029 |89.425 ± 0.082 % |
|KS       |IQ4XS-Q4K  |126.1     |                    |                    |89.205 ± 0.083 % |
|KS       |NVFP4      |130.8     |7.177182 ± 0.053324 |0.105053 ± 0.001034 |88.154 ± 0.086 % |
|unsloth  |UD-Q4_K_XL |141       |                    |                    |86.990 ± 0.090 % |
|KS       |Q3K-IQ4XS  |108.6     |7.297092 ± 0.054489 |0.140361 ± 0.001216 |86.387 ± 0.091 % |