---
pipeline_tag: text-generation
license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.7/blob/main/LICENSE
base_model:
  - MiniMaxAI/MiniMax-M2.7
tags:
  - minimax_m2
  - llama.cpp
  - gguf
  - nvfp4
---

These are some quants I use depending on available memory.
I also added NVFP4 in the hope that custom kernels emerge in the future.
I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants.
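As a minimal sketch, one of these quants can be served with stock llama.cpp (the file name below is hypothetical; adjust `-c` and `-ngl` to your hardware):

```bash
# Hypothetical file name; pick whichever quant fits your memory budget.
# -ngl 99 offloads all layers to the GPU; lower it if VRAM is tight.
llama-server \
  -m MiniMax-M2.7-IQ4XS-Q5K.gguf \
  -c 8192 \
  -ngl 99
```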

# KLD

Due to hardware restrictions I have to use the Q8 version as the KLD baseline rather than the full-precision model.
However, it is quantized in the same way as the original model, which also uses 8 bits for the expert weights, so the difference should not be big.

Sadly, some KLD runs produce weird outputs (NaN floats from llama-perplexity), so take these numbers with a salt lake.
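For reference, a rough sketch of how numbers like these are produced with llama-perplexity (file names and the evaluation text are placeholders; the actual corpus used is not listed here):

```bash
# 1. Save baseline logits from the Q8 reference model.
llama-perplexity -m MiniMax-M2.7-Q8.gguf -f eval.txt \
  --kl-divergence-base baseline.kld

# 2. Score a quant against the saved baseline; this reports
#    mean PPL, mean KLD, and the "same top p" statistic.
llama-perplexity -m MiniMax-M2.7-IQ4XS.gguf \
  --kl-divergence-base baseline.kld --kl-divergence
```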

|Provider   |Quant      |Size (GB)  |Mean PPL               |Mean KLD               |Same top-p (%)     |
|-----------|-----------|-----------|-----------------------|-----------------------|-------------------|
|KS         |Q8         |n/a        |7.0266 ± 0.05210       |baseline               |baseline           |
|KS         |IQ4XS-Q5K  |135.5      |n/a                    |n/a                    |90.720 ± 0.077     |
|KS         |IQ4XS      |123.8      |7.153799 ± 0.053213    |0.086127 ± 0.001029    |89.425 ± 0.082     |
|KS         |IQ4XS-Q4K  |126.1      |n/a                    |n/a                    |89.205 ± 0.083     |
|KS         |NVFP4      |130.8      |7.177182 ± 0.053324    |0.105053 ± 0.001034    |88.154 ± 0.086     |
|unsloth    |UD-Q4_K_XL |141        |n/a                    |n/a                    |86.990 ± 0.090     |
|KS         |Q3K-IQ4XS  |108.6      |7.297092 ± 0.054489    |0.140361 ± 0.001216    |86.387 ± 0.091     |