- gguf
- nvfp4
---
Some quants I use depending on memory availability, along with an NVFP4 variant in the hope that custom kernels become available.

I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants; I currently use IQ4XS-Q4K.
# KLD
Due to hardware restrictions I have to use the Q8 version, rather than the original model, as the baseline for the KLD runs.

However, it is quantized the same way as the original model, which also uses 8 bits for the expert weights, so the difference should be small.

Sadly, some KLD runs produce weird outputs (NaN floats from llama-perplexity), so take these numbers with a salt lake.
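For reference, these numbers come from llama.cpp's `llama-perplexity` tool, which saves the baseline logits once and then scores each quant against them. A sketch of the two-step invocation; the model, eval text, and output file names are placeholders, not the actual files used here:

```shell
# Step 1: run the baseline model (Q8 here) and save its logits.
llama-perplexity -m model-Q8.gguf -f eval.txt --kl-divergence-base base.kld

# Step 2: score a quant against the saved baseline; this reports
# Mean PPL, Mean KLD, and "Same top p" among other statistics.
llama-perplexity -m model-IQ4XS.gguf -f eval.txt \
    --kl-divergence-base base.kld --kl-divergence
```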
|Provider |Quant |Size (GB) |Mean PPL |Mean KLD |Same top p |
|---------|-----------|----------|---------------------|---------------------|-----------------|
|KS |Q8 | |7.0266 ± 0.05210 |baseline |baseline |
|KS |IQ4XS |123.8 |7.153799 ± 0.053213 |0.086127 ± 0.001029 |89.425 ± 0.082 % |
|KS |IQ4XS-Q5K |135.5 | | | |
|KS |IQ4XS-Q4K |126.1 | | | |
|KS |NVFP4 |130.8 |7.177182 ± 0.053324 |0.105053 ± 0.001034 |88.154 ± 0.086 % |
|KS |Q3K-IQ4XS |108.6 |7.297092 ± 0.054489 |0.140361 ± 0.001216 |86.387 ± 0.091 % |
|unsloth |UD-Q4_K_XL |141 | | | |
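For readers unfamiliar with the columns: "Mean KLD" is the token-averaged KL divergence of the quant's next-token distribution from the baseline's, and "Same top p" is the percentage of positions where both models pick the same top token. A minimal sketch of how those two statistics are defined, using made-up toy logits rather than real model outputs:

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kld(p, q):
    # KL divergence D(p || q): baseline distribution p vs quant distribution q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kld_stats(base_logits, quant_logits):
    klds, same_top = [], 0
    for b, q in zip(base_logits, quant_logits):
        p, r = softmax(b), softmax(q)
        klds.append(kld(p, r))
        same_top += p.index(max(p)) == r.index(max(r))
    n = len(klds)
    return sum(klds) / n, 100.0 * same_top / n  # Mean KLD, Same top %

# Toy example: two token positions over a vocabulary of 3.
base = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2]]
quant = [[1.9, 1.1, 0.2], [2.4, 0.6, 0.1]]  # second position flips the top token
mean_kld, same_pct = kld_stats(base, quant)
```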