krampenschiesser committed on
Commit 44828a9 · verified · 1 Parent(s): 52f7ed6

Update README.md

Files changed (1): README.md +18 -7
README.md CHANGED
@@ -11,12 +11,23 @@ tags:
  - gguf
  - nvfp4
  ---
- In progress...
-
- planned:
-
- * nvfp4, size estimate 130.8gb
- * iq4xs, size estimate 123.8gb
- * iq4xs-q4k, size estimate 126.1gb
- * iq4xs-q5k, size estimate 135.5gb
- * q3k-iq4xs, size estimate 108.6gb
 - gguf
 - nvfp4
 ---

+ Some quants I use depending on memory availability, plus NVFP4 in the hope of custom kernels.
+ I recommend the Q3K-IQ4XS and IQ4XS-Q5K quants. I currently use IQ4XS-Q4K.

+ # KLD
+
+ Due to hardware restrictions I need to use the Q8 version as the baseline for the KLD runs.
+ However, it is quantized in the same way as the original model, which also uses 8 bits for the expert weights, so the difference should not be big.
+
+ Sadly, I am getting weird outputs (NaN floats from llama-perplexity) in some KLD runs, so take this with a salt lake.
+
+ |Provider |Quant      |Size GB |Mean PPL            |Mean KLD            |Same Top p       |
+ |---------|-----------|--------|--------------------|--------------------|-----------------|
+ |KS       |Q8         |        |7.0266 ± 0.05210    |baseline            |baseline         |
+ |KS       |IQ4XS      |123.8   |7.153799 ± 0.053213 |0.086127 ± 0.001029 |89.425 ± 0.082 % |
+ |KS       |IQ4XS-Q5K  |135.5   |                    |                    |                 |
+ |KS       |IQ4XS-Q4K  |126.1   |                    |                    |                 |
+ |KS       |NVFP4      |130.8   |7.177182 ± 0.053324 |0.105053 ± 0.001034 |88.154 ± 0.086 % |
+ |KS       |Q3K-IQ4XS  |108.6   |7.297092 ± 0.054489 |0.140361 ± 0.001216 |86.387 ± 0.091 % |
+ |unsloth  |UD-Q4_K_XL |141     |                    |                    |                 |
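
The Mean KLD and Same Top p columns above are the statistics llama-perplexity reports when comparing a quant's per-token distributions against the baseline's. As a rough sketch of what those numbers mean (toy logits invented for illustration, not the actual eval data):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kld_stats(base_logits, quant_logits):
    """Mean KL divergence of the quant's distributions vs. the baseline's,
    plus the fraction of positions where both pick the same top token."""
    klds, same_top = [], 0
    for b, q in zip(base_logits, quant_logits):
        p, r = softmax(b), softmax(q)
        # KL(P || R) summed over the vocabulary at this position.
        klds.append(sum(pi * math.log(pi / ri) for pi, ri in zip(p, r)))
        same_top += int(max(range(len(b)), key=b.__getitem__)
                        == max(range(len(q)), key=q.__getitem__))
    return sum(klds) / len(klds), same_top / len(klds)

# Hypothetical example: two positions over a 3-token vocabulary.
base = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
quant = [[1.8, 1.1, 0.2], [0.4, 2.4, 0.1]]
mean_kld, same_top_p = kld_stats(base, quant)
```

A lower mean KLD means the quant's token distributions stay closer to the baseline's; the same-top-p fraction says how often greedy decoding would pick an identical token.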