AesSedai committed 256f0de (verified; parent: 0501737): Update README.md

---
base_model:
  - moonshotai/Kimi-K2.5
---
## Updates
### 03/25/2026
I've re-quanted and uploaded new versions of the IQ2_XXS, IQ2_S, and IQ3_S quantizations. These three use a mixture of @eaddario's [target-bpw PR](https://github.com/ggml-org/llama.cpp/pull/12511) along with some small changes I added to support `--tensor-type` overrides.

As a result, these quants perform better than my previous ones: when I measured the old quants a couple of days ago, it turned out there were some pretty catastrophic issues with the IQ3_S and IQ2_S specifically. The new quants measure much better and should serve as higher-quality replacements.

I don't have specific FFN Up / Gate / Down mixtures for the IQ2_XXS, IQ2_S, and IQ3_S quants due to how the bpw budget selection works, but I've kept most of the model at high quality, like the rest of my MoE-optimized quants.
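To illustrate what a `--tensor-type` override does, here is a minimal sketch, illustrative only and not llama.cpp's actual implementation, of resolving a per-tensor quantization type: the first matching glob pattern wins, otherwise the bpw-budget default applies. All tensor names and type choices below are hypothetical.

```python
# Illustrative sketch of per-tensor quantization overrides.
# Not llama.cpp's real code; names and types here are made up.
from fnmatch import fnmatch

def resolve_quant_type(tensor_name, overrides, default="iq3_s"):
    """overrides: list of (glob_pattern, quant_type) pairs, checked in order.
    Returns the quant type of the first matching pattern, else the default."""
    for pattern, qtype in overrides:
        if fnmatch(tensor_name, pattern):
            return qtype
    return default

overrides = [
    ("*attn*", "q8_0"),      # keep attention tensors at high quality
    ("*ffn_down*", "q4_k"),  # hypothetical per-FFN override
]

print(resolve_quant_type("blk.0.attn_q.weight", overrides))    # q8_0
print(resolve_quant_type("blk.0.ffn_down.weight", overrides))  # q4_k
print(resolve_quant_type("blk.0.ffn_gate.weight", overrides))  # iq3_s (default)
```

In the real tool the overrides are passed on the command line (roughly `--tensor-type "PATTERN=TYPE"`); check `llama-quantize --help` for the exact syntax of your build.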
### 02/11/2026
Vision support for K2.5 has been merged into llama.cpp's master branch, so the PR branch is no longer needed.

### 02/08/2026
I've updated the PR code to address feedback, and updated the mmproj files here to be compatible with the new PR code.
### 02/01/2026
moonshotai has published an updated chat_template.jinja, and I have updated the GGUFs in this repository to match, so please re-download the first shard (00001) of your desired quant. The upstream changes:
- The default system prompt might cause confusion and unexpected behaviour for users, so it has been removed.
- The token <|media_start|> was incorrect; it has been replaced with <|media_begin|> in the chat template.
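Shards follow llama.cpp's `-NNNNN-of-NNNNN.gguf` naming, so only files whose shard index is `00001` need to be re-fetched. A small sketch of picking them out of a file listing (the filenames below are hypothetical, not the exact names in this repo):

```python
import re

def first_shards(filenames):
    """Return only the files that are shard 00001 of a multi-part GGUF."""
    return [f for f in filenames if re.search(r"-00001-of-\d{5}\.gguf$", f)]

files = [  # hypothetical filenames for illustration
    "Kimi-K2.5-Q4_X-00001-of-00012.gguf",
    "Kimi-K2.5-Q4_X-00002-of-00012.gguf",
    "Kimi-K2.5-IQ3_S-00001-of-00008.gguf",
]
print(first_shards(files))
```

With the Hugging Face CLI you can filter a download the same way, e.g. `huggingface-cli download <this-repo> --include "*-00001-of-*.gguf"` (check the pattern flags supported by your huggingface_hub version).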
MMPROJ files for image vision input have been provided, and support has been merged into llama.cpp's master branch.

This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4-quantized directly from the original model and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; the model size was identical and the PPL barely higher. Since their performance was about the same, I've only uploaded the Q4_X variant.
| Quant | Size | Mixture | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
| :--------- | :--------- | :------- | :------- | :------- | :------- |
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 ± 0.00699 | 0 | 0 |
| IQ3_S | 377.50 GiB (3.16 BPW) | Q8_0 / varies | 2.116713 ± 0.008620 | +16.0796% | 0.158551 ± 0.001084 |
| IQ2_S | 311.71 GiB (2.61 BPW) | Q8_0 / varies | 2.433594 ± 0.010455 | +33.4572% | 0.294937 ± 0.001721 |
| IQ2_XXS | 262.74 GiB (2.20 BPW) | Q8_0 / varies | 3.119876 ± 0.014508 | +71.0926% | 0.540149 ± 0.002570 |
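As a quick consistency check, dividing each quant's PPL by one plus its reported relative increase recovers the same baseline perplexity (about 1.8235, which matches the Q4_X row's 1.8248 within its stated ± 0.00699 error):

```python
# PPL and reported relative PPL increase, copied from the table above.
rows = {
    "IQ3_S":   (2.116713, 0.160796),
    "IQ2_S":   (2.433594, 0.334572),
    "IQ2_XXS": (3.119876, 0.710926),
}

# Invert delta = PPL(Q)/PPL(base) - 1 to get the implied baseline PPL.
implied_base = {name: ppl / (1.0 + delta) for name, (ppl, delta) in rows.items()}
for name, base in implied_base.items():
    print(f"{name}: implied baseline PPL = {base:.4f}")
    # All rows should agree with the Q4_X measurement within its error bar.
    assert abs(base - 1.8248) < 0.007
```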
![kld_graph](kld_data/01_kld_vs_filesize.png "Chart showing Pareto KLD analysis of quants")
![ppl_graph](kld_data/02_ppl_vs_filesize.png "Chart showing Pareto PPL analysis of quants")