---
base_model:
- moonshotai/Kimi-K2.5
---

## Updates

### 03/25/2026

I've re-quanted and uploaded new versions of the IQ2_XXS, IQ2_S, and IQ3_S quantizations. These three use a mixture of @eaddario's [target-bpw PR](https://github.com/ggml-org/llama.cpp/pull/12511) along with some small changes I added to support `--tensor-type` overrides.
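
As a rough mental model of what a target-bpw selection does (an illustrative sketch only, not the PR's actual algorithm; the tensor names, candidate types, and error figures below are invented): every tensor starts at the cheapest quant type, and the allocator repeatedly buys the upgrade with the best estimated error reduction per extra bit until the overall bits-per-weight budget is spent.

```python
# Illustrative sketch of a target-bpw style budget allocation; NOT the
# actual algorithm from the PR. All names and numbers are made up.

def allocate_bpw(tensors, candidates, target_bpw):
    """tensors: {name: n_weights}; candidates: [(type, bpw, est_error), ...]
    sorted from cheapest to most expensive. Returns {name: chosen_type}."""
    total_weights = sum(tensors.values())
    budget_bits = target_bpw * total_weights
    choice = {name: 0 for name in tensors}        # index into candidates
    used_bits = candidates[0][1] * total_weights  # everything starts cheapest
    while True:
        best = None  # (name, extra_bits, error_reduction_per_bit)
        for name, idx in choice.items():
            if idx + 1 == len(candidates):
                continue  # already at the most expensive type
            _, bpw_cur, err_cur = candidates[idx]
            _, bpw_nxt, err_nxt = candidates[idx + 1]
            extra_bits = (bpw_nxt - bpw_cur) * tensors[name]
            gain = (err_cur - err_nxt) / extra_bits
            if used_bits + extra_bits <= budget_bits and (best is None or gain > best[2]):
                best = (name, extra_bits, gain)
        if best is None:
            break  # no affordable upgrade left
        choice[best[0]] += 1
        used_bits += best[1]
    return {name: candidates[idx][0] for name, idx in choice.items()}

# Toy example: two equal tensors, budget of 4 bits per weight on average.
mix = allocate_bpw(
    {"ffn_up": 100, "ffn_down": 100},
    [("q2_k", 2.0, 1.0), ("q4_k", 4.0, 0.3), ("q8_0", 8.0, 0.05)],
    target_bpw=4.0,
)
print(mix)  # both tensors land on q4_k under this budget
```

In this picture, a `--tensor-type` override would amount to pinning selected tensors to a fixed type before the budget pass runs over the rest.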

The result is that these quants perform better than my previous ones. When I measured the old quants a couple of days ago, it turned out there were some pretty catastrophic issues with the IQ3_S and the IQ2_S specifically. These new quants measure much better and should serve as higher-quality replacements.

I don't have specific FFN Up / Gate / Down mixtures for the IQ2_XXS, IQ2_S, and IQ3_S quants due to how the bpw budget selection works, but I've kept most of the model in high quality like the rest of my MoE-optimized quants.

### 02/11/2026

Vision support for K2.5 has been merged into llama.cpp's master branch, so using the PR branch is no longer necessary.

### 02/08/2026

I've updated the PR code to address feedback and updated the mmproj files here to be compatible with the new PR code.

### 02/01/2026

moonshotai has published an updated chat_template.jinja. I have updated the GGUFs in this repository to match, so please re-download the first shard (00001) of your desired quant.

- The default system prompt might confuse users and cause unexpected behaviour, so we removed it.
- The token `<|media_start|>` was incorrect; it has been replaced with `<|media_begin|>` in the chat template.
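
The chat template travels in the GGUF metadata, which lives in the first shard of a split model; that is why only the 00001 file needs refreshing. A quick sketch (the filenames below are hypothetical, yours will differ):

```python
from fnmatch import fnmatch

# Hypothetical shard names for illustration only.
local_files = [
    "Kimi-K2.5-Q4_X-00001-of-00012.gguf",  # first shard: holds the metadata
    "Kimi-K2.5-Q4_X-00002-of-00012.gguf",
    "Kimi-K2.5-Q4_X-00012-of-00012.gguf",
]

# Only the first shard of the split needs to be re-downloaded.
to_refresh = [f for f in local_files if fnmatch(f, "*-00001-of-*.gguf")]
print(to_refresh)  # ['Kimi-K2.5-Q4_X-00001-of-00012.gguf']
```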

MMPROJ files for image vision input have been provided, and support has been merged.

This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4-quantized directly from the original model and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; the model size was identical and the PPL was barely higher. Their performance was about the same, so I've only uploaded the Q4_X variant.

| Quant | Size | Mixture | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 ± 0.00699 | 0% | 0 |
| IQ3_S | 377.50 GiB (3.16 BPW) | Q8_0 / varies | 2.116713 ± 0.008620 | +16.0796% | 0.158551 ± 0.001084 |
| IQ2_S | 311.71 GiB (2.61 BPW) | Q8_0 / varies | 2.433594 ± 0.010455 | +33.4572% | 0.294937 ± 0.001721 |
| IQ2_XXS | 262.74 GiB (2.20 BPW) | Q8_0 / varies | 3.119876 ± 0.014508 | +71.0926% | 0.540149 ± 0.002570 |
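
The Size column can be cross-checked with a little arithmetic. The weight count below is inferred from the Q4_X row itself, so treat it as an approximation rather than an official parameter count:

```python
# Sanity check on the table above: bits per weight = file size in bits
# divided by weight count. The weight count is derived from the Q4_X row
# (543.62 GiB at 4.55 BPW), so it is approximate.
GIB = 1024**3  # bytes per GiB

n_weights = 543.62 * GIB * 8 / 4.55  # roughly 1.03e12 weights

def bpw(size_gib: float) -> float:
    """Bits per weight implied by a file of `size_gib` GiB."""
    return size_gib * GIB * 8 / n_weights

# The other rows reproduce their stated BPW to within rounding.
for name, size_gib, stated_bpw in [
    ("IQ3_S", 377.50, 3.16),
    ("IQ2_S", 311.71, 2.61),
    ("IQ2_XXS", 262.74, 2.20),
]:
    assert abs(bpw(size_gib) - stated_bpw) < 0.01, name
```

Note that naively recomputing the relative-PPL column from the rounded PPL values above lands close to, but not exactly on, the stated percentages, presumably because the measurement tool reports a per-token mean ratio rather than the ratio of the rounded aggregates.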


