---
base_model:
- moonshotai/Kimi-K2.5
---

## Updates

### 03/25/2026

I've re-quanted and uploaded new versions of the IQ2_XXS, IQ2_S, and IQ3_S quantizations. These three use a mixture of @eaddario's [target-bpw PR](https://github.com/ggml-org/llama.cpp/pull/12511) along with some small changes I added to support `--tensor-type` overrides.
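
As a rough mental model of what a target-bpw selection does (an illustrative sketch only, not the PR's actual algorithm; the tensor names, candidate types, and error figures below are invented): every tensor starts at the cheapest quant type, and the allocator repeatedly buys the upgrade with the best estimated error reduction per extra bit until the overall bits-per-weight budget is spent.

```python
# Illustrative sketch of a target-bpw style budget allocation; NOT the
# actual algorithm from the PR. All names and numbers are made up.

def allocate_bpw(tensors, candidates, target_bpw):
    """tensors: {name: n_weights}; candidates: [(type, bpw, est_error), ...]
    sorted from cheapest to most expensive. Returns {name: chosen_type}."""
    total_weights = sum(tensors.values())
    budget_bits = target_bpw * total_weights
    choice = {name: 0 for name in tensors}        # index into candidates
    used_bits = candidates[0][1] * total_weights  # everything starts cheapest
    while True:
        best = None  # (name, extra_bits, error_reduction_per_bit)
        for name, idx in choice.items():
            if idx + 1 == len(candidates):
                continue  # already at the most expensive type
            _, bpw_cur, err_cur = candidates[idx]
            _, bpw_nxt, err_nxt = candidates[idx + 1]
            extra_bits = (bpw_nxt - bpw_cur) * tensors[name]
            gain = (err_cur - err_nxt) / extra_bits
            if used_bits + extra_bits <= budget_bits and (best is None or gain > best[2]):
                best = (name, extra_bits, gain)
        if best is None:
            break  # no affordable upgrade left
        choice[best[0]] += 1
        used_bits += best[1]
    return {name: candidates[idx][0] for name, idx in choice.items()}

# Toy example: two equal tensors, budget of 4 bits per weight on average.
mix = allocate_bpw(
    {"ffn_up": 100, "ffn_down": 100},
    [("q2_k", 2.0, 1.0), ("q4_k", 4.0, 0.3), ("q8_0", 8.0, 0.05)],
    target_bpw=4.0,
)
print(mix)  # both tensors land on q4_k under this budget
```

In this picture, a `--tensor-type` override would amount to pinning selected tensors to a fixed type before the budget pass runs over the rest.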

The result is that these quants perform better than my previous ones. When I measured the old quants a couple of days ago, it turned out there were some pretty catastrophic issues with the IQ3_S and the IQ2_S specifically. These new quants measure much better and should serve as higher-quality replacements.

I don't have specific FFN Up / Gate / Down mixtures for the IQ2_XXS, IQ2_S, and IQ3_S quants due to how the bpw budget selection works, but I've kept most of the model in high quality like the rest of my MoE-optimized quants.

### 02/11/2026

Vision support for K2.5 has been merged into llama.cpp's master branch, so using the PR branch is no longer necessary.

### 02/08/2026

I've updated the PR code to address feedback and updated the mmproj files here to be compatible with the new PR code.

### 02/01/2026

moonshotai has published an updated chat_template.jinja. I have updated the GGUFs in this repository to match, so please re-download the first shard (00001) of your desired quant.

- The default system prompt might confuse users and cause unexpected behaviour, so we removed it.
- The token `<|media_start|>` was incorrect; it has been replaced with `<|media_begin|>` in the chat template.
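
The chat template travels in the GGUF metadata, which lives in the first shard of a split model; that is why only the 00001 file needs refreshing. A quick sketch (the filenames below are hypothetical, yours will differ):

```python
from fnmatch import fnmatch

# Hypothetical shard names for illustration only.
local_files = [
    "Kimi-K2.5-Q4_X-00001-of-00012.gguf",  # first shard: holds the metadata
    "Kimi-K2.5-Q4_X-00002-of-00012.gguf",
    "Kimi-K2.5-Q4_X-00012-of-00012.gguf",
]

# Only the first shard of the split needs to be re-downloaded.
to_refresh = [f for f in local_files if fnmatch(f, "*-00001-of-*.gguf")]
print(to_refresh)  # ['Kimi-K2.5-Q4_X-00001-of-00012.gguf']
```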

MMPROJ files for image vision input have been provided, and support has been merged.

This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4-quantized directly from the original model and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; the model size was identical and the PPL was barely higher. Their performance was about the same, so I've only uploaded the Q4_X variant.

| Quant | Size | Mixture | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 ± 0.00699 | 0% | 0 |
| IQ3_S | 377.50 GiB (3.16 BPW) | Q8_0 / varies | 2.116713 ± 0.008620 | +16.0796% | 0.158551 ± 0.001084 |
| IQ2_S | 311.71 GiB (2.61 BPW) | Q8_0 / varies | 2.433594 ± 0.010455 | +33.4572% | 0.294937 ± 0.001721 |
| IQ2_XXS | 262.74 GiB (2.20 BPW) | Q8_0 / varies | 3.119876 ± 0.014508 | +71.0926% | 0.540149 ± 0.002570 |
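
The Size column can be cross-checked with a little arithmetic. The weight count below is inferred from the Q4_X row itself, so treat it as an approximation rather than an official parameter count:

```python
# Sanity check on the table above: bits per weight = file size in bits
# divided by weight count. The weight count is derived from the Q4_X row
# (543.62 GiB at 4.55 BPW), so it is approximate.
GIB = 1024**3  # bytes per GiB

n_weights = 543.62 * GIB * 8 / 4.55  # roughly 1.03e12 weights

def bpw(size_gib: float) -> float:
    """Bits per weight implied by a file of `size_gib` GiB."""
    return size_gib * GIB * 8 / n_weights

# The other rows reproduce their stated BPW to within rounding.
for name, size_gib, stated_bpw in [
    ("IQ3_S", 377.50, 3.16),
    ("IQ2_S", 311.71, 2.61),
    ("IQ2_XXS", 262.74, 2.20),
]:
    assert abs(bpw(size_gib) - stated_bpw) < 0.01, name
```

Note that naively recomputing the relative-PPL column from the rounded PPL values above lands close to, but not exactly on, the stated percentages, presumably because the measurement tool reports a per-token mean ratio rather than the ratio of the rounded aggregates.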


