---
base_model:
- zai-org/GLM-4.7
---
This repo contains specialized MoE quants for GLM-4.7. The idea is that, since the conditional-expert FFN tensors are huge compared to the rest of the tensors in the model, quantizing them more aggressively while keeping everything else at high precision should yield better quality at a smaller overall size than a comparable naive quantization. To that end, the default quantization type is kept high quality (Q8_0 down to Q5_K), while the FFN_UP and FFN_GATE expert tensors are quantized down the furthest and the FFN_DOWN expert tensors are kept one step above them.
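Recent llama.cpp builds let you override the quantization type of individual tensors at quantize time, so a mix like this can in principle be produced in one pass. A sketch only — the file names are placeholders and the `--tensor-type` pattern syntax varies between llama.cpp versions, so check `llama-quantize --help` for your build:

```shell
# Hypothetical invocation; paths are placeholders and the --tensor-type
# pattern syntax may differ across llama.cpp versions.
# Guarded so this is a no-op when llama-quantize is not installed.
if command -v llama-quantize >/dev/null 2>&1; then
  llama-quantize \
    --tensor-type 'ffn_up_exps=q4_k' \
    --tensor-type 'ffn_gate_exps=q4_k' \
    --tensor-type 'ffn_down_exps=q5_k' \
    GLM-4.7-F16.gguf GLM-4.7-Q8_0-Q4_K-Q4_K-Q5_K.gguf Q8_0
fi
```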

The mixture convention is as follows: `[Default Type]-[FFN_UP]-[FFN_GATE]-[FFN_DOWN]`, e.g. `Q8_0-Q4_K-Q4_K-Q5_K`. This means:
- Q8_0 is the default type (attention, shared expert, etc.)
- Q4_K was used for the FFN_UP and FFN_GATE conditional expert tensors
- Q5_K was used for the FFN_DOWN conditional expert tensors
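Since quant type names use underscores, the `-` separator splits a label unambiguously into its four slots. A small sketch of the convention (the function name is my own, not part of this repo):

```python
def parse_mixture(label: str) -> dict:
    """Split a mixture label like 'Q8_0-Q4_K-Q4_K-Q5_K' into its four slots.

    Quant type names use underscores, so '-' only ever separates slots.
    """
    default, ffn_up, ffn_gate, ffn_down = label.split("-")
    return {
        "default": default,    # attention, shared expert, etc.
        "ffn_up": ffn_up,      # conditional expert FFN_UP
        "ffn_gate": ffn_gate,  # conditional expert FFN_GATE
        "ffn_down": ffn_down,  # conditional expert FFN_DOWN
    }

print(parse_mixture("Q8_0-Q4_K-Q4_K-Q5_K"))
```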

I've named each mix after the standard quant whose BPW it most closely matches, as best I could discern.

| Quant | Size | Mixture | PPL | KLD |
| :--- | :--- | :--- | :--- | :--- |
| Q8_0 | 354.79 GiB (8.50 BPW) | Q8_0 | 8.6821 ± 0.15706 | 0 |
| Q5_K_M | 250.15 GiB (6.00 BPW) | Q8_0-Q5_K-Q5_K-Q6_K | 8.682378 ± 0.157101 | 0.011578 ± 0.000687 |
| Q4_K_M | 209.77 GiB (5.03 BPW) | Q8_0-Q4_K-Q4_K-Q5_K | 8.746787 ± 0.158456 | 0.017262 ± 0.000585 |
| IQ4_XS | 165.28 GiB (3.96 BPW) | Q8_0-IQ3_S-IQ3_S-IQ4_XS | 8.866443 ± 0.160719 | 0.043752 ± 0.001071 |
| IQ2_M | 107.12 GiB (2.57 BPW) | Q5_K-IQ2_XXS-IQ2_XXS-IQ3_XXS | 9.824880 ± 0.179312 | 0.194644 ± 0.003154 |
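As a sanity check on the BPW column, file size and BPW are related through the total weight count. A quick sketch — the parameter count here is back-derived from the Q8_0 row rather than taken from any official spec:

```python
def bpw(size_gib: float, n_params: float) -> float:
    """Bits per weight implied by a file size and a parameter count."""
    return size_gib * 2**30 * 8 / n_params

# Back-derive the weight count from the Q8_0 row (354.79 GiB at 8.50 BPW);
# this is an assumption for illustration, not an official figure.
N = 354.79 * 2**30 * 8 / 8.50

print(round(bpw(209.77, N), 2))  # Q4_K_M row -> 5.03
```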