Update README.md (#7)
opened by JunyanYang

README.md (CHANGED)
````diff
@@ -16,8 +16,9 @@ base_model:
 - **Operating System(s):** Linux
 - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
-- **
-- **
+- **Quantized layers:** Experts, Shared_experts
+- **Weight quantization:** OCP MXFP4, Static
+- **Activation quantization:** OCP MXFP4, Dynamic
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
 
 This model was built with Kimi-K2-Thinking model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
@@ -29,7 +30,7 @@ The model was quantized from [unsloth/Kimi-K2-Thinking-BF16](https://huggingface
 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
-exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj
+exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
 
 python quantize_quark.py \
     --model_dir unsloth/Kimi-K2-Thinking-BF16 \
@@ -61,13 +62,13 @@ The model was evaluated on GSM8K benchmarks.
 </td>
 </tr>
 <tr>
-<td>GSM8K (
+<td>GSM8K (flexible-extract)
 </td>
 <td>94.16
 </td>
-<td>93.
+<td>93.03
 </td>
-<td>
+<td>98.80%
 </td>
 </tr>
 </table>
````
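For readers unfamiliar with the format named in the new bullets: OCP MXFP4 groups values into blocks of 32 that share one power-of-two (E8M0) scale, with each element stored as a 4-bit FP4 (E2M1) value whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. A toy sketch of block quantization under those spec assumptions (illustrative only, not AMD-Quark's implementation):

```python
import math

# FP4 E2M1 representable magnitudes (per the OCP Microscaling spec)
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Quantize one block of 32 values to MXFP4: a shared
    power-of-two (E8M0) scale plus FP4 E2M1 elements."""
    assert len(block) == 32
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * 32
    # Shared scale: power of two chosen so the largest element lands
    # near the top of the E2M1 range (max exponent of E2M1 is 2).
    shared_exp = math.floor(math.log2(amax)) - 2
    scale = 2.0 ** shared_exp
    # Round each scaled element to the nearest FP4 magnitude,
    # keeping the original sign; out-of-range values clip to 6.
    q = [math.copysign(min(FP4_GRID, key=lambda g: abs(g - abs(v) / scale)), v)
         for v in block]
    return scale, q

scale, q = quantize_mxfp4_block([1.0] * 32)
print(scale, q[0])  # 0.25 4.0  (dequantized: 0.25 * 4.0 == 1.0)
```

In the actual model card, weights use this encoding statically (scales fixed at quantization time) while activations are quantized dynamically at runtime.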
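The `exclude_layers` patterns in the quantization script read as shell-style globs over module names; whatever they match stays unquantized, which lines up with the new "Quantized layers: Experts, Shared_experts" bullet (attention, the MoE router gate, the dense MLP projections and `lm_head` are all excluded). A sketch of that matching logic with made-up layer names (the names below are illustrative, not real Kimi-K2 module paths, and Quark's own matcher may differ in detail):

```python
from fnmatch import fnmatchcase

# Illustrative layer names (not taken from the actual checkpoint)
layers = [
    "model.layers.0.self_attn.q_proj",         # attention
    "model.layers.1.mlp.gate",                 # MoE router gate
    "model.layers.1.mlp.experts.3.gate_proj",  # expert weight
    "lm_head",                                 # output head
]
patterns = ("*self_attn* *mlp.gate *lm_head "
            "*mlp.gate_proj *mlp.up_proj *mlp.down_proj").split()

excluded = [name for name in layers
            if any(fnmatchcase(name, p) for p in patterns)]
quantized = [name for name in layers if name not in excluded]
print(quantized)  # ['model.layers.1.mlp.experts.3.gate_proj']
```

Note that `*mlp.gate_proj` anchors at the end of the name, so it excludes a dense `mlp.gate_proj` but not an expert's `mlp.experts.3.gate_proj`; only the expert weights end up quantized in this toy example.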
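The 98.80% cell added to the evaluation table is simply the quantized GSM8K score expressed as a fraction of the BF16 baseline, which is easy to check:

```python
bf16_score = 94.16    # GSM8K (flexible-extract), BF16 baseline
mxfp4_score = 93.03   # GSM8K (flexible-extract), MXFP4 model
recovery = mxfp4_score / bf16_score * 100
print(f"{recovery:.2f}%")  # 98.80%
```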