Update quantized scripts.md

#4
by JunyanYang - opened
Files changed (1)
  1. README.md +35 -13
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 base_model:
-- Qwen/Qwen3.5-397B-A17B
+- Qwen/Qwen3.5-397B-A17B-FP8
 language:
 - en
 library_name: transformers
@@ -14,12 +14,12 @@ license_link: https://huggingface.co/Qwen/Qwen3.5-397B-A17B/blob/main/LICENSE
 - **Input:** Text
 - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI300 MI350/MI355
-- **ROCm**: 7.0
-- **PyTorch**: 2.8.0
-- **Transformers**: 5.2.0
+- **ROCm**: 7.0.0
+- **PyTorch**: 2.9.1
+- **Transformers**: 5.3.0
 - **Operating System(s):** Linux
 - **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
-- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.11)
+- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.11.1)
 - **Weight quantization:** OCP MXFP4, Static
 - **Activation quantization:** OCP MXFP4, Dynamic
 
@@ -31,13 +31,35 @@ The model was quantized from [Qwen/Qwen3.5-397B-A17B-FP8](https://huggingface.co
 
 **Quantization scripts:**
 ```
-cd Quark/examples/torch/language_modeling/llm_ptq/
-export exclude_layers="lm_head model.visual.* mtp.* *mlp.gate *shared_expert_gate* *.linear_attn.* *.self_attn.* *.shared_expert.*"
-python3 quantize_quark.py --model_dir Qwen/Qwen3.5-397B-A17B-FP8 \
-    --quant_scheme mxfp4 \
-    --file2file_quantization \
-    --exclude_layers $exclude_layers \
-    --output_dir amd/Qwen3.5-397B-A17B-MXFP4
+import os
+from quark.torch import LLMTemplate, ModelQuantizer
+from quark.common.profiler import GlobalProfiler
+
+# Register the qwen3_5_moe template
+qwen3_5_moe_template = LLMTemplate(
+    model_type="qwen3_5_moe",
+    kv_layers_name=["*k_proj", "*v_proj"],
+    q_layer_name="*q_proj"
+)
+LLMTemplate.register_template(qwen3_5_moe_template)
+
+# Configuration
+ckpt_path = "Qwen/Qwen3.5-397B-A17B-FP8"
+output_dir = "amd/Qwen3.5-397B-A17B-MXFP4"
+quant_scheme = "mxfp4"
+exclude_layers = ["lm_head", "model.visual.*", "mtp.*", "*mlp.gate", "*shared_expert_gate*", "*.linear_attn.*", "*.self_attn.*", "*.shared_expert.*"]
+
+# Get the quantization config from the template
+template = LLMTemplate.get("qwen3_5_moe")
+quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)
+
+# Quantize in file-to-file mode
+profiler = GlobalProfiler(output_path=os.path.join(output_dir, "quark_profile.yaml"))
+quantizer = ModelQuantizer(quant_config)
+quantizer.direct_quantize_checkpoint(
+    pretrained_model_path=ckpt_path,
+    save_path=output_dir,
+)
 ```
 For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.
 
@@ -72,7 +94,7 @@ The model was evaluated on gsm8k benchmarks using the [vllm](https://github.com/
 
 ### Reproduction
 
-The GSM8K results were obtained using the vLLM framework, based on the Docker image `rocm/vllm-dev:nightly_main_20260211`, with vLLM installed inside the container and fixes applied for model support.
+The GSM8K results were obtained using the vLLM framework, based on the Docker image `rocm/vllm-dev:nightly_main_20260211`, with vLLM installed inside the container.
 
 #### Evaluating model in a new terminal
 ```
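
For context on the "OCP MXFP4, Static/Dynamic" quantization named in the model-card hunk above: MXFP4 stores values in blocks of 32, each block sharing one power-of-two scale while every element is a 4-bit E2M1 float. The sketch below is a simplified, illustrative Python model of that round trip (block size, the E2M1 value grid, and scale selection follow the OCP Microscaling spec; the function name and rounding details are this sketch's own, not AMD-Quark's implementation).

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the OCP MXFP4 element
# type): 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Quantize one block of 32 floats to an MXFP4-like representation:
    a shared power-of-two scale plus one 4-bit E2M1 element per value.
    Returns (scale, dequantized_values)."""
    assert len(block) == 32, "MXFP4 uses blocks of 32 elements"
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * 32
    # Smallest power-of-two scale that maps the block's largest
    # magnitude into the E2M1 range [0, 6].
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    deq = []
    for v in block:
        # Snap each scaled magnitude to the nearest representable value.
        mag = min(FP4_MAGNITUDES, key=lambda g: abs(abs(v) / scale - g))
        deq.append(math.copysign(mag * scale, v))
    return scale, deq

# A block whose values land exactly on the scaled grid survives the round trip:
scale, deq = quantize_mxfp4_block([3.0] * 32)
print(scale, deq[0])  # 0.5 3.0
```

This is why the "Weight quantization: Static" / "Activation quantization: Dynamic" distinction matters: for weights the per-block scales are computed once offline, while for activations they are recomputed on the fly at inference time.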