0xSero committed
Commit 6b78b3b · verified · Parent: d58f81e

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +135 -0
  3. chat_template.jinja +117 -0
  4. config.json +173 -0
  5. generation_config.json +13 -0
  6. model-00001-of-00056.safetensors +3 -0
  7. model-00002-of-00056.safetensors +3 -0
  8. model-00003-of-00056.safetensors +3 -0
  9. model-00004-of-00056.safetensors +3 -0
  10. model-00005-of-00056.safetensors +3 -0
  11. model-00006-of-00056.safetensors +3 -0
  12. model-00007-of-00056.safetensors +3 -0
  13. model-00008-of-00056.safetensors +3 -0
  14. model-00010-of-00056.safetensors +3 -0
  15. model-00012-of-00056.safetensors +3 -0
  16. model-00013-of-00056.safetensors +3 -0
  17. model-00014-of-00056.safetensors +3 -0
  18. model-00015-of-00056.safetensors +3 -0
  19. model-00016-of-00056.safetensors +3 -0
  20. model-00017-of-00056.safetensors +3 -0
  21. model-00018-of-00056.safetensors +3 -0
  22. model-00019-of-00056.safetensors +3 -0
  23. model-00020-of-00056.safetensors +3 -0
  24. model-00021-of-00056.safetensors +3 -0
  25. model-00022-of-00056.safetensors +3 -0
  26. model-00023-of-00056.safetensors +3 -0
  27. model-00024-of-00056.safetensors +3 -0
  28. model-00025-of-00056.safetensors +3 -0
  29. model-00026-of-00056.safetensors +3 -0
  30. model-00027-of-00056.safetensors +3 -0
  31. model-00028-of-00056.safetensors +3 -0
  32. model-00029-of-00056.safetensors +3 -0
  33. model-00031-of-00056.safetensors +3 -0
  34. model-00035-of-00056.safetensors +3 -0
  35. model-00037-of-00056.safetensors +3 -0
  36. model-00039-of-00056.safetensors +3 -0
  37. model-00041-of-00056.safetensors +3 -0
  38. model-00042-of-00056.safetensors +3 -0
  39. model-00043-of-00056.safetensors +3 -0
  40. model-00044-of-00056.safetensors +3 -0
  41. model-00048-of-00056.safetensors +3 -0
  42. model-00051-of-00056.safetensors +3 -0
  43. model-00052-of-00056.safetensors +3 -0
  44. model-00053-of-00056.safetensors +3 -0
  45. model-00054-of-00056.safetensors +3 -0
  46. model-00055-of-00056.safetensors +3 -0
  47. model-00056-of-00056.safetensors +3 -0
  48. model.safetensors.index.json +3 -0
  49. quantization_config.json +33 -0
  50. tokenizer.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,135 @@
+ ---
+ license: other
+ license_name: glm-5
+ license_link: https://huggingface.co/zai-org/GLM-5.1/blob/main/LICENSE
+ base_model: 0xSero/GLM-5.1-555B-A14B-REAP
+ tags:
+ - reap
+ - pruning
+ - moe
+ - expert-pruning
+ - glm
+ - gptq
+ - w4a16
+ - autoround
+ - vllm
+ library_name: transformers
+ pipeline_tag: text-generation
+ quantization_config:
+   quant_method: gptq
+   bits: 4
+   group_size: 128
+   sym: true
+   desc_act: false
+   checkpoint_format: gptq
+ ---
+
+ # GLM-5.1 — 25% Expert Pruned (REAP) — W4A16
+
+ This is a **GPTQ 4-bit weight-quantized** variant of the 25% expert-pruned [`zai-org/GLM-5.1`](https://huggingface.co/zai-org/GLM-5.1), pruned with [REAP](https://github.com/CerebrasResearch/reap) (Router-weighted Expert Activation Pruning) and quantized with [AutoRound](https://github.com/intel/auto-round) learned rounding.
+
+ | Property | Value |
+ |----------|-------|
+ | Base model | `zai-org/GLM-5.1` (744B MoE, 256 experts/layer) |
+ | Architecture | `GlmMoeDsaForCausalLM` (MoE + Dynamic Sparse Attention) |
+ | Routed experts | 256 → 192 (25% removed, 64 per layer) |
+ | Active params/token | ~14B (top-8 routing preserved) |
+ | Quantization | GPTQ W4A16 (int4 symmetric, group_size=128) |
+ | Quantizer | auto-round 0.12.2 (200 iterations, SignSGD) |
+ | Quantized size | **277 GB** (56 safetensors shards) |
+ | BF16 source | [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) |
+ | GGUF variant | [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) (325 GB, Q4_K_M) |
+
+ ## Benchmark Results (GGUF Q4_K_M, inference mode, temp=0.8)
+
+ The GPTQ W4A16 uses the same learned rounding method (AutoRound) as the GGUF Q4_K_M. Benchmark scores from the GGUF variant (zero repetition loops):
+
+ | Suite | Metric | Result | Repetition Loops |
+ |-------|--------|--------|-----------------|
+ | Terminal-Bench (50) | Proxy Pass | 44/50 (88%) | 0/50 |
+ | SWE-bench Pro (50) | Proxy Pass | 33/50 (66%) | 0/50 |
+ | GSM8K (50) | Correct | 30/50 (60%) | 0/50 |
+ | HLE (50) | Correct | 9/50 (18%) | 0/50 |
+
+ **Zero repetition loops across 220 benchmark probes.** The 25% prune retains 192/256 experts, providing enough expert diversity for stable generation at all sequence lengths.
+
+ ## How to Use
+
+ ### vLLM
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(
+     model="0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16",
+     tensor_parallel_size=4,  # 4× B200 or 8× A100
+     max_model_len=8192,
+     trust_remote_code=True,
+ )
+
+ params = SamplingParams(temperature=0.8, max_tokens=4096)
+ outputs = llm.generate(["Hello, world!"], params)
+ ```
+
+ ### SGLang
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path 0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16 \
+     --tp 4 \
+     --trust-remote-code
+ ```
+
+ ### Requirements
+
+ - ~70-80 GiB VRAM per GPU across 4 GPUs (B200), or ~280 GiB total
+ - CUDA 12.8+ (sm_100a / Blackwell)
+ - vLLM >= 0.19.0 with `deep_gemm` installed (for DSA sparse attention)
+ - `trust_remote_code=True`
+
+ ## Quantization Details
+
+ **Method:** AutoRound W4A16 — learned rounding via SignSGD (200 iterations per layer), calibrated on 128 samples from NeelNanda/pile-10k at 2048 sequence length.
+
+ **Protected (kept at full precision):**
+ - Dense MLP layers 0-2 (`gate_proj`, `up_proj`, `down_proj`)
+ - DSA indexer (`weights_proj`)
+ - `lm_head`
+
+ **Quantized to int4 (43,971/44,059 linear layers):**
+ - All attention projections (`q_a_proj`, `q_b_proj`, `kv_a_proj`, `kv_b_proj`, `o_proj`)
+ - All routed MoE expert projections (192 experts × gate/up/down × 75 MoE layers)
+ - Shared expert projections
+
+ **GPTQ config:** `bits=4, group_size=128, sym=true, desc_act=false`
+
+ ## Why GPTQ over GGUF Q4_K_M?
+
+ | | GPTQ W4A16 (this) | GGUF Q4_K_M |
+ |---|---|---|
+ | Size | 277 GB | 325 GB |
+ | Serving | vLLM, SGLang, TGI (GPU) | llama.cpp (CPU/GPU hybrid) |
+ | Quant method | Learned rounding (SignSGD) | K-means clustering |
+ | Throughput | Higher (GPU-native kernels) | Lower |
+ | Best for | Production GPU serving | Local inference, edge |
+
+ GPTQ packs 4-bit weights more efficiently with `group_size=128` symmetric quantization, resulting in a smaller checkpoint than GGUF Q4_K_M at the same bit-width.
+
+ ## Related Models
+
+ | Model | Prune % | Experts | Format | Size | Status |
+ |-------|---------|---------|--------|------|--------|
+ | [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) | 25% | 192/256 | BF16 | 1.1T | Source checkpoint |
+ | [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) | 25% | 192/256 | GGUF Q4_K_M | 325G | llama.cpp serving |
+ | **This model** | **25%** | **192/256** | **GPTQ W4A16** | **277G** | **vLLM/SGLang serving** |
+ | [`0xSero/GLM-5.1-444B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-444B-A14B-REAP) | 40% | 154/256 | BF16 | 910G | Has repetition issues — use 25% |
+
+ ## Support This Work
+
+ If you find these models useful, please consider supporting continued open-source model compression research:
+
+ **[donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)**
+
+ ## Citation
+
+ If you use this model, please cite the [REAP paper](https://github.com/CerebrasResearch/reap) and [AutoRound](https://github.com/intel/auto-round).
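The W4A16 scheme described in the README (int4 symmetric weights, one scale per group of 128, 16-bit activations) can be illustrated with a toy reference in plain Python. This is a sketch of the numeric format only, not the AutoRound/GPTQ algorithm or its kernels; function names here are illustrative.

```python
# Toy sketch of symmetric, group-wise int4 quantization (W4A16-style).
# Weights are split into groups of 128; each group gets one float scale,
# and values are rounded to integers clamped to the int4 range [-8, 7].

GROUP_SIZE = 128
QMAX = 7  # symmetric scheme: scale chosen so the max magnitude maps to +/-7

def quantize_group(group):
    """Return (scale, int4 codes) for one group of float weights."""
    scale = max(abs(w) for w in group) / QMAX or 1.0  # avoid scale == 0
    codes = [max(-8, min(7, round(w / scale))) for w in group]
    return scale, codes

def dequantize_group(scale, codes):
    """Reconstruct approximate float weights from one quantized group."""
    return [c * scale for c in codes]

def quantize(weights, group_size=GROUP_SIZE):
    """Quantize a flat weight list group by group."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]
```

With `group_size=128`, each weight costs 4 bits plus a small amortized overhead for the per-group scale, which is why the checkpoint lands near one quarter of the BF16 size.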
chat_template.jinja ADDED
@@ -0,0 +1,117 @@
+ [gMASK]<sop>
+ {%- if tools -%}
+ {%- macro tool_to_json(tool) -%}
+ {%- set ns_tool = namespace(first=true) -%}
+ {{ '{' -}}
+ {%- for k, v in tool.items() -%}
+ {%- if k != 'defer_loading' and k != 'strict' -%}
+ {%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
+ {%- set ns_tool.first = false -%}
+ "{{ k }}": {{ v | tojson(ensure_ascii=False) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- '}' -}}
+ {%- endmacro -%}
+ <|system|>
+ # Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {% for tool in tools %}
+ {%- if 'function' in tool -%}
+ {%- set tool = tool['function'] -%}
+ {%- endif -%}
+ {% if tool.defer_loading is not defined or not tool.defer_loading %}
+ {{ tool_to_json(tool) }}
+ {% endif %}
+ {% endfor %}
+ </tools>
+
+ For each function call, output the function name and arguments within the following XML format:
+ <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+ {%- macro visible_text(content) -%}
+ {%- if content is string -%}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping -%}
+ {%- for item in content -%}
+ {%- if item is mapping and item.type == 'text' -%}
+ {{- item.text }}
+ {%- elif item is string -%}
+ {{- item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- else -%}
+ {{- content }}
+ {%- endif -%}
+ {%- endmacro -%}
+ {%- set ns = namespace(last_user_index=-1, thinking_indices='') -%}
+ {%- for m in messages %}
+ {%- if m.role == 'user' %}
+ {%- set ns.last_user_index = loop.index0 -%}
+ {%- elif m.role == 'assistant' %}
+ {%- if m.reasoning_content is string %}
+ {%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- set ns.has_thinking = false -%}
+ {%- for m in messages -%}
+ {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in ns.thinking_indices -%}
+ {%- elif m.role == 'assistant' -%}
+ <|assistant|>
+ {%- set content = visible_text(m.content) %}
+ {%- if m.reasoning_content is string %}
+ {%- set reasoning_content = m.reasoning_content %}
+ {%- elif '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
+ {%- set content = content.split('</think>')[-1] %}
+ {%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %}
+ {%- set reasoning_content = '' %}
+ {%- elif loop.index0 < ns.last_user_index and ns.has_thinking %}
+ {%- set reasoning_content = '' %}
+ {%- endif %}
+ {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
+ {{ '<think>' + reasoning_content + '</think>'}}
+ {%- else -%}
+ {{ '</think>' }}
+ {%- endif -%}
+ {%- if content.strip() -%}
+ {{ content.strip() }}
+ {%- endif -%}
+ {% if m.tool_calls %}
+ {% for tc in m.tool_calls %}
+ {%- if tc.function %}
+ {%- set tc = tc.function %}
+ {%- endif %}
+ {{- '<tool_call>' + tc.name -}}
+ {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+ {% endif %}
+ {%- elif m.role == 'tool' -%}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' -}}
+ {%- endif %}
+ {%- if m.content is string -%}
+ {{- '<tool_response>' + m.content + '</tool_response>' -}}
+ {%- else -%}
+ {{- '<tool_response><tools>\n' -}}
+ {% for tr in m.content %}
+ {%- for tool in tools -%}
+ {%- if 'function' in tool -%}
+ {%- set tool = tool['function'] -%}
+ {%- endif -%}
+ {%- if tool.name == tr.name -%}
+ {{- tool_to_json(tool) + '\n' -}}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endfor -%}
+ {{- '</tools></tool_response>' -}}
+ {% endif -%}
+ {%- elif m.role == 'system' -%}
+ <|system|>{{ visible_text(m.content) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+ {%- endif -%}
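The template above serializes tool calls as `<tool_call>{name}<arg_key>k</arg_key><arg_value>v</arg_value>...</tool_call>`. An illustrative parser for that output format is sketched below; it is not part of the release, and the function names are made up for the example.

```python
# Sketch: extract (name, arguments) pairs from text generated under the
# chat template's <tool_call>...</tool_call> convention.
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(r"<arg_key>(.*?)</arg_key><arg_value>(.*?)</arg_value>", re.DOTALL)

def parse_tool_calls(text):
    """Return a list of (function_name, {arg: value}) tuples."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        name = body.split("<arg_key>")[0]  # name precedes the first arg_key
        args = dict(ARG_RE.findall(body))
        calls.append((name, args))
    return calls
```

Note that `<arg_value>` payloads are JSON for non-string arguments, so a real client would additionally attempt `json.loads` on each value.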
config.json ADDED
@@ -0,0 +1,173 @@
+ {
+   "architectures": [
+     "GlmMoeDsaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dtype": "float16",
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "ep_size": 1,
+   "first_k_dense_replace": 3,
+   "hidden_act": "silu",
+   "hidden_size": 6144,
+   "index_head_dim": 128,
+   "index_n_heads": 32,
+   "index_topk": 2048,
+   "indexer_rope_interleave": true,
+   "initializer_range": 0.02,
+   "intermediate_size": 12288,
+   "kv_lora_rank": 512,
+   "max_position_embeddings": 202752,
+   "mlp_layer_types": [
+     "dense",
+     "dense",
+     "dense",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse"
+   ],
+   "model_type": "glm_moe_dsa",
+   "moe_intermediate_size": 2048,
+   "moe_layer_freq": 1,
+   "n_group": 1,
+   "n_routed_experts": 192,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 64,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 78,
+   "num_key_value_heads": 64,
+   "num_nextn_predict_layers": 1,
+   "pad_token_id": 154820,
+   "pretraining_tp": 1,
+   "q_lora_rank": 2048,
+   "qk_head_dim": 256,
+   "qk_nope_head_dim": 192,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "autoround_version": "0.12.2",
+     "bits": 4,
+     "damp_percent": 0.01,
+     "data_type": "int",
+     "desc_act": false,
+     "dynamic": {
+       "-:.*layers\\.0\\.mlp.*": {},
+       "-:.*layers\\.1\\.mlp.*": {},
+       "-:.*layers\\.2\\.mlp.*": {},
+       "-:.*weights_proj.*": {}
+     },
+     "group_size": 128,
+     "iters": 10,
+     "lm_head": false,
+     "low_gpu_mem_usage": true,
+     "modules_in_block_to_quantize": [
+       [
+         "self_attn.q_a_proj",
+         "self_attn.q_b_proj",
+         "self_attn.kv_a_proj_with_mqa",
+         "self_attn.kv_b_proj",
+         "self_attn.o_proj",
+         "self_attn.indexer.wq_b",
+         "self_attn.indexer.wk"
+       ]
+     ],
+     "nsamples": 64,
+     "provider": "auto-round",
+     "quant_method": "gptq",
+     "sym": true,
+     "true_sequential": false
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_interleave": true,
+   "rope_parameters": {
+     "rope_theta": 1000000,
+     "rope_type": "default"
+   },
+   "routed_scaling_factor": 2.5,
+   "scoring_func": "sigmoid",
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "transformers_version": "5.4.0",
+   "use_cache": true,
+   "v_head_dim": 256,
+   "vocab_size": 154880,
+   "torch_dtype": "float16"
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_from_model_config": true,
+   "do_sample": true,
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "pad_token_id": 154820,
+   "temperature": 1.0,
+   "top_p": 0.95,
+   "transformers_version": "5.4.0"
+ }
model-00001-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc0b41b9d8b5a4fc47f512025a96cc43cc1c4eaecff9885c38f5157611a1b39a
+ size 5368863504
model-00002-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5407cd33cf64335dc8ddc9991d95f3adcece4c47a01548d2415afdc50cc1d4d
+ size 5366777928
model-00003-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93fd9e305a8b3348d43327fe15d38f656f89730074f864c6e234dec80ccddf1e
+ size 5365123192
model-00004-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec3daceebbb4ac68dc4ed9ec32a08c53bf21ea3177b10ad6d0dc0ff1fc12211a
+ size 5366762120
model-00005-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bcf24af0cbb662b66e2451720654b5a487cc0a01e94927f41e3264eca30fc94
+ size 5365122736
model-00006-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbb95f06dc2d555d3331c67016f5f1916c8a6652869100bb64350483bb6b3c29
+ size 5365125536
model-00007-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c52d79f6973f24772b3f2a045000baead3240fb4d5cf216489ed4b78c455908b
+ size 5366781408
model-00008-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03c5ad78f2414e3ec4cdc8f2b6bf1a1e915c6f40517ff42b49b5848016a1732f
+ size 5365126200
model-00010-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42d04e45759c1e73f4d2c00571de9fe8e3c33d4edfafed9eadc02a2f8331f4b8
+ size 5364420784
model-00012-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab84b48e87a4a03bccd3e3289680630faca12ce27b5edbe22687d71f612389b3
+ size 5366765272
model-00013-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40332bf3e76a22a4a3414d51599369d0e546c07c41753aee2b8d13a7efb736af
+ size 5365125992
model-00014-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:777221fa9409dc86784ffe3d9a82773d6f6208d4795e084e4df9f58b00fdc884
+ size 5365126856
model-00015-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd3832f059f1847182ead84a5411408e525d5fc030934afea24ce3cc29c841bb
+ size 5366781328
model-00016-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:062120e76615dbecd4fe97263a3b5acff20953227c66f92e56d3060888820a64
+ size 5365126288
model-00017-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f67d6ef4c7086302378e589cf2d1fe43773300fa3dc99bb4882cb2c90790fcca
+ size 5366781992
model-00018-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7aa4eeccd31a40954103438c3e6ff7454285c497d8ed4601ffca8315a143e6fb
+ size 5365125896
model-00019-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:243c6b1c0afd00402c2cd2c878d1c7c8b4ff3d4473cce5d219041a0080ab7b54
+ size 5365126616
model-00020-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e6f7b7e38bbd25a554fa8a000ddc3f9104dac23dc87447fb1cac3dce4005387
+ size 5366765184
model-00021-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:313a283104c1dd31f88a6b4a584e261c7a88cc9a8d9a69257d881be409ce70a5
+ size 5365126040
model-00022-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c48a2163264bdbc3f6715a4e3cd2b7a2830f87335d24192c228b5ddc4fd790d1
+ size 5365126896
model-00023-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bdd55c524373b979ad9a0c9ae5cbed491b72fd5dc9e825198c7d4766e4548d37
+ size 5366781240
model-00024-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e133c898cb4feb048f2886c84e61b1422538841fb50b0d08b232b6d5fe92a9d
+ size 5365126368
model-00025-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad68a187b3806a1139e45906a196c55d20cbbafb3102893d2322167fda707879
+ size 5366781824
model-00026-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be0acf530078c33fea84d4848d831f6e05ebeef7e3e04c025e53e8a8aa6bb53a
+ size 5365125976
model-00027-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b86d4a1b21100a11ebb6d11cbdeb4bef4ec91b3d50222f4e8e2a804f4db74658
+ size 5365126696
model-00028-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c890b8ff80e5a4d5c772a274329820d4e25cc97f9088227fdc00463f1223b678
+ size 5366765104
model-00029-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be08fab49f5a2550909863f3445a67cf50f688b03d694b15e1bad57bfcb861e2
+ size 5365126128
model-00031-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e08b0cb93ad9f5ab2a00ff6355845bf90b923f26362d8f6d505211165a134d4e
+ size 5366781160
model-00035-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92365dc498f68b23466dac5f120a7fe8eb3635a35970118b2bfd9c2b6a78a688
+ size 5365126784
model-00037-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0969ecfb0719093db27839ba8fe827ea4d143069f136b6a3c45e2110e14c1fa
+ size 5365126208
model-00039-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49be9918b3fb8eb9eab2c065fcd2fed185b35819452035c7abfa310e94c453c1
+ size 5364395952
model-00041-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c160cfbe16446404c42523e908f06a793d9e2e2378d049f25e317ab62c453a0
+ size 5366781640
model-00042-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46d222643b5178e0a85d3ba0fa2880f1c1ac4154da1a6b23822fb274cddc1232
+ size 5365125992
model-00043-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc857c800d14d934e9e5b6abe97c23d2386d10f0241e9ab00acde97ba2cff891
+ size 5365126864
model-00044-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60910b36b643a2065e239a97425fa7ddd193cad9f4ee63e6f1eb6c849346dbaa
+ size 5366764936
model-00048-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f455493e6bbad6f85482898d29b55501fb106628242d19994edb7d56f6f1884c
+ size 5365126624
model-00051-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:449d589713e044c3917a0b02dd946746e2c76882c1aa3e68523dd07a2e8a9ac9
+ size 5365126896
model-00052-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bef112fcbc0ab6ef31571f1b91b9709747699d4baa4df86df3d5e42bd61b0017
+ size 5366764848
model-00053-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7080abe8695b5a6c4d608c97e52b890efeffb73debc3c2cd9989c152f0c0c375
+ size 5365126376
model-00054-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3a56fc554628b11932876c06cf8b52dd02fc743a861a51719134c1cc0940087
+ size 5366781816
model-00055-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f687ee3b81b456f6972d398bbc89384afe4c1527ccac1fde2ab8caf2a9d1dc9
+ size 3615185888
model-00056-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:165264873338892852fbd6fe42cee3dab9ea21a36df5207f734f821eb7886005
+ size 3806343464
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2dec232545a01b3f146bb9f475afc5b1a5d08f1a5829e78ca7c7f8399ed222d
+ size 16004846
quantization_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "bits": 4,
+   "data_type": "int",
+   "group_size": 128,
+   "sym": true,
+   "iters": 10,
+   "low_gpu_mem_usage": true,
+   "nsamples": 64,
+   "autoround_version": "0.12.2",
+   "dynamic": {
+     "-:.*layers\\.0\\.mlp.*": {},
+     "-:.*layers\\.1\\.mlp.*": {},
+     "-:.*layers\\.2\\.mlp.*": {},
+     "-:.*weights_proj.*": {}
+   },
+   "lm_head": false,
+   "provider": "auto-round",
+   "quant_method": "gptq",
+   "desc_act": false,
+   "true_sequential": false,
+   "damp_percent": 0.01,
+   "modules_in_block_to_quantize": [
+     [
+       "self_attn.q_a_proj",
+       "self_attn.q_b_proj",
+       "self_attn.kv_a_proj_with_mqa",
+       "self_attn.kv_b_proj",
+       "self_attn.o_proj",
+       "self_attn.indexer.wq_b",
+       "self_attn.indexer.wk"
+     ]
+   ]
+ }
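The `dynamic` section in this config uses `-:`-prefixed regular expressions to name modules that are skipped during quantization (here the first three dense MLP blocks and the DSA indexer's `weights_proj`), matching the "Protected" list in the README. The sketch below applies that convention, assuming `-:` means "exclude any module whose full name matches the rest of the pattern"; the helper name is illustrative.

```python
# Sketch: evaluate the "-:"-prefixed exclusion patterns from the config's
# "dynamic" section against dotted module names.
import re

DYNAMIC = {
    "-:.*layers\\.0\\.mlp.*": {},
    "-:.*layers\\.1\\.mlp.*": {},
    "-:.*layers\\.2\\.mlp.*": {},
    "-:.*weights_proj.*": {},
}

def is_quantized(module_name, dynamic=DYNAMIC):
    """Return False for modules matched by a '-:'-prefixed pattern."""
    for pattern in dynamic:
        if pattern.startswith("-:") and re.fullmatch(pattern[2:], module_name):
            return False
    return True
```

Because the patterns anchor on `layers.0.`, `layers.1.`, and `layers.2.` with literal dots, layer 10 or layer 20 MLPs are not accidentally excluded.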
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
+ size 20217442