Add files using upload-large-folder tool
This view is limited to 50 files because it contains too many changes. See the raw diff for the full file list.
- .gitattributes +2 -0
- README.md +135 -0
- chat_template.jinja +117 -0
- config.json +173 -0
- generation_config.json +13 -0
- model-00001-of-00056.safetensors +3 -0
- model-00002-of-00056.safetensors +3 -0
- model-00003-of-00056.safetensors +3 -0
- model-00004-of-00056.safetensors +3 -0
- model-00005-of-00056.safetensors +3 -0
- model-00006-of-00056.safetensors +3 -0
- model-00007-of-00056.safetensors +3 -0
- model-00008-of-00056.safetensors +3 -0
- model-00010-of-00056.safetensors +3 -0
- model-00012-of-00056.safetensors +3 -0
- model-00013-of-00056.safetensors +3 -0
- model-00014-of-00056.safetensors +3 -0
- model-00015-of-00056.safetensors +3 -0
- model-00016-of-00056.safetensors +3 -0
- model-00017-of-00056.safetensors +3 -0
- model-00018-of-00056.safetensors +3 -0
- model-00019-of-00056.safetensors +3 -0
- model-00020-of-00056.safetensors +3 -0
- model-00021-of-00056.safetensors +3 -0
- model-00022-of-00056.safetensors +3 -0
- model-00023-of-00056.safetensors +3 -0
- model-00024-of-00056.safetensors +3 -0
- model-00025-of-00056.safetensors +3 -0
- model-00026-of-00056.safetensors +3 -0
- model-00027-of-00056.safetensors +3 -0
- model-00028-of-00056.safetensors +3 -0
- model-00029-of-00056.safetensors +3 -0
- model-00031-of-00056.safetensors +3 -0
- model-00035-of-00056.safetensors +3 -0
- model-00037-of-00056.safetensors +3 -0
- model-00039-of-00056.safetensors +3 -0
- model-00041-of-00056.safetensors +3 -0
- model-00042-of-00056.safetensors +3 -0
- model-00043-of-00056.safetensors +3 -0
- model-00044-of-00056.safetensors +3 -0
- model-00048-of-00056.safetensors +3 -0
- model-00051-of-00056.safetensors +3 -0
- model-00052-of-00056.safetensors +3 -0
- model-00053-of-00056.safetensors +3 -0
- model-00054-of-00056.safetensors +3 -0
- model-00055-of-00056.safetensors +3 -0
- model-00056-of-00056.safetensors +3 -0
- model.safetensors.index.json +3 -0
- quantization_config.json +33 -0
- tokenizer.json +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,135 @@
---
license: other
license_name: glm-5
license_link: https://huggingface.co/zai-org/GLM-5.1/blob/main/LICENSE
base_model: 0xSero/GLM-5.1-555B-A14B-REAP
tags:
- reap
- pruning
- moe
- expert-pruning
- glm
- gptq
- w4a16
- autoround
- vllm
library_name: transformers
pipeline_tag: text-generation
quantization_config:
  quant_method: gptq
  bits: 4
  group_size: 128
  sym: true
  desc_act: false
  checkpoint_format: gptq
---

# GLM-5.1 — 25% Expert Pruned (REAP) — W4A16

This is a **GPTQ 4-bit weight-quantized** variant of the 25% expert-pruned [`zai-org/GLM-5.1`](https://huggingface.co/zai-org/GLM-5.1). Experts were pruned with [REAP](https://github.com/CerebrasResearch/reap) (Relative Expert Activation Pruning), and the weights were quantized with [AutoRound](https://github.com/intel/auto-round), which applies learned rounding optimization.

| Property | Value |
|----------|-------|
| Base model | `zai-org/GLM-5.1` (744B MoE, 256 experts/layer) |
| Architecture | `GlmMoeDsaForCausalLM` (MoE + Dynamic Sparse Attention) |
| Routed experts | 256 → 192 (25% removed, 64 per layer) |
| Active params/token | ~14B (top-8 routing preserved) |
| Quantization | GPTQ W4A16 (int4 symmetric, group_size=128) |
| Quantizer | auto-round 0.12.2 (200 iterations, SignSGD) |
| Quantized size | **277 GB** (56 safetensor shards) |
| BF16 source | [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) |
| GGUF variant | [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) (325 GB, Q4_K_M) |

## Benchmark Results (GGUF Q4_K_M, inference mode, temp=0.8)

This GPTQ W4A16 checkpoint uses the same learned-rounding method (AutoRound) as the GGUF Q4_K_M. The benchmark scores below are from the GGUF variant (zero repetition loops):

| Suite | Metric | Result | Repetition Loops |
|-------|--------|--------|-----------------|
| Terminal-Bench (50) | Proxy Pass | 44/50 (88%) | 0/50 |
| SWE-bench Pro (50) | Proxy Pass | 33/50 (66%) | 0/50 |
| GSM8K (50) | Correct | 30/50 (60%) | 0/50 |
| HLE (50) | Correct | 9/50 (18%) | 0/50 |

**Zero repetition loops across 220 benchmark probes.** The 25% prune retains 192/256 experts, providing enough expert diversity for stable generation at all sequence lengths.

## How to Use

### vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16",
    tensor_parallel_size=4,  # 4× B200 or 8× A100
    max_model_len=8192,
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.8, max_tokens=4096)
outputs = llm.generate(["Hello, world!"], params)
```

### SGLang

```bash
python -m sglang.launch_server \
  --model-path 0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16 \
  --tp 4 \
  --trust-remote-code
```

### Requires

- ~70-80 GiB VRAM per GPU across 4 GPUs (B200), or ~280 GiB total
- CUDA 12.8+ (sm_100a / Blackwell)
- vLLM >= 0.19.0 with `deep_gemm` installed (for DSA sparse attention)
- `trust_remote_code=True`

## Quantization Details

**Method:** AutoRound W4A16 — learned rounding via SignSGD (200 iterations per layer), calibrated on 128 samples from NeelNanda/pile-10k at 2048 sequence length.

**Protected (kept at full precision):**
- Dense MLP layers 0-2 (`gate_proj`, `up_proj`, `down_proj`)
- DSA indexer (`weights_proj`)
- `lm_head`

**Quantized to int4 (43,971/44,059 linear layers):**
- All attention projections (`q_a_proj`, `q_b_proj`, `kv_a_proj`, `kv_b_proj`, `o_proj`)
- All routed MoE expert projections (192 experts × gate/up/down × 75 MoE layers)
- Shared expert projections

**GPTQ config:** `bits=4, group_size=128, sym=true, desc_act=false`

## Why GPTQ over GGUF Q4_K_M?

| | GPTQ W4A16 (this) | GGUF Q4_K_M |
|---|---|---|
| Size | 277 GB | 325 GB |
| Serving | vLLM, SGLang, TGI (GPU) | llama.cpp (CPU/GPU hybrid) |
| Quant method | Learned rounding (SignSGD) | K-means clustering |
| Throughput | Higher (GPU-native kernels) | Lower |
| Best for | Production GPU serving | Local inference, edge |

GPTQ packs 4-bit weights more efficiently with `group_size=128` symmetric quantization, resulting in a smaller checkpoint than GGUF Q4_K_M at the same bit-width.

## Related Models

| Model | Prune % | Experts | Format | Size | Status |
|-------|---------|---------|--------|------|--------|
| [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) | 25% | 192/256 | BF16 | 1.1T | Source checkpoint |
| [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) | 25% | 192/256 | GGUF Q4_K_M | 325G | llama.cpp serving |
| **This model** | **25%** | **192/256** | **GPTQ W4A16** | **277G** | **vLLM/SGLang serving** |
| [`0xSero/GLM-5.1-444B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-444B-A14B-REAP) | 40% | 154/256 | BF16 | 910G | Has repetition issues — use 25% |

## Support This Work

If you find these models useful, please consider supporting continued open-source model compression research:

**[donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)**

## Citation

If you use this model, please cite the [REAP paper](https://github.com/CerebrasResearch/reap) and [AutoRound](https://github.com/intel/auto-round).
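As a rough sanity check on the README's 277 GB figure, here is a back-of-the-envelope size estimate for a W4A16 checkpoint. It is a sketch under simplifying assumptions (about 555B quantized weights, one fp16 scale per 128-weight group, the handful of protected fp16 layers ignored), so it lands in the right ballpark rather than on the exact number:

```python
# Rough W4A16 checkpoint size estimate (sym quantization, group_size=128).
# Assumption: ~555e9 quantized weights; protected fp16 layers are ignored.
n_params = 555e9
group_size = 128

int4_bytes = n_params * 4 / 8             # packed 4-bit weight payload
scale_bytes = n_params / group_size * 2   # one fp16 scale per 128-weight group
total_gb = (int4_bytes + scale_bytes) / 1e9

print(f"~{total_gb:.0f} GB")  # ~286 GB, same ballpark as the 277 GB checkpoint
```

The estimate overshoots slightly because not every parameter (e.g. embeddings and norms) carries per-group scales in practice, but it shows why a 4-bit checkpoint of a ~555B model must weigh in near 280 GB.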
chat_template.jinja
ADDED
@@ -0,0 +1,117 @@
[gMASK]<sop>
{%- if tools -%}
{%- macro tool_to_json(tool) -%}
{%- set ns_tool = namespace(first=true) -%}
{{ '{' -}}
{%- for k, v in tool.items() -%}
{%- if k != 'defer_loading' and k != 'strict' -%}
{%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
{%- set ns_tool.first = false -%}
"{{ k }}": {{ v | tojson(ensure_ascii=False) }}
{%- endif -%}
{%- endfor -%}
{{- '}' -}}
{%- endmacro -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{%- if 'function' in tool -%}
{%- set tool = tool['function'] -%}
{%- endif -%}
{% if tool.defer_loading is not defined or not tool.defer_loading %}
{{ tool_to_json(tool) }}
{% endif %}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1, thinking_indices='') -%}
{%- for m in messages %}
{%- if m.role == 'user' %}
{%- set ns.last_user_index = loop.index0 -%}
{%- elif m.role == 'assistant' %}
{%- if m.reasoning_content is string %}
{%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- set ns.has_thinking = false -%}
{%- for m in messages -%}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in ns.thinking_indices -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- elif '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
{%- set content = content.split('</think>')[-1] %}
{%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %}
{%- set reasoning_content = '' %}
{%- elif loop.index0 < ns.last_user_index and ns.has_thinking %}
{%- set reasoning_content = '' %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
{{ '<think>' + reasoning_content + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' -}}
{%- endif %}
{%- if m.content is string -%}
{{- '<tool_response>' + m.content + '</tool_response>' -}}
{%- else -%}
{{- '<tool_response><tools>\n' -}}
{% for tr in m.content %}
{%- for tool in tools -%}
{%- if 'function' in tool -%}
{%- set tool = tool['function'] -%}
{%- endif -%}
{%- if tool.name == tr.name -%}
{{- tool_to_json(tool) + '\n' -}}
{%- endif -%}
{%- endfor -%}
{%- endfor -%}
{{- '</tools></tool_response>' -}}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
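The template serializes assistant tool calls into the `<tool_call>{name}<arg_key>…</arg_key><arg_value>…</arg_value>…</tool_call>` wire format described in its system prompt, JSON-encoding non-string argument values. A minimal stdlib sketch of that serialization (the `format_tool_call` helper is hypothetical, written here only to illustrate the format):

```python
import json

# Hypothetical helper mirroring the template's tool-call serialization:
# string values pass through, everything else is JSON-encoded (tojson branch).
def format_tool_call(name, arguments):
    parts = [f"<tool_call>{name}"]
    for k, v in arguments.items():
        v_str = v if isinstance(v, str) else json.dumps(v, ensure_ascii=False)
        parts.append(f"<arg_key>{k}</arg_key><arg_value>{v_str}</arg_value>")
    parts.append("</tool_call>")
    return "".join(parts)

print(format_tool_call("get_weather", {"city": "Berlin", "days": 3}))
# <tool_call>get_weather<arg_key>city</arg_key><arg_value>Berlin</arg_value><arg_key>days</arg_key><arg_value>3</arg_value></tool_call>
```

This flat key/value XML (rather than a JSON object) is what downstream parsers must expect when extracting tool calls from generations.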
config.json
ADDED
@@ -0,0 +1,173 @@
{
  "architectures": [
    "GlmMoeDsaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "dtype": "float16",
  "eos_token_id": [154820, 154827, 154829],
  "ep_size": 1,
  "first_k_dense_replace": 3,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "index_head_dim": 128,
  "index_n_heads": 32,
  "index_topk": 2048,
  "indexer_rope_interleave": true,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "kv_lora_rank": 512,
  "max_position_embeddings": 202752,
  "mlp_layer_types": [
    "dense", "dense", "dense",
    "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse",
    "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse",
    "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse",
    "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse",
    "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse", "sparse"
  ],
  "model_type": "glm_moe_dsa",
  "moe_intermediate_size": 2048,
  "moe_layer_freq": 1,
  "n_group": 1,
  "n_routed_experts": 192,
  "n_shared_experts": 1,
  "norm_topk_prob": true,
  "num_attention_heads": 64,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 78,
  "num_key_value_heads": 64,
  "num_nextn_predict_layers": 1,
  "pad_token_id": 154820,
  "pretraining_tp": 1,
  "q_lora_rank": 2048,
  "qk_head_dim": 256,
  "qk_nope_head_dim": 192,
  "qk_rope_head_dim": 64,
  "quantization_config": {
    "autoround_version": "0.12.2",
    "bits": 4,
    "damp_percent": 0.01,
    "data_type": "int",
    "desc_act": false,
    "dynamic": {
      "-:.*layers\\.0\\.mlp.*": {},
      "-:.*layers\\.1\\.mlp.*": {},
      "-:.*layers\\.2\\.mlp.*": {},
      "-:.*weights_proj.*": {}
    },
    "group_size": 128,
    "iters": 10,
    "lm_head": false,
    "low_gpu_mem_usage": true,
    "modules_in_block_to_quantize": [
      [
        "self_attn.q_a_proj",
        "self_attn.q_b_proj",
        "self_attn.kv_a_proj_with_mqa",
        "self_attn.kv_b_proj",
        "self_attn.o_proj",
        "self_attn.indexer.wq_b",
        "self_attn.indexer.wk"
      ]
    ],
    "nsamples": 64,
    "provider": "auto-round",
    "quant_method": "gptq",
    "sym": true,
    "true_sequential": false
  },
  "rms_norm_eps": 1e-05,
  "rope_interleave": true,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "routed_scaling_factor": 2.5,
  "scoring_func": "sigmoid",
  "tie_word_embeddings": false,
  "topk_group": 1,
  "topk_method": "noaux_tc",
  "transformers_version": "5.4.0",
  "use_cache": true,
  "v_head_dim": 256,
  "vocab_size": 154880,
  "torch_dtype": "float16"
}
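The `dynamic` keys in the quantization config use auto-round's `-:<regex>` convention: any module whose name matches the regex after the `-:` prefix is excluded from quantization, which is how the dense MLPs of layers 0-2 and the DSA indexer's `weights_proj` stay at full precision. A small stdlib sketch of how such exclusion patterns apply to module names (the example module names are illustrative, not taken from the checkpoint's exact naming):

```python
import re

# Exclusion regexes from the config's "dynamic" section, with the "-:" prefix
# stripped; matching modules are kept fp16 rather than quantized to int4.
exclude_patterns = [
    r".*layers\.0\.mlp.*",
    r".*layers\.1\.mlp.*",
    r".*layers\.2\.mlp.*",
    r".*weights_proj.*",
]

def is_quantized(module_name):
    return not any(re.fullmatch(p, module_name) for p in exclude_patterns)

print(is_quantized("model.layers.0.mlp.gate_proj"))                   # False: dense layer 0, protected
print(is_quantized("model.layers.5.mlp.experts.3.up_proj"))           # True: routed expert, int4
print(is_quantized("model.layers.5.self_attn.indexer.weights_proj"))  # False: DSA indexer, protected
```

Note that `layers\.1\.mlp` matches only layer 1, not layers 10-19, because the escaped dot after the digit anchors the layer index.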
generation_config.json
ADDED
@@ -0,0 +1,13 @@
{
  "_from_model_config": true,
  "do_sample": true,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "pad_token_id": 154820,
  "temperature": 1.0,
  "top_p": 0.95,
  "transformers_version": "5.4.0"
}
model-*.safetensors, model.safetensors.index.json
ADDED
@@ -0,0 +1,3 @@ each file is a 3-line git-lfs pointer (`version https://git-lfs.github.com/spec/v1`, `oid sha256:…`, `size …`):

| File | oid (sha256) | size (bytes) |
|------|--------------|--------------|
| model-00001-of-00056.safetensors | bc0b41b9d8b5a4fc47f512025a96cc43cc1c4eaecff9885c38f5157611a1b39a | 5368863504 |
| model-00002-of-00056.safetensors | f5407cd33cf64335dc8ddc9991d95f3adcece4c47a01548d2415afdc50cc1d4d | 5366777928 |
| model-00003-of-00056.safetensors | 93fd9e305a8b3348d43327fe15d38f656f89730074f864c6e234dec80ccddf1e | 5365123192 |
| model-00004-of-00056.safetensors | ec3daceebbb4ac68dc4ed9ec32a08c53bf21ea3177b10ad6d0dc0ff1fc12211a | 5366762120 |
| model-00005-of-00056.safetensors | 5bcf24af0cbb662b66e2451720654b5a487cc0a01e94927f41e3264eca30fc94 | 5365122736 |
| model-00006-of-00056.safetensors | cbb95f06dc2d555d3331c67016f5f1916c8a6652869100bb64350483bb6b3c29 | 5365125536 |
| model-00007-of-00056.safetensors | c52d79f6973f24772b3f2a045000baead3240fb4d5cf216489ed4b78c455908b | 5366781408 |
| model-00008-of-00056.safetensors | 03c5ad78f2414e3ec4cdc8f2b6bf1a1e915c6f40517ff42b49b5848016a1732f | 5365126200 |
| model-00010-of-00056.safetensors | 42d04e45759c1e73f4d2c00571de9fe8e3c33d4edfafed9eadc02a2f8331f4b8 | 5364420784 |
| model-00012-of-00056.safetensors | ab84b48e87a4a03bccd3e3289680630faca12ce27b5edbe22687d71f612389b3 | 5366765272 |
| model-00013-of-00056.safetensors | 40332bf3e76a22a4a3414d51599369d0e546c07c41753aee2b8d13a7efb736af | 5365125992 |
| model-00014-of-00056.safetensors | 777221fa9409dc86784ffe3d9a82773d6f6208d4795e084e4df9f58b00fdc884 | 5365126856 |
| model-00015-of-00056.safetensors | cd3832f059f1847182ead84a5411408e525d5fc030934afea24ce3cc29c841bb | 5366781328 |
| model-00016-of-00056.safetensors | 062120e76615dbecd4fe97263a3b5acff20953227c66f92e56d3060888820a64 | 5365126288 |
| model-00017-of-00056.safetensors | f67d6ef4c7086302378e589cf2d1fe43773300fa3dc99bb4882cb2c90790fcca | 5366781992 |
| model-00018-of-00056.safetensors | 7aa4eeccd31a40954103438c3e6ff7454285c497d8ed4601ffca8315a143e6fb | 5365125896 |
| model-00019-of-00056.safetensors | 243c6b1c0afd00402c2cd2c878d1c7c8b4ff3d4473cce5d219041a0080ab7b54 | 5365126616 |
| model-00020-of-00056.safetensors | 3e6f7b7e38bbd25a554fa8a000ddc3f9104dac23dc87447fb1cac3dce4005387 | 5366765184 |
| model-00021-of-00056.safetensors | 313a283104c1dd31f88a6b4a584e261c7a88cc9a8d9a69257d881be409ce70a5 | 5365126040 |
| model-00022-of-00056.safetensors | c48a2163264bdbc3f6715a4e3cd2b7a2830f87335d24192c228b5ddc4fd790d1 | 5365126896 |
| model-00023-of-00056.safetensors | bdd55c524373b979ad9a0c9ae5cbed491b72fd5dc9e825198c7d4766e4548d37 | 5366781240 |
| model-00024-of-00056.safetensors | 8e133c898cb4feb048f2886c84e61b1422538841fb50b0d08b232b6d5fe92a9d | 5365126368 |
| model-00025-of-00056.safetensors | ad68a187b3806a1139e45906a196c55d20cbbafb3102893d2322167fda707879 | 5366781824 |
| model-00026-of-00056.safetensors | be0acf530078c33fea84d4848d831f6e05ebeef7e3e04c025e53e8a8aa6bb53a | 5365125976 |
| model-00027-of-00056.safetensors | b86d4a1b21100a11ebb6d11cbdeb4bef4ec91b3d50222f4e8e2a804f4db74658 | 5365126696 |
| model-00028-of-00056.safetensors | c890b8ff80e5a4d5c772a274329820d4e25cc97f9088227fdc00463f1223b678 | 5366765104 |
| model-00029-of-00056.safetensors | be08fab49f5a2550909863f3445a67cf50f688b03d694b15e1bad57bfcb861e2 | 5365126128 |
| model-00031-of-00056.safetensors | e08b0cb93ad9f5ab2a00ff6355845bf90b923f26362d8f6d505211165a134d4e | 5366781160 |
| model-00035-of-00056.safetensors | 92365dc498f68b23466dac5f120a7fe8eb3635a35970118b2bfd9c2b6a78a688 | 5365126784 |
| model-00037-of-00056.safetensors | d0969ecfb0719093db27839ba8fe827ea4d143069f136b6a3c45e2110e14c1fa | 5365126208 |
| model-00039-of-00056.safetensors | 49be9918b3fb8eb9eab2c065fcd2fed185b35819452035c7abfa310e94c453c1 | 5364395952 |
| model-00041-of-00056.safetensors | 5c160cfbe16446404c42523e908f06a793d9e2e2378d049f25e317ab62c453a0 | 5366781640 |
| model-00042-of-00056.safetensors | 46d222643b5178e0a85d3ba0fa2880f1c1ac4154da1a6b23822fb274cddc1232 | 5365125992 |
| model-00043-of-00056.safetensors | dc857c800d14d934e9e5b6abe97c23d2386d10f0241e9ab00acde97ba2cff891 | 5365126864 |
| model-00044-of-00056.safetensors | 60910b36b643a2065e239a97425fa7ddd193cad9f4ee63e6f1eb6c849346dbaa | 5366764936 |
| model-00048-of-00056.safetensors | f455493e6bbad6f85482898d29b55501fb106628242d19994edb7d56f6f1884c | 5365126624 |
| model-00051-of-00056.safetensors | 449d589713e044c3917a0b02dd946746e2c76882c1aa3e68523dd07a2e8a9ac9 | 5365126896 |
| model-00052-of-00056.safetensors | bef112fcbc0ab6ef31571f1b91b9709747699d4baa4df86df3d5e42bd61b0017 | 5366764848 |
| model-00053-of-00056.safetensors | 7080abe8695b5a6c4d608c97e52b890efeffb73debc3c2cd9989c152f0c0c375 | 5365126376 |
| model-00054-of-00056.safetensors | d3a56fc554628b11932876c06cf8b52dd02fc743a861a51719134c1cc0940087 | 5366781816 |
| model-00055-of-00056.safetensors | 0f687ee3b81b456f6972d398bbc89384afe4c1527ccac1fde2ab8caf2a9d1dc9 | 3615185888 |
| model-00056-of-00056.safetensors | 165264873338892852fbd6fe42cee3dab9ea21a36df5207f734f821eb7886005 | 3806343464 |
| model.safetensors.index.json | a2dec232545a01b3f146bb9f475afc5b1a5d08f1a5829e78ca7c7f8399ed222d | 16004846 |
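Every large file above is stored as a git-lfs pointer: three text lines carrying the spec version, the content's sha256 oid, and its byte size. A minimal stdlib parser for that pointer format, useful for auditing a checkout before the actual blobs are fetched:

```python
# Minimal parser for git-lfs pointer files (spec v1): three "key value" lines.
def parse_lfs_pointer(text):
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    oid = fields["oid"].removeprefix("sha256:")
    return oid, int(fields["size"])

# Pointer content of model.safetensors.index.json from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a2dec232545a01b3f146bb9f475afc5b1a5d08f1a5829e78ca7c7f8399ed222d
size 16004846"""

oid, size = parse_lfs_pointer(pointer)
print(oid[:8], size)  # a2dec232 16004846
```

After `git lfs pull`, the sha256 of each downloaded blob should equal its pointer's oid, which is how the sizes in the table can be verified end to end.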
quantization_config.json
ADDED
@@ -0,0 +1,33 @@
{
  "bits": 4,
  "data_type": "int",
  "group_size": 128,
  "sym": true,
  "iters": 10,
  "low_gpu_mem_usage": true,
  "nsamples": 64,
  "autoround_version": "0.12.2",
  "dynamic": {
    "-:.*layers\\.0\\.mlp.*": {},
    "-:.*layers\\.1\\.mlp.*": {},
    "-:.*layers\\.2\\.mlp.*": {},
    "-:.*weights_proj.*": {}
  },
  "lm_head": false,
  "provider": "auto-round",
  "quant_method": "gptq",
  "desc_act": false,
  "true_sequential": false,
  "damp_percent": 0.01,
  "modules_in_block_to_quantize": [
    [
      "self_attn.q_a_proj",
      "self_attn.q_b_proj",
      "self_attn.kv_a_proj_with_mqa",
      "self_attn.kv_b_proj",
      "self_attn.o_proj",
      "self_attn.indexer.wq_b",
      "self_attn.indexer.wk"
    ]
  ]
}
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
+size 20217442