0xSero committed
Commit 6b78b3b · verified · Parent: d58f81e

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +135 -0
  3. chat_template.jinja +117 -0
  4. config.json +173 -0
  5. generation_config.json +13 -0
  6. model-00001-of-00056.safetensors +3 -0
  7. model-00002-of-00056.safetensors +3 -0
  8. model-00003-of-00056.safetensors +3 -0
  9. model-00004-of-00056.safetensors +3 -0
  10. model-00005-of-00056.safetensors +3 -0
  11. model-00006-of-00056.safetensors +3 -0
  12. model-00007-of-00056.safetensors +3 -0
  13. model-00008-of-00056.safetensors +3 -0
  14. model-00010-of-00056.safetensors +3 -0
  15. model-00012-of-00056.safetensors +3 -0
  16. model-00013-of-00056.safetensors +3 -0
  17. model-00014-of-00056.safetensors +3 -0
  18. model-00015-of-00056.safetensors +3 -0
  19. model-00016-of-00056.safetensors +3 -0
  20. model-00017-of-00056.safetensors +3 -0
  21. model-00018-of-00056.safetensors +3 -0
  22. model-00019-of-00056.safetensors +3 -0
  23. model-00020-of-00056.safetensors +3 -0
  24. model-00021-of-00056.safetensors +3 -0
  25. model-00022-of-00056.safetensors +3 -0
  26. model-00023-of-00056.safetensors +3 -0
  27. model-00024-of-00056.safetensors +3 -0
  28. model-00025-of-00056.safetensors +3 -0
  29. model-00026-of-00056.safetensors +3 -0
  30. model-00027-of-00056.safetensors +3 -0
  31. model-00028-of-00056.safetensors +3 -0
  32. model-00029-of-00056.safetensors +3 -0
  33. model-00031-of-00056.safetensors +3 -0
  34. model-00035-of-00056.safetensors +3 -0
  35. model-00037-of-00056.safetensors +3 -0
  36. model-00039-of-00056.safetensors +3 -0
  37. model-00041-of-00056.safetensors +3 -0
  38. model-00042-of-00056.safetensors +3 -0
  39. model-00043-of-00056.safetensors +3 -0
  40. model-00044-of-00056.safetensors +3 -0
  41. model-00048-of-00056.safetensors +3 -0
  42. model-00051-of-00056.safetensors +3 -0
  43. model-00052-of-00056.safetensors +3 -0
  44. model-00053-of-00056.safetensors +3 -0
  45. model-00054-of-00056.safetensors +3 -0
  46. model-00055-of-00056.safetensors +3 -0
  47. model-00056-of-00056.safetensors +3 -0
  48. model.safetensors.index.json +3 -0
  49. quantization_config.json +33 -0
  50. tokenizer.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,135 @@
+ ---
+ license: other
+ license_name: glm-5
+ license_link: https://huggingface.co/zai-org/GLM-5.1/blob/main/LICENSE
+ base_model: 0xSero/GLM-5.1-555B-A14B-REAP
+ tags:
+ - reap
+ - pruning
+ - moe
+ - expert-pruning
+ - glm
+ - gptq
+ - w4a16
+ - autoround
+ - vllm
+ library_name: transformers
+ pipeline_tag: text-generation
+ quantization_config:
+   quant_method: gptq
+   bits: 4
+   group_size: 128
+   sym: true
+   desc_act: false
+   checkpoint_format: gptq
+ ---
+
+ # GLM-5.1 — 25% Expert Pruned (REAP) — W4A16
+
+ This is a **GPTQ 4-bit weight-quantized** variant of the 25% expert-pruned [`zai-org/GLM-5.1`](https://huggingface.co/zai-org/GLM-5.1), pruned with [REAP](https://github.com/CerebrasResearch/reap) (Router-weighted Expert Activation Pruning) and quantized with [AutoRound](https://github.com/intel/auto-round) learned rounding.
+
+ | Property | Value |
+ |----------|-------|
+ | Base model | `zai-org/GLM-5.1` (744B MoE, 256 experts/layer) |
+ | Architecture | `GlmMoeDsaForCausalLM` (MoE + Dynamic Sparse Attention) |
+ | Routed experts | 256 → 192 (25% removed, 64 per layer) |
+ | Active params/token | ~14B (top-8 routing preserved) |
+ | Quantization | GPTQ W4A16 (int4 symmetric, group_size=128) |
+ | Quantizer | auto-round 0.12.2 (200 iterations, SignSGD) |
+ | Quantized size | **277 GB** (56 safetensors shards) |
+ | BF16 source | [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) |
+ | GGUF variant | [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) (325 GB, Q4_K_M) |
+
+ ## Benchmark Results (GGUF Q4_K_M, inference mode, temp=0.8)
+
+ The GPTQ W4A16 uses the same learned rounding method (AutoRound) as the GGUF Q4_K_M. Benchmark scores from the GGUF variant (zero repetition loops):
+
+ | Suite | Metric | Result | Repetition Loops |
+ |-------|--------|--------|-----------------|
+ | Terminal-Bench (50) | Proxy Pass | 44/50 (88%) | 0/50 |
+ | SWE-bench Pro (50) | Proxy Pass | 33/50 (66%) | 0/50 |
+ | GSM8K (50) | Correct | 30/50 (60%) | 0/50 |
+ | HLE (50) | Correct | 9/50 (18%) | 0/50 |
+
+ **Zero repetition loops across 220 benchmark probes.** The 25% prune retains 192/256 experts, providing enough expert diversity for stable generation at all sequence lengths.
+
+ ## How to Use
+
+ ### vLLM
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(
+     model="0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16",
+     tensor_parallel_size=4,  # 4× B200 or 8× A100
+     max_model_len=8192,
+     trust_remote_code=True,
+ )
+
+ params = SamplingParams(temperature=0.8, max_tokens=4096)
+ outputs = llm.generate(["Hello, world!"], params)
+ ```
+
+ ### SGLang
+
+ ```bash
+ python -m sglang.launch_server \
+     --model-path 0xSero/GLM-5.1-555B-A14B-REAP-GPTQ-W4A16 \
+     --tp 4 \
+     --trust-remote-code
+ ```
+
+ ### Requirements
+
+ - ~70-80 GiB VRAM per GPU across 4 GPUs (B200), or ~280 GiB total
+ - CUDA 12.8+ (sm_100a / Blackwell)
+ - vLLM >= 0.19.0 with `deep_gemm` installed (for DSA sparse attention)
+ - `trust_remote_code=True`
+
+ ## Quantization Details
+
+ **Method:** AutoRound W4A16 — learned rounding via SignSGD (200 iterations per layer), calibrated on 128 samples from NeelNanda/pile-10k at 2048 sequence length.
+
+ **Protected (kept at full precision):**
+ - Dense MLP layers 0-2 (`gate_proj`, `up_proj`, `down_proj`)
+ - DSA indexer (`weights_proj`)
+ - `lm_head`
+
+ **Quantized to int4 (43,971/44,059 linear layers):**
+ - All attention projections (`q_a_proj`, `q_b_proj`, `kv_a_proj`, `kv_b_proj`, `o_proj`)
+ - All routed MoE expert projections (192 experts × gate/up/down × 75 MoE layers)
+ - Shared expert projections
+
+ **GPTQ config:** `bits=4, group_size=128, sym=true, desc_act=false`
+
+ ## Why GPTQ over GGUF Q4_K_M?
+
+ | | GPTQ W4A16 (this) | GGUF Q4_K_M |
+ |---|---|---|
+ | Size | 277 GB | 325 GB |
+ | Serving | vLLM, SGLang, TGI (GPU) | llama.cpp (CPU/GPU hybrid) |
+ | Quant method | Learned rounding (SignSGD) | K-means clustering |
+ | Throughput | Higher (GPU-native kernels) | Lower |
+ | Best for | Production GPU serving | Local inference, edge |
+
+ GPTQ packs 4-bit weights more efficiently with `group_size=128` symmetric quantization, resulting in a smaller checkpoint than GGUF Q4_K_M at the same bit-width.
+
+ ## Related Models
+
+ | Model | Prune % | Experts | Format | Size | Status |
+ |-------|---------|---------|--------|------|--------|
+ | [`0xSero/GLM-5.1-555B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP) | 25% | 192/256 | BF16 | 1.1T | Source checkpoint |
+ | [`0xSero/GLM-5.1-555B-A14B-REAP-GGUF`](https://huggingface.co/0xSero/GLM-5.1-555B-A14B-REAP-GGUF) | 25% | 192/256 | GGUF Q4_K_M | 325G | llama.cpp serving |
+ | **This model** | **25%** | **192/256** | **GPTQ W4A16** | **277G** | **vLLM/SGLang serving** |
+ | [`0xSero/GLM-5.1-444B-A14B-REAP`](https://huggingface.co/0xSero/GLM-5.1-444B-A14B-REAP) | 40% | 154/256 | BF16 | 910G | Has repetition issues — use 25% |
+
+ ## Support This Work
+
+ If you find these models useful, please consider supporting continued open-source model compression research:
+
+ **[donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)**
+
+ ## Citation
+
+ If you use this model, please cite the [REAP paper](https://github.com/CerebrasResearch/reap) and [AutoRound](https://github.com/intel/auto-round).
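The W4A16 scheme described in the README (int4 symmetric weights, one scale per group of 128, 16-bit activations) can be illustrated with a toy reference in plain Python. This is a sketch of the numeric format only, not the AutoRound/GPTQ algorithm or its kernels; function names here are illustrative.

```python
# Toy sketch of symmetric, group-wise int4 quantization (W4A16-style).
# Weights are split into groups of 128; each group gets one float scale,
# and values are rounded to integers clamped to the int4 range [-8, 7].

GROUP_SIZE = 128
QMAX = 7  # symmetric scheme: scale chosen so the max magnitude maps to +/-7

def quantize_group(group):
    """Return (scale, int4 codes) for one group of float weights."""
    scale = max(abs(w) for w in group) / QMAX or 1.0  # avoid scale == 0
    codes = [max(-8, min(7, round(w / scale))) for w in group]
    return scale, codes

def dequantize_group(scale, codes):
    """Reconstruct approximate float weights from one quantized group."""
    return [c * scale for c in codes]

def quantize(weights, group_size=GROUP_SIZE):
    """Quantize a flat weight list group by group."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]
```

With `group_size=128`, each weight costs 4 bits plus a small amortized overhead for the per-group scale, which is why the checkpoint lands near one quarter of the BF16 size.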
chat_template.jinja ADDED
@@ -0,0 +1,117 @@
+ [gMASK]<sop>
+ {%- if tools -%}
+ {%- macro tool_to_json(tool) -%}
+ {%- set ns_tool = namespace(first=true) -%}
+ {{ '{' -}}
+ {%- for k, v in tool.items() -%}
+ {%- if k != 'defer_loading' and k != 'strict' -%}
+ {%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
+ {%- set ns_tool.first = false -%}
+ "{{ k }}": {{ v | tojson(ensure_ascii=False) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- '}' -}}
+ {%- endmacro -%}
+ <|system|>
+ # Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {% for tool in tools %}
+ {%- if 'function' in tool -%}
+ {%- set tool = tool['function'] -%}
+ {%- endif -%}
+ {% if tool.defer_loading is not defined or not tool.defer_loading %}
+ {{ tool_to_json(tool) }}
+ {% endif %}
+ {% endfor %}
+ </tools>
+
+ For each function call, output the function name and arguments within the following XML format:
+ <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+ {%- macro visible_text(content) -%}
+ {%- if content is string -%}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping -%}
+ {%- for item in content -%}
+ {%- if item is mapping and item.type == 'text' -%}
+ {{- item.text }}
+ {%- elif item is string -%}
+ {{- item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- else -%}
+ {{- content }}
+ {%- endif -%}
+ {%- endmacro -%}
+ {%- set ns = namespace(last_user_index=-1, thinking_indices='') -%}
+ {%- for m in messages %}
+ {%- if m.role == 'user' %}
+ {%- set ns.last_user_index = loop.index0 -%}
+ {%- elif m.role == 'assistant' %}
+ {%- if m.reasoning_content is string %}
+ {%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- set ns.has_thinking = false -%}
+ {%- for m in messages -%}
+ {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in ns.thinking_indices -%}
+ {%- elif m.role == 'assistant' -%}
+ <|assistant|>
+ {%- set content = visible_text(m.content) %}
+ {%- if m.reasoning_content is string %}
+ {%- set reasoning_content = m.reasoning_content %}
+ {%- elif '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
+ {%- set content = content.split('</think>')[-1] %}
+ {%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %}
+ {%- set reasoning_content = '' %}
+ {%- elif loop.index0 < ns.last_user_index and ns.has_thinking %}
+ {%- set reasoning_content = '' %}
+ {%- endif %}
+ {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
+ {{ '<think>' + reasoning_content + '</think>'}}
+ {%- else -%}
+ {{ '</think>' }}
+ {%- endif -%}
+ {%- if content.strip() -%}
+ {{ content.strip() }}
+ {%- endif -%}
+ {% if m.tool_calls %}
+ {% for tc in m.tool_calls %}
+ {%- if tc.function %}
+ {%- set tc = tc.function %}
+ {%- endif %}
+ {{- '<tool_call>' + tc.name -}}
+ {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+ {% endif %}
+ {%- elif m.role == 'tool' -%}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' -}}
+ {%- endif %}
+ {%- if m.content is string -%}
+ {{- '<tool_response>' + m.content + '</tool_response>' -}}
+ {%- else -%}
+ {{- '<tool_response><tools>\n' -}}
+ {% for tr in m.content %}
+ {%- for tool in tools -%}
+ {%- if 'function' in tool -%}
+ {%- set tool = tool['function'] -%}
+ {%- endif -%}
+ {%- if tool.name == tr.name -%}
+ {{- tool_to_json(tool) + '\n' -}}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endfor -%}
+ {{- '</tools></tool_response>' -}}
+ {% endif -%}
+ {%- elif m.role == 'system' -%}
+ <|system|>{{ visible_text(m.content) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+ {%- endif -%}
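The template above serializes tool calls as `<tool_call>{name}<arg_key>k</arg_key><arg_value>v</arg_value>...</tool_call>`. An illustrative parser for that output format is sketched below; it is not part of the release, and the function names are made up for the example.

```python
# Sketch: extract (name, arguments) pairs from text generated under the
# chat template's <tool_call>...</tool_call> convention.
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(r"<arg_key>(.*?)</arg_key><arg_value>(.*?)</arg_value>", re.DOTALL)

def parse_tool_calls(text):
    """Return a list of (function_name, {arg: value}) tuples."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        name = body.split("<arg_key>")[0]  # name precedes the first arg_key
        args = dict(ARG_RE.findall(body))
        calls.append((name, args))
    return calls
```

Note that `<arg_value>` payloads are JSON for non-string arguments, so a real client would additionally attempt `json.loads` on each value.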
config.json ADDED
@@ -0,0 +1,173 @@
+ {
+   "architectures": [
+     "GlmMoeDsaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dtype": "float16",
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "ep_size": 1,
+   "first_k_dense_replace": 3,
+   "hidden_act": "silu",
+   "hidden_size": 6144,
+   "index_head_dim": 128,
+   "index_n_heads": 32,
+   "index_topk": 2048,
+   "indexer_rope_interleave": true,
+   "initializer_range": 0.02,
+   "intermediate_size": 12288,
+   "kv_lora_rank": 512,
+   "max_position_embeddings": 202752,
+   "mlp_layer_types": [
+     "dense",
+     "dense",
+     "dense",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse"
+   ],
+   "model_type": "glm_moe_dsa",
+   "moe_intermediate_size": 2048,
+   "moe_layer_freq": 1,
+   "n_group": 1,
+   "n_routed_experts": 192,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 64,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 78,
+   "num_key_value_heads": 64,
+   "num_nextn_predict_layers": 1,
+   "pad_token_id": 154820,
+   "pretraining_tp": 1,
+   "q_lora_rank": 2048,
+   "qk_head_dim": 256,
+   "qk_nope_head_dim": 192,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "autoround_version": "0.12.2",
+     "bits": 4,
+     "damp_percent": 0.01,
+     "data_type": "int",
+     "desc_act": false,
+     "dynamic": {
+       "-:.*layers\\.0\\.mlp.*": {},
+       "-:.*layers\\.1\\.mlp.*": {},
+       "-:.*layers\\.2\\.mlp.*": {},
+       "-:.*weights_proj.*": {}
+     },
+     "group_size": 128,
+     "iters": 10,
+     "lm_head": false,
+     "low_gpu_mem_usage": true,
+     "modules_in_block_to_quantize": [
+       [
+         "self_attn.q_a_proj",
+         "self_attn.q_b_proj",
+         "self_attn.kv_a_proj_with_mqa",
+         "self_attn.kv_b_proj",
+         "self_attn.o_proj",
+         "self_attn.indexer.wq_b",
+         "self_attn.indexer.wk"
+       ]
+     ],
+     "nsamples": 64,
+     "provider": "auto-round",
+     "quant_method": "gptq",
+     "sym": true,
+     "true_sequential": false
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_interleave": true,
+   "rope_parameters": {
+     "rope_theta": 1000000,
+     "rope_type": "default"
+   },
+   "routed_scaling_factor": 2.5,
+   "scoring_func": "sigmoid",
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "transformers_version": "5.4.0",
+   "use_cache": true,
+   "v_head_dim": 256,
+   "vocab_size": 154880,
+   "torch_dtype": "float16"
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_from_model_config": true,
+   "do_sample": true,
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "pad_token_id": 154820,
+   "temperature": 1.0,
+   "top_p": 0.95,
+   "transformers_version": "5.4.0"
+ }
model-00001-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc0b41b9d8b5a4fc47f512025a96cc43cc1c4eaecff9885c38f5157611a1b39a
+ size 5368863504
model-00002-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5407cd33cf64335dc8ddc9991d95f3adcece4c47a01548d2415afdc50cc1d4d
+ size 5366777928
model-00003-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93fd9e305a8b3348d43327fe15d38f656f89730074f864c6e234dec80ccddf1e
+ size 5365123192
model-00004-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec3daceebbb4ac68dc4ed9ec32a08c53bf21ea3177b10ad6d0dc0ff1fc12211a
+ size 5366762120
model-00005-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bcf24af0cbb662b66e2451720654b5a487cc0a01e94927f41e3264eca30fc94
+ size 5365122736
model-00006-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbb95f06dc2d555d3331c67016f5f1916c8a6652869100bb64350483bb6b3c29
+ size 5365125536
model-00007-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c52d79f6973f24772b3f2a045000baead3240fb4d5cf216489ed4b78c455908b
+ size 5366781408
model-00008-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03c5ad78f2414e3ec4cdc8f2b6bf1a1e915c6f40517ff42b49b5848016a1732f
+ size 5365126200
model-00010-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42d04e45759c1e73f4d2c00571de9fe8e3c33d4edfafed9eadc02a2f8331f4b8
+ size 5364420784
model-00012-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab84b48e87a4a03bccd3e3289680630faca12ce27b5edbe22687d71f612389b3
+ size 5366765272
model-00013-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40332bf3e76a22a4a3414d51599369d0e546c07c41753aee2b8d13a7efb736af
+ size 5365125992
model-00014-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:777221fa9409dc86784ffe3d9a82773d6f6208d4795e084e4df9f58b00fdc884
+ size 5365126856
model-00015-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd3832f059f1847182ead84a5411408e525d5fc030934afea24ce3cc29c841bb
+ size 5366781328
model-00016-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:062120e76615dbecd4fe97263a3b5acff20953227c66f92e56d3060888820a64
+ size 5365126288
model-00017-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f67d6ef4c7086302378e589cf2d1fe43773300fa3dc99bb4882cb2c90790fcca
+ size 5366781992
model-00018-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7aa4eeccd31a40954103438c3e6ff7454285c497d8ed4601ffca8315a143e6fb
+ size 5365125896
model-00019-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:243c6b1c0afd00402c2cd2c878d1c7c8b4ff3d4473cce5d219041a0080ab7b54
+ size 5365126616
model-00020-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e6f7b7e38bbd25a554fa8a000ddc3f9104dac23dc87447fb1cac3dce4005387
+ size 5366765184
model-00021-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:313a283104c1dd31f88a6b4a584e261c7a88cc9a8d9a69257d881be409ce70a5
+ size 5365126040
model-00022-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c48a2163264bdbc3f6715a4e3cd2b7a2830f87335d24192c228b5ddc4fd790d1
+ size 5365126896
model-00023-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bdd55c524373b979ad9a0c9ae5cbed491b72fd5dc9e825198c7d4766e4548d37
+ size 5366781240
model-00024-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e133c898cb4feb048f2886c84e61b1422538841fb50b0d08b232b6d5fe92a9d
+ size 5365126368
model-00025-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad68a187b3806a1139e45906a196c55d20cbbafb3102893d2322167fda707879
+ size 5366781824
model-00026-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be0acf530078c33fea84d4848d831f6e05ebeef7e3e04c025e53e8a8aa6bb53a
+ size 5365125976
model-00027-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b86d4a1b21100a11ebb6d11cbdeb4bef4ec91b3d50222f4e8e2a804f4db74658
+ size 5365126696
model-00028-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c890b8ff80e5a4d5c772a274329820d4e25cc97f9088227fdc00463f1223b678
+ size 5366765104
model-00029-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be08fab49f5a2550909863f3445a67cf50f688b03d694b15e1bad57bfcb861e2
+ size 5365126128
model-00031-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e08b0cb93ad9f5ab2a00ff6355845bf90b923f26362d8f6d505211165a134d4e
+ size 5366781160
model-00035-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92365dc498f68b23466dac5f120a7fe8eb3635a35970118b2bfd9c2b6a78a688
+ size 5365126784
model-00037-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0969ecfb0719093db27839ba8fe827ea4d143069f136b6a3c45e2110e14c1fa
+ size 5365126208
model-00039-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49be9918b3fb8eb9eab2c065fcd2fed185b35819452035c7abfa310e94c453c1
+ size 5364395952
model-00041-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c160cfbe16446404c42523e908f06a793d9e2e2378d049f25e317ab62c453a0
+ size 5366781640
model-00042-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46d222643b5178e0a85d3ba0fa2880f1c1ac4154da1a6b23822fb274cddc1232
+ size 5365125992
model-00043-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc857c800d14d934e9e5b6abe97c23d2386d10f0241e9ab00acde97ba2cff891
+ size 5365126864
model-00044-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60910b36b643a2065e239a97425fa7ddd193cad9f4ee63e6f1eb6c849346dbaa
+ size 5366764936
model-00048-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f455493e6bbad6f85482898d29b55501fb106628242d19994edb7d56f6f1884c
+ size 5365126624
model-00051-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:449d589713e044c3917a0b02dd946746e2c76882c1aa3e68523dd07a2e8a9ac9
+ size 5365126896
model-00052-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bef112fcbc0ab6ef31571f1b91b9709747699d4baa4df86df3d5e42bd61b0017
+ size 5366764848
model-00053-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7080abe8695b5a6c4d608c97e52b890efeffb73debc3c2cd9989c152f0c0c375
+ size 5365126376
model-00054-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3a56fc554628b11932876c06cf8b52dd02fc743a861a51719134c1cc0940087
+ size 5366781816
model-00055-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f687ee3b81b456f6972d398bbc89384afe4c1527ccac1fde2ab8caf2a9d1dc9
+ size 3615185888
model-00056-of-00056.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:165264873338892852fbd6fe42cee3dab9ea21a36df5207f734f821eb7886005
+ size 3806343464
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2dec232545a01b3f146bb9f475afc5b1a5d08f1a5829e78ca7c7f8399ed222d
+ size 16004846
quantization_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "bits": 4,
+   "data_type": "int",
+   "group_size": 128,
+   "sym": true,
+   "iters": 10,
+   "low_gpu_mem_usage": true,
+   "nsamples": 64,
+   "autoround_version": "0.12.2",
+   "dynamic": {
+     "-:.*layers\\.0\\.mlp.*": {},
+     "-:.*layers\\.1\\.mlp.*": {},
+     "-:.*layers\\.2\\.mlp.*": {},
+     "-:.*weights_proj.*": {}
+   },
+   "lm_head": false,
+   "provider": "auto-round",
+   "quant_method": "gptq",
+   "desc_act": false,
+   "true_sequential": false,
+   "damp_percent": 0.01,
+   "modules_in_block_to_quantize": [
+     [
+       "self_attn.q_a_proj",
+       "self_attn.q_b_proj",
+       "self_attn.kv_a_proj_with_mqa",
+       "self_attn.kv_b_proj",
+       "self_attn.o_proj",
+       "self_attn.indexer.wq_b",
+       "self_attn.indexer.wk"
+     ]
+   ]
+ }
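The `dynamic` section in this config uses `-:`-prefixed regular expressions to name modules that are skipped during quantization (here the first three dense MLP blocks and the DSA indexer's `weights_proj`), matching the "Protected" list in the README. The sketch below applies that convention, assuming `-:` means "exclude any module whose full name matches the rest of the pattern"; the helper name is illustrative.

```python
# Sketch: evaluate the "-:"-prefixed exclusion patterns from the config's
# "dynamic" section against dotted module names.
import re

DYNAMIC = {
    "-:.*layers\\.0\\.mlp.*": {},
    "-:.*layers\\.1\\.mlp.*": {},
    "-:.*layers\\.2\\.mlp.*": {},
    "-:.*weights_proj.*": {},
}

def is_quantized(module_name, dynamic=DYNAMIC):
    """Return False for modules matched by a '-:'-prefixed pattern."""
    for pattern in dynamic:
        if pattern.startswith("-:") and re.fullmatch(pattern[2:], module_name):
            return False
    return True
```

Because the patterns anchor on `layers.0.`, `layers.1.`, and `layers.2.` with literal dots, layer 10 or layer 20 MLPs are not accidentally excluded.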
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
+ size 20217442