dealignai committed
Commit 0153563 · verified · 1 Parent(s): ef498cd

Add files using upload-large-folder tool

Files changed (50)
  1. README.md +139 -0
  2. chat_template.jinja +117 -0
  3. config.json +63 -0
  4. generation_config.json +12 -0
  5. jang_config.json +37 -0
  6. model-00002-of-00233.safetensors +3 -0
  7. model-00007-of-00233.safetensors +3 -0
  8. model-00008-of-00233.safetensors +3 -0
  9. model-00015-of-00233.safetensors +3 -0
  10. model-00028-of-00233.safetensors +3 -0
  11. model-00030-of-00233.safetensors +3 -0
  12. model-00035-of-00233.safetensors +3 -0
  13. model-00036-of-00233.safetensors +3 -0
  14. model-00043-of-00233.safetensors +3 -0
  15. model-00046-of-00233.safetensors +3 -0
  16. model-00049-of-00233.safetensors +3 -0
  17. model-00051-of-00233.safetensors +3 -0
  18. model-00054-of-00233.safetensors +3 -0
  19. model-00071-of-00233.safetensors +3 -0
  20. model-00074-of-00233.safetensors +3 -0
  21. model-00082-of-00233.safetensors +3 -0
  22. model-00084-of-00233.safetensors +3 -0
  23. model-00087-of-00233.safetensors +3 -0
  24. model-00088-of-00233.safetensors +3 -0
  25. model-00109-of-00233.safetensors +3 -0
  26. model-00111-of-00233.safetensors +3 -0
  27. model-00114-of-00233.safetensors +3 -0
  28. model-00123-of-00233.safetensors +3 -0
  29. model-00126-of-00233.safetensors +3 -0
  30. model-00129-of-00233.safetensors +3 -0
  31. model-00134-of-00233.safetensors +3 -0
  32. model-00150-of-00233.safetensors +3 -0
  33. model-00155-of-00233.safetensors +3 -0
  34. model-00156-of-00233.safetensors +3 -0
  35. model-00162-of-00233.safetensors +3 -0
  36. model-00167-of-00233.safetensors +3 -0
  37. model-00168-of-00233.safetensors +3 -0
  38. model-00175-of-00233.safetensors +3 -0
  39. model-00183-of-00233.safetensors +3 -0
  40. model-00186-of-00233.safetensors +3 -0
  41. model-00191-of-00233.safetensors +3 -0
  42. model-00194-of-00233.safetensors +3 -0
  43. model-00203-of-00233.safetensors +3 -0
  44. model-00206-of-00233.safetensors +3 -0
  45. model-00209-of-00233.safetensors +3 -0
  46. model-00223-of-00233.safetensors +3 -0
  47. model-00226-of-00233.safetensors +3 -0
  48. model-00231-of-00233.safetensors +3 -0
  49. model.safetensors.index.json +0 -0
  50. tokenizer_config.json +33 -0
README.md ADDED
@@ -0,0 +1,139 @@
1
+ # GLM-5.1-JANG_1L
2
+
3
+ **744B-parameter Mixture-of-Experts at ~2.15 bits/weight**
4
+ **Created by Jinho Jang — eric@jangq.ai**
5
+
6
+ > ## ⚠ EXPERIMENTAL
7
+ > This is an early research release. Benchmarks (MMLU, HumanEval, GSM8K, etc.) are not yet finalized and will be uploaded in a follow-up revision. Expect rough edges in long-form reasoning outputs until tuning is complete.
8
+
9
+ ---
10
+
11
+ ## Requires MLX Studio
12
+
13
+ **This model only runs on [MLX Studio](https://mlxstudio.com/)** — Jinho Jang's native MLX inference app for Apple Silicon.
14
+
15
+ - **Standard `mlx_lm` will NOT work** with this model. MLX Studio contains a patched `deepseek_v32` runtime path that is required for coherent decode on quantized GLM-5.1 at bf16. Without the patched runtime, the model produces repetition loops during generation.
16
+ - MLX Studio auto-detects the JANG v2 format and loads the weights directly via mmap (about 50 s on a Mac Studio for this model size).
17
+ - All quantization, loading, and inference tuning is handled by MLX Studio — no extra setup required.
18
+
19
+ If you want to run this model and do not have MLX Studio, wait for the public release.
20
+
21
+ ---
22
+
23
+ ## Model summary
24
+
25
+ | Field | Value |
26
+ |---|---|
27
+ | Base architecture | GLM-5.1 (ZhipuAI / THUDM) — MoE, 744B total params, 40B active, 256 routed experts top-8, 78 transformer layers + 1 MTP |
28
+ | Attention | MLA (Multi-head Latent Attention) with DSA (DeepSeek Sparse Attention) indexer |
29
+ | Context window | 202,752 tokens |
30
+ | Quantization method | **JANG_1L** — mixed-precision importance quantization (8-bit critical tier, 8-bit important tier, 2-bit compress tier) |
31
+ | Effective bits | **2.15 bits/weight** |
32
+ | On-disk size | **233 GB** |
33
+ | Active RAM during inference | ~235 GB (fits on 256 GB+ Apple Silicon w/ raised `iogpu.wired_limit_mb`) |
34
+ | Format | JANG v2 — MLX-native safetensors, instant mmap load |
35
+ | Source | Converted from the official GLM-5.1 FP8 release |
36
+ | Mode | Text-only |
37
+
38
+ **Why JANG_1L specifically?** The `JANG_1L` profile applies maximum-quality protection to the critical tensors (attention MLA `embed_q`/`unembed_out`, router gates, `lm_head`, token embeddings, MLA KV compression) while allowing the routed expert MLPs to go to 2 bits. At 744B params with 256 experts, most of the weight budget lives in the routed experts, so compressing them aggressively while keeping attention and routing in the 8-bit tiers is the sweet spot for MoE at a 2-bit average.
39
+
40
+ ---
41
+
42
+ ## Running the model
43
+
44
+ Short-form factual or instruction prompts (recommended default):
45
+
46
+ ```python
47
+ from mlx_studio import load, generate, make_sampler
48
+
49
+ model, tokenizer = load("GLM-5.1-JANG_1L")
50
+
51
+ messages = [{"role": "user", "content": "What is the capital of France? Answer in one word."}]
52
+ prompt = tokenizer.apply_chat_template(
53
+ messages, add_generation_prompt=True, tokenize=False,
54
+ enable_thinking=False, # direct-answer mode for short-form prompts
55
+ )
56
+ print(generate(model, tokenizer, prompt=prompt, max_tokens=60,
57
+ sampler=make_sampler(temp=0.0)))
58
+ # → "Paris"
59
+ ```
60
+
61
+ Multi-step reasoning (larger budget, thinking mode on):
62
+
63
+ ```python
64
+ messages = [{"role": "user", "content": "If I drop a glass on a hard floor, what will happen? Explain."}]
65
+ prompt = tokenizer.apply_chat_template(
66
+ messages, add_generation_prompt=True, tokenize=False,
67
+ enable_thinking=True,
68
+ )
69
+ print(generate(model, tokenizer, prompt=prompt, max_tokens=1024,
70
+ sampler=make_sampler(temp=0.0)))
71
+ ```
72
+
73
+ ### Sampling recommendations
74
+
75
+ | Task | `enable_thinking` | `temp` | `top_p` | `max_tokens` |
76
+ |---|---|---|---|---|
77
+ | Short factual QA (one-word, one-number answers) | `False` | `0.0` (greedy) | — | 60 |
78
+ | Conversational / general | `False` | `0.7` | `0.9` | 256 |
79
+ | Multi-step reasoning | `True` | `0.0` or `1.0` | `0.95` | **1024+** |
80
+
81
+ **Do not** apply a repetition penalty to math or factual prompts: the penalty suppresses digits and tokens that a correct answer legitimately repeats (e.g. `"47+38=85"` becomes `"47+38=5, 7, 10"`).
82
+
83
+ Reasoning mode needs room: the `<think>...</think>` block can consume 300-800 tokens before the final answer. Budget at least 1024 `max_tokens` for any serious reasoning task.
84
+
85
+ ---
86
+
87
+ ## Performance snapshot (informal — full benchmarks TBD)
88
+
89
+ Tested on Apple M3 Ultra (256 GB unified memory) via MLX Studio.
90
+
91
+ | Metric | Value |
92
+ |---|---|
93
+ | Cold load time (mmap) | ~54 s |
94
+ | Short-form answer latency | **<1 s** after load |
95
+ | Reasoning generation speed | ~5–7 tok/s |
96
+ | RAM footprint during generation | ~235 GB wired |
97
+
98
+ ### Qualitative coherence (10-prompt private benchmark, greedy)
99
+
100
+ | Mode | Coherent | Notes |
101
+ |---|---|---|
102
+ | `enable_thinking=False` short-form | **7/10** | Correct on Paris/Au/85/buenos días/sky-blue/glass-breaks/ocean poem; partial on pi digits and code one-liner; fails on multi-step word problems |
103
+ | `enable_thinking=True` reasoning | **9/10** coherent reasoning chains | Most prompts need `max_tokens ≥ 1000` to emit the final `</think>` + answer; some chain-of-thought reaches the correct conclusion in the `<think>` block |
104
+
105
+ **Formal benchmarks (MMLU, GSM8K, HumanEval, BBH, GPQA, etc.) coming in a follow-up revision.**
106
+
107
+ ---
108
+
109
+ ## Known limitations
110
+
111
+ 1. **Reasoning budget** — many `enable_thinking=True` prompts need `max_tokens ≥ 1024` to fully emit their reasoning chain and final answer. Setting lower budgets will truncate mid-analysis.
112
+ 2. **Code generation at 2-bit** — simple Python one-liners sometimes get stuck in slicing-notation patterns. Expect rough edges on code tasks until future revisions.
113
+ 3. **Word problems under short budget** — multi-step word problems (Alice-apples-style) sometimes degenerate into numeric repetition when `enable_thinking=False`. Use `enable_thinking=True` + larger budget for any word problem that requires more than one algebraic step.
114
+ 4. **Memory requirement** — this model requires **≥250 GB of GPU-wired memory**. On Mac Studio, verify `sysctl iogpu.wired_limit_mb` returns `250000` or higher before loading.
115
+ 5. **MLX Studio only**: the model depends on MLX Studio's inference runtime. It will not produce coherent output under stock `mlx_lm` or `mlx_vlm`; attempting to run it there results in repetition loops during generation.
116
+
117
+ ---
118
+
119
+ ## Credits
120
+
121
+ - **Quantization & conversion** — Jinho Jang, `eric@jangq.ai`
122
+ - **Runtime** — MLX Studio by Jinho Jang
123
+ - **Base model** — GLM-5.1 by ZhipuAI / THUDM / zai-org (see the original model card for training data, intended use, and safety information)
124
+
125
+ All JANG tooling and MLX Studio are commercial products of Jinho Jang. Please refer to the MLX Studio project page for licensing terms.
126
+
127
+ ---
128
+
129
+ ## Status & roadmap
130
+
131
+ - [x] Initial conversion + runtime validation on Apple M3 Ultra
132
+ - [x] Short-form factual QA verified coherent
133
+ - [x] Reasoning mode (`enable_thinking=True`) verified coherent through multi-step chains
134
+ - [ ] Formal benchmark sweep — MMLU, GSM8K, HumanEval, BBH, GPQA — uploading in a follow-up revision
135
+ - [ ] Sampling-config tuning for code and multi-step word problems
136
+ - [ ] **`GLM-5.1-JANG_2S`: currently converting**. JANG_2S uses the `(6, 4, 2)` bit tuple, with tighter critical and important tiers than JANG_1L's `(8, 8, 2)`, for users who want a slightly smaller file footprint at the cost of attention-layer precision. Upload to follow once conversion completes and benchmarks are run head-to-head against JANG_1L.
137
+ - [ ] Additional profile variants (JANG_2L, JANG_3M) under evaluation
138
+
139
+ Questions or issues — contact `eric@jangq.ai`.
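The memory requirement listed under Known limitations in the README can be checked before attempting a load. The following is a minimal sketch, assuming the limit is readable with `sysctl -n iogpu.wired_limit_mb` on Apple Silicon macOS, as the README's `sysctl` command implies. The helper names are illustrative and are not part of MLX Studio.

```python
import subprocess

# Threshold quoted in the README's Known limitations list.
REQUIRED_WIRED_MB = 250_000


def wired_limit_ok(sysctl_output: str, required_mb: int = REQUIRED_WIRED_MB) -> bool:
    """Return True if the reported wired limit (in MB) meets the requirement."""
    return int(sysctl_output.strip()) >= required_mb


def read_wired_limit() -> str:
    """Hypothetical helper: query the limit on Apple Silicon macOS."""
    result = subprocess.run(
        ["sysctl", "-n", "iogpu.wired_limit_mb"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the comparison separate from the `sysctl` call means the check can be exercised on any machine, while `wired_limit_ok(read_wired_limit())` is the form to run on the Mac itself.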
chat_template.jinja ADDED
@@ -0,0 +1,117 @@
1
+ [gMASK]<sop>
2
+ {%- if tools -%}
3
+ {%- macro tool_to_json(tool) -%}
4
+ {%- set ns_tool = namespace(first=true) -%}
5
+ {{ '{' -}}
6
+ {%- for k, v in tool.items() -%}
7
+ {%- if k != 'defer_loading' and k != 'strict' -%}
8
+ {%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
9
+ {%- set ns_tool.first = false -%}
10
+ "{{ k }}": {{ v | tojson(ensure_ascii=False) }}
11
+ {%- endif -%}
12
+ {%- endfor -%}
13
+ {{- '}' -}}
14
+ {%- endmacro -%}
15
+ <|system|>
16
+ # Tools
17
+
18
+ You may call one or more functions to assist with the user query.
19
+
20
+ You are provided with function signatures within <tools></tools> XML tags:
21
+ <tools>
22
+ {% for tool in tools %}
23
+ {%- if 'function' in tool -%}
24
+ {%- set tool = tool['function'] -%}
25
+ {%- endif -%}
26
+ {% if tool.defer_loading is not defined or not tool.defer_loading %}
27
+ {{ tool_to_json(tool) }}
28
+ {% endif %}
29
+ {% endfor %}
30
+ </tools>
31
+
32
+ For each function call, output the function name and arguments within the following XML format:
33
+ <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
34
+ {%- macro visible_text(content) -%}
35
+ {%- if content is string -%}
36
+ {{- content }}
37
+ {%- elif content is iterable and content is not mapping -%}
38
+ {%- for item in content -%}
39
+ {%- if item is mapping and item.type == 'text' -%}
40
+ {{- item.text }}
41
+ {%- elif item is string -%}
42
+ {{- item }}
43
+ {%- endif -%}
44
+ {%- endfor -%}
45
+ {%- else -%}
46
+ {{- content }}
47
+ {%- endif -%}
48
+ {%- endmacro -%}
49
+ {%- set ns = namespace(last_user_index=-1, thinking_indices='') -%}
50
+ {%- for m in messages %}
51
+ {%- if m.role == 'user' %}
52
+ {%- set ns.last_user_index = loop.index0 -%}
53
+ {%- elif m.role == 'assistant' %}
54
+ {%- if m.reasoning_content is string %}
55
+ {%- set ns.thinking_indices = ns.thinking_indices ~ ',' ~ ns.last_user_index ~ ',' -%}
56
+ {%- endif %}
57
+ {%- endif %}
58
+ {%- endfor %}
59
+ {%- set ns.has_thinking = false -%}
60
+ {%- for m in messages -%}
61
+ {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}{% set ns.has_thinking = (',' ~ loop.index0 ~ ',') in ns.thinking_indices -%}
62
+ {%- elif m.role == 'assistant' -%}
63
+ <|assistant|>
64
+ {%- set content = visible_text(m.content) %}
65
+ {%- if m.reasoning_content is string %}
66
+ {%- set reasoning_content = m.reasoning_content %}
67
+ {%- elif '</think>' in content %}
68
+ {%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
69
+ {%- set content = content.split('</think>')[-1] %}
70
+ {%- elif loop.index0 > ns.last_user_index and not (enable_thinking is defined and not enable_thinking) %}
71
+ {%- set reasoning_content = '' %}
72
+ {%- elif loop.index0 < ns.last_user_index and ns.has_thinking %}
73
+ {%- set reasoning_content = '' %}
74
+ {%- endif %}
75
+ {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
76
+ {{ '<think>' + reasoning_content + '</think>'}}
77
+ {%- else -%}
78
+ {{ '</think>' }}
79
+ {%- endif -%}
80
+ {%- if content.strip() -%}
81
+ {{ content.strip() }}
82
+ {%- endif -%}
83
+ {% if m.tool_calls %}
84
+ {% for tc in m.tool_calls %}
85
+ {%- if tc.function %}
86
+ {%- set tc = tc.function %}
87
+ {%- endif %}
88
+ {{- '<tool_call>' + tc.name -}}
89
+ {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
90
+ {% endif %}
91
+ {%- elif m.role == 'tool' -%}
92
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
93
+ {{- '<|observation|>' -}}
94
+ {%- endif %}
95
+ {%- if m.content is string -%}
96
+ {{- '<tool_response>' + m.content + '</tool_response>' -}}
97
+ {%- else -%}
98
+ {{- '<tool_response><tools>\n' -}}
99
+ {% for tr in m.content %}
100
+ {%- for tool in tools -%}
101
+ {%- if 'function' in tool -%}
102
+ {%- set tool = tool['function'] -%}
103
+ {%- endif -%}
104
+ {%- if tool.name == tr.name -%}
105
+ {{- tool_to_json(tool) + '\n' -}}
106
+ {%- endif -%}
107
+ {%- endfor -%}
108
+ {%- endfor -%}
109
+ {{- '</tools></tool_response>' -}}
110
+ {% endif -%}
111
+ {%- elif m.role == 'system' -%}
112
+ <|system|>{{ visible_text(m.content) }}
113
+ {%- endif -%}
114
+ {%- endfor -%}
115
+ {%- if add_generation_prompt -%}
116
+ <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
117
+ {%- endif -%}
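The thinking-related logic in the template above reduces to two string operations: choosing the tag that follows `<|assistant|>` in the generation prompt, and splitting an assistant message at `</think>`. The sketch below mirrors those two pieces in plain Python for illustration. The function names are made up here, and `None` stands in for the Jinja case where `enable_thinking` is not passed at all.

```python
def generation_prefix(enable_thinking=None):
    """Mirror the add_generation_prompt branch of the template.

    Thinking stays on unless enable_thinking is passed and is falsy.
    None models "variable not defined" in the Jinja context.
    """
    disabled = enable_thinking is not None and not enable_thinking
    return "<|assistant|>" + ("</think>" if disabled else "<think>")


def split_reasoning(content):
    """Mirror the template's '</think>' split for assistant messages.

    Returns (reasoning, answer). reasoning is None when the content
    carries no '</think>' tag.
    """
    if "</think>" not in content:
        return None, content.strip()
    reasoning = content.split("</think>")[0].split("<think>")[-1]
    answer = content.split("</think>")[-1]
    return reasoning, answer.strip()
```

With thinking disabled, the prompt already ends in `</think>`, so the model starts directly on the answer. With thinking enabled, the prompt ends in an open `<think>` and the model has to close it before answering.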
config.json ADDED
@@ -0,0 +1,63 @@
1
+ {
2
+ "architectures": [
3
+ "GlmMoeDsaForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "dtype": "bfloat16",
8
+ "eos_token_id": [
9
+ 154820,
10
+ 154827,
11
+ 154829
12
+ ],
13
+ "ep_size": 1,
14
+ "first_k_dense_replace": 3,
15
+ "hidden_act": "silu",
16
+ "head_dim": 64,
17
+ "hidden_size": 6144,
18
+ "index_head_dim": 128,
19
+ "index_n_heads": 32,
20
+ "index_topk": 2048,
21
+ "indexer_rope_interleave": true,
22
+ "initializer_range": 0.02,
23
+ "intermediate_size": 12288,
24
+ "kv_lora_rank": 512,
25
+ "max_position_embeddings": 202752,
26
+ "moe_intermediate_size": 2048,
27
+ "moe_layer_freq": 1,
28
+ "model_type": "glm_moe_dsa",
29
+ "n_group": 1,
30
+ "n_routed_experts": 256,
31
+ "n_shared_experts": 1,
32
+ "norm_topk_prob": true,
33
+ "num_attention_heads": 64,
34
+ "num_experts_per_tok": 8,
35
+ "num_hidden_layers": 78,
36
+ "num_key_value_heads": 64,
37
+ "num_nextn_predict_layers": 1,
38
+ "pad_token_id": 154820,
39
+ "pretraining_tp": 1,
40
+ "q_lora_rank": 2048,
41
+ "qk_head_dim": 256,
42
+ "qk_nope_head_dim": 192,
43
+ "qk_rope_head_dim": 64,
44
+ "rms_norm_eps": 1e-05,
45
+ "rope_interleave": true,
46
+ "rope_parameters": {
47
+ "rope_theta": 1000000,
48
+ "rope_type": "default"
49
+ },
50
+ "routed_scaling_factor": 2.5,
51
+ "scoring_func": "sigmoid",
52
+ "tie_word_embeddings": false,
53
+ "topk_group": 1,
54
+ "topk_method": "noaux_tc",
55
+ "transformers_version": "5.4.0",
56
+ "use_cache": true,
57
+ "v_head_dim": 256,
58
+ "vocab_size": 154880,
59
+ "quantization": {
60
+ "group_size": 64,
61
+ "bits": 2
62
+ }
63
+ }
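The MLA fields in config.json above allow a rough estimate of the attention cache size. The sketch assumes the runtime caches the compressed KV latent (`kv_lora_rank` values) plus the shared RoPE key (`qk_rope_head_dim` values) per token per layer, as in the MLA design, and stores them in bfloat16. Whether MLX Studio actually does this is not stated in the commit, and the DSA indexer's own cache is not counted.

```python
# Values copied from config.json in this commit.
KV_LORA_RANK = 512
QK_ROPE_HEAD_DIM = 64
NUM_HIDDEN_LAYERS = 78
MAX_POSITION_EMBEDDINGS = 202_752

# Assumption: cache entries are stored as bfloat16 (2 bytes each).
BYTES_PER_VALUE = 2


def mla_cache_bytes_per_token(
    kv_lora_rank=KV_LORA_RANK,
    rope_dim=QK_ROPE_HEAD_DIM,
    layers=NUM_HIDDEN_LAYERS,
    bytes_per_value=BYTES_PER_VALUE,
):
    """Bytes of latent KV cache added per token across all layers."""
    return (kv_lora_rank + rope_dim) * layers * bytes_per_value


def mla_cache_bytes(tokens):
    """Total latent KV cache for a sequence of the given length."""
    return tokens * mla_cache_bytes_per_token()


if __name__ == "__main__":
    full = mla_cache_bytes(MAX_POSITION_EMBEDDINGS)
    print(f"per token: {mla_cache_bytes_per_token()} bytes")
    print(f"full context: {full / 1e9:.1f} GB")
```

Under these assumptions the cache comes out on top of the weight footprint, which is relevant when sizing the wired-memory limit for long contexts.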
generation_config.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": [
4
+ 154820,
5
+ 154827,
6
+ 154829
7
+ ],
8
+ "pad_token_id": 154820,
9
+ "temperature": 1.0,
10
+ "top_p": 0.95,
11
+ "transformers_version": "5.4.0"
12
+ }
jang_config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "quantization": {
3
+ "method": "jang-importance",
4
+ "profile": "JANG_1L",
5
+ "target_bits": 1.0,
6
+ "actual_bits": 2.15,
7
+ "block_size": 64,
8
+ "calibration_method": "weights",
9
+ "quantization_method": "mse",
10
+ "scoring_method": "weight-magnitude",
11
+ "bit_widths_used": [
12
+ 2,
13
+ 8
14
+ ],
15
+ "quantization_scheme": "asymmetric",
16
+ "quantization_backend": "mx.quantize",
17
+ "hadamard_rotation": false
18
+ },
19
+ "source_model": {
20
+ "name": "GLM-5.1-FP8",
21
+ "dtype": "bfloat16",
22
+ "parameters": "30.4B"
23
+ },
24
+ "architecture": {
25
+ "type": "moe",
26
+ "attention": "mla",
27
+ "has_vision": false,
28
+ "has_ssm": false,
29
+ "has_moe": true
30
+ },
31
+ "runtime": {
32
+ "total_weight_bytes": 212140032,
33
+ "total_weight_gb": 0.2
34
+ },
35
+ "format": "jang",
36
+ "format_version": "2.0"
37
+ }
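jang_config.json reports `bit_widths_used` of 2 and 8 and `actual_bits` of 2.15. If that figure is a parameter-weighted mean of the two code widths (an assumption, since the file does not say whether scale and bias overhead is included), the share of weights kept at 8 bits follows from a linear mix. The function name below is illustrative.

```python
def high_precision_fraction(avg_bits, low_bits, high_bits):
    """Fraction of weights at high_bits, given a two-width weighted mean.

    Solves avg = f * high + (1 - f) * low for f.
    """
    if high_bits <= low_bits:
        raise ValueError("high_bits must exceed low_bits")
    return (avg_bits - low_bits) / (high_bits - low_bits)


if __name__ == "__main__":
    # Values taken from jang_config.json in this commit.
    frac = high_precision_fraction(avg_bits=2.15, low_bits=2, high_bits=8)
    print(f"share of weights at 8 bits: {frac:.1%}")
```

Under that assumption only a small percentage of the weights sit in the 8-bit tiers, which is consistent with the README's statement that most of the weight budget lives in the routed experts.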
model-00002-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4b083b03daf0e45c3a913730a2ba3cab0c062a75f2e84d6868b22a6c0dfaa5bf
3
+ size 1011056984
model-00007-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0e6e638fd1fd1f270ccaf35b6c04f080864d6a8c0491280aee4ecd434448230
3
+ size 1232036160
model-00008-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb3083b71d6e51223540bfed9e2232fa37de3b7c60271b45b56d4f9ad714ee63
3
+ size 1006633376
model-00015-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b813524f44153ebf8608822e932d1057a7b4f08b65e8b03caf325088f1c2cda6
3
+ size 1006633368
model-00028-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1d8096a3d4fe83103dd83d02c0fb53f4f8ea2a463ebf5492d2cf8e16e7b3830f
3
+ size 1232036160
model-00030-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:96bd901a08a230f7afc2b2c8df6f4e84b218f8226bf91639d24b90a76cc3501a
3
+ size 1006633368
model-00035-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b6f2dccd74bf617d7302a10a0f0d972e2e557b478f45aa105737d8ca8e875067
3
+ size 1055654984
model-00036-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:23d07779e7de0af7cae9d50d81bc77f83956072c2e77c83ebde36bc1084f3e95
3
+ size 1006633376
model-00043-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3286f53888143861ea8e8be1fdfc3a1c0ec5f2cfc4c7a9bdb2b55ebdec8191af
3
+ size 1006633368
model-00046-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a7e3feced79bd137bbfd32c5b81c029d0179a07d90a023621e9647b23fe90c8
3
+ size 1006633368
model-00049-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7bbf83247bc463f3da51625e9d86d9204198af0fb59c5ab7d199ce8c10631afb
3
+ size 1006633368
model-00051-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5376136846c69cf9a59ad1fafb251ad874170fc955b52813a526afc5461ea5c
3
+ size 1006633376
model-00054-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4d2aa474f6756e0d88b95a12e09225825415eaeee5480e415fbfbe17144672db
3
+ size 1006633376
model-00071-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:877db64a210f053d08c2e1611a771df81822fa732d0e50aabb57e77c849b0b6b
3
+ size 1232036160
model-00074-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87adbf03d65886d8596efc9a0ce58ab4132e0de65a2cc368ef52c96583d14174
3
+ size 1232036160
model-00082-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e937b5f49b0a8ccafbe32c3e72f98e1bee87f565bd89b53fd2715205d0ce895b
3
+ size 1006633368
model-00084-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:50a0fffef02cb093407facda820819895c88210515861e942c643dd15362fbb2
3
+ size 1006633376
model-00087-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c9429659347404203391aa1428d38741b59103b418b1e12566a392d6a8810d23
3
+ size 1006633376
model-00088-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:176cb357b5973631069735c7cd26ccd2d90e6d7117014a11b044724773a6a73e
3
+ size 1006633368
model-00109-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a63bb751dfc2d760ce6763ff12cf04bde2ba69f31ab94be97217872a9e14c75
3
+ size 1006633368
model-00111-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b04acfa8728c4d80fb0f34ef2705c72e9115b98bad6871fd84c3ca038da35ee8
3
+ size 1006633376
model-00114-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a88f631625296bc6cb3dc8b35e7dd5334d886cb93c52a648889954054e3f83d
3
+ size 1006633376
model-00123-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f41c1ce4f892d2cbbc9c28de486eac038dfbace2f207c9cbd3bc9ac2d9871e0c
3
+ size 1006633376
model-00126-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e3f0946a6c8abc9fa9a79556e789d75aec4cf8d76f66419262664de60e7e24d9
3
+ size 1006633376
model-00129-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:21f54fbea33dc5961f8e72ece60b0c56c24fdde4bc70466e530f8b4fab30cd32
3
+ size 1006633376
model-00134-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c05416a586e52735e2d7f429143ac4074abb3205d17c11933b9a7f94c6784c20
3
+ size 1232036128
model-00150-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:85b7c0635552217b5afa44212b7b595909ddf3fd77c5f0a2035613bcb630b7f8
3
+ size 1006633376
model-00155-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c1aaccf73a9ea7e67c3366e905d796f2e45dfc6ae4ed3963c139e7c06b3c6f60
3
+ size 1232036160
model-00156-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:92f09ce859ce4f419fc8652c7f66c0f0ae30a889ccff3426e44aaf1c3aad6b62
3
+ size 1006633376
model-00162-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c2acc3fc2afce5ec529f3a2cf53e7b7479e162e4ca89fff181a1bf191608c47e
3
+ size 1006633376
model-00167-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c762c0ec1aa5a0b338bfbf048a345773817dad9222c888b8ea2aa98485bb3ab3
3
+ size 1232036128
model-00168-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:24ee68158f22d267376b02a32178b366572c3ce370188052b6186717552bddfe
3
+ size 1006633376
model-00175-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff9a70353b3092bfa2b81959fb39a376099dac54b319536b60e3572224a921b8
3
+ size 1006633368
model-00183-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e53137fada9f42bab79090179afea5ee6ec5ec673c3a046ce9a6d6619d4021be
3
+ size 1006633376
model-00186-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:97bc971b4d326466ea6498769cfeb162ee69d62851f3f1689c3752256f088665
3
+ size 1006633376
model-00191-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:504bc9236ad48f75686b5409a21836b8b361767e78698e8058e940332af99fd3
3
+ size 1232036160
model-00194-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:58f06e8b94017e1cd1018935348821c4afc8ce26a5cee6b01afb455e78f208c2
3
+ size 1232036160
model-00203-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ed013916d8d2be40d61c707bb1e9948d3d2e307692c20b6fca99d414f149cab0
3
+ size 1232036160
model-00206-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e35da67a89b73d45b64ba234457b49e18364f6601c89940a198eba9f9c36742
3
+ size 1232036160
model-00209-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:83740ff561eb004abc8fe0bbe6de7f27c6bc7a95bdc0cedfd03876ae2336af7e
3
+ size 1232036160
model-00223-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a1f2c96b163f8d7b8871a3eaf2726e363c046235a161f67643c6a8e05fc0e099
3
+ size 1006633368
model-00226-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cde75d4dbd03a11fb23be36d267c177f259e6af0b3c842348cf3bfbb63e0baa5
3
+ size 1006633368
model-00231-of-00233.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26441724dd595ab2ad9d5ee6ab1a144663133dbca531ef460c37f61bb5af84d0
3
+ size 1006633368
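Each `.safetensors` entry above is a Git LFS pointer (spec v1) rather than the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the byte size. After downloading a shard, it can be checked against its pointer. The sketch below uses only the standard library, and the function names are illustrative.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer, path, chunk_size=1 << 20):
    """Stream the file at path and compare its size and SHA-256 to the pointer."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

Streaming in fixed-size chunks keeps memory use flat, which matters for shards of roughly 1 GB each.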
model.safetensors.index.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "backend": "tokenizers",
3
+ "clean_up_tokenization_spaces": false,
4
+ "do_lower_case": false,
5
+ "eos_token": "<|endoftext|>",
6
+ "extra_special_tokens": [
7
+ "<|endoftext|>",
8
+ "[MASK]",
9
+ "[gMASK]",
10
+ "[sMASK]",
11
+ "<sop>",
12
+ "<eop>",
13
+ "<|system|>",
14
+ "<|user|>",
15
+ "<|assistant|>",
16
+ "<|observation|>",
17
+ "<|begin_of_image|>",
18
+ "<|end_of_image|>",
19
+ "<|begin_of_video|>",
20
+ "<|end_of_video|>",
21
+ "<|begin_of_audio|>",
22
+ "<|end_of_audio|>",
23
+ "<|begin_of_transcription|>",
24
+ "<|end_of_transcription|>"
25
+ ],
26
+ "is_local": true,
27
+ "model_max_length": 202752,
28
+ "model_specific_special_tokens": {},
29
+ "pad_token": "<|endoftext|>",
30
+ "padding_side": "left",
31
+ "remove_space": false,
32
+ "tokenizer_class": "TokenizersBackend"
33
+ }