nabi-chan committed on
Commit 7a8c54d · verified · 1 Parent(s): ff747c5

Add files using upload-large-folder tool
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ library_name: mlx
+ license: "apache-2.0"
+ pipeline_tag: text-generation
+ language:
+ - en
+ - ko
+ - zh
+ - ja
+ tags:
+ - mlx
+ - "mlx-4bit"
+ - quantized
+ - safetensors
+ - apple-silicon
+ - Qwen
+ - Qwen3.6
+ - Qwen3_5_moe
+ - reasoning
+ - distillation
+ - chain-of-thought
+ - mixture-of-experts
+ - moe
+ - lora
+ - unsloth
+ - abliterated
+ - uncensored
+ datasets:
+ - lordx64/reasoning-distill-opus-4-7-max-sft
+ base_model:
+ - Qwen/Qwen3.6-35B-A3B
+ - lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
+ - huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated
+ model-index: []
+ ---
+
+ # 🌌 `huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated` converted to MLX 4-bit
+
+ ## About This Quantization
+
+ **Apple Silicon / MLX 4-bit**
+
+ - **Source Model (BF16):** [huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated)
+ - **Quantized By:** [@nabi-chan](https://huggingface.co/nabi-chan)
+
+ ### Quickstart
+
+ #### Install
+
+ ```bash
+ pip install -U "mlx-lm>=0.31.2"
+ ```
+
+ #### Python
+
+ ```python
+ from mlx_lm import load, generate
+
+ model, tokenizer = load("nabi-chan/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MLX-4bit")
+ print(generate(model, tokenizer, prompt="Explain quantum entanglement simply.", max_tokens=128))
+ ```
+
+ #### CLI
+
+ ```bash
+ python3 -m mlx_lm generate \
+     --model nabi-chan/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MLX-4bit \
+     --prompt "Write a haiku about Apple Silicon." \
+     --max-tokens 128
+ ```
+
+ ### Quantization Details
+
+ | Property              | Value                                                 |
+ | --------------------- | ----------------------------------------------------- |
+ | **Method**            | MLX affine quantization                               |
+ | **Bits / weight**     | 4                                                     |
+ | **Group size**        | 64                                                    |
+ | **Non-quant dtype**   | bfloat16                                              |
+ | **Quantizer version** | `mlx`: 0.31.2 / `mlx-lm`: 0.31.3 / `mlx-vlm`: 0.4.4   |
+
+ > [!WARNING]
+ > Protected tensors keep their original dtype. In VLM models, vision tensors and some guarded layers may remain unquantized.
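The settings in the table can be illustrated with a minimal, self-contained sketch of group-wise affine quantization: each group of 64 weights gets its own scale and bias, and each weight is stored in 4 bits. This is only the arithmetic idea, not MLX's actual kernels; the synthetic 64-weight group is an illustration.

```python
# Illustrative sketch of group-wise affine quantization (NOT MLX's
# implementation): per-group scale/bias, 4-bit integer codes 0..15.

def quantize_group(weights, bits=4):
    """Quantize one group of float weights to `bits`-bit integers."""
    qmax = (1 << bits) - 1                       # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0        # avoid zero scale
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min                       # codes + affine params

def dequantize_group(q, scale, bias):
    return [v * scale + bias for v in q]

group = [0.02 * i - 0.5 for i in range(64)]      # one group of 64 weights
q, scale, bias = quantize_group(group)
restored = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

assert all(0 <= v <= 15 for v in q)              # codes fit in 4 bits
assert max_err <= scale / 2 + 1e-9               # error ≤ half a quant step
```

The per-group scale and bias are what the "group size 64" setting controls: smaller groups track the weight distribution more closely at the cost of more stored metadata.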
+
+ ---
+
+ Everything below is huihui-ai's original model card, preserved verbatim.
+
+ ---
+
+ # huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated
+
+ This is an uncensored version of [lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
+ This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.
+
+ ## ollama
+
+ Please use the latest version of [ollama](https://github.com/ollama/ollama/releases)
+
+ You can use [huihui_ai/qwen3.6-abliterated:35b-Claude-4.7](https://ollama.com/huihui_ai/qwen3.6-abliterated:35b-Claude-4.7) directly:
+ ```
+ ollama run huihui_ai/qwen3.6-abliterated:35b-Claude-4.7
+ ```
+
+ ### Usage Warnings
+
+ - **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
+
+ - **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
+
+ - **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
+
+ - **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
+
+ - **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
+
+ - **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
+
+ ### Donation
+ ##### Your donation helps us continue our further development and improvement, a cup of coffee can do it.
+ - bitcoin:
+ ```
+ bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
+ ```
+ - Support our work on [Ko-fi](https://ko-fi.com/huihuiai)!
chat_template.jinja ADDED
@@ -0,0 +1,158 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- set num_sys = 0 %}
+ {%- set merged_system = '' %}
+ {%- if messages[0].role == 'system' or messages[0].role == 'developer' %}
+ {%- set first = render_content(messages[0].content, false, true)|trim %}
+ {%- if messages|length > 1 and (messages[1].role == 'system' or messages[1].role == 'developer') %}
+ {%- set second = render_content(messages[1].content, false, true)|trim %}
+ {%- set merged_system = first + '\n' + second %}
+ {%- set num_sys = 2 %}
+ {%- else %}
+ {%- set merged_system = first %}
+ {%- set num_sys = 1 %}
+ {%- endif %}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if merged_system %}
+ {{- '\n\n' + merged_system }}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if merged_system %}
+ {{- '<|im_start|>system\n' + merged_system + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+ {%- if loop.index0 >= num_sys and message.role != "system" and message.role != "developer" %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is mapping %}
+ {%- for args_name in tool_call.arguments %}
+ {%- set args_value = tool_call.arguments[args_name] %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is false %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- else %}
+ {{- '<think>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {#- Unsloth fixes - developer role, tool calling #}
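The template's reasoning-retention rule (keyed off `ns.last_query_index`) can be sketched in plain Python: `<think>` content is kept only for assistant messages that come after the last "real" user query, i.e. the last user message that is not just a `<tool_response>` wrapper, unless `preserve_thinking` is set. This is an illustrative re-implementation, not the template itself:

```python
# Sketch of the chat template's <think>-retention logic (illustrative).

def last_query_index(messages):
    """Index of the last user message that is not a <tool_response> wrapper."""
    for i in range(len(messages) - 1, -1, -1):
        m = messages[i]
        if m["role"] == "user":
            c = m["content"].strip()
            if not (c.startswith("<tool_response>") and c.endswith("</tool_response>")):
                return i
    return len(messages) - 1          # template's default when nothing matches

def keeps_think(i, lqi, preserve_thinking=False):
    """Does assistant message i keep its <think> block when re-rendered?"""
    return preserve_thinking or i > lqi

msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "<think>easy</think>4"},
    {"role": "user", "content": "And 3+3?"},
    {"role": "assistant", "content": "<think>also easy</think>6"},
]
lqi = last_query_index(msgs)
assert lqi == 2                       # the second user turn
# Only the assistant turn AFTER the last query keeps its reasoning:
assert [keeps_think(i, lqi) for i, m in enumerate(msgs)
        if m["role"] == "assistant"] == [False, True]
```

Dropping reasoning from earlier turns keeps the re-fed context short while the final turn's chain-of-thought stays available for continuation.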
config.json ADDED
@@ -0,0 +1,2055 @@
+ {
+   "architectures": [
+     "Qwen3_5MoeForConditionalGeneration"
+   ],
+   "bos_token_id": null,
+   "eos_token_id": 248046,
+   "image_token_id": 248056,
+   "model_name": "unsloth/Qwen3.6-35B-A3B",
+   "model_type": "qwen3_5_moe",
+   "pad_token_id": 248055,
+   "quantization": {
+     "group_size": 64,
+     "bits": 4,
+     "mode": "affine",
+     "language_model.model.embed_tokens": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.0.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.0.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.0.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.0.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.0.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.1.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.1.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.1.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.1.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.1.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.2.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.2.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.2.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.2.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.2.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.3.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.3.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.3.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.3.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.4.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.4.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.4.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.4.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.4.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.5.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.5.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.5.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.5.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.5.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.6.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.6.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.6.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.6.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.6.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.7.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.7.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.7.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.7.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.8.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.8.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.8.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.8.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.8.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.9.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.9.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.9.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.9.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.9.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.10.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.10.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.10.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.10.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.10.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.11.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.11.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.11.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.11.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.12.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.12.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.12.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.12.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.12.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.13.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.13.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.13.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.13.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.13.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.14.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.14.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.14.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.14.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.14.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.15.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.15.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.15.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.15.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.16.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.16.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.16.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.16.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.16.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.17.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.17.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.17.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.17.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.17.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.18.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.18.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.18.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.18.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.18.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.19.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.19.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.19.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.19.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.20.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.20.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.20.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.20.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.20.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.21.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.21.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.21.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.21.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.21.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.22.linear_attn.out_proj": {
+       "bits": 5,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.22.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.22.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.22.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.22.mlp.shared_expert_gate": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.23.mlp.shared_expert.gate_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.23.mlp.shared_expert.down_proj": {
+       "bits": 8,
+       "group_size": 64,
+       "mode": "affine"
+     },
+     "language_model.model.layers.23.mlp.shared_expert.up_proj": {
+       "bits": 8,
+       "group_size": 64,
583
+ "mode": "affine"
584
+ },
585
+ "language_model.model.layers.23.mlp.shared_expert_gate": {
586
+ "bits": 8,
587
+ "group_size": 64,
588
+ "mode": "affine"
589
+ },
590
+ "language_model.model.layers.24.linear_attn.out_proj": {
591
+ "bits": 5,
592
+ "group_size": 64,
593
+ "mode": "affine"
594
+ },
595
+ "language_model.model.layers.24.mlp.shared_expert.gate_proj": {
596
+ "bits": 8,
597
+ "group_size": 64,
598
+ "mode": "affine"
599
+ },
600
+ "language_model.model.layers.24.mlp.shared_expert.down_proj": {
601
+ "bits": 8,
602
+ "group_size": 64,
603
+ "mode": "affine"
604
+ },
605
+ "language_model.model.layers.24.mlp.shared_expert.up_proj": {
606
+ "bits": 8,
607
+ "group_size": 64,
608
+ "mode": "affine"
609
+ },
610
+ "language_model.model.layers.24.mlp.shared_expert_gate": {
611
+ "bits": 8,
612
+ "group_size": 64,
613
+ "mode": "affine"
614
+ },
615
+ "language_model.model.layers.25.linear_attn.out_proj": {
616
+ "bits": 5,
617
+ "group_size": 64,
618
+ "mode": "affine"
619
+ },
620
+ "language_model.model.layers.25.mlp.shared_expert.gate_proj": {
621
+ "bits": 8,
622
+ "group_size": 64,
623
+ "mode": "affine"
624
+ },
625
+ "language_model.model.layers.25.mlp.shared_expert.down_proj": {
626
+ "bits": 8,
627
+ "group_size": 64,
628
+ "mode": "affine"
629
+ },
630
+ "language_model.model.layers.25.mlp.shared_expert.up_proj": {
631
+ "bits": 8,
632
+ "group_size": 64,
633
+ "mode": "affine"
634
+ },
635
+ "language_model.model.layers.25.mlp.shared_expert_gate": {
636
+ "bits": 8,
637
+ "group_size": 64,
638
+ "mode": "affine"
639
+ },
640
+ "language_model.model.layers.26.linear_attn.out_proj": {
641
+ "bits": 5,
642
+ "group_size": 64,
643
+ "mode": "affine"
644
+ },
645
+ "language_model.model.layers.26.mlp.shared_expert.gate_proj": {
646
+ "bits": 8,
647
+ "group_size": 64,
648
+ "mode": "affine"
649
+ },
650
+ "language_model.model.layers.26.mlp.shared_expert.down_proj": {
651
+ "bits": 8,
652
+ "group_size": 64,
653
+ "mode": "affine"
654
+ },
655
+ "language_model.model.layers.26.mlp.shared_expert.up_proj": {
656
+ "bits": 8,
657
+ "group_size": 64,
658
+ "mode": "affine"
659
+ },
660
+ "language_model.model.layers.26.mlp.shared_expert_gate": {
661
+ "bits": 8,
662
+ "group_size": 64,
663
+ "mode": "affine"
664
+ },
665
+ "language_model.model.layers.27.mlp.shared_expert.gate_proj": {
666
+ "bits": 8,
667
+ "group_size": 64,
668
+ "mode": "affine"
669
+ },
670
+ "language_model.model.layers.27.mlp.shared_expert.down_proj": {
671
+ "bits": 8,
672
+ "group_size": 64,
673
+ "mode": "affine"
674
+ },
675
+ "language_model.model.layers.27.mlp.shared_expert.up_proj": {
676
+ "bits": 8,
677
+ "group_size": 64,
678
+ "mode": "affine"
679
+ },
680
+ "language_model.model.layers.27.mlp.shared_expert_gate": {
681
+ "bits": 8,
682
+ "group_size": 64,
683
+ "mode": "affine"
684
+ },
685
+ "language_model.model.layers.28.linear_attn.out_proj": {
686
+ "bits": 5,
687
+ "group_size": 64,
688
+ "mode": "affine"
689
+ },
690
+ "language_model.model.layers.28.mlp.shared_expert.gate_proj": {
691
+ "bits": 8,
692
+ "group_size": 64,
693
+ "mode": "affine"
694
+ },
695
+ "language_model.model.layers.28.mlp.shared_expert.down_proj": {
696
+ "bits": 8,
697
+ "group_size": 64,
698
+ "mode": "affine"
699
+ },
700
+ "language_model.model.layers.28.mlp.shared_expert.up_proj": {
701
+ "bits": 8,
702
+ "group_size": 64,
703
+ "mode": "affine"
704
+ },
705
+ "language_model.model.layers.28.mlp.shared_expert_gate": {
706
+ "bits": 8,
707
+ "group_size": 64,
708
+ "mode": "affine"
709
+ },
710
+ "language_model.model.layers.29.linear_attn.out_proj": {
711
+ "bits": 5,
712
+ "group_size": 64,
713
+ "mode": "affine"
714
+ },
715
+ "language_model.model.layers.29.mlp.shared_expert.gate_proj": {
716
+ "bits": 8,
717
+ "group_size": 64,
718
+ "mode": "affine"
719
+ },
720
+ "language_model.model.layers.29.mlp.shared_expert.down_proj": {
721
+ "bits": 8,
722
+ "group_size": 64,
723
+ "mode": "affine"
724
+ },
725
+ "language_model.model.layers.29.mlp.shared_expert.up_proj": {
726
+ "bits": 8,
727
+ "group_size": 64,
728
+ "mode": "affine"
729
+ },
730
+ "language_model.model.layers.29.mlp.shared_expert_gate": {
731
+ "bits": 8,
732
+ "group_size": 64,
733
+ "mode": "affine"
734
+ },
735
+ "language_model.model.layers.30.linear_attn.out_proj": {
736
+ "bits": 5,
737
+ "group_size": 64,
738
+ "mode": "affine"
739
+ },
740
+ "language_model.model.layers.30.mlp.shared_expert.gate_proj": {
741
+ "bits": 8,
742
+ "group_size": 64,
743
+ "mode": "affine"
744
+ },
745
+ "language_model.model.layers.30.mlp.shared_expert.down_proj": {
746
+ "bits": 8,
747
+ "group_size": 64,
748
+ "mode": "affine"
749
+ },
750
+ "language_model.model.layers.30.mlp.shared_expert.up_proj": {
751
+ "bits": 8,
752
+ "group_size": 64,
753
+ "mode": "affine"
754
+ },
755
+ "language_model.model.layers.30.mlp.shared_expert_gate": {
756
+ "bits": 8,
757
+ "group_size": 64,
758
+ "mode": "affine"
759
+ },
760
+ "language_model.model.layers.31.mlp.shared_expert.gate_proj": {
761
+ "bits": 8,
762
+ "group_size": 64,
763
+ "mode": "affine"
764
+ },
765
+ "language_model.model.layers.31.mlp.shared_expert.down_proj": {
766
+ "bits": 8,
767
+ "group_size": 64,
768
+ "mode": "affine"
769
+ },
770
+ "language_model.model.layers.31.mlp.shared_expert.up_proj": {
771
+ "bits": 8,
772
+ "group_size": 64,
773
+ "mode": "affine"
774
+ },
775
+ "language_model.model.layers.31.mlp.shared_expert_gate": {
776
+ "bits": 8,
777
+ "group_size": 64,
778
+ "mode": "affine"
779
+ },
780
+ "language_model.model.layers.32.linear_attn.out_proj": {
781
+ "bits": 5,
782
+ "group_size": 64,
783
+ "mode": "affine"
784
+ },
785
+ "language_model.model.layers.32.mlp.shared_expert.gate_proj": {
786
+ "bits": 8,
787
+ "group_size": 64,
788
+ "mode": "affine"
789
+ },
790
+ "language_model.model.layers.32.mlp.shared_expert.down_proj": {
791
+ "bits": 8,
792
+ "group_size": 64,
793
+ "mode": "affine"
794
+ },
795
+ "language_model.model.layers.32.mlp.shared_expert.up_proj": {
796
+ "bits": 8,
797
+ "group_size": 64,
798
+ "mode": "affine"
799
+ },
800
+ "language_model.model.layers.32.mlp.shared_expert_gate": {
801
+ "bits": 8,
802
+ "group_size": 64,
803
+ "mode": "affine"
804
+ },
805
+ "language_model.model.layers.33.linear_attn.out_proj": {
806
+ "bits": 5,
807
+ "group_size": 64,
808
+ "mode": "affine"
809
+ },
810
+ "language_model.model.layers.33.mlp.shared_expert.gate_proj": {
811
+ "bits": 8,
812
+ "group_size": 64,
813
+ "mode": "affine"
814
+ },
815
+ "language_model.model.layers.33.mlp.shared_expert.down_proj": {
816
+ "bits": 8,
817
+ "group_size": 64,
818
+ "mode": "affine"
819
+ },
820
+ "language_model.model.layers.33.mlp.shared_expert.up_proj": {
821
+ "bits": 8,
822
+ "group_size": 64,
823
+ "mode": "affine"
824
+ },
825
+ "language_model.model.layers.33.mlp.shared_expert_gate": {
826
+ "bits": 8,
827
+ "group_size": 64,
828
+ "mode": "affine"
829
+ },
830
+ "language_model.model.layers.34.linear_attn.out_proj": {
831
+ "bits": 5,
832
+ "group_size": 64,
833
+ "mode": "affine"
834
+ },
835
+ "language_model.model.layers.34.mlp.shared_expert.gate_proj": {
836
+ "bits": 8,
837
+ "group_size": 64,
838
+ "mode": "affine"
839
+ },
840
+ "language_model.model.layers.34.mlp.shared_expert.down_proj": {
841
+ "bits": 8,
842
+ "group_size": 64,
843
+ "mode": "affine"
844
+ },
845
+ "language_model.model.layers.34.mlp.shared_expert.up_proj": {
846
+ "bits": 8,
847
+ "group_size": 64,
848
+ "mode": "affine"
849
+ },
850
+ "language_model.model.layers.34.mlp.shared_expert_gate": {
851
+ "bits": 8,
852
+ "group_size": 64,
853
+ "mode": "affine"
854
+ },
855
+ "language_model.model.layers.35.mlp.shared_expert.gate_proj": {
856
+ "bits": 8,
857
+ "group_size": 64,
858
+ "mode": "affine"
859
+ },
860
+ "language_model.model.layers.35.mlp.shared_expert.down_proj": {
861
+ "bits": 8,
862
+ "group_size": 64,
863
+ "mode": "affine"
864
+ },
865
+ "language_model.model.layers.35.mlp.shared_expert.up_proj": {
866
+ "bits": 8,
867
+ "group_size": 64,
868
+ "mode": "affine"
869
+ },
870
+ "language_model.model.layers.35.mlp.shared_expert_gate": {
871
+ "bits": 8,
872
+ "group_size": 64,
873
+ "mode": "affine"
874
+ },
875
+ "language_model.model.layers.36.linear_attn.out_proj": {
876
+ "bits": 5,
877
+ "group_size": 64,
878
+ "mode": "affine"
879
+ },
880
+ "language_model.model.layers.36.mlp.shared_expert.gate_proj": {
881
+ "bits": 8,
882
+ "group_size": 64,
883
+ "mode": "affine"
884
+ },
885
+ "language_model.model.layers.36.mlp.shared_expert.down_proj": {
886
+ "bits": 8,
887
+ "group_size": 64,
888
+ "mode": "affine"
889
+ },
890
+ "language_model.model.layers.36.mlp.shared_expert.up_proj": {
891
+ "bits": 8,
892
+ "group_size": 64,
893
+ "mode": "affine"
894
+ },
895
+ "language_model.model.layers.36.mlp.shared_expert_gate": {
896
+ "bits": 8,
897
+ "group_size": 64,
898
+ "mode": "affine"
899
+ },
900
+ "language_model.model.layers.37.linear_attn.out_proj": {
901
+ "bits": 5,
902
+ "group_size": 64,
903
+ "mode": "affine"
904
+ },
905
+ "language_model.model.layers.37.mlp.shared_expert.gate_proj": {
906
+ "bits": 8,
907
+ "group_size": 64,
908
+ "mode": "affine"
909
+ },
910
+ "language_model.model.layers.37.mlp.shared_expert.down_proj": {
911
+ "bits": 8,
912
+ "group_size": 64,
913
+ "mode": "affine"
914
+ },
915
+ "language_model.model.layers.37.mlp.shared_expert.up_proj": {
916
+ "bits": 8,
917
+ "group_size": 64,
918
+ "mode": "affine"
919
+ },
920
+ "language_model.model.layers.37.mlp.shared_expert_gate": {
921
+ "bits": 8,
922
+ "group_size": 64,
923
+ "mode": "affine"
924
+ },
925
+ "language_model.model.layers.38.linear_attn.out_proj": {
926
+ "bits": 5,
927
+ "group_size": 64,
928
+ "mode": "affine"
929
+ },
930
+ "language_model.model.layers.38.mlp.shared_expert.gate_proj": {
931
+ "bits": 8,
932
+ "group_size": 64,
933
+ "mode": "affine"
934
+ },
935
+ "language_model.model.layers.38.mlp.shared_expert.down_proj": {
936
+ "bits": 8,
937
+ "group_size": 64,
938
+ "mode": "affine"
939
+ },
940
+ "language_model.model.layers.38.mlp.shared_expert.up_proj": {
941
+ "bits": 8,
942
+ "group_size": 64,
943
+ "mode": "affine"
944
+ },
945
+ "language_model.model.layers.38.mlp.shared_expert_gate": {
946
+ "bits": 8,
947
+ "group_size": 64,
948
+ "mode": "affine"
949
+ },
950
+ "language_model.model.layers.39.mlp.shared_expert.gate_proj": {
951
+ "bits": 8,
952
+ "group_size": 64,
953
+ "mode": "affine"
954
+ },
955
+ "language_model.model.layers.39.mlp.shared_expert.down_proj": {
956
+ "bits": 8,
957
+ "group_size": 64,
958
+ "mode": "affine"
959
+ },
960
+ "language_model.model.layers.39.mlp.shared_expert.up_proj": {
961
+ "bits": 8,
962
+ "group_size": 64,
963
+ "mode": "affine"
964
+ },
965
+ "language_model.model.layers.39.mlp.shared_expert_gate": {
966
+ "bits": 8,
967
+ "group_size": 64,
968
+ "mode": "affine"
969
+ },
970
+ "language_model.lm_head": {
971
+ "bits": 8,
972
+ "group_size": 64,
973
+ "mode": "affine"
974
+ }
975
+ },
976
+ "quantization_config": {
977
+ "group_size": 64,
978
+ "bits": 4,
979
+ "mode": "affine",
980
+ "language_model.model.embed_tokens": {
981
+ "bits": 8,
982
+ "group_size": 64,
983
+ "mode": "affine"
984
+ },
985
+ "language_model.model.layers.0.linear_attn.out_proj": {
986
+ "bits": 5,
987
+ "group_size": 64,
988
+ "mode": "affine"
989
+ },
990
+ "language_model.model.layers.0.mlp.shared_expert.gate_proj": {
991
+ "bits": 8,
992
+ "group_size": 64,
993
+ "mode": "affine"
994
+ },
995
+ "language_model.model.layers.0.mlp.shared_expert.down_proj": {
996
+ "bits": 8,
997
+ "group_size": 64,
998
+ "mode": "affine"
999
+ },
1000
+ "language_model.model.layers.0.mlp.shared_expert.up_proj": {
1001
+ "bits": 8,
1002
+ "group_size": 64,
1003
+ "mode": "affine"
1004
+ },
1005
+ "language_model.model.layers.0.mlp.shared_expert_gate": {
1006
+ "bits": 8,
1007
+ "group_size": 64,
1008
+ "mode": "affine"
1009
+ },
1010
+ "language_model.model.layers.1.linear_attn.out_proj": {
1011
+ "bits": 5,
1012
+ "group_size": 64,
1013
+ "mode": "affine"
1014
+ },
1015
+ "language_model.model.layers.1.mlp.shared_expert.gate_proj": {
1016
+ "bits": 8,
1017
+ "group_size": 64,
1018
+ "mode": "affine"
1019
+ },
1020
+ "language_model.model.layers.1.mlp.shared_expert.down_proj": {
1021
+ "bits": 8,
1022
+ "group_size": 64,
1023
+ "mode": "affine"
1024
+ },
1025
+ "language_model.model.layers.1.mlp.shared_expert.up_proj": {
1026
+ "bits": 8,
1027
+ "group_size": 64,
1028
+ "mode": "affine"
1029
+ },
1030
+ "language_model.model.layers.1.mlp.shared_expert_gate": {
1031
+ "bits": 8,
1032
+ "group_size": 64,
1033
+ "mode": "affine"
1034
+ },
1035
+ "language_model.model.layers.2.linear_attn.out_proj": {
1036
+ "bits": 5,
1037
+ "group_size": 64,
1038
+ "mode": "affine"
1039
+ },
1040
+ "language_model.model.layers.2.mlp.shared_expert.gate_proj": {
1041
+ "bits": 8,
1042
+ "group_size": 64,
1043
+ "mode": "affine"
1044
+ },
1045
+ "language_model.model.layers.2.mlp.shared_expert.down_proj": {
1046
+ "bits": 8,
1047
+ "group_size": 64,
1048
+ "mode": "affine"
1049
+ },
1050
+ "language_model.model.layers.2.mlp.shared_expert.up_proj": {
1051
+ "bits": 8,
1052
+ "group_size": 64,
1053
+ "mode": "affine"
1054
+ },
1055
+ "language_model.model.layers.2.mlp.shared_expert_gate": {
1056
+ "bits": 8,
1057
+ "group_size": 64,
1058
+ "mode": "affine"
1059
+ },
1060
+ "language_model.model.layers.3.mlp.shared_expert.gate_proj": {
1061
+ "bits": 8,
1062
+ "group_size": 64,
1063
+ "mode": "affine"
1064
+ },
1065
+ "language_model.model.layers.3.mlp.shared_expert.down_proj": {
1066
+ "bits": 8,
1067
+ "group_size": 64,
1068
+ "mode": "affine"
1069
+ },
1070
+ "language_model.model.layers.3.mlp.shared_expert.up_proj": {
1071
+ "bits": 8,
1072
+ "group_size": 64,
1073
+ "mode": "affine"
1074
+ },
1075
+ "language_model.model.layers.3.mlp.shared_expert_gate": {
1076
+ "bits": 8,
1077
+ "group_size": 64,
1078
+ "mode": "affine"
1079
+ },
1080
+ "language_model.model.layers.4.linear_attn.out_proj": {
1081
+ "bits": 5,
1082
+ "group_size": 64,
1083
+ "mode": "affine"
1084
+ },
1085
+ "language_model.model.layers.4.mlp.shared_expert.gate_proj": {
1086
+ "bits": 8,
1087
+ "group_size": 64,
1088
+ "mode": "affine"
1089
+ },
1090
+ "language_model.model.layers.4.mlp.shared_expert.down_proj": {
1091
+ "bits": 8,
1092
+ "group_size": 64,
1093
+ "mode": "affine"
1094
+ },
1095
+ "language_model.model.layers.4.mlp.shared_expert.up_proj": {
1096
+ "bits": 8,
1097
+ "group_size": 64,
1098
+ "mode": "affine"
1099
+ },
1100
+ "language_model.model.layers.4.mlp.shared_expert_gate": {
1101
+ "bits": 8,
1102
+ "group_size": 64,
1103
+ "mode": "affine"
1104
+ },
1105
+ "language_model.model.layers.5.linear_attn.out_proj": {
1106
+ "bits": 5,
1107
+ "group_size": 64,
1108
+ "mode": "affine"
1109
+ },
1110
+ "language_model.model.layers.5.mlp.shared_expert.gate_proj": {
1111
+ "bits": 8,
1112
+ "group_size": 64,
1113
+ "mode": "affine"
1114
+ },
1115
+ "language_model.model.layers.5.mlp.shared_expert.down_proj": {
1116
+ "bits": 8,
1117
+ "group_size": 64,
1118
+ "mode": "affine"
1119
+ },
1120
+ "language_model.model.layers.5.mlp.shared_expert.up_proj": {
1121
+ "bits": 8,
1122
+ "group_size": 64,
1123
+ "mode": "affine"
1124
+ },
1125
+ "language_model.model.layers.5.mlp.shared_expert_gate": {
1126
+ "bits": 8,
1127
+ "group_size": 64,
1128
+ "mode": "affine"
1129
+ },
1130
+ "language_model.model.layers.6.linear_attn.out_proj": {
1131
+ "bits": 5,
1132
+ "group_size": 64,
1133
+ "mode": "affine"
1134
+ },
1135
+ "language_model.model.layers.6.mlp.shared_expert.gate_proj": {
1136
+ "bits": 8,
1137
+ "group_size": 64,
1138
+ "mode": "affine"
1139
+ },
1140
+ "language_model.model.layers.6.mlp.shared_expert.down_proj": {
1141
+ "bits": 8,
1142
+ "group_size": 64,
1143
+ "mode": "affine"
1144
+ },
1145
+ "language_model.model.layers.6.mlp.shared_expert.up_proj": {
1146
+ "bits": 8,
1147
+ "group_size": 64,
1148
+ "mode": "affine"
1149
+ },
1150
+ "language_model.model.layers.6.mlp.shared_expert_gate": {
1151
+ "bits": 8,
1152
+ "group_size": 64,
1153
+ "mode": "affine"
1154
+ },
1155
+ "language_model.model.layers.7.mlp.shared_expert.gate_proj": {
1156
+ "bits": 8,
1157
+ "group_size": 64,
1158
+ "mode": "affine"
1159
+ },
1160
+ "language_model.model.layers.7.mlp.shared_expert.down_proj": {
1161
+ "bits": 8,
1162
+ "group_size": 64,
1163
+ "mode": "affine"
1164
+ },
1165
+ "language_model.model.layers.7.mlp.shared_expert.up_proj": {
1166
+ "bits": 8,
1167
+ "group_size": 64,
1168
+ "mode": "affine"
1169
+ },
1170
+ "language_model.model.layers.7.mlp.shared_expert_gate": {
1171
+ "bits": 8,
1172
+ "group_size": 64,
1173
+ "mode": "affine"
1174
+ },
1175
+ "language_model.model.layers.8.linear_attn.out_proj": {
1176
+ "bits": 5,
1177
+ "group_size": 64,
1178
+ "mode": "affine"
1179
+ },
1180
+ "language_model.model.layers.8.mlp.shared_expert.gate_proj": {
1181
+ "bits": 8,
1182
+ "group_size": 64,
1183
+ "mode": "affine"
1184
+ },
1185
+ "language_model.model.layers.8.mlp.shared_expert.down_proj": {
1186
+ "bits": 8,
1187
+ "group_size": 64,
1188
+ "mode": "affine"
1189
+ },
1190
+ "language_model.model.layers.8.mlp.shared_expert.up_proj": {
1191
+ "bits": 8,
1192
+ "group_size": 64,
1193
+ "mode": "affine"
1194
+ },
1195
+ "language_model.model.layers.8.mlp.shared_expert_gate": {
1196
+ "bits": 8,
1197
+ "group_size": 64,
1198
+ "mode": "affine"
1199
+ },
1200
+ "language_model.model.layers.9.linear_attn.out_proj": {
1201
+ "bits": 5,
1202
+ "group_size": 64,
1203
+ "mode": "affine"
1204
+ },
1205
+ "language_model.model.layers.9.mlp.shared_expert.gate_proj": {
1206
+ "bits": 8,
1207
+ "group_size": 64,
1208
+ "mode": "affine"
1209
+ },
1210
+ "language_model.model.layers.9.mlp.shared_expert.down_proj": {
1211
+ "bits": 8,
1212
+ "group_size": 64,
1213
+ "mode": "affine"
1214
+ },
1215
+ "language_model.model.layers.9.mlp.shared_expert.up_proj": {
1216
+ "bits": 8,
1217
+ "group_size": 64,
1218
+ "mode": "affine"
1219
+ },
1220
+ "language_model.model.layers.9.mlp.shared_expert_gate": {
1221
+ "bits": 8,
1222
+ "group_size": 64,
1223
+ "mode": "affine"
1224
+ },
1225
+ "language_model.model.layers.10.linear_attn.out_proj": {
1226
+ "bits": 5,
1227
+ "group_size": 64,
1228
+ "mode": "affine"
1229
+ },
1230
+ "language_model.model.layers.10.mlp.shared_expert.gate_proj": {
1231
+ "bits": 8,
1232
+ "group_size": 64,
1233
+ "mode": "affine"
1234
+ },
1235
+ "language_model.model.layers.10.mlp.shared_expert.down_proj": {
1236
+ "bits": 8,
1237
+ "group_size": 64,
1238
+ "mode": "affine"
1239
+ },
1240
+ "language_model.model.layers.10.mlp.shared_expert.up_proj": {
1241
+ "bits": 8,
1242
+ "group_size": 64,
1243
+ "mode": "affine"
1244
+ },
1245
+ "language_model.model.layers.10.mlp.shared_expert_gate": {
1246
+ "bits": 8,
1247
+ "group_size": 64,
1248
+ "mode": "affine"
1249
+ },
1250
+ "language_model.model.layers.11.mlp.shared_expert.gate_proj": {
1251
+ "bits": 8,
1252
+ "group_size": 64,
1253
+ "mode": "affine"
1254
+ },
1255
+ "language_model.model.layers.11.mlp.shared_expert.down_proj": {
1256
+ "bits": 8,
1257
+ "group_size": 64,
1258
+ "mode": "affine"
1259
+ },
1260
+ "language_model.model.layers.11.mlp.shared_expert.up_proj": {
1261
+ "bits": 8,
1262
+ "group_size": 64,
1263
+ "mode": "affine"
1264
+ },
1265
+ "language_model.model.layers.11.mlp.shared_expert_gate": {
1266
+ "bits": 8,
1267
+ "group_size": 64,
1268
+ "mode": "affine"
1269
+ },
1270
+ "language_model.model.layers.12.linear_attn.out_proj": {
1271
+ "bits": 5,
1272
+ "group_size": 64,
1273
+ "mode": "affine"
1274
+ },
1275
+ "language_model.model.layers.12.mlp.shared_expert.gate_proj": {
1276
+ "bits": 8,
1277
+ "group_size": 64,
1278
+ "mode": "affine"
1279
+ },
1280
+ "language_model.model.layers.12.mlp.shared_expert.down_proj": {
1281
+ "bits": 8,
1282
+ "group_size": 64,
1283
+ "mode": "affine"
1284
+ },
1285
+ "language_model.model.layers.12.mlp.shared_expert.up_proj": {
1286
+ "bits": 8,
1287
+ "group_size": 64,
1288
+ "mode": "affine"
1289
+ },
1290
+ "language_model.model.layers.12.mlp.shared_expert_gate": {
1291
+ "bits": 8,
1292
+ "group_size": 64,
1293
+ "mode": "affine"
1294
+ },
1295
+ "language_model.model.layers.13.linear_attn.out_proj": {
1296
+ "bits": 5,
1297
+ "group_size": 64,
1298
+ "mode": "affine"
1299
+ },
1300
+ "language_model.model.layers.13.mlp.shared_expert.gate_proj": {
1301
+ "bits": 8,
1302
+ "group_size": 64,
1303
+ "mode": "affine"
1304
+ },
1305
+ "language_model.model.layers.13.mlp.shared_expert.down_proj": {
1306
+ "bits": 8,
1307
+ "group_size": 64,
1308
+ "mode": "affine"
1309
+ },
1310
+ "language_model.model.layers.13.mlp.shared_expert.up_proj": {
1311
+ "bits": 8,
1312
+ "group_size": 64,
1313
+ "mode": "affine"
1314
+ },
1315
+ "language_model.model.layers.13.mlp.shared_expert_gate": {
1316
+ "bits": 8,
1317
+ "group_size": 64,
1318
+ "mode": "affine"
1319
+ },
1320
+ "language_model.model.layers.14.linear_attn.out_proj": {
1321
+ "bits": 5,
1322
+ "group_size": 64,
1323
+ "mode": "affine"
1324
+ },
1325
+ "language_model.model.layers.14.mlp.shared_expert.gate_proj": {
1326
+ "bits": 8,
1327
+ "group_size": 64,
1328
+ "mode": "affine"
1329
+ },
1330
+ "language_model.model.layers.14.mlp.shared_expert.down_proj": {
1331
+ "bits": 8,
1332
+ "group_size": 64,
1333
+ "mode": "affine"
1334
+ },
1335
+ "language_model.model.layers.14.mlp.shared_expert.up_proj": {
1336
+ "bits": 8,
1337
+ "group_size": 64,
1338
+ "mode": "affine"
1339
+ },
1340
+ "language_model.model.layers.14.mlp.shared_expert_gate": {
1341
+ "bits": 8,
1342
+ "group_size": 64,
1343
+ "mode": "affine"
1344
+ },
1345
+ "language_model.model.layers.15.mlp.shared_expert.gate_proj": {
1346
+ "bits": 8,
1347
+ "group_size": 64,
1348
+ "mode": "affine"
1349
+ },
1350
+ "language_model.model.layers.15.mlp.shared_expert.down_proj": {
1351
+ "bits": 8,
1352
+ "group_size": 64,
1353
+ "mode": "affine"
1354
+ },
1355
+ "language_model.model.layers.15.mlp.shared_expert.up_proj": {
1356
+ "bits": 8,
1357
+ "group_size": 64,
1358
+ "mode": "affine"
1359
+ },
1360
+ "language_model.model.layers.15.mlp.shared_expert_gate": {
1361
+ "bits": 8,
1362
+ "group_size": 64,
1363
+ "mode": "affine"
1364
+ },
1365
+ "language_model.model.layers.16.linear_attn.out_proj": {
1366
+ "bits": 5,
1367
+ "group_size": 64,
1368
+ "mode": "affine"
1369
+ },
1370
+ "language_model.model.layers.16.mlp.shared_expert.gate_proj": {
1371
+ "bits": 8,
1372
+ "group_size": 64,
1373
+ "mode": "affine"
1374
+ },
1375
+ "language_model.model.layers.16.mlp.shared_expert.down_proj": {
1376
+ "bits": 8,
1377
+ "group_size": 64,
1378
+ "mode": "affine"
1379
+ },
1380
+ "language_model.model.layers.16.mlp.shared_expert.up_proj": {
1381
+ "bits": 8,
1382
+ "group_size": 64,
1383
+ "mode": "affine"
1384
+ },
1385
+ "language_model.model.layers.16.mlp.shared_expert_gate": {
1386
+ "bits": 8,
1387
+ "group_size": 64,
1388
+ "mode": "affine"
1389
+ },
1390
+ "language_model.model.layers.17.linear_attn.out_proj": {
1391
+ "bits": 5,
1392
+ "group_size": 64,
1393
+ "mode": "affine"
1394
+ },
1395
+ "language_model.model.layers.17.mlp.shared_expert.gate_proj": {
1396
+ "bits": 8,
1397
+ "group_size": 64,
1398
+ "mode": "affine"
1399
+ },
1400
+ "language_model.model.layers.17.mlp.shared_expert.down_proj": {
1401
+ "bits": 8,
1402
+ "group_size": 64,
1403
+ "mode": "affine"
1404
+ },
1405
+ "language_model.model.layers.17.mlp.shared_expert.up_proj": {
1406
+ "bits": 8,
1407
+ "group_size": 64,
1408
+ "mode": "affine"
1409
+ },
1410
+ "language_model.model.layers.17.mlp.shared_expert_gate": {
1411
+ "bits": 8,
1412
+ "group_size": 64,
1413
+ "mode": "affine"
1414
+ },
1415
+ "language_model.model.layers.18.linear_attn.out_proj": {
1416
+ "bits": 5,
1417
+ "group_size": 64,
1418
+ "mode": "affine"
1419
+ },
1420
+ "language_model.model.layers.18.mlp.shared_expert.gate_proj": {
1421
+ "bits": 8,
1422
+ "group_size": 64,
1423
+ "mode": "affine"
1424
+ },
1425
+ "language_model.model.layers.18.mlp.shared_expert.down_proj": {
1426
+ "bits": 8,
1427
+ "group_size": 64,
1428
+ "mode": "affine"
1429
+ },
1430
+ "language_model.model.layers.18.mlp.shared_expert.up_proj": {
1431
+ "bits": 8,
1432
+ "group_size": 64,
1433
+ "mode": "affine"
1434
+ },
1435
+ "language_model.model.layers.18.mlp.shared_expert_gate": {
1436
+ "bits": 8,
1437
+ "group_size": 64,
1438
+ "mode": "affine"
1439
+ },
1440
+ "language_model.model.layers.19.mlp.shared_expert.gate_proj": {
1441
+ "bits": 8,
1442
+ "group_size": 64,
1443
+ "mode": "affine"
1444
+ },
1445
+ "language_model.model.layers.19.mlp.shared_expert.down_proj": {
1446
+ "bits": 8,
1447
+ "group_size": 64,
1448
+ "mode": "affine"
1449
+ },
1450
+ "language_model.model.layers.19.mlp.shared_expert.up_proj": {
1451
+ "bits": 8,
1452
+ "group_size": 64,
1453
+ "mode": "affine"
1454
+ },
1455
+ "language_model.model.layers.19.mlp.shared_expert_gate": {
1456
+ "bits": 8,
1457
+ "group_size": 64,
1458
+ "mode": "affine"
1459
+ },
1460
+ "language_model.model.layers.20.linear_attn.out_proj": {
1461
+ "bits": 5,
1462
+ "group_size": 64,
1463
+ "mode": "affine"
1464
+ },
1465
+ "language_model.model.layers.20.mlp.shared_expert.gate_proj": {
1466
+ "bits": 8,
1467
+ "group_size": 64,
1468
+ "mode": "affine"
1469
+ },
1470
+ "language_model.model.layers.20.mlp.shared_expert.down_proj": {
1471
+ "bits": 8,
1472
+ "group_size": 64,
1473
+ "mode": "affine"
1474
+ },
1475
+ "language_model.model.layers.20.mlp.shared_expert.up_proj": {
1476
+ "bits": 8,
1477
+ "group_size": 64,
1478
+ "mode": "affine"
1479
+ },
1480
+ "language_model.model.layers.20.mlp.shared_expert_gate": {
1481
+ "bits": 8,
1482
+ "group_size": 64,
1483
+ "mode": "affine"
1484
+ },
1485
+ "language_model.model.layers.21.linear_attn.out_proj": {
1486
+ "bits": 5,
1487
+ "group_size": 64,
1488
+ "mode": "affine"
1489
+ },
1490
+ "language_model.model.layers.21.mlp.shared_expert.gate_proj": {
1491
+ "bits": 8,
1492
+ "group_size": 64,
1493
+ "mode": "affine"
1494
+ },
1495
+ "language_model.model.layers.21.mlp.shared_expert.down_proj": {
1496
+ "bits": 8,
1497
+ "group_size": 64,
1498
+ "mode": "affine"
1499
+ },
1500
+ "language_model.model.layers.21.mlp.shared_expert.up_proj": {
1501
+ "bits": 8,
1502
+ "group_size": 64,
1503
+ "mode": "affine"
1504
+ },
1505
+ "language_model.model.layers.21.mlp.shared_expert_gate": {
1506
+ "bits": 8,
1507
+ "group_size": 64,
1508
+ "mode": "affine"
1509
+ },
1510
+ "language_model.model.layers.22.linear_attn.out_proj": {
1511
+ "bits": 5,
1512
+ "group_size": 64,
1513
+ "mode": "affine"
1514
+ },
1515
+ "language_model.model.layers.22.mlp.shared_expert.gate_proj": {
1516
+ "bits": 8,
1517
+ "group_size": 64,
1518
+ "mode": "affine"
1519
+ },
1520
+ "language_model.model.layers.22.mlp.shared_expert.down_proj": {
1521
+ "bits": 8,
1522
+ "group_size": 64,
1523
+ "mode": "affine"
1524
+ },
1525
+ "language_model.model.layers.22.mlp.shared_expert.up_proj": {
1526
+ "bits": 8,
1527
+ "group_size": 64,
1528
+ "mode": "affine"
1529
+ },
1530
+ "language_model.model.layers.22.mlp.shared_expert_gate": {
1531
+ "bits": 8,
1532
+ "group_size": 64,
1533
+ "mode": "affine"
1534
+ },
1535
+ "language_model.model.layers.23.mlp.shared_expert.gate_proj": {
1536
+ "bits": 8,
1537
+ "group_size": 64,
1538
+ "mode": "affine"
1539
+ },
1540
+ "language_model.model.layers.23.mlp.shared_expert.down_proj": {
1541
+ "bits": 8,
1542
+ "group_size": 64,
1543
+ "mode": "affine"
1544
+ },
1545
+ "language_model.model.layers.23.mlp.shared_expert.up_proj": {
1546
+ "bits": 8,
1547
+ "group_size": 64,
1548
+ "mode": "affine"
1549
+ },
1550
+ "language_model.model.layers.23.mlp.shared_expert_gate": {
1551
+ "bits": 8,
1552
+ "group_size": 64,
1553
+ "mode": "affine"
1554
+ },
1555
+ "language_model.model.layers.24.linear_attn.out_proj": {
1556
+ "bits": 5,
1557
+ "group_size": 64,
1558
+ "mode": "affine"
1559
+ },
1560
+ "language_model.model.layers.24.mlp.shared_expert.gate_proj": {
1561
+ "bits": 8,
1562
+ "group_size": 64,
1563
+ "mode": "affine"
1564
+ },
1565
+ "language_model.model.layers.24.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.24.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.24.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.25.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.25.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.25.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.25.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.25.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.26.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.26.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.26.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.26.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.26.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.27.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.27.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.27.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.27.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.28.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.28.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.28.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.28.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.28.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.29.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.29.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.29.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.29.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.29.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.30.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.30.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.30.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.30.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.30.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.31.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.31.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.31.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.31.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.32.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.32.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.32.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.32.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.32.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.33.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.33.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.33.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.33.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.33.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.34.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.34.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.34.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.34.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.34.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.35.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.35.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.35.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.35.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.36.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.36.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.36.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.36.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.36.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.37.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.37.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.37.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.37.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.37.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.38.linear_attn.out_proj": {
+ "bits": 5,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.38.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.38.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.38.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.38.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.39.mlp.shared_expert.gate_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.39.mlp.shared_expert.down_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.39.mlp.shared_expert.up_proj": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.model.layers.39.mlp.shared_expert_gate": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ },
+ "language_model.lm_head": {
+ "bits": 8,
+ "group_size": 64,
+ "mode": "affine"
+ }
+ },
+ "text_config": {
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "attn_output_gate": true,
+ "bos_token_id": 248044,
+ "torch_dtype": "bfloat16",
+ "eos_token_id": 248044,
+ "full_attention_interval": 4,
+ "head_dim": 256,
+ "hidden_act": "silu",
+ "hidden_size": 2048,
+ "initializer_range": 0.02,
+ "layer_types": [
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention"
+ ],
+ "linear_conv_kernel_dim": 4,
+ "linear_key_head_dim": 128,
+ "linear_num_key_heads": 16,
+ "linear_num_value_heads": 32,
+ "linear_value_head_dim": 128,
+ "mamba_ssm_dtype": "float32",
+ "max_position_embeddings": 262144,
+ "model_type": "qwen3_5_moe_text",
+ "moe_intermediate_size": 512,
+ "mtp_num_hidden_layers": 1,
+ "mtp_use_dedicated_embeddings": false,
+ "num_attention_heads": 16,
+ "num_experts": 256,
+ "num_experts_per_tok": 8,
+ "num_hidden_layers": 40,
+ "num_key_value_heads": 2,
+ "output_router_logits": false,
+ "pad_token_id": null,
+ "partial_rotary_factor": 0.25,
+ "rms_norm_eps": 1e-06,
+ "rope_parameters": {
+ "mrope_interleaved": true,
+ "mrope_section": [
+ 11,
+ 11,
+ 10
+ ],
+ "partial_rotary_factor": 0.25,
+ "rope_theta": 10000000,
+ "rope_type": "default"
+ },
+ "router_aux_loss_coef": 0.001,
+ "shared_expert_intermediate_size": 512,
+ "tie_word_embeddings": false,
+ "use_cache": true,
+ "vocab_size": 248320
+ },
+ "tie_word_embeddings": false,
+ "unsloth_version": "2026.4.1",
+ "use_cache": false,
+ "video_token_id": 248057,
+ "vision_config": {
+ "deepstack_visual_indexes": [],
+ "depth": 27,
+ "torch_dtype": "bfloat16",
+ "hidden_act": "gelu_pytorch_tanh",
+ "hidden_size": 1152,
+ "in_channels": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 4304,
+ "model_type": "qwen3_5_moe",
+ "num_heads": 16,
+ "num_position_embeddings": 2304,
+ "out_hidden_size": 2048,
+ "patch_size": 16,
+ "spatial_merge_size": 2,
+ "temporal_patch_size": 2
+ },
+ "vision_end_token_id": 248054,
+ "vision_start_token_id": 248053
+ }
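The quantization map above is mixed-precision: most weights use the 4-bit default, while `linear_attn.out_proj` layers get 5 bits and the shared experts and `lm_head` get 8 bits, all with `group_size` 64. As a rough sketch of what this costs in storage (assuming, as is typical for affine group quantization, one fp16 scale and one fp16 bias per group, 4 extra bytes per 64 weights):

```python
# Approximate storage per weight under affine group quantization.
# Assumption: each group of `group_size` weights carries one fp16 scale
# and one fp16 bias (4 bytes of overhead per group).

def bytes_per_weight(bits: int, group_size: int) -> float:
    return bits / 8 + 4 / group_size

if __name__ == "__main__":
    for bits in (4, 5, 8):
        # e.g. 4-bit / group 64 -> 0.5625 bytes per weight (~4.5 bits effective)
        print(f"{bits}-bit, group 64: {bytes_per_weight(bits, 64):.4f} bytes/weight")
```

This is only a back-of-the-envelope estimate, not the exact MLX on-disk layout, but it shows why the shard sizes below land near 0.6 bytes per parameter overall.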
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:319e5e1da1998dd3f3b560c70264e774b7179551b688978967431759034cb979
+ size 5264829442
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e18b0de1261664d946e77cf424d53fdec8113ad885914c7e8dd0f281b81f1661
+ size 5237819359
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2213014cb6bb1ae03e995c835b4841af9405f638b7a006329ed366d4d28f05fa
+ size 5242546225
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cec870bc367e623523488a32f66b60d07d2a63b6e166c3dab0e57a54032f28a
+ size 5279591509
model.safetensors.index.json ADDED
The diff for this file is too large to render.
processor_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+ "image_processor": {
+ "data_format": "channels_first",
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_processor_type": "Qwen2VLImageProcessorFast",
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "merge_size": 2,
+ "patch_size": 16,
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "size": {
+ "longest_edge": 16777216,
+ "shortest_edge": 65536
+ },
+ "temporal_patch_size": 2
+ },
+ "processor_class": "Qwen3VLProcessor",
+ "video_processor": {
+ "data_format": "channels_first",
+ "default_to_square": true,
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_rescale": true,
+ "do_resize": true,
+ "do_sample_frames": true,
+ "fps": 2,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "max_frames": 768,
+ "merge_size": 2,
+ "min_frames": 4,
+ "patch_size": 16,
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "return_metadata": false,
+ "size": {
+ "longest_edge": 25165824,
+ "shortest_edge": 4096
+ },
+ "temporal_patch_size": 2,
+ "video_processor_type": "Qwen3VLVideoProcessor"
+ }
+ }
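The processor config above rescales pixel values by `rescale_factor` (1/255) and then normalizes with `image_mean` = `image_std` = 0.5 per channel, which maps uint8 pixels into the range [-1.0, 1.0]. A minimal sketch of that per-pixel arithmetic:

```python
# Per-pixel preprocessing implied by the image_processor settings above:
# rescale by 1/255 (rescale_factor), then normalize with mean=0.5, std=0.5.
# This maps uint8 values 0..255 onto -1.0..1.0.

def normalize_pixel(p: int) -> float:
    rescaled = p / 255          # matches rescale_factor 0.00392156862745098
    return (rescaled - 0.5) / 0.5

print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # 1.0
```

This is a scalar illustration only; the real `Qwen2VLImageProcessorFast` applies the same formula over whole channel-first tensors.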
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+ size 19989343
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "add_prefix_space": false,
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "image_token": "<|image_pad|>",
+ "is_local": false,
+ "model_max_length": 262144,
+ "model_specific_special_tokens": {
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "image_token": "<|image_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ },
+ "pad_token": "<|vision_pad|>",
+ "padding_side": "right",
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+ "processor_class": "Qwen3VLProcessor",
+ "split_special_tokens": false,
+ "tokenizer_class": "TokenizersBackend",
+ "unk_token": null,
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>",
+ "chat_template": "{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set image_count.value = image_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Picture ' ~ image_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|image_pad|><|vision_end|>' }}\n {%- elif 'video' in item or item.type == 'video' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain videos.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set video_count.value = video_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Video ' ~ video_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|video_pad|><|vision_end|>' }}\n {%- elif 'text' in item %}\n {{- item.text }}\n {%- else %}\n {{- raise_exception('Unexpected item type in content.') }}\n {%- endif %}\n {%- endfor %}\n {%- elif content is none or content is undefined %}\n {{- '' }}\n {%- else %}\n {{- raise_exception('Unexpected content type.') }}\n {%- endif %}\n{%- endmacro %}\n{%- if not messages %}\n {{- raise_exception('No messages provided.') }}\n{%- endif %}\n{%- set num_sys = 0 %}\n{%- set merged_system = '' %}\n{%- if messages[0].role == 'system' or messages[0].role == 'developer' %}\n {%- set first = render_content(messages[0].content, false, true)|trim %}\n {%- if messages|length > 1 and (messages[1].role == 'system' or messages[1].role == 'developer') %}\n {%- set second = render_content(messages[1].content, false, true)|trim %}\n {%- set merged_system = first + '\\n' + second %}\n {%- set num_sys = 2 %}\n {%- else %}\n {%- set merged_system = first %}\n {%- set num_sys = 1 %}\n {%- endif %}\n{%- endif %}\n{%- if tools and tools is iterable and tools is not mapping %}\n {{- '<|im_start|>system\\n' }}\n {{- \"# Tools\\n\\nYou have access to the following functions:\\n\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\" }}\n {{- '\\n\\nIf you choose to call a function ONLY reply in the following format with NO suffix:\\n\\n<tool_call>\\n<function=example_function_name>\\n<parameter=example_parameter_1>\\nvalue_1\\n</parameter>\\n<parameter=example_parameter_2>\\nThis is the value for the second parameter\\nthat can span\\nmultiple lines\\n</parameter>\\n</function>\\n</tool_call>\\n\\n<IMPORTANT>\\nReminder:\\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\\n- Required parameters MUST be specified\\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\\n</IMPORTANT>' }}\n {%- if merged_system %}\n {{- '\\n\\n' + merged_system }}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n{%- else %}\n {%- if merged_system %}\n {{- '<|im_start|>system\\n' + merged_system + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" %}\n {%- set content = render_content(message.content, false)|trim %}\n {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if loop.index0 >= num_sys and message.role != \"system\" and message.role != \"developer\" %}\n {%- set content = render_content(message.content, true)|trim %}\n {%- if message.role == \"user\" %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- set reasoning_content = reasoning_content|trim %}\n {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content + '\\n</think>\\n\\n' + content }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {%- if loop.first %}\n {%- if content|trim %}\n {{- '\\n\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- else %}\n {{- '<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- else %}\n {{- '\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- if tool_call.arguments is mapping %}\n {%- for args_name in tool_call.arguments %}\n {%- set args_value = tool_call.arguments[args_name] %}\n {{- '<parameter=' + args_name + '>\\n' }}\n {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}\n {{- args_value }}\n {{- '\\n</parameter>\\n' }}\n {%- endfor %}\n {%- endif %}\n {{- '</function>\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.previtem and loop.previtem.role != \"tool\" %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if not loop.last and loop.nextitem.role != \"tool\" %}\n {{- '<|im_end|>\\n' }}\n {%- elif loop.last %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- else %}\n {{- '<think>\\n' }}\n {%- endif %}\n{%- endif %}\n{#- Unsloth fixes - developer role, tool calling #}"
+ }
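The `chat_template` above renders a ChatML-style conversation (`<|im_start|>role ... <|im_end|>`) and, when `add_generation_prompt` is set, pre-opens a `<think>` block for the assistant unless `enable_thinking` is false. A minimal sketch of that basic text-only layout (not the actual Jinja template; system merging, tool calls, and vision tokens are omitted):

```python
# Minimal illustration of the ChatML-style layout produced by the
# chat_template for plain text messages. This is a sketch, NOT the real
# Jinja template: developer/system merging, tools, and vision are skipped.

def render(messages, add_generation_prompt=True, enable_thinking=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        # The template opens <think> for reasoning, or emits an empty
        # <think></think> pair when thinking is disabled.
        out.append("<think>\n" if enable_thinking else "<think>\n\n</think>\n\n")
    return "".join(out)

print(render([{"role": "user", "content": "Hi"}]))
```

In practice you would call `tokenizer.apply_chat_template(...)` rather than this helper; the sketch only shows the token layout to expect in prompts and stop conditions (`<|im_end|>` is the `eos_token`).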