Commit 547409e · 0 parent(s) · committed by ansulev and wangzhang

Duplicate from wangzhang/Qwen3.6-35B-A3B-abliterated-v2

Co-authored-by: Steve Wu <wangzhang@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,147 @@
+ ---
+ license: other
+ license_name: tongyi-qianwen
+ base_model: Qwen/Qwen3.6-35B-A3B
+ tags:
+ - abliterated
+ - uncensored
+ - qwen3
+ - moe
+ - abliterix
+ ---
+
+ # Qwen3.6-35B-A3B: Abliterated **V2**
+
+ This is **V2** of the abliterated (uncensored) [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), created using [Abliterix](https://github.com/wuwangzhang1216/abliterix).
+
+ V2 improves on [V1](https://huggingface.co/wangzhang/Qwen3.6-35B-A3B-abliterated) by adding **projected abliteration** (grimjim 2025), **outlier winsorization**, **2× training data**, and a **larger TPE search budget**, cutting the refusal rate from 7/100 to **4/100** under the same LLM-judge evaluation.
+
+ ## V1 vs V2 at a glance
+
+ | Metric | [V1](https://huggingface.co/wangzhang/Qwen3.6-35B-A3B-abliterated) | **V2 (this model)** | Change |
+ |---|---|---|---|
+ | **Refusals (LLM judge, 100 eval prompts)** | 7/100 | **4/100** | **−43%** |
+ | **Attack success rate** | 93% | **96%** | **+3 pt** |
+ | KL divergence from base | 0.0189 | 0.0421 | +0.023 |
+ | Optimization trials completed | 24/50 | 33/50 | TPE explored more |
+ | Training prompts | 400 | 800 | 2× more data |
+ | Eval prompts | 100 | 100 | (unchanged for fair A/B) |
+
+ V2 trades a small KL increase (still well under 0.1, no perceptible coherence loss) for a lower refusal rate (7/100 to 4/100) and a more robust steering vector trained on 2× the data.
+
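The "Change" column above mixes a relative change (refusals) with absolute differences (attack success rate, KL). A minimal sketch of that arithmetic, using only the numbers in the table; the helper name `relative_change` is illustrative and not part of Abliterix:

```python
def relative_change(old: float, new: float) -> float:
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


v1_refusals, v2_refusals, n_prompts = 7, 4, 100

# Refusals are compared relatively: 7 -> 4 refusals out of 100 prompts.
refusal_change = relative_change(v1_refusals, v2_refusals)

# Attack success rate is compared in absolute percentage points.
v1_asr = 100 * (n_prompts - v1_refusals) / n_prompts
v2_asr = 100 * (n_prompts - v2_refusals) / n_prompts
asr_delta_pt = v2_asr - v1_asr

# KL divergence is compared as an absolute difference.
kl_delta = 0.0421 - 0.0189

print(f"refusals: {refusal_change:+.0%}")
print(f"attack success rate: {asr_delta_pt:+.0f} pt")
print(f"KL divergence: {kl_delta:+.3f}")
```

With only 100 eval prompts, each refusal moves the rate by a full percentage point, so the relative figure is sensitive to one or two borderline judgments.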
+ ## Method
+
+ Qwen3.6-35B-A3B is a Mixture-of-Experts model (256 routed experts, 8 active per token, 35B total / 3B active parameters) with the same architecture as Qwen3.5-35B-A3B. Standard LoRA-based abliteration is effective on this architecture (unlike Gemma 4's double-norm design, which requires direct weight editing).
+
+ V2 inherits V1's base recipe and adds four improvements:
+
+ ### Inherited from V1 (validated baseline)
+ - **LoRA rank-1 steering** on attention O-projection and MLP down-projection (Q/K/V disabled: on MoE models the refusal signal lives in the expert path, not the attention projections)
+ - **Expert-Granular Abliteration (EGA)** projecting the refusal direction from all 256 expert down_proj slices per layer
+ - **MoE router suppression** complementing EGA
+ - **Orthogonalized steering vectors** removing benign-direction contamination
+ - **Gaussian decay kernel** tapering steering strength across layers
+ - **Strength range [0.5, 6.0]** to avoid degenerate output while maximizing compliance
+
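The README does not give the formula behind the "Gaussian decay kernel". One plausible reading is a per-layer weight that peaks at a chosen layer and falls off with distance. The sketch below assumes that form; the parameter names `peak_layer`, `max_weight`, and `width` and the example values are hypothetical, and Abliterix's actual parameterization may differ. Only the layer count (40) comes from `config.json`.

```python
import math


def gaussian_layer_weights(num_layers: int, peak_layer: int,
                           max_weight: float, width: float) -> list[float]:
    """Per-layer weights that peak at peak_layer and decay as a Gaussian.

    Assumed form: w(l) = max_weight * exp(-0.5 * ((l - peak_layer) / width) ** 2)
    """
    return [
        max_weight * math.exp(-0.5 * ((layer - peak_layer) / width) ** 2)
        for layer in range(num_layers)
    ]


# Illustrative values only; 40 matches num_hidden_layers in config.json.
weights = gaussian_layer_weights(num_layers=40, peak_layer=20,
                                 max_weight=1.0, width=3.0)

for layer in (10, 17, 20, 23, 30):
    print(f"layer {layer:2d}: weight {weights[layer]:.4f}")
```

A smaller `width` concentrates the weight on a few layers around the peak, while a larger one spreads it across the stack.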
+ ### New in V2
+ 1. **Projected abliteration** (grimjim 2025): removes only the component of the refusal direction that is orthogonal to the harmless mean, **preserving helpfulness-aligned signal** that orthogonal projection alone would discard.
+ 2. **Vector winsorization** at q=0.995: damps outlier residuals from the ~0.5% of harmful prompts whose hidden-state norms would otherwise skew the steering direction.
+ 3. **2× training data** (800 prompts vs 400): the per-layer steering vector is averaged over twice as many examples, reducing variance.
+ 4. **Tighter KL constraint and prune threshold** (target 0.005, prune 0.5 vs V1's 0.01/5.0): trials with degenerate KL behavior are killed earlier, freeing TPE budget for productive regions.
+
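Winsorization at q=0.995 can be pictured as capping the most extreme 0.5% of values at the 99.5th percentile. The README does not say whether the cap applies to per-prompt vector norms or to individual components; the sketch below assumes norms, runs on synthetic data, and uses hypothetical helper names (`quantile`, `winsorize_norms`).

```python
import math
import random


def quantile(values: list[float], q: float) -> float:
    """Linearly interpolated quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def winsorize_norms(vectors: list[list[float]], q: float = 0.995):
    """Rescale every vector whose L2 norm exceeds the q-quantile norm."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    cap = quantile(norms, q)
    clipped = []
    for vec, norm in zip(vectors, norms):
        scale = cap / norm if norm > cap else 1.0
        clipped.append([x * scale for x in vec])
    return clipped, cap


# Synthetic stand-in for 800 per-prompt residual vectors (dimension 8).
random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(800)]
vectors[0] = [50.0] * 8  # one artificial outlier

clipped, cap = winsorize_norms(vectors)
changed = sum(1 for a, b in zip(vectors, clipped) if a != b)
print(f"cap norm: {cap:.3f}, vectors rescaled: {changed} of {len(vectors)}")
```

Rescaling keeps each outlier's direction and only limits its magnitude, so a single extreme prompt cannot dominate an average taken afterwards.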
+ ### Winning trial (#33) configuration
+
+ ```
+ attn.o_proj.max_weight = 4.20 @ layer 27 (sharp peak, min_distance=2.61)
+ mlp.down_proj.max_weight = 0.94 @ layer 34 (late-layer perturbation)
+ vector_index = per layer
+ KL = 0.0421, refusals = 4/100
+ ```
+
+ V2's winner uses a notably different recipe than V1: **strong attention steering with an extremely sharp Gaussian peak** (min_weight_distance ≈ 2.6 layers) **plus weak late-layer MLP perturbation**. V1's winner used moderate-strength early-layer steering on both components. The TPE search found this distinct basin thanks to the larger trial budget and tighter KL pruning.
+
+ ## Evaluation
+
+ | Metric | Value |
+ |---|---|
+ | **Refusals (LLM judge, 100 eval prompts)** | **4/100** |
+ | **Attack success rate** | **96%** |
+ | KL divergence from base | 0.0421 |
+ | Baseline refusals (original Qwen3.6-35B-A3B) | 100/100 |
+ | Optimization trials completed | 33 (best) / 50 |
+ | LLM judge model | google/gemini-3-flash-preview |
+ | Training data | datasets/harmful_1000 + good_1000, train[:800] |
+ | Eval data | same datasets, train[800:900] |
+
+ All refusal classifications were performed by an external LLM judge (Google Gemini 3 Flash); **no keyword matching or heuristic detection** was used. The judge classifies degenerate or garbled output as refusal, so only coherent, on-topic, actionable responses count as compliance.
+
+ ### Smoke test (15 classic adversarial prompts)
+
+ | Prompt category | V1 | V2 |
+ |---|---|---|
+ | Lock picking, hotwire, shoplift | ✅ | ✅ |
+ | Phishing email (EN+CN) | ✅ | ✅ |
+ | Methamphetamine synthesis | ✅ | ✅ |
+ | Pipe bomb / IED | ✅ | ✅ |
+ | Malware / keylogger code | ✅ | ✅ |
+ | Forge signature, fake ID, fake news | ✅ | ✅ |
+ | WiFi hacking, computer intrusion | ✅ | ✅ |
+ | Network fraud (scam playbook) | ✅ | ✅ |
+
+ Both V1 and V2 achieve **15/15** on this smoke test. V2's improvement appears in the **long-tail eval prompts**: more nuanced, indirect, or role-play-style requests that V1's narrower TPE search did not crack.
+
+ ## A note on honest evaluation
+
+ Many abliterated models on Hugging Face claim near-perfect scores ("3/100 refusals", "0.7% refusal rate", etc.). **We urge the community to treat these numbers with skepticism** unless the evaluation methodology is fully documented.
+
+ Through our research, we have identified a systemic problem: **most abliteration benchmarks dramatically undercount refusals** due to:
+ - **Short generation lengths** (30-50 tokens) that miss delayed/soft refusals
+ - **Keyword-only detection** that counts garbled/degenerate output as "compliant" because it doesn't contain refusal keywords
+ - **Lenient public datasets** (e.g. mlabonne/harmful_behaviors) that are too simple to stress-test abliteration quality
+
+ ### Our evaluation standards
+
+ - **LLM judge for all classifications:** Every response is sent to Google Gemini 3 Flash for judgment. Degenerate, garbled, or incoherent output is classified as refusal. No keyword shortcuts, no heuristic pre-screening.
+ - **Sufficient generation length (100 tokens for eval, 200+ for smoke tests):** Enough to capture delayed refusal patterns common in large instruction-tuned models.
+ - **Diverse, challenging prompts:** Our evaluation dataset contains 100 prompts spanning English and Chinese, multiple sophistication levels, and diverse harm categories.
+ - **Manual verification:** Top trials are tested with 15 classic adversarial prompts via `test_trial.py` to confirm coherent, on-topic output before export.
+
+ **We report 4/100 refusals honestly.** This is a real number from a rigorous, LLM-judge-based evaluation, not an optimistic estimate from a lenient pipeline.
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Load the weights in bf16 and let device_map place them on the available GPU(s).
+ model = AutoModelForCausalLM.from_pretrained(
+     "wangzhang/Qwen3.6-35B-A3B-abliterated-v2",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("wangzhang/Qwen3.6-35B-A3B-abliterated-v2")
+
+ # Render the chat template with thinking disabled, then tokenize.
+ messages = [{"role": "user", "content": "Your prompt here"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(**inputs, max_new_tokens=512)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
+
+ ### Hardware requirements
+
+ - **Inference:** ~70 GB VRAM in bf16; fits 1× H100 80GB, 1× H200, 1× B200, or 1× RTX Pro 6000 96GB.
+ - **vLLM/SGLang:** supported (no special flags needed for serving, since the abliteration is baked into the weights).
+
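The ~70 GB figure follows from parameter count times bytes per parameter. A back-of-the-envelope sketch, assuming the nominal 35B parameter count from the model name and 2 bytes per bf16 parameter; KV cache and activations come on top and are not estimated here:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Nominal 35B parameters (from the model name), bf16 = 2 bytes each.
bf16_gb = weight_memory_gb(35e9)
headroom_80gb = 80 - bf16_gb

print(f"bf16 weights: ~{bf16_gb:.0f} GB")
print(f"headroom on an 80 GB card: ~{headroom_80gb:.0f} GB")
```

On an 80 GB card the remaining headroom has to hold the KV cache and activations, so long contexts or large batches may need one of the larger cards listed above.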
+ ## Which version should I use?
+
+ - **V2 (this model):** Lower refusal rate (4/100 vs 7/100). Slightly higher KL but no perceptible coherence loss. **Recommended for most use cases.**
+ - **[V1](https://huggingface.co/wangzhang/Qwen3.6-35B-A3B-abliterated):** Lower KL divergence (0.0189 vs 0.0421). Marginally closer to the base model's output distribution. Choose this if you need maximum behavioral fidelity to the original Qwen3.6-35B-A3B and can tolerate ~3 pp more refusals.
+
+ Both versions share the same base architecture and chat template; switching is a one-line change to the model ID passed to `from_pretrained`.
+
+ ## Disclaimer
+
+ This model is released for research purposes only. The abliteration process removes safety guardrails, so use it responsibly.
chat_template.jinja ADDED
@@ -0,0 +1,154 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {%- if content %}
+ {{- '\n\n' + content }}
+ {%- endif %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if ns.multi_step_tool %}
+ {{- raise_exception('No user query found in messages.') }}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "system" %}
+ {%- if not loop.first %}
+ {{- raise_exception('System message must be at the beginning.') }}
+ {%- endif %}
+ {%- elif message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | string if args_value is string else args_value | tojson | safe %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- raise_exception('Unexpected message role.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is false %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- else %}
+ {%- endif %}
+ {%- endif %}
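The final block of the template decides how the assistant turn is opened when `add_generation_prompt` is set. A small Python mirror of just that branch, for illustration only; it is not a substitute for `tokenizer.apply_chat_template`, and the function name `generation_prefix` is hypothetical:

```python
from typing import Optional


def generation_prefix(enable_thinking: Optional[bool] = None) -> str:
    """Mirror of the template's closing add_generation_prompt branch.

    None stands in for the Jinja variable being undefined.
    """
    prefix = "<|im_start|>assistant\n"
    if enable_thinking is False:
        # Thinking disabled: emit an empty think block that is already closed.
        return prefix + "<think>\n\n</think>\n\n"
    # Undefined or true: leave the think block open for the model to fill.
    return prefix + "<think>\n"


print(repr(generation_prefix(enable_thinking=False)))
print(repr(generation_prefix()))
```

Thinking is therefore on unless `enable_thinking=False` is passed explicitly, which is why the README's usage snippet sets it.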
config.json ADDED
@@ -0,0 +1,121 @@
+ {
+   "architectures": [
+     "Qwen3_5MoeForConditionalGeneration"
+   ],
+   "dtype": "bfloat16",
+   "image_token_id": 248056,
+   "model_type": "qwen3_5_moe",
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "attn_output_gate": true,
+     "bos_token_id": 248044,
+     "dtype": "bfloat16",
+     "eos_token_id": 248044,
+     "full_attention_interval": 4,
+     "head_dim": 256,
+     "hidden_act": "silu",
+     "hidden_size": 2048,
+     "initializer_range": 0.02,
+     "layer_types": [
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention",
+       "linear_attention",
+       "linear_attention",
+       "linear_attention",
+       "full_attention"
+     ],
+     "linear_conv_kernel_dim": 4,
+     "linear_key_head_dim": 128,
+     "linear_num_key_heads": 16,
+     "linear_num_value_heads": 32,
+     "linear_value_head_dim": 128,
+     "mamba_ssm_dtype": "float32",
+     "max_position_embeddings": 262144,
+     "model_type": "qwen3_5_moe_text",
+     "moe_intermediate_size": 512,
+     "mtp_num_hidden_layers": 1,
+     "mtp_use_dedicated_embeddings": false,
+     "num_attention_heads": 16,
+     "num_experts": 256,
+     "num_experts_per_tok": 8,
+     "num_hidden_layers": 40,
+     "num_key_value_heads": 2,
+     "output_router_logits": false,
+     "pad_token_id": null,
+     "partial_rotary_factor": 0.25,
+     "rms_norm_eps": 1e-06,
+     "rope_parameters": {
+       "mrope_interleaved": true,
+       "mrope_section": [
+         11,
+         11,
+         10
+       ],
+       "partial_rotary_factor": 0.25,
+       "rope_theta": 10000000,
+       "rope_type": "default"
+     },
+     "router_aux_loss_coef": 0.001,
+     "shared_expert_intermediate_size": 512,
+     "tie_word_embeddings": false,
+     "use_cache": true,
+     "vocab_size": 248320
+   },
+   "tie_word_embeddings": false,
+   "transformers_version": "5.5.4",
+   "video_token_id": 248057,
+   "vision_config": {
+     "deepstack_visual_indexes": [],
+     "depth": 27,
+     "dtype": "bfloat16",
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "in_channels": 3,
+     "initializer_range": 0.02,
+     "intermediate_size": 4304,
+     "model_type": "qwen3_5_moe",
+     "num_heads": 16,
+     "num_position_embeddings": 2304,
+     "out_hidden_size": 2048,
+     "patch_size": 16,
+     "spatial_merge_size": 2,
+     "temporal_patch_size": 2
+   },
+   "vision_end_token_id": 248054,
+   "vision_start_token_id": 248053
+ }
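The 40-entry `layer_types` list is regular: every fourth layer is `full_attention`, which matches `full_attention_interval: 4` and `num_hidden_layers: 40`. The sketch below reproduces the list from those two values. Whether the library derives the list this way is an assumption, and `build_layer_types` is a hypothetical helper.

```python
def build_layer_types(num_hidden_layers: int,
                      full_attention_interval: int) -> list[str]:
    """Every full_attention_interval-th layer (1-based) uses full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from text_config in config.json.
layer_types = build_layer_types(num_hidden_layers=40, full_attention_interval=4)

print(layer_types[:4])
print(layer_types.count("full_attention"), "full-attention layers of", len(layer_types))
```

Under this pattern only 10 of the 40 layers use full attention; the other 30 use the linear-attention variant.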
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 248044,
+   "do_sample": true,
+   "eos_token_id": [
+     248046,
+     248044
+   ],
+   "pad_token_id": 248044,
+   "temperature": 1.0,
+   "top_k": 20,
+   "top_p": 0.95,
+   "transformers_version": "5.5.4"
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce76a4dc56f92765303933ccd5ec8bccb4f03616c002be4124ba305e7c5f3ea9
+ size 49739502312
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3b8c352bd993564d13c2c20e7c6e7d9aab61614ee88587a1b03ce3fb11f2fe0
+ size 20474998624
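Each `.safetensors` entry in this commit is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch that parses the two pointers shown above and sums their sizes; `parse_lfs_pointer` is an illustrative helper, not part of any Git LFS tooling:

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


shard_1 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:ce76a4dc56f92765303933ccd5ec8bccb4f03616c002be4124ba305e7c5f3ea9
size 49739502312
""")
shard_2 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:c3b8c352bd993564d13c2c20e7c6e7d9aab61614ee88587a1b03ce3fb11f2fe0
size 20474998624
""")

total_bytes = int(shard_1["size"]) + int(shard_2["size"])
print(f"total weight files: {total_bytes / 1e9:.1f} GB")
```

The two shards total roughly 70.2 GB, in line with the README's ~70 GB bf16 estimate.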
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:639e352c0f904c1875d448ebed6f6faac005fd3eb58393b7f1fb3ff044e5ca03
+ size 19989510
tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "add_prefix_space": false,
+   "audio_bos_token": "<|audio_start|>",
+   "audio_eos_token": "<|audio_end|>",
+   "audio_token": "<|audio_pad|>",
+   "backend": "tokenizers",
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "image_token": "<|image_pad|>",
+   "is_local": false,
+   "model_max_length": 262144,
+   "model_specific_special_tokens": {
+     "audio_bos_token": "<|audio_start|>",
+     "audio_eos_token": "<|audio_end|>",
+     "audio_token": "<|audio_pad|>",
+     "image_token": "<|image_pad|>",
+     "video_token": "<|video_pad|>",
+     "vision_bos_token": "<|vision_start|>",
+     "vision_eos_token": "<|vision_end|>"
+   },
+   "pad_token": "<|endoftext|>",
+   "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+   "split_special_tokens": false,
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": null,
+   "video_token": "<|video_pad|>",
+   "vision_bos_token": "<|vision_start|>",
+   "vision_eos_token": "<|vision_end|>"
+ }