AEON-7 committed on
Commit
ce62efb
·
verified ·
1 Parent(s): 0fd8f49

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,201 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored
4
+ language:
5
+ - en
6
+ - zh
7
+ - multilingual
8
+ library_name: transformers
9
+ pipeline_tag: text-generation
10
+ tags:
11
+ - abliterated
12
+ - uncensored
13
+ - qwen3
14
+ - qwen3.6
15
+ - nvfp4
16
+ - compressed-tensors
17
+ - llmcompressor
18
+ - hybrid-attention
19
+ - mamba
20
+ - gated-deltanet
21
+ - multimodal
22
+ - aeon
23
+ - dgx-spark
24
+ ---
25
+
26
+ # Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4
27
+
28
+ NVFP4 hardware-quantized release of [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored). The weights are the same model, compressed from **51 GB BF16** to **26 GB NVFP4** for native FP4 throughput on DGX Spark (GB10, sm_121a) and other Blackwell GPUs. The source model's 0/50 refusal rate and preserved capabilities are expected to carry over, but have not yet been re-verified on the quantized weights.
29
+
30
+ This release is **multimodal-preserved** (the vision tower stays BF16, so text + image inference remains functional) and **hybrid-attention-preserved** (the 48 linear-attention / GatedDeltaNet layers stay BF16; FP4 applies only to the q/k/v/o projections of the 16 full-attention layers and to all MLPs).
31
+
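The 48 + 16 layer split can be pictured with a short sketch. The interval of 4 is an assumption inferred from the `ignore` list in this commit's `config.json`, where `linear_attn` entries appear for layers 0, 1, 2, 4, 5, 6, 8, ... and skip 3, 7, 11, ...; it is not read from the model itself, and `layer_type` is a hypothetical helper name.

```python
# Hypothetical reconstruction of the hybrid layer layout. The interval is
# inferred from the ignore list in config.json, not read from the checkpoint.
NUM_LAYERS = 48 + 16          # linear-attention + full-attention layers
FULL_ATTENTION_INTERVAL = 4   # assumption: every 4th layer is full attention


def layer_type(index: int) -> str:
    """Return which attention variant the decoder layer at `index` uses."""
    if (index + 1) % FULL_ATTENTION_INTERVAL == 0:
        return "full_attention"
    return "linear_attention"


layout = [layer_type(i) for i in range(NUM_LAYERS)]
full_layers = [i for i, kind in enumerate(layout) if kind == "full_attention"]
linear_layers = [i for i, kind in enumerate(layout) if kind == "linear_attention"]

print(len(linear_layers), "linear-attention layers stay BF16")
print(len(full_layers), "full-attention layers get NVFP4 attention projections")
print("first full-attention layers:", full_layers[:4])
```

Under this assumption the counts match the 48 / 16 split stated above.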
32
+ ---
33
+
34
+ ## What Changed vs BF16
35
+
36
+ | Aspect | BF16 (source) | NVFP4 (this release) |
37
+ |---|---|---|
38
+ | Disk size | 51 GB | **26 GB** (49% reduction) |
39
+ | Refusal rate | 0/50 | **inherited — to be verified post-deploy** |
40
+ | Multimodal | preserved | **preserved (vision BF16, no degradation)** |
41
+ | Hybrid SSM | repaired + intact | **intact (linear_attn BF16-preserved)** |
42
+ | Hardware target | A100 / H100 / RTX PRO 6000 BF16 | **DGX Spark (GB10), B100/B200, RTX PRO 6000 Blackwell** with native FP4 throughput |
43
+ | KL vs BF16 source | n/a | **expected ≤0.001** (typical for this recipe class) |
44
+
45
+ The NVFP4 quantization scheme is NVIDIA-mandated: E2M1 element format, block_size=16, FP8 E4M3 per-block scales, FP32 per-tensor scale, symmetric signed.
46
+
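As an illustration of the scheme described above, here is a minimal pure-Python sketch of block-wise E2M1 rounding. It is a simplification, not the llm-compressor or hardware implementation: the block scale is kept as a Python float instead of being rounded to FP8 E4M3, and the FP32 per-tensor scale is left out. `fake_quantize_block` is a hypothetical helper name.

```python
import math

# E2M1 magnitudes (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # elements sharing one block scale


def fake_quantize_block(block):
    """Round one block to scaled E2M1 values and return (scale, dequantized).

    Simplification: the scale stays a Python float. Real NVFP4 stores it as
    FP8 E4M3 and additionally applies an FP32 per-tensor scale.
    """
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    dequantized = []
    for x in block:
        target = abs(x) / scale
        magnitude = min(E2M1_VALUES, key=lambda v: abs(target - v))
        dequantized.append(math.copysign(magnitude * scale, x))
    return scale, dequantized


# 4-bit elements plus one 8-bit scale per 16 elements (per-tensor scale ignored).
bits_per_weight = 4 + 8 / BLOCK_SIZE

block = [6.0, -3.0, 0.4] + [0.0] * (BLOCK_SIZE - 3)
scale, restored = fake_quantize_block(block)
print(bits_per_weight, scale, restored[:3])
```

The largest magnitude in a block maps to 6.0 (the top E2M1 value), so small values in a block with a large outlier lose the most precision; the small block size of 16 limits how far one outlier can reach.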
47
+ ---
48
+
49
+ ## Quantization Recipe
50
+
51
+ Tool: **llm-compressor 0.10.1.dev107** (vllm-project) using `QuantizationModifier(scheme="NVFP4")` post-training quantization.
52
+
53
+ ```python
+ from llmcompressor.modifiers.quantization import QuantizationModifier
+
+ recipe = QuantizationModifier(
+     targets="Linear",
+     scheme="NVFP4",
+     ignore=[
+         "lm_head",                # always
+         "re:.*embed_tokens.*",    # always
+         "re:.*\\.visual\\..*",    # vision tower BF16, preserves multimodal
+         "re:.*visual\\..*",
+         "re:.*linear_attn\\..*",  # SSM/GDN BF16, Mamba state collapses under FP4
+         "re:.*norm.*",
+         "re:.*q_norm.*",
+         "re:.*k_norm.*",
+     ],
+ )
+ ```
70
+
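To see which modules the `ignore` list above would exclude, the sketch below replays the patterns with Python's `re` module. It assumes that `re:`-prefixed entries are regular expressions matched from the start of the module name and that other entries must equal the name exactly; that matching rule is an assumption about llm-compressor, not taken from this card. The two `visual` / `linear_attn` names come from `config.json` in this commit, while the `self_attn` and `mlp` names are hypothetical examples of full-attention and MLP modules.

```python
import re

IGNORE = [
    "lm_head",
    "re:.*embed_tokens.*",
    "re:.*\\.visual\\..*",
    "re:.*visual\\..*",
    "re:.*linear_attn\\..*",
    "re:.*norm.*",
    "re:.*q_norm.*",
    "re:.*k_norm.*",
]


def is_ignored(module_name, ignore=IGNORE):
    """Return True if `module_name` would be left unquantized (assumed rule)."""
    for entry in ignore:
        if entry.startswith("re:"):
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            return True
    return False


examples = [
    "model.visual.blocks.0.attn.qkv",                      # from config.json
    "model.language_model.layers.0.linear_attn.out_proj",  # from config.json
    "model.language_model.layers.3.self_attn.q_proj",      # hypothetical name
    "model.language_model.layers.3.mlp.down_proj",         # hypothetical name
    "lm_head",
]
for name in examples:
    print(f"{name}: {'BF16 (ignored)' if is_ignored(name) else 'NVFP4'}")
```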
71
+ **Calibration:** open-platypus, 512 samples × 4096 tokens.
72
+ **Pipeline:** `sequential` with `sequential_targets=["Qwen3_5DecoderLayer"]` — required for hybrid stacks (mixed full + linear attention layers); without explicit targeting, llm-compressor's auto-discovery silently skips layers.
73
+ **Loader:** `AutoModelForImageTextToText` to preserve the `Qwen3_5ForConditionalGeneration` multimodal class.
74
+ **Processor:** passed explicitly to `oneshot()` to avoid the "model processor required when a dataset is provided" failure on multimodal builds without torchvision.
75
+
76
+ **Verification (pass):**
77
+ - 1 shard, 1952 keys
78
+ - 64 quantized full-attention projections (16 layers × 4 q/k/v/o)
79
+ - 432 `linear_attn.*` keys preserved BF16 (48 layers × 9 modules)
80
+ - 333 `visual.*` keys preserved BF16 (vision tower intact)
81
+ - 319 norm keys preserved BF16
82
+ - `lm_head` and `embed_tokens` preserved BF16
83
+ - NVFP4-packed weights present
84
+ - `input_global_scale` magnitudes 142–346 (healthy range)
85
+
86
+ Wall-clock quantization time: ~57 minutes on 1× RTX PRO 6000 Blackwell (96 GB).
87
+
88
+ ---
89
+
90
+ ## Deployment
91
+
92
+ ### vLLM on DGX Spark (GB10 / sm_121a) — recommended
93
+
94
+ Use a vLLM build with the SM121 CUTLASS NVFP4 patches applied (e.g. `vllm-spark-omni-q36:v1.2`). The patched CUTLASS path uses native FP4 tensor-core kernels and **outperforms the Marlin fallback** — do NOT force `VLLM_NVFP4_GEMM_BACKEND=marlin` (that's the workaround for stock vLLM builds where CUTLASS is broken on SM121).
95
+
96
+ ```bash
+ docker run --gpus all --ipc=host --network=host \
+   -e TORCH_CUDA_ARCH_LIST="12.0+PTX" \
+   -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
+   -e VLLM_USE_FLASHINFER_MOE_FP4=0 \
+   -v /path/to/model:/models/aeon-ultimate \
+   vllm-spark-omni-q36:v1.2 \
+   vllm serve /models/aeon-ultimate \
+     --served-model-name aeon-ultimate \
+     --host 0.0.0.0 --port 8000 \
+     --tensor-parallel-size 1 \
+     --dtype auto \
+     --quantization compressed-tensors \
+     --max-model-len 262144 \
+     --max-num-seqs 128 \
+     --max-num-batched-tokens 65536 \
+     --gpu-memory-utilization 0.85 \
+     --enable-chunked-prefill \
+     --no-enable-prefix-caching \
+     --load-format safetensors \
+     --trust-remote-code \
+     --enable-auto-tool-choice \
+     --tool-call-parser qwen3_coder \
+     --reasoning-parser qwen3 \
+     --attention-backend flash_attn \
+     --mm-encoder-tp-mode data \
+     --mm-processor-cache-type shm
+ ```
124
+
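Once the container above is running, the server can be queried through vLLM's OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library; the `localhost` address is an assumption (the command binds to `0.0.0.0:8000` with host networking), and `build_request` / `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: server reachable on localhost
MODEL_NAME = "aeon-ultimate"           # matches --served-model-name above


def build_request(prompt, max_tokens=256):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt and return the assistant message content."""
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server from the docker command to be running):
# print(ask("Say hello in one sentence."))
```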
125
+ ### Python (transformers) — for testing or non-vLLM serving
126
+
127
+ ```python
+ from transformers import AutoModelForImageTextToText, AutoTokenizer
+ import torch
+
+ model_id = "AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_id,
+     dtype=torch.bfloat16,  # vision tower + non-quantized weights
+     device_map="cuda:0",
+     trust_remote_code=True,
+ )
+
+ messages = [{"role": "user", "content": "Your prompt here"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
146
+
147
+ Requires `compressed-tensors >= 0.12` for on-the-fly NVFP4 dequantization.
148
+
149
+ ---
150
+
151
+ ## Hardware notes
152
+
153
+ | Hardware | Notes |
154
+ |---|---|
155
+ | **DGX Spark (GB10, sm_121a)** | Primary target. Use patched vLLM CUTLASS path. Expect ~50 tok/s single-stream after warmup. |
156
+ | **B100 / B200 (sm_100)** | Native FP4 compute via `tcgen05`/UTCQMMA — fastest hardware for this format. |
157
+ | **RTX PRO 6000 Blackwell (sm_120)** | Native FP4 via CUTLASS path. Excellent throughput. |
158
+ | **A100 / H100 (sm_80, sm_90)** | NVFP4 is dequantized to BF16/FP8 at the kernel level, so it works but gains no FP4 throughput advantage. Use the BF16 release instead for best performance on these GPUs. |
159
+
160
+ ---
161
+
162
+ ## Provenance
163
+
164
+ - **BF16 source:** [`AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored`](https://huggingface.co/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored) — see source card for full pipeline (FernflowerAI SSM repair → abliterix-v1.4 abliteration → trial 46 of 50 selected for capability preservation).
165
+ - **Original base:** [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B) by Alibaba.
166
+ - **Quantization tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor) by vllm-project.
167
+ - **NVFP4 scheme:** [NVIDIA NVFP4 specification](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/).
168
+
169
+ ---
170
+
171
+ ## User Responsibility & Arbitration Clause
172
+
173
+ **By accessing, downloading, using, running inference on, fine-tuning, merging, quantizing, distributing, integrating, or otherwise interacting with this model, you acknowledge and agree to the following:**
174
+
175
+ 1. **Sole Responsibility.** You, the user, are **solely and exclusively responsible** for every prompt issued, every response produced, every downstream action taken in reliance on those responses, and any harm — direct, indirect, consequential, or otherwise — that results.
176
+
177
+ 2. **No Warranty.** This model is provided strictly **"AS IS"**, without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, non-infringement, safety, alignment, factual accuracy, or legal compliance in any jurisdiction. No contributor, author, publisher, or hosting platform assumes liability of any kind for outputs or downstream use.
178
+
179
+ 3. **Legal Compliance.** You are responsible for ensuring that your use complies with **all applicable laws, regulations, terms of service, industry codes of conduct, professional ethical standards, and organizational policies** in every jurisdiction in which you operate or in which your outputs may be received. The unaligned nature of this model does not grant you any legal authorization you did not already have.
180
+
181
+ 4. **Operational Safety Layer.** An uncensored model is not a toy. You are expected to implement appropriate **downstream safety layers** proportionate to your deployment context, including but not limited to: input validation, output filtering, content moderation, audit logging, rate limiting, access controls, and human-in-the-loop review for high-risk workflows. A production deployment of this model without such layers is **unsafe by construction** and is not a supported use case.
182
+
183
+ 5. **Heightened Duty of Care.** The absence of internal refusal behavior means the duty of care that would ordinarily rest partly with the model rests entirely with you. You are expected to exercise greater — not lesser — caution, forethought, and ethical discipline when operating this model. If you are uncertain whether your contemplated use is ethical, legal, or wise, the correct action is to **not make the request**.
184
+
185
+ 6. **No Endorsement of Outputs.** The authors, contributors, and publishers do not endorse, adopt, or take responsibility for any specific output. Outputs are a stochastic function of the prompt, the weights, and the sampler state — not a statement of position by any human.
186
+
187
+ 7. **Arbitration.** Any dispute, claim, or controversy arising out of or relating to the use of this model, its outputs, or this clause shall be resolved through **binding individual arbitration** under the rules of a mutually agreed arbitration body (or, absent agreement, the American Arbitration Association's Consumer Arbitration Rules), waiving any right to a jury trial, class action, representative action, or consolidated proceeding. Venue shall be the jurisdiction of the disputing party bringing the claim. Costs and attorneys' fees shall be allocated per the applicable arbitration rules. This clause does not expand, and where legally prohibited does not establish, any liability in the other direction; it limits how the user may proceed when alleging harm tied to their own use of this model.
188
+
189
+ 8. **Indemnification.** You agree to indemnify, defend, and hold harmless the authors, contributors, and publishers of this model from and against any claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from or related to your use of the model or your breach of this clause.
190
+
191
+ 9. **Severability.** If any provision is held unenforceable in a given jurisdiction, the remaining provisions remain in full force, and the unenforceable provision is replaced by the closest enforceable equivalent consistent with the original intent.
192
+
193
+ 10. **Acceptance.** Your use of this model constitutes your acceptance of this clause in full. If you do not accept, do not use the model.
194
+
195
+ **This model is a tool with no opinions of its own. You supply the opinions. You supply the judgement. You supply the ethics. The outputs carry your fingerprints, not the model's.**
196
+
197
+ ---
198
+
199
+ ## License
200
+
201
+ Apache 2.0 (inherited from `Qwen/Qwen3.6-27B`).
chat_template.jinja ADDED
@@ -0,0 +1,154 @@
1
+ {%- set image_count = namespace(value=0) %}
2
+ {%- set video_count = namespace(value=0) %}
3
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
4
+ {%- if content is string %}
5
+ {{- content }}
6
+ {%- elif content is iterable and content is not mapping %}
7
+ {%- for item in content %}
8
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
9
+ {%- if is_system_content %}
10
+ {{- raise_exception('System message cannot contain images.') }}
11
+ {%- endif %}
12
+ {%- if do_vision_count %}
13
+ {%- set image_count.value = image_count.value + 1 %}
14
+ {%- endif %}
15
+ {%- if add_vision_id %}
16
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
17
+ {%- endif %}
18
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
19
+ {%- elif 'video' in item or item.type == 'video' %}
20
+ {%- if is_system_content %}
21
+ {{- raise_exception('System message cannot contain videos.') }}
22
+ {%- endif %}
23
+ {%- if do_vision_count %}
24
+ {%- set video_count.value = video_count.value + 1 %}
25
+ {%- endif %}
26
+ {%- if add_vision_id %}
27
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
28
+ {%- endif %}
29
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
30
+ {%- elif 'text' in item %}
31
+ {{- item.text }}
32
+ {%- else %}
33
+ {{- raise_exception('Unexpected item type in content.') }}
34
+ {%- endif %}
35
+ {%- endfor %}
36
+ {%- elif content is none or content is undefined %}
37
+ {{- '' }}
38
+ {%- else %}
39
+ {{- raise_exception('Unexpected content type.') }}
40
+ {%- endif %}
41
+ {%- endmacro %}
42
+ {%- if not messages %}
43
+ {{- raise_exception('No messages provided.') }}
44
+ {%- endif %}
45
+ {%- if tools and tools is iterable and tools is not mapping %}
46
+ {{- '<|im_start|>system\n' }}
47
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
48
+ {%- for tool in tools %}
49
+ {{- "\n" }}
50
+ {{- tool | tojson }}
51
+ {%- endfor %}
52
+ {{- "\n</tools>" }}
53
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
54
+ {%- if messages[0].role == 'system' %}
55
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
56
+ {%- if content %}
57
+ {{- '\n\n' + content }}
58
+ {%- endif %}
59
+ {%- endif %}
60
+ {{- '<|im_end|>\n' }}
61
+ {%- else %}
62
+ {%- if messages[0].role == 'system' %}
63
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
64
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
65
+ {%- endif %}
66
+ {%- endif %}
67
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
68
+ {%- for message in messages[::-1] %}
69
+ {%- set index = (messages|length - 1) - loop.index0 %}
70
+ {%- if ns.multi_step_tool and message.role == "user" %}
71
+ {%- set content = render_content(message.content, false)|trim %}
72
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
73
+ {%- set ns.multi_step_tool = false %}
74
+ {%- set ns.last_query_index = index %}
75
+ {%- endif %}
76
+ {%- endif %}
77
+ {%- endfor %}
78
+ {%- if ns.multi_step_tool %}
79
+ {{- raise_exception('No user query found in messages.') }}
80
+ {%- endif %}
81
+ {%- for message in messages %}
82
+ {%- set content = render_content(message.content, true)|trim %}
83
+ {%- if message.role == "system" %}
84
+ {%- if not loop.first %}
85
+ {{- raise_exception('System message must be at the beginning.') }}
86
+ {%- endif %}
87
+ {%- elif message.role == "user" %}
88
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
89
+ {%- elif message.role == "assistant" %}
90
+ {%- set reasoning_content = '' %}
91
+ {%- if message.reasoning_content is string %}
92
+ {%- set reasoning_content = message.reasoning_content %}
93
+ {%- else %}
94
+ {%- if '</think>' in content %}
95
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
96
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
97
+ {%- endif %}
98
+ {%- endif %}
99
+ {%- set reasoning_content = reasoning_content|trim %}
100
+ {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
101
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
102
+ {%- else %}
103
+ {{- '<|im_start|>' + message.role + '\n' + content }}
104
+ {%- endif %}
105
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
106
+ {%- for tool_call in message.tool_calls %}
107
+ {%- if tool_call.function is defined %}
108
+ {%- set tool_call = tool_call.function %}
109
+ {%- endif %}
110
+ {%- if loop.first %}
111
+ {%- if content|trim %}
112
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
113
+ {%- else %}
114
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
115
+ {%- endif %}
116
+ {%- else %}
117
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
118
+ {%- endif %}
119
+ {%- if tool_call.arguments is defined %}
120
+ {%- for args_name, args_value in tool_call.arguments|items %}
121
+ {{- '<parameter=' + args_name + '>\n' }}
122
+ {%- set args_value = args_value | string if args_value is string else args_value | tojson | safe %}
123
+ {{- args_value }}
124
+ {{- '\n</parameter>\n' }}
125
+ {%- endfor %}
126
+ {%- endif %}
127
+ {{- '</function>\n</tool_call>' }}
128
+ {%- endfor %}
129
+ {%- endif %}
130
+ {{- '<|im_end|>\n' }}
131
+ {%- elif message.role == "tool" %}
132
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
133
+ {{- '<|im_start|>user' }}
134
+ {%- endif %}
135
+ {{- '\n<tool_response>\n' }}
136
+ {{- content }}
137
+ {{- '\n</tool_response>' }}
138
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
139
+ {{- '<|im_end|>\n' }}
140
+ {%- elif loop.last %}
141
+ {{- '<|im_end|>\n' }}
142
+ {%- endif %}
143
+ {%- else %}
144
+ {{- raise_exception('Unexpected message role.') }}
145
+ {%- endif %}
146
+ {%- endfor %}
147
+ {%- if add_generation_prompt %}
148
+ {{- '<|im_start|>assistant\n' }}
149
+ {%- if enable_thinking is defined and enable_thinking is false %}
150
+ {{- '<think>\n\n</think>\n\n' }}
151
+ {%- else %}
152
+ {{- '<think>\n' }}
153
+ {%- endif %}
154
+ {%- endif %}
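Two pieces of the template above are easy to misread in Jinja: how an assistant turn without `reasoning_content` is split at `</think>`, and how a tool call is laid out. The plain-Python sketch below mirrors those expressions under stated assumptions: `split_reasoning` and `render_tool_call` are hypothetical helper names, `get_weather` is a made-up tool, and `json.dumps` stands in for Jinja's `tojson` filter, which may escape some characters differently.

```python
import json


def split_reasoning(content):
    """Mirror the template's </think> split; returns (reasoning, answer)."""
    if "</think>" not in content:
        return "", content
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning.strip(), answer  # the template also trims reasoning


def render_tool_call(name, arguments):
    """Lay out one tool call the way the template emits it."""
    parts = ["<tool_call>\n<function=" + name + ">\n"]
    for key, value in arguments.items():
        # Strings pass through unchanged; other values are JSON-encoded.
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append("<parameter=" + key + ">\n" + text + "\n</parameter>\n")
    parts.append("</function>\n</tool_call>")
    return "".join(parts)


reasoning, answer = split_reasoning("<think>\nplan the reply\n</think>\n\nHello!")
print(repr(reasoning), repr(answer))
print(render_tool_call("get_weather", {"city": "Paris", "days": 3}))
```

Note that the template only re-emits the `<think>` block for assistant turns after the last real user query, unless `preserve_thinking` is set; earlier turns keep just the answer text.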
config.json ADDED
@@ -0,0 +1,542 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3_5ForConditionalGeneration"
4
+ ],
5
+ "dtype": "bfloat16",
6
+ "image_token_id": 248056,
7
+ "language_model_only": false,
8
+ "model_type": "qwen3_5",
9
+ "quantization_config": {
10
+ "config_groups": {
11
+ "group_0": {
12
+ "format": "nvfp4-pack-quantized",
13
+ "input_activations": {
14
+ "actorder": null,
15
+ "block_structure": null,
16
+ "dynamic": "local",
17
+ "group_size": 16,
18
+ "num_bits": 4,
19
+ "observer": "static_minmax",
20
+ "observer_kwargs": {},
21
+ "scale_dtype": "torch.float8_e4m3fn",
22
+ "strategy": "tensor_group",
23
+ "symmetric": true,
24
+ "type": "float",
25
+ "zp_dtype": null
26
+ },
27
+ "output_activations": null,
28
+ "targets": [
29
+ "Linear"
30
+ ],
31
+ "weights": {
32
+ "actorder": null,
33
+ "block_structure": null,
34
+ "dynamic": false,
35
+ "group_size": 16,
36
+ "num_bits": 4,
37
+ "observer": "memoryless_minmax",
38
+ "observer_kwargs": {},
39
+ "scale_dtype": "torch.float8_e4m3fn",
40
+ "strategy": "tensor_group",
41
+ "symmetric": true,
42
+ "type": "float",
43
+ "zp_dtype": null
44
+ }
45
+ }
46
+ },
47
+ "format": "nvfp4-pack-quantized",
48
+ "global_compression_ratio": null,
49
+ "ignore": [
50
+ "model.visual.blocks.0.attn.qkv",
51
+ "model.visual.blocks.0.attn.proj",
52
+ "model.visual.blocks.0.mlp.linear_fc1",
53
+ "model.visual.blocks.0.mlp.linear_fc2",
54
+ "model.visual.blocks.1.attn.qkv",
55
+ "model.visual.blocks.1.attn.proj",
56
+ "model.visual.blocks.1.mlp.linear_fc1",
57
+ "model.visual.blocks.1.mlp.linear_fc2",
58
+ "model.visual.blocks.2.attn.qkv",
59
+ "model.visual.blocks.2.attn.proj",
60
+ "model.visual.blocks.2.mlp.linear_fc1",
61
+ "model.visual.blocks.2.mlp.linear_fc2",
62
+ "model.visual.blocks.3.attn.qkv",
63
+ "model.visual.blocks.3.attn.proj",
64
+ "model.visual.blocks.3.mlp.linear_fc1",
65
+ "model.visual.blocks.3.mlp.linear_fc2",
66
+ "model.visual.blocks.4.attn.qkv",
67
+ "model.visual.blocks.4.attn.proj",
68
+ "model.visual.blocks.4.mlp.linear_fc1",
69
+ "model.visual.blocks.4.mlp.linear_fc2",
70
+ "model.visual.blocks.5.attn.qkv",
71
+ "model.visual.blocks.5.attn.proj",
72
+ "model.visual.blocks.5.mlp.linear_fc1",
73
+ "model.visual.blocks.5.mlp.linear_fc2",
74
+ "model.visual.blocks.6.attn.qkv",
75
+ "model.visual.blocks.6.attn.proj",
76
+ "model.visual.blocks.6.mlp.linear_fc1",
77
+ "model.visual.blocks.6.mlp.linear_fc2",
78
+ "model.visual.blocks.7.attn.qkv",
79
+ "model.visual.blocks.7.attn.proj",
80
+ "model.visual.blocks.7.mlp.linear_fc1",
81
+ "model.visual.blocks.7.mlp.linear_fc2",
82
+ "model.visual.blocks.8.attn.qkv",
83
+ "model.visual.blocks.8.attn.proj",
84
+ "model.visual.blocks.8.mlp.linear_fc1",
85
+ "model.visual.blocks.8.mlp.linear_fc2",
86
+ "model.visual.blocks.9.attn.qkv",
87
+ "model.visual.blocks.9.attn.proj",
88
+ "model.visual.blocks.9.mlp.linear_fc1",
89
+ "model.visual.blocks.9.mlp.linear_fc2",
90
+ "model.visual.blocks.10.attn.qkv",
91
+ "model.visual.blocks.10.attn.proj",
92
+ "model.visual.blocks.10.mlp.linear_fc1",
93
+ "model.visual.blocks.10.mlp.linear_fc2",
94
+ "model.visual.blocks.11.attn.qkv",
95
+ "model.visual.blocks.11.attn.proj",
96
+ "model.visual.blocks.11.mlp.linear_fc1",
97
+ "model.visual.blocks.11.mlp.linear_fc2",
98
+ "model.visual.blocks.12.attn.qkv",
99
+ "model.visual.blocks.12.attn.proj",
100
+ "model.visual.blocks.12.mlp.linear_fc1",
101
+ "model.visual.blocks.12.mlp.linear_fc2",
102
+ "model.visual.blocks.13.attn.qkv",
103
+ "model.visual.blocks.13.attn.proj",
104
+ "model.visual.blocks.13.mlp.linear_fc1",
105
+ "model.visual.blocks.13.mlp.linear_fc2",
106
+ "model.visual.blocks.14.attn.qkv",
107
+ "model.visual.blocks.14.attn.proj",
108
+ "model.visual.blocks.14.mlp.linear_fc1",
109
+ "model.visual.blocks.14.mlp.linear_fc2",
110
+ "model.visual.blocks.15.attn.qkv",
111
+ "model.visual.blocks.15.attn.proj",
112
+ "model.visual.blocks.15.mlp.linear_fc1",
113
+ "model.visual.blocks.15.mlp.linear_fc2",
114
+ "model.visual.blocks.16.attn.qkv",
115
+ "model.visual.blocks.16.attn.proj",
116
+ "model.visual.blocks.16.mlp.linear_fc1",
117
+ "model.visual.blocks.16.mlp.linear_fc2",
118
+ "model.visual.blocks.17.attn.qkv",
119
+ "model.visual.blocks.17.attn.proj",
120
+ "model.visual.blocks.17.mlp.linear_fc1",
121
+ "model.visual.blocks.17.mlp.linear_fc2",
122
+ "model.visual.blocks.18.attn.qkv",
123
+ "model.visual.blocks.18.attn.proj",
124
+ "model.visual.blocks.18.mlp.linear_fc1",
125
+ "model.visual.blocks.18.mlp.linear_fc2",
126
+ "model.visual.blocks.19.attn.qkv",
127
+ "model.visual.blocks.19.attn.proj",
128
+ "model.visual.blocks.19.mlp.linear_fc1",
129
+ "model.visual.blocks.19.mlp.linear_fc2",
130
+ "model.visual.blocks.20.attn.qkv",
131
+ "model.visual.blocks.20.attn.proj",
132
+ "model.visual.blocks.20.mlp.linear_fc1",
133
+ "model.visual.blocks.20.mlp.linear_fc2",
134
+ "model.visual.blocks.21.attn.qkv",
135
+ "model.visual.blocks.21.attn.proj",
136
+ "model.visual.blocks.21.mlp.linear_fc1",
137
+ "model.visual.blocks.21.mlp.linear_fc2",
138
+ "model.visual.blocks.22.attn.qkv",
139
+ "model.visual.blocks.22.attn.proj",
140
+ "model.visual.blocks.22.mlp.linear_fc1",
141
+ "model.visual.blocks.22.mlp.linear_fc2",
142
+ "model.visual.blocks.23.attn.qkv",
143
+ "model.visual.blocks.23.attn.proj",
144
+ "model.visual.blocks.23.mlp.linear_fc1",
145
+ "model.visual.blocks.23.mlp.linear_fc2",
146
+ "model.visual.blocks.24.attn.qkv",
147
+ "model.visual.blocks.24.attn.proj",
148
+ "model.visual.blocks.24.mlp.linear_fc1",
149
+ "model.visual.blocks.24.mlp.linear_fc2",
150
+ "model.visual.blocks.25.attn.qkv",
151
+ "model.visual.blocks.25.attn.proj",
152
+ "model.visual.blocks.25.mlp.linear_fc1",
153
+ "model.visual.blocks.25.mlp.linear_fc2",
154
+ "model.visual.blocks.26.attn.qkv",
155
+ "model.visual.blocks.26.attn.proj",
156
+ "model.visual.blocks.26.mlp.linear_fc1",
157
+ "model.visual.blocks.26.mlp.linear_fc2",
158
+ "model.visual.merger.linear_fc1",
159
+ "model.visual.merger.linear_fc2",
160
+ "model.language_model.layers.0.linear_attn.out_proj",
161
+ "model.language_model.layers.0.linear_attn.in_proj_qkv",
162
+ "model.language_model.layers.0.linear_attn.in_proj_z",
163
+ "model.language_model.layers.0.linear_attn.in_proj_b",
164
+ "model.language_model.layers.0.linear_attn.in_proj_a",
165
+ "model.language_model.layers.1.linear_attn.out_proj",
166
+ "model.language_model.layers.1.linear_attn.in_proj_qkv",
167
+ "model.language_model.layers.1.linear_attn.in_proj_z",
168
+ "model.language_model.layers.1.linear_attn.in_proj_b",
169
+ "model.language_model.layers.1.linear_attn.in_proj_a",
170
+ "model.language_model.layers.2.linear_attn.out_proj",
171
+ "model.language_model.layers.2.linear_attn.in_proj_qkv",
172
+ "model.language_model.layers.2.linear_attn.in_proj_z",
173
+ "model.language_model.layers.2.linear_attn.in_proj_b",
174
+ "model.language_model.layers.2.linear_attn.in_proj_a",
175
+ "model.language_model.layers.4.linear_attn.out_proj",
176
+ "model.language_model.layers.4.linear_attn.in_proj_qkv",
177
+ "model.language_model.layers.4.linear_attn.in_proj_z",
178
+ "model.language_model.layers.4.linear_attn.in_proj_b",
179
+ "model.language_model.layers.4.linear_attn.in_proj_a",
180
+ "model.language_model.layers.5.linear_attn.out_proj",
181
+ "model.language_model.layers.5.linear_attn.in_proj_qkv",
182
+ "model.language_model.layers.5.linear_attn.in_proj_z",
183
+ "model.language_model.layers.5.linear_attn.in_proj_b",
184
+ "model.language_model.layers.5.linear_attn.in_proj_a",
185
+ "model.language_model.layers.6.linear_attn.out_proj",
186
+ "model.language_model.layers.6.linear_attn.in_proj_qkv",
187
+ "model.language_model.layers.6.linear_attn.in_proj_z",
188
+ "model.language_model.layers.6.linear_attn.in_proj_b",
189
+ "model.language_model.layers.6.linear_attn.in_proj_a",
190
+ "model.language_model.layers.8.linear_attn.out_proj",
191
+ "model.language_model.layers.8.linear_attn.in_proj_qkv",
192
+ "model.language_model.layers.8.linear_attn.in_proj_z",
193
+ "model.language_model.layers.8.linear_attn.in_proj_b",
194
+ "model.language_model.layers.8.linear_attn.in_proj_a",
195
+ "model.language_model.layers.9.linear_attn.out_proj",
196
+ "model.language_model.layers.9.linear_attn.in_proj_qkv",
197
+ "model.language_model.layers.9.linear_attn.in_proj_z",
+ "model.language_model.layers.9.linear_attn.in_proj_b",
+ "model.language_model.layers.9.linear_attn.in_proj_a",
+ "model.language_model.layers.10.linear_attn.out_proj",
+ "model.language_model.layers.10.linear_attn.in_proj_qkv",
+ "model.language_model.layers.10.linear_attn.in_proj_z",
+ "model.language_model.layers.10.linear_attn.in_proj_b",
+ "model.language_model.layers.10.linear_attn.in_proj_a",
+ "model.language_model.layers.12.linear_attn.out_proj",
+ "model.language_model.layers.12.linear_attn.in_proj_qkv",
+ "model.language_model.layers.12.linear_attn.in_proj_z",
+ "model.language_model.layers.12.linear_attn.in_proj_b",
+ "model.language_model.layers.12.linear_attn.in_proj_a",
+ "model.language_model.layers.13.linear_attn.out_proj",
+ "model.language_model.layers.13.linear_attn.in_proj_qkv",
+ "model.language_model.layers.13.linear_attn.in_proj_z",
+ "model.language_model.layers.13.linear_attn.in_proj_b",
+ "model.language_model.layers.13.linear_attn.in_proj_a",
+ "model.language_model.layers.14.linear_attn.out_proj",
+ "model.language_model.layers.14.linear_attn.in_proj_qkv",
+ "model.language_model.layers.14.linear_attn.in_proj_z",
+ "model.language_model.layers.14.linear_attn.in_proj_b",
+ "model.language_model.layers.14.linear_attn.in_proj_a",
+ "model.language_model.layers.16.linear_attn.out_proj",
+ "model.language_model.layers.16.linear_attn.in_proj_qkv",
+ "model.language_model.layers.16.linear_attn.in_proj_z",
+ "model.language_model.layers.16.linear_attn.in_proj_b",
+ "model.language_model.layers.16.linear_attn.in_proj_a",
+ "model.language_model.layers.17.linear_attn.out_proj",
+ "model.language_model.layers.17.linear_attn.in_proj_qkv",
+ "model.language_model.layers.17.linear_attn.in_proj_z",
+ "model.language_model.layers.17.linear_attn.in_proj_b",
+ "model.language_model.layers.17.linear_attn.in_proj_a",
+ "model.language_model.layers.18.linear_attn.out_proj",
+ "model.language_model.layers.18.linear_attn.in_proj_qkv",
+ "model.language_model.layers.18.linear_attn.in_proj_z",
+ "model.language_model.layers.18.linear_attn.in_proj_b",
+ "model.language_model.layers.18.linear_attn.in_proj_a",
+ "model.language_model.layers.20.linear_attn.out_proj",
+ "model.language_model.layers.20.linear_attn.in_proj_qkv",
+ "model.language_model.layers.20.linear_attn.in_proj_z",
+ "model.language_model.layers.20.linear_attn.in_proj_b",
+ "model.language_model.layers.20.linear_attn.in_proj_a",
+ "model.language_model.layers.21.linear_attn.out_proj",
+ "model.language_model.layers.21.linear_attn.in_proj_qkv",
+ "model.language_model.layers.21.linear_attn.in_proj_z",
+ "model.language_model.layers.21.linear_attn.in_proj_b",
+ "model.language_model.layers.21.linear_attn.in_proj_a",
+ "model.language_model.layers.22.linear_attn.out_proj",
+ "model.language_model.layers.22.linear_attn.in_proj_qkv",
+ "model.language_model.layers.22.linear_attn.in_proj_z",
+ "model.language_model.layers.22.linear_attn.in_proj_b",
+ "model.language_model.layers.22.linear_attn.in_proj_a",
+ "model.language_model.layers.24.linear_attn.out_proj",
+ "model.language_model.layers.24.linear_attn.in_proj_qkv",
+ "model.language_model.layers.24.linear_attn.in_proj_z",
+ "model.language_model.layers.24.linear_attn.in_proj_b",
+ "model.language_model.layers.24.linear_attn.in_proj_a",
+ "model.language_model.layers.25.linear_attn.out_proj",
+ "model.language_model.layers.25.linear_attn.in_proj_qkv",
+ "model.language_model.layers.25.linear_attn.in_proj_z",
+ "model.language_model.layers.25.linear_attn.in_proj_b",
+ "model.language_model.layers.25.linear_attn.in_proj_a",
+ "model.language_model.layers.26.linear_attn.out_proj",
+ "model.language_model.layers.26.linear_attn.in_proj_qkv",
+ "model.language_model.layers.26.linear_attn.in_proj_z",
+ "model.language_model.layers.26.linear_attn.in_proj_b",
+ "model.language_model.layers.26.linear_attn.in_proj_a",
+ "model.language_model.layers.28.linear_attn.out_proj",
+ "model.language_model.layers.28.linear_attn.in_proj_qkv",
+ "model.language_model.layers.28.linear_attn.in_proj_z",
+ "model.language_model.layers.28.linear_attn.in_proj_b",
+ "model.language_model.layers.28.linear_attn.in_proj_a",
+ "model.language_model.layers.29.linear_attn.out_proj",
+ "model.language_model.layers.29.linear_attn.in_proj_qkv",
+ "model.language_model.layers.29.linear_attn.in_proj_z",
+ "model.language_model.layers.29.linear_attn.in_proj_b",
+ "model.language_model.layers.29.linear_attn.in_proj_a",
+ "model.language_model.layers.30.linear_attn.out_proj",
+ "model.language_model.layers.30.linear_attn.in_proj_qkv",
+ "model.language_model.layers.30.linear_attn.in_proj_z",
+ "model.language_model.layers.30.linear_attn.in_proj_b",
+ "model.language_model.layers.30.linear_attn.in_proj_a",
+ "model.language_model.layers.32.linear_attn.out_proj",
+ "model.language_model.layers.32.linear_attn.in_proj_qkv",
+ "model.language_model.layers.32.linear_attn.in_proj_z",
+ "model.language_model.layers.32.linear_attn.in_proj_b",
+ "model.language_model.layers.32.linear_attn.in_proj_a",
+ "model.language_model.layers.33.linear_attn.out_proj",
+ "model.language_model.layers.33.linear_attn.in_proj_qkv",
+ "model.language_model.layers.33.linear_attn.in_proj_z",
+ "model.language_model.layers.33.linear_attn.in_proj_b",
+ "model.language_model.layers.33.linear_attn.in_proj_a",
+ "model.language_model.layers.34.linear_attn.out_proj",
+ "model.language_model.layers.34.linear_attn.in_proj_qkv",
+ "model.language_model.layers.34.linear_attn.in_proj_z",
+ "model.language_model.layers.34.linear_attn.in_proj_b",
+ "model.language_model.layers.34.linear_attn.in_proj_a",
+ "model.language_model.layers.36.linear_attn.out_proj",
+ "model.language_model.layers.36.linear_attn.in_proj_qkv",
+ "model.language_model.layers.36.linear_attn.in_proj_z",
+ "model.language_model.layers.36.linear_attn.in_proj_b",
+ "model.language_model.layers.36.linear_attn.in_proj_a",
+ "model.language_model.layers.37.linear_attn.out_proj",
+ "model.language_model.layers.37.linear_attn.in_proj_qkv",
+ "model.language_model.layers.37.linear_attn.in_proj_z",
+ "model.language_model.layers.37.linear_attn.in_proj_b",
+ "model.language_model.layers.37.linear_attn.in_proj_a",
+ "model.language_model.layers.38.linear_attn.out_proj",
+ "model.language_model.layers.38.linear_attn.in_proj_qkv",
+ "model.language_model.layers.38.linear_attn.in_proj_z",
+ "model.language_model.layers.38.linear_attn.in_proj_b",
+ "model.language_model.layers.38.linear_attn.in_proj_a",
+ "model.language_model.layers.40.linear_attn.out_proj",
+ "model.language_model.layers.40.linear_attn.in_proj_qkv",
+ "model.language_model.layers.40.linear_attn.in_proj_z",
+ "model.language_model.layers.40.linear_attn.in_proj_b",
+ "model.language_model.layers.40.linear_attn.in_proj_a",
+ "model.language_model.layers.41.linear_attn.out_proj",
+ "model.language_model.layers.41.linear_attn.in_proj_qkv",
+ "model.language_model.layers.41.linear_attn.in_proj_z",
+ "model.language_model.layers.41.linear_attn.in_proj_b",
+ "model.language_model.layers.41.linear_attn.in_proj_a",
+ "model.language_model.layers.42.linear_attn.out_proj",
+ "model.language_model.layers.42.linear_attn.in_proj_qkv",
+ "model.language_model.layers.42.linear_attn.in_proj_z",
+ "model.language_model.layers.42.linear_attn.in_proj_b",
+ "model.language_model.layers.42.linear_attn.in_proj_a",
+ "model.language_model.layers.44.linear_attn.out_proj",
+ "model.language_model.layers.44.linear_attn.in_proj_qkv",
+ "model.language_model.layers.44.linear_attn.in_proj_z",
+ "model.language_model.layers.44.linear_attn.in_proj_b",
+ "model.language_model.layers.44.linear_attn.in_proj_a",
+ "model.language_model.layers.45.linear_attn.out_proj",
+ "model.language_model.layers.45.linear_attn.in_proj_qkv",
+ "model.language_model.layers.45.linear_attn.in_proj_z",
+ "model.language_model.layers.45.linear_attn.in_proj_b",
+ "model.language_model.layers.45.linear_attn.in_proj_a",
+ "model.language_model.layers.46.linear_attn.out_proj",
+ "model.language_model.layers.46.linear_attn.in_proj_qkv",
+ "model.language_model.layers.46.linear_attn.in_proj_z",
+ "model.language_model.layers.46.linear_attn.in_proj_b",
+ "model.language_model.layers.46.linear_attn.in_proj_a",
+ "model.language_model.layers.48.linear_attn.out_proj",
+ "model.language_model.layers.48.linear_attn.in_proj_qkv",
+ "model.language_model.layers.48.linear_attn.in_proj_z",
+ "model.language_model.layers.48.linear_attn.in_proj_b",
+ "model.language_model.layers.48.linear_attn.in_proj_a",
+ "model.language_model.layers.49.linear_attn.out_proj",
+ "model.language_model.layers.49.linear_attn.in_proj_qkv",
+ "model.language_model.layers.49.linear_attn.in_proj_z",
+ "model.language_model.layers.49.linear_attn.in_proj_b",
+ "model.language_model.layers.49.linear_attn.in_proj_a",
+ "model.language_model.layers.50.linear_attn.out_proj",
+ "model.language_model.layers.50.linear_attn.in_proj_qkv",
+ "model.language_model.layers.50.linear_attn.in_proj_z",
+ "model.language_model.layers.50.linear_attn.in_proj_b",
+ "model.language_model.layers.50.linear_attn.in_proj_a",
+ "model.language_model.layers.52.linear_attn.out_proj",
+ "model.language_model.layers.52.linear_attn.in_proj_qkv",
+ "model.language_model.layers.52.linear_attn.in_proj_z",
+ "model.language_model.layers.52.linear_attn.in_proj_b",
+ "model.language_model.layers.52.linear_attn.in_proj_a",
+ "model.language_model.layers.53.linear_attn.out_proj",
+ "model.language_model.layers.53.linear_attn.in_proj_qkv",
+ "model.language_model.layers.53.linear_attn.in_proj_z",
+ "model.language_model.layers.53.linear_attn.in_proj_b",
+ "model.language_model.layers.53.linear_attn.in_proj_a",
+ "model.language_model.layers.54.linear_attn.out_proj",
+ "model.language_model.layers.54.linear_attn.in_proj_qkv",
+ "model.language_model.layers.54.linear_attn.in_proj_z",
+ "model.language_model.layers.54.linear_attn.in_proj_b",
+ "model.language_model.layers.54.linear_attn.in_proj_a",
+ "model.language_model.layers.56.linear_attn.out_proj",
+ "model.language_model.layers.56.linear_attn.in_proj_qkv",
+ "model.language_model.layers.56.linear_attn.in_proj_z",
+ "model.language_model.layers.56.linear_attn.in_proj_b",
+ "model.language_model.layers.56.linear_attn.in_proj_a",
+ "model.language_model.layers.57.linear_attn.out_proj",
+ "model.language_model.layers.57.linear_attn.in_proj_qkv",
+ "model.language_model.layers.57.linear_attn.in_proj_z",
+ "model.language_model.layers.57.linear_attn.in_proj_b",
+ "model.language_model.layers.57.linear_attn.in_proj_a",
+ "model.language_model.layers.58.linear_attn.out_proj",
+ "model.language_model.layers.58.linear_attn.in_proj_qkv",
+ "model.language_model.layers.58.linear_attn.in_proj_z",
+ "model.language_model.layers.58.linear_attn.in_proj_b",
+ "model.language_model.layers.58.linear_attn.in_proj_a",
+ "model.language_model.layers.60.linear_attn.out_proj",
+ "model.language_model.layers.60.linear_attn.in_proj_qkv",
+ "model.language_model.layers.60.linear_attn.in_proj_z",
+ "model.language_model.layers.60.linear_attn.in_proj_b",
+ "model.language_model.layers.60.linear_attn.in_proj_a",
+ "model.language_model.layers.61.linear_attn.out_proj",
+ "model.language_model.layers.61.linear_attn.in_proj_qkv",
+ "model.language_model.layers.61.linear_attn.in_proj_z",
+ "model.language_model.layers.61.linear_attn.in_proj_b",
+ "model.language_model.layers.61.linear_attn.in_proj_a",
+ "model.language_model.layers.62.linear_attn.out_proj",
+ "model.language_model.layers.62.linear_attn.in_proj_qkv",
+ "model.language_model.layers.62.linear_attn.in_proj_z",
+ "model.language_model.layers.62.linear_attn.in_proj_b",
+ "model.language_model.layers.62.linear_attn.in_proj_a",
+ "lm_head"
+ ],
+ "kv_cache_scheme": null,
+ "quant_method": "compressed-tensors",
+ "quantization_status": "compressed",
+ "sparsity_config": {},
+ "transform_config": {},
+ "version": "0.15.1.a20260416"
+ },
+ "text_config": {
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "attn_output_gate": true,
+ "bos_token_id": 248044,
+ "dtype": "bfloat16",
+ "eos_token_id": 248044,
+ "full_attention_interval": 4,
+ "head_dim": 256,
+ "hidden_act": "silu",
+ "hidden_size": 5120,
+ "initializer_range": 0.02,
+ "intermediate_size": 17408,
+ "layer_types": [
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention"
+ ],
+ "linear_conv_kernel_dim": 4,
+ "linear_key_head_dim": 128,
+ "linear_num_key_heads": 16,
+ "linear_num_value_heads": 48,
+ "linear_value_head_dim": 128,
+ "mamba_ssm_dtype": "float32",
+ "max_position_embeddings": 262144,
+ "model_type": "qwen3_5_text",
+ "mtp_num_hidden_layers": 1,
+ "mtp_use_dedicated_embeddings": false,
+ "num_attention_heads": 24,
+ "num_hidden_layers": 64,
+ "num_key_value_heads": 4,
+ "output_gate_type": "swish",
+ "pad_token_id": null,
+ "partial_rotary_factor": 0.25,
+ "rms_norm_eps": 1e-06,
+ "rope_parameters": {
+ "mrope_interleaved": true,
+ "mrope_section": [
+ 11,
+ 11,
+ 10
+ ],
+ "partial_rotary_factor": 0.25,
+ "rope_theta": 10000000,
+ "rope_type": "default"
+ },
+ "tie_word_embeddings": false,
+ "use_cache": true,
+ "vocab_size": 248320
+ },
+ "tie_word_embeddings": false,
+ "transformers_version": "5.6.0",
+ "video_token_id": 248057,
+ "vision_config": {
+ "deepstack_visual_indexes": [],
+ "depth": 27,
+ "dtype": "bfloat16",
+ "hidden_act": "gelu_pytorch_tanh",
+ "hidden_size": 1152,
+ "in_channels": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 4304,
+ "model_type": "qwen3_5_vision",
+ "num_heads": 16,
+ "num_position_embeddings": 2304,
+ "out_hidden_size": 5120,
+ "patch_size": 16,
+ "spatial_merge_size": 2,
+ "temporal_patch_size": 2
+ },
+ "vision_end_token_id": 248054,
+ "vision_start_token_id": 248053
+ }
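
The `layer_types` array in `config.json` above appears to follow directly from `full_attention_interval`: every fourth layer is `full_attention` and the rest are `linear_attention`. A minimal Python sketch, assuming that rule (it is inferred from the listed values, not taken from the model code):

```python
# Values copied from text_config in config.json.
num_hidden_layers = 64
full_attention_interval = 4

# Assumed rule: layer i (0-based) is full attention when (i + 1) is a
# multiple of the interval; this reproduces the array listed in the config.
layer_types = [
    "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
    for i in range(num_hidden_layers)
]

linear_layers = [i for i, t in enumerate(layer_types) if t == "linear_attention"]
full_layers = [i for i, t in enumerate(layer_types) if t == "full_attention"]

print(len(linear_layers), len(full_layers))
print(full_layers[:4])
```

Under this rule there are 48 linear-attention and 16 full-attention layers, which matches the counts given in the README. It would also explain why the `ignore` list has no `linear_attn` entries for indices such as 11, 15, and 19: those are the full-attention layers.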
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "bos_token_id": 248044,
+ "do_sample": true,
+ "eos_token_id": [
+ 248046,
+ 248044
+ ],
+ "pad_token_id": 248044,
+ "temperature": 1.0,
+ "top_k": 20,
+ "top_p": 0.95,
+ "transformers_version": "5.6.0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ff1493bd3263323335777601d3f7ac2ed11d209b8d44bae65c056438697006a
+ size 27702391880
preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "size": {
+ "longest_edge": 16777216,
+ "shortest_edge": 65536
+ },
+ "patch_size": 16,
+ "temporal_patch_size": 2,
+ "merge_size": 2,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "processor_class": "Qwen3VLProcessor",
+ "image_processor_type": "Qwen2VLImageProcessorFast"
+ }
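
The `size` values in `preprocessor_config.json` are far too large to be side lengths. A rough Python sketch, assuming (as in Qwen2-VL-style image processors) that `shortest_edge` and `longest_edge` are minimum and maximum total pixel counts, and that each visual token covers a `patch_size * merge_size` square block; both are assumptions not confirmed by this commit:

```python
# Values copied from preprocessor_config.json.
patch_size = 16
merge_size = 2
shortest_edge = 65536      # assumed: minimum total pixels per image
longest_edge = 16777216    # assumed: maximum total pixels per image

# Assumed: one merged visual token per (patch_size * merge_size)^2 pixels.
pixels_per_token = (patch_size * merge_size) ** 2

min_tokens = shortest_edge // pixels_per_token
max_tokens = longest_edge // pixels_per_token
print(min_tokens, max_tokens)
```

If those assumptions hold, an image contributes between 64 and 16384 visual tokens to the prompt.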
recipe.yaml ADDED
@@ -0,0 +1,8 @@
+ default_stage:
+   default_modifiers:
+     QuantizationModifier:
+       targets: [Linear]
+       ignore: [lm_head, 're:.*embed_tokens.*', 're:.*\.visual\..*', 're:.*visual\..*', 're:.*linear_attn\..*',
+         're:.*norm.*', 're:.*q_norm.*', 're:.*k_norm.*']
+       scheme: NVFP4
+       bypass_divisibility_checks: false
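
The `ignore` list in `recipe.yaml` decides which `Linear` modules stay in BF16. A minimal Python sketch of how such a list could be applied, assuming that entries prefixed with `re:` are regular expressions matched from the start of the module name and that other entries are exact names. `is_ignored` is a hypothetical helper, not the llmcompressor API, and the `mlp.down_proj` module name in the usage lines is illustrative only:

```python
import re

# Ignore entries copied from recipe.yaml.
IGNORE = [
    "lm_head",
    r"re:.*embed_tokens.*",
    r"re:.*\.visual\..*",
    r"re:.*visual\..*",
    r"re:.*linear_attn\..*",
    r"re:.*norm.*",
    r"re:.*q_norm.*",
    r"re:.*k_norm.*",
]


def is_ignored(module_name: str) -> bool:
    """Hypothetical helper: True if any ignore entry matches the module name."""
    for entry in IGNORE:
        if entry.startswith("re:"):
            # Assumed semantics: regex anchored at the start of the name.
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            return True
    return False


# A linear-attention projection listed in config.json stays unquantized.
print(is_ignored("model.language_model.layers.9.linear_attn.in_proj_z"))
# An MLP projection (illustrative name) is not matched by any entry.
print(is_ignored("model.language_model.layers.3.mlp.down_proj"))
```

Under these matching rules, `re:.*q_norm.*` and `re:.*k_norm.*` are already covered by `re:.*norm.*`, and `re:.*\.visual\..*` by `re:.*visual\..*`, so the extra entries look harmless but redundant.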
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d73c2c5f7aa0ed522c8d96ef3524739eb61e3c78e74839a2ce4a1c56ea340a20
+ size 19989424
tokenizer_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "add_prefix_space": false,
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "image_token": "<|image_pad|>",
+ "is_local": true,
+ "local_files_only": false,
+ "max_length": null,
+ "model_max_length": 262144,
+ "model_specific_special_tokens": {
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "image_token": "<|image_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ },
+ "pad_to_multiple_of": null,
+ "pad_token": "<|endoftext|>",
+ "pad_token_type_id": 0,
+ "padding_side": "left",
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null,
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ }
video_preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "size": {
+ "longest_edge": 25165824,
+ "shortest_edge": 4096
+ },
+ "patch_size": 16,
+ "temporal_patch_size": 2,
+ "merge_size": 2,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "processor_class": "Qwen3VLProcessor",
+ "video_processor_type": "Qwen3VLVideoProcessor"
+ }