Qwen3.5-35B-A3B-GPTQ-Int4

GPTQ INT4 quantization of Qwen/Qwen3.5-35B-A3B.

Quantization Details

Parameter     Value
-----------   ------------
Method        GPTQ
Bits          4
Group Size    128
Desc Act      True
Symmetric     False
Calibration   WikiText-2
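As a rough illustration of what these settings imply for on-disk size, the sketch below estimates bits per weight for asymmetric 4-bit quantization with group size 128. It assumes one fp16 scale and one packed int4 zero point per group, which is a common GPTQ layout but not confirmed for this checkpoint, and it ignores unquantized tensors such as embeddings and norms.

```python
# Back-of-envelope storage estimate for 4-bit GPTQ, group size 128.
# Assumed layout: one fp16 scale + one packed int4 zero point per group
# (typical for asymmetric GPTQ; the exact kernel layout may differ).
BITS = 4
GROUP_SIZE = 128
SCALE_BITS = 16   # fp16 scale per group (assumption)
ZERO_BITS = 4     # packed int4 zero point per group (assumption)

bits_per_weight = BITS + (SCALE_BITS + ZERO_BITS) / GROUP_SIZE

total_params = 35e9  # ~35B total parameters, per the model card
quantized_gb = total_params * bits_per_weight / 8 / 1e9

print(f"{bits_per_weight:.3f} bits/weight -> ~{quantized_gb:.1f} GB of quantized weights")
```

This is why a grouped 4-bit model is slightly larger than a naive "params / 2 bytes" estimate: each group of 128 weights carries its own scale and zero-point metadata.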

Model Architecture

  • Type: Mixture-of-Experts (MoE) with linear + full attention
  • Experts: 256 per layer, top-8 routing
  • Layers: 40 (30 linear attention + 10 full attention)
  • Hidden Size: 2048
  • Parameters: ~35B total, ~3B active
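The "~3B active" figure follows from the router activating only 8 of the 256 experts per layer for each token. The sketch below illustrates generic top-k gating with stand-in random scores; it is not Qwen's actual router implementation, just the standard pattern of selecting the top-k gate logits and softmax-normalizing them into mixing weights.

```python
import math
import random

# Illustrative top-k expert routing (generic MoE gating, not Qwen's code):
# a gate scores all experts, the top 8 are kept, and their scores are
# renormalized with softmax to weight each selected expert's output.
NUM_EXPERTS = 256
TOP_K = 8

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in gate scores

# Indices of the 8 highest-scoring experts for this token.
top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]

# Softmax over only the selected logits gives the mixing weights.
exps = [math.exp(logits[i]) for i in top]
weights = [e / sum(exps) for e in exps]

print(f"active experts: {top}")
print(f"weight sum: {sum(weights):.6f}")  # weights sum to 1
```

Because only these 8 expert FFNs (plus shared layers) run per token, compute scales with the ~3B active parameters rather than the full ~35B.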

Usage

from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Load the quantized weights and the matching tokenizer.
model = GPTQModel.from_quantized("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")
tokenizer = AutoTokenizer.from_pretrained("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")

# Tokenize a prompt, move it to the model's device, and generate.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Acknowledgments

Quantized using GPTQModel. Base model by Qwen.
