Qwen3.5-35B-A3B-GPTQ-Int4

GPTQ INT4 quantization of Qwen/Qwen3.5-35B-A3B.

Quantization Details

Parameter     Value
-----------   ------------
Method        GPTQ
Bits          4
Group Size    128
Desc Act      True
Symmetric     False
Calibration   WikiText-2
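As a rough illustration of what these settings imply for on-disk size, the sketch below estimates bits per weight for asymmetric 4-bit quantization with group size 128. It assumes one fp16 scale and one packed int4 zero point per group, which is a common GPTQ layout but not confirmed for this checkpoint, and it ignores unquantized tensors such as embeddings and norms.

```python
# Back-of-envelope storage estimate for 4-bit GPTQ, group size 128.
# Assumed layout: one fp16 scale + one packed int4 zero point per group
# (typical for asymmetric GPTQ; the exact kernel layout may differ).
BITS = 4
GROUP_SIZE = 128
SCALE_BITS = 16   # fp16 scale per group (assumption)
ZERO_BITS = 4     # packed int4 zero point per group (assumption)

bits_per_weight = BITS + (SCALE_BITS + ZERO_BITS) / GROUP_SIZE

total_params = 35e9  # ~35B total parameters, per the model card
quantized_gb = total_params * bits_per_weight / 8 / 1e9

print(f"{bits_per_weight:.3f} bits/weight -> ~{quantized_gb:.1f} GB of quantized weights")
```

This is why a grouped 4-bit model is slightly larger than a naive "params / 2 bytes" estimate: each group of 128 weights carries its own scale and zero-point metadata.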

Model Architecture

  • Type: Mixture-of-Experts (MoE) with linear + full attention
  • Experts: 256 per layer, top-8 routing
  • Layers: 40 (30 linear attention + 10 full attention)
  • Hidden Size: 2048
  • Parameters: ~35B total, ~3B active
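The "~3B active" figure follows from the router activating only 8 of the 256 experts per layer for each token. The sketch below illustrates generic top-k gating with stand-in random scores; it is not Qwen's actual router implementation, just the standard pattern of selecting the top-k gate logits and softmax-normalizing them into mixing weights.

```python
import math
import random

# Illustrative top-k expert routing (generic MoE gating, not Qwen's code):
# a gate scores all experts, the top 8 are kept, and their scores are
# renormalized with softmax to weight each selected expert's output.
NUM_EXPERTS = 256
TOP_K = 8

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in gate scores

# Indices of the 8 highest-scoring experts for this token.
top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]

# Softmax over only the selected logits gives the mixing weights.
exps = [math.exp(logits[i]) for i in top]
weights = [e / sum(exps) for e in exps]

print(f"active experts: {top}")
print(f"weight sum: {sum(weights):.6f}")  # weights sum to 1
```

Because only these 8 expert FFNs (plus shared layers) run per token, compute scales with the ~3B active parameters rather than the full ~35B.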

Usage

from gptqmodel import GPTQModel
from transformers import AutoTokenizer

# Load the quantized weights and the matching tokenizer.
model = GPTQModel.from_quantized("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")
tokenizer = AutoTokenizer.from_pretrained("RESMP-DEV/Qwen3.5-35B-A3B-GPTQ-Int4")

# Tokenize a prompt, move it to the model's device, and generate.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Acknowledgments

Quantized using GPTQModel. Base model by Qwen.
