Instructions to use nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nameistoken/Qwen3.6-27B-Quark-W8A8-INT8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("nameistoken/Qwen3.6-27B-Quark-W8A8-INT8")
model = AutoModelForMultimodalLM.from_pretrained("nameistoken/Qwen3.6-27B-Quark-W8A8-INT8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Tokenize the chat messages and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker

```
docker model run hf.co/nameistoken/Qwen3.6-27B-Quark-W8A8-INT8
```

SGLang

How to use nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Docker Model Runner
How to use nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 with Docker Model Runner:
```
docker model run hf.co/nameistoken/Qwen3.6-27B-Quark-W8A8-INT8
```

Qwen3.6-27B-Quark-W8A8-INT8

W8A8 INT8 quantized version of Qwen/Qwen3.6-27B using AMD Quark.

Model Details


| Property | Value |
|---|---|
| Base Model | `Qwen/Qwen3.6-27B` |
| Architecture | `Qwen3_5ForConditionalGeneration` (hybrid attention + ViT) |
| Parameters | 27B language tower (quantized) + 27-layer ViT (BF16, unquantized) |
| Layers | 64 hybrid (16 full_attention + 48 linear_attention GatedDeltaNet) + 1 MTP head |
| Quantization | W8A8 INT8 (per-channel weight + per-token dynamic activation) |
| Quantizer | AMD Quark `0.11.1` (`pack_method='reorder'`, vLLM-native key naming) |
| Model Size | ~29 GB (single safetensors) |
| Original Size | ~52 GB (BF16) |
| Compression | ~1.8x size reduction |

Quantization Scheme

| Component | dtype | Granularity | Mode |
|---|---|---|---|
| Linear weight (text decoder) | INT8 | per-channel (`ch_axis=0`) | symmetric, static |
| Linear activation | INT8 | per-token (`ch_axis=1`) | symmetric, dynamic |
| `lm_head` | BF16 | - | unquantized |
| `embed_tokens` | BF16 | - | unquantized |
| Vision tower (27 ViT blocks) | BF16 | - | unquantized |
| MTP head (`mtp*`) | BF16 | - | unquantized |
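
The scheme in the table can be illustrated with a small pure-Python sketch. It is not AMD Quark's implementation: the helper names `quantize_rows` and `w8a8_linear`, the max-abs / 127 scale rule, and the toy shapes are assumptions made for illustration. Both tensors are laid out as 2-D lists here, so one scale per weight row corresponds to per-channel quantization and one scale per input row corresponds to per-token quantization.

```python
import random

def quantize_rows(matrix):
    """Symmetric INT8 quantization with one scale per row (max-abs / 127)."""
    q_rows, scales = [], []
    for row in matrix:
        scale = max(max(abs(v) for v in row), 1e-8) / 127.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def w8a8_linear(x, w_q, w_scales):
    """Compute y = x @ W^T with INT8 operands; x is quantized dynamically per token."""
    x_q, x_scales = quantize_rows(x)  # dynamic: scales come from the live input
    out = []
    for xq_row, xs in zip(x_q, x_scales):
        out_row = []
        for wq_row, ws in zip(w_q, w_scales):
            acc = sum(a * b for a, b in zip(xq_row, wq_row))  # integer accumulation
            out_row.append(acc * xs * ws)  # dequantize with both scales
        out.append(out_row)
    return out

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(64)] for _ in range(4)]   # 4 tokens
w = [[random.gauss(0, 1) for _ in range(64)] for _ in range(32)]  # 32 output channels

w_q, w_scales = quantize_rows(w)  # static: computed once, ahead of time
y_q = w8a8_linear(x, w_q, w_scales)
y_ref = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]

# Relative Frobenius-norm error between the INT8 path and the float reference
num = sum((a - b) ** 2 for rq, rr in zip(y_q, y_ref) for a, b in zip(rq, rr)) ** 0.5
den = sum(b ** 2 for rr in y_ref for b in rr) ** 0.5
rel_err = num / den
print(f"relative error: {rel_err:.4f}")
```

The weight scales are fixed after quantization, while the activation scales are recomputed for every input, which is the static versus dynamic distinction in the Mode column.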

Accuracy

GSM8K full 1319-question test split (vLLM, temperature=0, concurrency=16, max_tokens=1024, chat_template_kwargs.enable_thinking=false):

| Model | Accuracy | Correct |
|---|---|---|
| `Qwen/Qwen3.6-27B` (BF16 baseline) | 96.74% | 1276 / 1319 |
| This model (Quark W8A8 INT8) | 96.74% | 1276 / 1319 |

Net accuracy delta vs BF16: 0.00 pp.

Although the totals match exactly, the two models diverge on individual questions: only 38 / 1319 generations are token-identical, and the Jaccard index of the two correct-answer sets is 0.9891 (1269 questions answered correctly by both models, 7 only by BF16, and 7 only by INT8, so the differences cancel out). This is the typical W8A8 INT8 pattern: small per-token numerical drift causes reasoning paths to fork, but the accuracy averages out with no systematic degradation.
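
The Jaccard figure can be reproduced from the counts quoted above. The sketch below uses synthetic question IDs (an assumption; the actual per-question results are not part of this card) arranged so that the set sizes match the reported numbers.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Synthetic question IDs chosen to match the reported counts:
# 1269 correct for both models, 7 correct only for BF16, 7 correct only for INT8.
common = set(range(1269))
bf16_correct = common | set(range(1269, 1276))
int8_correct = common | set(range(1276, 1283))

score = jaccard(bf16_correct, int8_correct)
print(len(bf16_correct), len(int8_correct))  # 1276 1276
print(round(score, 4))                       # 0.9891
```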

Eval hardware: requests were sent to the vLLM `/v1/chat/completions` endpoint with the settings listed above, using a single MI355X GPU (TP=1) for INT8 and TP=8 for BF16.

How to Use

With vLLM (Recommended)

```bash
vllm serve nameistoken/Qwen3.6-27B-Quark-W8A8-INT8 \
    --tensor-parallel-size 1 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code
```

Chat completion call:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256, "temperature": 0.7,
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```
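
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_payload` and `chat` are hypothetical, and it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "nameistoken/Qwen3.6-27B-Quark-W8A8-INT8"

def build_payload(prompt, enable_thinking=False, max_tokens=256, temperature=0.7):
    """Build the same JSON body as the curl example."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

def chat(prompt, base_url="http://localhost:8000"):
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Requires the vLLM server to be running:
# print(chat("Hello!"))
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the request before it is sent.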

Hardware Requirements

~32 GB VRAM minimum (e.g., AMD MI300X / MI355X, NVIDIA A100-40G or larger).

Quantization Details

This model was quantized using AMD Quark's per-token per-channel INT8 scheme:

- Weight: INT8 per-channel symmetric static (`PerChannelMinMaxObserver`, `ch_axis=0`).
- Activation: INT8 per-token symmetric dynamic (`ch_axis=1`).
- Excluded layers: `lm_head`, `*embed_tokens*`, `*visual*`, `mtp*`.
- Export: `pack_method='reorder'`, `weight_format='real_quantized'`, `custom_mode='quark'`.
- Key-name post-process: rename `*.weight_quantizer.scale` to `*.weight_scale` and drop `*.weight_quantizer.zero_point` (unused for symmetric quantization). This is required for the vLLM `QuarkW8A8Int8` path with transformers 5.x.
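
The key-name post-processing step can be sketched as a plain dictionary transform. This is an illustration of the renaming rule described above, not the script actually used for this checkpoint: the function name `remap_quark_keys` and the example parameter names are hypothetical, and strings stand in for tensors.

```python
SCALE_SUFFIX = ".weight_quantizer.scale"
ZERO_POINT_SUFFIX = ".weight_quantizer.zero_point"

def remap_quark_keys(state_dict):
    """Rename weight scales and drop zero points, leaving all other keys untouched."""
    remapped = {}
    for name, tensor in state_dict.items():
        if name.endswith(ZERO_POINT_SUFFIX):
            continue  # symmetric quantization: the zero point carries no information
        if name.endswith(SCALE_SUFFIX):
            name = name[: -len(SCALE_SUFFIX)] + ".weight_scale"
        remapped[name] = tensor
    return remapped

# Hypothetical parameter names; placeholder strings instead of real tensors.
example = {
    "layers.0.mlp.up_proj.weight": "int8 weight",
    "layers.0.mlp.up_proj.weight_quantizer.scale": "per-channel scale",
    "layers.0.mlp.up_proj.weight_quantizer.zero_point": "zero point",
}
result = remap_quark_keys(example)
print(sorted(result))
# ['layers.0.mlp.up_proj.weight', 'layers.0.mlp.up_proj.weight_scale']
```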

License

Apache License 2.0 (inherited from Qwen/Qwen3.6-27B). See LICENSE and NOTICE.
