# Huihui-Qwen3.6-27B-abliterated-NVFP4

An NVFP4-quantized version of huihui-ai/Huihui-Qwen3.6-27B-abliterated, an abliterated (uncensored) variant of Qwen3.6-27B: the dense 27B vision-language model (VLM) with a Gated DeltaNet hybrid-attention architecture.

Quantized to NVIDIA FP4 by Lna-Lab using custom Blackwell NVFP4 GEMM kernels (lna-lab/blackwell-geforce-nvfp4-gemm).

55.6 GB → 19.7 GB (0.35x) — vision tower preserved in BF16. Runs on a single NVIDIA Blackwell GPU.
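The size ratio follows from the NVFP4 layout: each weight takes 4 bits, with one FP8 scale shared per 16-element block, so roughly 4.5 effective bits per weight versus 16 in BF16. A back-of-envelope sketch (the 16-element microblock with an FP8 scale is the standard NVFP4 layout; the exact metadata overhead is my approximation):

```python
# Rough NVFP4 size arithmetic (assumes the standard 16-element
# microblock with one FP8 scale; exact metadata overhead may differ).
bf16_bits = 16
nvfp4_bits = 4 + 8 / 16                  # 4 weight bits + amortized FP8 scale = 4.5
print(f"{nvfp4_bits / bf16_bits:.2f}x")  # ~0.28x for the quantized layers alone

# The overall 0.35x (55.6 GB -> 19.7 GB) is larger because lm_head,
# the MLP gates, and the entire vision tower stay in BF16.
```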

## Key Specs

| Spec | Value |
|---|---|
| Base model | huihui-ai/Huihui-Qwen3.6-27B-abliterated |
| Original | Qwen/Qwen3.6-27B |
| Architecture | Dense 27B, Gated DeltaNet + Gated Attention hybrid, VLM |
| Quantization | NVFP4 (W4A4: FP4 weights, FP4 activations, FP8 scales) |
| Format | compressed-tensors (native vLLM support) |
| Tool | vllm-project/llm-compressor + blackwell-geforce-nvfp4-gemm |
| Size | 19.7 GB |
| Requires | NVIDIA Blackwell GPU (SM 120), vLLM >= 0.19 |

## Benchmark Results

Tested on a single NVIDIA RTX PRO 6000 Blackwell (96 GB), vLLM 0.19.1+, 128K context, FP8 KV cache.

| Task | Tokens | Speed (tok/s) | Status |
|---|---|---|---|
| English reasoning | 1,024 | 56.2 | PASS |
| Japanese essay (Hōjōki, 方丈記) | 2,048 | 59.7 | PASS |
| Python code generation | 2,048 | 59.1 | PASS |
| Contradictory instructions | 1,500 | 59.5 | PASS |
| VLM image description | 947 | 58.1 | PASS |
| Math proof (√2 irrationality) | 1,024 | 59.3 | PASS |

Sustained throughput: ~58 tok/s (single GPU, 128K context, FP8 KV cache)
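One way to sanity-check the throughput figure against a running server (endpoint and model name assume the Quick Start setup below) is to time a single completion and read the `usage` field:

```python
# Minimal throughput sanity check against the OpenAI-compatible endpoint.
# Includes prefill time, so it slightly understates decode-only speed.
import time, requests

t0 = time.time()
resp = requests.post("http://localhost:8000/v1/chat/completions", json={
    "model": "Huihui-Qwen3.6-27B-abliterated-NVFP4",
    "messages": [{"role": "user", "content": "Explain NVFP4 quantization."}],
    "max_tokens": 1024,
    "temperature": 0.0,
})
elapsed = time.time() - t0
out = resp.json()["usage"]["completion_tokens"]
print(f"{out} tokens in {elapsed:.1f}s = {out / elapsed:.1f} tok/s")
```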

## VRAM Usage

| State | GPU Memory |
|---|---|
| After model load | 92,142 MiB |
| Peak (during inference) | 92,150 MiB |
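Load and peak are nearly identical because vLLM pre-allocates up to `--gpu-memory-utilization` (0.95 here) of VRAM as KV-cache space at startup. The readings can be reproduced while the server is running, for example via NVML (a sketch using the nvidia-ml-py package, device index 0 assumed):

```python
# Read current GPU memory usage via NVML while the server is up.
import pynvml

pynvml.nvmlInit()
mem = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(0))
print(f"GPU memory used: {mem.used / 1024**2:,.0f} MiB")
pynvml.nvmlShutdown()
```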

## Quick Start — From Scratch with Docker

### 1. Pull the model

```bash
hf download sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4 \
  --local-dir /models/Huihui-Qwen3.6-27B-abliterated-NVFP4
```
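The same download can be scripted with the huggingface_hub Python API if you'd rather not shell out to the CLI:

```python
# Programmatic equivalent of the `hf download` command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="sakamakismile/Huihui-Qwen3.6-27B-abliterated-NVFP4",
    local_dir="/models/Huihui-Qwen3.6-27B-abliterated-NVFP4",
)
```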

### 2. Run with Docker (128K context + FP8 KV cache)

```bash
docker run -d --name huihui-qwen36-27b \
  --gpus '"device=0"' --shm-size=16g \
  -v /models/Huihui-Qwen3.6-27B-abliterated-NVFP4:/models/current:ro \
  -p 8000:8000 \
  vllm/vllm-openai:cu130-nightly \
  --model /models/current \
  --served-model-name Huihui-Qwen3.6-27B-abliterated-NVFP4 \
  --trust-remote-code --quantization compressed-tensors \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --default-chat-template-kwargs '{"preserve_thinking":true}' \
  --enable-prefix-caching --enable-chunked-prefill \
  --max-model-len 131072 --gpu-memory-utilization 0.95 \
  --kv-cache-dtype fp8_e4m3
```

`--served-model-name` registers the model under the ID used in the test requests below; without it, vLLM names the model after the mount path `/models/current`. The checkpoint ships in compressed-tensors format (see Key Specs), which `--quantization compressed-tensors` selects explicitly.

### 3. Run with vLLM directly

```bash
vllm serve /models/Huihui-Qwen3.6-27B-abliterated-NVFP4 \
  --served-model-name Huihui-Qwen3.6-27B-abliterated-NVFP4 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95 \
  --dtype auto \
  --kv-cache-dtype fp8_e4m3 \
  --trust-remote-code
```

### 4. Test inference

**Text (curl):**

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Huihui-Qwen3.6-27B-abliterated-NVFP4",
    "messages": [{"role": "user", "content": "Write a haiku about quantization."}],
    "max_tokens": 256,
    "temperature": 0.0
  }'
```

**VLM (image input, Python):**

```python
import base64, requests
from pathlib import Path

# Encode a local image as a base64 data URL and send it through the
# OpenAI-compatible chat API.
img_b64 = base64.b64encode(Path("photo.jpg").read_bytes()).decode()
resp = requests.post("http://localhost:8000/v1/chat/completions", json={
    "model": "Huihui-Qwen3.6-27B-abliterated-NVFP4",
    "messages": [{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
        {"type": "text", "text": "Describe this image."},
    ]}],
    "max_tokens": 1024,
})
print(resp.json()["choices"][0]["message"]["content"])
```
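The same endpoint also works with the official openai Python client; vLLM's server accepts any non-empty API key:

```python
# Text request through the openai client instead of raw HTTP.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="Huihui-Qwen3.6-27B-abliterated-NVFP4",
    messages=[{"role": "user", "content": "Write a haiku about quantization."}],
    max_tokens=256,
    temperature=0.0,
)
print(resp.choices[0].message.content)
```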

## Quantization Details

### Recipe

```yaml
QuantizationModifier:
  targets: [Linear]
  ignore: [lm_head, 're:.*visual.*', 're:.*mlp.gate$', 're:.*mlp.shared_expert_gate$']
  scheme: NVFP4
```

### What's Quantized / What's Not

- **Quantized (NVFP4):** all Linear layers in the language model
- **Kept in BF16:** lm_head, all vision layers (model.visual.*), MLP gates
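One way to spot-check this split is to scan the checkpoint's safetensors index: compressed-tensors NVFP4 checkpoints typically store quantized layers as `weight_packed` plus FP8 `weight_scale` tensors, while BF16 layers keep a plain `weight` key (these key names are an assumption and may vary across compressed-tensors versions):

```python
# List which modules carry packed NVFP4 weights vs. plain BF16 weights.
import json

index = "/models/Huihui-Qwen3.6-27B-abliterated-NVFP4/model.safetensors.index.json"
with open(index) as f:
    keys = json.load(f)["weight_map"]

packed = {k.rsplit(".", 1)[0] for k in keys if k.endswith(".weight_packed")}
print(f"{len(packed)} NVFP4 modules")
print("vision tower quantized?", any("visual." in k for k in packed))  # expect False
```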

### Reproduction

```python
import torch
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated",
    torch_dtype=torch.bfloat16, trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated", trust_remote_code=True,
)

recipe = QuantizationModifier(
    targets="Linear", scheme="NVFP4",
    ignore=["lm_head", "re:.*visual.*", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
)

# Calibration with the neuralmagic/calibration dataset (20 samples, 8192 seq len)
# ... (see the quantization script in the repo)

model.save_pretrained("output-NVFP4", save_compressed=True)
processor.save_pretrained("output-NVFP4")
```
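The calibration step itself is elided above and left to the repo's script. For orientation only, a hypothetical version of that call using llm-compressor's `oneshot` API, continuing the script above with the dataset name, sample count, and sequence length taken from the comment (the repo's actual invocation may differ):

```python
# Hypothetical calibration call; the repo's quantization script is
# authoritative. Settings mirror the comment in the script above.
oneshot(
    model=model,
    dataset="neuralmagic/calibration",
    recipe=recipe,
    max_seq_length=8192,
    num_calibration_samples=20,
)
```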

**Note:** After saving, verify that the vision tower keys use the `model.visual.*` prefix (not `model.language_model.visual.*`), e.g. with the check below.
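A minimal assertion over the saved index (the path assumes the output directory from the script above):

```python
# Fail loudly if any vision key kept the wrong prefix after saving.
import json

with open("output-NVFP4/model.safetensors.index.json") as f:
    keys = json.load(f)["weight_map"]

bad = [k for k in keys if k.startswith("model.language_model.visual.")]
assert not bad, f"{len(bad)} vision keys still use model.language_model.visual.*"
```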

## Tested Environment

| Component | Version |
|---|---|
| vLLM | 0.19.1rc1+ (nightly) |
| Transformers | 5.5.4 |
| PyTorch | 2.11.0+cu130 |
| llm-compressor | 0.1.dev5 |
| CUDA | 13.0 |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| OS | Ubuntu 24.04, Linux 6.17 |

## Credits

### Support the Base Model Authors

If you find this model useful, please consider supporting:

- huihui-ai (abliteration): Ko-fi | BTC: bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
- Qwen Team (original model): star the Qwen repo

## License

This model inherits the Apache 2.0 license from the base model.
