How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X" \
    --host 0.0.0.0 \
    --port 30000
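A ~30B model may not fit comfortably on a single GPU. SGLang can shard it across devices with tensor parallelism; the launch below is a sketch for a two-GPU machine, and the `--tp` value should match your hardware.
# Optional: shard the model across 2 GPUs with tensor parallelism
python3 -m sglang.launch_server \
    --model-path "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X" \
    --tp 2 \
    --host 0.0.0.0 \
    --port 30000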
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
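The OpenAI-compatible endpoint also accepts the standard "stream": true field if you want tokens back incrementally as server-sent events; the request below is the same call with streaming enabled.
# Optional: stream the response token-by-token
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X",
		"stream": true,
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'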
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Granite-4.1-30B-Claude-4.6-Opus-Thinking-X

A fine-tune of Granite 4.1, trained with Unsloth on local hardware, converting the model from "instruct" into a full "reasoning/thinking" model.

Other versions below.

Context: 128k
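If you want the server to expose the full window rather than the default from the model config, SGLang lets you set it at launch. The flag below is a sketch (128k ≈ 131072 tokens) and assumes you have enough GPU memory for that KV cache.
# Optional: request the full 128k context window at launch (memory permitting)
python3 -m sglang.launch_server \
    --model-path "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X" \
    --context-length 131072 \
    --host 0.0.0.0 \
    --port 30000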

Suggested settings: temperature 1, top-p 0.95, min-p 0.05, repetition penalty 1 (off), or 1.05 to 1.1 for creative use; minimum context of 8k.
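As a concrete sketch, these settings map onto an OpenAI-compatible request roughly as shown below. "min_p" and "repetition_penalty" are backend-specific extensions, so whether they are honored depends on the server you use; the prompt is just an example.
# Example request using the suggested sampling settings (field support varies by backend)
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DavidAU/Granite-4.1-30B-Claude-4.6-Opus-Thinking-X",
		"temperature": 1.0,
		"top_p": 0.95,
		"min_p": 0.05,
		"repetition_penalty": 1.0,
		"messages": [
			{
				"role": "user",
				"content": "Write a short scene set in a lighthouse."
			}
		]
	}'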

Enjoy!


TECH NOTE:

The "lm_head" was split from the "embed" to improve training and quant performance prior to training.

For GGUF quants, this will allow you to set the "output tensor" at bf16 and get stronger performance overall.
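For example, llama.cpp's quantizer can pin the output tensor to bf16 while quantizing everything else. The file names and Q6_K target below are placeholders, and the bf16 type assumes your llama.cpp build accepts it for this flag.
# Sketch: quantize with llama.cpp while keeping the output tensor at bf16
./llama-quantize \
    --output-tensor-type bf16 \
    granite-4.1-thinking-x-f16.gguf \
    granite-4.1-thinking-x-q6_k.gguf \
    Q6_K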


BENCHMARKS by Nightmedia:

Model                                                      Quant     arc-c   arc/e   boolq   hswag   obkqa   piqa    wino
Granite-4.1-30B-Claude-4.6-Opus-Thinking-Charles-Xavier    mxfp8     0.573   0.761   0.876   ...
Granite-4.1-30B-Claude-4.6-Opus-Thinking-Xavier            mxfp8     0.563   0.739   0.879   0.722   0.430   0.779   0.723
Granite-4.1-30B-Claude-4.6-Opus-Thinking-X (this model)    qx64-hi   0.526   0.696   0.894   ...
granite-4.1-30b (base, untuned)                            mxfp8     0.456   0.572   0.897   0.621   0.444   0.757   0.616
granite-4.1-30b (base, untuned)                            qx64-hi   0.462   0.582   0.896   0.642   0.448   0.769   0.600