How to use with vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
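
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl payload above; the helper names `build_chat_request` and `send` are illustrative (not part of vLLM), and the response shape assumes the usual OpenAI-compatible `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_chat_request(prompt, image_url, base_url="http://localhost:8000"):
    """Build the same POST request as the curl call (nothing is sent yet)."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not need a running server; sending it does.
request = build_chat_request("Describe this image in one sentence.", IMAGE_URL)
# print(send(request))  # uncomment once `vllm serve` is up on localhost:8000
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.
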
Use Docker
docker model run hf.co/rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4

This repository contains a ModelOpt-quantized checkpoint derived from Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.

Quantization Summary

  • Quantization tool: NVIDIA TensorRT Model Optimizer
  • Weight quantization: NVFP4
  • KV cache quantization: FP8
  • Calibration size: 1024 samples
  • Calibration sequence length: 4096
  • Calibration batch size: 1
  • Calibration source mix:
    • nohurry/Opus-4.6-Reasoning-3000x-filtered: 596 samples
    • Jackrong/Qwen3.5-reasoning-700x: 178 samples
    • TeichAI/claude-4.5-opus-high-reasoning-250x: 250 samples
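
As a quick sanity check on the numbers above, the following sketch confirms that the three sources add up to the stated 1024 samples and prints each source's share. The token bound assumes that the calibration sequence length acts as a per-sample maximum, which the list above does not state explicitly.

```python
# Sample counts copied from the calibration source mix above.
CALIBRATION_MIX = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 596,
    "Jackrong/Qwen3.5-reasoning-700x": 178,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
}
CALIBRATION_SIZE = 1024
SEQUENCE_LENGTH = 4096

total = sum(CALIBRATION_MIX.values())
assert total == CALIBRATION_SIZE, f"mix sums to {total}, expected {CALIBRATION_SIZE}"

# Fraction of the calibration set contributed by each source.
shares = {name: count / total for name, count in CALIBRATION_MIX.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Upper bound on calibration tokens, ASSUMING 4096 is a per-sample maximum.
max_calibration_tokens = CALIBRATION_SIZE * SEQUENCE_LENGTH
print(f"at most {max_calibration_tokens:,} tokens")
```
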

Before post-training quantization (PTQ), the calibration set was converted into a single JSONL file with the source model's chat template applied, so that the activation statistics collected during calibration are closer to the reasoning format used by this distilled checkpoint.
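
A minimal sketch of that conversion step is shown below. It is not the script used to build this checkpoint: the `{"text": ...}` record layout, the function name `write_calibration_jsonl`, and the toy template are all assumptions made for illustration. In practice `render_chat` would wrap the source model's tokenizer, for example `lambda m: tokenizer.apply_chat_template(m, tokenize=False)` from Hugging Face Transformers.

```python
import json
import tempfile
from pathlib import Path


def write_calibration_jsonl(conversations, render_chat, path):
    """Write one {"text": ...} JSON object per line and return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for messages in conversations:
            text = render_chat(messages)  # chat template applied here
            handle.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


def toy_template(messages):
    """Stand-in renderer for the demo; this is NOT the Qwen chat template."""
    return "".join(f"<|{m['role']}|>{m['content']}\n" for m in messages)


# Hypothetical one-conversation dataset, only to exercise the function.
demo_conversations = [
    [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "calibration.jsonl"
    written = write_calibration_jsonl(demo_conversations, toy_template, out_path)
    lines = out_path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines]
```

Passing the renderer in as a callable keeps the file-writing logic independent of any particular tokenizer, so the same function works for each of the source datasets once they are mapped to a list of messages.
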

Runtime Notes

  • Intended runtime target: SGLang with ModelOpt-compatible HF checkpoint loading
  • Quantization format: ModelOpt HF export, not compressed-tensors
  • ModelOpt may leave a few unsupported or intentionally skipped modules excluded from quantization during export; see hf_quant_config.json for the final exclusion list
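
To inspect the exclusion list programmatically, something like the sketch below may help. The key names (`quantization`, `quant_algo`, `kv_cache_quant_algo`, `exclude_modules`) are an assumption based on the layout ModelOpt HF exports commonly use, so check them against the actual hf_quant_config.json in this repository. The sample file written in the demo is made up and does not reflect this checkpoint's real exclusions.

```python
import json
import tempfile
from pathlib import Path


def summarize_quant_config(path):
    """Return the quantization algorithms and excluded modules from the config."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    # ASSUMED layout: settings live under a top-level "quantization" key.
    quant = config.get("quantization", {})
    return {
        "quant_algo": quant.get("quant_algo"),
        "kv_cache_quant_algo": quant.get("kv_cache_quant_algo"),
        "exclude_modules": quant.get("exclude_modules", []),
    }


# Illustrative config only; the module name below is hypothetical.
sample_config = {
    "quantization": {
        "quant_algo": "NVFP4",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": ["lm_head"],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "hf_quant_config.json"
    config_path.write_text(json.dumps(sample_config), encoding="utf-8")
    summary = summarize_quant_config(config_path)

print(summary)
```
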

Files

  • model.safetensors: quantized weights
  • hf_quant_config.json: final quantization metadata
  • tokenizer and processor files inherited from the source checkpoint
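
The stored tensor dtypes can be checked without loading any weights, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header that describes every tensor. The sketch below counts tensors per dtype using only the standard library; `count_tensor_dtypes` is an illustrative helper, and the demo builds a tiny synthetic file rather than reading this repository's model.safetensors.

```python
import json
import struct
import tempfile
from collections import Counter
from pathlib import Path


def count_tensor_dtypes(path):
    """Count tensors per dtype by reading only the safetensors JSON header."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )


# Synthetic two-tensor file for the demo (tensor names are made up).
demo_header = {
    "__metadata__": {"format": "pt"},
    "demo.scale": {"dtype": "BF16", "shape": [1], "data_offsets": [0, 2]},
    "demo.packed": {"dtype": "U8", "shape": [2], "data_offsets": [2, 4]},
}
header_bytes = json.dumps(demo_header).encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.safetensors"
    demo_path.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00" * 4)
    dtype_counts = count_tensor_dtypes(demo_path)

print(dict(dtype_counts))
```

Pointing `count_tensor_dtypes` at a downloaded model.safetensors reads only the header, so it stays fast even for a multi-gigabyte file.
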

Safetensors

  • Model size: 17B params
  • Tensor types: BF16, F8_E4M3, U8