Granite 4.1 3B ONNX GenAI INT4
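This repository contains a fresh ONNX Runtime GenAI export of ibm-granite/granite-4.1-3b, quantized to INT4 with round-to-nearest (RTN), block size 32, and accuracy level 4 (which lets the MatMulNBits kernels compute with INT8-quantized activations).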
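To illustrate what "INT4 RTN, block size 32" means, here is a minimal sketch of block-wise round-to-nearest quantization. It assumes a symmetric scheme (one scale per block, no zero point, codes in [-8, 7]). It is not ONNX Runtime's quantizer, and the function names are hypothetical.

```python
"""Illustrative block-wise INT4 round-to-nearest (RTN) quantization.

Assumption: symmetric scheme, one float scale per block, no zero point.
This sketches the idea only; it is not ONNX Runtime's implementation.
"""
from typing import List, Tuple

BLOCK_SIZE = 32  # mirrors int4_block_size=32 used for this export

Block = Tuple[float, List[int]]


def quantize_block(block: List[float]) -> Block:
    """Quantize one block to signed 4-bit codes sharing a single scale."""
    max_abs = max(abs(w) for w in block)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest step (Python rounds ties to even), then clamp to int4.
    codes = [max(-8, min(7, round(w / scale))) for w in block]
    return scale, codes


def quantize_rtn(weights: List[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Split a weight row into blocks and quantize each block independently."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights as scale * code."""
    return [scale * q for scale, codes in blocks for q in codes]


row = [0.01 * i for i in range(-32, 32)]  # 64 weights -> 2 blocks of 32
blocks = quantize_rtn(row)
max_err = max(abs(w - a) for w, a in zip(row, dequantize(blocks)))
print(f"{len(blocks)} blocks, max abs error {max_err:.4f}")
```

Because each block gets its own scale, the reconstruction error is bounded by half of that block's scale; smaller blocks track local weight ranges more closely at the cost of storing more scales.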
The model was exported directly from IBM's safetensors checkpoint with the ONNX Runtime GenAI model builder:
python -m onnxruntime_genai.models.builder \
--model_name ibm-granite/granite-4.1-3b \
--precision int4 \
--execution_provider cpu \
--extra_options int4_block_size=32 int4_accuracy_level=4 int4_algo_config=rtn
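The int4_block_size=32 option sets how many weights share one scale, which determines the storage overhead on top of the 4-bit codes. The arithmetic below is a rough sketch assuming one fp16 scale per block and, optionally, one packed 4-bit zero point; whether this export stores zero points, and its exact on-disk layout, are not stated here, so both cases are shown.

```python
"""Rough storage cost per quantized weight for block-wise INT4 (illustrative).

Assumptions: one fp16 scale (16 bits) per block, plus an optional 4-bit
zero point per block. Applies only to the quantized MatMul weights.
"""


def bits_per_weight(block_size: int, scale_bits: int = 16,
                    has_zero_point: bool = False, zp_bits: int = 4) -> float:
    """4 bits per code plus the per-block metadata spread across the block."""
    overhead = scale_bits + (zp_bits if has_zero_point else 0)
    return 4 + overhead / block_size


for bs in (16, 32, 64, 128):
    print(bs, bits_per_weight(bs), bits_per_weight(bs, has_zero_point=True))
# 16 5.0 5.25
# 32 4.5 4.625
# 64 4.25 4.3125
# 128 4.125 4.15625
```

Under these assumptions, block size 32 costs about 4.5 bits per weight, roughly 3.5x smaller than 16-bit weights for the quantized layers.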
Validation
A local smoke test with ONNX Runtime GenAI passed using a short prompt.
This fresh export uses the GroupQueryAttention operator and avoids the repeat_kv shape/cache failure observed with the previous conversion, which was based on onnx-community/Granite-4.1-3b-Onnx.
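For context, repeat_kv is the step in many grouped-query attention implementations that copies each key/value head so K and V have as many heads as Q, whereas the GroupQueryAttention operator takes the un-repeated K/V and cache directly. The sketch below uses hypothetical head counts (not read from the Granite config) to show that both approaches pick the same KV head for every query head, while the repeated cache carries n_rep times as many heads. It does not reproduce the earlier failure.

```python
"""Illustrative head mapping for grouped-query attention (GQA).

Head counts are hypothetical placeholders, not Granite's actual config.
"""
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def n_rep(num_heads: int, num_kv_heads: int) -> int:
    """Number of query heads that share each key/value head."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    return num_heads // num_kv_heads


def repeat_kv(kv_heads: Sequence[T], reps: int) -> List[T]:
    """Materialize copies: [k0, k1] with reps=2 -> [k0, k0, k1, k1]."""
    return [head for head in kv_heads for _ in range(reps)]


def gqa_kv_index(q_head: int, reps: int) -> int:
    """Fused-GQA alternative: index the shared KV head without copying."""
    return q_head // reps


def kv_cache_shape(batch: int, num_kv_heads: int, seq_len: int,
                   head_dim: int, reps: int, repeated: bool) -> Tuple[int, ...]:
    """(batch, heads, seq_len, head_dim) for a repeated or un-repeated cache."""
    heads = num_kv_heads * reps if repeated else num_kv_heads
    return (batch, heads, seq_len, head_dim)


NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 32, 8, 64  # hypothetical sizes
reps = n_rep(NUM_HEADS, NUM_KV_HEADS)  # 4
kv = [f"kv{i}" for i in range(NUM_KV_HEADS)]
expanded = repeat_kv(kv, reps)
# Both approaches select the same KV head for every query head.
assert all(expanded[h] == kv[gqa_kv_index(h, reps)] for h in range(NUM_HEADS))
print(kv_cache_shape(1, NUM_KV_HEADS, 128, HEAD_DIM, reps, repeated=True))   # (1, 32, 128, 64)
print(kv_cache_shape(1, NUM_KV_HEADS, 128, HEAD_DIM, reps, repeated=False))  # (1, 8, 128, 64)
```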
Package
genai_config.json
model.onnx
model.onnx.data
tokenizer.json
tokenizer_config.json
chat_template.jinja
quantization_config.json
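After downloading, a quick way to confirm the package is complete is to check for each listed file. This is a minimal sketch; `missing_files` is a hypothetical helper, not part of ONNX Runtime GenAI, and the directory you pass is wherever you placed the download.

```python
"""Check that a local copy of this package contains the listed files."""
from pathlib import Path
from typing import List

EXPECTED_FILES = [
    "genai_config.json",
    "model.onnx",
    "model.onnx.data",  # ONNX external data referenced by model.onnx
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "quantization_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("path/to/download")` returns an empty list when every file is present.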
Model tree for MiCkSoftware/Granite-4.1-3b-Onnx-q4
Base model: ibm-granite/granite-4.1-3b