Granite 4.1 3B ONNX GenAI INT4

This repository contains a fresh ONNX Runtime GenAI export of ibm-granite/granite-4.1-3b, quantized to INT4 with round-to-nearest (RTN), block size 32, and accuracy level 4.

It was exported directly from the IBM safetensors checkpoint with:

python -m onnxruntime_genai.models.builder \
  --model_name ibm-granite/granite-4.1-3b \
  --precision int4 \
  --execution_provider cpu \
  --extra_options int4_block_size=32 int4_accuracy_level=4 int4_algo_config=rtn
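
To give a feel for what INT4 RTN with a block size of 32 means, here is a minimal pure-Python sketch of block-wise round-to-nearest quantization. It assumes a simplified symmetric scheme with one scale per block; it is not the quantizer the builder actually runs, and the function names and sample weights are made up for illustration.

```python
def quantize_block_rtn(block, bits=4):
    """Quantize one block of floats to signed integers using round-to-nearest."""
    qmax = 2 ** (bits - 1) - 1   # 7 for INT4
    qmin = -(2 ** (bits - 1))    # -8 for INT4
    max_abs = max(abs(w) for w in block)
    # One scale per block; fall back to 1.0 for an all-zero block.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in block]
    return quantized, scale


def quantize_rtn(weights, block_size=32, bits=4):
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block_rtn(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate floats from (quantized values, scale) pairs."""
    return [q * scale for values, scale in blocks for q in values]


# Hypothetical sample data: 64 weights in [-1, 1], i.e. two blocks of 32.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
blocks = quantize_rtn(weights, block_size=32)
restored = dequantize(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because every block carries its own scale, the rounding error of each weight in this sketch is bounded by half of that block's scale. Smaller blocks generally mean tighter scales at the cost of storing more of them.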

Validation

A local ONNX Runtime GenAI smoke test with a short prompt passed.
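
A smoke test along these lines might look like the sketch below. It assumes a recent onnxruntime-genai release where prompts are fed through Generator.append_tokens (older releases used a different input flow), and the function name, default prompt, and max_length are illustrative choices rather than the exact test that was run. It passes the prompt as raw text and does not apply chat_template.jinja.

```python
def run_smoke_test(model_dir, prompt="Hello, my name is", max_length=64):
    """Load the exported model from model_dir and generate a short completion."""
    # Imported lazily so this file can be loaded without the package installed.
    import onnxruntime_genai as og

    model = og.Model(model_dir)          # reads genai_config.json and model.onnx
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()   # incremental token-to-text decoder

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        pieces.append(stream.decode(generator.get_next_tokens()[0]))
    return "".join(pieces)


if __name__ == "__main__":
    import sys

    # Usage: python smoke_test.py /path/to/downloaded/model/folder
    if len(sys.argv) > 1:
        print(run_smoke_test(sys.argv[1]))
```

If the model loads and the loop finishes with some generated text, the export is at least structurally usable. This says nothing about output quality.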

This fresh export uses GroupQueryAttention and avoids the repeat_kv shape/cache failure observed with the previous conversion based on onnx-community/Granite-4.1-3b-Onnx.
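
For readers unfamiliar with the two approaches, the sketch below contrasts them at the level of head indices only. A repeat_kv-style graph materializes a copy of each key/value head for every query head in its group, while grouped-query attention keeps the smaller set of key/value heads and maps each query head onto one of them. The head counts are illustrative and are not taken from the Granite configuration, and the sketch does not reproduce the failure mentioned above.

```python
def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Return the index of the KV head that a given query head attends with."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def repeat_kv(kv_heads, n_rep):
    """Expand each KV head n_rep times in place order (repeat-interleave)."""
    return [head for head in kv_heads for _ in range(n_rep)]


# Illustrative sizes: 8 query heads sharing 2 KV heads.
num_query_heads, num_kv_heads = 8, 2
kv_heads = ["kv0", "kv1"]

# repeat_kv approach: the cache-side tensor grows to one entry per query head.
expanded = repeat_kv(kv_heads, num_query_heads // num_kv_heads)

# Grouped-query approach: the KV heads stay as they are; only the lookup changes.
mapping = [
    kv_head_for_query_head(h, num_query_heads, num_kv_heads)
    for h in range(num_query_heads)
]
```

Both paths pair each query head with the same key/value head. The difference is whether the expanded copy exists as a tensor whose shape has to stay consistent with the KV cache.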

Package

genai_config.json
model.onnx
model.onnx.data
tokenizer.json
tokenizer_config.json
chat_template.jinja
quantization_config.json
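
After downloading, a quick check that the folder is complete can save a confusing load error. The helper below is a small sketch using only the standard library; the function name is hypothetical and the file list simply mirrors the one above.

```python
from pathlib import Path

# File names copied from the package listing in this README.
EXPECTED_FILES = [
    "genai_config.json",
    "model.onnx",
    "model.onnx.data",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "quantization_config.json",
]


def missing_package_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every listed file is present. Note that model.onnx.data holds the external weights, so model.onnx alone is not enough to load the model.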


Model tree for MiCkSoftware/Granite-4.1-3b-Onnx-q4

Base model: ibm-granite/granite-4.1-3b
Quantized (30): this model