Granite 4.1 3B ONNX GenAI INT4

This repository contains a fresh ONNX Runtime GenAI export of ibm-granite/granite-4.1-3b, quantized to INT4 with RTN (round-to-nearest), block size 32, accuracy level 4.
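For intuition, RTN with block size 32 means each run of 32 weights shares one scale and is rounded independently to the nearest INT4 value. Below is a minimal NumPy sketch of a simplified symmetric variant (the actual ORT quantizer differs in details such as zero points and packing):

```python
import numpy as np

def rtn_int4_blockwise(w, block_size=32):
    """Round-to-nearest INT4 quantization with one scale per block.

    Simplified symmetric sketch: signed int4 range is [-8, 7].
    """
    blocks = w.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip a random weight tensor and measure the worst-case error.
rng = np.random.default_rng(0)
weights = rng.standard_normal(4 * 32).astype(np.float32)
q, s = rtn_int4_blockwise(weights)
max_err = np.abs(weights - dequantize(q, s)).max()
```

The reconstruction error per element is bounded by half a quantization step, i.e. at most 0.5x the block's scale.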

It was exported directly from the IBM safetensors checkpoint with:

python -m onnxruntime_genai.models.builder \
  --model_name ibm-granite/granite-4.1-3b \
  --precision int4 \
  --execution_provider cpu \
  --extra_options int4_block_size=32 int4_accuracy_level=4 int4_algo_config=rtn

Validation

Local ORT GenAI smoke test passed with a short prompt.
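A smoke test along these lines can be reproduced with the onnxruntime-genai Python package (the directory name is an assumption; the Generator API shown here matches recent releases, while older versions set input ids on GeneratorParams instead):

```python
def smoke_test(model_dir, prompt="What is the capital of France?", max_length=64):
    """Load the exported model and generate a short completion greedily."""
    import onnxruntime_genai as og  # requires: pip install onnxruntime-genai

    model = og.Model(model_dir)          # reads genai_config.json, model.onnx(.data)
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():
        generator.generate_next_token()

    return tokenizer.decode(generator.get_sequence(0))

# Usage (path is illustrative):
#   print(smoke_test("./Granite-4.1-3b-Onnx-q4"))
```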

This export uses GroupQueryAttention and avoids the repeat_kv shape/cache failure observed with the previous conversion, which was based on onnx-community/Granite-4.1-3b-Onnx.

Package

  • genai_config.json
  • model.onnx
  • model.onnx.data
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja
  • quantization_config.json
