Data-aware int4 compression of Qwen3-1.7B using a custom calibration dataset aimed at protecting the activations that are most important to the functioning of the Qwen3 models.
```shell
optimum-cli export openvino \
  --model Qwen/Qwen3-1.7B \
  --task text-generation-with-past \
  --weight-format int4 \
  --trust-remote-code \
  --dataset imperfect \
  --num-samples 32 \
  --ratio 0.85 \
  --sensitivity-metric hessian_input_activation \
  --group-size 128 \
  --awq \
  --scale-estimation \
  --lora-correction \
  qwen3-1.7b-int4-asym-awq-ov
```
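Once exported, the model can be loaded with optimum-intel's OpenVINO runtime classes. A minimal sketch, assuming the export directory above and an environment with `optimum-intel` and its OpenVINO extras installed:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Local directory produced by the optimum-cli export above
model_dir = "qwen3-1.7b-int4-asym-awq-ov"
model = OVModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

inputs = tokenizer("OpenVINO is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```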
The following modifications were made to optimum-intel to enable use of the targeted custom dataset.
Tell `utils.py` that `"imperfect"` is a valid option.

`optimum/intel/openvino/utils.py` (line 153):

```python
PREDEFINED_CAUSAL_LANGUAGE_DATASETS = {"wikitext2", "c4", "c4-new", "auto", "gsm8k", "imperfect"}
```
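optimum-intel rejects dataset names outside this set, so registering `"imperfect"` is what lets the CLI flag through. The effect can be sketched with a minimal stand-in check (`validate_dataset` here is a hypothetical helper, not the actual optimum-intel code):

```python
PREDEFINED_CAUSAL_LANGUAGE_DATASETS = {"wikitext2", "c4", "c4-new", "auto", "gsm8k", "imperfect"}

def validate_dataset(name: str) -> str:
    # Hypothetical stand-in for optimum-intel's dataset-name validation
    if name not in PREDEFINED_CAUSAL_LANGUAGE_DATASETS:
        raise ValueError(f"{name!r} is not a predefined calibration dataset")
    return name

print(validate_dataset("imperfect"))  # accepted now that "imperfect" is registered
```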
Give `quantization.py` the ability to use the dataset.

`optimum/intel/openvino/quantization.py` (line 688):

```python
elif config.dataset == "imperfect":
    seq_len = seq_len or 4096
    dataset = self.load_dataset(
        "imperfect-follow/qwen3-compress-256",
        dataset_split="train",
        num_samples=config.num_samples,
    )
    calibration_dataset = []
    for i, _text in enumerate(dataset["text"], 1):
        # Tokenize once without truncation to check the full length
        full_tokenized = tokenizer(_text, return_tensors="pt")
        full_length = full_tokenized.input_ids.shape[1]  # dim 1 is sequence length
        if full_length > seq_len:
            logger.warning(
                f"Sample {i} truncated: length {full_length} exceeds "
                f"maximum {seq_len} by {full_length - seq_len}"
            )
            # Slice the existing tensors instead of re-tokenizing;
            # this trims input_ids, attention_mask, etc. to seq_len
            item = {k: v[:, :seq_len] for k, v in full_tokenized.items()}
            calibration_dataset.append(item)
        else:
            calibration_dataset.append(full_tokenized)
```
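The truncation step above can be exercised in isolation with a stand-in tokenizer (`TinyTokenizer` is a hypothetical toy used here to show the slicing, not the real Qwen3 tokenizer; NumPy arrays stand in for torch tensors):

```python
import numpy as np

class TinyTokenizer:
    """Hypothetical whitespace tokenizer returning batched [1, n] arrays."""
    def __call__(self, text):
        ids = np.array([[hash(w) % 1000 for w in text.split()]])
        return {"input_ids": ids, "attention_mask": np.ones_like(ids)}

def truncate(sample, seq_len):
    # Mirrors the calibration loop's slicing: trim every field to seq_len
    return {k: v[:, :seq_len] for k, v in sample.items()}

tokenizer = TinyTokenizer()
sample = tokenizer("one two three four five six")
print(sample["input_ids"].shape)   # (1, 6)
trimmed = truncate(sample, seq_len=4)
print(trimmed["input_ids"].shape)  # (1, 4)
```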