Data-aware int4 compression of Qwen3-1.7B using a custom calibration dataset aimed at protecting the activations that are most important to the functioning of the Qwen3 models.
```shell
optimum-cli export openvino \
  --model Qwen/Qwen3-1.7B \
  --task text-generation-with-past \
  --weight-format int4 \
  --trust-remote-code \
  --dataset imperfect \
  --num-samples 32 \
  --ratio 0.85 \
  --sensitivity-metric hessian_input_activation \
  --group-size 128 \
  --awq \
  --scale-estimation \
  --lora-correction \
  qwen3-1.7b-int4-asym-awq-ov
```
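Once exported, the model can be loaded with optimum-intel's OpenVINO runtime classes. A minimal sketch, assuming the export directory above and an environment with `optimum-intel` and its OpenVINO extras installed:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Local directory produced by the optimum-cli export above
model_dir = "qwen3-1.7b-int4-asym-awq-ov"
model = OVModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

inputs = tokenizer("OpenVINO is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```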
The following modifications were made to optimum-intel to enable use of the targeted custom dataset.
Tell `utils.py` that `"imperfect"` is a valid option.

`optimum/intel/openvino/utils.py` (line 153):

```python
PREDEFINED_CAUSAL_LANGUAGE_DATASETS = {"wikitext2", "c4", "c4-new", "auto", "gsm8k", "imperfect"}
```
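optimum-intel rejects dataset names outside this set, so registering `"imperfect"` is what lets the CLI flag through. The effect can be sketched with a minimal stand-in check (`validate_dataset` here is a hypothetical helper, not the actual optimum-intel code):

```python
PREDEFINED_CAUSAL_LANGUAGE_DATASETS = {"wikitext2", "c4", "c4-new", "auto", "gsm8k", "imperfect"}

def validate_dataset(name: str) -> str:
    # Hypothetical stand-in for optimum-intel's dataset-name validation
    if name not in PREDEFINED_CAUSAL_LANGUAGE_DATASETS:
        raise ValueError(f"{name!r} is not a predefined calibration dataset")
    return name

print(validate_dataset("imperfect"))  # accepted now that "imperfect" is registered
```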
Give `quantization.py` the ability to use the dataset.

`optimum/intel/openvino/quantization.py` (line 688):

```python
elif config.dataset == "imperfect":
    seq_len = seq_len or 4096
    dataset = self.load_dataset(
        "imperfect-follow/qwen3-compress-256",
        dataset_split="train",
        num_samples=config.num_samples,
    )
    calibration_dataset = []
    for i, _text in enumerate(dataset["text"], 1):
        # Tokenize once without truncation to check the full length
        full_tokenized = tokenizer(_text, return_tensors="pt")
        full_length = full_tokenized.input_ids.shape[1]  # dim 1 is sequence length
        if full_length > seq_len:
            logger.warning(
                f"Sample {i} truncated: length {full_length} exceeds "
                f"maximum {seq_len} by {full_length - seq_len}"
            )
            # Slice the existing tensors instead of re-tokenizing;
            # this trims input_ids, attention_mask, etc. to seq_len
            item = {k: v[:, :seq_len] for k, v in full_tokenized.items()}
            calibration_dataset.append(item)
        else:
            calibration_dataset.append(full_tokenized)
```
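The truncation step above can be exercised in isolation with a stand-in tokenizer (`TinyTokenizer` is a hypothetical toy used here to show the slicing, not the real Qwen3 tokenizer; NumPy arrays stand in for torch tensors):

```python
import numpy as np

class TinyTokenizer:
    """Hypothetical whitespace tokenizer returning batched [1, n] arrays."""
    def __call__(self, text):
        ids = np.array([[hash(w) % 1000 for w in text.split()]])
        return {"input_ids": ids, "attention_mask": np.ones_like(ids)}

def truncate(sample, seq_len):
    # Mirrors the calibration loop's slicing: trim every field to seq_len
    return {k: v[:, :seq_len] for k, v in sample.items()}

tokenizer = TinyTokenizer()
sample = tokenizer("one two three four five six")
print(sample["input_ids"].shape)   # (1, 6)
trimmed = truncate(sample, seq_len=4)
print(trimmed["input_ids"].shape)  # (1, 4)
```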