Docker error: The size of tensor a (1280) must match the size of tensor b (2560) at non-singleton dimension 1
Hi, I'm having trouble running ibm-granite/granite-4.0-3b-vision with Docker.
vLLM version: 0.19.0
command:
docker run --gpus all -p 9999:8000 --ipc=host vllm-granite-vision --model ibm-granite/granite-4.0-3b-vision --trust-remote-code --max-model-len 16384 --tensor-parallel-size 2 --host 0.0.0.0 --port 8000 --hf-overrides '{"adapter_path": "ibm-granite/granite-4.0-3b-vision"}' --gpu-memory-utilization 0.7
error:
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] File "/app/granite4_vision.py", line 948, in load_weights
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] self._apply_adapter()
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] File "/app/granite4_vision.py", line 940, in _apply_adapter
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] n = self._merge_lora_deltas(adapter_config, adapter_weights)
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] File "/app/granite4_vision.py", line 919, in _merge_lora_deltas
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] if _add_delta(module_key + ".weight", delta):
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] File "/app/granite4_vision.py", line 906, in _add_delta
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] param.data = (param.data.float() + delta.to(param.device)).to(param.dtype)
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
(Worker_TP1 pid=83) ERROR 04-07 07:26:04 [multiproc_executor.py:857] RuntimeError: The size of tensor a (1280) must match the size of tensor b (2560) at non-singleton dimension 1
Hi, there might be an issue with how tensor parallelism is handled in the full merge flow. I'm going to look into it.
In the meantime, you can try inference on a single GPU by removing the --tensor-parallel-size 2 argument from the command.
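For anyone reading the traceback: one plausible reading (an assumption, not confirmed from the model code) is that with --tensor-parallel-size 2 each worker holds only half of the weight along dimension 1 (2560 / 2 = 1280), while the merged LoRA delta is still full size (2560), so the addition in _add_delta fails. The sketch below only illustrates that arithmetic in plain Python; shard_slice is a hypothetical helper, not a function from vLLM or granite4_vision.py, and it is not necessarily how the fix was implemented.

```python
# Hypothetical illustration of the shape mismatch; not code from vLLM
# or granite4_vision.py. Assumes the weight is split evenly along one
# dimension across tensor-parallel ranks.

def shard_slice(full_size: int, tp_size: int, tp_rank: int) -> slice:
    """Return the index range of a dimension owned by one rank."""
    if full_size % tp_size != 0:
        raise ValueError("dimension is not evenly divisible by tp_size")
    shard = full_size // tp_size
    return slice(tp_rank * shard, (tp_rank + 1) * shard)


# Full-size delta with 2560 columns (dimension 1), as in the error message.
# Only a few rows are used to keep the example small.
full_cols = 2560
delta = [[0.0] * full_cols for _ in range(4)]

# With 2 ranks, the worker Worker_TP1 (rank 1) would own columns 1280..2559.
cols = shard_slice(full_cols, tp_size=2, tp_rank=1)

# Slice every row so the delta matches the local shard width (1280).
local_delta = [row[cols] for row in delta]

print(len(delta[0]), len(local_delta[0]))
```

Under this assumption, the delta would need to be cut to the local shard before it is added to the sharded parameter, which is also why the single-GPU run avoids the error.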
I pushed a fixed version; it should now work with multiple GPUs.