| # Multimodal |
|
|
llama.cpp supports multimodal input via `libmtmd`. Currently, two tools support this feature:
| - [llama-mtmd-cli](../tools/mtmd/README.md) |
| - [llama-server](../tools/server/README.md) via OpenAI-compatible `/chat/completions` API |
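
For instance, once `llama-server` is running with a multimodal model, an image can be sent through that API. The snippet below is a minimal sketch, assuming the server listens on the default port 8080 and a local `image.jpg` exists (`base64 -w0` is the GNU coreutils form; on macOS, use `base64 -i image.jpg` instead):

```sh
# encode the image as base64 for a data URI payload
IMG_B64="$(base64 -w0 image.jpg)"

# send a combined text + image message to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"${IMG_B64}"'"}}
      ]
    }]
  }'
```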
|
|
To enable it, use one of the two methods below:
|
|
- Use the `-hf` option with a supported model (see the list of pre-quantized models below)
    - To load a model using `-hf` while disabling multimodal, add `--no-mmproj`
    - To load a model using `-hf` with a custom mmproj file, add `--mmproj local_file.gguf`
- Use the `-m model.gguf` option together with `--mmproj file.gguf` to specify the text model and the multimodal projector respectively
|
|
By default, the multimodal projector is offloaded to the GPU. To disable this, add `--no-mmproj-offload`
|
|
| For example: |
|
|
| ```sh |
| # simple usage with CLI |
| llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF |
| |
| # simple usage with server |
| llama-server -hf ggml-org/gemma-3-4b-it-GGUF |
| |
| # using local file |
| llama-server -m gemma-3-4b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3-4b-it-Q4_K_M.gguf |
| |
| # no GPU offload |
| llama-server -hf ggml-org/gemma-3-4b-it-GGUF --no-mmproj-offload |
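
# load via -hf but disable multimodal (text-only)
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --no-mmproj

# load via -hf with a custom local mmproj file
llama-server -hf ggml-org/gemma-3-4b-it-GGUF --mmproj local_file.gguf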
| ``` |
|
|
| ## Pre-quantized models |
|
|
These are ready-to-use models; most of them use `Q4_K_M` quantization by default. They can be found on the ggml-org Hugging Face page: https://huggingface.co/ggml-org
|
|
Replace `(tool_name)` with the name of the binary you want to use, for example `llama-mtmd-cli` or `llama-server`
|
|
NOTE: some models may require a large context window, for example: `-c 8192`
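
For instance, combining the placeholder below with a larger context window (the model and value here are illustrative):

```sh
# example: run with an 8192-token context window
(tool_name) -hf ggml-org/gemma-3-4b-it-GGUF -c 8192
```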
|
|
| ```sh |
| # Gemma 3 |
| (tool_name) -hf ggml-org/gemma-3-4b-it-GGUF |
| (tool_name) -hf ggml-org/gemma-3-12b-it-GGUF |
| (tool_name) -hf ggml-org/gemma-3-27b-it-GGUF |
| |
| # SmolVLM |
| (tool_name) -hf ggml-org/SmolVLM-Instruct-GGUF |
| (tool_name) -hf ggml-org/SmolVLM-256M-Instruct-GGUF |
| (tool_name) -hf ggml-org/SmolVLM-500M-Instruct-GGUF |
| (tool_name) -hf ggml-org/SmolVLM2-2.2B-Instruct-GGUF |
| (tool_name) -hf ggml-org/SmolVLM2-256M-Video-Instruct-GGUF |
| (tool_name) -hf ggml-org/SmolVLM2-500M-Video-Instruct-GGUF |
| |
| # Pixtral 12B |
| (tool_name) -hf ggml-org/pixtral-12b-GGUF |
| |
| # Qwen 2 VL |
| (tool_name) -hf ggml-org/Qwen2-VL-2B-Instruct-GGUF |
| (tool_name) -hf ggml-org/Qwen2-VL-7B-Instruct-GGUF |
| |
| # Qwen 2.5 VL |
| (tool_name) -hf ggml-org/Qwen2.5-VL-3B-Instruct-GGUF |
| (tool_name) -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF |
| (tool_name) -hf ggml-org/Qwen2.5-VL-32B-Instruct-GGUF |
| (tool_name) -hf ggml-org/Qwen2.5-VL-72B-Instruct-GGUF |
| |
| # Mistral Small 3.1 24B (IQ2_M quantization) |
| (tool_name) -hf ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF |
| |
| # InternVL 2.5 and 3 |
| (tool_name) -hf ggml-org/InternVL2_5-1B-GGUF |
| (tool_name) -hf ggml-org/InternVL2_5-4B-GGUF |
| (tool_name) -hf ggml-org/InternVL3-1B-Instruct-GGUF |
| (tool_name) -hf ggml-org/InternVL3-2B-Instruct-GGUF |
| (tool_name) -hf ggml-org/InternVL3-8B-Instruct-GGUF |
| (tool_name) -hf ggml-org/InternVL3-14B-Instruct-GGUF |
| ``` |
|
|