
Open WebUI Compatibility

Which models work with Open WebUI for tool calling, and why some don't.

Compatibility Matrix

Model                      VLLM API    Open WebUI    Notes
Hermes-3-Llama-3.1-70B     Yes         No            Format incompatible
Llama-3.3-70B-Instruct     Yes         Yes           Works out of the box
Qwen2-72B-Instruct         Yes         Yes           Works with hermes parser
Mistral-Nemo-12B           Yes         Yes           Works with mistral parser

Why Hermes-3 Doesn't Work with Open WebUI

Open WebUI expects tool calls in the standard OpenAI JSON format:

{
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}

Hermes-3's native format uses ChatML + XML tags:

<tool_call>
{"name": "get_weather", "arguments": {"location": "SF"}}
</tool_call>

VLLM's --tool-call-parser hermes converts Hermes-3's tagged output into the OpenAI format shown above, but Open WebUI's tool execution pipeline has additional requirements that the conversion doesn't fully satisfy.
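
You can verify the parser half of this yourself by calling the VLLM endpoint directly. A minimal sketch, assuming a Hermes-3 instance on localhost:8000 serving the model ID shown (adjust both to your deployment); jq is only used to pull out the tool_calls array:

# Request a tool call and inspect what the hermes parser emits.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NousResearch/Hermes-3-Llama-3.1-70B",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }' | jq '.choices[0].message.tool_calls'

If the parser is doing its job, this prints an OpenAI-format tool_calls array like the one above; the failure only appears once Open WebUI tries to execute the call.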

The Flow

Working (Llama 3.3):
  Model → Native JSON format → VLLM parser → OpenAI format → Open WebUI ✅

Broken (Hermes-3):
  Model → ChatML+XML format → VLLM parser → OpenAI format → Open WebUI ❌
                                                              (format mismatch
                                                               in execution)

Recommendations

If you need Open WebUI:

Use Llama-3.3-70B-Instruct-FP8; it works immediately with no configuration beyond:

--tool-call-parser llama3_json
--enable-auto-tool-choice
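
Put together, a launch command looks roughly like this; the model ID and port are assumptions (the guide's ./configs/llama33_70b_fp8.sh wraps the FP8 variant), so substitute your own:

# Sketch of a manual launch; see ./configs/llama33_70b_fp8.sh for the guide's FP8 setup.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tool-call-parser llama3_json \
  --enable-auto-tool-choice \
  --port 8000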

If you're building a custom application:

Use Hermes-3; it has the best tool calling quality, and all formats work via the VLLM API.
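
For reference, the corresponding launch sketch uses the hermes parser; the model ID is an assumption:

# Hermes-3 behind the hermes parser; your custom client talks to /v1 as usual.
vllm serve NousResearch/Hermes-3-Llama-3.1-70B \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000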

If you need both:

Run two VLLM instances:

  • Hermes-3 on port 8000 for your custom application
  • Llama-3.3 on port 8001 for Open WebUI

Both fit on a 96GB GPU simultaneously if you use smaller context windows, or you can give each model its own GPU.
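
A sketch of the two-instance setup, assuming one GPU per model via CUDA_VISIBLE_DEVICES and the same model IDs as above; if you co-locate both on a single card instead, lower --gpu-memory-utilization and --max-model-len on each instance until both fit:

# Hermes-3 for the custom application
CUDA_VISIBLE_DEVICES=0 vllm serve NousResearch/Hermes-3-Llama-3.1-70B \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000 &

# Llama 3.3 for Open WebUI
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tool-call-parser llama3_json \
  --enable-auto-tool-choice \
  --port 8001 &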

Open WebUI Setup for Llama 3.3

  1. Start VLLM:

    ./configs/llama33_70b_fp8.sh
    
  2. Add connection in Open WebUI:

    • Settings → Connections → OpenAI API
    • URL: http://your-gpu-server:8000/v1
    • API Key: (leave empty or use "none")
  3. Enable tools:

    • Settings → Tools → Enable
    • Add your tool definitions
  4. Test:

    • Start a new chat with Llama-3.3-70B-Instruct-FP8
    • Ask a question that requires tool use
    • Verify tool calls appear and execute (a quick curl check is sketched below)
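
If tool calls don't show up, confirm the connection outside the UI first. A quick check, assuming the URL from step 2 and whatever served model name your config script registers (the name used in step 4 is shown here):

# The model name Open WebUI lists should appear in this response.
curl -s http://your-gpu-server:8000/v1/models

# A plain chat request confirms the endpoint answers before you debug tools in the UI.
curl -s http://your-gpu-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct-FP8",
       "messages": [{"role": "user", "content": "Say hello"}]}'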