
Open WebUI Compatibility

Which models work with Open WebUI for tool calling, and why some don't.

Compatibility Matrix

Model                      VLLM API    Open WebUI    Notes
Hermes-3-Llama-3.1-70B     Yes         No            Format incompatible
Llama-3.3-70B-Instruct     Yes         Yes           Works out of the box
Qwen2-72B-Instruct         Yes         Yes           Works with hermes parser
Mistral-Nemo-12B           Yes         Yes           Works with mistral parser

Why Hermes-3 Doesn't Work with Open WebUI

Open WebUI expects tool calls in the standard OpenAI JSON format:

{
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}

Hermes-3's native format uses ChatML + XML tags:

<tool_call>
{"name": "get_weather", "arguments": {"location": "SF"}}
</tool_call>

VLLM's --tool-call-parser hermes converts Hermes-3's tagged output into the OpenAI format shown above, but Open WebUI's tool execution pipeline has additional requirements that the conversion doesn't fully satisfy.
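
You can verify the parser half of this yourself by calling the VLLM endpoint directly. A minimal sketch, assuming a Hermes-3 instance on localhost:8000 serving the model ID shown (adjust both to your deployment); jq is only used to pull out the tool_calls array:

# Request a tool call and inspect what the hermes parser emits.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NousResearch/Hermes-3-Llama-3.1-70B",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }' | jq '.choices[0].message.tool_calls'

If the parser is doing its job, this prints an OpenAI-format tool_calls array like the one above; the failure only appears once Open WebUI tries to execute the call.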

The Flow

Working (Llama 3.3):
  Model → Native JSON format → VLLM parser → OpenAI format → Open WebUI ✅

Broken (Hermes-3):
  Model → ChatML+XML format → VLLM parser → OpenAI format → Open WebUI ❌
                                                              (format mismatch
                                                               in execution)

Recommendations

If you need Open WebUI:

Use Llama-3.3-70B-Instruct-FP8; it works immediately with no configuration beyond:

--tool-call-parser llama3_json
--enable-auto-tool-choice
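
Put together, a launch command looks roughly like this; the model ID and port are assumptions (the guide's ./configs/llama33_70b_fp8.sh wraps the FP8 variant), so substitute your own:

# Sketch of a manual launch; see ./configs/llama33_70b_fp8.sh for the guide's FP8 setup.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tool-call-parser llama3_json \
  --enable-auto-tool-choice \
  --port 8000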

If you're building a custom application:

Use Hermes-3; it has the best tool calling quality, and all formats work via the VLLM API.
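
For reference, the corresponding launch sketch uses the hermes parser; the model ID is an assumption:

# Hermes-3 behind the hermes parser; your custom client talks to /v1 as usual.
vllm serve NousResearch/Hermes-3-Llama-3.1-70B \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000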

If you need both:

Run two VLLM instances:

  • Hermes-3 on port 8000 for your custom application
  • Llama-3.3 on port 8001 for Open WebUI

Both fit on a 96GB GPU simultaneously if you use smaller context windows, or you can give each model its own GPU.
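
A sketch of the two-instance setup, assuming one GPU per model via CUDA_VISIBLE_DEVICES and the same model IDs as above; if you co-locate both on a single card instead, lower --gpu-memory-utilization and --max-model-len on each instance until both fit:

# Hermes-3 for the custom application
CUDA_VISIBLE_DEVICES=0 vllm serve NousResearch/Hermes-3-Llama-3.1-70B \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 8000 &

# Llama 3.3 for Open WebUI
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tool-call-parser llama3_json \
  --enable-auto-tool-choice \
  --port 8001 &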

Open WebUI Setup for Llama 3.3

  1. Start VLLM:

    ./configs/llama33_70b_fp8.sh
    
  2. Add connection in Open WebUI:

    • Settings → Connections → OpenAI API
    • URL: http://your-gpu-server:8000/v1
    • API Key: (leave empty or use "none")
  3. Enable tools:

    • Settings → Tools → Enable
    • Add your tool definitions
  4. Test:

    • Start a new chat with Llama-3.3-70B-Instruct-FP8
    • Ask a question that requires tool use
    • Verify tool calls appear and execute (a quick curl check is sketched below)
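
If tool calls don't show up, confirm the connection outside the UI first. A quick check, assuming the URL from step 2 and whatever served model name your config script registers (the name used in step 4 is shown here):

# The model name Open WebUI lists should appear in this response.
curl -s http://your-gpu-server:8000/v1/models

# A plain chat request confirms the endpoint answers before you debug tools in the UI.
curl -s http://your-gpu-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct-FP8",
       "messages": [{"role": "user", "content": "Say hello"}]}'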