# Open WebUI Compatibility

Which models work with Open WebUI for tool calling, and why some don't.

## Compatibility Matrix
| Model | VLLM API | Open WebUI | Notes |
|---|---|---|---|
| Hermes-3-Llama-3.1-70B | Yes | No | Format incompatible |
| Llama-3.3-70B-Instruct | Yes | Yes | Works out of the box |
| Qwen2-72B-Instruct | Yes | Yes | Works with hermes parser |
| Mistral-Nemo-12B | Yes | Yes | Works with mistral parser |
## Why Hermes-3 Doesn't Work with Open WebUI

Open WebUI expects tool calls in the standard OpenAI JSON format:
```json
{
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
```
Hermes-3's native format uses ChatML + XML tags:

```
<tool_call>
{"name": "get_weather", "arguments": {"location": "SF"}}
</tool_call>
```
VLLM's `--tool-call-parser hermes` converts between these formats, but Open WebUI's tool execution pipeline has additional requirements that the conversion doesn't fully satisfy.
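For intuition, the conversion performed by the hermes parser looks roughly like the sketch below. This is illustrative, not VLLM's actual implementation: the regex and the `call_{i}` id scheme are assumptions.

```python
import json
import re

def parse_hermes_tool_calls(text):
    """Extract Hermes-style <tool_call> blocks and convert them to
    OpenAI-format tool_calls entries (a rough sketch of what the
    hermes parser does, not the real implementation)."""
    calls = []
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    for i, match in enumerate(re.findall(pattern, text, re.DOTALL)):
        payload = json.loads(match)
        calls.append({
            "id": f"call_{i}",  # illustrative id scheme
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI format carries arguments as a JSON *string*,
                # not a nested object
                "arguments": json.dumps(payload["arguments"]),
            },
        })
    return calls

raw = ('<tool_call>\n'
       '{"name": "get_weather", "arguments": {"location": "SF"}}\n'
       '</tool_call>')
print(parse_hermes_tool_calls(raw))
```

Note the `arguments` re-encoding: the Hermes payload nests a JSON object, while the OpenAI schema expects a serialized string, which is one of the subtle differences a parser has to bridge.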
## The Flow

Working (Llama 3.3):

```
Model → Native JSON format → VLLM parser → OpenAI format → Open WebUI ✓
```

Broken (Hermes-3):

```
Model → ChatML+XML format → VLLM parser → OpenAI format → Open WebUI ✗
                                                          (format mismatch
                                                           in execution)
```
## Recommendations

If you need Open WebUI:

Use Llama-3.3-70B-Instruct-FP8. It works immediately with no configuration beyond:

```
--tool-call-parser llama3_json
--enable-auto-tool-choice
```
If you're building a custom application:

Use Hermes-3. It has the best tool calling quality, and all formats work via the VLLM API.
If you need both:

Run two VLLM instances:

- Hermes-3 on port 8000 for your custom application
- Llama-3.3 on port 8001 for Open WebUI

Both fit on a 96GB GPU simultaneously (with smaller context windows, or with a multi-GPU setup).
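A small client-side sketch of that split, routing each consumer to the instance whose parser suits it. The hostnames here are placeholders; the ports match the assignment above.

```python
# Placeholder endpoint map for the two-instance setup: each consumer
# talks to the VLLM instance whose tool-call parser matches its needs.
ENDPOINTS = {
    "custom_app": {
        "base_url": "http://your-gpu-server:8000/v1",
        "model": "Hermes-3-Llama-3.1-70B",
    },
    "open_webui": {
        "base_url": "http://your-gpu-server:8001/v1",
        "model": "Llama-3.3-70B-Instruct-FP8",
    },
}

def endpoint_for(consumer: str) -> dict:
    """Return the base URL and model a given consumer should use."""
    return ENDPOINTS[consumer]

print(endpoint_for("custom_app")["base_url"])
```

Keeping the mapping in one place means swapping models or ports later only touches this table, not every call site.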
## Open WebUI Setup for Llama 3.3

Start VLLM:

```
./configs/llama33_70b_fp8.sh
```

Add a connection in Open WebUI:

- Settings → Connections → OpenAI API
- URL: `http://your-gpu-server:8000/v1`
- API Key: leave empty or use "none"
Enable tools:

- Settings → Tools → Enable
- Add your tool definitions
Test:
- Start a new chat with Llama-3.3-70B-Instruct-FP8
- Ask a question that requires tool use
- Verify tool calls appear and execute
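To sanity-check the server independently of the UI, you can POST a tool-enabled request straight to `http://your-gpu-server:8000/v1/chat/completions`. The sketch below only builds the request body; the `get_weather` schema is illustrative, so substitute your own tool definitions.

```python
import json

# Sketch of a tool-enabled chat completion request body for the VLLM
# OpenAI-compatible endpoint. The get_weather schema is illustrative.
payload = {
    "model": "Llama-3.3-70B-Instruct-FP8",
    "messages": [
        {"role": "user", "content": "What's the weather in SF right now?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    # Let the model decide when to call a tool, matching
    # --enable-auto-tool-choice on the server side.
    "tool_choice": "auto",
}

body = json.dumps(payload, indent=2)
print(body)
```

If the response's `choices[0].message.tool_calls` comes back populated in the OpenAI format shown earlier, Open WebUI should be able to execute it.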