# Open WebUI Compatibility

Which models work with Open WebUI for tool calling, and why some don't.

## Compatibility Matrix

| Model | VLLM API | Open WebUI | Notes |
|-------|----------|------------|-------|
| Hermes-3-Llama-3.1-70B | Yes | **No** | Format incompatible |
| Llama-3.3-70B-Instruct | Yes | Yes | Works out of the box |
| Qwen2-72B-Instruct | Yes | Yes | Works with hermes parser |
| Mistral-Nemo-12B | Yes | Yes | Works with mistral parser |

## Why Hermes-3 Doesn't Work with Open WebUI

Open WebUI expects tool calls in the standard OpenAI JSON format:

```json
{
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
```

Hermes-3's native format uses ChatML + XML tags:

```xml
<tool_call>
{"name": "get_weather", "arguments": {"location": "SF"}}
</tool_call>
```

VLLM's `--tool-call-parser hermes` converts between these formats, but Open WebUI's tool execution pipeline has additional requirements that the conversion doesn't fully satisfy.

## The Flow

```
Working (Llama 3.3):
Model → Native JSON format → VLLM parser → OpenAI format → Open WebUI ✅

Broken (Hermes-3):
Model → ChatML+XML format → VLLM parser → OpenAI format → Open WebUI ❌
                                                          (format mismatch in execution)
```

## Recommendations

### If you need Open WebUI:

Use **Llama-3.3-70B-Instruct-FP8** — it works immediately with no configuration beyond:

```bash
--tool-call-parser llama3_json --enable-auto-tool-choice
```

### If you're building a custom application:

Use **Hermes-3** — it has the strongest tool-calling quality of the models tested here, and all formats work via the VLLM API.

### If you need both:

Run two VLLM instances:

- Hermes-3 on port 8000 for your custom application
- Llama-3.3 on port 8001 for Open WebUI

Both fit on a 96GB GPU simultaneously (if using smaller context windows, or if you have multiple GPUs).

## Open WebUI Setup for Llama 3.3

1. **Start VLLM:**

   ```bash
   ./configs/llama33_70b_fp8.sh
   ```

2. **Add connection in Open WebUI:**
   - Settings → Connections → OpenAI API
   - URL: `http://your-gpu-server:8000/v1`
   - API Key: (leave empty or use "none")

3. **Enable tools:**
   - Settings → Tools → Enable
   - Add your tool definitions

4. **Test:**
   - Start a new chat with Llama-3.3-70B-Instruct-FP8
   - Ask a question that requires tool use
   - Verify tool calls appear and execute
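## Appendix: Sketch of the Hermes-to-OpenAI Conversion

The format conversion described above can be illustrated in a few lines. This is a simplified sketch of the idea, not VLLM's actual parser implementation (the real parser also handles streaming and malformed model output); the `call_` ID prefix mirrors the OpenAI convention.

```python
import json
import re
import uuid

def hermes_to_openai(text: str) -> dict:
    """Convert Hermes-style <tool_call> blocks into the OpenAI
    tool_calls structure. Sketch only; not VLLM's real parser."""
    tool_calls = []
    for block in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL):
        call = json.loads(block)
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": call["name"],
                # OpenAI clients expect arguments as a JSON *string*,
                # not a nested object — a common source of mismatches.
                "arguments": json.dumps(call["arguments"]),
            },
        })
    return {"tool_calls": tool_calls}

# Example: a Hermes-3 completion containing one tool call
hermes_output = '<tool_call>{"name": "get_weather", "arguments": {"location": "SF"}}</tool_call>'
converted = hermes_to_openai(hermes_output)
```

Note the `arguments` field: the native Hermes format nests a JSON object, while the OpenAI format requires a serialized string. Subtle differences like this are exactly where downstream consumers such as Open WebUI can break even after parsing succeeds.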
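## Appendix: Testing Tool Calling Outside Open WebUI

For the test step, it can help to exercise tool calling directly against the VLLM endpoint before involving Open WebUI, to isolate where a failure occurs. A minimal sketch of the request payload in the OpenAI function-calling schema (the `get_weather` tool is a hypothetical example; sending it requires a running server, so the request itself is left commented out):

```python
import json

# Hypothetical tool definition, for illustration only
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

payload = {
    "model": "Llama-3.3-70B-Instruct-FP8",
    "messages": [{"role": "user", "content": "What's the weather in SF?"}],
    "tools": tools,
    "tool_choice": "auto",
}

# Against a running server (URL from the setup section above):
# import requests
# r = requests.post("http://your-gpu-server:8000/v1/chat/completions", json=payload)
# print(r.json()["choices"][0]["message"].get("tool_calls"))
```

If the raw API response contains well-formed `tool_calls` but Open WebUI still fails to execute them, the problem is on the Open WebUI side of the pipeline rather than in the model or parser.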