# Tool Call Formats Explained

vLLM supports multiple tool call formats. Each model family uses a different native format, but vLLM converts them all to OpenAI-compatible JSON.

## Format Comparison

### 1. Hermes Format (ChatML + XML)

**Used by:** Hermes-3, Hermes-2-Pro, Qwen2 (via hermes parser)

**Parser flag:** `--tool-call-parser hermes`

**Model outputs:**

```xml
<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>
```

**Tool responses formatted as:**

```xml
<tool_response>
{"temperature": 22, "condition": "Sunny"}
</tool_response>
```

**Characteristics:**

- XML tags make tool calls easy to parse reliably
- Supports parallel calls via multiple `<tool_call>` blocks
- Most reliable format for structured output
- ChatML-based (`<|im_start|>`, `<|im_end|>`)

### 2. Llama 3 JSON Format

**Used by:** Llama-3.1, Llama-3.3

**Parser flag:** `--tool-call-parser llama3_json`

**Model outputs:**

```json
{"name": "get_weather", "parameters": {"location": "San Francisco"}}
```

**Characteristics:**

- Pure JSON, no XML wrapping
- Uses `parameters` instead of `arguments` (vLLM normalizes this)
- Works natively with Open WebUI
- Supports the special `<|python_tag|>` token for code execution

### 3. Mistral Format

**Used by:** Mistral-Nemo, Mistral-7B, Mistral-Small

**Parser flag:** `--tool-call-parser mistral`

**Model outputs:**

```
[TOOL_CALLS] [{"name": "get_weather", "arguments": {"location": "San Francisco"}}]
```

**Characteristics:**

- Uses the `[TOOL_CALLS]` prefix token
- Tool calls are a JSON array (natural parallel calling)
- Clean, minimal format

## What Your Application Receives

**Regardless of format, vLLM converts everything to OpenAI-compatible JSON:**

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    }
  }]
}
```

Your application code stays the same regardless of which model or parser you use.

## Which Parser for Which Model?
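The mapping in the table below comes down to one flag at launch time. A minimal sketch of matching launch commands (the model repository names are illustrative; only the flags are taken from this document):

```shell
# Hermes-family model -> hermes parser (model name is illustrative)
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Llama 3.1 / 3.3 -> llama3_json parser (model name is illustrative)
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```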
| Model | Parser | Why |
|-------|--------|-----|
| Hermes-3 (any size) | `hermes` | Fine-tuned on ChatML + XML format |
| Hermes-2-Pro | `hermes` | Same format family |
| Llama-3.1 (any size) | `llama3_json` | Native Llama 3 format |
| Llama-3.3 (any size) | `llama3_json` | Same format as 3.1 |
| Qwen2 | `hermes` | ChatML-compatible, works with hermes parser |
| Mistral-Nemo | `mistral` | Native Mistral format |
| Mistral-7B | `mistral` | Same format family |

## Custom Middleware vs vLLM Parser

### When to use vLLM's built-in parser:

- Standard OpenAI-compatible API usage
- Open WebUI or similar frontends
- Any application expecting OpenAI format

### When to build custom middleware:

- You need to intercept and modify tool calls before execution
- You're doing validation/retry logic at the tool call level
- Your Hermes model outputs `<tool_call>` tags but vLLM's parser isn't available
- You need custom error handling per tool call

For custom parsing, see `examples/robust_json_extraction.py`, which handles all the edge cases.

## Common Mistakes

1. **Wrong parser for model** — Using the `hermes` parser with Llama 3.3 (or vice versa) silently produces no tool calls
2. **Missing `--enable-auto-tool-choice`** — Without this flag, the model never generates tool calls even with the right parser
3. **Custom system prompt overriding the format** — If you add `<tool_call>` instructions to a Llama 3.3 system prompt, the model outputs XML that the `llama3_json` parser can't parse
4. **Assuming all models use the same format** — They don't. Always match the parser to the model.
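If you do end up parsing Hermes-style output yourself (the custom-middleware case above), the core extraction can be sketched in a few lines. This is not vLLM's parser and not `examples/robust_json_extraction.py`; it is a minimal happy-path sketch that assumes the `<tool_call>` tag format shown earlier and skips malformed JSON instead of failing the turn:

```python
import json
import re

# Match a JSON object wrapped in Hermes-style <tool_call> tags.
# DOTALL lets the JSON span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than raising
    return calls

raw = (
    "Let me check that.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "San Francisco"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(raw))
```

A production version would also handle unterminated tags, nested braces inside string values, and streaming output, which is exactly why the referenced example file exists.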