# Tool Call Formats Explained
vLLM supports multiple tool call formats. Each model family uses a different native format, but vLLM converts them all to OpenAI-compatible JSON.
## Format Comparison
### 1. Hermes Format (ChatML + XML)
**Used by:** Hermes-3, Hermes-2-Pro, Qwen2 (via hermes parser)
**Parser flag:** `--tool-call-parser hermes`
**Model outputs:**
```xml
<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>
```
**Tool responses formatted as:**
```xml
<tool_response>
{"temperature": 22, "condition": "Sunny"}
</tool_response>
```
**Characteristics:**
- XML tags make tool calls easy to parse reliably
- Supports parallel calls via multiple `<tool_call>` blocks in one response
- Most reliable format for structured output
- ChatML-based (`<|im_start|>`, `<|im_end|>`)
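As a sketch of what the parser does, here is a minimal extractor for Hermes-style output. This is a simplified stand-in, not vLLM's actual implementation:

```python
import json
import re

# Match each <tool_call>...</tool_call> block and capture the JSON inside.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Return a list of {"name": ..., "arguments": ...} dicts."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

raw = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"location": "San Francisco"}}\n'
    '</tool_call>'
)
calls = parse_hermes_tool_calls(raw)
```

Because each call lives in its own tag pair, parallel calls fall out naturally: the regex simply finds every block.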
### 2. Llama 3 JSON Format
**Used by:** Llama-3.1, Llama-3.3
**Parser flag:** `--tool-call-parser llama3_json`
**Model outputs:**
```json
{"name": "get_weather", "parameters": {"location": "San Francisco"}}
```
**Characteristics:**
- Pure JSON, no XML wrapping
- Uses `parameters` instead of `arguments` (vLLM normalizes this)
- Works natively with Open WebUI
- Supports the special `<|python_tag|>` token for code execution
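The `parameters` → `arguments` normalization mentioned above can be sketched like this (a hypothetical helper, not vLLM's actual code; note that OpenAI-style responses carry `arguments` as a JSON *string*):

```python
import json

def normalize_llama3_call(raw: str) -> dict:
    """Convert a raw Llama 3 JSON tool call to OpenAI-style shape."""
    call = json.loads(raw)
    # Llama 3 emits "parameters"; fall back to "arguments" if present.
    args = call.get("parameters", call.get("arguments", {}))
    return {"name": call["name"], "arguments": json.dumps(args)}

raw = '{"name": "get_weather", "parameters": {"location": "San Francisco"}}'
normalized = normalize_llama3_call(raw)
```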
### 3. Mistral Format
**Used by:** Mistral-Nemo, Mistral-7B, Mistral-Small
**Parser flag:** `--tool-call-parser mistral`
**Model outputs:**
```
[TOOL_CALLS] [{"name": "get_weather", "arguments": {"location": "San Francisco"}}]
```
**Characteristics:**
- Uses `[TOOL_CALLS]` prefix token
- Tool calls are a JSON array (natural parallel calling)
- Clean, minimal format
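A minimal sketch of parsing this format (again, a simplified stand-in for the real parser): strip the prefix token, then decode the JSON array.

```python
import json

PREFIX = "[TOOL_CALLS]"

def parse_mistral_tool_calls(text: str) -> list[dict]:
    """Return the list of tool call dicts, or [] if none are present."""
    stripped = text.strip()
    if not stripped.startswith(PREFIX):
        return []
    # Everything after the prefix token is a plain JSON array.
    return json.loads(stripped[len(PREFIX):].strip())

raw = '[TOOL_CALLS] [{"name": "get_weather", "arguments": {"location": "San Francisco"}}]'
calls = parse_mistral_tool_calls(raw)
```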
## What Your Application Receives
**Regardless of the native format, vLLM converts everything to OpenAI-compatible JSON:**
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    }
  }]
}
```
Your application code is the same regardless of which model or parser you use.
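For illustration, here is what handling that response looks like on the client side. The `get_weather` implementation and the hard-coded response dict are placeholders; in practice the response comes from the server:

```python
import json

# Hypothetical local implementation of the tool.
def get_weather(location: str) -> dict:
    return {"location": location, "temperature": 22, "condition": "Sunny"}

TOOLS = {"get_weather": get_weather}

# A response shaped like the OpenAI-compatible JSON above.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "San Francisco"}',
                },
            }],
        }
    }]
}

results = []
for call in response["choices"][0]["message"]["tool_calls"]:
    fn = TOOLS[call["function"]["name"]]
    # "arguments" is a JSON *string*, so decode it before calling.
    args = json.loads(call["function"]["arguments"])
    results.append(fn(**args))
```

Note the one easy-to-miss detail: `arguments` is a string of JSON, not a nested object, so it needs its own `json.loads`.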
## Which Parser for Which Model?
| Model | Parser | Why |
|-------|--------|-----|
| Hermes-3 (any size) | `hermes` | Fine-tuned on ChatML + XML format |
| Hermes-2-Pro | `hermes` | Same format family |
| Llama-3.1 (any size) | `llama3_json` | Native Llama 3 format |
| Llama-3.3 (any size) | `llama3_json` | Same format as 3.1 |
| Qwen2 | `hermes` | ChatML-compatible, works with hermes parser |
| Mistral-Nemo | `mistral` | Native Mistral format |
| Mistral-7B | `mistral` | Same format family |
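The table translates into server invocations like the following sketch. The model names are illustrative; substitute your own checkpoint:

```shell
# Hermes-family model with the hermes parser
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Llama 3.x model with the llama3_json parser
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```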
## Custom Middleware vs. vLLM Parser
### When to use vLLM's built-in parser:
- Standard OpenAI-compatible API usage
- Open WebUI or similar frontends
- Any application expecting OpenAI format
### When to build custom middleware:
- You need to intercept and modify tool calls before execution
- You're doing validation/retry logic at the tool call level
- Your Hermes model outputs `<tool_call>` tags but vLLM's parser isn't available
- You need custom error handling per tool call
For custom parsing, see `examples/robust_json_extraction.py` which handles all the edge cases.
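As a sketch of the validation idea, here is a minimal pre-execution check on OpenAI-format tool calls. The `REQUIRED_ARGS` registry and the tool names are hypothetical:

```python
import json

# Hypothetical registry: tool name -> required argument keys.
REQUIRED_ARGS = {"get_weather": {"location"}}

def validate_tool_call(call: dict) -> tuple[bool, str]:
    """Check an OpenAI-format tool call before executing it."""
    name = call["function"]["name"]
    if name not in REQUIRED_ARGS:
        return False, f"unknown tool: {name}"
    try:
        args = json.loads(call["function"]["arguments"])
    except json.JSONDecodeError as exc:
        return False, f"bad arguments JSON: {exc}"
    missing = REQUIRED_ARGS[name] - args.keys()
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    return True, "ok"

ok, msg = validate_tool_call({
    "function": {"name": "get_weather", "arguments": '{"location": "SF"}'}
})
```

A middleware layer can reject or retry a call based on the returned reason string instead of letting a malformed call reach the tool.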
## Common Mistakes
1. **Wrong parser for model** — Using `hermes` parser with Llama 3.3 (or vice versa) silently produces no tool calls
2. **Missing `--enable-auto-tool-choice`** — Without this flag, vLLM never returns structured tool calls, even with the right parser
3. **Custom system prompt overriding format** — If you add `<tool_call>` XML instructions to a Llama 3.3 system prompt, the model outputs XML that the `llama3_json` parser can't parse
4. **Assuming all models use the same format** — They don't. Always match parser to model.