# Tool Call Formats Explained

vLLM supports multiple tool call formats. Each model family uses a different native format, but vLLM converts them all to OpenAI-compatible JSON.

## Format Comparison

### 1. Hermes Format (ChatML + XML)

**Used by:** Hermes-3, Hermes-2-Pro, Qwen2 (via hermes parser)

**Parser flag:** `--tool-call-parser hermes`

**Model outputs:**

```xml
<tool_call>
{"name": "get_weather", "arguments": {"location": "San Francisco"}}
</tool_call>
```

**Tool responses formatted as:**

```xml
<tool_response>
{"temperature": 22, "condition": "Sunny"}
</tool_response>
```

**Characteristics:**

- XML tags make tool calls easy to parse reliably
- Supports parallel calls via multiple `<tool_call>` blocks
- Most reliable format for structured output
- ChatML-based (`<|im_start|>`, `<|im_end|>`)

### 2. Llama 3 JSON Format

**Used by:** Llama-3.1, Llama-3.3

**Parser flag:** `--tool-call-parser llama3_json`

**Model outputs:**

```json
{"name": "get_weather", "parameters": {"location": "San Francisco"}}
```

**Characteristics:**

- Pure JSON, no XML wrapping
- Uses `parameters` instead of `arguments` (vLLM normalizes this)
- Works natively with Open WebUI
- Supports the special `<|python_tag|>` token for code execution

### 3. Mistral Format

**Used by:** Mistral-Nemo, Mistral-7B, Mistral-Small

**Parser flag:** `--tool-call-parser mistral`

**Model outputs:**

```
[TOOL_CALLS] [{"name": "get_weather", "arguments": {"location": "San Francisco"}}]
```

**Characteristics:**

- Uses the `[TOOL_CALLS]` prefix token
- Tool calls are a JSON array (natural parallel calling)
- Clean, minimal format

## What Your Application Receives

**Regardless of format, vLLM converts everything to OpenAI-compatible JSON:**

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    }
  }]
}
```

Your application code stays the same regardless of which model or parser you use.

## Which Parser for Which Model?
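The mapping in the table below comes down to one flag at launch time. A minimal sketch of matching launch commands (the model repository names are illustrative; only the flags are taken from this document):

```shell
# Hermes-family model -> hermes parser (model name is illustrative)
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Llama 3.1 / 3.3 -> llama3_json parser (model name is illustrative)
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```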
| Model | Parser | Why |
|-------|--------|-----|
| Hermes-3 (any size) | `hermes` | Fine-tuned on ChatML + XML format |
| Hermes-2-Pro | `hermes` | Same format family |
| Llama-3.1 (any size) | `llama3_json` | Native Llama 3 format |
| Llama-3.3 (any size) | `llama3_json` | Same format as 3.1 |
| Qwen2 | `hermes` | ChatML-compatible, works with hermes parser |
| Mistral-Nemo | `mistral` | Native Mistral format |
| Mistral-7B | `mistral` | Same format family |

## Custom Middleware vs vLLM Parser

### When to use vLLM's built-in parser:

- Standard OpenAI-compatible API usage
- Open WebUI or similar frontends
- Any application expecting OpenAI format

### When to build custom middleware:

- You need to intercept and modify tool calls before execution
- You're doing validation/retry logic at the tool call level
- Your Hermes model outputs `<tool_call>` tags but vLLM's parser isn't available
- You need custom error handling per tool call

For custom parsing, see `examples/robust_json_extraction.py`, which handles all the edge cases.

## Common Mistakes

1. **Wrong parser for model** — Using the `hermes` parser with Llama 3.3 (or vice versa) silently produces no tool calls
2. **Missing `--enable-auto-tool-choice`** — Without this flag, the model never generates tool calls even with the right parser
3. **Custom system prompt overriding the format** — If you add `<tool_call>` instructions to a Llama 3.3 system prompt, the model outputs XML that the `llama3_json` parser can't parse
4. **Assuming all models use the same format** — They don't. Always match the parser to the model.
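If you do end up parsing Hermes-style output yourself (the custom-middleware case above), the core extraction can be sketched in a few lines. This is not vLLM's parser and not `examples/robust_json_extraction.py`; it is a minimal happy-path sketch that assumes the `<tool_call>` tag format shown earlier and skips malformed JSON instead of failing the turn:

```python
import json
import re

# Match a JSON object wrapped in Hermes-style <tool_call> tags.
# DOTALL lets the JSON span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than raising
    return calls

raw = (
    "Let me check that.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "San Francisco"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(raw))
```

A production version would also handle unterminated tags, nested braces inside string values, and streaming output, which is exactly why the referenced example file exists.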