vllm-tool-calling-guide / guides/MULTI_STEP_WORKFLOWS.md
Joshua Odmark, initial release: vLLM tool calling guide for open source models
# Multi-Step Workflow Architecture
For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.
## Why Multi-Step?
Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover → configure → validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer
## Architecture Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Step 1:          β”‚     β”‚ Step 2:              β”‚     β”‚ Step 3:          β”‚
β”‚ Discovery        │────>β”‚ Configuration        │────>β”‚ Assembly         β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                  β”‚
β”‚ Tools:           β”‚     β”‚ Tools:               β”‚     β”‚ Tools:           β”‚
β”‚ - search         β”‚     β”‚ - get_details        β”‚     β”‚ - assemble       β”‚
β”‚ - list           β”‚     β”‚ - validate_minimal   β”‚     β”‚ - validate_full  β”‚
β”‚ - get_info       β”‚     β”‚ - validate_full      β”‚     β”‚ - deploy         β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                  β”‚
β”‚ Output:          β”‚     β”‚ Output:              β”‚     β”‚ Output:          β”‚
β”‚ What to use      β”‚     β”‚ How to configure     β”‚     β”‚ Final result     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Key Patterns
### 1. Isolated Tool Sets
Each step only sees relevant tools:
```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```
**Why:** Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
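The step-scoped registration above can be sketched as a small registry that filters tools by step. Names here are illustrative; the actual `ToolRegistry` in `examples/multi_step_orchestrator.py` may differ:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToolRegistry:
    # name -> {"function": callable, "steps": [step numbers it is visible in]}
    _tools: Dict[str, dict] = field(default_factory=dict)

    def register(self, name: str, function: Callable, steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for_step(self, step: int) -> List[str]:
        # Only tools registered for this step are exposed to the LLM
        return [name for name, tool in self._tools.items() if step in tool["steps"]]

registry = ToolRegistry()
registry.register("search", lambda q: q, steps=[1])
registry.register("get_details", lambda i: i, steps=[1, 2])
registry.register("validate", lambda c: c, steps=[2])

print(registry.tools_for_step(2))  # ['get_details', 'validate']
```

Because `get_details` is registered for steps 1 and 2, it appears in both tool sets, while `search` never leaks into step 2.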
### 2. Pydantic Schema Validation
Every LLM response is validated against a Pydantic schema:
```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure (schema_class is the step's response model,
# e.g. StepResponse; llm_output is the raw LLM output string)
schema_class(**json.loads(llm_output))
```
**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
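A runnable sketch of that failure mode: both payloads below are valid JSON, but the second is missing a required field, and only Pydantic catches it (the `ToolCall` fields here are assumptions for illustration):

```python
import json
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    name: str
    arguments: Dict[str, Any]

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict[str, Any]] = None
    tool_calls: Optional[List[ToolCall]] = None

good = '{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'
bad = '{"tool_calls": [{"name": "search"}]}'  # valid JSON, but "arguments" is missing

StepResponse(**json.loads(good))  # passes validation

try:
    StepResponse(**json.loads(bad))
except ValidationError:
    print("structurally invalid response rejected")
```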
### 3. Dual-Purpose Response Schema
The same schema handles both tool call requests and final responses:
```python
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}
# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```
**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first, and treats anything else as a final response.
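That dispatch check can be sketched in a few lines (a hedged illustration, not the orchestrator's actual code):

```python
import json

def classify(llm_output: str) -> str:
    response = json.loads(llm_output)
    # tool_calls takes priority; anything else is treated as a final response
    if response.get("tool_calls"):
        return "tool_calls"
    return "final"

print(classify('{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'))  # tool_calls
print(classify('{"success": true, "result": {}, "reasoning": "done"}'))  # final
```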
### 4. Validation Enforcement
The orchestrator requires certain tools to be called **and pass** before accepting a final response:
```python
result = run_step(
...,
validation_tools=["validate_minimal", "validate_full"]
)
```
If the LLM tries to return "success" without all validations passing:
```
You returned a final response but validations have not all passed.
Validation Errors Found:
1. Property: channel
Message: Required field 'channel' is missing
Please fix the errors and call the validation tools again.
```
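The gate itself is simple to express: track which validation tools have passed and refuse a final response until the set is complete. This is a hedged sketch; the real check lives inside `run_step()` and its message format may differ:

```python
from typing import List, Set, Tuple

def can_accept_final(passed: Set[str], validation_tools: List[str]) -> Tuple[bool, str]:
    # A final response is only accepted once every required validation
    # tool has been called AND returned a passing result
    missing = [t for t in validation_tools if t not in passed]
    if missing:
        feedback = (
            "You returned a final response but validations have not all passed.\n"
            "Still required to pass: " + ", ".join(missing) + "\n"
            "Please fix the errors and call the validation tools again."
        )
        return False, feedback
    return True, ""

ok, feedback = can_accept_final({"validate_minimal"}, ["validate_minimal", "validate_full"])
print(ok)  # False: validate_full has not passed yet
```

When the check fails, `feedback` is injected back into the conversation so the LLM gets another turn instead of silently returning an unvalidated answer.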
### 5. Structured Error Feedback
When a tool call fails, the error is formatted with enough detail for the LLM to fix it:
```xml
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
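A formatter producing that feedback might look like this (an illustrative sketch, not the guide's exact implementation):

```python
import json

def format_tool_error(tool_name: str, result: dict) -> str:
    # Wrap the failed tool result in the <tool_response> envelope shown
    # above, plus an explicit instruction so the LLM retries instead of
    # ignoring the failure
    return (
        "<tool_response>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        "<status>ERROR</status>\n"
        f"<result>{json.dumps(result)}</result>\n"
        "</tool_response>\n"
        "IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry."
    )

print(format_tool_error(
    "validate",
    {"valid": False, "errors": [{"property": "name", "message": "required"}]},
))
```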
### 6. Workflow Order Enforcement
Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:
```
Follow this workflow in order. Do NOT skip steps or go back.
1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```
## Iteration Budget
Each step needs multiple LLM turns:
```
Turn 1: Call get_details for component A (tool call)
Turn 2: Call get_details for component B (tool call)
Turn 3: Configure both components (tool call to validate)
Turn 4: Validation fails, so fix the errors (tool call to re-validate)
Turn 5: Validation passes, so return the result (final response)
```
As a rule of thumb: 5 iterations per step is the minimum, 10 is recommended, and complex steps may need 15.
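The turn sequence above can be sketched as a loop with an iteration budget. `call_llm` and `execute_tool` are hypothetical stand-ins, not real APIs from this repo:

```python
def run_step_loop(call_llm, execute_tool, max_iterations: int = 10):
    messages = []
    for iteration in range(max_iterations):
        response = call_llm(messages)
        if response.get("tool_calls"):
            # Each tool-call turn consumes one iteration from the budget
            for call in response["tool_calls"]:
                messages.append(execute_tool(call))
            continue
        return response  # final answer within budget
    return None  # budget exhausted; the caller should retry the step

# Demo with a scripted LLM: one tool-call turn, then a final answer
scripted = iter([
    {"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]},
    {"success": True, "result": {"found": 1}},
])
print(run_step_loop(lambda messages: next(scripted), lambda call: {"role": "tool"}))
# {'success': True, 'result': {'found': 1}}
```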
## Retry Logic
Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:
```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # for/else: this branch runs only if the loop never hit `break`,
    # i.e. the step failed after all retries
    result = None
```
Recommended: 3 retries per step.
## Implementation
See `examples/multi_step_orchestrator.py` for complete working code with:
- `VLLMClient`: a simple vLLM API client
- `ToolRegistry`: step-based tool registration and execution
- `run_step()`: single-step execution with validation enforcement
- `run_workflow()`: multi-step orchestration with retry logic
## When to Use Multi-Step
| Scenario | Single Prompt | Multi-Step |
|----------|--------------|-----------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |