vllm-tool-calling-guide / guides/MULTI_STEP_WORKFLOWS.md
Joshua Odmark, initial release: vLLM tool calling guide for open source models
# Multi-Step Workflow Architecture
For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.
## Why Multi-Step?
Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover → configure → validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer
## Architecture Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Step 1:          β”‚     β”‚ Step 2:              β”‚     β”‚ Step 3:          β”‚
β”‚ Discovery        │────>β”‚ Configuration        │────>β”‚ Assembly         β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                  β”‚
β”‚ Tools:           β”‚     β”‚ Tools:               β”‚     β”‚ Tools:           β”‚
β”‚ - search         β”‚     β”‚ - get_details        β”‚     β”‚ - assemble       β”‚
β”‚ - list           β”‚     β”‚ - validate_minimal   β”‚     β”‚ - validate_full  β”‚
β”‚ - get_info       β”‚     β”‚ - validate_full      β”‚     β”‚ - deploy         β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                  β”‚
β”‚ Output:          β”‚     β”‚ Output:              β”‚     β”‚ Output:          β”‚
β”‚ What to use      β”‚     β”‚ How to configure     β”‚     β”‚ Final result     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Key Patterns
### 1. Isolated Tool Sets
Each step only sees relevant tools:
```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```
**Why:** Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
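The step-scoped registration above can be sketched as a small registry that filters tools by step. Names here are illustrative; the actual `ToolRegistry` in `examples/multi_step_orchestrator.py` may differ:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToolRegistry:
    # name -> {"function": callable, "steps": [step numbers it is visible in]}
    _tools: Dict[str, dict] = field(default_factory=dict)

    def register(self, name: str, function: Callable, steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for_step(self, step: int) -> List[str]:
        # Only tools registered for this step are exposed to the LLM
        return [name for name, tool in self._tools.items() if step in tool["steps"]]

registry = ToolRegistry()
registry.register("search", lambda q: q, steps=[1])
registry.register("get_details", lambda i: i, steps=[1, 2])
registry.register("validate", lambda c: c, steps=[2])

print(registry.tools_for_step(2))  # ['get_details', 'validate']
```

Because `get_details` is registered for steps 1 and 2, it appears in both tool sets, while `search` never leaks into step 2.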
### 2. Pydantic Schema Validation
Every LLM response is validated against a Pydantic schema:
```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure (schema_class is the step's response model,
# e.g. StepResponse; llm_output is the raw LLM output string)
schema_class(**json.loads(llm_output))
```
**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
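A runnable sketch of that failure mode: both payloads below are valid JSON, but the second is missing a required field, and only Pydantic catches it (the `ToolCall` fields here are assumptions for illustration):

```python
import json
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    name: str
    arguments: Dict[str, Any]

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict[str, Any]] = None
    tool_calls: Optional[List[ToolCall]] = None

good = '{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'
bad = '{"tool_calls": [{"name": "search"}]}'  # valid JSON, but "arguments" is missing

StepResponse(**json.loads(good))  # passes validation

try:
    StepResponse(**json.loads(bad))
except ValidationError:
    print("structurally invalid response rejected")
```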
### 3. Dual-Purpose Response Schema
The same schema handles both tool call requests and final responses:
```python
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}
# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```
**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first, and treats anything else as a final response.
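That dispatch check can be sketched in a few lines (a hedged illustration, not the orchestrator's actual code):

```python
import json

def classify(llm_output: str) -> str:
    response = json.loads(llm_output)
    # tool_calls takes priority; anything else is treated as a final response
    if response.get("tool_calls"):
        return "tool_calls"
    return "final"

print(classify('{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'))  # tool_calls
print(classify('{"success": true, "result": {}, "reasoning": "done"}'))  # final
```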
### 4. Validation Enforcement
The orchestrator requires certain tools to be called **and pass** before accepting a final response:
```python
result = run_step(
...,
validation_tools=["validate_minimal", "validate_full"]
)
```
If the LLM tries to return "success" without all validations passing:
```
You returned a final response but validations have not all passed.
Validation Errors Found:
1. Property: channel
Message: Required field 'channel' is missing
Please fix the errors and call the validation tools again.
```
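The gate itself is simple to express: track which validation tools have passed and refuse a final response until the set is complete. This is a hedged sketch; the real check lives inside `run_step()` and its message format may differ:

```python
from typing import List, Set, Tuple

def can_accept_final(passed: Set[str], validation_tools: List[str]) -> Tuple[bool, str]:
    # A final response is only accepted once every required validation
    # tool has been called AND returned a passing result
    missing = [t for t in validation_tools if t not in passed]
    if missing:
        feedback = (
            "You returned a final response but validations have not all passed.\n"
            "Still required to pass: " + ", ".join(missing) + "\n"
            "Please fix the errors and call the validation tools again."
        )
        return False, feedback
    return True, ""

ok, feedback = can_accept_final({"validate_minimal"}, ["validate_minimal", "validate_full"])
print(ok)  # False: validate_full has not passed yet
```

When the check fails, `feedback` is injected back into the conversation so the LLM gets another turn instead of silently returning an unvalidated answer.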
### 5. Structured Error Feedback
When a tool call fails, the error is formatted with enough detail for the LLM to fix it:
```xml
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
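A formatter producing that feedback might look like this (an illustrative sketch, not the guide's exact implementation):

```python
import json

def format_tool_error(tool_name: str, result: dict) -> str:
    # Wrap the failed tool result in the <tool_response> envelope shown
    # above, plus an explicit instruction so the LLM retries instead of
    # ignoring the failure
    return (
        "<tool_response>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        "<status>ERROR</status>\n"
        f"<result>{json.dumps(result)}</result>\n"
        "</tool_response>\n"
        "IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry."
    )

print(format_tool_error(
    "validate",
    {"valid": False, "errors": [{"property": "name", "message": "required"}]},
))
```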
### 6. Workflow Order Enforcement
Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:
```
Follow this workflow in order. Do NOT skip steps or go back.
1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```
## Iteration Budget
Each step needs multiple LLM turns:
```
Turn 1: Call get_details for component A (tool call)
Turn 2: Call get_details for component B (tool call)
Turn 3: Configure both components (tool call to validate)
Turn 4: Validation fails, so fix the errors (tool call to re-validate)
Turn 5: Validation passes, so return the result (final response)
```
As a rule of thumb: 5 iterations per step is the minimum, 10 is recommended, and complex steps may need 15.
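The turn sequence above can be sketched as a loop with an iteration budget. `call_llm` and `execute_tool` are hypothetical stand-ins, not real APIs from this repo:

```python
def run_step_loop(call_llm, execute_tool, max_iterations: int = 10):
    messages = []
    for iteration in range(max_iterations):
        response = call_llm(messages)
        if response.get("tool_calls"):
            # Each tool-call turn consumes one iteration from the budget
            for call in response["tool_calls"]:
                messages.append(execute_tool(call))
            continue
        return response  # final answer within budget
    return None  # budget exhausted; the caller should retry the step

# Demo with a scripted LLM: one tool-call turn, then a final answer
scripted = iter([
    {"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]},
    {"success": True, "result": {"found": 1}},
])
print(run_step_loop(lambda messages: next(scripted), lambda call: {"role": "tool"}))
# {'success': True, 'result': {'found': 1}}
```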
## Retry Logic
Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:
```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # for/else: this branch runs only if the loop never hit `break`,
    # i.e. the step failed after all retries
    result = None
```
Recommended: 3 retries per step.
## Implementation
See `examples/multi_step_orchestrator.py` for complete working code with:
- `VLLMClient`: a simple vLLM API client
- `ToolRegistry`: step-based tool registration and execution
- `run_step()`: single-step execution with validation enforcement
- `run_workflow()`: multi-step orchestration with retry logic
## When to Use Multi-Step
| Scenario | Single Prompt | Multi-Step |
|----------|--------------|-----------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |