# Multi-Step Workflow Architecture

For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.

## Why Multi-Step?

Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover β†’ configure β†’ validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer

## Architecture Overview

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Step 1:          β”‚     β”‚ Step 2:              β”‚     β”‚ Step 3:           β”‚
β”‚ Discovery        │────>β”‚ Configuration        │────>β”‚ Assembly          β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                   β”‚
β”‚ Tools:           β”‚     β”‚ Tools:               β”‚     β”‚ Tools:            β”‚
β”‚ - search         β”‚     β”‚ - get_details        β”‚     β”‚ - assemble        β”‚
β”‚ - list           β”‚     β”‚ - validate_minimal   β”‚     β”‚ - validate_full   β”‚
β”‚ - get_info       β”‚     β”‚ - validate_full      β”‚     β”‚ - deploy          β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                   β”‚
β”‚ Output:          β”‚     β”‚ Output:              β”‚     β”‚ Output:           β”‚
β”‚ What to use      β”‚     β”‚ How to configure     β”‚     β”‚ Final result      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Key Patterns

### 1. Isolated Tool Sets

Each step only sees relevant tools:

```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```

**Why:** Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
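The full `ToolRegistry` lives in the example file; a minimal sketch of the step-filtering idea (the class body and lambdas here are illustrative, not the actual implementation):

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Minimal registry mapping each tool to the workflow steps that may use it."""

    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, function: Callable, steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for_step(self, step: int) -> List[str]:
        # Only tools registered for this step are exposed to the LLM.
        return [n for n, t in self._tools.items() if step in t["steps"]]

    def execute(self, name: str, **kwargs):
        return self._tools[name]["function"](**kwargs)


registry = ToolRegistry()
registry.register(name="search", function=lambda q: f"results for {q}", steps=[1])
registry.register(name="get_details", function=lambda id: {"id": id}, steps=[1, 2])
registry.register(name="validate", function=lambda cfg: {"valid": True}, steps=[2])

registry.tools_for_step(1)  # ['search', 'get_details']
registry.tools_for_step(2)  # ['get_details', 'validate']
```

The point is that the prompt for step 2 is built only from `tools_for_step(2)`, so `search` never appears as an option there.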

### 2. Pydantic Schema Validation

Every LLM response is validated against a Pydantic schema:

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel

class ToolCall(BaseModel):
    name: str
    arguments: Dict

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure (raises ValidationError if fields or types are wrong)
StepResponse(**json.loads(llm_output))
```

**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
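In practice the parse sits in a try/except so the orchestrator can feed a specific failure message back to the LLM. A sketch of that wrapper (the schema is redefined here so the snippet runs standalone; the feedback strings are illustrative):

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


class ToolCall(BaseModel):
    name: str
    arguments: Dict


class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None


def parse_or_error(llm_output: str):
    """Return (parsed, None) on success, (None, feedback) on failure."""
    try:
        return StepResponse(**json.loads(llm_output)), None
    except json.JSONDecodeError:
        # Not JSON at all -- a different failure than a schema mismatch.
        return None, "Your response was not valid JSON. Respond with JSON only."
    except ValidationError as e:
        return None, f"Your JSON did not match the schema: {e.errors()}"


parsed, _ = parse_or_error('{"tool_calls": [{"name": "search", "arguments": {"q": "x"}}]}')
_, feedback = parse_or_error('{"tool_calls": [{"arguments": {}}]}')  # missing "name"
```

Distinguishing the two failure modes matters: "respond with JSON only" and "field `name` is required" prompt very different corrections from the model.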

### 3. Dual-Purpose Response Schema

The same schema handles both tool call requests and final responses:

```python
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}

# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```

**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first, and treats anything else as a final response.
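The routing check itself is tiny; a sketch, assuming the response has already been parsed into a dict:

```python
def dispatch(parsed: dict):
    """Route a validated LLM response: tool calls take priority,
    anything else is treated as the step's final answer."""
    if parsed.get("tool_calls"):
        return "tools", parsed["tool_calls"]
    return "final", parsed


kind, payload = dispatch({"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]})
kind2, _ = dispatch({"success": True, "result": {}, "reasoning": "done"})
```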

### 4. Validation Enforcement

The orchestrator requires certain tools to be called **and pass** before accepting a final response:

```python
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
```

If the LLM tries to return "success" without all validations passing:

```
You returned a final response but validations have not all passed.

Validation Errors Found:
1. Property: channel
   Message: Required field 'channel' is missing

Please fix the errors and call the validation tools again.
```
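One way to implement the check, sketched here under the assumption that validators return a `{"valid": bool, ...}` payload (consistent with the validator output shown in pattern 5) and that the orchestrator logs every tool call:

```python
def validations_passed(validation_tools, call_log):
    """call_log: (tool_name, result) pairs recorded as the step runs.

    Each required tool must have been called, and its most recent
    result must have passed. Assumes validators return {"valid": bool}.
    """
    latest = {name: result for name, result in call_log}
    return all(
        latest.get(tool, {}).get("valid") is True for tool in validation_tools
    )


log = [
    ("validate_minimal", {"valid": True}),
    ("validate_full", {"valid": False, "errors": [{"property": "channel"}]}),
]
validations_passed(["validate_minimal", "validate_full"], log)  # False
```

Keeping only the most recent result per tool is deliberate: a validation that failed earlier but passed after a fix should count as passed.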

### 5. Structured Error Feedback

When a tool call fails, the error is formatted with enough detail for the LLM to fix it:

```xml
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
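A sketch of a formatter that produces this envelope (the tag layout mirrors the example above; the retry nudge string is illustrative):

```python
import json


def format_tool_response(tool_name: str, result: dict, ok: bool) -> str:
    # Wrap the raw tool result in the XML-style envelope shown above,
    # appending the retry instruction only on failure.
    status = "SUCCESS" if ok else "ERROR"
    block = (
        "<tool_response>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        f"<status>{status}</status>\n"
        f"<result>{json.dumps(result)}</result>\n"
        "</tool_response>"
    )
    if not ok:
        block += (
            "\nIMPORTANT: This tool call failed. "
            "Read the error, understand the issue, fix and retry."
        )
    return block
```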

### 6. Workflow Order Enforcement

Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:

```
Follow this workflow in order. Do NOT skip steps or go back.

1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```

## Iteration Budget

Each step needs multiple LLM turns:

```
Turn 1: Call get_details for component A     (tool call)
Turn 2: Call get_details for component B     (tool call)
Turn 3: Configure both components            (tool call to validate)
Turn 4: Validation fails β€” fix errors        (tool call to re-validate)
Turn 5: Validation passes β€” return result    (final response)
```

Minimum: 5 iterations per step. Recommended: 10. Complex steps: 15.
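The turn sequence above maps onto a per-step loop; a sketch, assuming `llm` takes the message history and returns a parsed dict, and `tools` maps names to callables:

```python
def run_step(llm, tools, max_iterations=10):
    """Per-step loop: each LLM turn either requests tool calls or
    returns a final response; the budget caps total turns."""
    messages = []
    for _turn in range(max_iterations):
        response = llm(messages)
        if response.get("tool_calls"):
            for call in response["tool_calls"]:
                result = tools[call["name"]](**call["arguments"])
                messages.append({"role": "tool", "content": str(result)})
            continue  # tool results feed the next turn
        return response  # final response within budget
    return None  # out of iterations -> let the workflow runner retry


# Fake LLM: one tool-call turn, then a final response.
script = iter([
    {"tool_calls": [{"name": "echo", "arguments": {"x": 1}}]},
    {"success": True, "result": {"x": 1}},
])
out = run_step(lambda msgs: next(script), {"echo": lambda x: x}, max_iterations=5)
```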

## Retry Logic

Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:

```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries; abort the workflow
    raise RuntimeError(f"step failed after {max_step_retries} retries")
```

Recommended: 3 retries per step.

## Implementation

See `examples/multi_step_orchestrator.py` for complete working code with:
- `VLLMClient` β€” Simple vLLM API client
- `ToolRegistry` β€” Step-based tool registration and execution
- `run_step()` β€” Single step execution with validation enforcement
- `run_workflow()` β€” Multi-step orchestration with retry logic

## When to Use Multi-Step

| Scenario | Single Prompt | Multi-Step |
|----------|--------------|-----------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |