# Multi-Step Workflow Architecture

For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.

## Why Multi-Step?

Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover β†’ configure β†’ validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer

## Architecture Overview

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Step 1:          β”‚     β”‚ Step 2:              β”‚     β”‚ Step 3:           β”‚
β”‚ Discovery        │────>β”‚ Configuration        │────>β”‚ Assembly          β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                   β”‚
β”‚ Tools:           β”‚     β”‚ Tools:               β”‚     β”‚ Tools:            β”‚
β”‚ - search         β”‚     β”‚ - get_details        β”‚     β”‚ - assemble        β”‚
β”‚ - list           β”‚     β”‚ - validate_minimal   β”‚     β”‚ - validate_full   β”‚
β”‚ - get_info       β”‚     β”‚ - validate_full      β”‚     β”‚ - deploy          β”‚
β”‚                  β”‚     β”‚                      β”‚     β”‚                   β”‚
β”‚ Output:          β”‚     β”‚ Output:              β”‚     β”‚ Output:           β”‚
β”‚ What to use      β”‚     β”‚ How to configure     β”‚     β”‚ Final result      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Key Patterns

### 1. Isolated Tool Sets

Each step only sees relevant tools:

```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```

**Why:** Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
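The full `ToolRegistry` lives in the example file; a minimal sketch of the step-filtering idea (the class body and lambdas here are illustrative, not the actual implementation):

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Minimal registry mapping each tool to the workflow steps that may use it."""

    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, function: Callable, steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for_step(self, step: int) -> List[str]:
        # Only tools registered for this step are exposed to the LLM.
        return [n for n, t in self._tools.items() if step in t["steps"]]

    def execute(self, name: str, **kwargs):
        return self._tools[name]["function"](**kwargs)


registry = ToolRegistry()
registry.register(name="search", function=lambda q: f"results for {q}", steps=[1])
registry.register(name="get_details", function=lambda id: {"id": id}, steps=[1, 2])
registry.register(name="validate", function=lambda cfg: {"valid": True}, steps=[2])

registry.tools_for_step(1)  # ['search', 'get_details']
registry.tools_for_step(2)  # ['get_details', 'validate']
```

The point is that the prompt for step 2 is built only from `tools_for_step(2)`, so `search` never appears as an option there.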

### 2. Pydantic Schema Validation

Every LLM response is validated against a Pydantic schema:

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel

class ToolCall(BaseModel):
    name: str
    arguments: Dict

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure (raises ValidationError if fields or types are wrong)
StepResponse(**json.loads(llm_output))
```

**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
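In practice the parse sits in a try/except so the orchestrator can feed a specific failure message back to the LLM. A sketch of that wrapper (the schema is redefined here so the snippet runs standalone; the feedback strings are illustrative):

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


class ToolCall(BaseModel):
    name: str
    arguments: Dict


class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None


def parse_or_error(llm_output: str):
    """Return (parsed, None) on success, (None, feedback) on failure."""
    try:
        return StepResponse(**json.loads(llm_output)), None
    except json.JSONDecodeError:
        # Not JSON at all -- a different failure than a schema mismatch.
        return None, "Your response was not valid JSON. Respond with JSON only."
    except ValidationError as e:
        return None, f"Your JSON did not match the schema: {e.errors()}"


parsed, _ = parse_or_error('{"tool_calls": [{"name": "search", "arguments": {"q": "x"}}]}')
_, feedback = parse_or_error('{"tool_calls": [{"arguments": {}}]}')  # missing "name"
```

Distinguishing the two failure modes matters: "respond with JSON only" and "field `name` is required" prompt very different corrections from the model.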

### 3. Dual-Purpose Response Schema

The same schema handles both tool call requests and final responses:

```python
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}

# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```

**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first, and treats anything else as a final response.
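The routing check itself is tiny; a sketch, assuming the response has already been parsed into a dict:

```python
def dispatch(parsed: dict):
    """Route a validated LLM response: tool calls take priority,
    anything else is treated as the step's final answer."""
    if parsed.get("tool_calls"):
        return "tools", parsed["tool_calls"]
    return "final", parsed


kind, payload = dispatch({"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]})
kind2, _ = dispatch({"success": True, "result": {}, "reasoning": "done"})
```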

### 4. Validation Enforcement

The orchestrator requires certain tools to be called **and pass** before accepting a final response:

```python
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
```

If the LLM tries to return "success" without all validations passing:

```
You returned a final response but validations have not all passed.

Validation Errors Found:
1. Property: channel
   Message: Required field 'channel' is missing

Please fix the errors and call the validation tools again.
```
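One way to implement the check, sketched here under the assumption that validators return a `{"valid": bool, ...}` payload (consistent with the validator output shown in pattern 5) and that the orchestrator logs every tool call:

```python
def validations_passed(validation_tools, call_log):
    """call_log: (tool_name, result) pairs recorded as the step runs.

    Each required tool must have been called, and its most recent
    result must have passed. Assumes validators return {"valid": bool}.
    """
    latest = {name: result for name, result in call_log}
    return all(
        latest.get(tool, {}).get("valid") is True for tool in validation_tools
    )


log = [
    ("validate_minimal", {"valid": True}),
    ("validate_full", {"valid": False, "errors": [{"property": "channel"}]}),
]
validations_passed(["validate_minimal", "validate_full"], log)  # False
```

Keeping only the most recent result per tool is deliberate: a validation that failed earlier but passed after a fix should count as passed.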

### 5. Structured Error Feedback

When a tool call fails, the error is formatted with enough detail for the LLM to fix it:

```xml
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
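A sketch of a formatter that produces this envelope (the tag layout mirrors the example above; the retry nudge string is illustrative):

```python
import json


def format_tool_response(tool_name: str, result: dict, ok: bool) -> str:
    # Wrap the raw tool result in the XML-style envelope shown above,
    # appending the retry instruction only on failure.
    status = "SUCCESS" if ok else "ERROR"
    block = (
        "<tool_response>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        f"<status>{status}</status>\n"
        f"<result>{json.dumps(result)}</result>\n"
        "</tool_response>"
    )
    if not ok:
        block += (
            "\nIMPORTANT: This tool call failed. "
            "Read the error, understand the issue, fix and retry."
        )
    return block
```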

### 6. Workflow Order Enforcement

Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:

```
Follow this workflow in order. Do NOT skip steps or go back.

1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```

## Iteration Budget

Each step needs multiple LLM turns:

```
Turn 1: Call get_details for component A     (tool call)
Turn 2: Call get_details for component B     (tool call)
Turn 3: Configure both components            (tool call to validate)
Turn 4: Validation fails β€” fix errors        (tool call to re-validate)
Turn 5: Validation passes β€” return result    (final response)
```

Minimum: 5 iterations per step. Recommended: 10. Complex steps: 15.
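The turn sequence above maps onto a per-step loop; a sketch, assuming `llm` takes the message history and returns a parsed dict, and `tools` maps names to callables:

```python
def run_step(llm, tools, max_iterations=10):
    """Per-step loop: each LLM turn either requests tool calls or
    returns a final response; the budget caps total turns."""
    messages = []
    for _turn in range(max_iterations):
        response = llm(messages)
        if response.get("tool_calls"):
            for call in response["tool_calls"]:
                result = tools[call["name"]](**call["arguments"])
                messages.append({"role": "tool", "content": str(result)})
            continue  # tool results feed the next turn
        return response  # final response within budget
    return None  # out of iterations -> let the workflow runner retry


# Fake LLM: one tool-call turn, then a final response.
script = iter([
    {"tool_calls": [{"name": "echo", "arguments": {"x": 1}}]},
    {"success": True, "result": {"x": 1}},
])
out = run_step(lambda msgs: next(script), {"echo": lambda x: x}, max_iterations=5)
```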

## Retry Logic

Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:

```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries; abort the workflow
    raise RuntimeError(f"step failed after {max_step_retries} retries")
```

Recommended: 3 retries per step.

## Implementation

See `examples/multi_step_orchestrator.py` for complete working code with:
- `VLLMClient` β€” Simple vLLM API client
- `ToolRegistry` β€” Step-based tool registration and execution
- `run_step()` β€” Single step execution with validation enforcement
- `run_workflow()` β€” Multi-step orchestration with retry logic

## When to Use Multi-Step

| Scenario | Single Prompt | Multi-Step |
|----------|--------------|-----------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |