# Multi-Step Workflow Architecture

For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.

## Why Multi-Step?

Single-prompt tool calling fails when:

- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover → configure → validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer

## Architecture Overview

```
┌─────────────────┐     ┌─────────────────────┐     ┌──────────────────┐
│  Step 1:        │     │  Step 2:            │     │  Step 3:         │
│  Discovery      │────>│  Configuration      │────>│  Assembly        │
│                 │     │                     │     │                  │
│  Tools:         │     │  Tools:             │     │  Tools:          │
│  - search       │     │  - get_details      │     │  - assemble      │
│  - list         │     │  - validate_minimal │     │  - validate_full │
│  - get_info     │     │  - validate_full    │     │  - deploy        │
│                 │     │                     │     │                  │
│  Output:        │     │  Output:            │     │  Output:         │
│  What to use    │     │  How to configure   │     │  Final result    │
└─────────────────┘     └─────────────────────┘     └──────────────────┘
```

## Key Patterns

### 1. Isolated Tool Sets

Each step only sees the tools relevant to it:

```python
# steps lists the workflow steps in which each tool is visible
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```

**Why:** Reducing the tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it is far more reliable.

### 2. Pydantic Schema Validation

Every LLM response is validated against a Pydantic schema:

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel


class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None  # ToolCall is a model defined elsewhere

# Validate structure; schema_class is the step's schema (e.g. StepResponse).
# Raises pydantic.ValidationError on a structural mismatch.
schema_class(**json.loads(llm_output))
```

**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types).
Pydantic catches this before it reaches your application.

### 3. Dual-Purpose Response Schema

The same schema handles both tool call requests and final responses:

```python
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}

# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```

**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first and treats anything else as a final response.

### 4. Validation Enforcement

The orchestrator requires certain tools to be called **and pass** before accepting a final response:

```python
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
```

If the LLM tries to return "success" before all validations have passed, the orchestrator rejects the response and replies with:

```
You returned a final response but validations have not all passed.

Validation Errors Found:
1. Property: channel
   Message: Required field 'channel' is missing

Please fix the errors and call the validation tools again.
```

### 5. Structured Error Feedback

When a tool call fails, the error is formatted with enough detail for the LLM to fix it:

```xml
validate
ERROR
{"valid": false, "errors": [{"property": "name", "message": "required"}]}
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```

### 6. Workflow Order Enforcement

Without explicit instructions, the LLM tends to restart from scratch when validation fails. The prompt must enforce the order:

```
Follow this workflow in order. Do NOT skip steps or go back.
1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```

## Iteration Budget

Each step needs multiple LLM turns:

```
Turn 1: Call get_details for component A (tool call)
Turn 2: Call get_details for component B (tool call)
Turn 3: Configure both components (tool call to validate)
Turn 4: Validation fails, fix errors (tool call to re-validate)
Turn 5: Validation passes, return result (final response)
```

Minimum: 5 iterations per step. Recommended: 10. Complex steps: 15.

## Retry Logic

Steps can fail (the LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:

```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries (the loop ended without a break)
    raise RuntimeError("Step failed after all retries")
```

Recommended: 3 retries per step.

## Implementation

See `examples/multi_step_orchestrator.py` for complete working code with:

- `VLLMClient`: simple vLLM API client
- `ToolRegistry`: step-based tool registration and execution
- `run_step()`: single step execution with validation enforcement
- `run_workflow()`: multi-step orchestration with retry logic

## When to Use Multi-Step

| Scenario | Single Prompt | Multi-Step |
|----------|---------------|------------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |
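
To make the patterns in this guide concrete, the following is a minimal sketch of a single step loop that combines isolated tool sets, the `tool_calls`-first check, validation enforcement, and an iteration budget. It is not the code in `examples/multi_step_orchestrator.py`. The methods `tools_for` and `call`, the `scripted_llm` stand-in, the feedback strings, and the exact `run_step` signature are assumptions made for illustration, and Pydantic validation is left out so the sketch runs on the standard library alone.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class ToolRegistry:
    """Hypothetical registry: each tool is visible only in its listed steps."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, function: Callable[..., Dict], steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for(self, step: int) -> List[str]:
        return [name for name, tool in self._tools.items() if step in tool["steps"]]

    def call(self, step: int, name: str, arguments: Dict[str, Any]) -> Dict:
        if name not in self.tools_for(step):
            message = f"tool '{name}' is not available in step {step}"
            return {"valid": False, "errors": [{"message": message}]}
        return self._tools[name]["function"](**arguments)


def run_step(
    llm: Callable[[List[str]], str],
    registry: ToolRegistry,
    step: int,
    validation_tools: List[str],
    max_iterations: int = 10,
) -> Optional[Dict]:
    """Return the accepted final response, or None if the budget runs out."""
    feedback: List[str] = []
    passed = set()
    for _ in range(max_iterations):
        try:
            response = json.loads(llm(feedback))
        except json.JSONDecodeError:
            response = None
        if not isinstance(response, dict):
            feedback.append("Respond with a single JSON object.")
            continue
        tool_calls = response.get("tool_calls")
        if tool_calls:  # checked first: this turn is a tool call request
            for call in tool_calls:
                output = registry.call(step, call["name"], call.get("arguments", {}))
                if call["name"] in validation_tools:
                    # A validation counts only while its latest run passed.
                    if output.get("valid"):
                        passed.add(call["name"])
                    else:
                        passed.discard(call["name"])
                feedback.append(json.dumps({"tool": call["name"], "output": output}))
            continue
        # Anything else is a final response; accept it only after validation.
        if passed == set(validation_tools):
            return response
        feedback.append("You returned a final response but validations have not all passed.")
    return None


def validate(name: str = "") -> Dict:
    if not name:
        return {"valid": False, "errors": [{"property": "name", "message": "required"}]}
    return {"valid": True, "errors": []}


registry = ToolRegistry()
registry.register(name="validate", function=validate, steps=[2])

# Scripted stand-in for the LLM: one canned reply per turn.
script = iter([
    '{"success": true, "result": {}}',                                   # premature final response
    '{"tool_calls": [{"name": "validate", "arguments": {}}]}',           # validation fails
    '{"tool_calls": [{"name": "validate", "arguments": {"name": "a"}}]}',  # validation passes
    '{"success": true, "result": {"name": "a"}}',                        # accepted
])


def scripted_llm(feedback: List[str]) -> str:
    return next(script)


result = run_step(scripted_llm, registry, step=2, validation_tools=["validate"])
```

In this run the first reply is rejected because no validation has passed yet, the failed validation is fed back on the second turn, and only the fourth reply is accepted as the step result. A real orchestrator would also send the feedback to the model as part of the prompt, which the scripted stand-in ignores.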