# Multi-Step Workflow Architecture

For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.

## Why Multi-Step?

Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover → configure → validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer

## Architecture Overview

```
┌─────────────────┐     ┌───────────────────────┐     ┌────────────────────┐
│  Step 1:        │     │  Step 2:              │     │  Step 3:           │
│  Discovery      │────>│  Configuration        │────>│  Assembly          │
│                 │     │                       │     │                    │
│  Tools:         │     │  Tools:               │     │  Tools:            │
│  - search       │     │  - get_details        │     │  - assemble        │
│  - list         │     │  - validate_minimal   │     │  - validate_full   │
│  - get_info     │     │  - validate_full      │     │  - deploy          │
│                 │     │                       │     │                    │
│  Output:        │     │  Output:              │     │  Output:           │
│  What to use    │     │  How to configure     │     │  Final result      │
└─────────────────┘     └───────────────────────┘     └────────────────────┘
```

## Key Patterns

### 1. Isolated Tool Sets

Each step only sees relevant tools:

```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```

**Why:** Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
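
A minimal sketch of the step-based filtering, assuming a registry shaped like the one used above (the `tools_for_step` method name is an assumption; see `examples/multi_step_orchestrator.py` for the actual implementation):

```python
from typing import Callable, Dict, List

class ToolRegistry:
    """Registers tools tagged with the steps in which they are visible."""

    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, function: Callable, steps: List[int]) -> None:
        self._tools[name] = {"function": function, "steps": steps}

    def tools_for_step(self, step: int) -> Dict[str, Callable]:
        # Only tools tagged with this step are exposed to the LLM.
        return {
            name: entry["function"]
            for name, entry in self._tools.items()
            if step in entry["steps"]
        }
```

Calling `registry.tools_for_step(1)` then returns only the discovery tools, which is what keeps the per-step tool count small.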

### 2. Pydantic Schema Validation

Every LLM response is validated against a Pydantic schema:

```python
import json
from typing import Dict, List, Optional
from pydantic import BaseModel

class ToolCall(BaseModel):
    name: str
    arguments: Dict = {}

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure (raises pydantic.ValidationError on a mismatch)
schema_class(**json.loads(llm_output))
```

**Why:** The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
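
For example, a response that is valid JSON but has the wrong shape is rejected, and the validation errors can be sent back to the LLM as a retry message (the message wording here is illustrative):

```python
import json
from pydantic import ValidationError

raw = '{"success": "maybe", "result": [], "tool_calls": "search"}'  # wrong types

try:
    StepResponse(**json.loads(raw))
except ValidationError as exc:
    # The structured errors tell the LLM exactly which fields to fix.
    retry_message = f"Your response failed schema validation:\n{exc}"
```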

### 3. Dual-Purpose Response Schema

The same schema handles both tool call requests and final responses:

```
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}

# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```

**Why:** The LLM doesn't need to learn two different output formats. The orchestrator checks for `tool_calls` first and treats anything else as a final response.
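
In the orchestrator, the dispatch on a parsed response can then be a single branch. A sketch, assuming a `registry.execute(name, arguments)` helper and a `messages` conversation list (both assumptions, not the exact `run_step()` internals):

```python
import json

def handle_llm_output(llm_output: str, registry, messages: list):
    """Parse one LLM turn: execute requested tools, or return the final answer."""
    parsed = StepResponse(**json.loads(llm_output))

    if parsed.tool_calls:
        # Tool-call branch: run each tool and feed the result back for the next turn.
        for call in parsed.tool_calls:
            output = registry.execute(call.name, call.arguments)
            messages.append({"role": "tool", "content": json.dumps(output)})
        return None  # not finished: the step loop continues

    # No tool_calls: treat the response as the step's final result.
    return parsed
```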

### 4. Validation Enforcement

The orchestrator requires certain tools to be called **and pass** before accepting a final response:

```python
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
```

If the LLM tries to return "success" without all validations passing:

```
You returned a final response but validations have not all passed.

Validation Errors Found:
1. Property: channel
   Message: Required field 'channel' is missing

Please fix the errors and call the validation tools again.
```
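
A sketch of the check the orchestrator can run when the LLM attempts a final response (the helper name and bookkeeping are assumptions; the real `run_step()` also reports the individual validation errors):

```python
def unmet_validations(validation_tools: list, passed: set) -> str:
    """Return feedback if required validation tools have not all passed, else ''."""
    missing = [name for name in validation_tools if name not in passed]
    if not missing:
        return ""
    return (
        "You returned a final response but validations have not all passed.\n"
        f"Still required: {', '.join(missing)}.\n"
        "Please fix the errors and call the validation tools again."
    )

# `passed` collects validation tools that were called and returned valid=True.
feedback = unmet_validations(["validate_minimal", "validate_full"], {"validate_minimal"})
# Non-empty feedback is appended to the conversation instead of accepting the answer.
```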

### 5. Structured Error Feedback

When a tool call fails, the error is formatted with enough detail for the LLM to fix it:

```xml
<tool_response>
  <tool_name>validate</tool_name>
  <status>ERROR</status>
  <result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
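
A sketch of a formatter that produces this block (a hypothetical helper; the example code may format it differently):

```python
import json

def format_tool_error(tool_name: str, result: dict) -> str:
    """Wrap a failed tool result so the LLM sees exactly what went wrong."""
    return (
        "<tool_response>\n"
        f"  <tool_name>{tool_name}</tool_name>\n"
        "  <status>ERROR</status>\n"
        f"  <result>{json.dumps(result)}</result>\n"
        "</tool_response>\n"
        "IMPORTANT: This tool call failed. Read the error, "
        "understand the issue, fix and retry."
    )
```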

### 6. Workflow Order Enforcement

Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:

```
Follow this workflow in order. Do NOT skip steps or go back.

1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```

## Iteration Budget

Each step needs multiple LLM turns:

```
Turn 1: Call get_details for component A (tool call)
Turn 2: Call get_details for component B (tool call)
Turn 3: Configure both components (tool call to validate)
Turn 4: Validation fails → fix errors (tool call to re-validate)
Turn 5: Validation passes → return result (final response)
```

Minimum: 5 iterations per step. Recommended: 10. Complex: 15.
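
In code, the budget is just the cap on the per-step turn loop. A minimal sketch, assuming the LLM client and tool executor are passed in as callables (placeholder names, not the real `run_step()` signature):

```python
def run_step(call_llm, execute_tool_calls, messages, max_iterations: int = 10):
    """Exchange turns with the LLM until a final response or the budget runs out."""
    for _ in range(max_iterations):
        reply = call_llm(messages)                   # one LLM turn
        if reply.get("tool_calls"):
            # Each round of tool calls consumes an iteration before the final answer.
            messages.extend(execute_tool_calls(reply["tool_calls"]))
            continue
        return reply                                 # final response within budget
    return None                                      # budget exhausted: the step failed
```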

## Retry Logic

Steps can fail (the LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:

```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries
    raise RuntimeError("Step failed after all retries")
```

Recommended: 3 retries per step.
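
Putting retries and steps together, a workflow runner might look like this (a sketch; it assumes each step is a dict of keyword arguments for `run_step`, which returns a falsy value on failure):

```python
def run_workflow(steps: list, max_step_retries: int = 3):
    """Run steps in order, retrying each failed step up to max_step_retries times."""
    results = []
    for step_kwargs in steps:
        for _ in range(max_step_retries):
            result = run_step(**step_kwargs)
            if result:
                break
        else:
            raise RuntimeError(f"Step failed after {max_step_retries} retries")
        results.append(result)
    return results
```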

## Implementation

See `examples/multi_step_orchestrator.py` for complete working code with:
- `VLLMClient` – Simple VLLM API client
- `ToolRegistry` – Step-based tool registration and execution
- `run_step()` – Single step execution with validation enforcement
- `run_workflow()` – Multi-step orchestration with retry logic

## When to Use Multi-Step

| Scenario | Single Prompt | Multi-Step |
|----------|--------------|-----------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |