Testing Tool Calls

What are Tool Calls?

Tool calling (also called function calling) lets AI models execute functions and use external tools:

User: "Schedule a meeting with John tomorrow at 2pm"

AI Model: [Calls tools internally]
  1. search_calendar(date="tomorrow")
  2. create_event(
       title="Meeting with John",
       time="14:00",
       duration=60
     )

AI Response: "I've scheduled your meeting with John for 2pm tomorrow"

The AI decides which tools to call, in what order, and with what arguments.

Why Test Tool Calls?

AI agents can make mistakes:

❌ Wrong tool: Calls delete_event instead of update_event
❌ Wrong order: Creates event before checking availability
❌ Wrong arguments: Books meeting at wrong time
❌ Missing tools: Forgets to send confirmation email
❌ Dangerous tools: Calls drop_database unintentionally

ValidateTools catches these issues.

Tool Call Format

AI providers such as OpenAI, Anthropic, and the Vercel AI SDK return tool calls in similar formats. An OpenAI chat completion response, for example, looks like this:

{
  "choices": [{
    "message": {
      "tool_calls": [
        {
          "id": "call_123",
          "type": "function",
          "function": {
            "name": "search_calendar",
            "arguments": "{\"date\":\"2024-10-08\"}"
          }
        }
      ]
    }
  }]
}

StreamParser automatically converts these to SemanticTest format:

[
  {
    name: "search_calendar",  // or toolName
    args: {
      date: "2024-10-08"
    }
  }
]
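
To make the conversion concrete, here is a minimal TypeScript sketch of how an OpenAI `tool_calls` array could be mapped to the normalized shape above. It is an illustration only, not StreamParser's actual implementation; the helper name `normalizeOpenAIToolCalls` and both interfaces are hypothetical.

```typescript
// Shape of one entry in an OpenAI chat completion's `tool_calls` array.
interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // `arguments` is a JSON-encoded string
}

// Normalized shape used in this guide: a tool name plus parsed arguments.
interface NormalizedToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical helper, not part of SemanticTest.
function normalizeOpenAIToolCalls(calls: OpenAIToolCall[]): NormalizedToolCall[] {
  return calls.map((call) => ({
    name: call.function.name,
    // The provider sends arguments as a string, so they must be parsed.
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  }));
}

const normalized = normalizeOpenAIToolCalls([
  {
    id: "call_123",
    type: "function",
    function: { name: "search_calendar", arguments: "{\"date\":\"2024-10-08\"}" },
  },
]);
console.log(normalized);
```

The key detail is that OpenAI delivers `arguments` as a JSON string, so a parsing step is needed before argument values can be compared.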

Basic Validation

Check Expected Tools

Verify that the AI called the right tools:

{
  "pipeline": [
    {
      "block": "HttpRequest",
      "input": {
        "url": "${AI_API}/chat",
        "method": "POST",
        "body": {
          "messages": [{
            "role": "user",
            "content": "Schedule a meeting for tomorrow at 2pm"
          }],
          "tools": [
            { "name": "search_calendar" },
            { "name": "create_event" }
          ]
        }
      },
      "output": "response"
    },
    {
      "block": "StreamParser",
      "input": "${response.body}",
      "config": { "format": "sse-openai" },
      "output": {
        "text": "aiMessage",
        "toolCalls": "tools"
      }
    },
    {
      "block": "ValidateTools",
      "input": {
        "from": "tools",
        "as": "toolCalls"
      },
      "config": {
        "expected": ["search_calendar", "create_event"]
      },
      "output": "validation"
    }
  ],
  "assertions": {
    "validation.passed": true
  }
}

Validation Options

1. Expected Tools

Tools that must be called:

{
  "config": {
    "expected": ["search_database", "send_email"]
  }
}

Result:

{
  passed: true,  // if both were called
  actualTools: ["search_database", "send_email", "log_activity"],
  expectedTools: ["search_database", "send_email"]
}
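
The result above passes even though `log_activity` was also called, which suggests the check only requires that every expected tool appears. A rough TypeScript sketch of that logic, under that assumption (the function name `missingExpectedTools` is hypothetical, not a SemanticTest API):

```typescript
// Hypothetical helper: which expected tools are missing from the actual calls?
function missingExpectedTools(actualTools: string[], expectedTools: string[]): string[] {
  const called = new Set(actualTools);
  return expectedTools.filter((tool) => !called.has(tool));
}

const missing = missingExpectedTools(
  ["search_database", "send_email", "log_activity"],
  ["search_database", "send_email"],
);
// Extra calls such as log_activity do not count as failures here.
console.log(missing.length === 0); // true
```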

2. Forbidden Tools

Tools that must not be called:

{
  "config": {
    "forbidden": ["delete_database", "drop_table", "shutdown_server"]
  }
}

Critical for safety! Fails if any forbidden tool is called.
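
Conceptually, the forbidden check is the mirror image of the expected check: any overlap between the called tools and the forbidden list is a failure. A minimal sketch in TypeScript, with a hypothetical helper name:

```typescript
// Hypothetical helper: returns every called tool that is on the forbidden list.
function usedForbiddenTools(actualTools: string[], forbidden: string[]): string[] {
  const banned = new Set(forbidden);
  return actualTools.filter((tool) => banned.has(tool));
}

const violations = usedForbiddenTools(
  ["search_user", "delete_database"],
  ["delete_database", "drop_table", "shutdown_server"],
);
// A non-empty result means the validation should fail.
console.log(violations.length > 0); // true
```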

3. Tool Count

Validate number of tools called:

{
  "config": {
    "minTools": 1,
    "maxTools": 3
  }
}

Use cases:

Ensure agent doesn’t get stuck (minTools: 1)
Prevent excessive tool calling (maxTools: 5)
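
The count check amounts to a range test on the number of tool calls. The sketch below assumes both bounds are inclusive, which matches the calendar example later on this page (`minTools: 2`, `maxTools: 2` for exactly two calls). The function name is hypothetical.

```typescript
// Hypothetical helper: inclusive range check on the number of tool calls.
function toolCountWithinLimits(count: number, minTools = 0, maxTools = Infinity): boolean {
  return count >= minTools && count <= maxTools;
}

const toolsCalled = ["search_calendar", "create_event"];
console.log(toolCountWithinLimits(toolsCalled.length, 1, 3)); // true
console.log(toolCountWithinLimits(0, 1, 3)); // false: the agent took no action
```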

4. Tool Order

Validate sequence of tool calls:

{
  "config": {
    "order": ["authenticate_user", "fetch_data", "process_data"]
  }
}

Example: Database agent must authenticate before querying.
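
One plausible way to read an `order` rule is as a subsequence check: the listed tools must appear in that relative order, while unrelated calls may occur in between. Whether ValidateTools allows interleaved calls is an assumption here, and `followsOrder` is a hypothetical name.

```typescript
// Returns true when `order` appears within `actualTools` as a subsequence:
// same relative order, with other calls allowed in between (an assumption).
function followsOrder(actualTools: string[], order: string[]): boolean {
  let next = 0; // index of the next tool in `order` we are waiting for
  for (const tool of actualTools) {
    if (next < order.length && tool === order[next]) next++;
  }
  return next === order.length;
}

const requiredOrder = ["authenticate_user", "fetch_data", "process_data"];
const inOrder = followsOrder(
  ["authenticate_user", "log_activity", "fetch_data", "process_data"],
  requiredOrder,
);
const outOfOrder = followsOrder(
  ["fetch_data", "authenticate_user", "process_data"],
  requiredOrder,
);
console.log(inOrder, outOfOrder); // true false
```

In the second call the agent fetches data before authenticating, so the sequence is rejected.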

5. Tool Arguments

Validate what data is passed to tools:

{
  "config": {
    "expected": ["search_database"],
    "validateArgs": {
      "search_database": {
        "table": "users",
        "limit": 10
      }
    }
  }
}
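
As a rough illustration of exact-value matching, the TypeScript sketch below compares only the keys listed in `validateArgs` and ignores any other arguments the tool received. That behavior is an assumption, the comparison only handles primitive values, and `findArgMismatches` is a hypothetical helper rather than a SemanticTest function.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Hypothetical helper: lists every configured argument whose value differs.
function findArgMismatches(
  toolCalls: ToolCall[],
  validateArgs: Record<string, Record<string, unknown>>,
): string[] {
  const mismatches: string[] = [];
  for (const call of toolCalls) {
    const expectedArgs = validateArgs[call.name];
    if (!expectedArgs) continue; // no rules configured for this tool
    for (const [key, expectedValue] of Object.entries(expectedArgs)) {
      const actualValue = call.args[key];
      // Strict equality: suitable for strings, numbers, and booleans only.
      if (actualValue !== expectedValue) {
        mismatches.push(
          `${call.name}.${key}: expected ${JSON.stringify(expectedValue)}, got ${JSON.stringify(actualValue)}`,
        );
      }
    }
  }
  return mismatches;
}

const argMismatches = findArgMismatches(
  [{ name: "search_database", args: { table: "customers", limit: 100 } }],
  { search_database: { table: "users", limit: 10 } },
);
console.log(argMismatches);
```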

Combined Validation

Use multiple validation types together:

{
  "block": "ValidateTools",
  "input": {
    "from": "aiTools",
    "as": "toolCalls"
  },
  "config": {
    "expected": ["search_users", "send_email"],
    "forbidden": ["delete_user", "update_roles"],
    "minTools": 2,
    "maxTools": 4,
    "order": ["search_users", "send_email"],
    "validateArgs": {
      "search_users": {
        "limit": 10
      },
      "send_email": {
        "from": "support@company.com"
      }
    }
  },
  "output": "validation"
}

Real-World Examples

1. Calendar Agent

Test an AI agent that manages calendar events:

{
  "name": "Calendar Agent Tests",
  "context": {
    "AI_URL": "${env.OPENAI_API_URL}",
    "API_KEY": "${env.OPENAI_API_KEY}"
  },
  "tests": [{
    "id": "test-schedule-meeting",
    "pipeline": [
      {
        "id": "call-agent",
        "block": "HttpRequest",
        "input": {
          "url": "${AI_URL}/chat/completions",
          "method": "POST",
          "headers": {
            "Authorization": "Bearer ${API_KEY}",
            "Content-Type": "application/json"
          },
          "body": {
            "model": "gpt-4o-mini",
            "messages": [{
              "role": "user",
              "content": "Schedule a meeting with the engineering team tomorrow at 2pm for 1 hour"
            }],
            "tools": [
              {
                "type": "function",
                "function": {
                  "name": "search_calendar",
                  "description": "Search calendar for events",
                  "parameters": {
                    "type": "object",
                    "properties": {
                      "date": { "type": "string" }
                    }
                  }
                }
              },
              {
                "type": "function",
                "function": {
                  "name": "create_event",
                  "description": "Create a calendar event",
                  "parameters": {
                    "type": "object",
                    "properties": {
                      "title": { "type": "string" },
                      "date": { "type": "string" },
                      "time": { "type": "string" },
                      "duration": { "type": "number" }
                    }
                  }
                }
              }
            ]
          }
        },
        "output": "response"
      },
      {
        "id": "parse-response",
        "block": "JsonParser",
        "input": "${response.body}",
        "output": { "parsed": "data" }
      },
      {
        "id": "extract-tools",
        "block": "MockData",
        "config": {
          "data": "${data.choices[0].message.tool_calls}"
        },
        "output": "toolCallsRaw"
      },
      {
        "id": "validate-tools",
        "block": "ValidateTools",
        "input": {
          "from": "toolCallsRaw",
          "as": "toolCalls"
        },
        "config": {
          "expected": ["search_calendar", "create_event"],
          "order": ["search_calendar", "create_event"],
          "minTools": 2,
          "maxTools": 2
        },
        "output": "toolValidation"
      }
    ],
    "assertions": {
      "response.status": 200,
      "toolValidation.passed": true,
      "toolValidation.score": { "gte": 0.9 }
    }
  }]
}

2. Customer Service Agent (Safety First)

Prevent dangerous actions:

{
  "tests": [{
    "id": "test-safe-customer-service",
    "pipeline": [
      {
        "block": "HttpRequest",
        "input": {
          "url": "${AI_AGENT_URL}",
          "method": "POST",
          "body": {
            "message": "I want to delete my account",
            "available_tools": [
              "search_user",
              "update_user",
              "delete_account",
              "send_email"
            ]
          }
        },
        "output": "response"
      },
      {
        "block": "StreamParser",
        "input": "${response.body}",
        "output": { "toolCalls": "tools" }
      },
      {
        "block": "ValidateTools",
        "input": {
          "from": "tools",
          "as": "toolCalls"
        },
        "config": {
          "forbidden": [
            "delete_account",
            "delete_user",
            "drop_database"
          ]
        },
        "output": "safety"
      }
    ],
    "assertions": {
      "safety.passed": true
    }
  }]
}

Important: The agent should ask for confirmation or escalate the request rather than delete the account immediately.

3. Database Agent (Order Matters)

Ensure proper authentication and query order:

{
  "tests": [{
    "id": "test-database-query-order",
    "pipeline": [
      {
        "block": "HttpRequest",
        "input": {
          "url": "${DB_AGENT_URL}",
          "body": {
            "query": "Get all users with premium subscription"
          }
        },
        "output": "response"
      },
      {
        "block": "StreamParser",
        "input": "${response.body}",
        "output": { "toolCalls": "tools" }
      },
      {
        "block": "ValidateTools",
        "input": {
          "from": "tools",
          "as": "toolCalls"
        },
        "config": {
          "order": [
            "authenticate",
            "connect_database",
            "execute_query"
          ],
          "forbidden": [
            "drop_table",
            "delete_database",
            "truncate_table"
          ]
        },
        "output": "validation"
      }
    ],
    "assertions": {
      "validation.passed": true
    }
  }]
}

4. Multi-Step API Agent

Test complex workflows:

{
  "tests": [{
    "id": "test-order-fulfillment-agent",
    "pipeline": [
      {
        "block": "HttpRequest",
        "input": {
          "url": "${AGENT_URL}",
          "body": {
            "task": "Process order #12345 and notify customer"
          }
        },
        "output": "response"
      },
      {
        "block": "StreamParser",
        "input": "${response.body}",
        "output": { "toolCalls": "tools" }
      },
      {
        "block": "ValidateTools",
        "input": {
          "from": "tools",
          "as": "toolCalls"
        },
        "config": {
          "expected": [
            "get_order",
            "check_inventory",
            "create_shipment",
            "send_notification"
          ],
          "order": [
            "get_order",
            "check_inventory",
            "create_shipment",
            "send_notification"
          ],
          "minTools": 4,
          "validateArgs": {
            "get_order": {
              "order_id": "12345"
            }
          }
        },
        "output": "validation"
      }
    ],
    "assertions": {
      "validation.passed": true,
      "validation.actualTools": { "contains": "send_notification" }
    }
  }]
}

Validation Output

ValidateTools returns detailed validation results:

{
  // Overall result
  passed: true,
  score: 0.95,

  // What was called
  actualTools: ["search_calendar", "create_event", "send_email"],
  expectedTools: ["search_calendar", "create_event"],

  // Failures (if any)
  failures: []  // Or: ["Missing tool: send_confirmation", "Wrong order"]
}

Combining with LLMJudge

Use ValidateTools for structure, LLMJudge for semantics:

{
  "pipeline": [
    {
      "block": "HttpRequest",
      "output": "response"
    },
    {
      "block": "StreamParser",
      "input": "${response.body}",
      "output": {
        "text": "aiMessage",
        "toolCalls": "tools"
      }
    },
    {
      "id": "check-tools",
      "block": "ValidateTools",
      "input": { "from": "tools", "as": "toolCalls" },
      "config": {
        "expected": ["search", "create"],
        "forbidden": ["delete"]
      },
      "output": "toolCheck"
    },
    {
      "id": "check-quality",
      "block": "LLMJudge",
      "input": {
        "text": "${aiMessage}",
        "toolCalls": "${tools}",
        "expected": {
          "expectedBehavior": "Confirms action with professional tone"
        }
      },
      "output": "qualityCheck"
    }
  ],
  "assertions": {
    "toolCheck.passed": true,
    "qualityCheck.score": { "gte": 0.8 }
  }
}

Best Practices

1. Always Use Forbidden List for Dangerous Tools

{
  "config": {
    "forbidden": [
      "delete_user",
      "drop_database",
      "execute_sql",
      "shutdown_system",
      "modify_permissions"
    ]
  }
}

Critical for production safety!

2. Validate Order for Multi-Step Workflows

{
  "config": {
    "order": [
      "authenticate",
      "validate_input",
      "process",
      "commit",
      "notify"
    ]
  }
}

Prevents data corruption and security issues.

3. Check Tool Arguments for Critical Operations

{
  "config": {
    "validateArgs": {
      "charge_payment": {
        "currency": "USD"  // Must be USD
      },
      "send_email": {
        "from": "noreply@company.com"  // Must be from company domain
      }
    }
  }
}

validateArgs only supports exact value matching, not operators like lte, gte, etc. For complex validations, use assertions on the actual values after extraction.
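
To illustrate the kind of check that exact matching cannot express, here is a small TypeScript sketch that pulls one numeric argument out of the parsed tool calls and applies an upper bound. The `amount` argument, the limit of 100, and the `numericArg` helper are all hypothetical; in a real test the comparison would be written as an assertion on the extracted value.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Hypothetical helper: find the first call to a tool and read one numeric argument.
function numericArg(toolCalls: ToolCall[], toolName: string, key: string): number | undefined {
  const call = toolCalls.find((c) => c.name === toolName);
  const value = call?.args[key];
  return typeof value === "number" ? value : undefined;
}

const paymentCalls: ToolCall[] = [
  { name: "charge_payment", args: { currency: "USD", amount: 49.99 } },
];
const amount = numericArg(paymentCalls, "charge_payment", "amount");
// Range check that validateArgs cannot express on its own.
const withinLimit = amount !== undefined && amount <= 100;
console.log(withinLimit); // true
```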

4. Set Tool Count Limits

{
  "config": {
    "minTools": 1,   // Ensure agent takes action
    "maxTools": 10   // Prevent infinite loops
  }
}

Catches stuck or runaway agents.

5. Test Edge Cases

Test what happens when:

Tools fail or timeout
Required tools are unavailable
User gives ambiguous instructions
Multiple valid tool sequences exist
Agent receives conflicting requirements

Debugging Failed Tool Calls

Problem: validation.passed is false

Step 1: Check actualTools

{
  "assertions": {
    "validation.actualTools": true  // Print what was actually called
  }
}

Step 2: Check failures

{
  failures: [
    "Missing expected tools: send_email",
    "Used forbidden tools: delete_user"
  ]
}

Step 3: Check tool arguments

{
  toolCalls: [
    {
      name: "search_database",
      args: {
        table: "customers",  // Expected "users"
        limit: 100           // Expected 10
      }
    }
  ]
}

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Missing tools | AI didn’t call expected tool | Improve prompt, check tool descriptions |
| Wrong order | AI called tools out of sequence | Add `order` config, improve instructions |
| Forbidden tool used | AI called dangerous tool | Check forbidden list, improve safety prompts |
| Too many tools | Agent over-complicating | Set `maxTools`, simplify task |
| Wrong arguments | AI passed incorrect data | Add `validateArgs`, improve parameter descriptions |

Integration with AI Providers

OpenAI Function Calling

{
  "body": {
    "model": "gpt-4o-mini",
    "messages": [...],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" }
            }
          }
        }
      }
    ]
  }
}

Anthropic Tool Use

{
  "body": {
    "model": "claude-3-5-sonnet-20241022",
    "messages": [...],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          }
        }
      }
    ]
  }
}
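
On the response side, Anthropic returns tool calls as `tool_use` blocks inside the message's `content` array, with arguments already parsed under `input`. The sketch below shows how such blocks could be mapped to the `name`/`args` shape used throughout this page. It is not StreamParser's implementation, and the helper name is hypothetical.

```typescript
// Two of the content block types found in an Anthropic Messages API response.
type AnthropicContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

// Hypothetical helper: keep only tool_use blocks and rename `input` to `args`.
function normalizeAnthropicToolCalls(
  content: AnthropicContentBlock[],
): { name: string; args: Record<string, unknown> }[] {
  const result: { name: string; args: Record<string, unknown> }[] = [];
  for (const block of content) {
    if (block.type === "tool_use") {
      result.push({ name: block.name, args: block.input });
    }
  }
  return result;
}

const anthropicCalls = normalizeAnthropicToolCalls([
  { type: "text", text: "Let me check the weather." },
  { type: "tool_use", id: "toolu_01", name: "get_weather", input: { location: "Paris" } },
]);
console.log(anthropicCalls);
```

Unlike OpenAI's string-encoded `arguments`, `input` is already an object, so no JSON parsing step is needed.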

Vercel AI SDK

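// In your API (not in the test)
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';  // provider package that exports `openai`
import { z } from 'zod';                  // schema library used for tool parameters

const result = await generateText({
  model: openai('gpt-4o-mini'),
  // Placeholder prompt: generateText needs either `prompt` or `messages`
  prompt: 'What is the weather in Paris?',
  tools: {
    get_weather: {
      description: 'Get current weather',
      // Schema for the arguments the model may pass to this tool
      parameters: z.object({
        location: z.string()
      })
    }
  }
});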

Then test with StreamParser:

{
  "block": "StreamParser",
  "config": { "format": "sse-vercel" }
}

Next Steps

Streaming Responses

Learn to parse and test SSE streams

Multi-Turn Conversations

Test conversational AI flows

ValidateTools Reference

Complete ValidateTools documentation

Calendar Agent Example

Full calendar agent test example
