The Challenge of Testing AI
Traditional API testing uses exact matching. An AI assistant, however, can phrase the same correct answer in many ways:
- ✅ “Your meeting is set for Tuesday at 2:00 PM”
- ✅ “Done! Meeting scheduled for 14:00 on Tuesday”
- ✅ “I’ve booked your Tuesday 2pm slot”
- ❌ All fail with exact matching!
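To make the failure concrete, here is a minimal sketch in plain Python (not SemanticTest's own test format). The `EXPECTED` string and the `exact_match` helper are hypothetical and only illustrate what an exact-match assertion would pin down.

```python
# Hypothetical string that a traditional exact-match test would hard-code.
EXPECTED = "Your meeting is scheduled for Tuesday at 2:00 PM"

# The three phrasings listed above: all correct, none identical to EXPECTED.
responses = [
    "Your meeting is set for Tuesday at 2:00 PM",
    "Done! Meeting scheduled for 14:00 on Tuesday",
    "I’ve booked your Tuesday 2pm slot",
]


def exact_match(actual: str, expected: str) -> bool:
    """Traditional assertion: passes only on a character-for-character match."""
    return actual == expected


results = [exact_match(r, EXPECTED) for r in responses]
for response, ok in zip(responses, results):
    print("PASS" if ok else "FAIL", "-", response)
```

Every line prints `FAIL`, even though each response confirms the same Tuesday 2 PM meeting.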
Why AI Testing is Different
1. Non-Deterministic Responses
The same input can produce a different output on every run.
2. Semantic Equivalence
These mean the same thing:
- “Cannot find user”
- “User not found”
- “That user doesn’t exist”
- “I couldn’t locate that user”
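One common way to treat phrasings like these as equivalent is to ask a second model for a verdict. The sketch below shows the general shape of that idea in plain Python; it is not SemanticTest's LLMJudge implementation. The prompt wording, the JSON verdict format, and the names `build_judge_prompt`, `judge`, and `fake_model` are all assumptions made for illustration, and `fake_model` is a canned stand-in so the example runs without an API key.

```python
import json
from typing import Callable


def build_judge_prompt(response: str, expected_behavior: str) -> str:
    """Assemble a grading prompt (hypothetical wording)."""
    return (
        "You are grading an AI response.\n"
        f"Expected behavior: {expected_behavior}\n"
        f"Actual response: {response}\n"
        'Reply with JSON only: {"pass": true or false, "reason": "<short reason>"}'
    )


def judge(response: str, expected_behavior: str,
          ask_model: Callable[[str], str]) -> dict:
    """Ask a model for a verdict and parse it defensively."""
    raw = ask_model(build_judge_prompt(response, expected_behavior))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        # A judge that does not answer in the agreed format counts as a failure.
        return {"pass": False, "reason": f"unparseable judge output: {raw!r}"}
    if not isinstance(verdict, dict):
        return {"pass": False, "reason": "judge output is not a JSON object"}
    return {"pass": bool(verdict.get("pass")),
            "reason": str(verdict.get("reason", ""))}


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same canned verdict.
    return '{"pass": true, "reason": "states that the user does not exist"}'


phrasings = [
    "Cannot find user",
    "User not found",
    "That user doesn’t exist",
    "I couldn’t locate that user",
]
verdicts = [
    judge(p, "Tells the caller that the requested user does not exist", fake_model)
    for p in phrasings
]
```

In a real test, `ask_model` would wrap an actual model call. Keeping it injectable makes the parsing logic testable on its own.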
3. Tool/Function Calling
AI agents choose which tools to call, and with what arguments, at run time, so the calls are hard to predict.
4. Streaming Responses
Responses arrive as chunks over Server-Sent Events (SSE).
SemanticTest’s Approach
SemanticTest provides four specialized approaches for AI testing:
1. Semantic Validation (LLMJudge)
Use AI to judge AI responses.
2. Tool Call Validation
Validate which tools the AI calls and with what arguments.
3. Streaming Response Parsing
Parse SSE streams from AI APIs.
4. Multi-Turn Conversation Testing
Test conversational flows with context.
Traditional vs Semantic Testing
- ❌ Traditional (Brittle)
- ✅ Semantic (Robust)
What You’ll Learn
Semantic Validation
Use LLMJudge to validate AI responses semantically instead of exact matches
Tool Call Validation
Test AI agents that call tools/functions with ValidateTools
Streaming Responses
Parse and test Server-Sent Events (SSE) streams from AI APIs
Multi-Turn Conversations
Test conversational AI with context and memory
Quick Start
1. Install SemanticTest
2. Set OpenAI API Key (for LLMJudge)
Optional: only LLMJudge requires an OpenAI API key. All other blocks work without it.
3. Create Your First AI Test
test.json
4. Run Tests
When to Use Each Approach
Use Case | Approach | Block |
---|---|---|
Validate response quality/meaning | Semantic Validation | LLMJudge |
Check exact tools called | Tool Validation | ValidateTools |
Parse streaming AI responses | Stream Parsing | StreamParser |
Test conversation flows | Multi-Turn Testing | LLMJudge + history |
Verify specific keywords | Traditional | ValidateContent |
Check response structure | Traditional | Assertions |
Best Practices
Combine Multiple Validation Approaches
Use structural validation (ValidateTools) + semantic validation (LLMJudge):
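As a rough illustration of layering the two kinds of checks, the sketch below first verifies which tools were called and only then consults a judge about the reply text. It is plain Python rather than SemanticTest configuration. The tool-call shape (`name` plus `arguments`), the tool names, and the functions `validate_tools` and `check_response` are hypothetical, and the judge is a trivial stand-in for a real semantic check.

```python
from typing import Any, Callable

ToolCall = dict[str, Any]  # assumed shape: {"name": str, "arguments": dict}


def validate_tools(calls: list[ToolCall], expected: list[ToolCall],
                   forbidden: tuple[str, ...] = ()) -> list[str]:
    """Return a list of structural problems; an empty list means the check passed."""
    problems = []
    called_names = [c["name"] for c in calls]
    for name in forbidden:
        if name in called_names:
            problems.append(f"forbidden tool called: {name}")
    for want in expected:
        matches = [c for c in calls if c["name"] == want["name"]]
        if not matches:
            problems.append(f"missing tool call: {want['name']}")
            continue
        wanted_args = want.get("arguments", {})
        # Subset match: extra arguments are allowed, listed ones must be equal.
        if not any(
            all(c.get("arguments", {}).get(k) == v for k, v in wanted_args.items())
            for c in matches
        ):
            problems.append(f"{want['name']} called with unexpected arguments")
    return problems


def check_response(calls: list[ToolCall], text: str, expected: list[ToolCall],
                   forbidden: tuple[str, ...], expected_behavior: str,
                   judge: Callable[[str, str], bool]) -> list[str]:
    """Run the structural check first; consult the judge only if it passes."""
    problems = validate_tools(calls, expected, forbidden)
    if problems:
        return problems
    if not judge(text, expected_behavior):
        return ["response text does not match the expected behavior"]
    return []


calls = [{"name": "create_event",
          "arguments": {"day": "Tuesday", "time": "14:00", "title": "Sync"}}]
expected = [{"name": "create_event",
             "arguments": {"day": "Tuesday", "time": "14:00"}}]


def stub_judge(text: str, expected_behavior: str) -> bool:
    # Stand-in for a semantic check; a real judge would call a model.
    return "tuesday" in text.lower()


problems = check_response(
    calls, "I’ve booked your Tuesday 2pm slot", expected,
    ("delete_event",), "Confirms a meeting on Tuesday at 2 PM", stub_judge,
)
```

Running the structural check first means a wrong or missing tool call fails immediately, without spending a judge call on the text.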
Use Traditional Assertions Where Possible
Exact assertions are faster and cheaper.
Reserve LLMJudge for truly non-deterministic content.
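A sketch of what "traditional where possible" can look like in plain Python: the deterministic parts of a response (status, structured fields, types) are checked exactly, and only the free-text `message` would be left for a semantic check. The response body and its field names are hypothetical.

```python
import json

# Hypothetical response body from a calendar agent endpoint.
raw_body = (
    '{"status": "ok", '
    '"event": {"day": "Tuesday", "start": "14:00"}, '
    '"message": "I’ve booked your Tuesday 2pm slot"}'
)


def check_structure(body: str) -> list[str]:
    """Exact checks on the deterministic parts of the response; no model call."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    if data.get("status") != "ok":
        errors.append("status must be 'ok'")
    event = data.get("event")
    if not isinstance(event, dict):
        errors.append("event must be an object")
    else:
        if event.get("day") != "Tuesday":
            errors.append("event.day must be 'Tuesday'")
        if event.get("start") != "14:00":
            errors.append("event.start must be '14:00'")
    message = data.get("message")
    # The wording of the message varies, so only its presence is checked here.
    if not isinstance(message, str) or not message.strip():
        errors.append("message must be a non-empty string")
    return errors


structure_errors = check_structure(raw_body)
```

Only when `structure_errors` is empty is it worth spending a judge call on the wording of `message`.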
Set Clear Expected Behavior
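Be specific in LLMJudge expectations. For example, “confirms a meeting on Tuesday at 2 PM” gives the judge far more to check than “responds helpfully”.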
Test Edge Cases
AI systems need edge case testing:
- Ambiguous user input
- Missing context
- Conflicting instructions
- Tool call failures
- Stream interruptions
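The last case in the list, an interrupted stream, can be sketched as follows. This is a simplified SSE reader in plain Python, not SemanticTest's StreamParser. It assumes OpenAI-style chunks (`choices[0].delta.content`, with a final `data: [DONE]` line) and treats each `data:` line as one complete event, which real SSE does not guarantee. The function name `parse_sse` and the returned keys are hypothetical.

```python
import json
from typing import Iterable


def parse_sse(lines: Iterable[str]) -> dict:
    """Accumulate text deltas and record whether the stream ended cleanly."""
    text_parts = []
    complete = False
    malformed = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            complete = True
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            malformed += 1  # e.g. a chunk cut off mid-transfer
            continue
        if not isinstance(chunk, dict):
            malformed += 1
            continue
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
    return {
        "text": "".join(text_parts),
        "complete": complete,
        "malformed_chunks": malformed,
    }


complete_stream = [
    'data: {"choices":[{"delta":{"content":"Meeting "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"booked"}}]}',
    "",
    "data: [DONE]",
    "",
]
# Same stream, cut off in the middle of the second chunk.
interrupted_stream = complete_stream[:2] + ['data: {"choices":[{"delta":{"con']

ok_result = parse_sse(complete_stream)
cut_result = parse_sse(interrupted_stream)
```

A test for the interrupted case can then assert that `cut_result["complete"]` is false and that the client under test reacts accordingly, rather than only checking the accumulated text.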
Real-World Examples
AI Chat API
Complete example testing OpenAI chat API
Calendar Agent
Test an AI agent that manages calendar events
Basic API Test
Traditional API testing patterns
Error Handling
Handle errors and retries
Next Steps
Semantic Validation with LLMJudge
Start with semantic validation - the core of AI testing