Introduction

Why SemanticTest?

Testing AI systems is hard. Responses are non-deterministic, tool usage has to be validated, and semantic meaning matters more than an exact text match. SemanticTest addresses this with:

Composable Blocks

Build complex test scenarios using simple, reusable blocks for HTTP, parsing, validation, and AI evaluation

Pipeline Architecture

Data flows through named slots, making tests readable and maintainable

LLM Judge

Evaluate responses semantically using AI instead of exact text matching

JSON Test Definitions

Version-controllable, readable test definitions that anyone can understand

Quick Example

Here’s a simple test that validates an API response semantically:

{
  "name": "User API Test",
  "tests": [{
    "id": "get-user",
    "pipeline": [
      {
        "id": "request",
        "block": "HttpRequest",
        "input": {
          "url": "https://api.example.com/users/1",
          "method": "GET"
        },
        "output": "response"
      },
      {
        "id": "judge",
        "block": "LLMJudge",
        "input": {
          "text": "${response.body}",
          "expected": {
            "expectedBehavior": "Should return user information with name and email"
          }
        },
        "output": "validation"
      }
    ],
    "assertions": {
      "response.status": 200,
      "validation.score": { "gt": 0.7 }
    }
  }]
}
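To make the data flow in this example concrete, here is a minimal Python sketch of how named slots, `${...}` placeholders, and assertions could fit together. It is not SemanticTest's actual implementation: the helper names (`lookup`, `interpolate`, `check`) and the sample slot values are assumptions for illustration, and only the `gt` operator is handled because it is the only one shown above.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(slots, path):
    """Follow a dotted path such as 'response.body' through nested dicts."""
    value = slots
    for key in path.split("."):
        value = value[key]
    return value


def interpolate(template, slots):
    """Replace each ${path} placeholder with the value stored in the slots."""
    whole = PLACEHOLDER.fullmatch(template)
    if whole:
        # A template that is a single placeholder keeps the original type.
        return lookup(slots, whole.group(1))
    return PLACEHOLDER.sub(lambda m: str(lookup(slots, m.group(1))), template)


def check(actual, expected):
    """Evaluate one assertion: a {"gt": n} comparison or plain equality."""
    if isinstance(expected, dict) and "gt" in expected:
        return actual > expected["gt"]
    return actual == expected


# Hypothetical slot contents after both pipeline steps have run.
slots = {
    "response": {"status": 200, "body": '{"name": "Ada", "email": "ada@example.com"}'},
    "validation": {"score": 0.9},
}

# The assertions block from the test definition above.
assertions = {"response.status": 200, "validation.score": {"gt": 0.7}}

judge_input = interpolate("${response.body}", slots)
results = {path: check(lookup(slots, path), expected)
           for path, expected in assertions.items()}
print(results)
```

Each step writes its result to the slot named by `output`, and later steps and assertions read from those slots by dotted path, which is what keeps the steps decoupled from one another.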

What Makes It Different?

No More Fragile Exact Matching

Instead of matching exact text, SemanticTest uses an LLM to judge the meaning of a response, so “2:00 PM”, “2 PM”, “14:00”, and “two in the afternoon” are all treated as equivalent.
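The short Python sketch below illustrates why rule-based matching is fragile for the four phrasings above. It is not part of SemanticTest; `normalize_time` is a hypothetical hand-written normalizer. It handles the three numeric forms but has no rule for the free-text phrasing, which is the kind of gap a semantic judge is meant to cover.

```python
import re

TIME_PATTERN = re.compile(r"\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\s*", re.IGNORECASE)


def normalize_time(text):
    """Best-effort conversion to 24-hour HH:MM, or None if no rule applies."""
    match = TIME_PATTERN.fullmatch(text)
    if not match:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    suffix = (match.group(3) or "").lower()
    if suffix == "pm" and hour != 12:
        hour += 12
    if suffix == "am" and hour == 12:
        hour = 0
    return f"{hour:02d}:{minute:02d}"


phrasings = ["2:00 PM", "2 PM", "14:00", "two in the afternoon"]

# Exact matching accepts only the one phrasing that equals the expected text.
exact = [p == "2:00 PM" for p in phrasings]

# Hand-written normalization covers the numeric forms but not free text.
normalized = [normalize_time(p) for p in phrasings]

print(exact)       # [True, False, False, False]
print(normalized)  # ['14:00', '14:00', '14:00', None]
```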

Built for AI Systems

Test tool calls, streaming responses, multi-turn conversations, and non-deterministic outputs with confidence.

Composable & Extensible

Mix and match 8 built-in blocks or create your own custom blocks. Each block does one thing well.

No Vendor Lock-in

100% open source, runs locally, works with any LLM provider. You control your data and costs.

Use Cases

AI Agent Testing

Test AI agents that use tools, make decisions, and maintain conversations

API Testing

Traditional REST API testing with powerful validation and semantic checks

Streaming Responses

Parse and validate streaming SSE responses from OpenAI, Vercel AI SDK, and more

Integration Testing

Test complex workflows with multiple API calls, data transformations, and validations
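For the Streaming Responses use case, it helps to see what parsing an SSE stream involves. The Python sketch below is a simplified illustration, not SemanticTest's parser: `parse_sse` and `collect_text` are hypothetical helpers, the sample stream is made up, and the chunk shape (`choices[0].delta.content` terminated by `[DONE]`) assumes an OpenAI-style chat completion stream.

```python
import json


def parse_sse(raw):
    """Return the data payload of each event in a Server-Sent Events stream."""
    events = []
    # Events are separated by a blank line; each event has one or more fields.
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # A single leading space after the colon is not part of the value.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
        if data_lines:
            events.append("\n".join(data_lines))
    return events


def collect_text(raw):
    """Concatenate the text deltas of an OpenAI-style stream (assumed shape)."""
    parts = []
    for payload in parse_sse(raw):
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)


# Made-up sample stream for illustration.
sample = (
    'data: {"choices": [{"delta": {"content": "Hello"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": ", world"}}]}\n\n'
    "data: [DONE]\n\n"
)

text = collect_text(sample)
print(text)  # Hello, world
```

Once the stream is reassembled into a single string, it can be validated like any other response body, for example by passing it to a judge step.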

Next Steps

Quickstart

Get started in 30 seconds

Core Concepts

Understand how SemanticTest works

Browse Blocks

Explore all available building blocks

View Examples

Learn from real-world examples
