Why AI APIs Use Streaming
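Modern AI APIs stream responses using Server-Sent Events (SSE) instead of waiting for the complete response.
Without Streaming
The client receives nothing until the entire response has been generated.
With Streaming
The client receives the response in chunks as it is generated: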
- ⚡ Faster perceived response time
- 📱 Better user experience (see response forming)
- 🔄 Can cancel long responses early
- 🛠️ Get tool calls before full response completes
Server-Sent Events (SSE) Format
AI APIs send responses as SSE streams. Each data: line carries one chunk, and the chunks combine into the full message, for example “Hello there!”.
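To make the chunk-by-chunk structure concrete, here is a minimal Python sketch that joins the text deltas of an OpenAI-style SSE body. It is not StreamParser's implementation: the function name `combine_sse_text` and the `SAMPLE` payload are hypothetical, and the sketch assumes each chunk carries its text under `choices[0].delta.content` and that the stream ends with a `[DONE]` sentinel.

```python
import json


def combine_sse_text(raw: str) -> str:
    """Join the text deltas carried by the data: lines of an OpenAI-style SSE body."""
    parts = []
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue  # some chunks carry no choices at all
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Hypothetical sample body: three chunks followed by the sentinel.
SAMPLE = (
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":" there"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"!"}}]}\n\n'
    'data: [DONE]\n\n'
)

print(combine_sse_text(SAMPLE))  # Hello there!
```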
StreamParser Block
StreamParser extracts text and tool calls from SSE streams.
Supported Formats
StreamParser supports multiple streaming formats:
Format | Provider | Description |
---|---|---|
sse-openai | OpenAI | ChatGPT API, Azure OpenAI |
sse-vercel | Vercel AI SDK | Next.js AI applications |
sse | Generic | Standard SSE format |
text | Any | Plain text (no parsing) |
Basic Usage
Parse OpenAI Stream
Parse Vercel AI SDK Stream
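For orientation, the sketch below shows what parsing a line-prefixed AI SDK data stream can look like in plain Python. It assumes the data stream protocol in which `0:` lines carry JSON-encoded text parts and `9:` lines carry tool call objects; other AI SDK versions may use a different wire format. The function name `parse_data_stream` and the sample are hypothetical, and none of this is StreamParser's own code.

```python
import json


def parse_data_stream(raw: str):
    """Collect text parts (0:) and tool calls (9:) from a line-prefixed stream."""
    text_parts, tool_calls = [], []
    for line in raw.splitlines():
        # Split on the first colon only; the JSON payload contains colons too.
        prefix, sep, payload = line.partition(":")
        if not sep or not payload:
            continue  # not a "<code>:<json>" line
        if prefix == "0":
            text_parts.append(json.loads(payload))  # JSON string
        elif prefix == "9":
            tool_calls.append(json.loads(payload))  # JSON object
        # Other part types (finish messages, data parts, ...) are ignored here.
    return "".join(text_parts), tool_calls


# Hypothetical sample stream.
VERCEL_SAMPLE = (
    '0:"Hello"\n'
    '0:" there!"\n'
    '9:{"toolCallId":"call_1","toolName":"get_weather","args":{"city":"Paris"}}\n'
    'd:{"finishReason":"stop"}\n'
)

vercel_text, vercel_tools = parse_data_stream(VERCEL_SAMPLE)
print(vercel_text)
print(vercel_tools[0]["toolName"])
```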
Extracting Content
Text Only
Tool Calls Only
Both Text and Tools
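As an illustration of why text and tool calls need separate handling, the following Python sketch extracts both from an OpenAI-style stream. In that format, tool call arguments arrive as string fragments keyed by `index` and must be concatenated before they can be parsed as JSON. The names `extract_text_and_tools`, `_sse`, and `TOOL_SAMPLE` are hypothetical; this is a sketch of the technique, not the StreamParser block itself.

```python
import json


def extract_text_and_tools(raw: str):
    """Return (text, tool_calls) from an OpenAI-style SSE body."""
    text = []
    calls = {}  # tool call index -> accumulated call
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            call["id"] = frag.get("id") or call["id"]
            fn = frag.get("function") or {}
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""  # JSON arrives in pieces
    return "".join(text), [calls[i] for i in sorted(calls)]


def _sse(delta: dict) -> str:
    """Build one hypothetical SSE event around a delta object."""
    return "data: " + json.dumps({"choices": [{"delta": delta}]}) + "\n\n"


TOOL_SAMPLE = (
    _sse({"content": "Checking the weather."})
    + _sse({"tool_calls": [{"index": 0, "id": "call_1",
                            "function": {"name": "get_weather", "arguments": ""}}]})
    + _sse({"tool_calls": [{"index": 0, "function": {"arguments": '{"city":'}}]})
    + _sse({"tool_calls": [{"index": 0, "function": {"arguments": ' "Paris"}'}}]})
    + "data: [DONE]\n\n"
)

text, tools = extract_text_and_tools(TOOL_SAMPLE)
print(text)
print(tools[0]["name"], json.loads(tools[0]["arguments"]))
```

Keeping the arguments as a raw string until the stream ends avoids parsing JSON that is still incomplete.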
Include Metadata
Real-World Examples
1. Test ChatGPT-Style Interface
2. Test Tool Calls in Stream
3. Test Vercel AI SDK Streaming
4. Test Partial Response Quality
Test response quality even if the stream is cut short.
Performance Considerations
Response Time
Streaming doesn’t reduce the total response time, but it improves perceived speed.
Testing Performance
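One way to test the difference between perceived and total time is to record the time to the first chunk separately from the time to the last. The Python sketch below does this for any iterable of chunks; `measure_stream` and `fake_stream` are hypothetical helpers, and the simulated delays stand in for a real network stream.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict:
    """Record time to first chunk and total time while consuming a stream."""
    start = time.monotonic()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = time.monotonic() - start  # perceived latency
        count += 1
    total = time.monotonic() - start  # unchanged by streaming
    return {"time_to_first_chunk": first, "total_time": total, "chunks": count}


def fake_stream(n: int = 5, delay: float = 0.01) -> Iterator[str]:
    """Simulated stream: one chunk every `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk {i}"


stats = measure_stream(fake_stream())
print(stats)
```

A performance test would typically assert on `time_to_first_chunk` for responsiveness and on `total_time` for overall latency.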
Error Handling
Incomplete Streams
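A simple way to detect a stream that was cut short is to check for the end-of-stream marker. The Python sketch below assumes an OpenAI-style stream that ends with a `data: [DONE]` line; other formats signal completion differently. The function name `stream_completed` and both samples are hypothetical.

```python
def stream_completed(raw: str) -> bool:
    """True if the last data: line of an OpenAI-style SSE body is [DONE]."""
    data_lines = [line for line in raw.splitlines() if line.startswith("data:")]
    if not data_lines:
        return False  # an empty stream never completed
    return data_lines[-1][len("data:"):].strip() == "[DONE]"


# Hypothetical bodies: one complete, one dropped mid-chunk.
COMPLETE = 'data: {"choices":[{"delta":{"content":"Hi"}}]}\n\ndata: [DONE]\n\n'
TRUNCATED = 'data: {"choices":[{"delta":{"content":"Hi"}}]}\n\ndata: {"choices":[{"del'

print(stream_completed(COMPLETE))
print(stream_completed(TRUNCATED))
```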
Malformed SSE
StreamParser handles common issues:
- Missing data: prefix
- Invalid JSON in chunks
- Incomplete tool call objects
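The first two cases in the list above can be sketched as a tolerant parser that keeps going and records errors instead of raising. This Python example is an assumption about how such handling might look, not StreamParser's actual behavior; `parse_tolerant` and the `MESSY` sample are hypothetical, and incomplete tool call objects are not covered here.

```python
import json


def parse_tolerant(raw: str):
    """Return (text, errors), skipping chunks that cannot be parsed."""
    text, errors = [], []
    for number, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment
        # Accept lines with or without the data: prefix.
        payload = line[len("data:"):].strip() if line.startswith("data:") else line
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: {exc.msg}")
            continue
        if not isinstance(chunk, dict):
            errors.append(f"line {number}: expected a JSON object")
            continue
        choices = chunk.get("choices") or []
        if choices:
            text.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(text), errors


MESSY = (
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n'
    '{"choices":[{"delta":{"content":" there"}}]}\n'  # missing data: prefix
    'data: {"choices":[{"delta":{"content":\n'        # truncated JSON
    'data: {"choices":[{"delta":{"content":"!"}}]}\n'
    'data: [DONE]\n'
)

messy_text, messy_errors = parse_tolerant(MESSY)
print(messy_text)
print(messy_errors)
```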
Timeout Handling
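As a rough illustration of a timeout that still preserves partial output, the Python sketch below stops consuming a stream once a deadline has passed and returns whatever arrived. `read_with_deadline` and `slow_stream` are hypothetical names. Note the limitation: the deadline is only checked between chunks, so a read that blocks indefinitely would also need a timeout at the socket or HTTP client level.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def read_with_deadline(chunks: Iterable[str], timeout_s: float) -> Tuple[List[str], bool]:
    """Consume chunks until the stream ends or the deadline passes.

    Returns (received_chunks, timed_out).
    """
    deadline = time.monotonic() + timeout_s
    received: List[str] = []
    for chunk in chunks:
        received.append(chunk)
        if time.monotonic() > deadline:
            return received, True  # keep the partial response
    return received, False


def slow_stream(n: int = 10, delay: float = 0.05) -> Iterator[str]:
    """Simulated slow stream: one chunk every `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk {i}"


partial, timed_out = read_with_deadline(slow_stream(), timeout_s=0.12)
print(len(partial), timed_out)
```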
Combining with Validation Blocks
StreamParser → ValidateContent → LLMJudge
Streaming vs Non-Streaming
Streaming
Pros:
- Better UX (see response forming)
- Can cancel long responses
- Get tool calls early
Cons:
- More complex to parse
- Harder to debug
- Can’t easily inspect full response
Best for:
- User-facing chat interfaces
- Long-form content generation
- Real-time feedback needed
Best Practices
1. Always Check Parse Errors
2. Set Appropriate Timeouts
3. Validate Both Structure and Semantics
4. Test Edge Cases
- Empty streams
- Incomplete streams (connection drops)
- Very long responses (10,000+ tokens)
- Multiple tool calls in one stream
- Mixed text and tool calls
5. Extract Only What You Need
Debugging Streams
Problem: Empty aiMessage
Check:
- Is the stream format correct?
- Look at the raw response.body
Problem: Tool calls not extracted
Check:
- Correct format? (sse-openai vs sse-vercel)
- Are tool calls in the response?
- Check metadata.totalTools
Problem: Parse errors
Common causes:
- Wrong format specified
- Malformed JSON in chunks
- Mixed stream formats