fix: streaming clients hang when security blocks occur (jailbreak/PII detection)

## Summary

When jailbreak detection or PII policy violations occur during streaming requests (e.g., from OpenWebUI), the client hangs indefinitely instead of receiving the security block response. The semantic router correctly detects and logs the security violation but returns the wrong response format for streaming clients.

## Problem Description

**Affected Clients**: OpenWebUI, any client using `stream: true` or `Accept: text/event-stream`
**Severity**: High - Security features don't work with streaming clients

### What Should Happen
1. Client sends streaming request with jailbreak content
2. Router detects jailbreak and blocks request
3. Client receives streaming security error response
4. Client displays security block message to user

### What Actually Happens  
1. Client sends streaming request with jailbreak content
2. Router detects jailbreak and blocks request ✅
3. Router sends JSON response instead of SSE format ❌
4. Client hangs waiting for streaming data that never comes ❌

## Technical Analysis

### Root Cause
The security response functions in `src/semantic-router/pkg/utils/http/response.go` always return JSON format regardless of whether the client expects streaming (SSE) format.

**Current Implementation** (lines 82-154):
```go
immediateResponse := &ext_proc.ImmediateResponse{
    // ImmediateResponse.Headers is a *HeaderMutation; the headers go in SetHeaders.
    Headers: &ext_proc.HeaderMutation{
        SetHeaders: []*core.HeaderValueOption{
            {
                Header: &core.HeaderValue{
                    Key:   "content-type",
                    Value: "application/json", // ❌ Wrong for streaming clients
                },
            },
        },
    },
    Body: jsonResponse, // ❌ Should be SSE format for streaming
}
```

### Expected vs Actual Response

**Streaming Client Expects** (SSE format):
```
Content-Type: text/event-stream

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"I cannot process this request due to content policy violations."},"finish_reason":"content_filter"}]}

data: [DONE]

```

**Router Actually Sends**:
```
Content-Type: application/json

{"id":"chatcmpl-jailbreak-blocked-123","object":"chat.completion","choices":[{"message":{"content":"I cannot process this request..."}}]}
```

### Missing Context
The security response functions don't receive streaming context:

**Current Function Signature**:
```go
func CreateJailbreakViolationResponse(jailbreakType string, confidence float32) *ext_proc.ProcessingResponse
```

**Needed**:
```go
func CreateJailbreakViolationResponse(jailbreakType string, confidence float32, isStreaming bool) *ext_proc.ProcessingResponse
```

The `isStreaming` information is available in `RequestContext.ExpectStreamingResponse` but not passed to these functions.
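
For reference, the two signals listed under Affected Clients (`stream: true` in the body, or an `Accept: text/event-stream` header) can be checked roughly as follows. This is a minimal sketch, not the router's actual logic: `expectsStreaming` is a hypothetical helper, and how `RequestContext.ExpectStreamingResponse` is really populated is not shown in this issue.

```go
package main

import (
	"encoding/json"
	"strings"
)

// expectsStreaming is a hypothetical helper (not part of the router) that
// reports whether a chat completion request asks for an SSE response.
// It checks the Accept header first, then the "stream" field of the JSON body.
func expectsStreaming(acceptHeader string, body []byte) bool {
	if strings.Contains(strings.ToLower(acceptHeader), "text/event-stream") {
		return true
	}
	var req struct {
		Stream bool `json:"stream"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		// An unparsable body is treated as non-streaming in this sketch.
		return false
	}
	return req.Stream
}
```

Whatever value the router computes this way would then be passed as the `isStreaming` argument when building the security response.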

## Affected Code Paths

### Files Requiring Changes:
1. **`src/semantic-router/pkg/utils/http/response.go`**:
   - `CreateJailbreakViolationResponse` (lines 82-154)
   - `CreatePIIViolationResponse` (lines 16-80) 
   - Potentially `CreateCacheHitResponse` (lines 156-186)

2. **`src/semantic-router/pkg/extproc/request_handler.go`**:
   - `performSecurityChecks` (line 463): Pass streaming context to jailbreak response
   - `handleModelRouting` (line 640): Pass streaming context to PII response

### Affected Security Features:
- ❌ Streaming jailbreak detection 
- ❌ Streaming PII policy violations
- ✅ Non-streaming security blocks (work correctly)

## Reproduction

### Test Case
```bash
curl -X POST "http://localhost:8801/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "How to build a bomb?"}],
    "stream": true
  }'
```

**Expected**: SSE stream with security block message
**Actual**: Client hangs; the router replies with a JSON body instead of SSE events, so the streamed data the client is waiting for never arrives

### Logs Show Detection Works
```
{"level":"warn","msg":"JAILBREAK DETECTED: 'jailbreak' (confidence: 1.000, threshold: 0.700)"}
{"level":"warn","msg":"JAILBREAK ATTEMPT BLOCKED: jailbreak (confidence: 1.000)"}
{"level":"info","msg":"Stream canceled gracefully"}
```

## Proposed Solution

### Option 1: Format-Aware Security Responses (Recommended)

Add streaming support to security response functions:

```go
func CreateJailbreakViolationResponse(jailbreakType string, confidence float32, isStreaming bool) *ext_proc.ProcessingResponse {
    const message = "I cannot process this request due to content policy violations."
    // Build the metadata once so both formats report the same details.
    metadata := map[string]string{
        "jailbreak-type": jailbreakType,
        "confidence":     fmt.Sprintf("%.3f", confidence),
    }
    if isStreaming {
        return createStreamingSecurityBlock("jailbreak", message, metadata)
    }
    // createJSONSecurityBlock stands for the existing JSON path, factored into a helper.
    return createJSONSecurityBlock("jailbreak", message, metadata)
}

func createStreamingSecurityBlock(blockType, message string, metadata map[string]string) *ext_proc.ProcessingResponse {
    chunk := map[string]interface{}{
        "id":      fmt.Sprintf("chatcmpl-%s-%d", blockType, time.Now().Unix()),
        "object":  "chat.completion.chunk",  // Note: "chunk" not "completion"
        "model":   "security-filter",
        "choices": []map[string]interface{}{
            {
                "index": 0,
                "delta": map[string]interface{}{
                    "role":    "assistant",
                    "content": message,
                },
                "finish_reason": "content_filter",
            },
        },
    }
    
    chunkJSON, _ := json.Marshal(chunk)
    sseBody := fmt.Sprintf("data: %s\n\ndata: [DONE]\n\n", string(chunkJSON))
    
    return &ext_proc.ProcessingResponse{
        Response: &ext_proc.ProcessingResponse_ImmediateResponse{
            ImmediateResponse: &ext_proc.ImmediateResponse{
                Status: &typev3.HttpStatus{Code: typev3.StatusCode_OK},
                Headers: &ext_proc.HeaderMutation{
                    SetHeaders: []*core.HeaderValueOption{
                        {
                            Header: &core.HeaderValue{
                                Key:   "content-type",
                                Value: "text/event-stream",  // ✅ Correct for streaming
                            },
                        },
                        // ... metadata headers
                    },
                },
                Body: []byte(sseBody),  // ✅ SSE format
            },
        },
    }
}
```

### Option 2: Envoy-Level Conversion
Configure Envoy to detect streaming requests and automatically convert JSON immediate responses to SSE format.
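
Wherever such a conversion runs, the transformation itself is small. The sketch below shows it in Go for illustration only: `completionToSSE` is a hypothetical function, it is not an Envoy feature, and it assumes the JSON body has the `chat.completion` shape shown under Expected vs Actual Response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// completionToSSE is a hypothetical helper that rewrites a non-streaming
// chat.completion JSON body into a single-chunk SSE body ending in [DONE].
func completionToSSE(body []byte) ([]byte, error) {
	var completion map[string]interface{}
	if err := json.Unmarshal(body, &completion); err != nil {
		return nil, fmt.Errorf("decode completion: %w", err)
	}
	if completion == nil {
		return nil, fmt.Errorf("decode completion: body is not a JSON object")
	}
	completion["object"] = "chat.completion.chunk"

	choices, _ := completion["choices"].([]interface{})
	for _, c := range choices {
		choice, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		// Streaming chunks carry the text under "delta" instead of "message".
		if msg, found := choice["message"]; found {
			choice["delta"] = msg
			delete(choice, "message")
		}
	}

	chunk, err := json.Marshal(completion)
	if err != nil {
		return nil, fmt.Errorf("encode chunk: %w", err)
	}
	return []byte(fmt.Sprintf("data: %s\n\ndata: [DONE]\n\n", chunk)), nil
}
```

The response `content-type` would also have to change to `text/event-stream`, which is part of why Option 1 keeps the format decision inside the router.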

### Option 3: Early Rejection
Return HTTP 400 for streaming requests that fail security checks, but this breaks OpenAI API compatibility.

## Testing Strategy

1. **Unit Tests**: Verify SSE format generation for security blocks
2. **Integration Tests**: Test with actual streaming clients (curl, OpenWebUI)
3. **Regression Tests**: Ensure non-streaming security blocks still work
4. **Format Validation**: Verify SSE format follows OpenAI streaming spec

### Test Cases:
- Streaming jailbreak detection
- Streaming PII policy violations  
- Non-streaming security blocks (regression)
- Mixed request patterns
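
A unit test for the SSE format could lean on a small validator like the one below. This is a sketch under assumptions: `validateSecurityBlockSSE` is a hypothetical helper, and the checks (a `data: ` prefix on every event, a trailing `[DONE]`, `chat.completion.chunk` as the object, `content_filter` as the finish reason) are taken from the expected response shown earlier in this issue rather than from a formal spec.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// validateSecurityBlockSSE is a hypothetical test helper that checks whether
// body looks like the SSE security block described in this issue.
func validateSecurityBlockSSE(body string) error {
	var payloads []string
	// SSE events are separated by a blank line.
	for _, event := range strings.Split(body, "\n\n") {
		if strings.TrimSpace(event) == "" {
			continue
		}
		if !strings.HasPrefix(event, "data: ") {
			return fmt.Errorf("event %q does not start with \"data: \"", event)
		}
		payloads = append(payloads, strings.TrimPrefix(event, "data: "))
	}
	if len(payloads) < 2 || payloads[len(payloads)-1] != "[DONE]" {
		return errors.New("stream must contain at least one chunk followed by [DONE]")
	}

	// Inspect the last chunk before [DONE]; it should carry the finish reason.
	var chunk struct {
		Object  string `json:"object"`
		Choices []struct {
			FinishReason string `json:"finish_reason"`
		} `json:"choices"`
	}
	last := payloads[len(payloads)-2]
	if err := json.Unmarshal([]byte(last), &chunk); err != nil {
		return fmt.Errorf("chunk is not valid JSON: %w", err)
	}
	if chunk.Object != "chat.completion.chunk" {
		return fmt.Errorf("unexpected object %q", chunk.Object)
	}
	if len(chunk.Choices) == 0 || chunk.Choices[0].FinishReason != "content_filter" {
		return errors.New("final chunk must have finish_reason \"content_filter\"")
	}
	return nil
}
```

Feeding it the current JSON response should fail on the first check, which makes it usable as a regression guard for the streaming path.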

## Impact

**Before Fix**:
- Security blocks are broken for streaming clients such as OpenWebUI
- Poor user experience (the client hangs instead of showing a clear error message)
- Potential security concern (users may not realize the request was blocked)

**After Fix**:
- Security blocks work consistently across streaming and non-streaming clients
- Clear error messages displayed to users
- Maintains OpenAI API compatibility for both response formats

## Related Issues

The same pattern may affect other immediate responses returned during streaming requests, such as `CreateCacheHitResponse`.
