
[Feature] Make SSE Keepalive Events Configurable #690

@crivetimihai



Summary

Currently, SSE keepalive events are hardcoded to be sent every 30 seconds and cannot be disabled. Both the interval and whether keepalives are sent at all should be configurable via environment variables, so users can adapt the behavior to their infrastructure (for example, proxy or load balancer idle timeouts).

Current Behavior

  • Keepalive events are sent immediately upon SSE connection
  • Additional keepalive events are sent every 30 seconds during idle periods
  • This interval is hardcoded in:
    • mcpgateway/transports/sse_transport.py (line 362): timeout=30.0
    • mcpgateway/translate.py (line 81): KEEP_ALIVE_INTERVAL = 30
  • Keepalive events cannot be disabled

Proposed Changes

1. Add Configuration Options to .env.example

#####################################
# Transport Configuration
#####################################

# ... existing settings ...

# SSE client retry timeout (milliseconds)
SSE_RETRY_TIMEOUT=5000

# Enable SSE keepalive events (true/false)
# Set to false to disable keepalive events completely
SSE_KEEPALIVE_ENABLED=true

# SSE keepalive interval (seconds)
# How often to send keepalive events during idle periods
# Common values: 30 (default), 60, 120
SSE_KEEPALIVE_INTERVAL=30

2. Update mcpgateway/config.py

Add the new settings to the Settings class:

class Settings(BaseSettings):
    # ... existing settings ...
    
    # Transport
    transport_type: str = "all"  # http, ws, sse, all
    websocket_ping_interval: int = 30  # seconds
    sse_retry_timeout: int = 5000  # milliseconds
    sse_keepalive_enabled: bool = True  # Enable SSE keepalive events
    sse_keepalive_interval: int = 30  # seconds between keepalive events
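pydantic's `BaseSettings` converts the environment strings into these typed fields automatically. The stdlib sketch below spells out roughly equivalent semantics so the two variables' behavior is explicit, and adds a positive-interval check that the proposal does not mention. `KeepaliveConfig` and `load_keepalive_config` are hypothetical names, and the accepted boolean spellings are a subset of what pydantic allows. In the real `Settings` class, the same constraint could be expressed as `sse_keepalive_interval: int = Field(30, gt=0)`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical sketch: a subset of the boolean spellings pydantic accepts
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class KeepaliveConfig:
    enabled: bool = True  # SSE_KEEPALIVE_ENABLED
    interval: int = 30  # SSE_KEEPALIVE_INTERVAL, in seconds


def load_keepalive_config(env: Optional[Mapping[str, str]] = None) -> KeepaliveConfig:
    """Read the two keepalive variables, falling back to the current defaults."""
    env = os.environ if env is None else env

    raw_enabled = env.get("SSE_KEEPALIVE_ENABLED", "true").strip().lower()
    if raw_enabled in _TRUE:
        enabled = True
    elif raw_enabled in _FALSE:
        enabled = False
    else:
        raise ValueError(f"SSE_KEEPALIVE_ENABLED must be true/false, got {raw_enabled!r}")

    interval = int(env.get("SSE_KEEPALIVE_INTERVAL", "30"))
    if interval <= 0:
        # A zero or negative interval would make the wait loop spin or fail
        raise ValueError("SSE_KEEPALIVE_INTERVAL must be a positive number of seconds")

    return KeepaliveConfig(enabled=enabled, interval=interval)
```

With no variables set, the result matches today's behavior: keepalives enabled every 30 seconds.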

3. Update mcpgateway/transports/sse_transport.py

Modify the create_sse_response method to use the configuration:

async def create_sse_response(self, _request: Request) -> EventSourceResponse:
    # ... existing code ...

    async def event_generator():
        # Send the endpoint event first
        yield {
            "event": "endpoint",
            "data": endpoint_url,
            "retry": settings.sse_retry_timeout,
        }

        # Send keepalive immediately if enabled
        if settings.sse_keepalive_enabled:
            yield {
                "event": "keepalive",
                "data": "{}",
                "retry": settings.sse_retry_timeout,
            }

        try:
            while not self._client_gone.is_set():
                try:
                    # Always wait with a finite timeout, even when keepalives are
                    # disabled: with timeout=None this call would block until the
                    # next message and the loop would never re-check _client_gone.
                    message = await asyncio.wait_for(
                        self._message_queue.get(),
                        timeout=settings.sse_keepalive_interval,
                    )
                    # ... send message ...
                except asyncio.TimeoutError:
                    if settings.sse_keepalive_enabled:
                        # Idle for a full interval: send a keepalive
                        yield {
                            "event": "keepalive",
                            "data": "{}",
                            "retry": settings.sse_retry_timeout,
                        }
                    # Otherwise loop back silently and re-check _client_gone
        finally:
            # ... existing exception handling and cleanup ...
            pass
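The step-3 loop can be exercised without the gateway. The sketch below is a standalone async generator that takes the enabled flag and interval as arguments; `sse_events`, `run_demo`, and their parameters are hypothetical names, not mcpgateway APIs, and regular messages are simplified to `"message"` events. It always waits with a finite timeout, even when keepalives are disabled, so the loop wakes up periodically to notice `client_gone` instead of blocking until the next message arrives.

```python
import asyncio
from typing import AsyncIterator


async def sse_events(
    queue: "asyncio.Queue[str]",
    client_gone: asyncio.Event,
    keepalive_enabled: bool,
    keepalive_interval: float,
    retry_ms: int = 5000,
) -> AsyncIterator[dict]:
    """Hypothetical sketch: yield SSE event dicts, sending keepalives only if enabled."""
    if keepalive_enabled:
        # Immediate keepalive on connect, as in the proposal
        yield {"event": "keepalive", "data": "{}", "retry": retry_ms}
    while not client_gone.is_set():
        try:
            # Finite timeout in both modes so client_gone is re-checked regularly
            message = await asyncio.wait_for(queue.get(), timeout=keepalive_interval)
        except asyncio.TimeoutError:
            if keepalive_enabled:
                yield {"event": "keepalive", "data": "{}", "retry": retry_ms}
        else:
            yield {"event": "message", "data": message, "retry": retry_ms}


async def run_demo(keepalive_enabled: bool) -> list:
    """Idle for ~0.12 s, deliver one message, then mark the client as gone."""
    queue: asyncio.Queue = asyncio.Queue()
    client_gone = asyncio.Event()

    async def producer() -> None:
        await asyncio.sleep(0.12)
        queue.put_nowait("hello")
        client_gone.set()

    task = asyncio.create_task(producer())
    names = [
        event["event"]
        async for event in sse_events(queue, client_gone, keepalive_enabled, keepalive_interval=0.05)
    ]
    await task
    return names
```

With keepalives enabled, `run_demo(True)` yields the initial keepalive plus one per idle interval before the message; with them disabled, `run_demo(False)` yields only the message, yet the loop still exits once `client_gone` is set.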

4. Update mcpgateway/translate.py

Use the configuration setting as the default:

from mcpgateway.config import settings

# Fall back to the previous hardcoded default if the setting is missing
KEEP_ALIVE_INTERVAL = getattr(settings, "sse_keepalive_interval", 30)
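As written, step 4 only reads the interval, so translate.py would keep sending keepalives even with `SSE_KEEPALIVE_ENABLED=false`. One way to honor both settings is a small helper that returns `None` when keepalives are disabled; `resolve_keepalive_interval` and `DEFAULT_KEEP_ALIVE_INTERVAL` are hypothetical names, and the caller in translate.py would still need to skip keepalive emission when the result is `None`.

```python
from types import SimpleNamespace
from typing import Any, Optional

DEFAULT_KEEP_ALIVE_INTERVAL = 30  # previous hardcoded value, in seconds


def resolve_keepalive_interval(settings_obj: Any) -> Optional[int]:
    """Return the keepalive interval in seconds, or None if keepalives are disabled.

    Missing attributes fall back to the current defaults (enabled, 30 s),
    so an older Settings object keeps today's behavior.
    """
    if not getattr(settings_obj, "sse_keepalive_enabled", True):
        return None
    return getattr(settings_obj, "sse_keepalive_interval", DEFAULT_KEEP_ALIVE_INTERVAL)


if __name__ == "__main__":
    # A bare namespace stands in for a Settings object without the new fields
    print(resolve_keepalive_interval(SimpleNamespace()))  # 30
    print(resolve_keepalive_interval(SimpleNamespace(sse_keepalive_enabled=False)))  # None
```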

Benefits

  1. Flexibility: Users can tune the keepalive interval to their infrastructure (proxies, load balancers)
  2. Compatibility: Users experiencing issues with clients that don't handle keepalive events can disable them
  3. Performance: Users with stable direct connections can increase the interval or disable keepalives to reduce overhead
  4. Backward compatibility: The default values preserve the current behavior

Use Cases

  • Cloud deployments: Keep the interval below the load balancer's idle timeout (defaults: AWS ALB 60 s, Azure Load Balancer 4 min)
  • Direct connections: Disable keepalives when not using proxies/load balancers
  • Client compatibility: Disable if using clients that don't handle unknown SSE event types
  • Long-running operations: Increase interval for tools that take >30s to execute
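For the cloud-deployment case, the rule of thumb is that a keepalive must reach the proxy strictly before its idle timeout expires. A minimal check, assuming a hypothetical `keepalive_fits` helper and an illustrative 5-second headroom for network delay:

```python
# Default idle timeouts cited in the use cases above, in seconds
AWS_ALB_DEFAULT_IDLE_TIMEOUT = 60
AZURE_LB_DEFAULT_IDLE_TIMEOUT = 4 * 60


def keepalive_fits(interval_s: float, idle_timeout_s: float, headroom_s: float = 5) -> bool:
    """True if keepalives arrive at least headroom_s before the idle timeout.

    headroom_s is an illustrative safety margin, not a provider requirement.
    """
    return 0 < interval_s <= idle_timeout_s - headroom_s
```

The default 30-second interval fits both providers' defaults, while setting `SSE_KEEPALIVE_INTERVAL=60` behind an ALB with a 60-second idle timeout would not.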

Testing

  1. Verify default behavior (keepalives enabled, 30s interval)
  2. Test with keepalives disabled (SSE_KEEPALIVE_ENABLED=false)
  3. Test with custom interval (SSE_KEEPALIVE_INTERVAL=60)
  4. Ensure changes work in both main gateway and translate module

Related Issues

Implementation Notes

  • Consider logging a warning when keepalives are disabled, since idle connections may then be closed by proxies or load balancers
  • Document the security/reliability implications of disabling keepalives
  • Update tests to cover the new configuration options
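The startup warning could be as small as the sketch below, using the standard `logging` module. The function name, logger name, and message wording are all hypothetical and only illustrate the idea.

```python
import logging

# Hypothetical logger name; the real module would use its own logger
logger = logging.getLogger("mcpgateway.keepalive")


def log_keepalive_settings(enabled: bool, interval: int, log: logging.Logger = logger) -> None:
    """Emit a warning when keepalives are off, otherwise note the interval at INFO."""
    if not enabled:
        log.warning(
            "SSE keepalive events are disabled (SSE_KEEPALIVE_ENABLED=false); "
            "idle connections may be closed by proxies or load balancers"
        )
    else:
        log.info("SSE keepalive events enabled every %s seconds", interval)
```

Calling this once at startup keeps the warning out of the per-connection path.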

Labels

enhancement (New feature or request)
