@ochafik ochafik commented Dec 24, 2025

Summary

This prototype PR adds MCP (Model Context Protocol) support to llama-server's web ui:

  • Tool calls for starters
  • Stdio transport only: the server manages subprocesses for the frontend, which connects to them through a WebSocket per conversation per server.
  • Tool calls & their results are displayed in the same block, w/ inputs & outputs collapsed

Note: not meant to be merged as is; this will need deduping w/ any existing work (cc @allozaur) and sending in reviewable chunks.

[Screenshots: five captures dated 2025-12-24]

Features

  • WebSocket server on HTTP port + 1 for real-time MCP communication
  • MCP bridge that spawns and manages MCP server subprocesses (stdio transport)
  • Frontend UI for MCP server management with tool exploration
  • Tool calling integration in chat completions with streaming support
  • Auto-reconnection with exponential backoff for resilience
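
The auto-reconnection item above can be sketched as a capped exponential backoff schedule. This is only an illustration in Python, not the web UI's actual code; the base delay, growth factor, and cap are assumed values.

```python
import itertools

def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield reconnect delays in seconds, growing by `factor` up to `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

# First seven delays a client would wait between reconnect attempts.
delays = list(itertools.islice(backoff_delays(), 7))
print(delays)
```

A real client would typically reset the schedule after a successful connection and may add random jitter so that several tabs do not reconnect in lockstep.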

New CLI Option

# Use default config location (~/.llama.cpp/mcp.json)
./llama-server -m model.gguf

# Or specify config path
./llama-server -m model.gguf --mcp-config /path/to/mcp.json

Configuration Example

{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@brave/brave-search-mcp-server", "--transport", "stdio"],
      "env": {
        "BRAVE_API_KEY": "... get your key at https://api.search.brave.com/app/keys ..."
      }
    },
    "python": {
      "command": "uvx",
      "args": ["mcp-run-python", "--deps", "numpy,pandas,pydantic,requests,httpx,sympy,aiohttp", "stdio"],
      "env": {}
    }
  }
}
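
As a rough illustration of how a config like the one above maps to subprocess invocations, here is a Python sketch. The helper name `load_mcp_servers` and the returned shape are hypothetical; the real parsing happens in the C++ server and may differ.

```python
import json

EXAMPLE = """
{
  "mcpServers": {
    "python": {
      "command": "uvx",
      "args": ["mcp-run-python", "stdio"],
      "env": {}
    }
  }
}
"""

def load_mcp_servers(text):
    """Return {name: {"argv": [...], "env": {...}}} from an mcp.json document."""
    config = json.loads(text)
    servers = {}
    for name, spec in config.get("mcpServers", {}).items():
        if "command" not in spec:
            raise ValueError(f"server {name!r} has no 'command'")
        servers[name] = {
            # argv is the command followed by its (optional) arguments.
            "argv": [spec["command"], *spec.get("args", [])],
            "env": dict(spec.get("env", {})),
        }
    return servers

servers = load_mcp_servers(EXAMPLE)
print(servers["python"]["argv"])
```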

Architecture

  • server-ws.cpp/h - WebSocket server implementation
  • server-mcp-bridge.cpp/h - Routes WebSocket connections to MCP subprocesses
  • server-mcp.h - MCP protocol type definitions
  • Uses sheredom/subprocess.h for cross-platform subprocess management
  • Frontend: MCP service, stores, and UI components

API Endpoints

  • GET /mcp/servers - List available MCP servers
  • WS /mcp?server=<name> - WebSocket connection (on HTTP port + 1)
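
Since the WebSocket endpoint lives on HTTP port + 1, a client has to derive its URL from the HTTP base URL. A minimal Python sketch, assuming a hypothetical helper `mcp_ws_url` and assuming that an https base would map to wss (the PR does not say):

```python
from urllib.parse import quote, urlsplit

def mcp_ws_url(http_base, server_name):
    """Build the WS URL for one MCP server from the HTTP base URL."""
    parts = urlsplit(http_base)
    secure = parts.scheme == "https"
    # Fall back to the scheme's default port when none is given.
    http_port = parts.port or (443 if secure else 80)
    scheme = "wss" if secure else "ws"
    name = quote(server_name, safe="")
    return f"{scheme}://{parts.hostname}:{http_port + 1}/mcp?server={name}"

print(mcp_ws_url("http://localhost:8080", "brave-search"))
```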

Test plan

  • Unit tests added (tools/server/tests/unit/test_mcp.py)
  • Manual testing with @modelcontextprotocol/server-filesystem
  • Test tool calling in chat UI
  • Test connect/disconnect in MCP picker
  • Verify WebSocket reconnection after server restart

TODOs before undrafting

  • De-AI slopify this
  • Try and use same port for WS (not possible: httplib doesn't support WS)
  • Support more tool result types (image, audio, resources, resource links)

Possible follow ups:

  • More MCP features: resources, logging, remote servers...
  • MCP Apps

@github-actions github-actions bot added examples python python script changes server labels Dec 24, 2025
@ochafik ochafik changed the title server: add MCP (Model Context Protocol) support webui: add MCP (Model Context Protocol) support Dec 24, 2025
ochafik and others added 13 commits December 24, 2025 00:38
Add JSON-RPC 2.0 type definitions and MCP server configuration
structures for the Model Context Protocol implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add mcp_process class for spawning and managing MCP server subprocesses
with bidirectional stdio communication. Handles process lifecycle,
environment variables for unbuffered output, and cross-platform support.

Add custom WebSocket server using raw sockets (no external library).
Implements RFC 6455 handshake, frame parsing, masking, and message
handling. Runs on HTTP port + 1 to avoid conflicts with httplib.
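
For reference, the core of the RFC 6455 handshake mentioned above is the `Sec-WebSocket-Accept` computation. The Python sketch below shows that step only; it is not the PR's C++ code, and the function name is illustrative.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key taken from RFC 6455, section 1.3.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```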

Add server_mcp_bridge class that routes WebSocket messages to MCP
server subprocesses. Manages per-connection state, configuration
loading with hot-reload, and JSON-RPC 2.0 message forwarding.

Integrate MCP bridge and WebSocket server into main server:
- Add --mcp-config CLI argument for configuration path
- Add /mcp/servers and /mcp/ws-port HTTP endpoints
- Register WebSocket event handlers for MCP
- Update server-http to properly join thread on stop

Add TypeScript types for MCP protocol (JSON-RPC 2.0) and WebSocket
service for communicating with MCP servers:
- MCP types: tool definitions, JSON-RPC request/response/notification
- McpService: WebSocket client with auto-reconnect and request timeout
- API types: tool call interfaces for chat completions
- Vite config: proxy WebSocket connections to MCP port
- ESLint: allow underscore-prefixed unused args (common convention)

Add reactive Svelte 5 stores for managing MCP state:
- mcpStore: Global MCP connection state, tool discovery, tool calling
- conversationMcpStore: Per-conversation MCP server enable/disable

Uses SvelteMap/SvelteSet for proper Svelte 5 reactivity.

Add components for displaying MCP tool calls and results:
- ToolCallBlock: Collapsible display of tool call with arguments/results
- ToolResultDisplay: Format and render tool execution results
- tool-results.ts: Utility functions for parsing tool result messages

Add UI components for managing MCP server connections:
- ChatFormActionMcp: Server selector dropdown in chat input
- McpPanel: Full panel for viewing connected servers and tools

Integrate MCP tool calling into the chat flow:
- chat.ts: Add tool parameter injection and MCP tool execution
- chat.svelte.ts: Track tool calls, results, and processing state
- ChatMessageAssistant: Display tool calls with status and duration
- ChatMessages: Build tool result map, filter tool result messages
- ChatScreen: Wire up tool result event handlers
- Add duration guard for negative timestamp differences

Add Python tests for MCP functionality:
- test_mcp_servers_endpoint: Test /mcp/servers HTTP endpoint
- test_mcp_ws_port_endpoint: Test /mcp/ws-port HTTP endpoint
- test_mcp_initialize_handshake: Test MCP JSON-RPC initialization
- test_mcp_tools_list: Test tools/list method
- test_mcp_tool_call: Test tools/call method

Add documentation and example configuration for MCP:
- README: Document MCP configuration, usage, and WebSocket port
- mcp_config.example.json: Example config with filesystem and brave-search
- Rebuild webui bundle with MCP support

- Force popover to open above (side="top") for consistent positioning
- Search input at bottom (flips based on popover position)
- Small solid dots for connection status (green/gray)
- Hover row to reveal connect/disconnect action icons
- Remove Connect All/Disconnect All footer buttons
- Fix double X button in search input (hide native WebKit clear)
- Add tooltips for status and actions

ochafik and others added 2 commits December 24, 2025 00:44
Don't show "Streaming..." status while arguments are being streamed.
Only show "Calling tool..." when actually waiting for MCP server response.

Reorder assistant message layout so tool call blocks appear
before the model badge and statistics.

ochafik and others added 2 commits December 24, 2025 01:15
- Remove unused parameter names from MCP HTTP lambda handlers
- Remove conditional websocket import (it's a required dependency)

Fixes unused-parameter warning and pyright type-check errors.

Adds optional "cwd" field to mcp.json server configurations to set the
working directory for stdio MCP servers.

- Add cwd field to mcp_server_config struct
- Unix: call chdir() before execvp() in child process
- Windows: pass lpCurrentDirectory to CreateProcessA()
- Update mcp_config.example.json with usage example

@ochafik ochafik changed the title webui: add MCP (Model Context Protocol) support webui: simple MCP (Model Context Protocol) support (stdio, tool calls) Dec 24, 2025
ngxson commented Dec 24, 2025

I think @allozaur is already working on the same feature (not sure if there is a tracking issue somewhere), so it's probably better to avoid duplicated work.

Collaborator

Just to note that we generally don't review / merge this much backend code at once because it's too risky in terms of security. There were recently many instances of data races in the code that even AI couldn't pick up.

It's recommended to ship backend features one by one, in separate PRs. Or even better, have the first version work without stdio, and add it at a later stage.

ochafik and others added 3 commits December 24, 2025 11:39
- Add @modelcontextprotocol/sdk dependency for proper type support
- Update tool-results.ts to use CallToolResult from SDK (content array)
- Handle all MCP content types: text, image, audio, resource, resource_link
- Update components to use parsedResult.content instead of .items
- Store images/audio in extra field for LLM consumption
- Display images as expandable thumbnails, audio with controls, resources as links

ochafik and others added 3 commits December 24, 2025 12:30
Backend changes:
- Add --webui-mcp CLI flag to conditionally enable WebSocket/MCP support
- Remove /mcp/ws-port endpoint (client now assumes port+1)
- Remove superfluous handle_initialize function
- Remove unused MCP types (mcp_tool, mcp_tool_call, mcp_methods)
- Conditional init of WebSocket server and MCP bridge

Frontend changes:
- Refactor McpService to use official MCP SDK (Client + WebSocketTransport)
- Remove /mcp/ws-port fetching, assume HTTP port + 1
- Hide MCP badge/dropdown when /mcp/servers returns 404 (MCP disabled)

The MCP SDK's WebSocketClientTransport uses the 'mcp' subprotocol when
connecting. Per the WebSocket spec, when a client requests a subprotocol,
the server must echo it back in the handshake response.

This fix:
- Parses the Sec-WebSocket-Protocol header from client handshake
- Echoes back the accepted protocol in the 101 response
- Uses string_starts_with for safer header matching

Without this, browsers reject the WebSocket connection because the
handshake doesn't properly negotiate the requested subprotocol.
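
The subprotocol negotiation described above boils down to picking one of the values the client offered and echoing it back. A small Python sketch of that selection step; the function name and the `supported` tuple are assumptions, not names from the PR.

```python
def select_subprotocol(header_value, supported=("mcp",)):
    """Pick the first client-offered subprotocol that the server supports.

    `header_value` is the raw Sec-WebSocket-Protocol request header, which
    may list several comma-separated values. Returns None when nothing
    matches, in which case the response should omit the header.
    """
    if not header_value:
        return None
    for offered in (part.strip() for part in header_value.split(",")):
        if offered in supported:
            return offered
    return None

print(select_subprotocol("mcp"))
```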

Collaborator

I think this is what subprocess.h is already doing

Collaborator Author

Switched to it, thanks

Collaborator

Maybe SSE or streamable HTTP would be a more suitable choice if we actually (and seriously) plan to implement this in the server; it can be supported with the existing httplib.

A custom WS server implementation sounds good on the surface, but ultimately I don't think we can trust AI to build a HTTP server in C++

Collaborator Author

I'll explore streamable HTTP as a first step, makes sense.

Note that streamable HTTP / SSE come w/ challenges:

  • if called directly from the FE, they need proper CORS headers, which many servers I've tried don't seem to have (e.g. Access-Control-Allow-Headers: content-type, mcp-protocol-version and Access-Control-Expose-Headers: Mcp-Session-Id)
  • if proxied by server, we might run into issues w/ auth, tbc (e.g. confused deputy problem)

> I don't think we can trust AI to build a HTTP server in C++

Yeah, I've only started deslopifying this PR. Even if we don't end up merging this it's an interesting journey :-)

Collaborator

> if called directly from the FE, they need proper CORS headers, which many servers I've tried don't seem to have

Currently, the WS component has only one job: proxying stdin/out/err through WS. If it's all done on localhost, why do we need CORS? (I assume it's because you're running on 2 separate ports?)

The SSE / streamable HTTP implementation will be very simple, as it allows exposing the stdio proxy as a simple API endpoint instead of a new port. This will be important for users who are already using llama.cpp via a reverse proxy and may not be able to open a new public port.

> if proxied by server, we might run into issues w/ auth, tbc (e.g. confused deputy problem)

What kind of auth are we talking about? I may have missed something, but llama-server only supports API key auth via the Authorization header atm. I don't think we support a multi-tenant config that could lead to the "confused deputy problem" you mentioned.

Collaborator

Also, just to clarify my intention about SSE / streamable HTTP: what I mean is that we only use them as the transport layer, meaning their only purpose is to deliver bytes from one place to another (in this case between stdio <--> frontend).

From this point of view, we don't need to be 100% compliant with the MCP specs.

Collaborator Author

> why do we need CORS

@ngxson That's for the remote MCP endpoints to allow / behave well w/ requests received from web clients, which they are not necessarily tested with (the TypeScript MCP SDK examples don't show how to set those headers, for instance).

> What kind of auth are we talking about?

MCP auth, to access remote servers that are authenticated w/ OAuth. Transparent proxying won't work well (if only because of OAuth resource validation).

Collaborator

@ngxson ngxson Dec 24, 2025

I think you still misunderstood my intention. I don't mean we will directly use the SSE spec defined by MCP, which requires auth and/or specific headers. We don't need auth either, because it's a local server, not a remote server as you mentioned.

We will use it only to replace what WS currently does. For your use case, we need WS for bidirectional communication between the browser and the backend. We can just replace it with a simple SSE implementation for the server --> front direction, and a simple POST request for the front --> back direction. This is essentially what Facebook Messenger was doing before WS became a standard.

I'll make a small PoC of how that works a bit later, but it won't be too complicated, as we already have all the necessary components in the existing C++ code base.
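
To make the SSE half of this proposal concrete, here is a rough Python sketch of how JSON-RPC messages could be framed as SSE `data:` events in the server --> front direction and decoded again on the other side. The helper names are hypothetical and this is not the PoC's code; it only illustrates the wire format.

```python
import json

def sse_event(payload):
    """Frame one JSON message as an SSE event (a single data line)."""
    # Compact JSON contains no raw newlines, so one data: line is enough.
    data = json.dumps(payload, separators=(",", ":"))
    return f"data: {data}\n\n"

def parse_sse(text):
    """Decode a stream of SSE events back into JSON messages."""
    messages = []
    for block in text.split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # Per the SSE format, one leading space after the colon is dropped.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            messages.append(json.loads("\n".join(data_lines)))
    return messages

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
stream = sse_event(request) + sse_event({"jsonrpc": "2.0", "id": 1, "result": {"tools": []}})
print(parse_sse(stream))
```

The front --> back direction would then be an ordinary POST carrying one JSON message per request.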

Collaborator

@ngxson ngxson Dec 25, 2025

I implemented my idea in ochafik#5

TL;DR:

  • Completely removed websocket
  • Replaced with SSE + HTTP POST request as mentioned earlier
  • Transport served under /mcp endpoint (GET + POST)
  • Use a custom transport class on webui to bypass the auth logic - we are running on the same host with the webui, not an external server, there is no need for CORS or auth
  • The code can be reduced even further

Benefits are:

  • Users already serving llama.cpp via a reverse proxy don't have to change anything - it just works
  • SSL is automatically handled by httplib
  • C++ code is much simpler than websocket

Note: In case we need to use an external host with SSE / streamable HTTP in the future, we just need to extend the reverse proxy component in server-models.cpp

ochafik and others added 20 commits December 24, 2025 13:10
Security hardening for server-ws.cpp:
- Add payload size limit (10MB) to prevent DoS via huge allocations
- Add message buffer limit (100MB) for fragmented messages
- Add receive buffer limit (16MB)
- Add connection limit (1000) to prevent thread exhaustion
- Add socket timeout (30s) to prevent slow-loris attacks
- Validate RSV bits per RFC 6455
- Enforce client frame masking per RFC 6455
- Limit PONG response size to 125 bytes per spec
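
Several of the checks listed above (RSV bits, client masking, payload size) can be applied while parsing the frame header. The Python sketch below illustrates this under assumed names; only the 10MB limit comes from the commit message, and the real implementation is C++.

```python
import struct

MAX_PAYLOAD = 10 * 1024 * 1024  # 10MB limit from the commit message

class FrameError(ValueError):
    pass

def parse_client_frame_header(buf):
    """Return (fin, opcode, payload_len, header_len), or None if more bytes are needed."""
    if len(buf) < 2:
        return None
    b0, b1 = buf[0], buf[1]
    if b0 & 0x70:
        raise FrameError("RSV bits must be zero when no extension is negotiated")
    if not b1 & 0x80:
        raise FrameError("client-to-server frames must be masked")
    fin, opcode = bool(b0 & 0x80), b0 & 0x0F
    length, offset = b1 & 0x7F, 2
    if length == 126:  # 16-bit extended length
        if len(buf) < 4:
            return None
        (length,) = struct.unpack_from("!H", buf, 2)
        offset = 4
    elif length == 127:  # 64-bit extended length
        if len(buf) < 10:
            return None
        (length,) = struct.unpack_from("!Q", buf, 2)
        offset = 10
    if opcode >= 0x8 and (length > 125 or not fin):
        raise FrameError("control frames must be at most 125 bytes and unfragmented")
    if length > MAX_PAYLOAD:
        raise FrameError("payload exceeds size limit")
    return fin, opcode, length, offset + 4  # +4 for the masking key

# Masked single-frame text message "Hello" from RFC 6455, section 5.7.
hello = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])
print(parse_client_frame_header(hello))
```

Rejecting the frame from the header alone means the oversized payload is never allocated.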

Code simplification:
- Remove dead code in server-mcp-bridge.cpp (if/else did same thing)
- Remove unused JSON-RPC types from server-mcp.h (~100 lines)
- Use SDK Tool type directly instead of custom McpTool
- Remove convertToolToMcpTool conversion function
- Clean up unused FE types in mcp.ts (~90 lines)

Net result: -172 lines while adding comprehensive security protections

WebSocket port is always HTTP port + 1, no need for endpoint.

- Remove /mcp/ws-port from README, tests, vite proxy
- Simplify mcpServiceFactory to sync (no fetch needed)
- Add PR.md with updated description

- Default WS_MAX_CONNECTIONS reduced from 1000 to 10
- Configurable via LLAMA_WS_MAX_CONNECTIONS env var
- Makes connection limit tests lighter

Analysis: subprocess.h lacks cwd support, has blocking async reads,
and no graceful shutdown (SIGTERM→wait→SIGKILL pattern).
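
For readers unfamiliar with the SIGTERM→wait→SIGKILL pattern named above, here is what it looks like using Python's standard `subprocess` module. This only illustrates the pattern; it says nothing about how subprocess.h or the PR's C++ code behaves, and the grace period is an assumed value.

```python
import subprocess
import sys

def stop_gracefully(proc, grace_seconds=2.0):
    """Ask the child to exit, then force-kill it if it does not comply in time."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()
    return proc.returncode

# Stand-in for an MCP server: a child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_gracefully(child)
print("child exited:", exit_code is not None)
```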

- Use vendor/sheredom/subprocess.h instead of custom process management
- Remove server-mproc.cpp (~600 lines) and server-mproc.h (~80 lines)
- Drop cwd support from mcp_server_config (subprocess.h limitation)
- Remove PYTHONUNBUFFERED auto-injection (user responsibility now)

Per @ngxson feedback to reuse existing subprocess.h.

- Fix shutdown deadlock: terminate process before joining read thread
- Add 10MB line buffer limit to prevent OOM from malicious MCP servers

- Fix MCP connection leak: null transport/client on close so
  isConnected() returns false, preventing duplicate connections
- Remove 106 lines of reimplemented SHA1/base64, use existing:
  - sha1/sha1.h from examples/gguf-hash/deps
  - base64.hpp from common/

- Fix mutex deadlock in MCP bridge that blocked new connections:
  - on_connection_message: release mutex before forward_to_mcp()
    (process startup can take 3+ seconds)
  - on_connection_closed: release mutex before destroying state
    (subprocess destructor joins thread)

- Fix "Processing..." appearing for all messages instead of just
  the one being generated (use !message.timings check)

- Reduce WebSocket socket timeout from 30s to 5s for faster
  disconnect detection

- Add MCP WebSocket test script (mcp-ws-test.mjs)
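
The deadlock fix above follows a common rule: hold the mutex only while touching shared state, and do slow work (process startup, thread joins) after releasing it. A minimal Python analogue, with hypothetical class and method names that merely echo the commit message:

```python
import threading

class Bridge:
    """Toy connection registry showing where the lock is and is not held."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = {}

    def add_connection(self, conn_id, state):
        with self._lock:
            self._connections[conn_id] = state

    def on_connection_closed(self, conn_id):
        # Detach the state under the lock...
        with self._lock:
            state = self._connections.pop(conn_id, None)
        # ...but tear it down outside the lock, since close() may block
        # (for example while joining a reader thread).
        if state is not None:
            state.close()

    def connection_count(self):
        with self._lock:
            return len(self._connections)
```

With this shape, a slow teardown of one connection no longer blocks new connections that need the same lock.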

- Add SKIP_MODEL_PRELOAD env var to conftest.py for tests using local models
- Add webui_mcp option to ServerProcess class in utils.py
- Fix test_mcp.py to use local model instead of HuggingFace download
- Fix Python echo script to ignore JSON-RPC notifications (no response for notifications)
- Fix test_mcp_servers_with_config assertion (API returns list of objects not strings)
- Fix test_websocket_connection_without_server_param to handle empty response
- Fix test_websocket_connection_invalid_server to send message first
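
The notification fix above reflects a JSON-RPC 2.0 rule: a message without an `id` member is a notification and must not be answered. A simplified Python sketch of an echo handler that respects this (not the actual test fixture; the echo behaviour is an assumption):

```python
import json

def handle_line(line):
    """Return a JSON-RPC response line, or None when no response is due."""
    message = json.loads(line)
    if "id" not in message:
        return None  # notification: never send a response
    # Echo the params back as the result, keyed by the request id.
    response = {"jsonrpc": "2.0", "id": message["id"], "result": message.get("params", {})}
    return json.dumps(response)

print(handle_line('{"jsonrpc": "2.0", "method": "notifications/initialized"}'))
print(handle_line('{"jsonrpc": "2.0", "id": 7, "method": "echo", "params": {"text": "hi"}}'))
```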

- Add mcp_config option to ServerProcess for --mcp-config flag
- Refactor tests to use server.mcp_config instead of env vars
- Add get_env_vars tool to echo server for security testing
- Add TestMcpEnvVarFiltering test class to verify:
  - Secret env vars (API keys, passwords) are NOT passed to subprocess
  - Allowed env vars (HOME, PATH, etc.) ARE inherited
  - Config-specified env vars ARE passed through
- All 8 MCP tests now pass
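
The behaviour these tests verify can be pictured as an allowlist applied to the parent environment, followed by the config-specified variables. In the Python sketch below the allowlist contents and the function name are assumptions for illustration; the commit message only names HOME and PATH.

```python
# Hypothetical allowlist; the PR's actual list is not shown in this conversation.
ALLOWED_ENV_VARS = ("HOME", "PATH", "USER", "LANG", "TMPDIR")

def build_child_env(parent_env, config_env, allowed=ALLOWED_ENV_VARS):
    """Build the environment for an MCP subprocess."""
    # Inherit only allowlisted variables from the server's own environment...
    env = {name: value for name, value in parent_env.items() if name in allowed}
    # ...then add whatever mcp.json specifies for this server.
    env.update(config_env)
    return env

parent = {"HOME": "/home/me", "PATH": "/usr/bin", "OPENAI_API_KEY": "secret"}
child_env = build_child_env(parent, {"BRAVE_API_KEY": "from-config"})
print(sorted(child_env))
```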

Move the 113-line inline Python script to a proper file at
tools/server/tests/fixtures/mcp_echo_server.py

Benefits:
- Syntax highlighting and IDE support
- Can be linted/type-checked independently
- Serves as documentation for MCP server implementation
- Cleaner test code with simpler fixtures
- More visible in git history

Add detailed logging at each level of MCP tool call chain:
- chat.svelte.ts: executeToolCalls logs tool calls being processed
- mcp.svelte.ts: callTool logs before/after service calls
- mcp.ts: McpService.callTool logs timing and responses

This helps debug issues where tool calls appear stuck.

- Add mcp>=1.0.0 and pytest-asyncio to test requirements
- Update TestMcpJsonRpcProtocol to use MCP SDK's ClientSession
- Update TestMcpEnvVarFiltering to use MCP SDK
- Keep raw websocket tests for edge cases (TestMcpJsonRpcProtocolRaw)
- Configure pytest-asyncio in pytest.ini

The 5-second socket timeout was causing WebSocket connections to close
during long LLM prompt processing or slow MCP tool calls. Increased to
5 minutes to accommodate:
- Long-running MCP tools (web searches, API calls)
- LLM prompt processing that can take 10+ seconds

- Remove auto-connect logging (check isConnected before connecting)
- Simplify connection logs (only log success and errors)
- Clean up tool call logging (only log completion and failures)
- Remove redundant debug logs throughout MCP code

ochafik commented Dec 24, 2025

@allozaur feel free to reuse any parts of this PR. I've updated / fixed the most obvious issues, switched to the TS MCP SDK (+ added tests that use the Python MCP SDK), using existing helpers where possible (sha1, base64, subprocess.h), etc.

And Merry Christmas 🎄!

@allozaur
Collaborator

> @allozaur feel free to reuse any parts of this PR. I've updated / fixed the most obvious issues, switched to the TS MCP SDK (+ added tests that use the Python MCP SDK), using existing helpers where possible (sha1, base64, subprocess.h), etc.
>
> And Merry Christmas 🎄!

Thanks, merry Xmas!

Also one note from me just for now is that I don't think we should introduce MCP implementation on the server-side in the first release.

Client-side implementation purely in WebUI is the least invasive and KISS approach that we can have from the beginning. After that we can of course iterate.

- Add shared auth validation helpers in server-common:
  - validate_auth_header() with constant-time comparison
  - extract_api_key_from_auth_header() for "Bearer " prefix handling
  - Uses XOR-based constant-time compare to prevent timing attacks
- Update HTTP server to use shared validation helper
- Add WebSocket authentication during handshake:
  - Validates Authorization header against configured API keys
  - Returns 401 with JSON error response on auth failure
  - Supports same "Bearer <token>" format as HTTP endpoints
- Add WebSocket authentication tests in test_security.py
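
An XOR-based constant-time comparison like the one described above can be sketched as follows. This Python version is illustrative only: the function names loosely mirror the commit message, and rejecting headers without the "Bearer " prefix is an assumption. In Python code, the standard library's `hmac.compare_digest` would normally be preferred.

```python
def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Compare two byte strings without exiting early at the first mismatch."""
    if len(a) != len(b):
        return False  # only the length is revealed
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences instead of returning early
    return diff == 0

def extract_api_key(auth_header: str):
    """Return the token after the "Bearer " prefix, or None if it is absent."""
    prefix = "Bearer "
    return auth_header[len(prefix):] if auth_header.startswith(prefix) else None

def validate_auth_header(auth_header, api_keys) -> bool:
    token = extract_api_key(auth_header or "")
    if token is None:
        return False
    ok = False
    for key in api_keys:  # check every key rather than stopping at the first hit
        ok |= constant_time_equal(token.encode(), key.encode())
    return ok

print(validate_auth_header("Bearer sk-test", ["sk-other", "sk-test"]))
```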

ochafik commented Dec 24, 2025

> Also one note from me just for now is that I don't think we should introduce MCP implementation on the server-side in the first release.

@allozaur Agree, although the only server-side bit in this PR is the websocket -> stdio passthrough transport, which is at the (core or) edge of the MCP standard w/ no knowledge of what it's transporting (just lines of text both ways). All the protocol level handling (knowledge of MCP messages) is still only on FE side.

If and when we want server-side MCP support, I think we should implement OpenAI's Messages API, or maybe their Realtime API (allowing bits of agent-loops to happen on the server, which would have larger consequences in terms of threading model, etc)

Labels

examples python python script changes server

3 participants