webui: simple MCP (Model Context Protocol) support (stdio, tool calls) #18334
base: master
Add JSON-RPC 2.0 type definitions and MCP server configuration structures for the Model Context Protocol implementation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add mcp_process class for spawning and managing MCP server subprocesses with bidirectional stdio communication. Handles process lifecycle, environment variables for unbuffered output, and cross-platform support.
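The stdio side of this design comes down to spawning a child with piped stdin/stdout and exchanging one JSON-RPC message per line. Below is a minimal Python sketch of that pattern. It assumes newline-delimited JSON, which is what the MCP stdio transport uses. `spawn_stdio_server`, `send_request` and the toy `ECHO_CHILD` script are hypothetical names for illustration; they are not the PR's C++ `mcp_process` API.

```python
import json
import os
import subprocess
import sys

# Toy child process for the demo (hypothetical): replies to each request with its method name.
ECHO_CHILD = (
    "import json, sys\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': req['method']}), flush=True)\n"
)


def spawn_stdio_server(cmd, extra_env=None):
    """Spawn cmd with piped stdin/stdout (hypothetical helper, not the PR's API)."""
    env = dict(os.environ)
    # Ask Python-based servers not to block-buffer stdout; other runtimes ignore it.
    env["PYTHONUNBUFFERED"] = "1"
    env.update(extra_env or {})
    return subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipes
    )


def send_request(proc, req_id, method, params=None):
    """Write one newline-delimited JSON-RPC request and read one response line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params or {}}
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


if __name__ == "__main__":
    proc = spawn_stdio_server([sys.executable, "-c", ECHO_CHILD])
    print(send_request(proc, 1, "tools/list"))
    proc.stdin.close()  # EOF lets the child's loop finish
    proc.wait()
```

Setting `PYTHONUNBUFFERED` only affects Python children. A later commit in this PR drops the auto-injection and leaves buffering to the user.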
Add custom WebSocket server using raw sockets (no external library). Implements RFC 6455 handshake, frame parsing, masking, and message handling. Runs on HTTP port + 1 to avoid conflicts with httplib.
Add server_mcp_bridge class that routes WebSocket messages to MCP server subprocesses. Manages per-connection state, configuration loading with hot-reload, and JSON-RPC 2.0 message forwarding.
Integrate MCP bridge and WebSocket server into main server:
- Add --mcp-config CLI argument for configuration path
- Add /mcp/servers and /mcp/ws-port HTTP endpoints
- Register WebSocket event handlers for MCP
- Update server-http to properly join thread on stop
Add TypeScript types for MCP protocol (JSON-RPC 2.0) and WebSocket service for communicating with MCP servers:
- MCP types: tool definitions, JSON-RPC request/response/notification
- McpService: WebSocket client with auto-reconnect and request timeout
- API types: tool call interfaces for chat completions
- Vite config: proxy WebSocket connections to MCP port
- ESLint: allow underscore-prefixed unused args (common convention)
Add reactive Svelte 5 stores for managing MCP state:
- mcpStore: Global MCP connection state, tool discovery, tool calling
- conversationMcpStore: Per-conversation MCP server enable/disable

Uses SvelteMap/SvelteSet for proper Svelte 5 reactivity.
Add components for displaying MCP tool calls and results:
- ToolCallBlock: Collapsible display of tool call with arguments/results
- ToolResultDisplay: Format and render tool execution results
- tool-results.ts: Utility functions for parsing tool result messages
Add UI components for managing MCP server connections:
- ChatFormActionMcp: Server selector dropdown in chat input
- McpPanel: Full panel for viewing connected servers and tools
Integrate MCP tool calling into the chat flow:
- chat.ts: Add tool parameter injection and MCP tool execution
- chat.svelte.ts: Track tool calls, results, and processing state
- ChatMessageAssistant: Display tool calls with status and duration
- ChatMessages: Build tool result map, filter tool result messages
- ChatScreen: Wire up tool result event handlers
- Add duration guard for negative timestamp differences
Add Python tests for MCP functionality:
- test_mcp_servers_endpoint: Test /mcp/servers HTTP endpoint
- test_mcp_ws_port_endpoint: Test /mcp/ws-port HTTP endpoint
- test_mcp_initialize_handshake: Test MCP JSON-RPC initialization
- test_mcp_tools_list: Test tools/list method
- test_mcp_tool_call: Test tools/call method
Add documentation and example configuration for MCP:
- README: Document MCP configuration, usage, and WebSocket port
- mcp_config.example.json: Example config with filesystem and brave-search
- Rebuild webui bundle with MCP support
- Force popover to open above (side="top") for consistent positioning
- Search input at bottom (flips based on popover position)
- Small solid dots for connection status (green/gray)
- Hover row to reveal connect/disconnect action icons
- Remove Connect All/Disconnect All footer buttons
- Fix double X button in search input (hide native WebKit clear)
- Add tooltips for status and actions
Don't show "Streaming..." status while arguments are being streamed. Only show "Calling tool..." when actually waiting for MCP server response.
Reorder assistant message layout so tool call blocks appear before the model badge and statistics.
- Remove unused parameter names from MCP HTTP lambda handlers
- Remove conditional websocket import (it's a required dependency)

Fixes unused-parameter warning and pyright type-check errors.
Adds optional "cwd" field to mcp.json server configurations to set the working directory for stdio MCP servers.
- Add cwd field to mcp_server_config struct
- Unix: call chdir() before execvp() in child process
- Windows: pass lpCurrentDirectory to CreateProcessA()
- Update mcp_config.example.json with usage example
I think @allozaur is already working on the same feature (not sure if there is a tracking issue somewhere); it's probably better to avoid duplicated work.
Just to note: we generally don't review or merge this much backend code at once, because it's too risky in terms of security. Recently there were many instances of data races in the code that even AI couldn't pick up.
It's recommended to ship backend features one by one, in separate PRs. Or, even better, make the first version work without stdio and add it at a later stage.
- Add @modelcontextprotocol/sdk dependency for proper type support
- Update tool-results.ts to use CallToolResult from SDK (content array)
- Handle all MCP content types: text, image, audio, resource, resource_link
- Update components to use parsedResult.content instead of .items
- Store images/audio in extra field for LLM consumption
- Display images as expandable thumbnails, audio with controls, resources as links
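Handling every CallToolResult content type comes down to dispatching on each item's `type` field. The Python sketch below assumes the item shapes from the MCP specification:

- `text` items carry `text`.
- `image` and `audio` items carry base64 `data` and a `mimeType`.
- `resource` items wrap a `resource` object with a `uri` and either `text` or `blob`.
- `resource_link` items carry a `uri` and a `name`.

`summarize_tool_result` is a hypothetical helper, not the webui's tool-results.ts.

```python
def summarize_tool_result(result: dict) -> list[str]:
    """Turn a CallToolResult-shaped dict into display strings (hypothetical helper)."""
    out = []
    for item in result.get("content", []):
        kind = item.get("type")
        if kind == "text":
            out.append(item["text"])
        elif kind in ("image", "audio"):
            # Binary payloads arrive base64-encoded in "data".
            out.append(f"[{kind} {item.get('mimeType', 'unknown')}, {len(item['data'])} base64 chars]")
        elif kind == "resource":
            res = item["resource"]
            # Embedded resources hold either "text" or a base64 "blob".
            body = res.get("text", "<binary blob>")
            out.append(f"[resource {res['uri']}] {body}")
        elif kind == "resource_link":
            out.append(f"[link {item.get('name', item['uri'])}]({item['uri']})")
        else:
            out.append(f"[unsupported content type: {kind}]")
    if result.get("isError"):
        out.insert(0, "[tool reported an error]")
    return out
```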
Backend changes:
- Add --webui-mcp CLI flag to conditionally enable WebSocket/MCP support
- Remove /mcp/ws-port endpoint (client now assumes port+1)
- Remove superfluous handle_initialize function
- Remove unused MCP types (mcp_tool, mcp_tool_call, mcp_methods)
- Conditional init of WebSocket server and MCP bridge

Frontend changes:
- Refactor McpService to use official MCP SDK (Client + WebSocketTransport)
- Remove /mcp/ws-port fetching, assume HTTP port + 1
- Hide MCP badge/dropdown when /mcp/servers returns 404 (MCP disabled)
The MCP SDK's WebSocketClientTransport uses the 'mcp' subprotocol when connecting. Per the WebSocket spec, when a client requests a subprotocol, the server must echo it back in the handshake response.

This fix:
- Parses the Sec-WebSocket-Protocol header from client handshake
- Echoes back the accepted protocol in the 101 response
- Uses string_starts_with for safer header matching

Without this, browsers reject the WebSocket connection because the handshake doesn't properly negotiate the requested subprotocol.
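On the server side, subprotocol negotiation means picking one of the values the client offered in `Sec-WebSocket-Protocol` and echoing only that value in the 101 response. If none of the offered values is acceptable, the server sends no such header. The Python sketch below assumes `mcp` is the only supported subprotocol, and the function names are hypothetical.

```python
def select_subprotocol(header_value, supported=("mcp",)):
    """Pick the first client-offered subprotocol that the server supports, or None."""
    if not header_value:
        return None
    offered = [p.strip() for p in header_value.split(",") if p.strip()]
    for proto in offered:  # client order expresses its preference
        if proto in supported:
            return proto
    return None


def extra_handshake_headers(request_headers):
    """Headers to add to the 101 response; HTTP header names are case-insensitive."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    proto = select_subprotocol(lowered.get("sec-websocket-protocol"))
    return {"Sec-WebSocket-Protocol": proto} if proto else {}
```

Returning an empty dict when nothing matches keeps the server from echoing a protocol it does not actually speak.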
tools/server/server-mproc.cpp
I think this is what subprocess.h is already doing.
Switched to it, thanks
Maybe SSE or streamable HTTP would be a more suitable choice if we actually (and seriously) plan to implement this in the server; it can be supported with the existing httplib.
A custom WS server implementation sounds good on the surface, but ultimately I don't think we can trust AI to build an HTTP server in C++.
Makes sense, I'll explore shttp as a first step.
Note that streamable HTTP / SSE come w/ challenges:
- if called directly from the FE, they need proper CORS headers, which many servers I've tried don't seem to have (e.g. `Access-Control-Allow-Headers: content-type, mcp-protocol-version` and `Access-Control-Expose-Headers: Mcp-Session-Id`)
- if proxied by the server, we might run into issues w/ auth, tbc (e.g. the confused deputy problem)
> I don't think we can trust AI to build an HTTP server in C++

Yeah, I've only started deslopifying this PR. Even if we don't end up merging it, it's an interesting journey :-)
> if called directly from the FE, they need proper CORS headers, which many servers I've tried don't seem to have

Currently, the WS component has only one job: proxying stdin/stdout/stderr through WS. If it's all done on localhost, why do we need CORS? (I assume because you're running on 2 separate ports?)
The SSE / streamable HTTP implementation will be very simple, as it allows exposing the stdio proxy as a simple API endpoint instead of on a new port. This will be important for users who are already using llama.cpp behind a reverse proxy and may not be able to open a new public port.

> if proxied by the server, we might run into issues w/ auth, tbc (e.g. the confused deputy problem)

What kind of auth are we talking about? I may have missed something, but llama-server only supports API key auth via the Authorization header atm. I don't think we support a multi-tenant config that could lead to the "confused deputy problem" you mentioned.
Also, just to clarify my intention about SSE / streamable HTTP: I mean that we would only use them as the transport layer, whose only purpose is to deliver bytes from one place to another (in this case, stdio <--> frontend).
From this point of view, we don't need to be 100% compliant with the MCP spec.
> why do we need CORS

@ngxson That's for remote MCP endpoints, so that they accept and behave well with requests received from web clients, which they are not necessarily tested with (the TypeScript MCP SDK examples don't show how to set those headers, for instance).

> What kind of auth are we talking about?

MCP auth, to access remote servers that are authenticated with OAuth. Transparent proxying won't work well (if only because of OAuth resource validation).
I think you still misunderstood my intention. I don't mean that we will directly implement the SSE spec defined by MCP, which requires auth and/or specific headers. We don't need auth either, because it's a local server, not a remote server as you mentioned.
We will use it only to replace what WS currently does. For your use case, we need WS for bidirectional communication between the browser and the backend. We can simply replace it with a plain SSE implementation for the server --> front direction, and plain POST requests for the front --> back direction. This is essentially what Facebook Messenger did before WS became a standard.
I'll make a small PoC of how that works a bit later. It won't be too complicated, as we already have all the necessary components in the existing C++ code base.
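In the server --> front direction of that split, SSE only needs each message framed as a `data:` line followed by a blank line. The front --> back direction is a plain POST request body. The Python sketch below shows simplified SSE framing for JSON-RPC messages, using hypothetical helper names. It ignores `\r\n` line endings, comment lines and the other SSE fields a full parser must handle, and it is not the PoC code referenced in this thread.

```python
import json


def encode_sse_event(message, event_id=None):
    """Frame one JSON-RPC message as a Server-Sent Events event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # json.dumps (no indent) escapes newlines inside strings, so the
    # whole message fits on a single data: line.
    lines.append("data: " + json.dumps(message))
    return "\n".join(lines) + "\n\n"


def decode_sse_stream(text):
    """Yield the JSON payload of each event in an SSE stream (simplified parser)."""
    for block in text.split("\n\n"):
        data = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The SSE format strips a single leading space after the colon.
                data.append(value[1:] if value.startswith(" ") else value)
        if data:
            yield json.loads("\n".join(data))
```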
I implemented my idea in ochafik#5
TL;DR:
- Completely removed websocket
- Replaced with SSE + HTTP POST request as mentioned earlier
- Transport served under the `/mcp` endpoint (GET + POST)
- Use a custom transport class on webui to bypass the auth logic: we are running on the same host as the webui, not an external server, so there is no need for CORS or auth
- The code can be reduced even further
Benefits are:
- Users already serving llama.cpp behind a reverse proxy don't have to change anything; it just works
- SSL is automatically handled by `httplib`
- The C++ code is much simpler than with websocket
Note: if we need to use an external host with SSE / streamable HTTP in the future, we just need to extend the reverse proxy component in server-models.cpp.
Security hardening for server-ws.cpp:
- Add payload size limit (10MB) to prevent DoS via huge allocations
- Add message buffer limit (100MB) for fragmented messages
- Add receive buffer limit (16MB)
- Add connection limit (1000) to prevent thread exhaustion
- Add socket timeout (30s) to prevent slow-loris attacks
- Validate RSV bits per RFC 6455
- Enforce client frame masking per RFC 6455
- Limit PONG response size to 125 bytes per spec

Code simplification:
- Remove dead code in server-mcp-bridge.cpp (if/else did same thing)
- Remove unused JSON-RPC types from server-mcp.h (~100 lines)
- Use SDK Tool type directly instead of custom McpTool
- Remove convertToolToMcpTool conversion function
- Clean up unused FE types in mcp.ts (~90 lines)

Net result: -172 lines while adding comprehensive security protections
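Several of the RFC 6455 checks listed above fit into one frame-header parser: RSV bits, mandatory client masking, control frames of at most 125 bytes, and a payload cap. The Python sketch below is for illustration and is not the PR's server-ws.cpp. It reuses the 10MB payload limit from the commit message, and `parse_client_frame` and `FrameError` are hypothetical names. Fragment reassembly and the buffer and connection limits are left out.

```python
import struct

MAX_PAYLOAD = 10 * 1024 * 1024  # 10MB, as in the commit message


class FrameError(ValueError):
    """Protocol violation: the connection should be failed."""


def parse_client_frame(buf: bytes):
    """Parse one client-to-server frame from buf.

    Returns (fin, opcode, payload, bytes_consumed), or None if buf is incomplete.
    """
    if len(buf) < 2:
        return None
    b0, b1 = buf[0], buf[1]
    fin = bool(b0 & 0x80)
    if b0 & 0x70:
        raise FrameError("RSV bits set without a negotiated extension")
    opcode = b0 & 0x0F
    if not b1 & 0x80:
        raise FrameError("client frames must be masked")
    length = b1 & 0x7F
    offset = 2
    if length == 126:  # 16-bit extended length
        if len(buf) < offset + 2:
            return None
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:  # 64-bit extended length
        if len(buf) < offset + 8:
            return None
        (length,) = struct.unpack_from("!Q", buf, offset)
        offset += 8
    if opcode >= 0x8 and (length > 125 or not fin):
        raise FrameError("control frames must be unfragmented and <= 125 bytes")
    if length > MAX_PAYLOAD:
        raise FrameError("payload too large")  # rejected before buffering it
    if len(buf) < offset + 4 + length:
        return None
    mask = buf[offset:offset + 4]
    offset += 4
    payload = bytes(b ^ mask[i % 4] for i, b in enumerate(buf[offset:offset + length]))
    return fin, opcode, payload, offset + length
```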
WebSocket port is always HTTP port + 1, no need for endpoint.
- Remove /mcp/ws-port from README, tests, vite proxy
- Simplify mcpServiceFactory to sync (no fetch needed)
- Add PR.md with updated description
- Default WS_MAX_CONNECTIONS reduced from 1000 to 10
- Configurable via LLAMA_WS_MAX_CONNECTIONS env var
- Makes connection limit tests lighter
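One way to implement this is an environment override with a small default, combined with a non-blocking slot check at accept time. The Python sketch below works under those assumptions. Only the `LLAMA_WS_MAX_CONNECTIONS` name and the default of 10 come from the commit. Falling back to the default on invalid values and the `ConnectionLimiter` class are hypothetical.

```python
import os
import threading

DEFAULT_WS_MAX_CONNECTIONS = 10


def ws_max_connections(environ=None):
    """Read LLAMA_WS_MAX_CONNECTIONS, falling back to the default if unset or invalid."""
    environ = os.environ if environ is None else environ
    try:
        value = int(environ.get("LLAMA_WS_MAX_CONNECTIONS", ""))
    except ValueError:
        return DEFAULT_WS_MAX_CONNECTIONS
    return value if value > 0 else DEFAULT_WS_MAX_CONNECTIONS


class ConnectionLimiter:
    """Reject new connections once the limit is reached instead of queueing them."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Non-blocking: False means "at capacity, refuse this connection".
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()
```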
Analysis: subprocess.h lacks cwd support, has blocking async reads, and no graceful shutdown (SIGTERM→wait→SIGKILL pattern).
- Use vendor/sheredom/subprocess.h instead of custom process management
- Remove server-mproc.cpp (~600 lines) and server-mproc.h (~80 lines)
- Drop cwd support from mcp_server_config (subprocess.h limitation)
- Remove PYTHONUNBUFFERED auto-injection (user responsibility now)

Per @ngxson feedback to reuse existing subprocess.h.
- Fix shutdown deadlock: terminate process before joining read thread
- Add 10MB line buffer limit to prevent OOM from malicious MCP servers
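The shutdown fix depends on ordering. A reader thread blocked on the child's stdout only returns once the pipe reaches EOF, so the child must be terminated before the thread is joined. The Python sketch below shows that ordering together with a capped line reader. It assumes SIGTERM-then-SIGKILL escalation (the pattern mentioned in the subprocess.h analysis above) and uses hypothetical helper names; the PR itself does this in C++ on top of subprocess.h.

```python
import subprocess
import threading

MAX_LINE_BYTES = 10 * 1024 * 1024  # 10MB, the cap named in the commit message


def read_lines(stream, on_line, on_error, limit=MAX_LINE_BYTES):
    """Read newline-delimited messages from a binary stream until EOF."""
    while True:
        # Ask for one byte more than the limit, so an oversized line is detected
        # without ever buffering more than limit + 1 bytes.
        line = stream.readline(limit + 1)
        if not line:
            return  # EOF: the child exited or closed its stdout
        if len(line) > limit and not line.endswith(b"\n"):
            on_error(f"line exceeds {limit} bytes")
            return
        on_line(line.rstrip(b"\r\n"))


def shutdown(proc, reader: threading.Thread, grace=2.0):
    """Stop the child first, then join the reader, which unblocks on EOF."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the child ignores the polite request
            proc.wait()
    reader.join()
```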
- Fix MCP connection leak: null transport/client on close so isConnected() returns false, preventing duplicate connections
- Remove 106 lines of reimplemented SHA1/base64, use existing:
  - sha1/sha1.h from examples/gguf-hash/deps
  - base64.hpp from common/
- Fix mutex deadlock in MCP bridge that blocked new connections:
- on_connection_message: release mutex before forward_to_mcp()
(process startup can take 3+ seconds)
- on_connection_closed: release mutex before destroying state
(subprocess destructor joins thread)
- Fix "Processing..." appearing for all messages instead of just
the one being generated (use !message.timings check)
- Reduce WebSocket socket timeout from 30s to 5s for faster
disconnect detection
- Add MCP WebSocket test script (mcp-ws-test.mjs)
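The deadlock fix follows a common pattern. The shared-map lock is held only long enough to look up or detach a connection's state, and slow work (process startup, destructor joins) happens after the lock is released. The Python sketch below uses a hypothetical `Bridge` class. It assumes each connection's state is only used by that connection's handler, so the state can be touched without holding the map lock.

```python
import threading


class Bridge:
    """Per-connection state map guarded by one lock (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def on_message(self, conn_id, message, forward):
        # Look up (or create) the state while holding the lock...
        with self._lock:
            state = self._states.setdefault(conn_id, [])
        # ...but call the potentially slow forward() without it, so other
        # connections are not blocked while a subprocess starts up.
        forward(state, message)

    def on_closed(self, conn_id, destroy):
        # Detach the state under the lock, then tear it down outside it,
        # since teardown may join a reader thread.
        with self._lock:
            state = self._states.pop(conn_id, None)
        if state is not None:
            destroy(state)
```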
- Add SKIP_MODEL_PRELOAD env var to conftest.py for tests using local models
- Add webui_mcp option to ServerProcess class in utils.py
- Fix test_mcp.py to use local model instead of HuggingFace download
- Fix Python echo script to ignore JSON-RPC notifications (no response for notifications)
- Fix test_mcp_servers_with_config assertion (API returns list of objects not strings)
- Fix test_websocket_connection_without_server_param to handle empty response
- Fix test_websocket_connection_invalid_server to send message first
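In JSON-RPC 2.0, a message without an `id` is a notification and must not receive a response. That is why the echo script has to skip messages such as `notifications/initialized`. The Python sketch below shows a minimal stdio echo server's dispatch, assuming a single `echo` tool. It is not the PR's mcp_echo_server.py fixture, and echoing back the client's `protocolVersion` is a simplification.

```python
import json


def handle_message(msg: dict):
    """Return a JSON-RPC response dict, or None for notifications (no "id")."""
    if "id" not in msg:
        return None  # notifications are never answered
    method = msg.get("method")
    params = msg.get("params", {})
    if method == "initialize":
        result = {
            "protocolVersion": params.get("protocolVersion", "2024-11-05"),
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "echo", "version": "0.0.1"},
        }
    elif method == "tools/list":
        result = {"tools": [{
            "name": "echo",
            "description": "Echo text back",
            "inputSchema": {"type": "object", "properties": {"text": {"type": "string"}}},
        }]}
    elif method == "tools/call":
        text = params.get("arguments", {}).get("text", "")
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return {"jsonrpc": "2.0", "id": msg["id"], "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


def serve(stdin, stdout):
    """Newline-delimited JSON-RPC loop over text streams."""
    for line in stdin:
        if line.strip():
            reply = handle_message(json.loads(line))
            if reply is not None:
                stdout.write(json.dumps(reply) + "\n")
                stdout.flush()
```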
- Add mcp_config option to ServerProcess for --mcp-config flag
- Refactor tests to use server.mcp_config instead of env vars
- Add get_env_vars tool to echo server for security testing
- Add TestMcpEnvVarFiltering test class to verify:
  - Secret env vars (API keys, passwords) are NOT passed to subprocess
  - Allowed env vars (HOME, PATH, etc.) ARE inherited
  - Config-specified env vars ARE passed through
- All 8 MCP tests now pass
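The filtering these tests check can be modeled as an allowlist applied to the parent environment, followed by the server's configured `env` entries. In the Python sketch below, the contents of `ALLOWED_ENV_VARS` and the `build_child_env` name are assumptions for illustration, not the list the PR uses.

```python
# Hypothetical allowlist; the PR's actual list may differ.
ALLOWED_ENV_VARS = {"HOME", "PATH", "USER", "LANG", "TMPDIR", "SYSTEMROOT"}


def build_child_env(parent_env, config_env):
    """Inherit only allowlisted parent vars, then apply the server's configured env."""
    env = {k: v for k, v in parent_env.items() if k in ALLOWED_ENV_VARS}
    env.update(config_env)  # explicitly configured vars always pass through
    return env
```

An allowlist fails closed. A secret with an unexpected name is dropped by default, whereas a denylist would have to anticipate every secret's name.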
Move the 113-line inline Python script to a proper file at tools/server/tests/fixtures/mcp_echo_server.py

Benefits:
- Syntax highlighting and IDE support
- Can be linted/type-checked independently
- Serves as documentation for MCP server implementation
- Cleaner test code with simpler fixtures
- More visible in git history
Add detailed logging at each level of MCP tool call chain:
- chat.svelte.ts: executeToolCalls logs tool calls being processed
- mcp.svelte.ts: callTool logs before/after service calls
- mcp.ts: McpService.callTool logs timing and responses

This helps debug issues where tool calls appear stuck.
- Add mcp>=1.0.0 and pytest-asyncio to test requirements
- Update TestMcpJsonRpcProtocol to use MCP SDK's ClientSession
- Update TestMcpEnvVarFiltering to use MCP SDK
- Keep raw websocket tests for edge cases (TestMcpJsonRpcProtocolRaw)
- Configure pytest-asyncio in pytest.ini
The 5-second socket timeout was causing WebSocket connections to close during long LLM prompt processing or slow MCP tool calls. Increased to 5 minutes to accommodate:
- Long-running MCP tools (web searches, API calls)
- LLM prompt processing that can take 10+ seconds
- Remove auto-connect logging (check isConnected before connecting)
- Simplify connection logs (only log success and errors)
- Clean up tool call logging (only log completion and failures)
- Remove redundant debug logs throughout MCP code
@allozaur feel free to reuse any parts of this PR. I've updated / fixed the most obvious issues, switched to the TS MCP SDK (and added tests that use the Python MCP SDK), reused existing helpers where possible (sha1, base64, subprocess.h), etc. And Merry Christmas 🎄!
Thanks, merry Xmas! One note from me for now: I don't think we should introduce a server-side MCP implementation in the first release. A client-side implementation purely in the WebUI is the least invasive, KISS approach we can start with. After that, we can of course iterate.
- Add shared auth validation helpers in server-common:
  - validate_auth_header() with constant-time comparison
  - extract_api_key_from_auth_header() for "Bearer " prefix handling
  - Uses XOR-based constant-time compare to prevent timing attacks
- Update HTTP server to use shared validation helper
- Add WebSocket authentication during handshake:
  - Validates Authorization header against configured API keys
  - Returns 401 with JSON error response on auth failure
  - Supports same "Bearer <token>" format as HTTP endpoints
- Add WebSocket authentication tests in test_security.py
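The helpers described above combine Bearer-prefix extraction with a comparison that does not stop at the first differing byte. The Python sketch below illustrates the XOR-accumulate idea with hypothetical names. In real Python code, `hmac.compare_digest` is the standard-library way to get a constant-time comparison. A pure-Python loop like this one only illustrates the technique and does not guarantee constant timing.

```python
def extract_api_key(auth_header):
    """Return the token from 'Bearer <token>', or None for any other form."""
    prefix = "Bearer "
    if auth_header and auth_header.startswith(prefix):
        return auth_header[len(prefix):]
    return None


def xor_equals(a: bytes, b: bytes) -> bool:
    """XOR-accumulate comparison: visits every byte instead of returning at the first mismatch."""
    if len(a) != len(b):
        return False  # the length itself is not hidden by this scheme
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


def validate_auth_header(auth_header, api_keys):
    token = extract_api_key(auth_header)
    if token is None:
        return False
    candidate = token.encode()
    ok = False
    for key in api_keys:  # check every key; no early exit on a match
        ok |= xor_equals(candidate, key.encode())
    return ok
```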
@allozaur Agreed, although the only server-side bit in this PR is the websocket -> stdio passthrough transport, which sits at the (core or) edge of the MCP standard, with no knowledge of what it's transporting (just lines of text both ways). All protocol-level handling (knowledge of MCP messages) is still only on the FE side. If and when we want server-side MCP support, I think we should implement OpenAI's Messages API, or maybe their Realtime API (allowing parts of agent loops to happen on the server, which would have larger consequences in terms of threading model, etc.).
Summary
This prototype PR adds MCP (Model Context Protocol) support to llama-server's web UI:
Note: not meant to be merged as is; this will need deduping with any existing work (cc @allozaur) and sending in reviewable chunks.
Features
New CLI Option
Configuration Example
    {
      "mcpServers": {
        "brave-search": {
          "command": "npx",
          "args": ["-y", "@brave/brave-search-mcp-server", "--transport", "stdio"],
          "env": {
            "BRAVE_API_KEY": "... get your key at https://api.search.brave.com/app/keys ..."
          }
        },
        "python": {
          "command": "uvx",
          "args": ["mcp-run-python", "--deps", "numpy,pandas,pydantic,requests,httpx,sympy,aiohttp", "stdio"],
          "env": {}
        }
      }
    }

Architecture
- `server-ws.cpp/h` - WebSocket server implementation
- `server-mcp-bridge.cpp/h` - Routes WebSocket connections to MCP subprocesses
- `server-mcp.h` - MCP protocol type definitions
- `sheredom/subprocess.h` for cross-platform subprocess management

API Endpoints
- `GET /mcp/servers` - List available MCP servers
- `WS /mcp?server=<name>` - WebSocket connection (on HTTP port + 1)

Test plan
tools/server/tests/unit/test_mcp.py)
@modelcontextprotocol/server-filesystem

TODOs before undrafting
Try and use the same port for WS (not possible: httplib doesn't support WS)

Possible follow-ups: