Description
🧭 Type of Feature
Please select the most appropriate category:
- Enhancement to existing functionality
- New feature or capability
- New MCP-compliant server
- New component or integration
- Developer tooling or test improvement
- Packaging, automation, and deployment (e.g., PyPI, Docker, quay.io, Kubernetes, Terraform)
- Other (please describe below)
🧭 Epic
Title: Interactive LLM Chat Interface with MCP-Enabled Tool Orchestration
Goal: Provide a web-based conversational interface for a built-in MCP client that connects users to LLM providers (Azure OpenAI, Ollama, OpenAI), offering real-time streaming and seamless integration with MCP servers for tool-augmented reasoning via ReAct agent patterns.
Why now:
- Users need a unified interface to interact with multiple LLM providers without switching platforms
- MCP tool integration enables LLMs to perform complex, multi-step tasks autonomously
- Real-time streaming improves user experience by providing immediate feedback
- Development teams require a testing and demonstration environment for MCP-enabled agentic workflows
Beneficiaries:
- Developers testing MCP server implementations and tool integrations
- Data scientists experimenting with different LLM providers and configurations
- Business users needing AI assistance with access to enterprise tools via MCP
🙋‍♂️ User Story 1: Connection Management and MCP Integration
As a: Developer or system integrator
I want: To configure LLM provider connections, integrate MCP servers via multiple transport protocols, and automatically discover available tools
So that: I can leverage different models with optimal settings and enable LLMs to access external capabilities through MCP tools
✅ Acceptance Criteria
Scenario: Connect to Azure OpenAI with custom configuration
Given I am on the LLM Chat tab
And I have valid Azure OpenAI credentials
When I select "Azure OpenAI" as the provider
And I enter my API key, endpoint URL, deployment name, and API version
And I configure model parameters (temperature: 0.7, max_tokens: 2000)
And I click "Connect"
Then the system should establish a session with MCPChatService
And the system should initialize a ReAct agent with the configured LLM
And the system should display "Connected" status with model name
And the system should show the count and list of available MCP tools
Scenario: Connect to Ollama local instance
Given I have Ollama running locally on port 11434
When I select "Ollama" as the provider
And I specify the model name (e.g., "llama3", "mistral")
And I configure base_url as "http://localhost:11434"
And I set temperature to 0.5
And I click "Connect"
Then the system should create a ChatOllama instance
And the system should validate the connection to Ollama
And the system should display available MCP tools
And I should receive a confirmation with tool count
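A minimal sketch of the provider selection implied by the two connection scenarios above, assuming langchain_openai and langchain_ollama as the provider libraries (the helper name and config keys are illustrative, not existing code):

from langchain_ollama import ChatOllama
from langchain_openai import AzureChatOpenAI


def build_llm(provider: str, cfg: dict):
    """Return a LangChain chat model for the selected provider."""
    if provider == "azure_openai":
        return AzureChatOpenAI(
            api_key=cfg["api_key"],
            azure_endpoint=cfg["endpoint"],
            azure_deployment=cfg["deployment"],
            api_version=cfg["api_version"],
            temperature=cfg.get("temperature", 0.7),
            max_tokens=cfg.get("max_tokens", 2000),
        )
    if provider == "ollama":
        return ChatOllama(
            model=cfg.get("model", "llama3"),
            base_url=cfg.get("base_url", "http://localhost:11434"),
            temperature=cfg.get("temperature", 0.5),
        )
    raise ValueError(f"Unsupported provider: {provider}")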
Scenario: Connect to a virtual MCP server by selecting it from a list of servers
Given I am configuring an LLM chat session
When I select a specific virtual server from the list of servers available in the gateway
And I provide a valid authentication token (JWT or Bearer), required only for team- or private-level virtual servers
And I initialize the connection
Then the system should create a MultiServerMCPClient instance
And the system should authenticate using Authorization header
And the system should load all available tools from the MCP server
And each tool should be wrapped as a LangChain BaseTool
And tool metadata (name, description, parameters) should be accessible
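A hedged sketch of this connection path, assuming the current langchain_mcp_adapters API; the server name, URL parameter, and helper are illustrative:

from langchain_mcp_adapters.client import MultiServerMCPClient


async def load_virtual_server_tools(server_url: str, auth_token: str | None = None):
    """Connect to one gateway virtual server and list its tools."""
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    client = MultiServerMCPClient(
        {
            "virtual_server": {
                "url": server_url,
                "transport": "streamable_http",
                "headers": headers,
            }
        }
    )
    tools = await client.get_tools()  # each entry is a LangChain BaseTool
    for tool in tools:
        print(tool.name, "-", tool.description)
    return tools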
Scenario: Tool discovery and listing
Given I have successfully connected to an MCP server
When the system loads tools via MCPClient.get_tools()
Then I should see a list of all available tool names
And each tool should have its schema and description
And the tool count should be displayed in the UI
And tools should be cached to avoid redundant server calls
And I should be able to force reload tools if needed
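The caching and force-reload criteria could be met by a thin wrapper like this sketch (the ToolCache class is an assumption, not existing code):

class ToolCache:
    """Cache MCP tools so repeated UI refreshes do not re-query the server."""

    def __init__(self, client):
        self._client = client
        self._tools = None

    async def get_tools(self, force_reload: bool = False):
        if self._tools is None or force_reload:
            self._tools = await self._client.get_tools()
        return self._tools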
Scenario: Connection failure handling
Given I enter invalid credentials or unreachable endpoint
When I attempt to connect
Then the system should display a descriptive error message
And the error should indicate whether it is an authentication, network, or configuration issue
And the system should not create an active session
And I should be able to modify my configuration and retry
🙋‍♂️ User Story 2: Interactive Chat with Streaming and Tool Execution
As a: End user
I want: To send messages to the LLM and receive real-time streaming responses with visibility into tool invocations
So that: I get immediate feedback as the response is generated and can see which tools the agent invoked to produce its answer
✅ Acceptance Criteria
Scenario: Send message with streaming enabled
Given I have an active LLM chat session with streaming enabled
And I am authenticated with user_id
When I type a message "Explain quantum computing" in the chat input
And I click "Send" or press Enter
Then the system should call POST /llmchat/chat with streaming=true
And the response should stream via Server-Sent Events (SSE)
And I should see tokens appearing word-by-word in real-time
And the message should be added to conversation history
And the full response should be stored once streaming completes
Scenario: Stream with tool invocation events
Given the LLM needs to use MCP tools to answer my question
When I send a message requiring tool usage (e.g., "What's the weather in Paris?")
Then I should receive SSE events in this order:
| Event Type | Data |
|---|---|
| tool_start | tool_id, tool_name, input_parameters |
| tool_end | tool_id, output, execution_time |
| token | streaming response content |
| final | complete_text, tools_used, elapsed_ms |
And each tool invocation should show start and end events
And tool execution errors should emit "tool_error" events
And the UI should display tool usage indicators
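For illustration, the event sequence above might appear on the wire as SSE frames like these (field names follow the table; the helper and the values are assumptions):

import json


def sse_event(event_type: str, data: dict) -> str:
    """Format one Server-Sent Event frame."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


frames = [
    sse_event("tool_start", {"tool_id": "t1", "tool_name": "get_weather",
                             "input_parameters": {"city": "Paris"}}),
    sse_event("tool_end", {"tool_id": "t1", "output": "18°C, clear",
                           "execution_time": 0.42}),
    sse_event("token", {"content": "The weather in Paris"}),
    sse_event("final", {"complete_text": "The weather in Paris is 18°C and clear.",
                        "tools_used": ["get_weather"], "elapsed_ms": 1830}),
]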
Scenario: Non-streaming chat response
Given I have streaming disabled in my configuration
When I send a message
Then the system should call chat_with_metadata()
And I should receive a complete response after processing
And the response should include:
- Full text content
- Boolean flag indicating if tools were used
- List of tool names invoked
- Detailed tool invocations array
- Total elapsed time in milliseconds
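An illustrative shape for the metadata listed above (key names and values are assumptions; only the fields themselves come from the criteria):

response = {
    "content": "Paris is currently 18°C with clear skies.",
    "used_tools": True,
    "tool_names": ["get_weather"],
    "tool_invocations": [
        {"tool_id": "t1", "tool_name": "get_weather",
         "input": {"city": "Paris"}, "output": "18°C, clear"},
    ],
    "elapsed_ms": 1830,
}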
Scenario: Multiple tool invocations in single response
Given my question requires multiple tool calls
When I ask "Compare weather in Paris and London, then convert temperatures to Fahrenheit"
Then the system should emit multiple tool_start/tool_end event pairs
And each tool invocation should have unique tool_id
And tools should execute in logical order determined by ReAct agent
And the final response should synthesize all tool outputs
And I should see a complete list of tools used in the metadata
🙋‍♂️ User Story 3: Conversation History and Session Management
As a: User engaged in multi-turn conversations across multiple sessions
I want: The system to maintain conversation context within my session and manage multiple concurrent user sessions with isolated configurations
So that: The LLM provides contextually aware responses and multiple users can use the interface simultaneously without conflicts
✅ Acceptance Criteria
Scenario: Maintain conversation history within session
Given I have sent 5 messages in my current session
When I send a 6th message that references previous context
Then the system should include all previous messages in the agent invocation
And the LLM should have access to full conversation history
And responses should demonstrate awareness of prior exchanges
And history should persist until session disconnect or explicit clear
Scenario: Automatic history trimming
Given the system has chat_history_max_messages set to 50
When conversation history exceeds 50 messages
Then the system should automatically trim to the 50 most recent messages
And older messages should be removed from memory
And a debug log should indicate history was trimmed
And conversation coherence should be maintained
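A minimal trimming sketch matching chat_history_max_messages (the function name and logging call are assumptions):

import logging

logger = logging.getLogger(__name__)


def trim_history(history: list[dict], max_messages: int = 50) -> list[dict]:
    """Keep only the most recent messages, logging when trimming occurs."""
    if len(history) > max_messages:
        logger.debug("Trimming chat history from %d to %d messages",
                     len(history), max_messages)
        return history[-max_messages:]
    return history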
Scenario: Retrieve conversation history
Given I have an active chat session
When I request GET /llmchat/history or call get_conversation_history()
Then the system should return an array of message objects
And each message should have "role" (user/assistant) and "content"
And messages should be in chronological order
And the format should be compatible with LLM APIs
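Example of the expected history shape (values are illustrative):

history = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": "Paris is currently 18°C with clear skies."},
]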
Scenario: Clear conversation history
Given I want to start a fresh conversation
When I invoke clear_history()
Then all messages should be removed from _conversation_history
And subsequent messages should start a new context
And previous conversation should not influence new responses
Scenario: Create isolated user session
Given a new user with user_id "user_123" connects
When POST /llmchat/connect is called with their configuration
Then the system should create a dedicated MCPChatService instance
And the instance should be stored in active_sessions["user_123"]
And user configuration should be stored in user_configs["user_123"]
And the session should be completely isolated from other users
And JWT token should be extracted from cookies if not provided
Scenario: Session refresh with cleanup
Given user "user_123" has an existing active session
When they connect again with new configuration
Then the old session should be gracefully shut down
And resources should be released (MCP connections closed)
And a new session should be created with updated config
And the transition should be seamless without data loss
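A sketch of the per-user bookkeeping behind the two scenarios above; the dict names follow the criteria, while the helper itself is an assumption:

from typing import Any, Dict

active_sessions: Dict[str, Any] = {}  # user_id -> MCPChatService instance
user_configs: Dict[str, Any] = {}     # user_id -> validated configuration


async def connect_user(user_id: str, service: Any, config: Any) -> None:
    """Register a session, shutting down any previous one for the same user."""
    old = active_sessions.get(user_id)
    if old is not None:
        await old.shutdown()  # close MCP connections, release resources
    active_sessions[user_id] = service
    user_configs[user_id] = config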
Scenario: Check session status
Given user "user_456" may or may not have an active session
When GET /llmchat/status/user_456 is called
Then the response should indicate connection status (true/false)
And it should include user_id in the response
And no sensitive information should be exposed
Scenario: Disconnect and cleanup
Given user "user_789" wants to end their session
When POST /llmchat/disconnect is called
Then the MCPChatService should execute shutdown()
And MCP client should disconnect from servers
And the session should be removed from active_sessions
And user_configs should be cleared for this user
And response should confirm successful disconnection
And any errors during cleanup should be logged but not block disconnection
🙋‍♂️ User Story 4: Error Handling and System Resilience
As a: User or developer
I want: Clear error messages, graceful degradation when issues occur, and system resilience across various failure scenarios
So that: I can understand what went wrong, take corrective action, and the system remains stable under adverse conditions
✅ Acceptance Criteria
Scenario: Handle MCP server connection failure
Given I provide an invalid or unreachable MCP server URL
When I attempt to connect
Then the system should catch the ConnectionError
And return HTTP 503 with descriptive message
And suggest verification steps (check URL, server status, auth)
And partial state should be cleaned up (no zombie sessions)
Scenario: Handle LLM authentication failure
Given I provide invalid API credentials
When the system attempts to initialize the LLM provider
Then the system should catch the authentication error
And return HTTP 400 with clear indication of credential issue
And the session should not be created in active_sessions
And sensitive credentials should not appear in error messages
Scenario: Handle timeout during LLM response
Given the LLM takes longer than configured timeout
When streaming or waiting for response
Then the system should catch TimeoutError
And emit an "error" SSE event for streaming mode
And return HTTP 504 for non-streaming mode
And the error should indicate the request timed out
And the session should remain active for retry
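One possible way to realize the timeout behavior for the non-streaming path, sketched with asyncio.wait_for (the helper name and timeout value are assumptions):

import asyncio

from fastapi import HTTPException


async def chat_or_504(service, message: str, timeout_s: float = 120.0):
    """Run the non-streaming chat path with a hard timeout."""
    try:
        return await asyncio.wait_for(service.chat_with_metadata(message), timeout_s)
    except asyncio.TimeoutError:
        # The session stays in active_sessions so the user can simply retry.
        raise HTTPException(status_code=504, detail="LLM request timed out")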
Scenario: Handle tool execution errors
Given an MCP tool fails during execution
When the ReAct agent invokes the tool
Then the system should emit "tool_error" event
And include error details (tool_id, error message, timestamp)
And the agent should continue processing (graceful degradation)
And the LLM should receive error context to adjust its response
Scenario: Handle malformed or invalid requests
Given a request with missing required fields or invalid data types
When the API endpoint receives the request
Then Pydantic validation should catch the error
And return HTTP 422 with detailed validation errors
And indicate which fields are invalid and why
And the system state should remain unchanged
Scenario: Handle concurrent session conflicts
Given user "user_999" attempts to connect twice simultaneously
When both connection requests arrive at nearly the same time
Then the system should handle race conditions gracefully
And only one session should be created
And the second request should either wait or refresh the first
And no resources should leak from abandoned connections
📐 Design Sketch
High-Level Architecture:
flowchart TB
subgraph "Frontend - Admin UI"
A[LLM Chat Tab] --> B[Configuration Panel]
A --> C[Chat Interface]
A --> D[Connection Status]
C --> E[Message Input]
C --> F[Response Display]
C --> G[Tool Invocation Indicators]
end
subgraph "Backend - FastAPI Router"
H[POST /llmchat/connect]
I[POST /llmchat/chat]
J[POST /llmchat/disconnect]
K[GET /llmchat/status/:user_id]
L[GET /llmchat/config/:user_id]
end
subgraph "Core Service Layer"
M[MCPChatService]
N[MCPClient]
O[LLMProviderFactory]
P[ReAct Agent]
end
subgraph "External Integrations"
Q[MCP Servers]
R[Azure OpenAI]
S[Ollama]
T[OpenAI]
end
B --> H
E --> I
D --> J
D --> K
H --> M
I --> M
J --> M
K --> M
L --> M
M --> N
M --> O
M --> P
N -->|streamable_http/sse/stdio| Q
O -->|provider selection| R
O -->|provider selection| S
O -->|provider selection| T
P -->|tool invocation| Q
I -->|SSE Stream| F
F --> G
Chat Interaction Flow with Tool Usage:
sequenceDiagram
participant User
participant UI as Frontend
participant API as llmchat_router
participant Service as MCPChatService
participant Agent as ReAct Agent
participant MCP as MCP Server
participant LLM as LLM Provider
User->>UI: Type message & send
UI->>API: POST /llmchat/chat (streaming=true)
API->>Service: chat_events(message)
Service->>Agent: astream_events()
loop Streaming Events
Agent->>LLM: Generate response chunk
LLM-->>Agent: Token
Agent-->>Service: on_chat_model_stream
Service-->>API: SSE: event=token
API-->>UI: Display token
alt Tool Invocation Needed
Agent->>Agent: Identify tool need
Agent->>MCP: on_tool_start
Service-->>API: SSE: event=tool_start
API-->>UI: Show "Tool: xyz running..."
MCP->>MCP: Execute tool
MCP-->>Agent: Tool result
Agent-->>Service: on_tool_end
Service-->>API: SSE: event=tool_end
API-->>UI: Show "Tool: xyz completed"
Agent->>LLM: Continue with tool result
end
end
Service->>Service: Store in history
Service-->>API: SSE: event=final
API-->>UI: Complete response
UI->>User: Display full conversation
Session State Management:
stateDiagram-v2
[*] --> Disconnected
Disconnected --> Connecting: POST /connect
Connecting --> Connected: Success
Connecting --> Error: Failure
Error --> Disconnected: Cleanup
Connected --> Processing: POST /chat
Processing --> Streaming: streaming=true
Processing --> Waiting: streaming=false
Streaming --> StreamingToolUse: Tool needed
StreamingToolUse --> Streaming: Tool complete
Streaming --> Connected: Message complete
Waiting --> Connected: Response received
Connected --> Disconnected: POST /disconnect
Connected --> Error: Connection lost
🔗 MCP Standards Check
- Change adheres to current MCP specifications
- No breaking changes to existing MCP-compliant integrations
- If deviations exist, please describe them below:
📓 Additional Context
Technical Implementation Details
Backend Stack:
- FastAPI for REST API with async support
- Pydantic for configuration validation and data models
- LangChain for LLM abstraction and agent patterns
- LangGraph for ReAct agent creation (create_react_agent)
- langchain_mcp_adapters for MCP client implementation
- langchain_openai, langchain_ollama for LLM provider integrations
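A minimal sketch of the agent wiring implied by the stack above, assuming langgraph.prebuilt.create_react_agent; the LLM and tools would come from the provider factory and MCP client sketched earlier:

from langgraph.prebuilt import create_react_agent


def build_agent(llm, tools):
    """Create a ReAct agent that can call the MCP-derived LangChain tools."""
    return create_react_agent(llm, tools)


# Usage (async context): stream agent events that feed the SSE mapping below.
# async for event in build_agent(llm, tools).astream_events(
#         {"messages": [("user", "What's the weather in Paris?")]}, version="v2"):
#     ...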
SSE Event Types:
- token: Streaming response content chunks
- tool_start: Tool invocation initiated (id, name, input)
- tool_end: Tool execution completed (id, output, timestamps)
- tool_error: Tool execution failed (id, error message)
- final: Complete response with metadata (content, tools_used, elapsed_ms)
- error: Error occurred (error message, recoverable flag)
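These event types could be surfaced from the chat endpoint roughly as follows; a sketch assuming MCPChatService.chat_events yields (event_type, data) pairs (that contract is an assumption):

import json

from fastapi.responses import StreamingResponse


def stream_chat(service, message: str) -> StreamingResponse:
    """Wrap MCPChatService.chat_events() in a text/event-stream response."""
    async def event_source():
        async for event_type, data in service.chat_events(message):
            yield f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")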
Environment Variable Fallbacks:
The system supports configuration via environment variables for easier deployment:
- LLMCHAT_ENABLED=true: enable or disable LLM chat functionality
- LLM providers (optional):
  - AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_API_VERSION, AZURE_OPENAI_MODEL
  - OLLAMA_MODEL
  - OPENAI_API_KEY, OPENAI_MODEL, OPENAI_BASE_URL
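A small sketch of how these fallbacks might be read at startup (only the variable names above come from the list; the defaults are assumptions):

import os

llmchat_enabled = os.getenv("LLMCHAT_ENABLED", "false").lower() == "true"
azure_api_key = os.getenv("AZURE_OPENAI_API_KEY")   # None if unset
ollama_model = os.getenv("OLLAMA_MODEL", "llama3")  # default value is an assumption
openai_base_url = os.getenv("OPENAI_BASE_URL")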
Configuration Models:
from typing import Dict, Literal, Optional, Union

from pydantic import BaseModel, Field

# AzureOpenAIConfig and OllamaConfig are the provider-specific models (not shown here).


class MCPServerConfig(BaseModel):
    """Configuration for MCP server connection."""
    url: Optional[str] = Field(None, description="MCP server URL for streamable_http/sse transports")
    command: Optional[str] = Field(None, description="Command to run for stdio transport")
    args: Optional[list[str]] = Field(None, description="Arguments for stdio command")
    transport: Literal["streamable_http", "sse", "stdio"] = Field(default="streamable_http", description="Transport type for MCP connection")
    auth_token: Optional[str] = Field(None, description="Authentication token for the server")
    headers: Optional[Dict[str, str]] = Field(default=None, description="Additional headers for HTTP-based transports")


class LLMConfig(BaseModel):
    """Configuration for LLM provider."""
    provider: Literal["azure_openai", "ollama"] = Field(..., description="LLM provider type")
    config: Union[AzureOpenAIConfig, OllamaConfig] = Field(..., description="Provider-specific configuration")


class MCPClientConfig(BaseModel):
    """Main configuration for MCP client."""
    mcp_server: MCPServerConfig = Field(..., description="MCP server configuration")
    llm: LLMConfig = Field(..., description="LLM provider configuration")
    chat_history_max_messages: int = Field(default=50, gt=0, description="Maximum messages to keep in chat history")
    enable_streaming: bool = Field(default=True, description="Enable streaming responses")

API Endpoints:
| Method | Endpoint | Purpose | Request Body | Response |
|---|---|---|---|---|
| POST | /llmchat/connect | Initialize session | ConnectInput (user_id, server, llm, streaming) | status, user_id, provider, tool_count, tools[] |
| POST | /llmchat/chat | Send message | ChatInput (user_id, message, streaming) | StreamingResponse (SSE) or JSON with response metadata |
| POST | /llmchat/disconnect | End session | DisconnectInput (user_id) | status, message |
| GET | /llmchat/status/:user_id | Check connection | - | user_id, connected (boolean) |
| GET | /llmchat/config/:user_id | Retrieve config | - | Sanitized config (secrets removed) |
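For reference, a hedged example of calling the connect endpoint with httpx; the host, port, gateway URL, and payload values are illustrative and follow the ConnectInput fields and configuration models above:

import httpx

payload = {
    "user_id": "user_123",
    "server": {"url": "https://gateway.example/servers/1/mcp",
               "transport": "streamable_http", "auth_token": "<JWT>"},
    "llm": {"provider": "ollama",
            "config": {"model": "llama3", "base_url": "http://localhost:11434"}},
    "streaming": True,
}
resp = httpx.post("http://localhost:4444/llmchat/connect", json=payload, timeout=60)
print(resp.json())  # expected keys: status, user_id, provider, tool_count, tools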