
[Feature Request]: Built-in MCP client - LLM Chat service for virtual servers with agentic capabilities and MCP-enabled tool orchestration #1200

@kevalmahajan

Description

🧭 Type of Feature

Please select the most appropriate category:

  • Enhancement to existing functionality
  • New feature or capability
  • New MCP-compliant server
  • New component or integration
  • Developer tooling or test improvement
  • Packaging, automation and deployment (ex: pypi, docker, quay.io, kubernetes, terraform)
  • Other (please describe below)

🧭 Epic

Title: Interactive LLM Chat Interface with MCP-Enabled Tool Orchestration

Goal: Provide a web-based conversational interface for a built-in MCP client that connects users to LLM providers (Azure OpenAI, Ollama, OpenAI) with real-time streaming capabilities and seamless integration with MCP servers for tool-augmented reasoning using ReAct agent patterns.

Why now:

  • Users need a unified interface to interact with multiple LLM providers without switching platforms
  • MCP tool integration enables LLMs to perform complex, multi-step tasks autonomously
  • Real-time streaming improves user experience by providing immediate feedback
  • Development teams require a testing and demonstration environment for MCP-enabled agentic workflows

Beneficiaries:

  • Developers testing MCP server implementations and tool integrations
  • Data scientists experimenting with different LLM providers and configurations
  • Business users needing AI assistance with access to enterprise tools via MCP

🙋‍♂️ User Story 1: Connection Management and MCP Integration

As a: Developer or system integrator
I want: To configure LLM provider connections, integrate MCP servers via multiple transport protocols, and automatically discover available tools
So that: I can leverage different models with optimal settings and enable LLMs to access external capabilities through MCP tools

✅ Acceptance Criteria

Scenario: Connect to Azure OpenAI with custom configuration
  Given I am on the LLM Chat tab
  And I have valid Azure OpenAI credentials
  When I select "Azure OpenAI" as the provider
  And I enter my API key, endpoint URL, deployment name, and API version
  And I configure model parameters (temperature: 0.7, max_tokens: 2000)
  And I click "Connect"
  Then the system should establish a session with MCPChatService
  And the system should initialize a ReAct agent with the configured LLM
  And the system should display "Connected" status with model name
  And the system should show the count and list of available MCP tools

Scenario: Connect to Ollama local instance
  Given I have Ollama running locally on port 11434
  When I select "Ollama" as the provider
  And I specify the model name (e.g., "llama3", "mistral")
  And I configure base_url as "http://localhost:11434"
  And I set temperature to 0.5
  And I click "Connect"
  Then the system should create a ChatOllama instance
  And the system should validate the connection to Ollama
  And the system should display available MCP tools
  And I should receive a confirmation with tool count

Scenario: Connect to a virtual MCP server by selecting from the list of available servers
  Given I am configuring an LLM chat session
  When I select a specific virtual server from the list of servers available in the gateway
  And I provide a valid authentication token (JWT or Bearer), required only for team- or private-level virtual servers
  And I initialize the connection
  Then the system should create a MultiServerMCPClient instance
  And the system should authenticate using Authorization header
  And the system should load all available tools from the MCP server
  And each tool should be wrapped as a LangChain BaseTool
  And tool metadata (name, description, parameters) should be accessible
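
As an illustration of this scenario, a minimal sketch of the connection step using langchain_mcp_adapters (the "virtual-server" alias and the helper name are placeholders, and the exact client API may vary between adapter versions):

from langchain_mcp_adapters.client import MultiServerMCPClient

async def connect_virtual_server(url: str, auth_token: str | None = None):
    """Sketch: open an MCP connection to one virtual server and load its tools."""
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    client = MultiServerMCPClient(
        {
            "virtual-server": {
                "url": url,
                "transport": "streamable_http",
                "headers": headers,
            }
        }
    )
    tools = await client.get_tools()  # each entry is wrapped as a LangChain BaseTool
    return client, tools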

Scenario: Tool discovery and listing
  Given I have successfully connected to an MCP server
  When the system loads tools via MCPClient.get_tools()
  Then I should see a list of all available tool names
  And each tool should have its schema and description
  And the tool count should be displayed in the UI
  And tools should be cached to avoid redundant server calls
  And I should be able to force reload tools if needed
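
A possible shape for the caching behaviour described above (method and attribute names are assumptions, not a final interface):

class MCPClient:
    """Thin wrapper that caches tools loaded from the underlying MCP client."""

    def __init__(self, client):
        self._client = client
        self._tools = None

    async def get_tools(self, force_reload: bool = False):
        # Reuse the cached tool list unless a reload is explicitly requested.
        if self._tools is None or force_reload:
            self._tools = await self._client.get_tools()
        return self._tools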

Scenario: Connection failure handling
  Given I enter invalid credentials or unreachable endpoint
  When I attempt to connect
  Then the system should display a descriptive error message
  And the error should indicate whether it is an authentication, network, or configuration issue
  And the system should not create an active session
  And I should be able to modify my configuration and retry

🙋‍♂️ User Story 2: Interactive Chat with Streaming and Tool Execution

As a: End user
I want: To send messages to the LLM and receive real-time streaming responses with visibility into tool invocations
So that: I can follow the response as it is generated and see which tools the agent invokes to produce it

✅ Acceptance Criteria

Scenario: Send message with streaming enabled
  Given I have an active LLM chat session with streaming enabled
  And I am authenticated with user_id
  When I type a message "Explain quantum computing" in the chat input
  And I click "Send" or press Enter
  Then the system should call POST /llmchat/chat with streaming=true
  And the response should stream via Server-Sent Events (SSE)
  And I should see tokens appearing word-by-word in real-time
  And the message should be added to conversation history
  And the full response should be stored once streaming completes
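
A sketch of how the streaming branch of POST /llmchat/chat could be served with FastAPI (ChatInput, active_sessions, chat_events and chat_with_metadata follow the names used elsewhere in this issue; the event shape is an assumption):

import json

from fastapi import APIRouter
from fastapi.responses import StreamingResponse

router = APIRouter(prefix="/llmchat")

@router.post("/chat")
async def chat(payload: ChatInput):
    service = active_sessions[payload.user_id]  # per-user MCPChatService instance
    if payload.streaming:
        async def event_stream():
            # chat_events() is assumed to yield dicts like {"type": "token", "data": {...}}
            async for event in service.chat_events(payload.message):
                yield f"event: {event['type']}\ndata: {json.dumps(event['data'])}\n\n"
        return StreamingResponse(event_stream(), media_type="text/event-stream")
    return await service.chat_with_metadata(payload.message)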

Scenario: Stream with tool invocation events
  Given the LLM needs to use MCP tools to answer my question
  When I send a message requiring tool usage (e.g., "What's the weather in Paris?")
  Then I should receive SSE events in this order:
    | Event Type   | Data                                    |
    | tool_start   | tool_id, tool_name, input_parameters   |
    | tool_end     | tool_id, output, execution_time        |
    | token        | streaming response content             |
    | final        | complete_text, tools_used, elapsed_ms  |
  And each tool invocation should show start and end events
  And tool execution errors should emit "tool_error" events
  And the UI should display tool usage indicators
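
For illustration, the resulting SSE frames for this scenario might look as follows (all values are invented examples):

event: tool_start
data: {"tool_id": "call_01", "tool_name": "get_weather", "input": {"city": "Paris"}}

event: tool_end
data: {"tool_id": "call_01", "output": "14°C, light rain", "execution_time": 412}

event: token
data: {"content": "The weather in Paris is currently "}

event: final
data: {"content": "The weather in Paris is currently 14°C with light rain.", "tools_used": ["get_weather"], "elapsed_ms": 2310}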

Scenario: Non-streaming chat response
  Given I have streaming disabled in my configuration
  When I send a message
  Then the system should call chat_with_metadata()
  And I should receive a complete response after processing
  And the response should include:
    - Full text content
    - Boolean flag indicating if tools were used
    - List of tool names invoked
    - Detailed tool invocations array
    - Total elapsed time in milliseconds
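
An illustrative Pydantic model for this non-streaming payload (field names are assumptions based on the list above):

from pydantic import BaseModel

class ChatMetadataResponse(BaseModel):
    """Sketch of the response returned by chat_with_metadata()."""

    content: str
    tools_used: bool
    tool_names: list[str] = []
    tool_invocations: list[dict] = []
    elapsed_ms: int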

Scenario: Multiple tool invocations in single response
  Given my question requires multiple tool calls
  When I ask "Compare weather in Paris and London, then convert temperatures to Fahrenheit"
  Then the system should emit multiple tool_start/tool_end event pairs
  And each tool invocation should have unique tool_id
  And tools should execute in logical order determined by ReAct agent
  And the final response should synthesize all tool outputs
  And I should see a complete list of tools used in the metadata

🙋‍♂️ User Story 3: Conversation History and Session Management

As a: User engaged in multi-turn conversations across multiple sessions
I want: The system to maintain conversation context within my session and manage multiple concurrent user sessions with isolated configurations
So that: The LLM provides contextually aware responses and multiple users can use the interface simultaneously without conflicts

✅ Acceptance Criteria

Scenario: Maintain conversation history within session
  Given I have sent 5 messages in my current session
  When I send a 6th message that references previous context
  Then the system should include all previous messages in the agent invocation
  And the LLM should have access to full conversation history
  And responses should demonstrate awareness of prior exchanges
  And history should persist until session disconnect or explicit clear

Scenario: Automatic history trimming
  Given the system has chat_history_max_messages set to 50
  When conversation history exceeds 50 messages
  Then the system should automatically trim to the 50 most recent messages
  And older messages should be removed from memory
  And a debug log should indicate history was trimmed
  And conversation coherence should be maintained
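
A sketch of the trimming step inside MCPChatService (the method name _trim_history and the module logger are assumptions; _conversation_history and chat_history_max_messages follow the names used in this issue):

def _trim_history(self) -> None:
    """Drop the oldest messages once the configured limit is exceeded."""
    max_len = self._config.chat_history_max_messages
    if len(self._conversation_history) > max_len:
        removed = len(self._conversation_history) - max_len
        self._conversation_history = self._conversation_history[-max_len:]
        logger.debug("Trimmed %d old messages from conversation history", removed)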

Scenario: Retrieve conversation history
  Given I have an active chat session
  When I request GET /llmchat/history or call get_conversation_history()
  Then the system should return an array of message objects
  And each message should have "role" (user/assistant) and "content"
  And messages should be in chronological order
  And the format should be compatible with LLM APIs

Scenario: Clear conversation history
  Given I want to start a fresh conversation
  When I invoke clear_history()
  Then all messages should be removed from _conversation_history
  And subsequent messages should start a new context
  And previous conversation should not influence new responses

Scenario: Create isolated user session
  Given a new user with user_id "user_123" connects
  When POST /llmchat/connect is called with their configuration
  Then the system should create a dedicated MCPChatService instance
  And the instance should be stored in active_sessions["user_123"]
  And user configuration should be stored in user_configs["user_123"]
  And the session should be completely isolated from other users
  And JWT token should be extracted from cookies if not provided

Scenario: Session refresh with cleanup
  Given user "user_123" has an existing active session
  When they connect again with new configuration
  Then the old session should be gracefully shut down
  And resources should be released (MCP connections closed)
  And a new session should be created with updated config
  And the transition should be seamless without data loss
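
A sketch of connect-with-refresh (initialize() is a hypothetical setup step; shutdown(), active_sessions and user_configs follow the names used in this issue):

async def refresh_session(user_id: str, config: MCPClientConfig) -> MCPChatService:
    """Replace any existing session for this user before creating a new one."""
    old = active_sessions.pop(user_id, None)
    if old is not None:
        await old.shutdown()  # closes MCP connections and releases resources
    service = MCPChatService(config)
    await service.initialize()
    active_sessions[user_id] = service
    user_configs[user_id] = config
    return service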

Scenario: Check session status
  Given user "user_456" may or may not have an active session
  When GET /llmchat/status/user_456 is called
  Then the response should indicate connection status (true/false)
  And it should include user_id in the response
  And no sensitive information should be exposed

Scenario: Disconnect and cleanup
  Given user "user_789" wants to end their session
  When POST /llmchat/disconnect is called
  Then the MCPChatService should execute shutdown()
  And MCP client should disconnect from servers
  And the session should be removed from active_sessions
  And user_configs should be cleared for this user
  And response should confirm successful disconnection
  And any errors during cleanup should be logged but not block disconnection
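
A sketch of the disconnect endpoint where cleanup errors are logged but never block the disconnection (reusing the router from the streaming sketch above; DisconnectInput follows the API table below):

@router.post("/disconnect")
async def disconnect(payload: DisconnectInput):
    service = active_sessions.pop(payload.user_id, None)
    user_configs.pop(payload.user_id, None)
    if service is not None:
        try:
            await service.shutdown()
        except Exception as exc:  # never let cleanup failures block the response
            logger.warning("Error during shutdown for %s: %s", payload.user_id, exc)
    return {"status": "disconnected", "message": "Session closed"}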

🙋‍♂️ User Story 4: Error Handling and System Resilience

As a: User or developer
I want: Clear error messages, graceful degradation when issues occur, and system resilience across various failure scenarios
So that: I can understand what went wrong, take corrective action, and the system remains stable under adverse conditions

✅ Acceptance Criteria

Scenario: Handle MCP server connection failure
  Given I provide an invalid or unreachable MCP server URL
  When I attempt to connect
  Then the system should catch the ConnectionError
  And return HTTP 503 with descriptive message
  And suggest verification steps (check URL, server status, auth)
  And partial state should be cleaned up (no zombie sessions)

Scenario: Handle LLM authentication failure
  Given I provide invalid API credentials
  When the system attempts to initialize the LLM provider
  Then the system should catch the authentication error
  And return HTTP 400 with clear indication of credential issue
  And the session should not be created in active_sessions
  And sensitive credentials should not appear in error messages
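
A sketch of how the connect handler could map these failures to HTTP status codes, reusing the refresh_session helper sketched in User Story 3 (the exception classes caught here are illustrative; real provider SDKs raise their own error types):

from fastapi import HTTPException

async def initialize_session(user_id: str, config: MCPClientConfig):
    try:
        return await refresh_session(user_id, config)
    except ConnectionError as exc:
        raise HTTPException(
            status_code=503,
            detail=f"Could not reach the MCP server: {exc}. Verify the URL, server status and auth token.",
        )
    except PermissionError:
        # Stand-in for provider authentication failures; never echo credentials back.
        raise HTTPException(status_code=400, detail="LLM provider rejected the supplied credentials.")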

Scenario: Handle timeout during LLM response
  Given the LLM takes longer than configured timeout
  When streaming or waiting for response
  Then the system should catch TimeoutError
  And emit an "error" SSE event for streaming mode
  And return HTTP 504 for non-streaming mode
  And the error should indicate the request timed out
  And the session should remain active for retry

Scenario: Handle tool execution errors
  Given an MCP tool fails during execution
  When the ReAct agent invokes the tool
  Then the system should emit "tool_error" event
  And include error details (tool_id, error message, timestamp)
  And the agent should continue processing (graceful degradation)
  And the LLM should receive error context to adjust its response

Scenario: Handle malformed or invalid requests
  Given a request with missing required fields or invalid data types
  When the API endpoint receives the request
  Then Pydantic validation should catch the error
  And return HTTP 422 with detailed validation errors
  And indicate which fields are invalid and why
  And the system state should remain unchanged

Scenario: Handle concurrent session conflicts
  Given user "user_999" attempts to connect twice simultaneously
  When both connection requests arrive nearly at the same time
  Then the system should handle race conditions gracefully
  And only one session should be created
  And the second request should either wait or refresh the first
  And no resources should leak from abandoned connections
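
A sketch of per-user serialization with an asyncio lock so that two simultaneous connects cannot create two sessions (reusing the refresh_session helper sketched in User Story 3):

import asyncio
from collections import defaultdict

_session_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def connect(user_id: str, config: MCPClientConfig) -> MCPChatService:
    # Only one connect per user can run at a time; a second call waits
    # and then refreshes the session created by the first.
    async with _session_locks[user_id]:
        return await refresh_session(user_id, config)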

📐 Design Sketch

High-Level Architecture:

flowchart TB
    subgraph "Frontend - Admin UI"
        A[LLM Chat Tab] --> B[Configuration Panel]
        A --> C[Chat Interface]
        A --> D[Connection Status]
        C --> E[Message Input]
        C --> F[Response Display]
        C --> G[Tool Invocation Indicators]
    end
    
    subgraph "Backend - FastAPI Router"
        H[POST /llmchat/connect]
        I[POST /llmchat/chat]
        J[POST /llmchat/disconnect]
        K[GET /llmchat/status/:user_id]
        L[GET /llmchat/config/:user_id]
    end
    
    subgraph "Core Service Layer"
        M[MCPChatService]
        N[MCPClient]
        O[LLMProviderFactory]
        P[ReAct Agent]
    end
    
    subgraph "External Integrations"
        Q[MCP Servers]
        R[Azure OpenAI]
        S[Ollama]
        T[OpenAI]
    end
    
    B --> H
    E --> I
    D --> J
    D --> K
    
    H --> M
    I --> M
    J --> M
    K --> M
    L --> M
    
    M --> N
    M --> O
    M --> P
    
    N -->|streamable_http/sse/stdio| Q
    O -->|provider selection| R
    O -->|provider selection| S
    O -->|provider selection| T
    P -->|tool invocation| Q
    
    I -->|SSE Stream| F
    F --> G

Chat Interaction Flow with Tool Usage:

sequenceDiagram
    participant User
    participant UI as Frontend
    participant API as llmchat_router
    participant Service as MCPChatService
    participant Agent as ReAct Agent
    participant MCP as MCP Server
    participant LLM as LLM Provider
    
    User->>UI: Type message & send
    UI->>API: POST /llmchat/chat (streaming=true)
    API->>Service: chat_events(message)
    Service->>Agent: astream_events()
    
    loop Streaming Events
        Agent->>LLM: Generate response chunk
        LLM-->>Agent: Token
        Agent-->>Service: on_chat_model_stream
        Service-->>API: SSE: event=token
        API-->>UI: Display token
        
        alt Tool Invocation Needed
            Agent->>Agent: Identify tool need
            Agent->>MCP: on_tool_start
            Service-->>API: SSE: event=tool_start
            API-->>UI: Show "Tool: xyz running..."
            
            MCP->>MCP: Execute tool
            MCP-->>Agent: Tool result
            Agent-->>Service: on_tool_end
            Service-->>API: SSE: event=tool_end
            API-->>UI: Show "Tool: xyz completed"
            
            Agent->>LLM: Continue with tool result
        end
    end
    
    Service->>Service: Store in history
    Service-->>API: SSE: event=final
    API-->>UI: Complete response
    UI->>User: Display full conversation

Session State Management:

stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Connecting: POST /connect
    Connecting --> Connected: Success
    Connecting --> Error: Failure
    Error --> Disconnected: Cleanup
    
    Connected --> Processing: POST /chat
    Processing --> Streaming: streaming=true
    Processing --> Waiting: streaming=false
    
    Streaming --> StreamingToolUse: Tool needed
    StreamingToolUse --> Streaming: Tool complete
    Streaming --> Connected: Message complete
    
    Waiting --> Connected: Response received
    
    Connected --> Disconnected: POST /disconnect
    Connected --> Error: Connection lost

🔗 MCP Standards Check

  • Change adheres to current MCP specifications
  • No breaking changes to existing MCP-compliant integrations
  • If deviations exist, please describe them below:

📓 Additional Context

Technical Implementation Details
Backend Stack:

  • FastAPI for REST API with async support
  • Pydantic for configuration validation and data models
  • LangChain for LLM abstraction and agent patterns
  • LangGraph for ReAct agent creation (create_react_agent)
  • langchain_mcp_adapters for MCP client implementation
  • langchain_openai, langchain_ollama for LLM provider integrations

SSE Event Types:

  • token: Streaming response content chunks
  • tool_start: Tool invocation initiated (id, name, input)
  • tool_end: Tool execution completed (id, output, timestamps)
  • tool_error: Tool execution failed (id, error message)
  • final: Complete response with metadata (content, tools_used, elapsed_ms)
  • error: Error occurred (error message, recoverable flag)

Environment Variable Fallbacks:
The system supports configuration via environment variables for easier deployment:

  • Enable or Disable LLM chat functionality
    LLMCHAT_ENABLED=true

  • LLM Providers (Optional):

  1. AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_API_VERSION, AZURE_OPENAI_MODEL

  2. OLLAMA_MODEL

  3. OPENAI_API_KEY, OPENAI_MODEL, OPENAI_BASE_URL
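
A minimal sketch of how these fallbacks might be read at startup (the default values shown are assumptions):

import os

LLMCHAT_ENABLED = os.getenv("LLMCHAT_ENABLED", "false").lower() == "true"
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")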

Configuration Models:

from typing import Dict, Literal, Optional, Union

from pydantic import BaseModel, Field


class MCPClientConfig(BaseModel):
    """Main configuration for MCP client."""

    mcp_server: MCPServerConfig = Field(..., description="MCP server configuration")
    llm: LLMConfig = Field(..., description="LLM provider configuration")
    chat_history_max_messages: int = Field(default=50, gt=0, description="Maximum messages to keep in chat history")
    enable_streaming: bool = Field(default=True, description="Enable streaming responses")

class MCPServerConfig(BaseModel):
    """Configuration for MCP server connection."""

    url: Optional[str] = Field(None, description="MCP server URL for streamable_http/sse transports")
    command: Optional[str] = Field(None, description="Command to run for stdio transport")
    args: Optional[list[str]] = Field(None, description="Arguments for stdio command")
    transport: Literal["streamable_http", "sse", "stdio"] = Field(default="streamable_http", description="Transport type for MCP connection")
    auth_token: Optional[str] = Field(None, description="Authentication token for the server")
    headers: Optional[Dict[str, str]] = Field(default=None, description="Additional headers for HTTP-based transports")

class LLMConfig(BaseModel):
    """Configuration for LLM provider."""

    provider: Literal["azure_openai", "ollama"] = Field(..., description="LLM provider type")
    config: Union[AzureOpenAIConfig, OllamaConfig] = Field(..., description="Provider-specific configuration")
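
The provider-specific models referenced by LLMConfig are not shown in this issue; a plausible sketch, with field names inferred from the scenarios above (they would need to be declared before LLMConfig):

class AzureOpenAIConfig(BaseModel):
    """Sketch: Azure OpenAI settings gathered in the connect dialog."""

    api_key: str
    endpoint: str
    deployment: str
    api_version: str
    temperature: float = 0.7
    max_tokens: int = 2000

class OllamaConfig(BaseModel):
    """Sketch: Ollama settings for a local instance."""

    model: str
    base_url: str = "http://localhost:11434"
    temperature: float = 0.5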

API Endpoints:

| Method | Endpoint                 | Purpose            | Request Body                                   | Response                                                |
| POST   | /llmchat/connect         | Initialize session | ConnectInput (user_id, server, llm, streaming) | status, user_id, provider, tool_count, tools[]          |
| POST   | /llmchat/chat            | Send message       | ChatInput (user_id, message, streaming)        | StreamingResponse (SSE) or JSON with response metadata  |
| POST   | /llmchat/disconnect      | End session        | DisconnectInput (user_id)                      | status, message                                         |
| GET    | /llmchat/status/:user_id | Check connection   | -                                              | user_id, connected (boolean)                            |
| GET    | /llmchat/config/:user_id | Retrieve config    | -                                              | Sanitized config (secrets removed)                      |
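
Example client usage of these endpoints (illustrative only; the gateway host/port and the exact request field names are assumptions):

import httpx

async def demo():
    async with httpx.AsyncClient(base_url="http://localhost:4444") as http:
        connect = await http.post("/llmchat/connect", json={
            "user_id": "user_123",
            "server": {"url": "https://gateway.example.com/servers/weather/mcp",
                       "transport": "streamable_http"},
            "llm": {"provider": "ollama", "config": {"model": "llama3"}},
            "streaming": True,
        })
        print(connect.json())  # status, provider, tool_count, tools[]

        chat = await http.post("/llmchat/chat", json={
            "user_id": "user_123",
            "message": "What's the weather in Paris?",
            "streaming": False,
        })
        print(chat.json())  # full text plus tool metadata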
