Skip to content

Fix/timeout response handling #350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

cobach
Copy link

@cobach cobach commented Jun 29, 2025

Title:
Fix timeout issue in McpServerSession response handling

Description:

This PR fixes a timeout issue in McpServerSession where responses were not being sent to clients due to incorrect reactive stream operator chaining.

Motivation and Context

The MCP SDK v0.10.0 has a bug where the server receives requests but never sends responses, causing clients to timeout. This occurs because the flatMap operator was placed after onErrorResume in the response
handling chain. When an error occurred, the error handler would send the error response and return Mono.empty(), leaving nothing for the subsequent flatMap to process.

How Has This Been Tested?

  • Created McpServerSessionTimeoutTest that specifically tests the timeout scenario
  • The test verifies that the server responds to tools/list requests within the timeout period
  • Tested both success and error paths to ensure responses are always sent
  • All existing tests continue to pass

Breaking Changes

No breaking changes. This is a bug fix that maintains the existing API and behavior, only ensuring that responses are actually sent as intended.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The fix reorders the reactive operators to ensure responses are always sent:

  • Moved flatMap before onErrorResume
  • Removed unnecessary then(Mono.empty()) in error handler
  • Both success and error paths now properly send responses via transport

Code change:

// Before
return handleIncomingRequest(request)
    .onErrorResume(error -> {
        // ... send error response
        return this.transport.sendMessage(errorResponse).then(Mono.empty());
    })
    .flatMap(this.transport::sendMessage);

// After
return handleIncomingRequest(request)
    .flatMap(response -> this.transport.sendMessage(response))
    .onErrorResume(error -> {
        // ... send error response
        return this.transport.sendMessage(errorResponse);
    });

tzolov and others added 4 commits May 12, 2025 14:39
Signed-off-by: Christian Tzolov <[email protected]>
Signed-off-by: Christian Tzolov <[email protected]>
The server was not sending responses due to incorrect reactive stream
operator chaining. The flatMap operator was placed after onErrorResume,
which caused the response sending logic to be skipped when errors occurred.

This fix reorders the operators to ensure responses are always sent:
- Move flatMap before onErrorResume
- Remove unnecessary then(Mono.empty()) in error handler
- Ensure both success and error paths send responses via transport

Added test to verify the fix works correctly.

Fixes timeout issues reported when using MCP SDK v0.10.0 with clients.
Resolved version conflicts by accepting main branch version (0.11.0-SNAPSHOT)
@chemicL
Copy link
Member

chemicL commented Jul 1, 2025

Thanks for taking the time to describe the issue and open a PR. Unfortunately, I disagree with this change. The order is such that in the case of failure to process the message by the server, the transport is used to communicate an error back to the client. In a regular case, a flatMap that follows will have the lambda triggered with an actual response. The order is correct. The test that you provided passes when I reverted the change you introduced, so it's not highlighting a problem.

@chemicL chemicL closed this Jul 1, 2025
@cobach
Copy link
Author

cobach commented Jul 1, 2025

Closing this PR - Our apologies

I'm closing this PR as it was based on an incorrect diagnosis.

@chemicL - I sincerely apologize for wasting your time. You were absolutely right about the operator ordering, and we should have done more thorough testing before opening this PR.

What happened:

  • We experienced timeouts in our application and incorrectly assumed it was due to the SDK's response handling
  • We didn't create proper tests to reproduce the issue before proposing a fix
  • After your feedback, we did comprehensive testing and confirmed the SDK works perfectly as designed

Our testing now shows:

  • The current operator order correctly handles both success and error cases
  • No timeout issues exist in SDK v0.10.0
  • The problem was in our application code, not the SDK

Thank you for your patience and for taking the time to explain why the current implementation is correct. This has been a valuable learning experience about the importance of thoroughly testing and understanding issues before proposing changes.

We'll be more careful in the future to ensure we have reproducible test cases before opening PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants