[BUG] Agent hanging on 5xx #1334

@AnirudhKonduru

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.18.x

Python Version

3.11

Operating System

Amazon Linux 2023

Installation Method

other

Steps to Reproduce

Unable to reproduce this in a standalone script.

Expected Behavior

Fail fast and propagate the error.

Actual Behavior

Hangs indefinitely.

Additional Context

We've noticed several instances of the agent hanging indefinitely on 4xx/5xx errors, much like the case described in #995 and subsequently fixed in #1169.

This may stem from the way we use the MCPClient instance in our service (we subclass it to add functionality). However, my hunch is that it's a race condition. After an uncaught exception closes the background thread's event loop, but before the thread itself has shut down, tool calls can still be scheduled onto that loop. During that window _is_session_active still returns True, so those tasks hang indefinitely, and the agent hangs with them.
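The suspected hang mechanism can be sketched with plain asyncio. This is a hypothetical model, not Strands' code: schedule_after_loop_stops and the thread/event names are invented for illustration. It assumes the failure leaves the background loop stopped but not yet closed while its thread is still alive. In that state asyncio.run_coroutine_threadsafe accepts the coroutine, but nothing ever runs it, so an untimed result() would block forever. (If the loop were already closed, run_coroutine_threadsafe would raise RuntimeError instead of hanging.)

```python
import asyncio
import concurrent.futures
import threading


def schedule_after_loop_stops(timeout: float) -> tuple[bool, str]:
    """Hypothetical model of the suspected race window (not Strands' implementation)."""
    loop = asyncio.new_event_loop()
    loop_stopped = threading.Event()
    release_thread = threading.Event()

    def run_background_loop() -> None:
        asyncio.set_event_loop(loop)
        loop.run_forever()      # returns once loop.stop() has run
        loop_stopped.set()
        release_thread.wait()   # the thread lingers, e.g. while cleanup runs

    thread = threading.Thread(target=run_background_loop, daemon=True)
    thread.start()

    # Simulate the session failing: the loop stops, but the thread has not exited yet.
    loop.call_soon_threadsafe(loop.stop)
    loop_stopped.wait()
    still_alive = thread.is_alive()  # an is_alive()-only check would say "active"

    async def tool_call() -> str:
        return "ok"

    # The loop is stopped but not closed, so scheduling succeeds...
    future = asyncio.run_coroutine_threadsafe(tool_call(), loop)
    try:
        # ...but nothing will ever run the task; without a timeout this blocks forever.
        outcome = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        outcome = "timed out"
    finally:
        release_thread.set()
        thread.join()
        loop.close()  # discards the never-run callback
    return still_alive, outcome


print(schedule_after_loop_stops(timeout=0.5))  # (True, 'timed out')
```

Python may also warn that tool_call was never awaited, which is another symptom of the same problem: the coroutine was accepted but never started.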

We also observed that the hanging stops if we patch MCPClient::_is_session_active to additionally check whether _close_future is done, i.e., whether the event loop can still accept new tasks.

(Happy to share service logs privately)

Possible Solution

This patch to the MCPClient::_is_session_active function appears to fix it.

```python
from strands.tools.mcp import MCPClient


def _patched_is_session_active(self) -> bool:
    # No background thread, or it has already exited: the session is gone.
    if self._background_thread is None or not self._background_thread.is_alive():
        return False
    # Check if the close_future has been set, indicating the session is closing/closed.
    # This detects session failure during the race condition window BEFORE the thread
    # fully exits, preventing retry attempts from hanging in run_coroutine_threadsafe.
    if self._close_future is not None and self._close_future.done():
        return False
    return True


# Apply the monkey patch
MCPClient._is_session_active = _patched_is_session_active
```
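To sanity-check the patched predicate without a live MCP server, its logic can be exercised against stub objects. This sketch copies the patch's checks into a free function. The SimpleNamespace stubs and the use of concurrent.futures.Future for _close_future are assumptions for illustration, since the issue does not show the real attribute types (any object with a done() method works here). The alive_closed case corresponds to the race window: the thread is still alive, but the close future is already done.

```python
import concurrent.futures
import threading
from types import SimpleNamespace


def patched_is_session_active(client) -> bool:
    # Same checks as the patch above, as a free function so it can run on stubs.
    if client._background_thread is None or not client._background_thread.is_alive():
        return False
    if client._close_future is not None and client._close_future.done():
        return False
    return True


def check_cases() -> dict[str, bool]:
    release = threading.Event()
    alive_thread = threading.Thread(target=release.wait, daemon=True)
    alive_thread.start()  # stays alive until release is set

    pending = concurrent.futures.Future()  # session still running
    closed = concurrent.futures.Future()
    closed.set_result(None)                # session failed/closed

    def stub(thread, close_future):
        # Hypothetical stand-in exposing only the two attributes the patch reads.
        return SimpleNamespace(_background_thread=thread, _close_future=close_future)

    try:
        return {
            "no_thread": patched_is_session_active(stub(None, None)),
            "alive_pending": patched_is_session_active(stub(alive_thread, pending)),
            "alive_closed": patched_is_session_active(stub(alive_thread, closed)),
        }
    finally:
        release.set()
        alive_thread.join()


print(check_cases())
# {'no_thread': False, 'alive_pending': True, 'alive_closed': False}
```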

Related Issues

#995
