Async concurrency regression with 5.29.0 #2446
Comments
@DefiDebauchery any chance you can try what is in #2409 and see if you still have this problem? If it fixes it, I can put in a PR to add the same change to the v5 branch. |
Unfortunately, this does not change the results. Different line numbers for error output, though:
|
Oops, thanks for the report @DefiDebauchery. Putting this near the top of my list to look at this week. |
@DefiDebauchery the default cache size is 8, so you have to bump your cache size to 14 or however many connections you're trying to make. I didn't look very closely, but I don't think there's a straightforward API to do that right at the moment; I can get to it in a little bit, or review if someone else gets to it first. Edit: Here is where it gets set. I do think we need to allow users to configure this, but I don't really have a good idea about where to allow that customization. First thought is to allow a `cache_size` kwarg on the provider:

```python
rpc = Web3(
    AsyncHTTPProvider(
        uri,
        request_kwargs={'timeout': 10},
        cache_size=20,
    ),
    modules={'eth': AsyncEth, 'net': AsyncNet},
    middlewares=[]
)
```

Open to other ideas too! |
It seems awkward to have each individual provider instance explicitly set the cache size. Am I wrong that SessionCache is initialized once? So wouldn't it make more sense to instantiate it with the requested size, then pass the instance into the Provider? Pedantically, I guess it's 6 in one hand / half-dozen in the other - but any ability to modify this is certainly welcomed. |
@DefiDebauchery if you change the cache size, does that fix the issue? You are correct that the cache is only instantiated once across all providers, so we need to think about the best way to make that configurable. One short-term fix could be to bump the default cache size to something higher, like 20 or 30. It doesn't seem like that would be too much of a memory issue; it would be more of an issue with holding all those connections open simultaneously. We could also make the request session an instance on each provider instead of using it as a singleton, but that seems way more intrusive and probably significant work. @kclowes thoughts? |
I think bumping the hard-coded value to 20 or 30 seems like a good step for now. I'll get that PRed in a little bit. Right now, the API looks like:

```python
w3 = Web3(
    AsyncHTTPProvider(
        uri,
        request_kwargs={'timeout': 10},
    ),
    modules={'eth': AsyncEth, 'net': AsyncNet},
    middlewares=[]
)

custom_session = ClientSession()  # If you want to pass in your own session
await w3.provider.cache_async_session(custom_session)
```

@DefiDebauchery are you suggesting something like:

```python
session_cache = SessionCache(size=25)
w3 = Web3(
    AsyncHTTPProvider(
        uri,
        request_kwargs={'timeout': 10},
        session_cache=session_cache,  # proposed: pass the shared cache into the provider
    ),
    modules={'eth': AsyncEth, 'net': AsyncNet},
    middlewares=[]
)
```

Tagging @fselmo for opinions too! |
@kclowes Either that, or allowing |
@DefiDebauchery I bumped the default cache size to 20 in v5.29.1 as a band-aid fix. I'm currently looking into what the actual fix might be. |
@kclowes this is the fix for the async cache bug. I am closing the connection outside of the lock. What was happening was that since ClientSession.close() is async, awaiting it passed execution off to the next async task while the first task was still stuck inside the thread lock, so a deadlock occurred.

After stepping through here more, I realized it is SUPER important for the session cache to be larger than the number of sessions needed. If it is not, it will continue to cycle sessions in and out of the cache. Once the await of ClientSession.close() finally finishes, the session being returned may not even be in the cache anymore, because it was evicted by a different task. This means there is still a race condition here: a session could be returned from the cache and then evicted by another task, which then closes it before it can be used.

I almost think we should programmatically set the cache size, emit a warning if the cache grows over a certain size, and then have a second hard limit it can't grow above. A couple of things to think about: there could be some sort of connection limit on the underlying OS that we might hit, and memory-constrained apps might have issues.

I also wonder if we should do something like this. It is not thread-safe, but are consumers going to mix threading and asyncio execution? If the consumer will be doing either threading on the blocking request or asyncio on the aiohttp session, then maybe we should move to the above-linked approach for the async cache. Thoughts?? |
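[Editor's note: a minimal sketch of the close-outside-the-lock pattern described in the comment above, using hypothetical names rather than web3.py's actual internals. The cache is mutated while the lock is held, but the awaited close() runs only after the lock is released, so a task suspended on the close cannot deadlock other tasks waiting for the lock.]

```python
import threading
import aiohttp

_cache_lock = threading.Lock()                         # hypothetical lock
_session_cache: dict[str, aiohttp.ClientSession] = {}  # hypothetical cache

async def cache_session(key: str, session: aiohttp.ClientSession) -> aiohttp.ClientSession:
    with _cache_lock:
        # Swap entries under the lock, but do NOT await in here:
        # awaiting suspends this task while the lock is still held,
        # and any other task that needs the lock then deadlocks.
        evicted = _session_cache.pop(key, None)
        _session_cache[key] = session
    if evicted is not None:
        # Close the evicted session outside the lock; other tasks can
        # acquire the lock while this close() is in flight.
        await evicted.close()
    return session
```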
@dbfreem I have not forgotten about this! I hope to get to a more thorough review of this comment and the outstanding session PR this week. I think we want to try to have it be thread-safe, but I need to spend some time thinking about how that might work 🤔 |
I will try to take a look this week as well @kclowes 👀 |
Resolved in #2409 for web3.py |
@dbfreem do you think raising the cache size to a reasonably higher value is enough for your needs, or do you specifically want to be able to set the cache size? I think with PR #2409 and a reasonably high cache size, 99% of cases should not have issues. Open to looking into configuring the cache size, though; it just might take a back seat for a bit on our end. |
Being able to set the cache easily would be really helpful. Right now, the cache size is 20, and at the moment I'm running 16 concurrent endpoints to monitor the chains our platform has a presence on. This is set to increase within the next 6 months. Granted, I could hardcode a change, but being able to set or otherwise override this in 'userspace' would be better. |
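[Editor's note: one hedged workaround sketch for the 'userspace' override mentioned above, assuming (per the earlier comments in this thread) that v5 keeps a single module-level SessionCache in web3._utils.request; these internal names are assumptions and may differ between releases.]

```python
from web3._utils import request as request_utils

# Assumption: web3.py v5 initializes a module-level cache along the
# lines of `_session_cache = SessionCache(size=20)` in
# web3._utils.request. Replacing it before any provider makes a
# request would raise the effective limit; these names are internal
# and may change without notice.
request_utils._session_cache = request_utils.SessionCache(size=50)
```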
Yeah, I'd definitely say 20 is on the low side considering how little memory that takes up. I imagine increasing this number to 50 or above for now. We'd definitely welcome a PR for customizing the cache size; otherwise we can make an issue to track it and get to it when we have some more time, as I see this as a nice-to-have feature. I'm all for making the library as customizable as possible, as long as it makes sense to, so I'd be on board with that. |
closed by #2409 |
Side note @DefiDebauchery, cache size is going up to |
Ah, I forgot this is in |
closed by #2691 |
What was wrong?
Async code with the AsyncHTTPProvider hangs in 5.29.0; previously worked fine in 5.28.0.
(Shoddy) code sample:
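[Editor's note: the sample itself did not survive here. Below is a hypothetical reconstruction of the usage pattern described in this thread, many AsyncHTTPProvider instances queried concurrently with asyncio.gather; the endpoint URLs, the count, and the specific call are assumptions.]

```python
import asyncio

from web3 import Web3
from web3.eth import AsyncEth
from web3.net import AsyncNet
from web3.providers.async_rpc import AsyncHTTPProvider

# Stand-in endpoints; the thread mentions 14-16 concurrent connections.
ENDPOINTS = [f"https://rpc.example-{i}.invalid" for i in range(14)]

async def block_number(uri: str) -> int:
    w3 = Web3(
        AsyncHTTPProvider(uri, request_kwargs={'timeout': 10}),
        modules={'eth': AsyncEth, 'net': AsyncNet},
        middlewares=[],
    )
    return await w3.eth.block_number

async def main() -> None:
    # On 5.28.0 this prints one block number per endpoint;
    # on 5.29.0 the gather hangs as described below.
    print(await asyncio.gather(*(block_number(uri) for uri in ENDPOINTS)))

asyncio.run(main())
```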
With 5.28.0, the output is as expected.
After updating to 5.29.0, the output hangs. When I abort the program, the error output references a session cache lock (edited for readability).
I do see that some SessionCache code is included in 5.29.0. My use case may be quite different from what was intended by the SessionCache, and it is likely related to my report in #2254 (ack, my apologies for not following up on that, @kclowes!).