
[block-sync] Don't hold client-cache lock while connecting #3197


Merged

Conversation

TheBlueMatt
Collaborator

`lightning-block-sync`'s REST and RPC clients both hold a cache for a connected client to avoid the extra connection round-trip on each request. Because only one client can be using a connection at once, the connection is `take()`n out of an `Option` behind a `Mutex`, and if there isn't one present, we call `HttpClient::connect` to build a new one.

However, this full logic is completed in one statement, causing the client-cache lock to be held during `HttpClient::connect`. This can turn into quite a bit of contention when using these clients as gossip verifiers, as we can create many requests back-to-back during startup.

I noticed this because my node only seemed to be saturating one core during startup; a backtrace showed several threads blocked on this mutex when hitting a Bitcoin Core node over REST that is on the same LAN, but not the same machine.

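To make the locking issue concrete, here is a minimal sketch of the before/after pattern. `HttpClient`, `RestClient`, and `request` are heavily simplified stand-ins for the real types in `lightning-block-sync/src/rest.rs`; only the lock-handling shape matches the actual change.

```rust
use std::sync::Mutex;

struct HttpClient;

impl HttpClient {
    fn connect(_host: &str) -> std::io::Result<HttpClient> {
        // Stand-in for the real TCP connect: the slow step that should
        // not run while the client-cache lock is held.
        Ok(HttpClient)
    }
}

struct RestClient {
    client: Mutex<Option<HttpClient>>,
}

impl RestClient {
    fn request(&self, host: &str) -> std::io::Result<()> {
        // Before the fix, the take-or-connect logic was one statement:
        //
        //     let mut client = if let Some(client) =
        //         self.client.lock().unwrap().take() { client }
        //         else { HttpClient::connect(host)? };
        //
        // The temporary MutexGuard lives until the end of the whole
        // `if let` expression, so connect() ran with the lock held,
        // serializing all other callers behind it.
        //
        // After the fix, take() gets its own statement; the guard is
        // dropped when that statement ends, so connect() runs without
        // holding the client-cache lock.
        let reserved_client = self.client.lock().unwrap().take();
        let client = match reserved_client {
            Some(client) => client,
            None => HttpClient::connect(host)?,
        };
        // ... issue the request on `client`, then return it to the cache ...
        let _ = client;
        Ok(())
    }
}
```

The subtlety is Rust's temporary-lifetime rule: a temporary created in an `if let` scrutinee (here, the `MutexGuard`) is kept alive for the entire `if let` block, which is why splitting the `take()` into its own statement releases the lock early.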

codecov bot commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.73%. Comparing base (9ce3dd5) to head (7945af7).

Files                              Patch %   Lines
lightning-block-sync/src/rest.rs   50.00%    0 Missing and 1 partial ⚠️
lightning-block-sync/src/rpc.rs    50.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3197      +/-   ##
==========================================
- Coverage   89.78%   89.73%   -0.05%     
==========================================
  Files         121      121              
  Lines      100932   100935       +3     
  Branches   100932   100935       +3     
==========================================
- Hits        90619    90578      -41     
- Misses       7635     7672      +37     
- Partials     2678     2685       +7     


@tnull tnull self-requested a review July 22, 2024 07:38
Contributor

@tnull tnull left a comment


Would this have us go from being bottlenecked on a single client/connection to potentially spawning an unbounded number of clients in the face of many parallel requests? Do we need to limit the number of workers here?

@TheBlueMatt
Collaborator Author

TheBlueMatt commented Jul 22, 2024

No, we'd always generate as many clients as are needed to make all pending requests in parallel; this just ensures we don't block waiting on a common lock while doing so. I don't think we need to worry too much about limiting pending requests at the client level; the callers should probably have some mechanism to do so (which at least gossip currently does).

I mean, we could limit it at the client level too, but given we currently do it at the callsite, that would just be dead code. Nothing wrong with doing it at the client, though; it's just separate from this PR.
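For illustration only, here is what such a caller-side bound on in-flight requests could look like; nothing below is from this PR. The use of `tokio::sync::Semaphore`, the limit of 8, and the `do_request` helper are all hypothetical.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn do_request(_resource: &str) {
    // Hypothetical: issue one REST/RPC request here.
}

// Fan out many requests but cap how many run at once, so an empty
// client cache can never trigger an unbounded number of connects.
async fn fetch_all(resources: Vec<String>) {
    let limit = Arc::new(Semaphore::new(8)); // at most 8 in flight
    let mut tasks = Vec::new();
    for resource in resources {
        let limit = Arc::clone(&limit);
        tasks.push(tokio::spawn(async move {
            // Waits (asynchronously) once 8 requests are in flight.
            let _permit = limit.acquire().await.expect("semaphore closed");
            do_request(&resource).await;
        }));
    }
    for task in tasks {
        let _ = task.await;
    }
}
```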

Contributor

@tnull tnull left a comment


Alright, makes sense.

@@ -34,7 +34,8 @@ impl RestClient {
 	{
 		let host = format!("{}:{}", self.endpoint.host(), self.endpoint.port());
 		let uri = format!("{}/{}", self.endpoint.path().trim_end_matches("/"), resource_path);
-		let mut client = if let Some(client) = self.client.lock().unwrap().take() {
+		let reserved_client = self.client.lock().unwrap().take();
Contributor


Want to note that it's a bit weird this is race-y, as we'd create and drop unnecessary `HttpClient`s for parallel calls, but this is pre-existing anyway.

Collaborator Author


I mean, except for the extra memory usage or too many parallel calls, it doesn't really matter. Caching the TCP socket is nice, but certainly not a requirement for things to work.
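To spell out why the race is tolerable, here is a sketch of the put-back step under the same simplified types as the earlier sketch (the function name `return_client` is hypothetical):

```rust
use std::sync::Mutex;

struct HttpClient; // stand-in for the real connected client

// When a request finishes, its client goes back into the cache. If two
// callers both saw `None` and each opened a fresh connection, the second
// put-back overwrites the first and the displaced client is dropped: one
// wasted TCP connection, but nothing breaks, since the cache is only an
// optimization.
fn return_client(cache: &Mutex<Option<HttpClient>>, client: HttpClient) {
    *cache.lock().unwrap() = Some(client);
}
```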

@TheBlueMatt TheBlueMatt merged commit 2b1d6aa into lightningdevkit:main Jul 22, 2024
19 of 21 checks passed