
[block-sync] Don't hold client-cache lock while connecting #3197


Merged

Conversation

TheBlueMatt
Collaborator

`lightning-block-sync`'s REST and RPC clients both hold a cache for a connected client to avoid the extra connection round-trip on each request. Because only one client can be using a connection at once, the connection is `take()`n out of an `Option` behind a `Mutex`, and if there isn't one present, we call `HttpClient::connect` to build a new one.

However, this full logic is completed in one statement, causing the client-cache lock to be held during `HttpClient::connect`. This can turn into quite a bit of contention when using these clients as gossip verifiers, as we can create many requests back-to-back during startup.

I noticed this because my node only seemed to be saturating one core during startup; a backtrace showed several threads blocked on this mutex when hitting a Bitcoin Core node over REST that is on the same LAN, but not the same machine.

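To make the locking issue concrete, here is a minimal sketch of the before/after pattern. `HttpClient`, `RestClient`, and `request` are heavily simplified stand-ins for the real types in `lightning-block-sync/src/rest.rs`; only the lock-handling shape matches the actual change.

```rust
use std::sync::Mutex;

struct HttpClient;

impl HttpClient {
    fn connect(_host: &str) -> std::io::Result<HttpClient> {
        // Stand-in for the real TCP connect: the slow step that should
        // not run while the client-cache lock is held.
        Ok(HttpClient)
    }
}

struct RestClient {
    client: Mutex<Option<HttpClient>>,
}

impl RestClient {
    fn request(&self, host: &str) -> std::io::Result<()> {
        // Before the fix, the take-or-connect logic was one statement:
        //
        //     let mut client = if let Some(client) =
        //         self.client.lock().unwrap().take() { client }
        //         else { HttpClient::connect(host)? };
        //
        // The temporary MutexGuard lives until the end of the whole
        // `if let` expression, so connect() ran with the lock held,
        // serializing all other callers behind it.
        //
        // After the fix, take() gets its own statement; the guard is
        // dropped when that statement ends, so connect() runs without
        // holding the client-cache lock.
        let reserved_client = self.client.lock().unwrap().take();
        let client = match reserved_client {
            Some(client) => client,
            None => HttpClient::connect(host)?,
        };
        // ... issue the request on `client`, then return it to the cache ...
        let _ = client;
        Ok(())
    }
}
```

The subtlety is Rust's temporary-lifetime rule: a temporary created in an `if let` scrutinee (here, the `MutexGuard`) is kept alive for the entire `if let` block, which is why splitting the `take()` into its own statement releases the lock early.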

codecov bot commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.73%. Comparing base (9ce3dd5) to head (7945af7).

Files                              Patch %   Lines
lightning-block-sync/src/rest.rs   50.00%    0 Missing and 1 partial ⚠️
lightning-block-sync/src/rpc.rs    50.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3197      +/-   ##
==========================================
- Coverage   89.78%   89.73%   -0.05%     
==========================================
  Files         121      121              
  Lines      100932   100935       +3     
  Branches   100932   100935       +3     
==========================================
- Hits        90619    90578      -41     
- Misses       7635     7672      +37     
- Partials     2678     2685       +7     


@tnull tnull self-requested a review July 22, 2024 07:38
Contributor

@tnull tnull left a comment


Would this have us go from being bottlenecked on a single client/connection to potentially spawning an unbounded number of clients in the face of many parallel requests? Do we need to limit the number of workers here?

@TheBlueMatt
Collaborator Author

TheBlueMatt commented Jul 22, 2024

No, we'd always generate as many clients as are needed to make all pending requests in parallel; this just ensures we don't block waiting on a common lock while doing so. I don't think we need to worry too much about limiting pending requests at the client level; the callers should probably have some mechanism to do so (which at least gossip currently does).

I mean, we could limit it at the client level too, but given we currently do it at the callsite, that would just be dead code. Nothing wrong with doing it at the client, though; it's just separate from this PR.
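For illustration only, here is what such a caller-side bound on in-flight requests could look like; nothing below is from this PR. The use of `tokio::sync::Semaphore`, the limit of 8, and the `do_request` helper are all hypothetical.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn do_request(_resource: &str) {
    // Hypothetical: issue one REST/RPC request here.
}

// Fan out many requests but cap how many run at once, so an empty
// client cache can never trigger an unbounded number of connects.
async fn fetch_all(resources: Vec<String>) {
    let limit = Arc::new(Semaphore::new(8)); // at most 8 in flight
    let mut tasks = Vec::new();
    for resource in resources {
        let limit = Arc::clone(&limit);
        tasks.push(tokio::spawn(async move {
            // Waits (asynchronously) once 8 requests are in flight.
            let _permit = limit.acquire().await.expect("semaphore closed");
            do_request(&resource).await;
        }));
    }
    for task in tasks {
        let _ = task.await;
    }
}
```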

Contributor

@tnull tnull left a comment


Alright, makes sense.

@@ -34,7 +34,8 @@ impl RestClient {
 	{
 		let host = format!("{}:{}", self.endpoint.host(), self.endpoint.port());
 		let uri = format!("{}/{}", self.endpoint.path().trim_end_matches("/"), resource_path);
-		let mut client = if let Some(client) = self.client.lock().unwrap().take() {
+		let reserved_client = self.client.lock().unwrap().take();
Contributor


Want to note that it's a bit weird this is race-y, as we'd create and drop unnecessary `HttpClient`s for parallel calls, but this is pre-existing anyway.

Collaborator Author


I mean, except for the extra memory usage or too many parallel calls, it doesn't really matter. Caching the TCP socket is nice, but certainly not a requirement for things to work.
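To spell out why the race is tolerable, here is a sketch of the put-back step under the same simplified types as the earlier sketch (the function name `return_client` is hypothetical):

```rust
use std::sync::Mutex;

struct HttpClient; // stand-in for the real connected client

// When a request finishes, its client goes back into the cache. If two
// callers both saw `None` and each opened a fresh connection, the second
// put-back overwrites the first and the displaced client is dropped: one
// wasted TCP connection, but nothing breaks, since the cache is only an
// optimization.
fn return_client(cache: &Mutex<Option<HttpClient>>, client: HttpClient) {
    *cache.lock().unwrap() = Some(client);
}
```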

@TheBlueMatt TheBlueMatt merged commit 2b1d6aa into lightningdevkit:main Jul 22, 2024
19 of 21 checks passed