[block-sync] Don't hold client-cache lock while connecting #3197
Conversation
`lightning-block-sync`'s REST and RPC clients both hold a cache for a connected client to avoid the extra connection round-trip on each request. Because only one client can be using a connection at once, the connection is `take()`n out of an `Option` behind a `Mutex`, and if there isn't one present, we call `HttpClient::connect` to build a new one.

However, this whole sequence is written as a single statement, so the client-cache lock is held across `HttpClient::connect`. This can turn into quite a bit of contention when using these clients as gossip verifiers, since we create many requests back-to-back during startup.

I noticed this because my node only seemed to be saturating one core during startup, and a backtrace showed several threads blocked on this mutex while hitting a Bitcoin Core node over REST that is on the same LAN, but not the same machine.
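For illustration, here's a minimal sketch of the resulting pattern (not the actual `lightning-block-sync` code; `reserve_client`, `endpoint_host`, and the stub types are simplified stand-ins), showing the cache lock held only for the `take()`, with `HttpClient::connect` running after the guard has been dropped:

```rust
use std::io;
use std::sync::Mutex;

// Simplified stand-in for lightning-block-sync's HTTP client; only the shape
// of `connect` matters for this sketch.
struct HttpClient;

impl HttpClient {
    fn connect(_host: &str) -> io::Result<HttpClient> {
        // The real implementation opens a TCP connection here, which can take
        // a while when the node lives on another machine.
        Ok(HttpClient)
    }
}

struct RestClient {
    endpoint_host: String,
    // The client cache: at most one idle connection is kept around.
    client: Mutex<Option<HttpClient>>,
}

impl RestClient {
    // Hypothetical helper name, used here only to isolate the locking pattern.
    fn reserve_client(&self) -> io::Result<HttpClient> {
        // Take the cached client (if any) while holding the lock only for this
        // one statement; the temporary MutexGuard is dropped immediately.
        let reserved_client = self.client.lock().unwrap().take();

        // Connecting happens *outside* the lock, so parallel requests no
        // longer serialize on the client-cache mutex while a TCP handshake is
        // in flight. Previously the `if let Some(..) = ...lock()...take()`
        // expression kept the guard alive across the `connect` branch.
        match reserved_client {
            Some(client) => Ok(client),
            None => HttpClient::connect(&self.endpoint_host),
        }
    }
}
```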
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3197      +/-   ##
==========================================
- Coverage   89.78%   89.73%   -0.05%
==========================================
  Files         121      121
  Lines      100932   100935       +3
  Branches   100932   100935       +3
==========================================
- Hits        90619    90578      -41
- Misses       7635     7672      +37
- Partials     2678     2685       +7

☔ View full report in Codecov by Sentry.
Would this have us go from being bottlenecked on a single client/connection to potentially spawning an unbounded number of clients in the face of many parallel requests? Do we need to limit the number of workers here?
No, we'd always generate as many clients as are needed to make all pending requests in parallel; this just ensures we don't block waiting on a common lock while doing so. I don't think we need to worry too much about limiting pending requests at the client level, the callers should probably have some mechanism to do so (which at least gossip currently does). I mean we could limit it at the client level too, but given we currently do it at the callsite that's just dead code. Nothing wrong with doing it at the client, though, just separate from this PR.
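As a purely illustrative sketch (not part of this PR, and not necessarily how the gossip verifier's existing mechanism works), a caller could bound in-flight requests with a semaphore so a burst of lookups can't open an unbounded number of connections:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical call-site throttle: cap the number of concurrent lookups so a
// startup burst can't open an unbounded number of HTTP connections.
async fn lookup_all(short_channel_ids: Vec<u64>) {
    let limit = Arc::new(Semaphore::new(8)); // assumed cap, not from the PR
    let mut tasks = Vec::new();
    for scid in short_channel_ids {
        let limit = Arc::clone(&limit);
        tasks.push(tokio::spawn(async move {
            // Permit is released when `_permit` is dropped at the end of the task.
            let _permit = limit.acquire_owned().await.expect("semaphore never closed");
            lookup_utxo(scid).await
        }));
    }
    for task in tasks {
        let _ = task.await;
    }
}

// Stand-in for the actual REST/RPC UTXO lookup.
async fn lookup_utxo(_scid: u64) {}
```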
Alright, makes sense.
@@ -34,7 +34,8 @@ impl RestClient {
 	{
 		let host = format!("{}:{}", self.endpoint.host(), self.endpoint.port());
 		let uri = format!("{}/{}", self.endpoint.path().trim_end_matches("/"), resource_path);
-		let mut client = if let Some(client) = self.client.lock().unwrap().take() {
+		let reserved_client = self.client.lock().unwrap().take();
Want to note that it's a bit weird this is race-y, as we'd create and drop unnecessary `HttpClient`s for parallel calls, but this is pre-existing anyways.
I mean, aside from the extra memory usage or too many parallel calls, it doesn't really matter. Caching the TCP socket is nice, but certainly not a requirement for things to work.
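For context, a hedged sketch of the pre-existing race being described (the `return_client` helper and its exact stow-back logic are hypothetical, not the real code): when two parallel calls both find the cache empty, each connects its own client, and only one of them can be put back afterwards.

```rust
use std::sync::Mutex;

struct HttpClient; // stand-in for the real client type

// Hypothetical stow-back step: the cache is an `Option`, so if two parallel
// requests each built their own connection, only one can be kept and the
// other is dropped, closing its socket.
fn return_client(cache: &Mutex<Option<HttpClient>>, client: HttpClient) {
    let mut cached = cache.lock().unwrap();
    if cached.is_none() {
        *cached = Some(client);
    }
    // else: `client` is dropped here -- a wasted connect/teardown, which is
    // exactly the pre-existing (and acceptable) raciness noted above.
}
```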