-
Notifications
You must be signed in to change notification settings - Fork 409
Speed up deserialization of secp256k1 objects significantly #933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up deserialization of secp256k1 objects significantly #933
Conversation
Codecov Report
@@ Coverage Diff @@
## main #933 +/- ##
==========================================
+ Coverage 90.53% 90.65% +0.11%
==========================================
Files 60 60
Lines 30697 30908 +211
==========================================
+ Hits 27793 28019 +226
+ Misses 2904 2889 -15
Continue to review full report at Codecov.
|
@@ -367,7 +367,7 @@ pub(super) fn process_onion_failure<T: secp256k1::Signing, L: Deref>(secp_ctx: & | |||
const NODE: u16 = 0x2000; | |||
const UPDATE: u16 = 0x1000; | |||
|
|||
let error_code = byte_utils::slice_to_be16(&error_code_slice); | |||
let error_code = u16::from_be_bytes(error_code_slice.try_into().expect("len is 2")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the message will prefix the error, so maybe something specific to the call site instead for these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sure hope we never see it anywhere, so I'll just leave it and pretend its a comment.
$( len_calc += self.$field.serialized_length(); )* | ||
if $len != 0 { | ||
// In tests, assert that the hard-coded length matches the actual one | ||
assert_eq!(len_calc, $len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be covered during fuzzing? Seems like a useful check but codecov thinks it isn't hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
9517d9c
to
b8eea10
Compare
NetworkGraph is one of the largest structures we generally deserialize, so it makes for a good benchmark, even if it isn't the most complicated one. As of this commit, on an Intel 2687W v3, these benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,101,420,078 ns/iter (+/- 6,649,020) test routing::network_graph::benches::write_network_graph ... bench: 344,696,835 ns/iter (+/- 229,061)
This makes a small difference for NetworkGraph deserialization as it enables more inlining across different files, hopefully better matching user performance as well. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,037,875,071 ns/iter (+/- 760,370) test routing::network_graph::benches::write_network_graph ... bench: 320,561,557 ns/iter (+/- 176,343)
Now that our MSRV supports the native methods, we have no need for the helpers anymore. Because LLVM was already matching our byte_utils methods as byteswap functions, this should have no impact on generated (optimzied) code. This removes most of the byte_utils usage, though some remains to keep the patch size reasonable.
When writing out libsecp256k1 objects during serialization in a TLV, we potentially calculate the TLV length twice before performing the actual serialization (once when calculating the total TLV-stream length and once when calculating the length of the secp256k1-object-containing TLV). Because the lengths of secp256k1 objects is a constant, we'd ideally like LLVM to entirely optimize out those calls and simply know the expected length. However, without cross-language LTO, there is no way for LLVM to verify that there are no side-effects of the calls to libsecp256k1, leaving LLVM with no way to optimize them out. This commit adds a new method to `Writeable` which returns the length of an object once serialized. It is implemented by default using `LengthCalculatingWriter` (which LLVM generally optimizes out for Rust objects) and overrides it for libsecp256k1 objects. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,035,402,164 ns/iter (+/- 1,855,357) test routing::network_graph::benches::write_network_graph ... bench: 308,235,267 ns/iter (+/- 140,202)
With the new `serialized_length()` method potentially being significantly more efficient than `LengthCalculatingWriter`, this commit ensures we call `serialized_length()` when calculating length of a larger struct. Specifically, prior to this commit a call to `serialized_length()` on a large object serialized with `impl_writeable`, `impl_writeable_len_match`, or `encode_varint_length_prefixed_tlv` (and `impl_writeable_tlv_based`) would always serialize all inner fields of that object using `LengthCalculatingWriter`. This would ignore any `serialized_length()` overrides by inner fields. Instead, we override `serialized_length()` on all of the above by calculating the serialized size using calls to `serialized_length()` on inner fields. Further, writes to `LengthCalculatingWriter` should never fail as its `write` method never returns an error. Thus, any write failures indicate a bug in an object's write method or in our object-creation sanity checking. We `.expect()` such write calls here. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,039,451,296 ns/iter (+/- 4,329,821) test routing::network_graph::benches::write_network_graph ... bench: 166,685,412 ns/iter (+/- 352,537)
This substantially improves deserialization performance when LLVM decides not to inline many short methods, eg when not building with LTO/codegen-units=1. Even with the default bench params of LTO/codegen-units=1, the serialization benchmarks on an Intel 2687W v3 take: test routing::network_graph::benches::read_network_graph ... bench: 1,955,616,225 ns/iter (+/- 4,135,777) test routing::network_graph::benches::write_network_graph ... bench: 165,905,275 ns/iter (+/- 118,798)
b8eea10
to
7231b4a
Compare
Squashed with no diff:
|
ACK 7231b4a |
When we switched to TLV-ing nearly all objects we ended up with a substantial performance regression when writing secp256k1 objects. The chanmon_consistency_test fuzz target alone increased enough to nearly double our CI fuzz test time. This more than fixes the issue, actually decreasing runtime by avoiding one additional secp256k1 call in many cases. See individual commits for more info.