Skip to content

Speed up deserialization of secp256k1 objects significantly #933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 1, 2021

Conversation

TheBlueMatt
Copy link
Collaborator

When we switched to TLV-ing nearly all objects we ended up with a substantial performance regression when writing secp256k1 objects. The chanmon_consistency_test fuzz target alone increased enough to nearly double our CI fuzz test time. This more than fixes the issue, actually decreasing runtime by avoiding one additional secp256k1 call in many cases. See individual commits for more info.

@codecov
Copy link

codecov bot commented May 30, 2021

Codecov Report

Merging #933 (7231b4a) into main (4cc320b) will increase coverage by 0.11%.
The diff coverage is 98.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #933      +/-   ##
==========================================
+ Coverage   90.53%   90.65%   +0.11%     
==========================================
  Files          60       60              
  Lines       30697    30908     +211     
==========================================
+ Hits        27793    28019     +226     
+ Misses       2904     2889      -15     
Impacted Files Coverage Δ
lightning/src/routing/network_graph.rs 91.97% <ø> (ø)
lightning/src/util/byte_utils.rs 100.00% <ø> (ø)
lightning/src/util/chacha20.rs 95.37% <89.47%> (+3.97%) ⬆️
lightning/src/ln/onion_utils.rs 94.90% <100.00%> (ø)
lightning/src/ln/peer_channel_encryptor.rs 95.14% <100.00%> (ø)
lightning/src/ln/wire.rs 64.10% <100.00%> (ø)
lightning/src/routing/router.rs 96.06% <100.00%> (+<0.01%) ⬆️
lightning/src/util/poly1305.rs 99.47% <100.00%> (ø)
lightning/src/util/ser.rs 90.53% <100.00%> (+0.31%) ⬆️
lightning/src/util/ser_macros.rs 97.73% <100.00%> (+0.18%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cc320b...7231b4a. Read the comment docs.

@@ -367,7 +367,7 @@ pub(super) fn process_onion_failure<T: secp256k1::Signing, L: Deref>(secp_ctx: &
const NODE: u16 = 0x2000;
const UPDATE: u16 = 0x1000;

let error_code = byte_utils::slice_to_be16(&error_code_slice);
let error_code = u16::from_be_bytes(error_code_slice.try_into().expect("len is 2"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the message will prefix the error, so maybe something specific to the call site instead for these.

Copy link
Collaborator Author

@TheBlueMatt TheBlueMatt May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sure hope we never see it anywhere, so I'll just leave it and pretend its a comment.

$( len_calc += self.$field.serialized_length(); )*
if $len != 0 {
// In tests, assert that the hard-coded length matches the actual one
assert_eq!(len_calc, $len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be covered during fuzzing? Seems like a useful check but codecov thinks it isn't hit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@TheBlueMatt TheBlueMatt force-pushed the 2021-05-ser-fast branch 3 times, most recently from 9517d9c to b8eea10 Compare May 31, 2021 23:06
NetworkGraph is one of the largest structures we generally
deserialize, so it makes for a good benchmark, even if it isn't the
most complicated one.

As of this commit, on an Intel 2687W v3, these benchmarks take:

test routing::network_graph::benches::read_network_graph  ... bench: 2,101,420,078 ns/iter (+/- 6,649,020)
test routing::network_graph::benches::write_network_graph ... bench: 344,696,835 ns/iter (+/- 229,061)
This makes a small difference for NetworkGraph deserialization
as it enables more inlining across different files, hopefully
better matching user performance as well.

As of this commit, on an Intel 2687W v3, the serialization
benchmarks take:

test routing::network_graph::benches::read_network_graph  ... bench: 2,037,875,071 ns/iter (+/- 760,370)
test routing::network_graph::benches::write_network_graph ... bench: 320,561,557 ns/iter (+/- 176,343)
Now that our MSRV supports the native methods, we have no need
for the helpers anymore. Because LLVM was already matching our
byte_utils methods as byteswap functions, this should have no
impact on generated (optimzied) code.

This removes most of the byte_utils usage, though some remains to
keep the patch size reasonable.
When writing out libsecp256k1 objects during serialization in a
TLV, we potentially calculate the TLV length twice before
performing the actual serialization (once when calculating the
total TLV-stream length and once when calculating the length of the
secp256k1-object-containing TLV). Because the lengths of secp256k1
objects is a constant, we'd ideally like LLVM to entirely optimize
out those calls and simply know the expected length. However,
without cross-language LTO, there is no way for LLVM to verify that
there are no side-effects of the calls to libsecp256k1, leaving
LLVM with no way to optimize them out.

This commit adds a new method to `Writeable` which returns the
length of an object once serialized. It is implemented by default
using `LengthCalculatingWriter` (which LLVM generally optimizes out
for Rust objects) and overrides it for libsecp256k1 objects.

As of this commit, on an Intel 2687W v3, the serialization
benchmarks take:

test routing::network_graph::benches::read_network_graph  ... bench: 2,035,402,164 ns/iter (+/- 1,855,357)
test routing::network_graph::benches::write_network_graph ... bench: 308,235,267 ns/iter (+/- 140,202)
With the new `serialized_length()` method potentially being
significantly more efficient than `LengthCalculatingWriter`, this
commit ensures we call `serialized_length()` when calculating
length of a larger struct.

Specifically, prior to this commit a call to
`serialized_length()` on a large object serialized with
`impl_writeable`, `impl_writeable_len_match`, or
`encode_varint_length_prefixed_tlv` (and
`impl_writeable_tlv_based`) would always serialize all inner fields
of that object using `LengthCalculatingWriter`. This would ignore
any `serialized_length()` overrides by inner fields. Instead, we
override `serialized_length()` on all of the above by calculating
the serialized size using calls to `serialized_length()` on inner
fields.

Further, writes to `LengthCalculatingWriter` should never fail as
its `write` method never returns an error. Thus, any write failures
indicate a bug in an object's write method or in our
object-creation sanity checking. We `.expect()` such write calls
here.

As of this commit, on an Intel 2687W v3, the serialization
benchmarks take:

test routing::network_graph::benches::read_network_graph  ... bench: 2,039,451,296 ns/iter (+/- 4,329,821)
test routing::network_graph::benches::write_network_graph ... bench: 166,685,412 ns/iter (+/- 352,537)
This substantially improves deserialization performance when LLVM
decides not to inline many short methods, eg when not building
with LTO/codegen-units=1.

Even with the default bench params of LTO/codegen-units=1, the
serialization benchmarks on an Intel 2687W v3 take:

test routing::network_graph::benches::read_network_graph  ... bench: 1,955,616,225 ns/iter (+/- 4,135,777)
test routing::network_graph::benches::write_network_graph ... bench: 165,905,275 ns/iter (+/- 118,798)
@TheBlueMatt
Copy link
Collaborator Author

Squashed with no diff:

$ git diff-tree -U1 b8eea10e 7231b4a2
$

@jkczyz
Copy link
Contributor

jkczyz commented Jun 1, 2021

ACK 7231b4a

@TheBlueMatt TheBlueMatt merged commit 77bdc32 into lightningdevkit:main Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants