Speed up deserialization of secp256k1 objects significantly #933

TheBlueMatt · 2021-05-29T23:47:49Z

When we switched to TLV-ing nearly all objects we ended up with a substantial performance regression when writing secp256k1 objects. The chanmon_consistency_test fuzz target alone increased enough to nearly double our CI fuzz test time. This more than fixes the issue, actually decreasing runtime by avoiding one additional secp256k1 call in many cases. See individual commits for more info.

codecov · 2021-05-30T00:01:47Z

Codecov Report

Merging #933 (7231b4a) into main (4cc320b) will increase coverage by 0.11%.
The diff coverage is 98.38%.

@@            Coverage Diff             @@
##             main     #933      +/-   ##
==========================================
+ Coverage   90.53%   90.65%   +0.11%     
==========================================
  Files          60       60              
  Lines       30697    30908     +211     
==========================================
+ Hits        27793    28019     +226     
+ Misses       2904     2889      -15

Impacted Files	Coverage Δ
lightning/src/routing/network_graph.rs	`91.97% <ø> (ø)`
lightning/src/util/byte_utils.rs	`100.00% <ø> (ø)`
lightning/src/util/chacha20.rs	`95.37% <89.47%> (+3.97%)`	⬆️
lightning/src/ln/onion_utils.rs	`94.90% <100.00%> (ø)`
lightning/src/ln/peer_channel_encryptor.rs	`95.14% <100.00%> (ø)`
lightning/src/ln/wire.rs	`64.10% <100.00%> (ø)`
lightning/src/routing/router.rs	`96.06% <100.00%> (+<0.01%)`	⬆️
lightning/src/util/poly1305.rs	`99.47% <100.00%> (ø)`
lightning/src/util/ser.rs	`90.53% <100.00%> (+0.31%)`	⬆️
lightning/src/util/ser_macros.rs	`97.73% <100.00%> (+0.18%)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cc320b...7231b4a. Read the comment docs.

lightning/src/routing/network_graph.rs

jkczyz · 2021-05-31T18:28:40Z

lightning/src/ln/onion_utils.rs

@@ -367,7 +367,7 @@ pub(super) fn process_onion_failure<T: secp256k1::Signing, L: Deref>(secp_ctx: &
 						const NODE: u16 = 0x2000;
 						const UPDATE: u16 = 0x1000;

-						let error_code = byte_utils::slice_to_be16(&error_code_slice);
+						let error_code = u16::from_be_bytes(error_code_slice.try_into().expect("len is 2"));


nit: the message will prefix the error, so maybe something specific to the call site instead for these.

I sure hope we never see it anywhere, so I'll just leave it and pretend its a comment.

lightning/src/util/ser.rs

lightning/src/routing/router.rs

valentinewallace · 2021-05-31T19:51:39Z

lightning/src/util/ser_macros.rs

+					$( len_calc += self.$field.serialized_length(); )*
+					if $len != 0 {
+						// In tests, assert that the hard-coded length matches the actual one
+						assert_eq!(len_calc, $len);


Would this be covered during fuzzing? Seems like a useful check but codecov thinks it isn't hit

Good catch!

NetworkGraph is one of the largest structures we generally deserialize, so it makes for a good benchmark, even if it isn't the most complicated one. As of this commit, on an Intel 2687W v3, these benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,101,420,078 ns/iter (+/- 6,649,020) test routing::network_graph::benches::write_network_graph ... bench: 344,696,835 ns/iter (+/- 229,061)

This makes a small difference for NetworkGraph deserialization as it enables more inlining across different files, hopefully better matching user performance as well. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,037,875,071 ns/iter (+/- 760,370) test routing::network_graph::benches::write_network_graph ... bench: 320,561,557 ns/iter (+/- 176,343)

Now that our MSRV supports the native methods, we have no need for the helpers anymore. Because LLVM was already matching our byte_utils methods as byteswap functions, this should have no impact on generated (optimzied) code. This removes most of the byte_utils usage, though some remains to keep the patch size reasonable.

When writing out libsecp256k1 objects during serialization in a TLV, we potentially calculate the TLV length twice before performing the actual serialization (once when calculating the total TLV-stream length and once when calculating the length of the secp256k1-object-containing TLV). Because the lengths of secp256k1 objects is a constant, we'd ideally like LLVM to entirely optimize out those calls and simply know the expected length. However, without cross-language LTO, there is no way for LLVM to verify that there are no side-effects of the calls to libsecp256k1, leaving LLVM with no way to optimize them out. This commit adds a new method to `Writeable` which returns the length of an object once serialized. It is implemented by default using `LengthCalculatingWriter` (which LLVM generally optimizes out for Rust objects) and overrides it for libsecp256k1 objects. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,035,402,164 ns/iter (+/- 1,855,357) test routing::network_graph::benches::write_network_graph ... bench: 308,235,267 ns/iter (+/- 140,202)

With the new `serialized_length()` method potentially being significantly more efficient than `LengthCalculatingWriter`, this commit ensures we call `serialized_length()` when calculating length of a larger struct. Specifically, prior to this commit a call to `serialized_length()` on a large object serialized with `impl_writeable`, `impl_writeable_len_match`, or `encode_varint_length_prefixed_tlv` (and `impl_writeable_tlv_based`) would always serialize all inner fields of that object using `LengthCalculatingWriter`. This would ignore any `serialized_length()` overrides by inner fields. Instead, we override `serialized_length()` on all of the above by calculating the serialized size using calls to `serialized_length()` on inner fields. Further, writes to `LengthCalculatingWriter` should never fail as its `write` method never returns an error. Thus, any write failures indicate a bug in an object's write method or in our object-creation sanity checking. We `.expect()` such write calls here. As of this commit, on an Intel 2687W v3, the serialization benchmarks take: test routing::network_graph::benches::read_network_graph ... bench: 2,039,451,296 ns/iter (+/- 4,329,821) test routing::network_graph::benches::write_network_graph ... bench: 166,685,412 ns/iter (+/- 352,537)

This substantially improves deserialization performance when LLVM decides not to inline many short methods, eg when not building with LTO/codegen-units=1. Even with the default bench params of LTO/codegen-units=1, the serialization benchmarks on an Intel 2687W v3 take: test routing::network_graph::benches::read_network_graph ... bench: 1,955,616,225 ns/iter (+/- 4,135,777) test routing::network_graph::benches::write_network_graph ... bench: 165,905,275 ns/iter (+/- 118,798)

TheBlueMatt · 2021-06-01T15:47:51Z

Squashed with no diff:

$ git diff-tree -U1 b8eea10e 7231b4a2
$

jkczyz · 2021-06-01T16:04:14Z

ACK 7231b4a

jkczyz approved these changes May 31, 2021

View reviewed changes

valentinewallace approved these changes May 31, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-05-ser-fast branch 3 times, most recently from 9517d9c to b8eea10 Compare May 31, 2021 23:06

valentinewallace approved these changes Jun 1, 2021

View reviewed changes

TheBlueMatt added 6 commits June 1, 2021 15:47

TheBlueMatt force-pushed the 2021-05-ser-fast branch from b8eea10 to 7231b4a Compare June 1, 2021 15:47

TheBlueMatt merged commit 77bdc32 into lightningdevkit:main Jun 1, 2021

TheBlueMatt mentioned this pull request Jun 1, 2021

TLV-ize enum serialization and a few additional structs #935

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Speed up deserialization of secp256k1 objects significantly #933

Speed up deserialization of secp256k1 objects significantly #933

Uh oh!

TheBlueMatt commented May 29, 2021

Uh oh!

codecov bot commented May 30, 2021 •

edited

Loading

Uh oh!

Uh oh!

jkczyz May 31, 2021

Uh oh!

TheBlueMatt May 31, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

valentinewallace May 31, 2021

Uh oh!

TheBlueMatt May 31, 2021

Uh oh!

TheBlueMatt commented Jun 1, 2021

Uh oh!

jkczyz commented Jun 1, 2021

Uh oh!

Uh oh!

Speed up deserialization of secp256k1 objects significantly #933

Speed up deserialization of secp256k1 objects significantly #933

Uh oh!

Conversation

TheBlueMatt commented May 29, 2021

Uh oh!

codecov bot commented May 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

jkczyz May 31, 2021

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt May 31, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

valentinewallace May 31, 2021

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt May 31, 2021

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt commented Jun 1, 2021

Uh oh!

jkczyz commented Jun 1, 2021

Uh oh!

Uh oh!

codecov bot commented May 30, 2021 •

edited

Loading

TheBlueMatt May 31, 2021 •

edited

Loading