Skip to content

Clean up message forwarding and relay gossip messages #948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 21, 2021

Conversation

TheBlueMatt
Copy link
Collaborator

While we really should eventually clean up some of the big PeerHandler refactors and merge them, this fixes a bit of low-hanging fruit in the mean time, also adding forwarding of gossip messages which @devrandom wanted.

@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #948 (895d1a8) into main (94528f0) will increase coverage by 0.03%.
The diff coverage is 26.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #948      +/-   ##
==========================================
+ Coverage   90.58%   90.62%   +0.03%     
==========================================
  Files          60       60              
  Lines       30423    30409      -14     
==========================================
- Hits        27560    27559       -1     
+ Misses       2863     2850      -13     
Impacted Files Coverage Δ
lightning/src/ln/peer_handler.rs 45.44% <26.27%> (+1.13%) ⬆️
lightning/src/ln/functional_tests.rs 97.17% <0.00%> (-0.05%) ⬇️
lightning/src/ln/channelmanager.rs 83.85% <0.00%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94528f0...895d1a8. Read the comment docs.

@devrandom
Copy link
Member

is there a quick explanation why do_attempt_write_data call was superfluous?

@devrandom
Copy link
Member

utACK

@TheBlueMatt
Copy link
Collaborator Author

is there a quick explanation why do_attempt_write_data call was superfluous?

Not that its superfluous, but that it violates the stated API of read handling not generating reentrant callbacks, see eae0904.

@devrandom
Copy link
Member

I guess I'm asking if something else will kick off any writes that need to be sent out?

@TheBlueMatt
Copy link
Collaborator Author

I guess I'm asking if something else will kick off any writes that need to be sent out?

Yes, the docs indicate you need to call process_events() to flush outbound message buffers.

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM

@@ -808,8 +808,6 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref> PeerManager<D
}
}

self.do_attempt_write_data(peer_descriptor, peer);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remember about it, what the initial purpose of this "write-just-after-read-in-case-of-reply" ? Save a call to process_events to the API user?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, just to be quicker about it - save an inertion into the peers_needing_send map and such. Its not all that critical unless the user only calls process_events in a loop and has some sleep we end up blocking on.

}
if except_node.is_some() && peer.their_node_id.as_ref() == except_node {
continue;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you include a non-replay-to-channel-update-broadcaster check as it's done for the 2 other gossips ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no node_id field in a channel_update, its assumed the network graph can figure it out from the short_channel_id.

@@ -1010,6 +1010,64 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref> PeerManager<D
Ok(())
}

fn forward_broadcast_msg(&self, peers: &mut PeerHolder<Descriptor>, msg: &wire::Message, except_node: Option<&PublicKey>) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: forward_gossip_msg ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably it could be used for any broadcast message? I like mentioning the word "broadcast" in it.

@@ -234,6 +234,13 @@ enum InitSyncTracker{
NodesSyncing(PublicKey),
}

/// When the outbound buffer has this many messages, we'll stop reading bytes from the peer until
/// we manage to send messages until we reach this limit.
const OUTBOUND_BUFFER_LIMIT_READ_PAUSE: usize = 10;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this constant call OUTBOUND_BUFFER_LIMIT_SYNC_PAUSE ? Otherwise, it's a bit confusing given it's called in do_attempt_write_data

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, its used for both, not sure which is better to focus on. I could just call it like OUTBOUND_BUFFER_LOW_WATER_MARK which is pretty normal for these types of constants, and document in the doccomment what its used for?

@devrandom
Copy link
Member

tested ACK

@TheBlueMatt TheBlueMatt mentioned this pull request Jun 17, 2021
We can never assume that messages were reliably delivered whether
we placed them in the socket or not, so there isn't a lot of use in
explicitly handling the case that a peer was not connected when we
went to send it a message.

Two TODOs are left for the generation of a `FundingAbandoned` (or
similar) event, though it ultimately belongs in `ChannelManager`.
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
This will allow us to broadcast messages received in the next
commit.
@TheBlueMatt
Copy link
Collaborator Author

Rebased to fix a trivial conflict and squashed fixup commits.

Copy link
Member

@devrandom devrandom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK other than nits

@valentinewallace valentinewallace self-requested a review June 21, 2021 17:32
@TheBlueMatt
Copy link
Collaborator Author

Addressed comments (with fixup commits and one new commit at the head).

We do a lot of work to track which peers have buffers which need
to be flushed, when we could instead just try to flush ever peer's
buffer.
This avoids a now-unnecessary SocketDescriptor clone() call in
addition to cleaning up the callsite code somewhat.
@TheBlueMatt
Copy link
Collaborator Author

Squashed with no changes. Changes since @devrandom's last ack are also trivial. Will merge after CI.

@TheBlueMatt TheBlueMatt merged commit 2f6205b into lightningdevkit:main Jun 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants