-
Notifications
You must be signed in to change notification settings - Fork 411
Separate route calculating from network message handlers #592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate route calculating from network message handlers #592
Conversation
0588112
to
ead8ff7
Compare
Nice! This generally looks good, though I'm not really a fan of exposing the graph structs as-is today (see conversation on #566) and feel pretty icky exposing a RwLock publicly, given its taken in our own handler stuff - this makes it somewhat easy to accidentally introduce deadlocks in user code by having the user hold a lock while handling some message which ends up taking the lock again. Instead, for now, may be best to let the router take a reference to the NetworkGraph (I think you can drop the word Monitor) and then take a read lock in the get_route function. |
ead8ff7
to
07b4030
Compare
@TheBlueMatt Made some updates, should be ready for review!
|
Just saw those compiler issues on travis, will address soon. Questions above still apply :) |
Travis is happy now, will squash last commit when we agree on those questions above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, been rewriting our ChannelMonitor stuff all weekend :).
I’m not sure about just NetworkGraphname because it has a field NetworkMap which to me sounds like a synonym, and because of that is a little confusing… “Monitor” would indicate that it’s something with more powers, won’t it?
Right, definitely agree this is awkward. I'm not entirely clear on why they are separate, to be honest. There's effectively zero overhead to making all NetworkMaps hold a (potentially-dummy) chain_monitor and implement the message handling thereon.
It’s a little unclear where mark_node_bad should sit. I feel maybe it belongs to the Router, because it should be shared across different routing data providers.
Does our_node_id actually belong to NetworkMap? I feel it may be better in the Router.
From the current code, it looks like Router should just be a static function? It doesn't access anything in itself except for the logger, which we could just take as a parameter. Then our_node_id could be passed into it. In general, we should probably move this out of the ln module and into some kind of router module.
What’s the purpose of get_short_id in NetworkMap? I see why it get_key makes sense when there are non-Bitcoin chains, but get_short_id is unclear. What’s the difference between them?
Honestly, we should probably just drop the non-bitcoin chains thing. Last I checked it didn't even compile, and if someone is gonna use multi-asset lightning somehow you're gonna need more than just a router - you likely need deeper integration into the onion and such as the values are likely not denominated in mBTC, etc.
Finished a bunch of work here, I didn't expect that many changes :) How do we see NetworkMap at the end? |
lightning/src/routing/router.rs
Outdated
let old_entry = hm_entry.or_insert_with(|| { | ||
let node = network.nodes.get(&$directional_info.src_node_id).unwrap(); | ||
let mut fee_base_msat = 0; | ||
let mut fee_proportional_millionths = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wanted to get rid of default values in node inbound fees, but this is the place about I left them.
I'm not sure I fully understand what's going on in this part of the code, so not sure this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry, lack of clarity on my end - I was dont like having default values in public interfaces (as Rust has a native Option<> for exactly this reason), but in this case were putting default values in a temporary map used for our routefinding, which is totally fine. That said, I think these should probably be max, right? If these aren't set it should imply we have no channels for this node, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I guess that's how the code should be read.
lightning/src/routing/router.rs
Outdated
res.last_mut().unwrap().fee_msat = final_value_msat; | ||
res.last_mut().unwrap().cltv_expiry_delta = final_cltv; | ||
let route = Route { paths: vec![res] }; | ||
// log_trace!(self, "Got route: {}", log_route!(route)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how this thing works and why it fails right now with my code :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our logging stuff generally assumes its being called from a method that takes an &self (ie a non-static method) and then accesses self.logger. While we should probably update our macros to make that not be the case, for now you can use the hack we have at https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/onion_utils.rs#L443.
75ba981
to
637dc43
Compare
Codecov Report
@@ Coverage Diff @@
## master #592 +/- ##
==========================================
- Coverage 91.12% 91.11% -0.02%
==========================================
Files 34 35 +1
Lines 20505 20699 +194
==========================================
+ Hits 18686 18859 +173
- Misses 1819 1840 +21
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for your question about NetworkGraph vs NetworkMap, maybe NetworkGraph and NetGraphMessageHandler or so. Really the top-level thing is just something that can process messages and add them to a NetworkGraph with a lock in between.
pub channels: BTreeMap<u64, ChannelInfo>, | ||
|
||
/// Identifier of the local Lightning node | ||
pub our_node_id: PublicKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably just a parameter to get_route.
#[derive(PartialEq)] | ||
pub struct NetworkMap { | ||
/// Known valid channels | ||
pub channels: BTreeMap<u64, ChannelInfo>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know how I feel about these being pub: RW. We have a few consistency requirements on this and nodes, etc, so maybe it makes more sense to just have accessors for them and options to add new channels directly (without going through the message abstractions)?
637dc43
to
c9575b0
Compare
@TheBlueMatt Okay, just wanted to clarify before finishing the last commit.
and get_route in this case takes a reference to NetGraphMessageHandler? I feel a bit odd to pass something message handler to be used for routing. Alternative (similarly to what happens with channels) would be to call that message handler NetworkGraphManager. |
You could make the internal fields read-only with something like this:
|
c9575b0
to
e7a0eb7
Compare
Alright, ready for review! There's some non-necessary renames (like NetworkGraphMonitor->NetworkGraph->NetGraphMsgHandler) in the middle of the PR, but they would be really difficult to squash/reorg at this point, so maybe we can keep commits as is? Also, I'm still a little unsure about our_node in NetGraphMsgHandler, now in the constructor. Should we keep it there? Or maybe keep, but make it optional. |
I thought we were just going to make that a parameter to get_route? |
Yeah, it is, with the current code. Since it's an explicit parameter, I assume it is no longer needed in NetGraphMsgHandler (it also doesn't make much sense). But currently it is still passed to the constructor of NetGraphMsgHandler, just to add |
remove_from_node!(chan.two_to_one.src_node_id); | ||
} | ||
|
||
/// Marks a node as having failed a route. This will avoid re-using the node in routes for now, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, we should probably just wholesale remove this - its not implemented and the API doesn't even make sense...we need a much more wholesale routing state machine.
|
||
impl NetworkGraph { | ||
#[inline] | ||
/// Key used to link channel on the Bitcoin chain to other entities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop these - they were there because we had different map keys depending on the other_chains compile flag, but that is gone now.
} | ||
} | ||
|
||
impl RoutingMessageHandler for NetGraphMsgHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can you move this up in the file so all the NetGraphMsgHandler stuff is together and all the NetworkGraph stuff is together?
|
||
impl RoutingMessageHandler for NetGraphMsgHandler { | ||
|
||
fn handle_node_announcement(&self, msg: &msgs::NodeAnnouncement) -> Result<bool, LightningError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like if we probably want these to be in NetworkGraph, not NetGraphMsgHandler and just have thing wrappers across NetGraphMsgHandler. At a high level, the goal of encapsulating NetworkGraph and not letting users simply have direct write access to all the internal fields is that they have to stay consistent - by allowing someone to, eg, take a NodeAnnouncement, pass it into a NetworkGraph (with &mut self), we can easily provide the ability to update NetworkGraph without leaving us in an inconsistent state.
/// Returns a list of known valid channels | ||
pub fn get_channels<'a>(&'a self) -> &'a BTreeMap<u64, ChannelInfo> { &self.channels } | ||
/// Returns a mutable reference to an existing channel | ||
pub fn get_channel_mut<'a>(&'a mut self, chan_id: &u64) -> Option<&mut ChannelInfo> { self.channels.get_mut(chan_id) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't just expose these completely writeable, we want to allow users to add channels/nodes, but only in a way that keeps the internal state consistent. Otherwise we'll end up with panics like the one you showed at #570 (comment).
Ah, right, yea, I think we can drop that - tests pass fine without it mod a few routing tests that assume its there, which implies we're still able to route just fine (as I would expect). |
e7a0eb7
to
9a8c18e
Compare
Oh no, now there are conflicts in a gazillion of tests again. Will rebase tomorrow I guess. There are more commits than it should be, but selective squashing at this point would be painful, mostly because of the tests refactoring… Like every time I change the netgraphwhatever interface, 40 tests has to be updated. Last thing would be to update ARCH.md, once this passes the review. |
b4795ed
to
0f959f8
Compare
@TheBlueMatt rebasing was painful, so I had to squash everything and split into 3 commits. |
8ed266e
to
80aceb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the annoying rebase, luckily there isn't much on the horizon that should cause it again. The commits squashed is totally fine. This is starting to get there, mostly just complaining that documentation should be more descriptive than it is and a few small notes about things that should be Optional.
/// Arguments for the creation of a NetGraphMsgHandler that are not deserialized. | ||
/// At a high-level, the process for deserializing a NetGraphMsgHandler and resuming normal operation is: | ||
/// 1) Deserialize the NetGraphMsgHandler by filling in this struct and calling <NetGraphMsgHandler>::read(reaser, args). | ||
/// 2) Register the new NetGraphMsgHandler with your ChainWatchInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think you need to do this - we only use the chain_monitor for queries, the NetGraphMsgHandler doesn't (currently) have any need for receiving block updates. Maybe in the future we can do that but its not implemented yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it even make sense to serialize NetGraphMsgHandler at all? The only field there is NetworkGraph, which can be serialized on it's own, and then spawn a new MsgHandler when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no, likely not, good idea. Probably makes the most sense to offer an alternative constructor which takes a NetGraphMsgHandler as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean a NetworkGraph?
Not sure why the whole NetGraphMsgHandler would be useful...
pub one_to_two: DirectionalChannelInfo, | ||
/// Details regarding another direction of a channel | ||
pub two_to_one: DirectionalChannelInfo, | ||
/// An initial announcement of the channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe when this isn't filled in and why its Option<>al (namely that the data is redundant and we sometimes avoid storing things here if announcements include lots of extra data that we don't know how to parse, avoiding some basic DoS attacks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"//" is a comment which is not published in the documentation, you have to use "///" for that. It would be nice, still, to describe why it might be None (ie if there is extra data and we dont want to store it). Not sure why hit was marked resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was resolved because I did add this comment, and then it got lost somewhere :(
/// Returns a list of known nodes | ||
pub fn get_nodes<'a>(&'a self) -> &'a BTreeMap<PublicKey, NodeInfo> { &self.nodes } | ||
|
||
fn process_node_announcement(&mut self, msg: &msgs::NodeAnnouncement) -> Result<bool, LightningError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be pub. Its fine to do it in a separate PR, but they're pretty useful, at least once they have sufficient documentation. May also make sense to not call them "process" as much as add_node_from_announcement or update_node_from_announcement or something like that, making sure to note in the docs that signatures are not checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this, but if we making them public, does it make sense to move should_relay to the message handler, one level above?
It probably doesn't make much sense for every caller to know that, and if they want to relay or not, they can check it on their own, since it's trivial now and won't need db access.
dfc7bc5
to
3d88eb8
Compare
Addressed all the pending feedback. Travis passed, not sure why it shows waiting here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think we're there, just need to clean up the documentation. I left a bunch of motivating questions at various points which I think someone should be able to figure out just by reading the documentation, but, in general, the documentation for almost everything should be full sentences which describe what something is and, more importantly, what you might want to use it for and how to use/interpret it.
/// All valid channels a node has announced | ||
pub channels: Vec<u64>, | ||
/// Lowest fees enabling routing via any of the known channels to a node | ||
pub lowest_inbound_channel_fees: Option<RoutingFees>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it somewhat confusing that this uses RoutingFees - it isn't clear either from the docs or from the datastructure that the two fields are independent - this contains the lowest flat fee and the lowest proportional fee, not the lowest flat + proportional fee. At a minimum the docs should be expanded greatly to indicate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it is confusing, but I don't think splitting fields would help much... We have to still apply the same comments and expect the user to read them.
So far planning to add this:
/// The two fields (flat and proportional fee) are independent,
/// meaning they don't have to refer to the same channel.
Don't know it's "greatly expanded", but not sure what else to say here.
use std::collections::btree_map::Entry as BtreeEntry; | ||
use std; | ||
|
||
/// Receives network updates from peers to track view of the network. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you use this? What else does it provide?
} | ||
|
||
impl NetGraphMsgHandler { | ||
/// Creates a new tracker of the actual state of the network of channels and nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the chain_monitor used for?
} | ||
} | ||
|
||
/// Get network addresses by node id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this return none?
/// Arguments for the creation of a NetGraphMsgHandler that are not deserialized. | ||
/// At a high-level, the process for deserializing a NetGraphMsgHandler and resuming normal operation is: | ||
/// 1) Deserialize the NetGraphMsgHandler by filling in this struct and calling <NetGraphMsgHandler>::read(reaser, args). | ||
/// 2) Register the new NetGraphMsgHandler with your ChainWatchInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no, likely not, good idea. Probably makes the most sense to offer an alternative constructor which takes a NetGraphMsgHandler as a parameter.
/// Lowest fees enabling routing via any of the known channels to a node | ||
pub lowest_inbound_channel_fees: Option<RoutingFees>, | ||
/// More information about a node from node_announcement | ||
/// Optional because we may have a NodeInfo entry before having received the announcement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do have :p. Its actually illegal to receive node_announcement before having received channels for the node (which will create this).
} | ||
|
||
impl NetworkGraph { | ||
/// Returns all known valid channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please specify the return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this one... Docs usually capture return values pretty well automatically. What do you think is unclear here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the u64 in the hashmap represent, mostly.
impl NetworkGraph { | ||
/// Returns all known valid channels | ||
pub fn get_channels<'a>(&'a self) -> &'a BTreeMap<u64, ChannelInfo> { &self.channels } | ||
/// Returns all known nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
} | ||
|
||
/// For a new or already known (from previous announcement) channel, store or update channel info, | ||
/// after making sure it corresponds to a real transaction on-chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We dont do this? Thats what checked_utxo is for? checked_utxo's properties should be detailed, however.
} | ||
|
||
/// For an already known (from announcement) channel, update info regarding one of the directions of a channel. | ||
/// Announcement signatures *are checked* here and should be checked by a caller beforehand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should they be checked beforehand if they're checked here? :p. I get the reasoning (the pubkey isnt available) but its super awkward to have it be different across different fns. Maybe we should just always do sig checking in NetworkGraph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then in handle_channel_announcement we will be looking into the chain before checking sigs...
I guess it doesn't change much in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ugh, good point. There's no winning here. Maybe just make the signature checking optional with both variants exposed for each and use the relevant one from the message handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, it's enough in this case to just make secp_ctx optional.
But maybe we want to force users to explicitly set check_sig
variable, and then if secp_ctx is None, throw an exception?
Not sure which approach we should take within this project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think provide-a-context-if-you-want-signatures-checked is pretty nice. just pass in an Option<&Secp256k1> and document it. No need to panic or force users to set a variable. good docs and a clear None/Some() distinction requires typing and thinking so I'm not too worried about it.
3d88eb8
to
97ac065
Compare
@@ -97,36 +99,6 @@ impl<'a, 'b, 'c> Drop for Node<'a, 'b, 'c> { | |||
assert!(self.node.get_and_clear_pending_events().is_empty()); | |||
assert!(self.chan_monitor.added_monitors.lock().unwrap().is_empty()); | |||
|
|||
// Check that if we serialize the Router, we can deserialize it again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this get removed? It should instead be adapted to serialize the underlying graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Network graph serialization is already tested here, unless you mean something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is a bit different - in the past we've had issues in edge cases failing to serialize-deserialize properly. So we added this so that after every test (which often test fun edge cases), we can serialize-deserialize everything properly. I think we should keep this, at least for the underlying graph.
6255bd1
to
521da4d
Compare
Addressed the last comments: returned serialization and updated Network Graph API. |
Not sure what i'm supposed to do with |
Just a few nits left. I added fixes for a few things at https://github.com/TheBlueMatt/rust-lightning/commits/2020-05-592-nits. I'm happy to just merge that branch, or you're welcome to squash them down onto the relevant commits here. That said, before merge, a few of your changes should be squashed down (eg the fuzz fixes, etc) - all tests (including cd fuzz && cargo test) should build and pass after each individual commit. |
@TheBlueMatt I squashed fuzzing into commits so that every single commit passes tests. If none, feel free to merge your branch now. I reviewed your code. looks good. |
521da4d
to
07a7e34
Compare
Will take as #621. |
Closes #432
Not fully ready yet, but it compiles and passes the tests, so I thought I could use some confirmation that I do sane things before I finish those small touches.TODO left:
pub
) is not too permissive