Handle double-HTLC-claims without failing the backwards channel #977
Conversation
Force-pushed from 4532e33 to db5ddcb.
Codecov Report

@@            Coverage Diff             @@
##             main     #977      +/-   ##
==========================================
+ Coverage   90.73%   92.37%    +1.64%
==========================================
  Files          60       60
  Lines       30702    37203     +6501
==========================================
+ Hits        27857    34368     +6511
+ Misses       2845     2835       -10

Continue to review the full report at Codecov.
Force-pushed from db5ddcb to 541411d.
Force-pushed from 541411d to 14fd4a2.
    return Ok((None, Some(monitor_update)));
}
#[cfg(any(test, feature = "fuzztarget"))]
self.historical_inbound_htlc_fulfills.insert(htlc_id_arg);
Also add a `contains()` check inside the `HTLCUpdateAwaitingACK::ClaimHTLC` branch in `get_update_fail_htlc`, to pass the debug_assert in case of a downgrade from `update_fulfill_htlc` to `update_fail_htlc`, which is also correct behavior?
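A rough sketch of the kind of guard being suggested, assuming the PR's test-only `historical_inbound_htlc_fulfills` set and a simplified `Result<Option<msgs::UpdateFailHTLC>, ChannelError>`-style return type (illustrative, not the exact patch):

```rust
// Inside get_update_fail_htlc's scan of the holding cell (sketch only).
for pending_update in self.holding_cell_htlc_updates.iter() {
    match pending_update {
        &HTLCUpdateAwaitingACK::ClaimHTLC { htlc_id, .. } => {
            if htlc_id == htlc_id_arg {
                // A fulfill is already queued for this HTLC. Rather than
                // debug_assert!(false) unconditionally, consult the test-only
                // tracking set so a fulfill-then-fail sequence doesn't panic.
                #[cfg(any(test, feature = "fuzztarget"))]
                debug_assert!(self.historical_inbound_htlc_fulfills.contains(&htlc_id_arg));
                return Ok(None);
            }
        },
        _ => {},
    }
}
```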
We always `debug_assert!(false, ...)` there, because it's a bug to duplicate-fail (as failing waits for the commitment_signed dance to complete).
I was thinking about the sequence success-then-failure? We don't wait for the full commitment_signed dance on the forward channel before passing a success backward, so the `get_update_fail_htlc` in `process_pending_htlc_forwards` can be reached a second time?
Oh! Yep, you're totally right; worse, the fuzzer won't hit that case. I updated the new test in this PR to hit that case and fixed the bugs.
// can cause duplicative claims if a node sends an update_fulfill_htlc message, disconnects,
// reconnects, and then has to re-send its update_fulfill_htlc message again.
// In previous code, we didn't handle the double-claim correctly, spuriously closing the
// channel on which the inbound HTLC was received.
Are you saying that, prior to this PR, if we received a duplicate `update_fulfill_htlc` on the forward channel, we would have closed the backward one? Are you referring to the current `get_update_fulfill_htlc` or another behavior? I don't get it here...
> Are you saying that, prior to this PR, if we received a duplicate `update_fulfill_htlc` on the forward channel, we would have closed the backward one?

Yes, that's correct. We'd `debug_assert!(false, ...)`, then close the channel when debug assertions were off.
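For concreteness, the pre-PR failure path looked roughly like this (reconstructed for illustration only; `already_resolved` is a hypothetical stand-in for the state check, not the verbatim old code):

```rust
// Old behavior (sketch): a duplicate fulfill tripped an assertion in debug
// builds and, with assertions compiled out, returned a Close error that tore
// down the channel the inbound HTLC arrived on.
if already_resolved {
    debug_assert!(false, "Tried to fulfill an HTLC that was already resolved");
    return Err(ChannelError::Close("Unable to find a pending HTLC which matched the given HTLC ID"));
}
```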
fn test_reconnect_dup_htlc_claims() {
    do_test_reconnect_dup_htlc_claims(HTLCStatusAtDupClaim::Received);
    do_test_reconnect_dup_htlc_claims(HTLCStatusAtDupClaim::HoldingCell);
    do_test_reconnect_dup_htlc_claims(HTLCStatusAtDupClaim::Cleared);
Just for clarity, the third case you're mentioning in the commit message isn't exercised here?

> With debug_assertions enabled, we also check that the previous
> close of the same HTLC was a fulfill, and that we are not moving
> from an HTLC failure to an HTLC claim after it's too late.

I translate that as hitting the `HTLCUpdateAwaitingACK::FailHTLC` branch in `get_update_fulfill_htlc`?
> Just for clarity, the third case you're mentioning in the commit message isn't exercised here?

Yes, I do not believe that is possible to hit, so I certainly hope we aren't exercising it :).

> I translate that as hitting the `HTLCUpdateAwaitingACK::FailHTLC` branch in `get_update_fulfill_htlc`?

Yes, roughly.
Agreed, we shouldn't be able to hit it if we duly wait for the commitment_signed dance on the forward channel before passing the `update_fail_htlc` backward.
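The three reconnect-time states the test sweeps are visible in the excerpt above; as a reading aid, here is a sketch of what each variant plausibly covers (the comments are my interpretation of the test, not text from the patch):

```rust
// State of the claim at the moment the duplicate update_fulfill_htlc arrives.
enum HTLCStatusAtDupClaim {
    Received,    // fulfill received, claim not yet forwarded backwards
    HoldingCell, // backwards claim queued in the holding cell awaiting RAA
    Cleared,     // backwards claim already committed via commitment_signed
}
```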
Code Review ACK b64ab91 modulo #977 (comment)
Force-pushed from b64ab91 to 40b7615.
Pushed additional commits to simplify the call graph further and handle a TODO that's been in the code for some time.
Force-pushed from f4c2f80 to 9861d16.
lightning/src/ln/channel.rs (Outdated)
(Some(_), None) => {
    // If we generated a claim message, we absolutely should have generated a
    // ChannelMonitorUpdate, otherwise we are going to probably lose funds.
    unreachable!();
Would it be better for the compiler to enforce this by making the return value an enum? I think that would also allow us to get rid of the `unwrap` on line 2470 in the `free_holding_cell` method.
I don't think it drops the `unwrap`, sadly, but yeah, you're right; I simplified it a bit more.
I think the new return value of `Option<(Option<msgs::UpdateFulfillHTLC>, ChannelMonitorUpdate)>` isn't too easy on the eyes. Think we could add an abstraction, like an enum or struct, to encapsulate it?
Done, I made it into a little enum.
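For reference, the resulting shape is roughly the following (a sketch of the little enum as of this PR; the variant and field names are my best reading of the diff, with the types coming from the lightning crate):

```rust
// Replaces the nested-Option return of get_update_fulfill_htlc (sketch).
enum UpdateFulfillFetch {
    /// First-time claim: the monitor update carrying the preimage must always
    /// be applied; msg is None when the claim had to go into the holding cell.
    NewClaim {
        monitor_update: ChannelMonitorUpdate,
        msg: Option<msgs::UpdateFulfillHTLC>,
    },
    /// The HTLC was already claimed (e.g. a re-sent update_fulfill_htlc after
    /// a reconnect): nothing to do, and no reason to fail the channel.
    DuplicateClaim {},
}
```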
Force-pushed from 9861d16 to dac9f3f.
Code Review ACK dac9f3f
lightning/src/ln/channelmanager.rs (Outdated)
    return Err(None)
Err((e, monitor_update)) => {
    if let Err(e) = self.chain_monitor.update_channel(chan.get().get_funding_txo().unwrap(), monitor_update) {
        log_error!(self.logger, "Critical error: failed to update channel monitor with preimage {:?}: {:?}",
Yeah, I agree we're pretty stuck in this catastrophic scenario... if the error source was tied to new-state key derivation, we can still hope for the counterparty to broadcast its own commitment and claim the offered output on it, though that also assumes we still have the key for this `htlcpubkey`... a lot of ifs! One upgrade to your `KeysInterface` could also be to inform the user when aborting a signature has detrimental consequences for funds safety, though I expect the claim-back to be automated for routing nodes.
This PR is shaping up nicely! Just have a few more nits. I think the return values on `get_update_fulfill` and others are a bit complicated to parse, though that seems to be mostly an existing issue. Not sure if we want to spend the time in this release to make them nicer.
lightning/src/ln/channelmanager.rs (Outdated)
    return Err(None)
Err((e, monitor_update)) => {
    if let Err(e) = self.chain_monitor.update_channel(chan.get().get_funding_txo().unwrap(), monitor_update) {
        log_error!(self.logger, "Critical error: failed to update channel monitor with preimage {:?}: {:?}",
Isn't the above `update_channel` fail case also a critical error? Just trying to figure out why the logic on `update_channel` failure isn't symmetric...
You're right, I'm still a bit stuck in the old mental model where we'd retry `ChannelMonitorUpdate`s for users, but we no longer do that, so it's not a "critical" error as long as users can get the update to the monitors. I updated the text to be clearer and logged in both places (with appropriate log levels).
What do you mean here by getting the update to the monitors? If I'm following correctly, we might have an HTLC claimed forward with the preimage passed as an arg to `claim_funds_from_hop`, but we won't have been able to generate an updated commitment, nor to pass the preimage to our onchain backend. If the counterparty force-closes the channel, our `ChannelMonitor` won't claim the due offered HTLC?
Correct, kinda. The docs say you must still deliver the `ChannelMonitorUpdate` to some (probably in-memory) `ChannelMonitor`. The `PermanentErr` case is only for failure to update some `ChannelMonitor`s, never all of them. There isn't anything better we can do in any case; we've given the update to the user, and they have to deliver it somewhere, because we can't do anything further.
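To make the contract concrete, a hedged sketch of the caller-side handling being described (the call mirrors the quoted excerpt, with `funding_txo` standing in for `chan.get().get_funding_txo().unwrap()`; the error-policy comments are my reading, not the exact final text):

```rust
// Sketch: once get_update_fulfill_htlc has produced a monitor update carrying
// the preimage, we hand it to the user's chain::Watch. Even if update_channel
// reports a failure, the update has been surfaced to the user; they must still
// apply it to at least one in-memory ChannelMonitor, since that is what lets
// the monitor claim the HTLC if the channel ends up on chain.
if let Err(e) = self.chain_monitor.update_channel(funding_txo, monitor_update) {
    // No internal retries anymore: log and let the user's
    // ChannelMonitorUpdateErr handling decide the channel's fate.
    log_error!(self.logger, "Failed to update ChannelMonitor with preimage: {:?}", e);
}
```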
Force-pushed from 829c3d1 to b35d0e2.
Force-pushed from b35d0e2 to 65fcf57.
Code Review ACK 65fcf57, though let me know about the severity of this: #977 (comment); not sure we're all aligned.
When receiving an update_fulfill_htlc message, we immediately forward the claim backwards along the payment path before waiting for a full commitment_signed dance. This is great, but can cause duplicative claims if a node sends an update_fulfill_htlc message, disconnects, reconnects, and then has to re-send its update_fulfill_htlc message again.

While there was code to handle this, it treated it as a channel error on the inbound channel, which is incorrect: this is an expected, albeit incredibly rare, condition. Instead, we handle these double-claims correctly, simply ignoring them.

With debug_assertions enabled, we also check that the previous close of the same HTLC was a fulfill, and that we are not moving from an HTLC failure to an HTLC claim after it's too late.

A test is also added, which hits all three failure cases in `Channel::get_update_fulfill_htlc`. Found by the chanmon_consistency fuzzer.
Previously, we could fail to generate a new commitment transaction, but that simply indicated we had gone to double-claim an HTLC. Now that double-claims are returned instead as `Ok(None)`, we should handle the error case and fail the channel, as the only way to hit the error case is if key derivation failed or the user refused to sign the new commitment transaction. This also resolves an issue where we wouldn't inform our `ChannelMonitor` of the new payment preimage if we failed to fetch a signature for the new commitment transaction.
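A condensed sketch of the error path that commit describes (the overall structure is assumed; `send_commitment_no_status_check` is a `channel.rs` method of this era, but treat the exact shape as illustrative):

```rust
// Sketch: if signing the new commitment fails after a fresh claim, fail the
// channel, but return the preimage-bearing monitor update alongside the error
// so the caller can still hand it to the ChannelMonitor.
let (commitment_msg, mut additional_update) = match self.send_commitment_no_status_check() {
    Ok(update) => update,
    Err(e) => return Err((e, monitor_update)), // don't lose the preimage!
};
// On success, the commitment update is applied together with the
// preimage-bearing update as one ChannelMonitorUpdate.
monitor_update.updates.append(&mut additional_update.updates);
```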
Force-pushed from 65fcf57 to f06f9d1.
Squashed fixups without diff. Diff from Val's ack is below, will merge after CI.