Fix a debug panic caused by receiving MPP parts after a failure #1266

TheBlueMatt · 2022-01-19T19:21:19Z

Prior to cryptographic payment secrets, when we process a received
payment in process_pending_htlc_fowards we'd remove its entry
from the pending_inbound_payments map and give the user a
PaymentReceived event.

Thereafter, if a second HTLC came in with the same payment hash, it
would find no entry in the pending_inbound_payments map and be
immediately failed in process_pending_htlc_forwards.

Thus, each HTLC will either result in a PaymentReceived event or
be failed, with no possibility for both.

As of 8464875, we no longer
materially have a pending-inbound-payments map, and thus
more-than-happily accept a second payment with the same payment
hash even if we just failed a previous one for having mis-matched
payment data.

This can cause an issue if the two HTLCs are received back-to-back,
with the first being accepted as valid, generating a
PaymentReceived event. Then, when the second comes in we'll hit
the "total value {} ran over expected value" condition and fail
all pending HTLCs with the same payment hash. At this point,
we'll have a pending failure for both HTLCs, as well as a
PaymentReceived event for the user.

Thereafter, if the user attempts to fail the HTLC in response to
the PaymentReceived, they'll get a debug panic at channel.rs:1657
'Tried to fail an HTLC that was already failed'.

The solution is to avoid bulk-failing all pending HTLCs for a
payment. This feels like the right thing to do anyway - if a sender
accidentally sends an extra HTLC after a payment has ben fully
paid, we shouldn't fail the entire payment.

Found by the chanmon_consistency fuzz test.

TheBlueMatt · 2022-01-19T19:21:35Z

Note that these panics are debug-only (apparently for good reason!)

codecov-commenter · 2022-01-19T19:46:49Z

Codecov Report

Merging #1266 (9cf0435) into main (0df247d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1266   +/-   ##
=======================================
  Coverage   90.50%   90.50%           
=======================================
  Files          71       71           
  Lines       39210    39258   +48     
=======================================
+ Hits        35486    35531   +45     
- Misses       3724     3727    +3

Impacted Files	Coverage Δ
lightning/src/ln/channelmanager.rs	`84.59% <ø> (ø)`
lightning/src/ln/functional_tests.rs	`97.03% <100.00%> (-0.01%)`	⬇️
lightning-background-processor/src/lib.rs	`92.60% <0.00%> (-0.44%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0df247d...9cf0435. Read the comment docs.

TheBlueMatt · 2022-01-25T03:45:53Z

Rebased on the Payee -> PaymentParameters change.

lightning/src/ln/functional_tests.rs

jkczyz · 2022-02-11T22:06:13Z

lightning/src/ln/functional_tests.rs

+	expect_pending_htlcs_forwardable_ignore!(nodes[1]);
+	nodes[1].node.process_pending_htlc_forwards();
+	nodes[1].node.fail_htlc_backwards(&our_payment_hash);
+
+	expect_pending_htlcs_forwardable_ignore!(nodes[1]);
+	nodes[1].node.process_pending_htlc_forwards();


Any reason expect_pending_htlcs_forwardable shouldn't be used in both places?

Yes, it currently calls process_pending_htlc_forwards twice without checking for a new event, which we don't want here.

jkczyz · 2022-02-11T22:10:14Z

lightning/src/ln/functional_tests.rs

+	nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[0]);
+	nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[1]);


Are there two failed HTLCs because the second payment was sent over two paths? Or am I misunderstanding what's going on here?

Hmm? There were two send_payment calls, so we have two HTLCs to fail?

My confusion again here on the testing scenario. I think what threw me was that nodes[1] was failing a payment hash that it knew the preimage for. And I was thinking that it would only fail one HTLC since the first should have been sufficient. But I understand now that that would be incorrect behavior.

Then again, seeing this part of the commit message:

The solution is to avoid bulk-failing all pending HTLCs for a payment. This feels like the right thing to do anyway - if a sender accidentally sends an extra HTLC after a payment has ben fully paid, we shouldn't fail the entire payment.

Makes me think only on path should be failed. What am I missing?

Actually, nevermind about this. I understand what's going on now.

Is there a way I could clarify the test comment?

I think just extending the last sentence to say something like "and that panic does not occur if the user manually fails the payment".

TheBlueMatt · 2022-02-14T17:02:25Z

Added a further comment in the test.

lightning/src/ln/functional_tests.rs

Prior to cryptographic payment secrets, when we process a received payment in `process_pending_htlc_fowards` we'd remove its entry from the `pending_inbound_payments` map and give the user a `PaymentReceived` event. Thereafter, if a second HTLC came in with the same payment hash, it would find no entry in the `pending_inbound_payments` map and be immediately failed in `process_pending_htlc_forwards`. Thus, each HTLC will either result in a `PaymentReceived` event or be failed, with no possibility for both. As of 8464875, we no longer materially have a pending-inbound-payments map, and thus more-than-happily accept a second payment with the same payment hash even if we just failed a previous one for having mis-matched payment data. This can cause an issue if the two HTLCs are received back-to-back, with the first being accepted as valid, generating a `PaymentReceived` event. Then, when the second comes in we'll hit the "total value {} ran over expected value" condition and fail *all* pending HTLCs with the same payment hash. At this point, we'll have a pending failure for both HTLCs, as well as a `PaymentReceived` event for the user. Thereafter, if the user attempts to fail the HTLC in response to the `PaymentReceived`, they'll get a debug panic at channel.rs:1657 'Tried to fail an HTLC that was already failed'. The solution is to avoid bulk-failing all pending HTLCs for a payment. This feels like the right thing to do anyway - if a sender accidentally sends an extra HTLC after a payment has ben fully paid, we shouldn't fail the entire payment. Found by the `chanmon_consistency` fuzz test.

TheBlueMatt · 2022-02-16T21:40:20Z

Squashed without changes.

arik-so

re-ACKing

TheBlueMatt · 2022-02-17T03:41:47Z

CI failure is unrelated (tokio MSRV bump).

TheBlueMatt force-pushed the 2022-01-fix-double-fail-panic branch from fff9f22 to 6d9668c Compare January 25, 2022 03:45

TheBlueMatt added the fuzzer fix label Jan 26, 2022

TheBlueMatt added this to the 0.0.105 milestone Feb 7, 2022

arik-so previously approved these changes Feb 8, 2022

View reviewed changes

jkczyz reviewed Feb 11, 2022

View reviewed changes

jkczyz previously approved these changes Feb 14, 2022

View reviewed changes

TheBlueMatt dismissed stale reviews from jkczyz and arik-so via e67894c February 14, 2022 17:03

jkczyz reviewed Feb 14, 2022

View reviewed changes

lightning/src/ln/functional_tests.rs Outdated Show resolved Hide resolved

TheBlueMatt force-pushed the 2022-01-fix-double-fail-panic branch 2 times, most recently from 26a48c5 to 9cf0435 Compare February 14, 2022 18:24

jkczyz previously approved these changes Feb 16, 2022

View reviewed changes

TheBlueMatt dismissed jkczyz’s stale review via be57e82 February 16, 2022 21:40

TheBlueMatt force-pushed the 2022-01-fix-double-fail-panic branch from 9cf0435 to be57e82 Compare February 16, 2022 21:40

jkczyz approved these changes Feb 16, 2022

View reviewed changes

arik-so approved these changes Feb 17, 2022

View reviewed changes

TheBlueMatt merged commit de1aca5 into lightningdevkit:main Feb 17, 2022

		nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[0]);
		nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[1]);

Fix a debug panic caused by receiving MPP parts after a failure #1266

Fix a debug panic caused by receiving MPP parts after a failure #1266

Uh oh!

Conversation

TheBlueMatt commented Jan 19, 2022

Uh oh!

TheBlueMatt commented Jan 19, 2022

Uh oh!

codecov-commenter commented Jan 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TheBlueMatt commented Jan 25, 2022

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt commented Feb 14, 2022

Uh oh!

Uh oh!

TheBlueMatt commented Feb 16, 2022

Uh oh!

arik-so left a comment

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt commented Feb 17, 2022

Uh oh!

Uh oh!

codecov-commenter commented Jan 19, 2022 •

edited

Loading