Skip to content

Fix a debug panic caused by receiving MPP parts after a failure #1266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

Prior to cryptographic payment secrets, when we process a received
payment in process_pending_htlc_fowards we'd remove its entry
from the pending_inbound_payments map and give the user a
PaymentReceived event.

Thereafter, if a second HTLC came in with the same payment hash, it
would find no entry in the pending_inbound_payments map and be
immediately failed in process_pending_htlc_forwards.

Thus, each HTLC will either result in a PaymentReceived event or
be failed, with no possibility for both.

As of 8464875, we no longer
materially have a pending-inbound-payments map, and thus
more-than-happily accept a second payment with the same payment
hash even if we just failed a previous one for having mis-matched
payment data.

This can cause an issue if the two HTLCs are received back-to-back,
with the first being accepted as valid, generating a
PaymentReceived event. Then, when the second comes in we'll hit
the "total value {} ran over expected value" condition and fail
all pending HTLCs with the same payment hash. At this point,
we'll have a pending failure for both HTLCs, as well as a
PaymentReceived event for the user.

Thereafter, if the user attempts to fail the HTLC in response to
the PaymentReceived, they'll get a debug panic at channel.rs:1657
'Tried to fail an HTLC that was already failed'.

The solution is to avoid bulk-failing all pending HTLCs for a
payment. This feels like the right thing to do anyway - if a sender
accidentally sends an extra HTLC after a payment has ben fully
paid, we shouldn't fail the entire payment.

Found by the chanmon_consistency fuzz test.

@TheBlueMatt
Copy link
Collaborator Author

Note that these panics are debug-only (apparently for good reason!)

@codecov-commenter
Copy link

codecov-commenter commented Jan 19, 2022

Codecov Report

Merging #1266 (9cf0435) into main (0df247d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1266   +/-   ##
=======================================
  Coverage   90.50%   90.50%           
=======================================
  Files          71       71           
  Lines       39210    39258   +48     
=======================================
+ Hits        35486    35531   +45     
- Misses       3724     3727    +3     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 84.59% <ø> (ø)
lightning/src/ln/functional_tests.rs 97.03% <100.00%> (-0.01%) ⬇️
lightning-background-processor/src/lib.rs 92.60% <0.00%> (-0.44%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0df247d...9cf0435. Read the comment docs.

@TheBlueMatt TheBlueMatt force-pushed the 2022-01-fix-double-fail-panic branch from fff9f22 to 6d9668c Compare January 25, 2022 03:45
@TheBlueMatt
Copy link
Collaborator Author

Rebased on the Payee -> PaymentParameters change.

@TheBlueMatt TheBlueMatt added this to the 0.0.105 milestone Feb 7, 2022
arik-so
arik-so previously approved these changes Feb 8, 2022
Comment on lines +9322 to +9659
expect_pending_htlcs_forwardable_ignore!(nodes[1]);
nodes[1].node.process_pending_htlc_forwards();
nodes[1].node.fail_htlc_backwards(&our_payment_hash);

expect_pending_htlcs_forwardable_ignore!(nodes[1]);
nodes[1].node.process_pending_htlc_forwards();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason expect_pending_htlcs_forwardable shouldn't be used in both places?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it currently calls process_pending_htlc_forwards twice without checking for a new event, which we don't want here.

Comment on lines +9333 to +9666
nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[0]);
nodes[0].node.handle_update_fail_htlc(&nodes[1].node.get_our_node_id(), &fail_updates_1.update_fail_htlcs[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there two failed HTLCs because the second payment was sent over two paths? Or am I misunderstanding what's going on here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm? There were two send_payment calls, so we have two HTLCs to fail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My confusion again here on the testing scenario. I think what threw me was that nodes[1] was failing a payment hash that it knew the preimage for. And I was thinking that it would only fail one HTLC since the first should have been sufficient. But I understand now that that would be incorrect behavior.

Then again, seeing this part of the commit message:

The solution is to avoid bulk-failing all pending HTLCs for a
payment. This feels like the right thing to do anyway - if a sender
accidentally sends an extra HTLC after a payment has ben fully
paid, we shouldn't fail the entire payment.

Makes me think only on path should be failed. What am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nevermind about this. I understand what's going on now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way I could clarify the test comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just extending the last sentence to say something like "and that panic does not occur if the user manually fails the payment".

jkczyz
jkczyz previously approved these changes Feb 14, 2022
@TheBlueMatt
Copy link
Collaborator Author

Added a further comment in the test.

@TheBlueMatt TheBlueMatt dismissed stale reviews from jkczyz and arik-so via e67894c February 14, 2022 17:03
@TheBlueMatt TheBlueMatt force-pushed the 2022-01-fix-double-fail-panic branch 2 times, most recently from 26a48c5 to 9cf0435 Compare February 14, 2022 18:24
jkczyz
jkczyz previously approved these changes Feb 16, 2022
Prior to cryptographic payment secrets, when we process a received
payment in `process_pending_htlc_fowards` we'd remove its entry
from the `pending_inbound_payments` map and give the user a
`PaymentReceived` event.

Thereafter, if a second HTLC came in with the same payment hash, it
would find no entry in the `pending_inbound_payments` map and be
immediately failed in `process_pending_htlc_forwards`.

Thus, each HTLC will either result in a `PaymentReceived` event or
be failed, with no possibility for both.

As of 8464875, we no longer
materially have a pending-inbound-payments map, and thus
more-than-happily accept a second payment with the same payment
hash even if we just failed a previous one for having mis-matched
payment data.

This can cause an issue if the two HTLCs are received back-to-back,
with the first being accepted as valid, generating a
`PaymentReceived` event. Then, when the second comes in we'll hit
the "total value {} ran over expected value" condition and fail
*all* pending HTLCs with the same payment hash. At this point,
we'll have a pending failure for both HTLCs, as well as a
`PaymentReceived` event for the user.

Thereafter, if the user attempts to fail the HTLC in response to
the `PaymentReceived`, they'll get a debug panic at channel.rs:1657
'Tried to fail an HTLC that was already failed'.

The solution is to avoid bulk-failing all pending HTLCs for a
payment. This feels like the right thing to do anyway - if a sender
accidentally sends an extra HTLC after a payment has ben fully
paid, we shouldn't fail the entire payment.

Found by the `chanmon_consistency` fuzz test.
@TheBlueMatt
Copy link
Collaborator Author

Squashed without changes.

Copy link
Contributor

@arik-so arik-so left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACKing

@TheBlueMatt
Copy link
Collaborator Author

CI failure is unrelated (tokio MSRV bump).

@TheBlueMatt TheBlueMatt merged commit de1aca5 into lightningdevkit:main Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants