
Track claimed outbound HTLCs in ChannelMonitors #2048

Merged

Conversation

TheBlueMatt
Collaborator

When we receive an update_fulfill_htlc message, we immediately try to "claim" the HTLC against the HTLCSource. If there is one, this works great: we immediately generate a ChannelMonitorUpdate for the corresponding inbound HTLC and persist that before we ever get to processing our counterparty's commitment_signed and persisting the corresponding ChannelMonitorUpdate.

However, if there isn't one (and this is the first successful HTLC for a payment we sent), we immediately generate a PaymentSent event and queue it up for the user. Then, a millisecond later, we receive the commitment_signed from our peer, removing the HTLC from the latest local commitment transaction as a side-effect of the ChannelMonitorUpdate applied.

If the user has processed the PaymentSent event by that point, great, we're done. However, if they have not, and we crash prior to persisting the ChannelManager, on startup we get confused about the state of the payment. We'll force-close the channel for being stale, and see an HTLC which was removed and is no longer present in the latest commitment transaction (which we're broadcasting). Because we claim corresponding inbound HTLCs before updating a ChannelMonitor, we assume such HTLCs have failed - attempting to fail after having claimed should be a noop. However, in the sent-payment case we now generate a PaymentFailed event for the user, allowing an HTLC to complete without giving the user a preimage.

Here we address this issue by storing the payment preimages for claimed outbound HTLCs in the ChannelMonitor, in addition to the existing inbound HTLC preimages already stored there. This allows us to fix the specific issue described by checking for a preimage and switching the type of event generated in response. In addition, it reduces the risk of future confusion by ensuring we don't fail HTLCs which were claimed but not fully committed to before a crash.
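
A minimal sketch (not the PR's exact code) of the reload-time decision this enables, using the get_all_current_outbound_htlcs accessor shown in the review thread below; emit_payment_sent and emit_payment_failed are illustrative stand-ins for the real event-queueing paths:

// Sketch only: on ChannelManager reload, walk the monitor's view of outbound
// HTLCs and use the newly-stored preimage (if any) to pick the right event.
for (htlc_source, (htlc, preimage_opt)) in monitor.get_all_current_outbound_htlcs() {
	if let HTLCSource::OutboundRoute { .. } = htlc_source {
		match preimage_opt {
			// The counterparty fulfilled this HTLC before the crash, so the
			// payment succeeded even though the ChannelManager never recorded
			// it - surface PaymentSent rather than PaymentFailed.
			Some(preimage) => emit_payment_sent(htlc.payment_hash, preimage),
			// No preimage was ever claimed for this HTLC, so treating it as
			// failed remains correct.
			None => emit_payment_failed(htlc.payment_hash),
		}
	}
}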

It does not, however, fully fix the issue here - because the preimages are removed after the HTLC has been fully removed from available commitment transactions, if we are substantially delayed in persisting the ChannelManager from the time we receive the update_fulfill_htlc until after a full commitment-signed dance completes, we may still hit this issue. The full fix for this issue is to delay the persistence of the ChannelMonitorUpdate until after the PaymentSent event has been processed. This avoids the issue entirely, ensuring we process the event before updating the ChannelMonitor, the same as we ensure the upstream HTLC has been claimed before updating the ChannelMonitor for forwarded payments.

The full solution will be implemented in later work; however, this change still makes sense at that point as well - if we were to delay the initial commitment_signed ChannelMonitorUpdate until after the PaymentSent event has been processed (which likely requires a database update on the user's end), we'd hold our commitment_signed + revoke_and_ack response for two DB writes (i.e. fsync() calls), making our commitment transaction processing a full fsync slower. By making this change first, we can instead delay the ChannelMonitorUpdate from the counterparty's final revoke_and_ack message until the event has been processed, giving us a full network roundtrip to do so and avoiding delaying our response as long as an fsync is faster than a network roundtrip.
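
For context, a minimal sketch of the user-side event handling this ordering ultimately relies on, assuming import paths from LDK around 0.0.114 and a hypothetical persist_payment_result helper (a user-side database write, not an LDK API):

use lightning::ln::{PaymentHash, PaymentPreimage};
use lightning::util::events::Event;

// Hypothetical user helper: durably record the payment outcome (one fsync).
fn persist_payment_result(_hash: PaymentHash, _preimage: Option<PaymentPreimage>) {
	// e.g. write to the application's own payment database and fsync it
}

fn handle_event(event: Event) {
	match event {
		// Persist the preimage before returning; once the follow-up work lands,
		// the held-back ChannelMonitorUpdate would only be released after this.
		Event::PaymentSent { payment_hash, payment_preimage, .. } =>
			persist_payment_result(payment_hash, Some(payment_preimage)),
		Event::PaymentFailed { payment_hash, .. } =>
			persist_payment_result(payment_hash, None),
		_ => {},
	}
}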

@TheBlueMatt TheBlueMatt added this to the 0.0.114 milestone Feb 24, 2023
@wpaulino wpaulino self-requested a review February 24, 2023 01:50
@dunxen dunxen self-requested a review February 26, 2023 19:52
@dunxen
Contributor

dunxen commented Feb 26, 2023

Congrats on PR 2^11 :)
Added to my review list for tomorrow.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch from 4e3c530 to 55b0e61 February 27, 2023 02:33
@codecov-commenter

codecov-commenter commented Feb 27, 2023

Codecov Report

Patch coverage: 82.38% and project coverage change: -0.11% ⚠️

Comparison is base (6ddf69c) 87.19% compared to head (75527db) 87.09%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2048      +/-   ##
==========================================
- Coverage   87.19%   87.09%   -0.11%     
==========================================
  Files         100      100              
  Lines       44560    45110     +550     
  Branches    44560    45110     +550     
==========================================
+ Hits        38853    39287     +434     
- Misses       5707     5823     +116     
Impacted Files Coverage Δ
lightning/src/sync/nostd_sync.rs 40.90% <0.00%> (-3.00%) ⬇️
lightning/src/chain/channelmonitor.rs 89.08% <74.50%> (-0.51%) ⬇️
lightning/src/ln/channelmanager.rs 86.54% <79.59%> (-0.04%) ⬇️
lightning/src/ln/payment_tests.rs 95.52% <95.55%> (+<0.01%) ⬆️
lightning/src/ln/channel.rs 84.01% <100.00%> (+0.01%) ⬆️
lightning/src/sync/debug_sync.rs 81.61% <100.00%> (+0.41%) ⬆️
lightning-invoice/src/utils.rs 96.14% <0.00%> (+0.64%) ⬆️
lightning/src/ln/peer_handler.rs 64.19% <0.00%> (+5.78%) ⬆️

for (htlc_source, (htlc, preimage_opt)) in monitor.get_all_current_outbound_htlcs() {
	match htlc_source {
		// Note: `preimage_opt` is not used in this arm; see the review discussion below.
		HTLCSource::PreviousHopData(prev_hop_data) => {
Contributor

I was wondering if we could apply any invariant w.r.t. the option in this case... Should it be assert(is_none) or something? It would be better than just dropping it on the floor in this match condition.

Collaborator Author

Not sure which option you're referring to?

Contributor

Ahhh, I meant the match. Inside the PreviousHopData arm, I wondered if anything could be asserted about preimage_opt instead of just dropping it on the floor. This is totally not a blocker, but if that's possible it just helps to understand better what is coming from where.

Collaborator Author

Ohhh, right, yea, it's a bit annoying to add because we end up with a good bit of lookups, so let me save it for a follow-up, but will do.
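
For reference, a hedged sketch of what such a follow-up assertion might look like inside the PreviousHopData arm; upstream_monitor_has_preimage is a hypothetical helper (the extra lookups mentioned above), not an existing LDK function:

HTLCSource::PreviousHopData(prev_hop_data) => {
	// Sketch only: rather than ignoring `preimage_opt` for forwarded HTLCs,
	// assert (in debug builds) that any preimage recorded here was already
	// claimed against the upstream channel identified by `prev_hop_data`.
	if let Some(preimage) = preimage_opt {
		debug_assert!(upstream_monitor_has_preimage(&prev_hop_data, &preimage));
	}
	// ... existing handling for the previous hop continues here ...
},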

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch from 0c6f20f to 7912091 March 1, 2023 02:32
Contributor

@valentinewallace valentinewallace left a comment

Initial pass

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch 3 times, most recently from bf95e1c to f407de1 March 2, 2023 05:05
@TheBlueMatt
Collaborator Author

Sorry for the delay, fixed the fuzzing failure and addressed comments.

@TheBlueMatt
Collaborator Author

I believe the CI failure will be fixed by squash - let me know if I can do that, reviewers.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch from f407de1 to 743d563 March 2, 2023 19:59
@TheBlueMatt
Collaborator Author

Squashed and pushed with one cleanup from @douglaz:

$ git diff-tree -U1 f407de1b 743d5639
diff --git a/lightning/src/chain/channelmonitor.rs b/lightning/src/chain/channelmonitor.rs
index 684a1163b..f67d0e330 100644
--- a/lightning/src/chain/channelmonitor.rs
+++ b/lightning/src/chain/channelmonitor.rs
@@ -2178,13 +2178,9 @@ impl<Signer: WriteableEcdsaChannelSigner> ChannelMonitorImpl<Signer> {
 			#[cfg(debug_assertions)] {
-				let mut found_matching_pending_htlc = false;
-				for (_, source_opt) in self.counterparty_claimable_outpoints.get(
-					&self.current_counterparty_commitment_txid.unwrap()
-				).unwrap().iter() {
+				let cur_counterparty_htlcs = self.counterparty_claimable_outpoints.get(
+						&self.current_counterparty_commitment_txid.unwrap()).unwrap();
+				assert!(cur_counterparty_htlcs.iter().any(|(_, source_opt)| {
 					if let Some(source) = source_opt {
-						if SentHTLCId::from_source(source) == *claimed_htlc_id {
-							found_matching_pending_htlc = true;
-						}
-					}
-				}
-				assert!(found_matching_pending_htlc);
+						SentHTLCId::from_source(source) == *claimed_htlc_id
+					} else { false }
+				}));
 			}

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch 3 times, most recently from 8b41ad5 to 89d39f6 March 2, 2023 21:43
Contributor

@wpaulino wpaulino left a comment

LGTM after squash

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch from 89d39f6 to 50977ce March 2, 2023 23:42
Contributor

@valentinewallace valentinewallace left a comment

LGTM after squash

When we receive an update_fulfill_htlc message, we immediately try
to "claim" the HTLC against the HTLCSource. If there is one, this
works great: we immediately generate a `ChannelMonitorUpdate` for
the corresponding inbound HTLC and persist that before we ever get
to processing our counterparty's `commitment_signed` and persisting
the corresponding `ChannelMonitorUpdate`.

However, if there isn't one (and this is the first successful HTLC
for a payment we sent), we immediately generate a `PaymentSent`
event and queue it up for the user. Then, a millisecond later, we
receive the `commitment_signed` from our peer, removing the HTLC
from the latest local commitment transaction as a side-effect of
the `ChannelMonitorUpdate` applied.

If the user has processed the `PaymentSent` event by that point,
great, we're done. However, if they have not, and we crash prior to
persisting the `ChannelManager`, on startup we get confused about
the state of the payment. We'll force-close the channel for being
stale, and see an HTLC which was removed and is no longer present
in the latest commitment transaction (which we're broadcasting).
Because we claim corresponding inbound HTLCs before updating a
`ChannelMonitor`, we assume such HTLCs have failed - attempting to
fail after having claimed should be a noop. However, in the
sent-payment case we now generate a `PaymentFailed` event for the
user, allowing an HTLC to complete without giving the user a
preimage.

Here we address this issue by storing the payment preimages for
claimed outbound HTLCs in the `ChannelMonitor`, in addition to the
existing inbound HTLC preimages already stored there. This allows
us to fix the specific issue described by checking for a preimage
and switching the type of event generated in response. In addition,
it reduces the risk of future confusion by ensuring we don't fail
HTLCs which were claimed but not fully committed to before a crash.

It does not, however, fully fix the issue here - because the
preimages are removed after the HTLC has been fully removed from
available commitment transactions, if we are substantially delayed
in persisting the `ChannelManager` from the time we receive the
`update_fulfill_htlc` until after a full commitment-signed dance
completes, we may still hit this issue. The full fix for this issue
is to delay the persistence of the `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed. This avoids the
issue entirely, ensuring we process the event before updating the
`ChannelMonitor`, the same as we ensure the upstream HTLC has been
claimed before updating the `ChannelMonitor` for forwarded
payments.

The full solution will be implemented in later work; however, this
change still makes sense at that point as well - if we were to
delay the initial `commitment_signed` `ChannelMonitorUpdate` until
after the `PaymentSent` event has been processed (which likely
requires a database update on the user's end), we'd hold our
`commitment_signed` + `revoke_and_ack` response for two DB writes
(i.e. `fsync()` calls), making our commitment transaction
processing a full `fsync` slower. By making this change first, we
can instead delay the `ChannelMonitorUpdate` from the
counterparty's final `revoke_and_ack` message until the event has
been processed, giving us a full network roundtrip to do so and
avoiding delaying our response as long as an `fsync` is faster than
a network roundtrip.
@TheBlueMatt
Collaborator Author

Squashed without further changes.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-send-persist-order-a branch from 50977ce to 75527db March 3, 2023 17:19
@TheBlueMatt TheBlueMatt merged commit a9e6341 into lightningdevkit:main Mar 3, 2023
k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
0.0.114 - Mar 3, 2023 - "Faster Async BOLT12 Retries"

API Updates
===========

 * `InvoicePayer` has been removed and its features moved directly into
   `ChannelManager`. As such it now requires a simplified `Router` and supports
   `send_payment_with_retry` (and friends). `ChannelManager::retry_payment` was
   removed in favor of the automated retries. Invoice payment utilities in
   `lightning-invoice` now call the new code (lightningdevkit#1812, lightningdevkit#1916, lightningdevkit#1929, lightningdevkit#2007, etc).
 * `Sign`/`BaseSign` has been renamed `ChannelSigner`, with `EcdsaChannelSigner`
   split out in anticipation of future schnorr/taproot support (lightningdevkit#1967).
 * The catch-all `KeysInterface` was split into `EntropySource`, `NodeSigner`,
   and `SignerProvider`. `KeysManager` implements all three (lightningdevkit#1910, lightningdevkit#1930).
 * `KeysInterface::get_node_secret` is now `KeysManager::get_node_secret_key`
   and is no longer required for external signers (lightningdevkit#1951, lightningdevkit#2070).
 * A `lightning-transaction-sync` crate has been added which implements keeping
   LDK in sync with the chain via an esplora server (lightningdevkit#1870). Note that it can
   only be used on nodes that *never* ran a previous version of LDK.
 * `Score` is updated in `BackgroundProcessor` instead of via `Router` (lightningdevkit#1996).
 * `ChainAccess::get_utxo` (now `UtxoAccess`) can now be resolved async (lightningdevkit#1980).
 * BOLT12 `Offer`, `InvoiceRequest`, `Invoice` and `Refund` structs as well as
   associated builders have been added. Such invoices cannot yet be paid due to
   missing support for blinded path payments (lightningdevkit#1927, lightningdevkit#1908, lightningdevkit#1926).
 * A `lightning-custom-message` crate has been added to make combining multiple
   custom messages into one enum/handler easier (lightningdevkit#1832).
 * `Event::PaymentPathFailure` is now generated for failure to send an HTLC
   over the first hop on our local channel (lightningdevkit#2014, lightningdevkit#2043).
 * `lightning-net-tokio` no longer requires an `Arc` on `PeerManager` (lightningdevkit#1968).
 * `ChannelManager::list_recent_payments` was added (lightningdevkit#1873).
 * `lightning-background-processor` `std` is now optional in async mode (lightningdevkit#1962).
 * `create_phantom_invoice` can now be used in `no-std` (lightningdevkit#1985).
 * The required final CLTV delta on inbound payments is now configurable (lightningdevkit#1878)
 * bitcoind RPC error code and message are now surfaced in `block-sync` (lightningdevkit#2057).
 * Get `historical_estimated_channel_liquidity_probabilities` was added (lightningdevkit#1961).
 * `ChannelManager::fail_htlc_backwards_with_reason` was added (lightningdevkit#1948).
 * Macros which implement serialization using TLVs or straight writing of struct
   fields are now public (lightningdevkit#1823, lightningdevkit#1976, lightningdevkit#1977).

Backwards Compatibility
=======================

 * Any inbound payments with a custom final CLTV delta will be rejected by LDK
   if you downgrade prior to receipt (lightningdevkit#1878).
 * `Event::PaymentPathFailed::network_update` will always be `None` if an
   0.0.114-generated event is read by a prior version of LDK (lightningdevkit#2043).
 * `Event::PaymentPathFailed::all_paths_removed` will always be false if an
   0.0.114-generated event is read by a prior version of LDK. Users who rely on
   it to determine payment retries should migrate to `Event::PaymentFailed`, in
   a separate release prior to upgrading to LDK 0.0.114 if downgrading is
   supported (lightningdevkit#2043).

Performance Improvements
========================

 * Channel data is now stored per-peer and channel updates across multiple
   peers can be operated on simultaneously (lightningdevkit#1507).
 * Routefinding is roughly 1.5x faster (lightningdevkit#1799).
 * Deserializing a `NetworkGraph` is roughly 6x faster (lightningdevkit#2016).
 * Memory usage for a `NetworkGraph` has been reduced substantially (lightningdevkit#2040).
 * `KeysInterface::get_secure_random_bytes` is roughly 200x faster (lightningdevkit#1974).

Bug Fixes
=========

 * Fixed a bug where a delay in processing a `PaymentSent` event longer than the
   time taken to persist a `ChannelMonitor` update, when occurring immediately
   prior to a crash, may result in the `PaymentSent` event being lost (lightningdevkit#2048).
 * Fixed spurious rejections of rapid gossip sync data when the graph has been
   updated by other means between gossip syncs (lightningdevkit#2046).
 * Fixed a panic in `KeysManager` when the high bit of `starting_time_nanos`
   is set (lightningdevkit#1935).
 * Resolved an issue where the `ChannelManager::get_persistable_update_future`
   future would fail to wake until a second notification occurs (lightningdevkit#2064).
 * Resolved a memory leak when using `ChannelManager::send_probe` (lightningdevkit#2037).
 * Fixed a deadlock on some platforms at least when using async `ChannelMonitor`
   updating (lightningdevkit#2006).
 * Removed debug-only assertions which were reachable in threaded code (lightningdevkit#1964).
 * Fixed an issue where, when payment sending failed over our local channel,
   retries could take the same path and thus never succeed (lightningdevkit#2014).
 * Retries for spontaneous payments have been fixed (lightningdevkit#2002).
 * Return an `Err` if `lightning-persister` fails to read the directory listing
   rather than panicking (lightningdevkit#1943).
 * `peer_disconnected` will now never be called without `peer_connected` (lightningdevkit#2035)

Security
========

0.0.114 fixes several denial-of-service vulnerabilities which are reachable from
untrusted input from channel counterparties or in deployments accepting inbound
connections or channels. It also fixes a denial-of-service vulnerability in rare
cases in the route finding logic.
 * The number of pending un-funded channels as well as peers without funded
   channels is now limited to avoid denial of service (lightningdevkit#1988).
 * A second `channel_ready` message received immediately after the first could
   lead to a spurious panic (lightningdevkit#2071). This issue was introduced with 0conf
   support in LDK 0.0.107.
 * A division-by-zero issue was fixed in the `ProbabilisticScorer` if the amount
   being sent (including previous-hop fees) is equal to a channel's capacity
   while walking the graph (lightningdevkit#2072). The division-by-zero was introduced with
   historical data tracking in LDK 0.0.112.

In total, this release features 130 files changed, 21457 insertions, 10113
deletions in 343 commits from 18 authors, in alphabetical order:
 * Alec Chen
 * Allan Douglas R. de Oliveira
 * Andrei
 * Arik Sosman
 * Daniel Granhão
 * Duncan Dean
 * Elias Rohrer
 * Jeffrey Czyz
 * John Cantrell
 * Kurtsley
 * Matt Corallo
 * Max Fang
 * Omer Yacine
 * Valentine Wallace
 * Viktor Tigerström
 * Wilmer Paulino
 * benthecarman
 * jurvis