
[0.1-bindings] Update bindings to 0.1.2 and include lightning-liquidity #3706


Merged

Conversation

TheBlueMatt
Collaborator

Merges the v0.1.2 tag as well as #3583 and #3701.

TheBlueMatt and others added 30 commits February 19, 2025 15:03
We generally expect `ChannelLiquidity` to be exactly three cache
lines to ensure the first bytes we need are all on one cache line.
This improves performance very marginally on some machines, but
the assertions that this is true do not work on some Android 32-bit
machines due to differing `Duration` sizes.

Here we simply remove the assertions to fix the build on platforms
where the struct size doesn't land exactly on cache-line
boundaries. This may marginally harm performance, but it shouldn't
be that critical.

Fixes lightningdevkit#3415
…0.1-android-32-bit

[0.1] Stop checking `ChannelLiquidity`'s in-memory size
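
For illustration, a minimal sketch of the kind of compile-time size check
being dropped here; the struct, field layout, and 192-byte figure are
stand-ins rather than LDK's actual definitions:

```rust
// Hypothetical stand-in for a three-cache-line struct; the real
// `ChannelLiquidity` fields (including `Duration`s) differ by target.
#[repr(align(64))]
struct ThreeCacheLines {
    payload: [u8; 192],
}

// An assertion of this style fails to compile on targets where the struct
// ends up with a different size (e.g. some 32-bit Android builds), which is
// why it was removed.
const _: () = assert!(core::mem::size_of::<ThreeCacheLines>() == 192);

fn main() {
    println!("size = {} bytes", core::mem::size_of::<ThreeCacheLines>());
}
```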
Instead of using elaborate calculations to determine the exact
number of bytes to allocate for a BOLT12 message, use a fixed-size
amount.
This reduces the code complexity and potentially reduces heap
fragmentation in the normal case.
Now that the previous commit removed assertions on `Vec` capacities for
BOLT12 messages, the use of `reserve_exact` in tests is no longer needed.
The formula for applying half-lives was incorrect. Test coverage added.

Relatively straightforward merge conflicts (code added in
311a083, which is not included here, neighbored newly added code)
fixed in:
 * lightning/src/routing/scoring.rs
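
As a sketch of the half-life decay being corrected above (names and units
are assumptions, not the scorer's actual code), the factor applied must be
0.5 raised to the number of elapsed half-lives:

```rust
// Apply exponential decay: after `elapsed / half_life` half-lives, the value
// should be multiplied by 0.5^(elapsed / half_life).
fn decay(value: f64, elapsed_secs: f64, half_life_secs: f64) -> f64 {
    value * 0.5f64.powf(elapsed_secs / half_life_secs)
}

fn main() {
    // Two full half-lives should leave a quarter of the original value.
    let decayed = decay(100.0, 1_209_600.0, 604_800.0);
    assert!((decayed - 25.0).abs() < 1e-9);
}
```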
`counterparty_spendable_height` is not used outside of `package.rs`
so there's not much reason to have an accessor for it. Also, in the
next commit an issue with setting the correct value for revoked
counterparty HTLC outputs is fixed, and the upgrade path causes the
value to be 0 in some cases, which makes relying on the value in
too many places somewhat fraught.
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we fix the `counterparty_spendable_height` value we set on
counterparty revoked HTLC claims to match reality. Note that
because we still consider these outputs `Pinnable` the value is
not used. In the next commit we'll start making them `Unpinnable`
which will actually change behavior.

Note that when upgrading we have to wipe the
`counterparty_spendable_height` value for non-offered HTLCs as
otherwise we'd consider them `Unpinnable` when they are, in fact,
`Pinnable`.
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we properly set these packages as `Unpinnable`, changing some
transaction generation during tests.
If one message handler refuses a connection by returning an `Err`
from `peer_connected`, other handlers which already got the
`peer_connected` will not see the corresponding
`peer_disconnected`, leaving them in a potentially-inconsistent
state.

Here we ensure we call the `peer_disconnected` handler for all
handlers which received a `peer_connected` event (except the one
which refused the connection).
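
A minimal sketch of that rollback behavior, using a hypothetical `Handler`
trait rather than LDK's message-handler types:

```rust
trait Handler {
    fn peer_connected(&self, peer_id: u64) -> Result<(), ()>;
    fn peer_disconnected(&self, peer_id: u64);
}

fn connect_peer(handlers: &[&dyn Handler], peer_id: u64) -> Result<(), ()> {
    for (idx, handler) in handlers.iter().enumerate() {
        if handler.peer_connected(peer_id).is_err() {
            // Roll back only the handlers which already accepted the
            // connection; the refusing handler never sees a disconnect.
            for prev in &handlers[..idx] {
                prev.peer_disconnected(peer_id);
            }
            return Err(());
        }
    }
    Ok(())
}
```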
`PublicKey` parsing is relatively expensive as we have to check if
the point is actually on the curve. To avoid it, our `NetworkGraph`
uses `NodeId`s which don't have the validity requirement.

Sadly, we were always parsing the broadcasting node's `PublicKey`
from the `node_id` in the network graph whenever we see an update
for that channel, whether we have a corresponding signature or not.

Here we fix this, only parsing the public key (and hashing the
message) if we're going to check a signature.
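
A hedged sketch of the idea (the helper and its arguments are assumptions,
not the `NetworkGraph` internals): the expensive on-curve check only happens
when there is a signature to verify.

```rust
use bitcoin::secp256k1::{ecdsa::Signature, PublicKey};

fn maybe_check_sig(node_id_bytes: &[u8; 33], sig: Option<&Signature>) -> Result<(), ()> {
    if let Some(_sig) = sig {
        // Only this path pays for point-on-curve validation (and would go on
        // to hash the message and verify `_sig` against `_pubkey`).
        let _pubkey = PublicKey::from_slice(node_id_bytes).map_err(|_| ())?;
    }
    // Without a signature we keep the cheap 33-byte `NodeId` representation.
    Ok(())
}
```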
When we build a new `NetworkGraph` from empty, we're generally
doing an initial startup and will be syncing the graph very soon.
Using an initially-empty `IndexedMap` for the `channels` and
`nodes` results in quite some memory churn, with the initial RGS
application benchmark showing 15% of its time in pagefault handling
alone (i.e. allocating new memory from the OS, let alone the 23%
of time in `memmove`).

Further, when deserializing a `NetworkGraph`, we'd swapped the
expected node and channel count constants, leaving the node map
too small and causing map doubling as we read entries from disk.

Finally, when deserializing, allocating only exactly the number of
map entries we need is likely to lead to at least one doubling, so
we're better off just over-estimating the number of nodes and
channels and allocating what we want.

Here we just always allocate `channels` and `nodes` based on
constants, leading to a 20%-ish speedup in the initial RGS
application benchmark.
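
A rough sketch of the allocation strategy (the constants and map types here
are illustrative assumptions, not LDK's values):

```rust
use std::collections::HashMap;

// Deliberate over-estimates of current network size; a doubling is avoided
// as long as the real counts stay below these.
const EXPECTED_NODES: usize = 100_000;
const EXPECTED_CHANNELS: usize = 125_000;

struct GraphSketch {
    nodes: HashMap<u64, ()>,
    channels: HashMap<u64, ()>,
}

impl GraphSketch {
    fn new() -> Self {
        GraphSketch {
            // Allocate up front both when starting empty and when reading
            // from disk, rather than growing (and re-hashing) as entries land.
            nodes: HashMap::with_capacity(EXPECTED_NODES),
            channels: HashMap::with_capacity(EXPECTED_CHANNELS),
        }
    }
}

fn main() {
    let graph = GraphSketch::new();
    assert!(graph.nodes.capacity() >= EXPECTED_NODES);
    assert!(graph.channels.capacity() >= EXPECTED_CHANNELS);
}
```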
If we have a first-hop channel from a first-hop hint, we'll ignore
the fees on it as we won't charge ourselves fees. However, if we
have a first-hop channel from the network graph, we should do the
same.

We do so here, also teeing up a coming commit which will remove
much of the custom codepath for first-hop hints and start using
this common codepath as well.
These tests are a bit annoying to deal with and ultimately work on
almost the same graph subset, so it makes sense to combine their
graph layout logic and then call it twice.

We do that here, combining them and also cleaning up the possible
paths, as there actually are paths that the router could select
which don't meet the tests' requirements.
In a coming commit we'll start calling
`add_entries_to_cheapest_to_target_node` without always having a
public-graph node entry in order to process last- and first-hops
via a common codepath. In order to do so, we always need the
`node_counter` for the node, however, and thus we track them in
`RouteGraphNode` and pass them through to
`add_entries_to_cheapest_to_target_node` here.

We also take this opportunity to swap the node preference logic to
look at the counters, which is slightly less computational work,
though it does require some unrelated test changes.
This likely only impacts very rare edge cases, but if we have two
equal-cost paths, we should likely prefer ones which contribute
more value (avoiding cases where we use paths which are
amount-limited but equal fee to higher-amount paths) and then paths
with fewer hops (which may complete faster).

It does make test behavior more robust against router changes,
which comes in handy over the coming commits.
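
A small sketch of that tie-breaking order, with a hypothetical `Path` struct
standing in for the router's real per-path state:

```rust
use std::cmp::Ordering;

struct Path {
    cost_msat: u64,
    value_contribution_msat: u64,
    hops: usize,
}

// Lower cost first; among equal-cost paths prefer the one contributing more
// value, then the one with fewer hops.
fn prefer(a: &Path, b: &Path) -> Ordering {
    a.cost_msat
        .cmp(&b.cost_msat)
        .then_with(|| b.value_contribution_msat.cmp(&a.value_contribution_msat))
        .then_with(|| a.hops.cmp(&b.hops))
}

fn main() {
    let a = Path { cost_msat: 10, value_contribution_msat: 5_000, hops: 3 };
    let b = Path { cost_msat: 10, value_contribution_msat: 2_000, hops: 2 };
    // Equal fee, but `a` contributes more value, so it wins.
    assert_eq!(prefer(&a, &b), Ordering::Less);
}
```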
When we handle the unblinded last-hop route hints from an invoice,
we had a good bit of code dedicated to handling fee propagation
through the (potentially) multiple last-hops and connecting them to
potentially directly-connected first-hops.

This was a good bit of code that was almost never used, and it
turns out was also buggy - we could process a route hint with
multiple hops, committing to one path through nodes A, B, to C,
then process another route hint (or public channel) which changes
our best path from B to C, making the A entry invalid.

Here we remove the whole maze, utilizing the normal hop-processing
logic in `add_entries_to_cheapest_to_target_node` for last-hops as
well. It requires tracking which nodes connect to last-hop hints
similar to the way we do with `is_first_hop_target` in
`PathBuildingHop`, storing the `CandidateRouteHop`s in a new map,
and always calling `add_entries_to_cheapest_to_target_node` on the
payee node, whether it's public or not.
When we do pathfinding with blinded paths, we start each
pathfinding iteration by inserting all the blinded paths into our
nodes map as last-hops to the destination. As we do that, we check
if any of the introduction points happen to be nodes we have direct
channels with, as we want to use the local info for such channels
and support finding a path even if that channel is not publicly
announced.

However, as we iterate the blinded paths, we may find a second
blinded path from the same introduction point which we prefer over
the first. If this happens, we would already have added info from
us over the local channel to that intro point and end up with
calculations for the first hop to a blinded path that we no longer
prefer.

This is ultimately fixed here in two ways:
(a) we process the first-hop channels to blinded path introduction
    points in a separate loop after we've processed all blinded
    paths, ensuring we only ever consider a channel to the blinded
    path we will ultimately prefer.
(b) In the next commit, we add a new tracking bool in
    `PathBuildingHop` called `best_path_from_hop_selected` which we
    set when we process a channel backwards from a node (indicating
    that we've committed to the best path to that node) and check
    when we add a new path to a node. This would have resulted in a
    much earlier debug-assertion in fuzzing or several tests.
When we process a path backwards from a node during pathfinding, we
implicitly commit to the path up to that node. Any changes to the
preferred path up to that node will make the newly processed path's
state invalid.

In the previous few commits we fixed cases for this in last-hop
paths (both blinded and unblinded).

Here we add assertions to enforce this, tracked in a new bool in
`PathBuildingHop`.
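
A simplified sketch of the invariant being asserted (fields trimmed down; not
the real `PathBuildingHop`):

```rust
struct HopSketch {
    best_path_from_hop_selected: bool,
    total_fee_msat: u64,
}

impl HopSketch {
    fn update_best_path(&mut self, new_fee_msat: u64) {
        // Once a later hop has been processed backwards through this node,
        // the best path into it is committed and must not change.
        debug_assert!(
            !self.best_path_from_hop_selected,
            "Tried to change a node's best path after committing to it"
        );
        self.total_fee_msat = new_fee_msat;
    }
}

fn main() {
    let mut hop = HopSketch { best_path_from_hop_selected: false, total_fee_msat: 10 };
    hop.update_best_path(7);
    assert_eq!(hop.total_fee_msat, 7);
}
```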
The router is a somewhat complicated beast, and though the last few
commits removed some code from it, a complicated beast it remains.
Thus, having `expect`s in it is somewhat risky, so we take this
opportunity to replace some of them with `debug_assert!(false)`s
and an `Err`-return.
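
The pattern looks roughly like this sketch (the lookup and messages are made
up for illustration):

```rust
fn node_id_for_counter(counter_to_id: &[u64], node_counter: usize) -> Result<u64, &'static str> {
    match counter_to_id.get(node_counter) {
        Some(id) => Ok(*id),
        None => {
            // Loudly catch the broken invariant in debug/fuzz builds...
            debug_assert!(false, "Saw a node counter we never assigned");
            // ...but fail the route-finding attempt rather than panic in release.
            Err("Routing state inconsistent")
        },
    }
}

fn main() {
    assert_eq!(node_id_for_counter(&[42, 43], 1), Ok(43));
}
```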
When we see a channel come into the router as a route-hint, but it's
for a direct channel of ours, we'd like to ignore the route-hint as
we have more information in the first-hop channel info. We do this
by matching SCIDs, but only considered outbound SCID aliases.

Here we change to consider both outbound SCID aliases and the full
channel SCID, which some nodes may use in their invoices.
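
A sketch of the widened match (field names are assumptions standing in for
the first-hop channel details):

```rust
struct FirstHopChannel {
    outbound_scid_alias: u64,
    // Only present once the funding transaction is confirmed.
    short_channel_id: Option<u64>,
}

// A route hint refers to one of our own channels if its SCID matches either
// the outbound alias or the channel's real confirmed SCID.
fn hint_is_our_channel(hint_scid: u64, chan: &FirstHopChannel) -> bool {
    chan.outbound_scid_alias == hint_scid || chan.short_channel_id == Some(hint_scid)
}

fn main() {
    let chan = FirstHopChannel { outbound_scid_alias: 111, short_channel_id: Some(222) };
    assert!(hint_is_our_channel(111, &chan));
    assert!(hint_is_our_channel(222, &chan)); // previously missed
    assert!(!hint_is_our_channel(333, &chan));
}
```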
We recently introduced release branches that need to remain backwards
compatible. However, even small changes to item visibility during
backporting fixes might introduce SemVer violations (see
https://doc.rust-lang.org/cargo/reference/semver.html#change-categories
for a list of changes that would be considered major/minor).

To make sure we don't accidentally introduce such changes in the
backports, we here add a new `semver-checks` CI job that utilizes
`cargo-semver-checks`
(https://github.com/obi1kenobi/cargo-semver-checks), and have it run on
any push or pull requests towards anything else but `main`/`master`
(i.e., all feature branches to come).
In adb0afc we started raising
bucket weights to the power four in the historical model. This
improved our model's accuracy greatly, but resulted in a much
larger `total_valid_points_tracked`. In the same commit we
converted `total_valid_points_tracked` to a float, but retained the
64-bit integer math to build it out of integer bucket values.

Sadly, 64 bits are not enough to sum 1024 bucket pairs of 16-bit
integers multiplied together and then squared (we need 16*4 + 10 =
74 bits to avoid overflow). Thus, here we replace the summation
with 128-bit integers.

Fairly straightforward merge conflicts (code added in
311a083, which is not included here, neighbored newly added code
plus references to new methods) fixed in:
 * lightning/src/routing/scoring.rs
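
A sketch of the widened accumulator (bucket layout simplified; not the
scorer's exact loop): 1024 pairs of 16-bit values, each product squared, need
about 74 bits, so the sum is built in `u128`:

```rust
fn total_valid_points(min_buckets: &[u16; 32], max_buckets: &[u16; 32]) -> f64 {
    let mut total: u128 = 0;
    for min in min_buckets.iter() {
        for max in max_buckets.iter() {
            // (min * max)^2 can be up to 64 bits; summing 1024 (2^10) of
            // these needs up to 74 bits, which overflows u64 but fits u128.
            let product = (*min as u128) * (*max as u128);
            total += product * product;
        }
    }
    total as f64
}

fn main() {
    let full = [u16::MAX; 32];
    // With every bucket saturated this would overflow a u64 accumulator.
    assert!(total_valid_points(&full, &full) > u64::MAX as f64);
}
```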
In `ChannelMonitorImpl::cancel_prev_commitment_claims` we need to
cancel any claims against a removed commitment transaction. We
were checking if `holder_tx_signed` before checking if either the
current or previous holder commitment transaction had pending
claims against it, but (a) there's no need to do this, there's not
a big performance cost to just always trying to remove claims and
(b) we can't actually rely on `holder_tx_signed`.

`holder_tx_signed` being set doesn't necessarily imply that the
`ChannelMonitor` was persisted (i.e. it may simply be lost in a
poorly-timed restart) but we also (somewhat theoretically) allow
for multiple copies of a `ChannelMonitor` to exist, and a different
one could have signed the commitment transaction which was
confirmed (and then unconfirmed).

Thus, we simply remove the additional check here.
`provide_latest_holder_commitment_tx` is used to handle
`ChannelMonitorUpdateStep::LatestHolderCommitmentTXInfo` updates
and returns an `Err` if we've set `holder_tx_signed`.

However, later in `ChannelMonitorImpl::update_monitor` (the only
non-test place that `provide_latest_holder_commitment_tx` is
called), we will fail the entire update if `holder_tx_signed` (or a
few other flags) is set and the update contained a
`LatestHolderCommitmentTXInfo` (or a few other update types).

Thus, the check in `provide_latest_holder_commitment_tx` is
entirely redundant and can be removed.
If a channel is closed on startup, but we find that the
`ChannelMonitor` isn't aware of this, we generate a
`ChannelMonitorUpdate` containing a
`ChannelMonitorUpdateStep::ChannelForceClosed`. This ensures that
the `ChannelMonitor` will not accept any future updates in case we
somehow load up a previous `ChannelManager` (though that really
shouldn't happen).

Previously, we'd apply this update only if we detected that the
`ChannelManager` had not yet informed the `ChannelMonitor` about
the channel's closure, even if the `ChannelMonitor` would already
refuse any other updates because it detected a channel closure
on chain.

This doesn't accomplish anything but an extra I/O write, so we
remove it here.

Further, a user reported that, in regtest, they could:
 (a) coop close a channel (not generating a `ChannelMonitorUpdate`)
 (b) wait just under 4032 blocks (on regtest, taking only a day)
 (c) restart the `ChannelManager`, generating the above update
 (d) connect a block or two (during the startup sequence), making
     the `ChannelMonitor` eligible for archival,
 (e) restart the `ChannelManager` again (without applying the
     update from (c), but after having archived the
     `ChannelMonitor`), leading to a failure to deserialize as we
     have a pending `ChannelMonitorUpdate` for a `ChannelMonitor`
     that has been archived.

Though it seems very unlikely this would happen on mainnet, it is
theoretically possible.
The new `bech32-v0.11.0` version (prev: `v0.9.1`) now enforces a max length of
1023 bytes. Before there was no max.

BOLT11 invoices can definitely exceed 1023 B with a long-ish description and
2 route hints, so this limit is likely too low.

Having a limit is probably a good idea. What do other projects choose? Here's
a brief survey:

LDK  (pre-0.1):  (no limit)
LDK (post-0.1):  1023 B
LDK  (post-PR):  7089 B
        LND[1]:  7089 B
        CLN[2]:  (no limit)
   ACINQ[3][4]:  (no limit)

LND uses 7089 B, which was chosen to be "the max number of bytes that can fit
in a QR code". LND's rationale is technically incorrect as QR codes actually
have a max capacity of 7089 _numeric_ characters and only support up to 4296
all-uppercase alphanumeric characters. However, ecosystem-wide consistency
is more important.

A more conservative limit that would probably also suffice might be 2953 B,
the QR code length limit for a lowercase bech32-encoded invoice.

[1]: https://github.com/lightningnetwork/lnd/blob/6531d4505098eb14e6c24aedfd752fc15e85845d/zpay32/invoice.go#L87
[2]: https://github.com/ElementsProject/lightning/blob/0e7615b1b73eee161911763840d6260baf596755/common/bolt11.c#L683
[3]: https://github.com/ACINQ/lightning-kmp/blob/feda82c853660a792b911be518367a228ed6e0ee/modules/core/src/commonMain/kotlin/fr/acinq/lightning/payment/Bolt11Invoice.kt#L165
[4]: https://github.com/ACINQ/bitcoin-kmp/blob/master/src/commonMain/kotlin/fr/acinq/bitcoin/Bech32.kt#L122
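
A trivial sketch of the resulting bound (the constant name is an assumption;
the actual change raises the limit handed to the bech32 parser):

```rust
// Matches LND's QR-code-derived limit discussed above.
const MAX_INVOICE_LEN: usize = 7089;

fn check_invoice_len(s: &str) -> Result<(), &'static str> {
    if s.len() > MAX_INVOICE_LEN {
        return Err("invoice exceeds maximum length");
    }
    Ok(())
}

fn main() {
    assert!(check_invoice_len(&"a".repeat(2000)).is_ok());
    assert!(check_invoice_len(&"a".repeat(8000)).is_err());
}
```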
Now that `lightning` depends on `lightning-invoice`, we should
re-export it like we do `bitcoin` and `types`.
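
A usage sketch of the re-export (assuming the `lightning` crate at 0.1.2 or
later, per the changelog below): downstream code can reach BOLT 11 types
without a direct `lightning-invoice` dependency.

```rust
// `lightning::bolt11_invoice` is the re-exported `lightning-invoice` crate.
use lightning::bolt11_invoice::Bolt11Invoice;

fn parses(s: &str) -> bool {
    s.parse::<Bolt11Invoice>().is_ok()
}

fn main() {
    // Not a valid invoice, so parsing should fail cleanly.
    assert!(!parses("lnbc1notaninvoice"));
}
```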
In a coming commit we'll need to hold references to
`TestChannelManager` in threads, requiring that it be `Sync`.

Fairly minor merge conflicts addressed in:
 * `lightning/src/util/test_utils.rs`
In a coming commit we'll add a test that relies heavily on lock
fairness, which is not provided by the default Rust `Mutex`.
Luckily, `parking_lot` provides an `unlock_fair`, which we use
here, though it implies we have to manually implement lock
poisoning.

Trivial merge conflict resolved in `lightning/Cargo.toml`
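
A sketch of fair unlocking with `parking_lot` (this is just the crate's API;
LDK's test-only wrapper additionally re-implements poisoning on top):

```rust
use parking_lot::Mutex;

fn bump(counter: &Mutex<u64>) {
    let mut guard = counter.lock();
    *guard += 1;
    // Hand the lock to the longest-waiting thread instead of letting this
    // thread barge back in, which `Drop`-based unlocking would allow.
    parking_lot::MutexGuard::unlock_fair(guard);
}

fn main() {
    let counter = Mutex::new(0u64);
    bump(&counter);
    assert_eq!(*counter.lock(), 1);
}
```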
TheBlueMatt and others added 18 commits April 2, 2025 00:29
When we claim an MPP payment, we need to track which channels have
had the preimage durably added to their `ChannelMonitor` to ensure
we don't remove the preimage from any `ChannelMonitor`s until all
`ChannelMonitor`s have the preimage.

Previously, we tracked each MPP part, down to the HTLC ID, as a
part which we needed to get the preimage on disk for. However, this
is not necessary - once a `ChannelMonitor` has a preimage, it
applies it to all inbound HTLCs with the same payment hash.

Further, this can cause a channel to wait on itself in cases of
high-latency synchronous persistence -
 * If we receive an MPP payment for which multiple parts came
   to us over the same channel,
 * and claim the MPP payment, creating a `ChannelMonitorUpdate` for
   the first part but enqueueing the remaining HTLC claim(s) in the
   channel's holding cell,
 * and we receive a `revoke_and_ack` for the same channel before
   the `ChannelManager::claim_payment` method completes (as each
   claim waits for the `ChannelMonitorUpdate` persistence),
 * we will cause the `ChannelMonitorUpdate` for that
   `revoke_and_ack` to go into the blocked set, waiting on the MPP
   parts to be fully claimed,
 * but when `claim_payment` goes to add the next
   `ChannelMonitorUpdate` for the MPP claim, it will be placed in
   the blocked set, since the blocked set is non-empty.

Thus, we'll end up with a `ChannelMonitorUpdate` in the blocked set
which is needed to unblock the channel since it is a part of the
MPP set which blocked the channel.

Trivial conflicts resolved in `lightning/src/util/test_utils.rs`
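
A simplified sketch of the per-channel (rather than per-HTLC) tracking
described above; the types are illustrative, not `ChannelManager`'s real
bookkeeping:

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Hash, PartialEq, Eq)]
struct ChannelId([u8; 32]);

struct PendingMppClaim {
    // Channels which still need the preimage persisted in their
    // `ChannelMonitor`; a channel appears once even if several parts of the
    // payment arrived over it.
    channels_awaiting_preimage: HashSet<ChannelId>,
}

impl PendingMppClaim {
    /// Returns true once every involved channel has the preimage on disk.
    fn monitor_persisted(&mut self, channel: ChannelId) -> bool {
        self.channels_awaiting_preimage.remove(&channel);
        self.channels_awaiting_preimage.is_empty()
    }
}

fn main() {
    let chan = ChannelId([1; 32]);
    let mut claim = PendingMppClaim {
        channels_awaiting_preimage: [chan].into_iter().collect(),
    };
    // One persistence completes the claim even if the channel carried
    // multiple MPP parts.
    assert!(claim.monitor_persisted(chan));
}
```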
v0.1.2 - Apr 02, 2025 - "Foolishly Edgy Cases"

API Updates
===========

 * `lightning-invoice` is now re-exported as `lightning::bolt11_invoice`
   (lightningdevkit#3671).

Performance Improvements
========================

 * `rapid-gossip-sync` graph parsing is substantially faster, resolving a
   regression in 0.1 (lightningdevkit#3581).
 * `NetworkGraph` loading is now substantially faster and does fewer
   allocations, resulting in a 20% further improvement in `rapid-gossip-sync`
   loading when initializing from scratch (lightningdevkit#3581).
 * `ChannelMonitor`s for closed channels are no longer always re-persisted
   immediately after startup, reducing on-startup I/O burden (lightningdevkit#3619).

Bug Fixes
=========

 * BOLT 11 invoices longer than 1023 bytes (and up to 7089 bytes) now parse
   properly (lightningdevkit#3665).
 * In some cases, when using synchronous persistence with higher latency than
   the latency to communicate with peers, when receiving an MPP payment with
   multiple parts received over the same channel, a channel could hang and not
   make progress, eventually leading to a force-closure due to timed-out HTLCs.
   This has now been fixed (lightningdevkit#3680).
 * Some rare cases in which multi-hop BOLT 11 route hints or multiple redundant
   blinded paths could have led to the router creating invalid `Route`s have
   been fixed (lightningdevkit#3586).
 * Corrected the decay logic in `ProbabilisticScorer`'s historical buckets
   model. Note that by default historical buckets are only decayed if no new
   datapoints have been added for a channel for two weeks (lightningdevkit#3562).
 * `{Channel,Onion}MessageHandler::peer_disconnected` will now be called if a
   different message handler refused connection by returning an `Err` from its
   `peer_connected` method (lightningdevkit#3580).
 * If the counterparty broadcasts a revoked state with pending HTLCs, those
   will now be claimed with other outputs which we consider to not be
   vulnerable to pinning attacks if they are not yet claimable by our
   counterparty, potentially reducing our exposure to pinning attacks (lightningdevkit#3564).
We don't expect anyone to access `Bolt11Bech32` directly in most
use-cases, and the bindings don't support mapping an enum that
cannot be constructed (not to mention there would be no point), so
we mark it as no-export in bindings builds.
Recently, LSPS0, 1, and 2 were upstreamed as bLIP-50, 51, and 52,
respectively. Here, we

1. start linking to the bLIPs instead of the LSP spec repository, which
   is likely going to be deprecated.
2. start consistently citing the specs as `bLIP-5X / LSPSX` to avoid any
   confusion and to potentially initiate a process in which the LSP
   specs will be referred to by their bLIP number rather than their LSPS
   number (especially given that any upcoming specs by the LSP
   spec group will directly be drafted as bLIPs going forward).
As we're not testing with `cfg(lsps1_service)`, the builds broke during
previous refactoring. This isn't bad as we're looking to completely
overhaul / rewrite LSPS1 service soon, but until then we fix this
pre-existing breakage to make sure following changes make sense.
In order to avoid naming collisions with LDK's `Event` type, we rename
`lightning-liquidity`'s `Event` to `LiquidityEvent`. To minimize further
churn on the upcoming renaming changes, we also `impl From X for
LiquidityEvent` for the protocol-specific event variants, which also
allows us to reduce some boilerplate while enqueuing.
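
A minimal sketch of the ergonomics (enum variants heavily simplified relative
to `lightning-liquidity`'s real definitions):

```rust
enum LSPS2ServiceEvent {
    OpenChannel { their_network_key: [u8; 33] },
}

enum LiquidityEvent {
    LSPS2Service(LSPS2ServiceEvent),
}

impl From<LSPS2ServiceEvent> for LiquidityEvent {
    fn from(event: LSPS2ServiceEvent) -> Self {
        LiquidityEvent::LSPS2Service(event)
    }
}

fn enqueue(queue: &mut Vec<LiquidityEvent>, event: impl Into<LiquidityEvent>) {
    // Enqueuing code can accept any protocol-specific event and convert it,
    // avoiding per-protocol boilerplate.
    queue.push(event.into());
}

fn main() {
    let mut queue = Vec::new();
    enqueue(&mut queue, LSPS2ServiceEvent::OpenChannel { their_network_key: [2; 33] });
    assert_eq!(queue.len(), 1);
}
```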
This wrapper is more ergonomic to use in the local context and will be
used as a serialization wrapper in following commits.
While using a `prelude` can be helpful when we import `*` from it,
importing individual types through a `prelude` is somewhat
unnecessary indirection. Further, importing `alloc` types like
`String` and `Vec` through a `prelude` confuses the bindings
generator.

Thus, here we drop the `alloc` imports from the
`lightning-liquidity` `prelude` and replace them with direct imports.

We leave the `lightning-liquidity::prelude::hash_tables` `pub(use)`
in place as the sym-linked `debug_sync` relies on it.
`LSPS2ServiceHandler` currently requires that the `Deref` to a
`ChannelManager` be `Clone`, but doesn't use it for anything. This
upsets the bindings somewhat as they generate a wrapper struct
which implements `Deref` (as it holds a pointer) but does not
implement `Clone` (as the inner object cannot be cloned).

Thus, we simply remove the additional bound here.
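
A sketch of the bound in question (the handler's real signature has many more
generics; this just shows that `Deref` alone suffices and no `Clone` is
required):

```rust
use std::ops::Deref;
use std::sync::Arc;

struct ChannelManagerLike;

// Before: `CM: Deref<Target = ChannelManagerLike> + Clone` -- the bindings'
// pointer wrapper implements `Deref` but not `Clone`, so the extra bound is
// dropped.
struct Lsps2HandlerSketch<CM: Deref<Target = ChannelManagerLike>> {
    channel_manager: CM,
}

impl<CM: Deref<Target = ChannelManagerLike>> Lsps2HandlerSketch<CM> {
    fn new(channel_manager: CM) -> Self {
        Self { channel_manager }
    }
}

fn main() {
    let handler = Lsps2HandlerSketch::new(Arc::new(ChannelManagerLike));
    // `Arc<ChannelManagerLike>: Deref<Target = ChannelManagerLike>` is enough.
    let _cm: &ChannelManagerLike = &*handler.channel_manager;
}
```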
@ldk-reviews-bot

ldk-reviews-bot commented Apr 3, 2025

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Contributor

@joostjager joostjager left a comment


Is this a temporary branch because #3583 and #3701 also need to be included?

Not sure how to review this exactly.

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@TheBlueMatt
Collaborator Author

For each release we have a *-bindings branch which has a few commits that tweak the API to make it bindings-compatible. Here I'm just copying the 0.1.2 backports PR and including a few additional PRs that we need to include lightning-liquidity in the bindings release (which, indeed, won't have to appear separately in a future 0.2-bindings branch). We don't bother with direct CI here on the bindings since the bindings process itself has fairly good test coverage and the changes to the code are straightforward, so reviewing this should basically just be checking that the backports were done faithfully.

@TheBlueMatt TheBlueMatt added the weekly goal Someone wants to land this this week label Apr 4, 2025
@TheBlueMatt TheBlueMatt merged commit bbdbd6f into lightningdevkit:0.1-bindings Apr 4, 2025
6 of 24 checks passed