Conversation

@remyers remyers commented Jan 22, 2025

After Eclair successfully sends splice_locked I saw this message in the eclair log:

2025-01-22 14:00:12,833 INFO  f.a.eclair.Diagnostics n:034f3bafd420714d9a2c0e6986a4109af5e934604f199eafef4076844c9faf0a4c c:daf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7 - OUT msg=SpliceLocked(daf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7,c8f7611495e85cab7cb244ffe87537ba1cf4104c527b9a0e08cb9dba45b4aa54,TlvStream(Set(),Set()))
2025-01-22 14:00:38,600 ERROR f.a.e.c.TransportHandler CON n:034f3bafd420714d9a2c0e6986a4109af5e934604f199eafef4076844c9faf0a4c - cannot deserialize 004ddaf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7: fundingTxHash: expected exactly 256 bits but got 0 bits

Apparently splice_locked in clightning is missing the splice_txid field as described in the latest splice PR.

This PR includes the minimal changes I needed to continue my testing, but there are probably other things that should be done to check the splice_txid is correct, update tests, etc.

ddustin and others added 2 commits January 21, 2025 10:32
Interop testing with Eclair revealed an issue with remote funding key rotation.

This searches for the funding output using the rotated remote funding pubkey instead of the current funding pubkey.

Also update the variable name to make it clearer which key this represents.

Changelog-Changed: Interop fixes for compatibility with Eclair
@remyers remyers changed the base branch from master to ddustin/splice_interop_final_(probably) January 22, 2025 15:00
@remyers
Author

remyers commented Jan 22, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

@remyers
Author

remyers commented Jan 22, 2025

I've attached the logs for the test:

  • connect
  • open single funded channel
  • Eclair does a splice-in
  • (error, because Eclair can't parse splice_locked from clightning)
  • force closed the problem channel
  • (update clightning with this patch)
  • open a new single funded channel
  • Eclair does a splice-in on the new channel
  • everything seems ok

eclair.log
cln.log

@ddustin
Owner

ddustin commented Jan 22, 2025

Looking into this 👀.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

Core Lightning's current behavior is to send splice_locked on reconnect if we already finished the mutual splice lock and the peer sends next_funding (txid) in the channel_reestablish tlvs.

I believe you are correct that there is a missing second case, where splice_locked is sent but never received and a reconnect happens. Assuming the splice signatures were exchanged successfully, next_funding will not appear in reestablish and no new splice_locked is sent.

 			    " expects one; resending splice_lock");
 		peer_write(peer->pps,
-			   take(towire_splice_locked(NULL, &peer->channel_id)));
+			   take(towire_splice_locked(NULL, &peer->channel_id, &peer->channel->funding.txid)));
Owner

This should be &peer->splice_state->locked_txid; however, this value is not preserved after channel restarts, so that needs to be addressed as well.

Owner

This is a bit of a heavy lift. I think this is the best way to make splice_state->locked_txid work over restarts.

  1. Add {SQL("ALTER TABLE channel_funding_inflights ADD locked_onchain BOOL DEFAULT 0"), NULL}, to the end of the migration list in db.c, so an inflight can be marked as locked onchain
  2. Add serialization of is_locked in inflight.c: inflight->is_locked = fromwire_bool(cursor, max); in fromwire_inflight and towire_bool(pptr, inflight->is_locked); in towire_inflight
  3. Modify channeld_updating_inflight to include an is_locked parameter
  4. Update handle_update_inflight to take the is_locked parameter and update that value in its inflight
  5. Modify the wallet_inflight_save SQL query to save is_locked to the database
  6. Modify wallet_channel_load_inflights to load `is_locked` from the database
  7. Update the inflight copy code in the loop above towire_channeld_init to add is_locked to the infcopy
  8. Below fromwire_channeld_init, loop through each inflight in peer->splice_state->inflights, take the txid from the PSBT of the one that is_locked, and place it in peer->splice_state->locked_txid. It would also be useful to assert that only one is locked, as two would be an error.

I thiiiiink that would do it.

Author

This is a bit of a heavy lift. I think this is the best way to make splice_state->locked_txid work over restarts.

I can't really comment on whether that's the right approach. Do you need to use splice_state->locked_txid during reconnect? It should already be your latest funding tx once you have sent and received splice_locked. If it has reached acceptable depth, then your node can resend splice_locked with the txid of the latest funding tx. You must already have saved that information after you exchanged splice_locked with your peer.

However, when you do a reconnect before exchanging splice_locked, I can see that you'll need to save the txid of the funding tx negotiated for the splice. We save this information with the splice commitment, along with the signatures exchanged.

Owner

If it has reached acceptable depth then your node can resend splice_locked with the txid of the latest funding tx.

Yes you're right, peer->channel->funding.txid is correct in this case.

There needs to be a new separate case that uses peer->splice_state->locked_txid that I'm working on at the moment.

@ddustin
Owner

ddustin commented Jan 22, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

It sounds like we need to rethink the sending of splice_locked here in general. Perhaps we just send it every time? 🤔

@remyers
Author

remyers commented Jan 23, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

It sounds like we need to rethink the sending of splice_locked here in general. Perhaps we just send it every time? 🤔

I looked a bit more into the spec and our own tests. My understanding is that, in the case of a reconnect, the node that has sent but not received splice_locked will send channel_reestablish with the next_funding (0) tlv set to the txid of the next funding tx. The node that has both sent and received splice_locked will not set the next_funding tlv in its channel_reestablish message, because it is now in a state where the current funding tx is set to the splice funding tx.

In the test I ran, during channel reestablishment Eclair will have set the next_funding tlv, and clightning should see that it matches its current funding tx and resend splice_locked.

From the Message Retransmission section:

A receiving node:
  - if `next_funding_txid` is set:
    ...
    - if `next_funding_txid` matches the latest funding transaction:
      - if that transaction has reached acceptable depth:
        - MUST send `splice_locked`.  

If you don't have a test for this behavior, then I think that's what you need to add along with a fix to the handling of channel_reestablish to send splice_locked based on the spec.

I'd be happy to retest this situation once you have a fix in place.

@ddustin
Owner

ddustin commented Jan 23, 2025

From the Message Retransmission section:

A receiving node:
  - if `next_funding_txid` is set:
    ...
    - if `next_funding_txid` matches the latest funding transaction:
      - if that transaction has reached acceptable depth:
        - MUST send `splice_locked`.  

If you don't have a test for this behavior, then I think that's what you need to add along with a fix to the handling of channel_reestablish to send splice_locked based on the spec.

I believe the current behavior matches the spec as written:

lightning/channeld/channeld.c

Lines 5211 to 5218 in 305c377

	} else if (remote_next_funding) { /* No current inflight */
		if (bitcoin_txid_eq(remote_next_funding,
				    &peer->channel->funding.txid)) {
			status_info("We have no pending splice but peer"
				    " expects one; resending splice_lock");
			peer_write(peer->pps,
				   take(towire_splice_locked(NULL, &peer->channel_id)));
		}

"No current inflight" means the inflight candidate has become the latest funding transaction in CLN.

I believe the issue your test is hitting is that splice_locked got interrupted halfway, so the splice did not become the funding transaction.

The spec probably needs to be updated to something like

     - if `next_funding_txid` matches the latest funding transaction or locked splice candidate:

@ddustin
Owner

ddustin commented Jan 23, 2025

 	// - if `next_funding_txid` is set:
 	} else if (remote_next_funding) { /* No current inflight */

 		// - if `next_funding_txid` matches the latest funding transaction:
 		if (bitcoin_txid_eq(remote_next_funding,
 				    &peer->channel->funding.txid)) {
 			status_info("We have no pending splice but peer"
 				    " expects one; resending splice_lock");

 			// - MUST send `splice_locked`.
 			peer_write(peer->pps,
 				   take(towire_splice_locked(NULL, &peer->channel_id)));
 		}

The line

- if that transaction has reached acceptable depth:

is actually redundant, because there is no way for a transaction to become the funding transaction without reaching acceptable depth.

@ddustin
Owner

ddustin commented Jan 23, 2025

In the test I ran, during channel reestablishment Eclair will have set the next_funding tlv and clightning should see that it matches their current funding tx and resend splice_locked.

Do you have a copy of these logs? 👀

@remyers
Author

remyers commented Feb 1, 2025

Commit dc3ae19 fixes the problem with CLN sending the shared outpoint with prevtx set. With these changes the interop works when CLN initiates the splice with the dev-splice command. Interop where Eclair is the splice initiator works with the previous commit.

@remyers remyers changed the title Fixed splice_locked message to include splice_txid field Fixed splice_locked message to include splice_txid field and other interop fixes Feb 1, 2025
Owner

@ddustin ddustin left a comment


Awesome work! A few little things I would fix up, but otherwise a great fix. Looks like the main issue was not setting shared_outpoint in splice_initiator_user_update, which makes sense, as the code that sends the funding input was recently moved around.

Great find!

	status_info("ictx->shared_outpoint = %s",
		    ictx->shared_outpoint ? "defined" : "null");
	char txid_hex[65];
	if (ictx->shared_outpoint &&
	    bitcoin_txid_to_hex(&ictx->shared_outpoint->txid,
				txid_hex, sizeof(txid_hex))) {
		status_info("ictx->shared_outpoint->txid=%s, ictx->shared_outpoint->n=%d",
			    txid_hex, ictx->shared_outpoint->n);
Owner

This should use fmt_bitcoin_txid instead of bitcoin_txid_to_hex

For example:

	if (!bitcoin_txid_eq(&signed_psbt_txid, &current_psbt_txid))
		status_failed(STATUS_FAIL_INTERNAL_ERROR,
			      "Signed PSBT txid %s does not match"
			      " current_psbt_txid %s",
			      fmt_bitcoin_txid(tmpctx, &signed_psbt_txid),
			      fmt_bitcoin_txid(tmpctx, &current_psbt_txid));

		status_info("ictx->shared_outpoint->txid=%s, ictx->shared_outpoint->n=%d",
			    txid_hex, ictx->shared_outpoint->n);
	}
	if (bitcoin_txid_to_hex(&point.txid, txid_hex, sizeof(txid_hex))) {
		status_info("point.txid=%s, point.n=%d", txid_hex, point.n);
Owner

fmt_bitcoin_txid here as well

	if (bitcoin_txid_to_hex(&point.txid, txid_hex, sizeof(txid_hex))) {
		status_info("point.txid=%s, point.n=%d", txid_hex, point.n);
	}
	status_info("here2");
Owner

I would cut this log statement


@ddustin ddustin force-pushed the ddustin/splice_interop_final_(probably) branch 3 times, most recently from c73f22a to 8f8ef0e Compare February 1, 2025 22:15
@ddustin
Owner

ddustin commented Feb 4, 2025

I've made the fixes, but you have editing turned off for the PR.

I'll manually pull your commits over.

@ddustin ddustin closed this Feb 4, 2025
ddustin pushed a commit that referenced this pull request Oct 27, 2025
And add a check for new uses creeping in, since it got cut & paste
everywhere.

This means "this is a valid string, but truncate it to this many characters"
vs "%.*s" which means "only read this many characters of string":

```
['lightningd-3 2025-10-23T02:31:40.890Z **BROKEN** plugin-funder: Plugin marked as important, shutting down lightningd!']
--------------------------- Captured stderr teardown ---------------------------
    #0 0x557da58ad1dc in printf_common(void*, char const*, __va_list_tag*) asan_interceptors.cpp.o
    #1 0x557da5aff814 in json_out_addv /home/runner/work/lightning/lightning/ccan/ccan/json_out/json_out.c:239:11
    #2 0x557da59740ce in plugin_logv /home/runner/work/lightning/lightning/plugins/libplugin.c:1777:2
    #3 0x557da5969b6f in plugin_log /home/runner/work/lightning/lightning/plugins/libplugin.c:1934:2
    #4 0x557da595c4f6 in datastore_del_success /home/runner/work/lightning/lightning/plugins/funder.c:161:2
    #5 0x557da598b837 in handle_rpc_reply /home/runner/work/lightning/lightning/plugins/libplugin.c:1072:10
    #6 0x557da598a4b0 in rpc_conn_read_response /home/runner/work/lightning/lightning/plugins/libplugin.c:1361:3
    #7 0x557da5adbea5 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60:9
    #8 0x557da5ae06ff in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422:8
    #9 0x557da5adfb58 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439:10
    #10 0x557da5aec2ce in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455:5
    #11 0x557da59757ac in plugin_main /home/runner/work/lightning/lightning/plugins/libplugin.c:2409:3
    #12 0x557da594fe23 in main /home/runner/work/lightning/lightning/plugins/funder.c:1723:2
    #13 0x7f6572229d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7f6572229e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #15 0x557da588b584 in _start (/home/runner/work/lightning/lightning/plugins/funder+0x10d584) (BuildId: 71ba63ab577fc6fa60573d3e8555f6db7d5c584d)

0x624000009d28 is located 0 bytes to the right of 7208-byte region [0x624000008100,0x624000009d28)
allocated by thread T0 here:
    #0 0x557da590e7f6 in __interceptor_realloc (/home/runner/work/lightning/lightning/plugins/funder+0x1907f6) (BuildId: 71ba63ab577fc6fa60573d3e8555f6db7d5c584d)
    #1 0x557da5b2149b in tal_resize_ /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:755:13
    #2 0x557da59f2032 in membuf_tal_resize /home/runner/work/lightning/lightning/common/utils.c:203:2
    #3 0x557da5b03934 in membuf_prepare_space_ /home/runner/work/lightning/lightning/ccan/ccan/membuf/membuf.c:45:12
    #4 0x557da59d4289 in jsonrpc_io_read_ /home/runner/work/lightning/lightning/common/jsonrpc_io.c:127:2
    #5 0x557da598a635 in rpc_conn_read_response /home/runner/work/lightning/lightning/plugins/libplugin.c:1366:9
    #6 0x557da5adbea5 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60:9
    #7 0x557da5ae06ff in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422:8
    #8 0x557da5adfb58 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439:10
    #9 0x557da5aec2ce in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455:5
    #10 0x557da59757ac in plugin_main /home/runner/work/lightning/lightning/plugins/libplugin.c:2409:3
    #11 0x557da594fe23 in main /home/runner/work/lightning/lightning/plugins/funder.c:1723:2
    #12 0x7f6572229d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-buffer-overflow asan_interceptors.cpp.o in printf_common(void*, char const*, __va_list_tag*)
Shadow bytes around the buggy address:
  0x0c487fff9350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c487fff93a0: 00 00 00 00 00[fa]fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==26122==ABORTING
```

Signed-off-by: Rusty Russell <[email protected]>