Conversation

@remyers remyers commented Jan 22, 2025

After Eclair successfully sends splice_locked I saw this message in the eclair log:

2025-01-22 14:00:12,833 INFO  f.a.eclair.Diagnostics n:034f3bafd420714d9a2c0e6986a4109af5e934604f199eafef4076844c9faf0a4c c:daf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7 - OUT msg=SpliceLocked(daf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7,c8f7611495e85cab7cb244ffe87537ba1cf4104c527b9a0e08cb9dba45b4aa54,TlvStream(Set(),Set()))
2025-01-22 14:00:38,600 ERROR f.a.e.c.TransportHandler CON n:034f3bafd420714d9a2c0e6986a4109af5e934604f199eafef4076844c9faf0a4c - cannot deserialize 004ddaf879e0b02d7c4c23c15768a3c975d605165c08e38fefe1946daf0b92fb78b7: fundingTxHash: expected exactly 256 bits but got 0 bits

Apparently splice_locked in clightning is missing the splice_txid field as described in the latest splice PR.

This PR includes the minimal changes I needed to continue my testing, but there are probably other things that should be done to check the splice_txid is correct, update tests, etc.

ddustin and others added 2 commits January 21, 2025 10:32
Interop testing with Eclair revealed an issue with remote funding key rotation.

This searches for the funding output using the rotated remote funding pubkey instead of the current funding pubkey.

Also update the variable name to make it clearer which key this represents.

Changelog-Changed: Interop fixes for compatibility with Eclair
@remyers remyers changed the base branch from master to ddustin/splice_interop_final_(probably) January 22, 2025 15:00
@remyers
Author

remyers commented Jan 22, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

@remyers
Author

remyers commented Jan 22, 2025

I've attached the logs for the test:

  • connect
  • open single funded channel
  • Eclair does a splice-in
  • (error, because Eclair can't parse splice_locked from clightning)
  • force closed the problem channel
  • (update clightning with this patch)
  • open a new single funded channel
  • Eclair does a splice-in on the new channel
  • everything seems ok

eclair.log
cln.log

@ddustin
Owner

ddustin commented Jan 22, 2025

Looking into this 👀.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

Core Lightning's current behavior is to send splice_locked on reconnect if we already finished the mutual splice lock and the peer sends next_funding (txid) in the channel_reestablish tlvs.

I believe you are correct that there is a missing second case, where splice_locked is sent but never received and a reconnect happens. Assuming the splice signatures were exchanged successfully, next_funding will not appear in reestablish and no new splice_locked is sent.

 			    " expects one; resending splice_lock");
 		peer_write(peer->pps,
-			   take(towire_splice_locked(NULL, &peer->channel_id)));
+			   take(towire_splice_locked(NULL, &peer->channel_id, &peer->channel->funding.txid)));
Owner

This should be &peer->splice_state->locked_txid; however, this value is not preserved after channel restarts, so that needs to be addressed as well.

Owner

This is a bit of a heavy lift. I think this is the best way to make splice_state->locked_txid work over restarts.

  1. Add {SQL("ALTER TABLE channel_funding_inflights ADD locked_onchain BOOL DEFAULT 0"), NULL}, to the end of the migration list in db.c, so an inflight can be marked as locked onchain
  2. Add serialization of is_locked in inflight.c: inflight->is_locked = fromwire_bool(cursor, max); in fromwire_inflight and towire_bool(pptr, inflight->is_locked); in towire_inflight
  3. Modify channeld_updating_inflight to include an is_locked parameter
  4. Update handle_update_inflight to take the is_locked parameter and update that value in its inflight
  5. Modify the wallet_inflight_save SQL query to save is_locked to the database
  6. Modify wallet_channel_load_inflights to load `is_locked` from the database
  7. Update the inflight copy code in the loop above towire_channeld_init to add is_locked to the infcopy
  8. Below fromwire_channeld_init, loop through each inflight in peer->splice_state->inflights, take the txid from the PSBT of the one that is_locked, and place it in peer->splice_state->locked_txid. It would also be useful to assert that only one is locked, as two would be an error.

I thiiiiink that would do it.

Author

This is a bit of a heavy lift. I think this is the best way to make splice_state->locked_txid work over restarts.

I can't really comment on whether that's the right approach. Do you need to use splice_state->locked_txid during reconnect? It should already be your latest funding tx once you have sent and received splice_locked. If it has reached acceptable depth, then your node can resend splice_locked with the txid of the latest funding tx. You must already have saved that information after you exchanged splice_locked with your peer.

However, when you do a reconnect before exchanging splice_locked, I can see that you'll need to save the txid of the funding tx negotiated for the splice. We save this information with the splice commitment, along with the signatures exchanged.

Owner

If it has reached acceptable depth then your node can resend splice_locked with the txid of the latest funding tx.

Yes you're right, peer->channel->funding.txid is correct in this case.

There needs to be a new separate case that uses peer->splice_state->locked_txid that I'm working on at the moment.

@ddustin
Owner

ddustin commented Jan 22, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

It sounds like we need to rethink the sending of splice_locked here in general. Perhaps we just send it every time? 🤔

@remyers
Author

remyers commented Jan 23, 2025

I also noticed that clightning is not resending splice_locked on reconnection. This could be because it received the splice_locked from Eclair, so clightning has updated its commitment_number and now will never send its splice_locked.

On reconnection:
  - MUST retransmit its last `splice_locked` if the `commitment_number`
    is the same as before sending `splice_locked`.

Should the spec be changed to always respond to a splice_locked with your own splice_locked if the fields match?

It sounds like we need to rethink the sending of splice_locked here in general. Perhaps we just send it every time? 🤔

I looked a bit more into the spec and our own tests. My understanding is that, in the case of a reconnect, the node that has sent but not received splice_locked will send channel_reestablish with the next_funding (0) tlv set to the txid of the next funding tx. The node that has both sent and received splice_locked will not set the next_funding tlv in its channel_reestablish message, because it is now in a state where the current funding tx is set to the splice funding tx.

In the test I ran, during channel reestablishment Eclair will have set the next_funding tlv, and clightning should see that it matches its current funding tx and resend splice_locked.

From the Message Retransmission section:

A receiving node:
  - if `next_funding_txid` is set:
    ...
    - if `next_funding_txid` matches the latest funding transaction:
      - if that transaction has reached acceptable depth:
        - MUST send `splice_locked`.  

If you don't have a test for this behavior, then I think that's what you need to add along with a fix to the handling of channel_reestablish to send splice_locked based on the spec.

I'd be happy to retest this situation once you have a fix in place.

@ddustin
Owner

ddustin commented Jan 23, 2025

From the Message Retransmission section:

A receiving node:
  - if `next_funding_txid` is set:
    ...
    - if `next_funding_txid` matches the latest funding transaction:
      - if that transaction has reached acceptable depth:
        - MUST send `splice_locked`.  

If you don't have a test for this behavior, then I think that's what you need to add along with a fix to the handling of channel_reestablish to send splice_locked based on the spec.

I believe the current behavior matches the spec as written:

lightning/channeld/channeld.c

Lines 5211 to 5218 in 305c377

	} else if (remote_next_funding) { /* No current inflight */
		if (bitcoin_txid_eq(remote_next_funding,
				    &peer->channel->funding.txid)) {
			status_info("We have no pending splice but peer"
				    " expects one; resending splice_lock");
			peer_write(peer->pps,
				   take(towire_splice_locked(NULL, &peer->channel_id)));
		}

"No current inflight" means the inflight candidate has become the latest funding transaction in CLN.

I believe the issue your test is hitting is that splice_locked got interrupted halfway, so the splice did not become the funding transaction.

The spec probably needs to be updated to something like

     - if `next_funding_txid` matches the latest funding transaction or locked splice candidate:

@ddustin
Owner

ddustin commented Jan 23, 2025

 	// - if `next_funding_txid` is set:
 	} else if (remote_next_funding) { /* No current inflight */

 		// - if `next_funding_txid` matches the latest funding transaction:
 		if (bitcoin_txid_eq(remote_next_funding,
 				    &peer->channel->funding.txid)) {
 			status_info("We have no pending splice but peer"
 				    " expects one; resending splice_lock");

 			// - MUST send `splice_locked`.
 			peer_write(peer->pps,
 				   take(towire_splice_locked(NULL, &peer->channel_id)));
 		}

The line

- if that transaction has reached acceptable depth:

is actually redundant, because there is no way for a transaction to become the funding transaction without reaching acceptable depth.

@ddustin
Owner

ddustin commented Jan 23, 2025

In the test I ran, during channel reestablishment Eclair will have set the next_funding tlv and clightning should see that it matches their current funding tx and resend splice_locked.

Do you have a copy of these logs? 👀

@remyers
Author

remyers commented Feb 1, 2025

Commit dc3ae19 fixes the problem with CLN sending the shared outpoint with prevtx set. With these changes the interop works when CLN initiates the splice with the dev-splice command. Interop where Eclair is the splice initiator works with the previous commit.

@remyers remyers changed the title Fixed splice_locked message to include splice_txid field Fixed splice_locked message to include splice_txid field and other interop fixes Feb 1, 2025
Owner

@ddustin ddustin left a comment


Awesome work! A few little things I would fix up, but otherwise a great fix. Looks like the main issue was not setting shared_outpoint in splice_initiator_user_update, which makes sense, as the code that sends the funding input was recently moved around.

Great find!

	status_info("ictx->shared_outpoint = %s",
		    ictx->shared_outpoint ? "defined" : "null");
	char txid_hex[65];
	if (ictx->shared_outpoint &&
	    bitcoin_txid_to_hex(&ictx->shared_outpoint->txid,
				txid_hex, sizeof(txid_hex))) {
		status_info("ictx->shared_outpoint->txid=%s, ictx->shared_outpoint->n=%d",
			    txid_hex, ictx->shared_outpoint->n);
Owner

This should use fmt_bitcoin_txid instead of bitcoin_txid_to_hex

For example:

	if (!bitcoin_txid_eq(&signed_psbt_txid, &current_psbt_txid))
		status_failed(STATUS_FAIL_INTERNAL_ERROR,
			      "Signed PSBT txid %s does not match"
			      " current_psbt_txid %s",
			      fmt_bitcoin_txid(tmpctx, &signed_psbt_txid),
			      fmt_bitcoin_txid(tmpctx, &current_psbt_txid));

		status_info("ictx->shared_outpoint->txid=%s, ictx->shared_outpoint->n=%d",
			    txid_hex, ictx->shared_outpoint->n);
	}
	if (bitcoin_txid_to_hex(&point.txid, txid_hex, sizeof(txid_hex))) {
		status_info("point.txid=%s, point.n=%d", txid_hex, point.n);
Owner

fmt_bitcoin_txid here as well

	if (bitcoin_txid_to_hex(&point.txid, txid_hex, sizeof(txid_hex))) {
		status_info("point.txid=%s, point.n=%d", txid_hex, point.n);
	}
	status_info("here2");
Owner

I would cut this log statement


@ddustin ddustin force-pushed the ddustin/splice_interop_final_(probably) branch 3 times, most recently from c73f22a to 8f8ef0e Compare February 1, 2025 22:15
@ddustin
Owner

ddustin commented Feb 4, 2025

I've made the fixes, but you have editing turned off for the PR.

I'll manually pull your commits over.

@ddustin ddustin closed this Feb 4, 2025
ddustin pushed a commit that referenced this pull request Oct 27, 2025
And add a check for new uses creeping in, since it got cut & paste
everywhere.

This means "this is a valid string, but truncate it to this many characters"
vs "%.*s" which means "only read this many characters of string":

```
['lightningd-3 2025-10-23T02:31:40.890Z **BROKEN** plugin-funder: Plugin marked as important, shutting down lightningd!']
--------------------------- Captured stderr teardown ---------------------------
    #0 0x557da58ad1dc in printf_common(void*, char const*, __va_list_tag*) asan_interceptors.cpp.o
    #1 0x557da5aff814 in json_out_addv /home/runner/work/lightning/lightning/ccan/ccan/json_out/json_out.c:239:11
    #2 0x557da59740ce in plugin_logv /home/runner/work/lightning/lightning/plugins/libplugin.c:1777:2
    #3 0x557da5969b6f in plugin_log /home/runner/work/lightning/lightning/plugins/libplugin.c:1934:2
    #4 0x557da595c4f6 in datastore_del_success /home/runner/work/lightning/lightning/plugins/funder.c:161:2
    #5 0x557da598b837 in handle_rpc_reply /home/runner/work/lightning/lightning/plugins/libplugin.c:1072:10
    #6 0x557da598a4b0 in rpc_conn_read_response /home/runner/work/lightning/lightning/plugins/libplugin.c:1361:3
    #7 0x557da5adbea5 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60:9
    #8 0x557da5ae06ff in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422:8
    #9 0x557da5adfb58 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439:10
    #10 0x557da5aec2ce in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455:5
    #11 0x557da59757ac in plugin_main /home/runner/work/lightning/lightning/plugins/libplugin.c:2409:3
    #12 0x557da594fe23 in main /home/runner/work/lightning/lightning/plugins/funder.c:1723:2
    #13 0x7f6572229d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7f6572229e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #15 0x557da588b584 in _start (/home/runner/work/lightning/lightning/plugins/funder+0x10d584) (BuildId: 71ba63ab577fc6fa60573d3e8555f6db7d5c584d)

0x624000009d28 is located 0 bytes to the right of 7208-byte region [0x624000008100,0x624000009d28)
allocated by thread T0 here:
    #0 0x557da590e7f6 in __interceptor_realloc (/home/runner/work/lightning/lightning/plugins/funder+0x1907f6) (BuildId: 71ba63ab577fc6fa60573d3e8555f6db7d5c584d)
    #1 0x557da5b2149b in tal_resize_ /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:755:13
    #2 0x557da59f2032 in membuf_tal_resize /home/runner/work/lightning/lightning/common/utils.c:203:2
    #3 0x557da5b03934 in membuf_prepare_space_ /home/runner/work/lightning/lightning/ccan/ccan/membuf/membuf.c:45:12
    #4 0x557da59d4289 in jsonrpc_io_read_ /home/runner/work/lightning/lightning/common/jsonrpc_io.c:127:2
    #5 0x557da598a635 in rpc_conn_read_response /home/runner/work/lightning/lightning/plugins/libplugin.c:1366:9
    #6 0x557da5adbea5 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60:9
    #7 0x557da5ae06ff in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422:8
    #8 0x557da5adfb58 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439:10
    #9 0x557da5aec2ce in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455:5
    #10 0x557da59757ac in plugin_main /home/runner/work/lightning/lightning/plugins/libplugin.c:2409:3
    #11 0x557da594fe23 in main /home/runner/work/lightning/lightning/plugins/funder.c:1723:2
    #12 0x7f6572229d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-buffer-overflow asan_interceptors.cpp.o in printf_common(void*, char const*, __va_list_tag*)
Shadow bytes around the buggy address:
  0x0c487fff9350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff9390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c487fff93a0: 00 00 00 00 00[fa]fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff93f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==26122==ABORTING
```

Signed-off-by: Rusty Russell <[email protected]>