[bug]: unknown postgres query hangs on startup #9730

ZZiigguurraatt opened this issue Apr 18, 2025 · 17 comments · May be fixed by #9756
@ZZiigguurraatt

In #9729 we have a long delay on startup when using postgres, and a medium delay on startup with SQLite, while getting through a scan for in-flight HTLCs in the payments DB.

With postgres, I'm having another problem on startup. After the scan for in-flight HTLCs completes, I still have a postgres process running at 100% CPU:

Image

I believe this process was started around the same time as, and ran in parallel with, the scan for in-flight HTLCs in the payments DB. However, after that scan completed, I let things go for another ~3 hours and the extra postgres process still did not finish.

I tried running a profile to see what could be going on:

Image

I'm not sure what is going on, so I tried killing the postgres process. After killing it, I get the following in my logs:

2025-04-17 20:14:04.922 [ERR] RPCS: [/lnrpc.Lightning/SubscribeInvoices]: unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [DBG] STAT: Setting the accounts sub-server as errored: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [ERR] STAT: could not start the accounts sub-server: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
2025-04-17 20:14:04.925 [ERR] LITD: Error thrown in the accounts service, keeping litd running: Error in invoice subscription: rpc error: code = Unknown desc = unable to get invoices added since index 4700862: unknown postgres error: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
db-1  | 2025-04-17 20:14:04.910 UTC [36] FATAL:  terminating connection due to administrator command
db-1  | 2025-04-17 20:14:04.910 UTC [36] STATEMENT:  -- name: FilterInvoices :many
db-1  | 	SELECT
db-1  | 	    invoices.id, invoices.hash, invoices.preimage, invoices.settle_index, invoices.settled_at, invoices.memo, invoices.amount_msat, invoices.cltv_delta, invoices.expiry, invoices.payment_addr, invoices.payment_request, invoices.payment_request_hash, invoices.state, invoices.amount_paid_msat, invoices.is_amp, invoices.is_hodl, invoices.is_keysend, invoices.created_at
db-1  | 	FROM invoices
db-1  | 	WHERE (
db-1  | 	    id >= $1 OR 
db-1  | 	    $1 IS NULL
db-1  | 	) AND (
db-1  | 	    id <= $2 OR 
db-1  | 	    $2 IS NULL
db-1  | 	) AND (
db-1  | 	    settle_index >= $3 OR
db-1  | 	    $3 IS NULL
db-1  | 	) AND (
db-1  | 	    settle_index <= $4 OR
db-1  | 	    $4 IS NULL
db-1  | 	) AND (
db-1  | 	    state = $5 OR 
db-1  | 	    $5 IS NULL
db-1  | 	) AND (
db-1  | 	    created_at >= $6 OR
db-1  | 	    $6 IS NULL
db-1  | 	) AND (
db-1  | 	    created_at < $7 OR 
db-1  | 	    $7 IS NULL
db-1  | 	) AND (
db-1  | 	    CASE
db-1  | 	        WHEN $8 = TRUE THEN (state = 0 OR state = 3)
db-1  | 	        ELSE TRUE 
db-1  | 	    END
db-1  | 	)
db-1  | 	ORDER BY
db-1  | 	CASE
db-1  | 	    WHEN $9 = FALSE OR $9 IS NULL THEN id
db-1  | 	    ELSE NULL
db-1  | 	    END ASC,
db-1  | 	CASE
db-1  | 	    WHEN $9 = TRUE THEN id
db-1  | 	    ELSE NULL
db-1  | 	END DESC
db-1  | 	LIMIT $11 OFFSET $10
db-1  | 	

This leads me to believe that, because I'm actually running litd, litd might be interacting with the DB here too. Can litd connect to the database directly without going through lnd? If so, could litd be missing the tombstone marker on an old KV-schema DB, or not obeying the --lnd.db.use-native-sql argument?

Or, is this query that litd is making intended to be an ongoing thing that keeps litd accounts in sync with lnd invoices and it is just really inefficient in how it is implemented in postgres and it needs to be refactored?

Also, I re-tested the above with SQLite with what should be the same DB (it was created from the same original bbolt DB) and, after a 17-minute period to complete a scan through the payments DB for in-flight HTLCs (#9729 (comment)), there was no CPU activity afterwards. So SQLite must be finishing the extra (invoice?) workload very quickly, and I didn't even notice it.

@ZZiigguurraatt added the bug and needs triage labels Apr 18, 2025
@bhandras
Collaborator

As you can see, it's lnd's SubscribeInvoices call that is hogging your setup. Would you mind sharing some more details: your DB size, the number of payments, and how you created the test DB?

@Roasbeef
Member

What parameters are you calling SubscribeInvoices with?

@ZZiigguurraatt
Author

What parameters are you calling SubscribeInvoices with?

I'm not calling SubscribeInvoices. I'm wondering if litd is doing that?

@ZZiigguurraatt
Author

As you can see, it's lnd's SubscribeInvoices call that is hogging your setup. Would you mind sharing some more details: your DB size, the number of payments, and how you created the test DB?

I have 5,835,721 payments. I'm not sure how many invoices I have because of #9717, but it is on a similar order of magnitude (definitely within 20%).

I created the DB by sending 1 sat payments back and forth between two nodes, but there is only one channel open between those two nodes. In the beginning I had a few different ways I was trying to ping-pong the payments and there were some problems, which is why the number of payments isn't the same as the number of invoices.

Here's the amount of data the DB uses:

# du -hs pgdata/
33G	pgdata/
#

@ZZiigguurraatt
Author

ZZiigguurraatt commented Apr 20, 2025

Also, I re-tested the above with SQLite with what should be the same DB (it was created from the same original bbolt DB) and, after a 17-minute period to complete a scan through the payments DB for in-flight HTLCs (#9729 (comment)), there was no CPU activity afterwards. So SQLite must be finishing the extra (invoice?) workload very quickly, and I didn't even notice it.

I've been looking at this SQLite version, which is much faster, and I'm able to get some relevant information out of the log file using #9734:

2025-04-20 17:22:34.479 [INF] GRDN: Starting main custodian event loop
2025-04-20 17:22:34.531 [INF] LITD: Starting LiT session server
2025-04-20 17:22:34.532 [INF] LITD: Starting LiT account service
2025-04-20 17:22:34.533 [DBG] STAT: Setting the accounts sub-server as running
2025-04-20 17:22:34.533 [INF] LITD: Starting LiT middleware manager
2025-04-20 17:22:34.533 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 17:22:34.533 [DBG] RPCS: [/lnrpc.Lightning/SubscribeInvoices] requested
2025-04-20 17:22:34.533 [DBG] INVC: Delivering added invoices since 4700862 for client 1
2025-04-20 17:22:34.533 [WRN] RPCP: A new gRPC middleware with the name 'lit-privacy-mapper' was registered with custom_macaroon_caveat='privacy', read_only=false. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 17:22:34.534 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 17:22:34.534 [WRN] RPCP: A new gRPC middleware with the name 'lit-account' was registered with custom_macaroon_caveat='account', read_only=false. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 17:22:34.534 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 17:22:34.534 [WRN] RPCP: A new gRPC middleware with the name 'lit-macaroon-firewall-logger' was registered with custom_macaroon_caveat='', read_only=true. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 17:22:34.534 [INF] LITD: Internal sub server startup complete
2025-04-20 17:22:34.534 [DBG] STAT: Setting the lit sub-server as running

2025-04-20 17:22:45.758 [DBG] INVC: Completed scanning invoices added since index 4700862: total_processed=0, found_invoices=0, elapsed=11.225s
2025-04-20 17:22:45.758 [DBG] INVC: Delivering settled invoices since 4700421 for client 1

2025-04-20 17:22:57.359 [DBG] INVC: Completed scanning invoices settled since index 4700421: total_processed=0, found_invoices=0, elapsed=11.601s
2025-04-20 17:22:57.359 [INF] INVC: New invoice subscription client: id=1

So, it seems litd accounts is calling SubscribeInvoices.

What I don't get is: does litd accounts keep track of which invoices it has already seen and resume from there after each restart? If so, does it need to redo this from the beginning after the invoice DB migration from the KV to the SQL schema?

Also, in #9716 (comment) I get much faster performance with listinvoices using postgres over SQLite, so I'm not sure why subscribeinvoices is doing so badly with postgres.

It could be that my SQLite version ran this scan overnight once without me realizing it, and now litd is just resuming and has nothing to re-scan (because I've created no new invoices).

@ZZiigguurraatt
Author

ZZiigguurraatt commented Apr 20, 2025

OK, I've switched back to my postgres DB and am now again using #9734. One thing I've noticed here is that the litd accounts sub-server starts up before the Channel Router, and then they run in parallel (with SQLite, the Channel Router started first, and the litd accounts sub-server only started after the scan for in-flight payments completed).

2025-04-20 18:49:37.357 [INF] LITD: Starting LiT session server
2025-04-20 18:49:37.358 [INF] LITD: Starting LiT account service
2025-04-20 18:49:37.359 [DBG] STAT: Setting the accounts sub-server as running
2025-04-20 18:49:37.359 [INF] LITD: Starting LiT middleware manager
2025-04-20 18:49:37.359 [DBG] RPCS: [/lnrpc.Lightning/SubscribeInvoices] requested
2025-04-20 18:49:37.359 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 18:49:37.360 [DBG] INVC: Delivering added invoices since 4700862 for client 1
2025-04-20 18:49:37.360 [WRN] RPCP: A new gRPC middleware with the name 'lit-privacy-mapper' was registered with custom_macaroon_caveat='privacy', read_only=false. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 18:49:37.360 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 18:49:37.361 [WRN] RPCP: A new gRPC middleware with the name 'lit-account' was registered with custom_macaroon_caveat='account', read_only=false. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 18:49:37.361 [DBG] RPCS: [/lnrpc.Lightning/RegisterRPCMiddleware] requested
2025-04-20 18:49:37.362 [WRN] RPCP: A new gRPC middleware with the name 'lit-macaroon-firewall-logger' was registered with custom_macaroon_caveat='', read_only=true. Make sure you trust the middleware author since that code will be able to intercept and possibly modify any gRPC messages sent/received to/from a client that has a macaroon with that custom caveat.
2025-04-20 18:49:37.362 [INF] LITD: Internal sub server startup complete
2025-04-20 18:49:37.362 [DBG] STAT: Setting the lit sub-server as running
2025-04-20 18:49:37.595 [DBG] INVC: Processed 1000 invoices added since invoice with add index 4700862
2025-04-20 18:49:37.788 [DBG] INVC: Processed 2000 invoices added since invoice with add index 4700862
2025-04-20 18:49:37.978 [DBG] INVC: Processed 3000 invoices added since invoice with add index 4700862
2025-04-20 18:49:38.168 [DBG] INVC: Processed 4000 invoices added since invoice with add index 4700862
2025-04-20 18:49:38.366 [DBG] INVC: Processed 5000 invoices added since invoice with add index 4700862
2025-04-20 18:49:38.565 [DBG] INVC: Processed 6000 invoices added since invoice with add index 4700862
2025-04-20 18:49:38.764 [DBG] INVC: Processed 7000 invoices added since invoice with add index 4700862
2025-04-20 18:49:38.975 [DBG] INVC: Processed 8000 invoices added since invoice with add index 4700862
2025-04-20 18:49:39.179 [DBG] INVC: Processed 9000 invoices added since invoice with add index 4700862
2025-04-20 18:49:39.388 [DBG] INVC: Processed 10000 invoices added since invoice with add index 4700862
2025-04-20 18:49:39.605 [DBG] INVC: Processed 11000 invoices added since invoice with add index 4700862
2025-04-20 18:49:39.810 [DBG] INVC: Processed 12000 invoices added since invoice with add index 4700862
2025-04-20 18:49:40.025 [DBG] INVC: Processed 13000 invoices added since invoice with add index 4700862
2025-04-20 18:49:40.247 [DBG] INVC: Processed 14000 invoices added since invoice with add index 4700862
2025-04-20 18:49:40.472 [DBG] INVC: Processed 15000 invoices added since invoice with add index 4700862
2025-04-20 18:49:40.699 [DBG] INVC: Processed 16000 invoices added since invoice with add index 4700862
2025-04-20 18:49:40.931 [DBG] INVC: Processed 17000 invoices added since invoice with add index 4700862
2025-04-20 18:50:12.034 [INF] CRTR: Channel Router starting
2025-04-20 18:50:12.034 [DBG] CRTR: Scanning for inflight payments
2025-04-20 18:50:12.072 [DBG] INVC: Processed 102000 invoices added since invoice with add index 4700862
2025-04-20 18:50:12.560 [DBG] INVC: Processed 103000 invoices added since invoice with add index 4700862
2025-04-20 18:50:13.067 [DBG] INVC: Processed 104000 invoices added since invoice with add index 4700862
2025-04-20 18:50:13.219 [DBG] CHDB: Scanning inflight payments (in progress), processed 1000, last processed payment: payment_id=000aaec6cc94b7f6cc1bc4e8455742f29df82da1cc5e48b8610e59cd37e1af1e, amount=1000 mSAT, created_at=2025-03-26 19:54:33.739105593 +0000 UTC, payment_request=[108 110 98 99 114 116 49 48 ...]
2025-04-20 18:50:13.557 [DBG] INVC: Processed 105000 invoices added since invoice with add index 4700862
2025-04-20 18:50:14.062 [DBG] INVC: Processed 106000 invoices added since invoice with add index 4700862
2025-04-20 18:50:14.418 [DBG] CHDB: Scanning inflight payments (in progress), processed 2000, last processed payment: payment_id=0015f0989f969dfb64dabc930b6476c4e1bf3fa98db3d8e4afd06442e0c1b964, amount=1000 mSAT, created_at=2025-03-28 20:05:08.123212442 +0000 UTC, payment_request=[108 110 98 99 114 116 49 48 ...]
2025-04-20 18:50:14.574 [DBG] INVC: Processed 107000 invoices added since invoice with add index 4700862
2025-04-20 18:50:15.083 [DBG] INVC: Processed 108000 invoices added since invoice with add index 4700862
2025-04-20 18:50:15.591 [DBG] INVC: Processed 109000 invoices added since invoice with add index 4700862
2025-04-20 18:50:15.625 [DBG] CHDB: Scanning inflight payments (in progress), processed 3000, last processed payment: payment_id=0020a2e05bb9df736ff50981d29305bbce7e5c0adc25dadf4c09d93ba887629d, amount=1000 mSAT, created_at=2025-03-29 17:36:40.465673778 +0000 UTC, payment_request=[108 110 98 99 114 116 49 48 ...]
2025-04-20 18:50:16.126 [DBG] INVC: Processed 110000 invoices added since invoice with add index 4700862
2025-04-20 18:50:16.707 [DBG] INVC: Processed 111000 invoices added since invoice with add index 4700862
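
As an aside, the payment_request value in those CHDB lines is just the invoice string logged as a raw byte slice of ASCII codes, so converting it back to a string recovers the bech32 invoice. A minimal sketch (the literal below is just the first few bytes from the log above):

```go
package main

import "fmt"

func main() {
	// First bytes of the payment_request slice from the log above;
	// each number is the ASCII code of one character of the invoice.
	raw := []byte{108, 110, 98, 99, 114, 116, 49, 48}
	fmt.Println(string(raw)) // prints "lnbcrt10", the regtest invoice prefix
}
```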

@ZZiigguurraatt
Author

ZZiigguurraatt commented Apr 21, 2025

I let the above config with postgres run for the rest of the day. I eventually got this (note the elapsed=7h41m48.75s)

2025-04-21 02:51:35.513 [DBG] INVC: Processed 4700000 invoices added since invoice with add index 4700862
2025-04-21 02:51:46.899 [DBG] INVC: Completed scanning invoices added since index 4700862: total_processed=4700862, found_invoices=4700862, elapsed=7h41m48.75s
2025-04-21 02:51:46.899 [DBG] INVC: Delivering settled invoices since 4700421 for client 1
2025-04-21 02:51:47.588 [DBG] INVC: Completed scanning invoices settled since index 4700421: total_processed=0, found_invoices=0, elapsed=688ms
2025-04-21 02:51:53.653 [INF] INVC: New invoice subscription client: id=1

and then the CPU was no longer active and a lot of RAM was released.

I then restarted and got

2025-04-21 05:12:32.493 [INF] LITD: Starting LiT account service
2025-04-21 05:12:32.493 [DBG] RPCS: [/lnrpc.Lightning/SubscribeInvoices] requested
2025-04-21 05:12:32.493 [DBG] STAT: Setting the accounts sub-server as running
2025-04-21 05:12:32.494 [DBG] INVC: Delivering added invoices since 9457843 for client 1
2025-04-21 05:12:32.505 [DBG] INVC: Completed scanning invoices added since index 9457843: total_processed=0, found_invoices=0, elapsed=11ms
2025-04-21 05:12:32.505 [DBG] INVC: Delivering settled invoices since 4700421 for client 1
2025-04-21 05:12:36.198 [DBG] INVC: Completed scanning invoices settled since index 4700421: total_processed=0, found_invoices=0, elapsed=3.693s
2025-04-21 05:12:36.198 [INF] INVC: New invoice subscription client: id=1

which leads me to believe that

What I don't get is does litd accounts keep track of what invoices it has already subscribed to and resume from there after each restart? If so, does it need to re-do this from the beginning after the invoice DB migration from KV to SQL schema?

is probably true (it does keep track of where it already subscribed up to and resumes from there after each restart), and

It could be that my SQLite version ran this scan overnight once without me realizing it, and now litd is just resuming and has nothing to re-scan (because I've created no new invoices).

is probably also true. In fact, the SQLite scan would likely take even longer than the postgres one, since

in #9716 (comment) I get much faster performance with listinvoices using postgres over SQLite

but I'd have to do more testing to confirm.

One thing I am not sure of though is why we get both

2025-04-21 05:12:32.505 [DBG] INVC: Completed scanning invoices added since index 9457843: total_processed=0, found_invoices=0, elapsed=11ms

and

2025-04-21 05:12:36.198 [DBG] INVC: Completed scanning invoices settled since index 4700421: total_processed=0, found_invoices=0, elapsed=3.693s

?

But either way, for now I think people need to be aware that if they use litd accounts, they are going to need to plan for a lot more migration time. If they run litd but don't use accounts, they likely want to run litd with --accounts.disable, as I think this will skip the invoice scanning that takes forever here.

Also, I'm not sure whether SubscribeInvoices could be made any faster with the SQL schema.

Also, I've used lndinit migrate-db to generate my SQLite and postgres DBs. Is it possible that litd state is not properly migrated, and that's why this invoice (re)scanning needs to happen in the first place?

Is this super long delay also going to happen on the first non-litd-accounts use of SubscribeInvoices after migrating to SQLite or postgres? Or is it due to the KV-to-SQL schema migration that happens in LND? I don't get why I didn't see this long delay when I first started up with --lnd.db.use-native-sql; it seems to happen on the (second) startup after that migration finished, which is a huge gotcha for people who think they had a successful migration and then find out on the next LND restart that they need to wait almost 8 hours for this. I'd need to confirm this with more testing, though.

@Roasbeef
Member

Thanks for the investigation!

I think as you noted, this is actually an issue with the way the litd accounts work. On startup, it'll try to see if we have a settle/add index we remembered; if not, it defaults to zero for both: https://github.com/lightninglabs/lightning-terminal/blob/b0230d4ae4806a8904f6a932e557ee396802d411/accounts/service.go#L141-L176.

This means that for that very first restart post migration (if accounts weren't active before?), it'll effectively ask for the totality of the invoice history to be delivered.

I think it should instead pass in math.MaxInt64 there, which means that no backlog would be delivered.

@ZZiigguurraatt
Author

This means that for that very first restart post migration (if accounts weren't active before?), it'll effectively ask for the totality of the invoice history to be delivered.

Why can't it remember this through the migration?

I think it should instead pass in math.MaxInt64 there, which means that no backlog would be delivered.

For the add or the settle index? What if we have some invoices that were generated but unsettled before shutdown?

@ZZiigguurraatt
Author

ZZiigguurraatt commented Apr 22, 2025

Would it be of any value to stream some kind of scanning progress to the client so they know the node is still working and not just hung? This could be helpful for all users of SubscribeInvoices, not just this litd accounts situation.

@ziggie1984 self-assigned this Apr 22, 2025
@ziggie1984
Collaborator

But it seems like LITD had an invoice index stored. It did not start from 0, as seen in this log line:

2025-04-20 18:50:12.072 [DBG] INVC: Processed 102000 invoices added since invoice with add index 4700862

It started from index 4700862, which makes me believe that you had already started LITD before?

@ziggie1984
Collaborator

Anyway, I agree we need to paginate here and deliver the results in chunks to the subscriber. This would bring down memory consumption, and the subscriber could process them on the fly rather than waiting for the whole bulk at the end.
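
The chunked-delivery idea could look roughly like this (a sketch under assumed names; fetchPage stands in for a paginated FilterInvoices-style query, and the int slices stand in for invoices):

```go
package main

import "fmt"

// deliverInChunks sketches the pagination idea: instead of materializing
// the whole invoice backlog and sending it at the end, fetch fixed-size
// pages and hand each page to the subscriber as it arrives, keeping
// memory usage bounded by the page size.
func deliverInChunks(fetchPage func(offset, limit int) []int, limit int, deliver func([]int)) {
	for offset := 0; ; offset += limit {
		page := fetchPage(offset, limit)
		if len(page) == 0 {
			return
		}
		deliver(page)
		if len(page) < limit {
			return // short page: backlog exhausted
		}
	}
}

func main() {
	backlog := make([]int, 10)
	fetch := func(offset, limit int) []int {
		if offset >= len(backlog) {
			return nil
		}
		end := offset + limit
		if end > len(backlog) {
			end = len(backlog)
		}
		return backlog[offset:end]
	}
	total := 0
	deliverInChunks(fetch, 4, func(p []int) { total += len(p) })
	fmt.Println(total) // 10: all items delivered, four at a time
}
```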

@ziggie1984
Collaborator

I think as you noted, this is actually an issue with the way the litd accounts work. On startup, it'll try to see if we have a settle/add index we remembered; if not, it defaults to zero for both: https://github.com/lightninglabs/lightning-terminal/blob/b0230d4ae4806a8904f6a932e557ee396802d411/accounts/service.go#L141-L176.

The 0 actually means we will not do a rescan, so we are fine on the LITD side, but we could add a comment there as well.

See:

lnd/channeldb/invoices.go, lines 255 to 259 at c9fe051:

// If an index of zero was specified, then in order to maintain
// backwards compat, we won't send out any new invoices.
if sinceAddIndex == 0 {
	return newInvoices, nil
}
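
In other words, a caller that passes a zero index gets no backlog at all. A toy mirror of that guard (hypothetical, operating on plain add indexes rather than real invoices):

```go
package main

import "fmt"

// invoicesSince mirrors the lnd guard quoted above: a since-index of zero
// means "deliver no backlog" for backwards compatibility; otherwise return
// everything added after the given index.
func invoicesSince(addIndexes []uint64, sinceAddIndex uint64) []uint64 {
	if sinceAddIndex == 0 {
		return nil
	}
	var out []uint64
	for _, idx := range addIndexes {
		if idx > sinceAddIndex {
			out = append(out, idx)
		}
	}
	return out
}

func main() {
	all := []uint64{1, 2, 3, 4, 5}
	fmt.Println(invoicesSince(all, 0)) // []: zero index delivers no backlog
	fmt.Println(invoicesSince(all, 3)) // [4 5]: everything after index 3
}
```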

@ZZiigguurraatt
Author

ZZiigguurraatt commented Apr 22, 2025

But it seems like LITD had an invoice index stored. It did not start from 0, as seen in this log line:

2025-04-20 18:50:12.072 [DBG] INVC: Processed 102000 invoices added since invoice with add index 4700862

It started from index 4700862, which makes me believe that you had already started LITD before?

So, what I think I did was

  1. Create DB in older LITD/LND with bbolt
  2. Grow DB by sending 1 sat payments back and forth with another node (LITD/LND was restarted a few times during this step)
  3. Shutdown LITD/LND
  4. Open LITD/LND in latest version
  5. Let bbolt be updated
  6. Shutdown LITD/LND
  7. Run lndinit migrate-db
  8. Switch LITD/LND to postgres/SQLite and native SQL schema invoices
  9. Start LITD/LND
  10. Wait for invoice DB migration to SQL schema to complete
  11. Shutdown LITD/LND
  12. Start LITD/LND
  13. Observe this issue.

Now, at any point above, litd accounts could have failed to properly record on shutdown the add and settle indexes that should be tracked on the next restart, but I don't think it started this rescan from an old add/settle index until step 13. It could be that the problem was introduced in step 3, but the migrations in steps 5 and/or 10 masked it, deferring the problem from surfacing until step 13 (I think so because whenever I did shutdowns, CPU usage appeared to be near 0%).

@ZZiigguurraatt
Author

Now, at any point above, litd accounts could have failed to properly record on shutdown the add and settle indexes that should be tracked on the next restart,

Also, I'm not sure whether it failed to record properly because I did a hard shutdown, or whether it is a bug.

@ZZiigguurraatt
Author

If they run litd but don't use accounts, they likely want to run litd with --accounts.disable, as I think this will skip the invoice scanning that takes forever here.

We could also be smart enough to check whether the user has no accounts created but has still left this enabled; in that case, we know we can automatically skip running SubscribeInvoices.
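
That check could be as simple as the following sketch (shouldSubscribeInvoices and its inputs are made-up names for illustration, not litd's actual API):

```go
package main

import "fmt"

// shouldSubscribeInvoices is a hypothetical guard: only start the
// SubscribeInvoices stream if the accounts feature is enabled AND at
// least one account actually exists, so nodes without any accounts
// skip the potentially very long invoice scan entirely.
func shouldSubscribeInvoices(accountsDisabled bool, numAccounts int) bool {
	return !accountsDisabled && numAccounts > 0
}

func main() {
	fmt.Println(shouldSubscribeInvoices(false, 0)) // false: no accounts, skip the scan
	fmt.Println(shouldSubscribeInvoices(false, 2)) // true: accounts exist, subscribe
	fmt.Println(shouldSubscribeInvoices(true, 2))  // false: explicitly disabled
}
```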

@ZZiigguurraatt
Author

This might be a more useful view of the size of my DB?

Image

@ziggie1984 linked a pull request Apr 23, 2025 that will close this issue