[Feature Request]: Wait for bitcoin-cli to warmup before crashing #1533

Open
NicolasDorier opened this issue Jul 10, 2018 · 10 comments
Labels
backend: Related to the node backend software/interface (e.g. btcd, bitcoin-core); bitcoind: Bitcoin Core backend; P3: might get fixed, nice to have

Comments

@NicolasDorier
Contributor

NicolasDorier commented Jul 10, 2018

If lnd starts before bitcoind has fully started (for example, while bitcoind is still checking blocks), lnd will crash.

Can lnd wait automatically in a similar way to bitcoin-cli -rpcwait getblockchaininfo?

@Roasbeef
Member

Well bitcoind should be started before lnd itself. What do you mean by "crash" here? We have an exponential back off for the RPC connection, but it may be the case now that our DNS query fails early?

Roasbeef added the labels backend, bitcoind, and P4 (low prio) Jul 10, 2018
@NicolasDorier
Contributor Author

I have not experienced it yet, but this will probably happen when I run LND on mainnet.

I looked carefully through your code (lnd and rpcClient) and could not find any place where you handle RPC error code -28 (RPC_IN_WARMUP).

LND fails to start when it gets wrong RPC credentials (which is expected), so I am assuming (wrongly?) that it would also fail if calls to the bitcoind RPC return the RPC_IN_WARMUP error.

@NicolasDorier
Contributor Author

As I expected, here is the error:

btcpayserver_lnd_litecoin | LND_CHAIN=ltc
btcpayserver_lnd_litecoin | LND_ENVIRONMENT=mainnet
btcpayserver_lnd_litecoin | Added litecoin.active and litecoin.mainnet to config file /data/lnd.conf
btcpayserver_lnd_litecoin | Attempting automatic RPC configuration to litecoind
btcpayserver_lnd_litecoin | Automatically obtained litecoind's RPC credentials
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.623 [INF] LTND: Version 0.4.2-beta commit=f349707d213129672e199149f77956d250f583ba
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.636 [INF] LTND: Active chain: Litecoin (network=mainnet)
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.684 [INF] CHDB: Checking for schema update: latest_version=1, db_version=1
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.684 [INF] RPCS: Generating TLS certificates...
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.718 [INF] RPCS: Done generating TLS certificates
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.887 [INF] LTND: Primary chain is set to: litecoin
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.888 [INF] LTND: Initializing litecoind backed fee estimator
btcpayserver_lnd_litecoin | unable to create chain control: -28: Loading block index...
btcpayserver_lnd_litecoin | 2018-07-11 08:52:18.890 [INF] LTND: Shutdown complete
btcpayserver_lnd_litecoin | -28: Loading block index...
btcpayserver_lnd_litecoin exited with code 1

Roasbeef added the label P2 (should be fixed if one has time) and removed the label P4 (low prio) Jul 11, 2018
halseth added the label P3 (might get fixed, nice to have) and removed the label P2 (should be fixed if one has time) Sep 11, 2018
sangaman added a commit to sangaman/btcwallet that referenced this issue Feb 18, 2020
This modifies the `Start` method to handle the `RPC_IN_WARMUP` error
code of `-28` by retrying the rpc call to determine the current network
every second until it either succeeds or fails with a different code.
The current behavior fails and terminates the connection upon receiving
this error code. This change allows for connecting to a recently started
bitcoind node and starting the client while bitcoind is still warming
up.

Related issue: lightningnetwork/lnd#1533
sangaman added a commit to sangaman/btcwallet that referenced this issue Feb 18, 2020
This modifies the `Start` method to handle the `RPC_IN_WARMUP` error
code of `-28` by retrying the rpc call to determine the current network
every second until it either succeeds or fails with a different code.
The current behavior fails and terminates the connection upon receiving
this error code. This change allows for connecting to a recently started
bitcoind node and starting the client while bitcoind is still warming
up.

Related issues: lightningnetwork/lnd#1533 &
https://github.com/ExchangeUnion/xud-docker/issues/195
@sangaman
Contributor

I came up with a solution for this in the btcwallet code and it resolves this issue nicely in my tests. I couldn't come up with an elegant way to address the issue within the lnd codebase.

@guggero
Collaborator

guggero commented Nov 30, 2020

Would this be fixed indirectly by the backend health check that is now implemented, which would cause lnd to shut down gracefully if the backend isn't ready yet so that docker/kubernetes/systemd can restart it?

@NicolasDorier
Contributor Author

The way I fixed it: my UTXO tracker, NBXplorer, writes a file when the node is ready.
The docker entrypoint of our LND fork just runs an infinite loop that waits for this file to be created.

sangaman added a commit to sangaman/btcwallet that referenced this issue Dec 2, 2020
sangaman added a commit to sangaman/btcwallet that referenced this issue Dec 3, 2020
sangaman added a commit to sangaman/btcwallet that referenced this issue Jun 9, 2021
sangaman added a commit to sangaman/btcwallet that referenced this issue Jun 9, 2021
sangaman added a commit to sangaman/btcwallet that referenced this issue Jun 10, 2021
@3nprob

3nprob commented Aug 25, 2021

In cases of very brief interruptions, this should be addressed by btcsuite/btcd#1743

Roasbeef added this to the v0.17.0 milestone Aug 19, 2022
@seth586

seth586 commented Mar 11, 2023

This is still an ongoing issue (FreeBSD 13.1, bitcoind 0.24.0, lnd 0.15.5). Here is a reboot scenario; the rc.d script for lnd requires bitcoind to start first:

2023-03-06 04:21:27.654 [INF] LTND: Received terminated
2023-03-06 04:21:27.654 [INF] LTND: Shutting down...
2023-03-06 04:21:27.654 [INF] LTND: Gracefully shutting down.
2023-03-06 04:21:27.654 [INF] NANN: Channel Status Manager shutting down
2023-03-06 04:21:27.655 [INF] HSWC: HTLC Switch shutting down
2023-03-06 04:21:27.655 [INF] NTFN: Cancelling epoch notification, epoch_id=3
2023-03-06 04:21:27.655 [INF] HSWC: Removing channel link with ChannelID(censored)
2023-03-06 04:21:27.655 [INF] HSWC: Removing channel link with ChannelID(censored)
2023-03-06 04:21:27.655 [INF] HSWC: ChannelLink(censored:1): stopping
2023-03-06 04:21:27.655 [INF] HSWC: ChannelLink(censored:1): stopping
2023-03-06 04:21:27.655 [INF] HSWC: ChannelLink(censored:1): exited
2023-03-06 04:21:27.655 [INF] HSWC: ChannelLink(censored:1): exited
2023-03-06 04:21:27.659 [INF] HSWC: Onion processor shutting down
2023-03-06 04:21:27.659 [INF] HSWC: Decaying hash log received shutdown request
2023-03-06 04:21:27.659 [INF] NTFN: Cancelling epoch notification, epoch_id=8
2023-03-06 04:21:27.659 [INF] INVC: InvoiceRegistry shutting down
2023-03-06 04:21:27.659 [INF] NTFN: Cancelling epoch notification, epoch_id=7
2023-03-06 04:21:27.659 [INF] CRTR: Channel Router shutting down
2023-03-06 04:21:27.659 [INF] CRTR: FilteredChainView stopping
2023-03-06 04:21:27.659 [INF] CNCT: ChainArbitrator shutting down
2023-03-06 04:21:27.660 [INF] NTFN: Cancelling epoch notification, epoch_id=4
2023-03-06 04:21:27.660 [INF] FNDG: Funding manager shutting down
2023-03-06 04:21:27.660 [INF] BRAR: Breach arbiter shutting down
2023-03-06 04:21:27.660 [INF] UTXN: UTXO nursery shutting down
2023-03-06 04:21:27.660 [INF] NTFN: Cancelling epoch notification, epoch_id=2
2023-03-06 04:21:27.660 [INF] DISC: Authenticated gossiper shutting down
2023-03-06 04:21:27.660 [INF] DISC: Authenticated Gossiper is stopping
2023-03-06 04:21:27.660 [INF] NTFN: Cancelling epoch notification, epoch_id=5
2023-03-06 04:21:27.660 [INF] SWPR: Sweeper shutting down
2023-03-06 04:21:27.660 [INF] NTFN: Cancelling epoch notification, epoch_id=1
2023-03-06 04:21:27.660 [INF] CHNF: ChannelNotifier shutting down
2023-03-06 04:21:27.660 [INF] PRNF: PeerNotifier shutting down
2023-03-06 04:21:27.660 [INF] HSWC: HtlcNotifier shutting down
2023-03-06 04:21:27.660 [INF] CHBU: Stopping chanbackup.SubSwapper
2023-03-06 04:21:27.660 [INF] NTFN: bitcoind notifier shutting down
2023-03-06 04:21:27.660 [INF] CHFT: Stopping event store
2023-03-06 04:21:27.660 [ERR] RPCS: [/chainrpc.ChainNotifier/RegisterBlockEpochNtfn]: chain notifier shutting down
2023-03-06 04:21:27.661 [INF] SRVR: Disconnecting from censored@censored:9735
2023-03-06 04:21:27.661 [INF] PEER: Peer(censored): disconnecting censored@ce.ns.or.ed:9735, reason: server: DisconnectPeer called
2023-03-06 04:21:27.661 [INF] PEER: Peer(censored): unable to read message from peer: read tcp 127.0.0.1:15334->127.0.0.1:9050: use of closed network connection
2023-03-06 04:21:27.661 [INF] SRVR: Disconnecting from ce.ns.or.ed@ce.ns.or.ed:9735
2023-03-06 04:21:27.661 [INF] PEER: Peer(ce.ns.or.ed): disconnecting ce.ns.or.ed@ce.ns.or.ed:9735, reason: server: DisconnectPeer called
2023-03-06 04:21:27.661 [INF] PEER: Peer(ce.ns.or.ed): unable to read message from peer: read tcp 127.0.0.1:41952->127.0.0.1:9050: use of closed network connection
2023-03-06 04:21:27.661 [INF] HLCK: Health monitor shutting down
2023-03-06 04:21:27.876 [INF] RPCS: Stopping RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping PeersRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping NeutrinoKitRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping WatchtowerRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping WatchtowerClientRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping SignRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping RouterRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping AutopilotRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping ChainRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping InvoicesRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping VersionRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] RPCS: Stopping WalletKitRPC Sub-RPC Server
2023-03-06 04:21:27.876 [INF] TORC: Stopping tor controller
2023-03-06 04:21:27.886 [INF] LTND: Shutdown complete
2023-03-06 04:30:38.706 [INF] LTND: Version: 0.15.5-beta commit=v0.15.5-beta, build=production, logging=default, debuglevel=info
2023-03-06 04:30:38.706 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2023-03-06 04:30:38.708 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2023-03-06 04:30:38.712 [INF] RPCS: gRPC proxy started at 0.0.0.0:8080
2023-03-06 04:30:38.712 [INF] LTND: Opening the main database, this might take a few minutes...
2023-03-06 04:30:38.713 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
2023-03-06 04:30:40.187 [INF] LTND: Creating local graph and channel state DB instances
2023-03-06 04:30:40.436 [INF] CHDB: Checking for schema update: latest_version=29, db_version=29
2023-03-06 04:30:40.436 [INF] CHDB: Checking for optional update: prune_revocation_log=false, db_version=empty
2023-03-06 04:30:40.436 [INF] LTND: Database(s) now open (time_to_open=1.723599787s)!
2023-03-06 04:30:40.436 [INF] LTND: Attempting automatic wallet unlock with password provided in file
2023-03-06 04:30:41.333 [INF] LNWL: Opened wallet
2023-03-06 04:30:41.389 [INF] CHRE: Primary chain is set to: bitcoin
2023-03-06 04:30:41.402 [ERR] LTND: unable to create partial chain control: -28: Loading block index…
2023-03-06 04:30:41.402 [ERR] LTND: Shutting down because error in main method: error creating wallet config: unable to create partial chain control: -28: Loading block index…
2023-03-06 04:30:41.405 [INF] LTND: Shutdown complete

I can hack around it by introducing a sleep 10 in the bitcoind startup script, but I would rather have lnd wait intelligently for bitcoind's RPC services to become available.

@schildbach

I'm stuck with this issue too. Without lnd waiting for bitcoind to start up it will never be possible to automatically start a node in a predictable way. It could crash just because bitcoind takes 15 minutes to start up one day, rather than 14.

@sangaman
Contributor

I still think my PR here would fix this: btcsuite/btcwallet#677

Then there'd need to be another PR against lnd to use the updated dependency.

saubyk removed this from the Low Priority milestone Aug 8, 2023