LND shuts down if HTTP request to BTC node fails #5661

Open
3nprob opened this issue Aug 25, 2021 · 4 comments
Labels
bitcoind Bitcoin Core backend enhancement Improvements to existing features / behaviour P3 might get fixed, nice to have rpc Related to the RPC interface

Comments

@3nprob

3nprob commented Aug 25, 2021

Background

If any HTTP request to bitcoind fails during sync, the process shuts down. This is also true for intermittent errors such as timeouts, network errors, or bitcoind still syncing (related to #1533). Even under good conditions, these kinds of errors are normal when connecting to a bitcoin node over Tor.

I didn't find a clean way to address this in the LND codebase itself.
I attempted to address it in the btcd/btcwallet codebase instead, with this PR: btcsuite/btcd#1743

I have been running with this patch on testnet for some time, and restarts have gone from multiple times per day to zero over the past week.

Expected behaviour

LND retries failed requests with an exponential backoff

Actual behaviour

LND exits, requiring a wallet unlock on restart

@guggero
Collaborator

guggero commented Aug 25, 2021

Are you sure this isn't the healthcheck that is shutting down lnd? What do you see in the logs?
You should be able to turn that off with healthcheck.chainbackend.attempts=0.
Check lnd --help | grep health for more information.

@3nprob
Author

3nprob commented Aug 25, 2021

@guggero Yeah, I see a Connection failed error (or equivalent; I don't have the logs anymore) in stderr, as opposed to stdout where the other logs end up. There was no log enrichment from lnd here, just an error bubbling up all the way from the HTTP client in the btcd module. In my specific case, I tracked it down to:

func (b *BtcWallet) GetBlockHash(blockHeight int64) (*chainhash.Hash, error) {

During server.Start()

During startup (either the initial startup or catching up after falling behind), this gets called for each block, so the probability of it causing a shutdown compounds with the number of blocks to process.

But looking around, it seems there's currently nothing in place to handle errors from calling any of the RPC functions, so anything unexpected anywhere in that path will just bubble up and stop the process. Or am I missing something?

FWIW, lnd --help | grep health returns nothing for me. Is there some build tag I need to enable? Also, I never changed the healthcheck settings from the sample config I started from:

[healthcheck]
healthcheck.chainbackend.attempts=3
healthcheck.chainbackend.timeout=10s
healthcheck.chainbackend.backoff=30s
healthcheck.chainbackend.interval=2m

I had actually missed this healthcheck backoff setting. Correct me if I'm wrong, but it looks to me like this is a separate goroutine that checks the chain backend and kills the process if the check fails, and that it has nothing to do with handling errors during RPC calls?

@Roasbeef
Member

If any HTTP request to bitcoind fails during sync, the process shuts down

Are requests failing due to timeouts, or because the network itself is unreliable? Typically we see users use a sort of supervisor to automatically restart lnd in the background if there's a networking issue. Even if you add extra retries at the rpcclient level in btcsuite, you'd eventually need to bail out and rely on a restart at a higher level, right?

Roasbeef added the bitcoind, enhancement, rpc, and P3 labels on Aug 31, 2021
@3nprob
Author

3nprob commented Feb 27, 2022

I'm now experiencing this on a node that requires a wallet rescan. At some point during startup, one of the requests fails:

[ERR] LNWL: Unable to complete chain rescan: Post "http://127.0.0.1:8330": read tcp 127.0.0.1:51674->127.0.0.1:8330: read: connection reset by peer

This effectively results in a restart loop (with a wallet unlock required on each restart), and the node is unable to start up. Adding retry behavior would allow the rescan to complete, since the errors are transient.

If I understand the dependency resolution correctly, and it doesn't get dropped, this should be resolved when #6285 is merged, since that brings in btcsuite/btcd#1743.
