Corrupt gossip file causes core lightning to refuse to start (gossipd: FATAL SIGNAL 6) #7249

Issue and Steps to Reproduce

This is from a node used for experimenting. I am sharing some logs and a rough timeline of what happened, along with my progress in resolving the issue. It is not immediately obvious from the error messages what should be done to get the node running again.

Here are some log lines that I have saved:

Comments
Update: found the crash logs in /var/lib/docker/volumes/generated_clightning_bitcoin_datadir/_data/bitcoin. Available on request.

I was able to resolve the issue by stopping Core Lightning, deleting the gossip file, and starting Core Lightning again. I would consider this issue resolved if the default behaviour of gossipd were to try to recover instead of crashing; recovering could involve deleting the file and starting again. I also want to add that the invoice for which payment was attempted came from a private channel, and the attempt happened while the bitcoin node was still catching up. The keysend happened the same way. The node is now running again, but some of the channels were either force closed or are pending.

Full debug log available on request.
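For anyone hitting the same crash, here is a minimal sketch of that recovery procedure in Python. The systemd service name ("lightningd") and the data directory are assumptions, so adjust them for a docker-based setup like the one above:

```python
#!/usr/bin/env python3
"""Sketch of the manual recovery: stop the node, delete the gossip
store, start it again. The service name and data directory are
assumptions; the gossip store is a cache of public gossip, so the
node rebuilds it from peers after a restart."""
import subprocess
from pathlib import Path

DATA_DIR = Path.home() / ".lightning" / "bitcoin"  # assumed location

def recover() -> None:
    # Stop the node so gossipd is no longer holding the file.
    subprocess.run(["systemctl", "stop", "lightningd"], check=True)

    # Remove the gossip store and any leftover *_corrupt copy.
    for name in ("gossip_store", "gossip_store_corrupt"):
        path = DATA_DIR / name
        if path.exists():
            path.unlink()
            print(f"removed {path}")

    # Start the node again; gossip is resynced from peers.
    subprocess.run(["systemctl", "start", "lightningd"], check=True)

if __name__ == "__main__":
    recover()
```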
Same issue here: 2024-05-17T16:25:25.684Z INFO lightningd: v24.02.1-93-gc4edec8. The node had also been offline for a few days, and lightningd would crash on restart, but only after bitcoind re-synced the blockchain; during the sync it didn't crash. I also deleted the gossipd folder, reset, and the above is from /data/lightningd/cln.log following those steps.
Experienced the same issue today. CLN crashed after BROKEN gossipd: FATAL SIGNAL 6, and an attempted restart failed. I had to remove both the newly created gossip_store and gossip_store_corrupt files to successfully restart the node. Latest stable v24.02.2.
A referenced commit notes:

This seems to be happening to some people, so don't panic. Unfortunately we don't have a good error callback here, so msg to stderr.

Fixes: ElementsProject#7249
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
OK, thanks for the report. Your gossip store seems to be gaining redundant channel announcements somehow. I've patched it to prevent the crash and put some more debug info in, as I cannot figure out how this is happening...
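For anyone who wants to check their own store for those redundant announcements before deleting it, a rough scanner sketch follows. The record header layout (a single leading version byte, then big-endian u16 flags, u16 len, u32 crc, u32 timestamp per record) is an assumption based on the store format around these versions; the channel_announcement field offsets follow BOLT 7. Treat the output as a diagnostic hint only.

```python
#!/usr/bin/env python3
"""Sketch: count duplicate channel_announcements in a gossip_store.

Assumed record header: a single leading version byte, then per record
big-endian u16 flags, u16 len, u32 crc, u32 timestamp. The
channel_announcement offsets follow BOLT 7. Diagnostic hint only."""
import struct
import sys
from collections import Counter

DELETED_BIT = 0x8000        # assumed "record deleted" flag bit
CHANNEL_ANNOUNCEMENT = 256  # BOLT 7 message type

def scan(path: str) -> Counter:
    seen = Counter()
    with open(path, "rb") as f:
        f.read(1)  # skip the version byte
        while True:
            hdr = f.read(12)
            if len(hdr) < 12:
                break
            flags, length, _crc, _ts = struct.unpack(">HHII", hdr)
            msg = f.read(length)
            if len(msg) < length:
                break  # truncated store
            if flags & DELETED_BIT or len(msg) < 260:
                continue
            if struct.unpack_from(">H", msg)[0] != CHANNEL_ANNOUNCEMENT:
                continue
            # BOLT 7: type(2) + 4 signatures(256) + features len(2)
            (flen,) = struct.unpack_from(">H", msg, 258)
            scid_off = 260 + flen + 32  # skip features and chain_hash
            if scid_off + 8 > len(msg):
                continue  # malformed announcement
            scid = int.from_bytes(msg[scid_off:scid_off + 8], "big")
            seen[scid] += 1
    return seen

if __name__ == "__main__":
    counts = scan(sys.argv[1] if len(sys.argv) > 1 else "gossip_store")
    dups = {scid: n for scid, n in counts.items() if n > 1}
    print(f"{len(dups)} short_channel_ids announced more than once")
```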
I also ended up with a corrupt gossip store today, probably because I accidentally turned the wrong circuit breaker off :-) The first boot after power came back ended with:
It immediately restarted.

A bunch more stuff followed, but it stayed online... in fact I received a bunch of keysend payments. Then a bit later I crashed it, probably via the renepay plugin. Upon restart I got errors again and a crash:

I ended up deleting the gossip store files entirely. Other than seeing all my peers as
Probably yes!
This is a really unfortunate crash :/ I am wondering if we can do something to catch this problem and recover without manual intervention.
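Pending a proper fix inside gossipd, here is a rough sketch of what external automatic recovery could look like: a wrapper that quarantines the gossip store after an abnormal exit and restarts. The data directory, the invocation, and the "any nonzero exit may mean corrupt gossip" heuristic are all illustrative assumptions, not how gossipd actually reports this failure.

```python
#!/usr/bin/env python3
"""Sketch: quarantine the gossip store after an abnormal lightningd
exit, then restart. Data directory, invocation, and the crash
heuristic are illustrative assumptions only."""
import subprocess
import time
from pathlib import Path

DATA_DIR = Path.home() / ".lightning" / "bitcoin"  # assumed location

def run_forever() -> None:
    while True:
        proc = subprocess.run(["lightningd"])
        if proc.returncode == 0:
            break  # clean shutdown, e.g. via `lightning-cli stop`
        store = DATA_DIR / "gossip_store"
        if store.exists():
            # Move aside rather than delete, so the file can be inspected.
            store.rename(DATA_DIR / f"gossip_store.bad.{int(time.time())}")
            print("quarantined gossip_store after crash; restarting")
        time.sleep(5)

if __name__ == "__main__":
    run_forever()
```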
I just had a crash on a fresh node, started from scratch within the hour:
If it's a clue, I had just seeded the node's gossip with three connections to the top three nodes at
Is there a problem with fast concurrent gossip from multiple sources?
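For reference, gossip seeding like that can be scripted with pyln-client; a small sketch follows, where the RPC socket path and the peer list are placeholders, not real node IDs:

```python
#!/usr/bin/env python3
"""Sketch: seed a fresh node's gossip via a few well-connected peers,
using pyln-client. Socket path and peers are placeholders."""
from pyln.client import LightningRpc

rpc = LightningRpc("/home/user/.lightning/bitcoin/lightning-rpc")  # assumed

# Hypothetical well-connected peers: (node_id, host, port).
SEED_PEERS = [
    ("02aaaa...", "peer1.example.com", 9735),
    ("03bbbb...", "peer2.example.com", 9735),
    ("02cccc...", "peer3.example.com", 9735),
]

for node_id, host, port in SEED_PEERS:
    try:
        rpc.connect(node_id, host, port)
        print(f"connected to {node_id}")
    except Exception as exc:
        print(f"failed to connect to {node_id}: {exc}")
```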
I'm really curious whether people who use PostgreSQL instead of SQLite experience the same issue?