
[investigate]: Prevoting nil err, wrong Block.Header.AppHash. #3061

Closed · 1 task done
rootmout opened this issue Apr 16, 2024 · 9 comments
Labels
general questions · other: decayed (Stale issues that need follow up from commentators; closed for inactivity) · scope: comet-bft · scope: cosmos-sdk (Integration with Cosmos SDK)

Comments

@rootmout

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I'm running two nodes (v15.2.0); they both suddenly stopped working yesterday when they reached block 20012316:

4:51PM INF packet acknowledged dst_channel=channel-146 dst_port=consumer module=x/ibc/channel sequence=1576652 src_channel=channel-635 src_port=provider
4:51PM INF acknowledgement succeeded channel-id=channel-635 module=server port-id=provider result=RESPONSE_RESULT_TYPE_SUCCESS
4:51PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=2 numToDial=8
4:51PM INF client state updated client-id=07-tendermint-1154 heights=[{"revision_height":8579370,"revision_number":1}] module=x/ibc/client
4:51PM INF packet received dst_channel=channel-635 dst_port=provider module=x/ibc/channel sequence=1488689 src_channel=channel-146 src_port=consumer
4:51PM INF VSCMaturedPacket received and enqueued chainID=stride-1 module=x/ibc-provider vscID=5342286
4:51PM INF successfully handled VSCMaturedPacket sequence: %d module=x/ibc-provider
4:51PM INF acknowledgement written dst_channel=channel-635 dst_port=provider module=x/ibc/channel sequence=1488689 src_channel=channel-146 src_port=consumer
4:51PM INF receive packet callback succeeded channel-id=channel-146 module=server port-id=consumer result=RESPONSE_RESULT_TYPE_SUCCESS
4:51PM INF received proposal module=consensus proposal="Proposal{20012316/0 (90009CF179BDF10B32DDEFED653A842EDFAF5CD28B0B440735603335FE6DED0C:2:88A20A5462D9, -1) ACB1CAF8EECE @ 2024-04-15T16:51:42.750731001Z}" proposer=FB4FB25A61B493A5BF8E3CD4B5E8F5B404DA8E23
4:51PM INF Timed out dur=4420.304283 height=20012316 module=consensus round=0 step=RoundStepNewHeight
4:51PM INF received complete proposal block hash=90009CF179BDF10B32DDEFED653A842EDFAF5CD28B0B440735603335FE6DED0C height=20012316 module=consensus
4:51PM ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash.  Expected 36406D406578C00F1F080E4D5C552238528F244BF1099EA916DD39E5F61D5D0B, got 88B6B06A3B9DB3852FDD443CFA81AABEF9570B507E4C74AAF1789638F9889590" height=20012316 module=consensus round=0
4:51PM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: wrong Block.Header.AppHash.  Expected 36406D406578C00F1F080E4D5C552238528F244BF1099EA916DD39E5F61D5D0B, got 88B6B06A3B9DB3852FDD443CFA81AABEF9570B507E4C74AAF1789638F9889590" module=consensus stack="goroutine 62956 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:736 +0x46\npanic({0x2475320?, 0xc1d79b8220?})\n\truntime/panic.go:914 +0x21f\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc0008b6700, 0x1315d1c, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:1463 +0x15d9\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc0008b6700, 0xc1bea7ba40, {0xc13b3f74d0, 0x28})\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:2195 +0x140f\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc0008b6700, 0xc1bea7ba40, {0xc13b3f74d0?, 0xc0045b1c08?})\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:2012 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc0008b6700, {{0x35385a0, 0xc0acd53130}, {0xc13b3f74d0, 0x28}})\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:865 +0x3ce\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc0008b6700, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:772 +0x3d1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 79\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:383 +0x10c\n"
4:51PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/var/lib/atom/data/cs.wal/wal
4:51PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/var/lib/atom/data/cs.wal/wal
4:51PM INF Timed out dur=1000 height=20012316 module=consensus round=0 step=RoundStepPrevoteWait
4:52PM INF Saving AddrBook to file book=/var/lib/atom/config/addrbook.json module=p2p size=1149
4:52PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=2 numToDial=8
4:52PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=2 numToDial=8
4:53PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=2 numToDial=8
4:53PM ERR Connection failed @ sendRoutine conn=MConn{164.68.107.188:26656} err="pong timeout" module=p2p peer=cfd785a4224c7940e9a10f6c1ab24c343e923bec@164.68.107.188:26656
4:53PM INF service stop impl=MConn{164.68.107.188:26656} module=p2p msg="Stopping MConnection service" peer=cfd785a4224c7940e9a10f6c1ab24c343e923bec@164.68.107.188:26656
4:53PM ERR Stopping peer for error err="pong timeout" module=p2p peer="Peer{MConn{164.68.107.188:26656} cfd785a4224c7940e9a10f6c1ab24c343e923bec out}"
4:53PM INF service stop impl="Peer{MConn{164.68.107.188:26656} cfd785a4224c7940e9a10f6c1ab24c343e923bec out}" module=p2p msg="Stopping Peer service" peer=cfd785a4224c7940e9a10f6c1ab24c343e923bec@164.68.107.188:26656
4:53PM ERR Connection failed @ sendRoutine conn=MConn{46.166.138.217:26656} err="pong timeout" module=p2p peer=bc0eeeabdc558cdb39f4a8148b5022ba537512b4@46.166.138.217:26656
4:53PM INF service stop impl=MConn{46.166.138.217:26656} module=p2p msg="Stopping MConnection service" peer=bc0eeeabdc558cdb39f4a8148b5022ba537512b4@46.166.138.217:26656
4:53PM ERR Stopping peer for error err="pong timeout" module=p2p peer="Peer{MConn{46.166.138.217:26656} bc0eeeabdc558cdb39f4a8148b5022ba537512b4 out}"
4:53PM INF service stop impl="Peer{MConn{46.166.138.217:26656} bc0eeeabdc558cdb39f4a8148b5022ba537512b4 out}" module=p2p msg="Stopping Peer service" peer=bc0eeeabdc558cdb39f4a8148b5022ba537512b4@46.166.138.217:26656
4:53PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
4:54PM INF Saving AddrBook to file book=/var/lib/atom/config/addrbook.json module=p2p size=1149
4:54PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
4:54PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
4:55PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
4:55PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10

I reinitialized one of them and it is now working fine, but I'm concerned that it might happen again.

Thanks for your help.

Gaia Version

v15.2.0

How to reproduce?

No response

@rootmout added the status: waiting-triage and type: bug labels Apr 16, 2024
@MSalopek added the general questions, scope: comet-bft, and scope: cosmos-sdk labels and removed the type: bug and status: waiting-triage labels Apr 16, 2024
@MSalopek
Contributor

Thank you for opening the issue!

Could you let us know if you used a snapshot to sync after the upgrade or you upgraded your existing nodes? This would help us with triage.

Please let us know if this happens again. If you have further issues, try running gaia rollback on the node that failed and restarting.
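For reference, a minimal sketch of that rollback suggestion, assuming a gaiad binary managed by a systemd unit named gaiad and the node home at /var/lib/atom seen in the logs above (the service name and paths are assumptions; adjust to your setup):

# stop the node so no new state is written while rolling back
sudo systemctl stop gaiad
# undo the last committed block in the application store and blockstore,
# then restart so the node re-applies that block from its peers
gaiad rollback --home /var/lib/atom
sudo systemctl start gaiad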

Besides that, what usually works is re-syncing from a snapshot.
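Likewise, a hedged sketch of re-syncing from a node snapshot under the same assumptions, with snapshot.tar.lz4 as a placeholder archive name (use whatever your provider actually publishes):

sudo systemctl stop gaiad
# wipe consensus and application state; the address book and validator key file are kept
gaiad tendermint unsafe-reset-all --home /var/lib/atom --keep-addr-book
# download a fresh snapshot from your provider, then extract it into the node home
lz4 -dc snapshot.tar.lz4 | tar -xf - -C /var/lib/atom
sudo systemctl start gaiad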

@MSalopek MSalopek changed the title [Bug]: Prevoting nil err, wrong Block.Header.AppHash. [investigate]: Prevoting nil err, wrong Block.Header.AppHash. Apr 16, 2024
@rootmout
Author

Hi

Could you let us know if you used a snapshot to sync after the upgrade or you upgraded your existing nodes?

Both of the current nodes were running from a snapshot.

@MSalopek
Contributor

Hi

Could you let us know if you used a snapshot to sync after the upgrade or you upgraded your existing nodes?

Both of the current nodes were running from a snapshot.

Thanks for the clarification. AppHash errors show up more frequently with some snapshot providers.

Let's leave this issue open in case it happens again. We'll keep an eye on the reports here and in other places.

@rootmout
Author

Hi @MSalopek
It happened again; last Friday both of my nodes went down at the same time:

7:54AM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{80.71.51.52:26656} 8acc5a62ad8eeb2140cff79a13dff0f993ab2354 out}"
7:54AM INF service stop impl="Peer{MConn{80.71.51.52:26656} 8acc5a62ad8eeb2140cff79a13dff0f993ab2354 out}" module=p2p msg="Stopping Peer service" peer=8acc5a62ad8eeb2140cff79a13dff0f993ab2354@80.71.51.52:26656
7:54AM ERR Error in validation err="wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=blockchain
7:54AM ERR Stopping peer for error err="Reactor validation error: wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=p2p peer="Peer{MConn{65.109.32.148:26726} 6ecca845883e9273062ee515d2657080e6539d9e out}"
7:54AM INF service stop impl="Peer{MConn{65.109.32.148:26726} 6ecca845883e9273062ee515d2657080e6539d9e out}" module=p2p msg="Stopping Peer service" peer=6ecca845883e9273062ee515d2657080e6539d9e@65.109.32.148:26726
7:54AM INF service stop impl=MConn{65.109.32.148:26726} module=p2p msg="Stopping MConnection service" peer=6ecca845883e9273062ee515d2657080e6539d9e@65.109.32.148:26726
7:54AM ERR Stopping peer for error err="Reactor validation error: wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=p2p peer="Peer{MConn{89.149.218.64:26656} 4cb59a50805f0646111412cf5d189e805e3e0c3d out}"

So your guess is that the issue comes from the snapshot I'm using? Can you explain more?
What should I investigate in my node settings to resolve the problem?

Thanks

@MSalopek
Contributor

MSalopek commented Apr 22, 2024

Hi @MSalopek It happened again; last Friday both of my nodes went down at the same time:

7:54AM ERR Stopping peer for error err=EOF module=p2p peer="Peer{MConn{80.71.51.52:26656} 8acc5a62ad8eeb2140cff79a13dff0f993ab2354 out}"
7:54AM INF service stop impl="Peer{MConn{80.71.51.52:26656} 8acc5a62ad8eeb2140cff79a13dff0f993ab2354 out}" module=p2p msg="Stopping Peer service" peer=8acc5a62ad8eeb2140cff79a13dff0f993ab2354@80.71.51.52:26656
7:54AM ERR Error in validation err="wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=blockchain
7:54AM ERR Stopping peer for error err="Reactor validation error: wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=p2p peer="Peer{MConn{65.109.32.148:26726} 6ecca845883e9273062ee515d2657080e6539d9e out}"
7:54AM INF service stop impl="Peer{MConn{65.109.32.148:26726} 6ecca845883e9273062ee515d2657080e6539d9e out}" module=p2p msg="Stopping Peer service" peer=6ecca845883e9273062ee515d2657080e6539d9e@65.109.32.148:26726
7:54AM INF service stop impl=MConn{65.109.32.148:26726} module=p2p msg="Stopping MConnection service" peer=6ecca845883e9273062ee515d2657080e6539d9e@65.109.32.148:26726
7:54AM ERR Stopping peer for error err="Reactor validation error: wrong Block.Header.AppHash.  Expected 4D92317C4522B1812AEA119D76A23ABA554F18D7F1875C1426C5EAC3CC97A425, got 53651095E88C911B3D0F0B5D163A4F1B95E11A7ED4D4142C3C91BF2BFEB98F09" module=p2p peer="Peer{MConn{89.149.218.64:26656} 4cb59a50805f0646111412cf5d189e805e3e0c3d out}"

So your guess is that the issue comes from the snapshot I'm using? Can you explain more? What should I investigate in my node settings to resolve the problem?

Thanks

I think the easiest would be to statesync with the v15.2.0 binary instead of using a node snapshot. Try using this as a starting point: https://polkachu.com/state_sync/cosmos. Where are you getting your node snapshots from?
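For context, state sync is configured in the [statesync] section of config.toml. A minimal hedged sketch with placeholder RPC endpoints and trust values (the real trust_height and trust_hash must be fetched from a trusted RPC node shortly before starting; the values below are illustrative only):

[statesync]
enable = true
# two or more RPC endpoints serving light-client proofs (placeholders)
rpc_servers = "https://rpc-1.example.com:443,https://rpc-2.example.com:443"
# a recent block height and its hash, obtained from a trusted node
trust_height = 20100000
trust_hash = "PLACEHOLDER_BLOCK_HASH"
trust_period = "168h"

Once the node has state-synced, enable is typically set back to false.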

Besides that, try purging your address book and downloading a new one, and check that your persistent peers are upgraded correctly (if you use persistent peers).
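A quick sketch of the address-book purge, again assuming the /var/lib/atom home; the download URL is an example placeholder, so substitute an address book published by a provider you trust:

sudo systemctl stop gaiad
rm /var/lib/atom/config/addrbook.json
# example placeholder URL; use a provider you trust
curl -Ls https://example.com/cosmoshub/addrbook.json -o /var/lib/atom/config/addrbook.json
sudo systemctl start gaiad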

Are you running a validator node or an RPC node? Are you accepting Txs or is it just used for reading state?

Could you share your config/app.toml and config/config.toml? Please make sure to redact any sensitive information such as IP addresses, keys, etc.

@rootmout
Author

Could you share your config/app.toml and config/config.toml?

Here are the two config files you asked for. I'll have a look at state sync in the meantime.
config.toml.txt
app.toml.txt

Are you running a validator node or an RPC node? Are you accepting Txs or is it just used for reading state?

An RPC node that allows txs.

@rootmout
Author

Hello @MSalopek

From where are you getting your node snapshots?

They come from the following source: https://quicksync.io/cosmos.json
I'm using the cosmoshub-4-pruned file

@MSalopek
Contributor

MSalopek commented Apr 24, 2024

Have you had any luck with other node snapshot providers or statesync?

Has the issue been resolved?
We don't have any other reports around this issue.

@MSalopek
Contributor

The network upgraded to v16.0.0 and there has been no recent activity on this issue.

I'm closing it for now. Feel free to reopen if the issue happens for v16.x.

@MSalopek added the other: decayed label May 17, 2024