State rollback error against gaiad data #7123

Closed
sosedoff opened this issue Oct 13, 2021 · 7 comments

@sosedoff

Versions:

  • Tendermint: v0.34.9
  • Gaia: v4.2.1

Background:
I'm trying to test out the state rollback command backported in #7080. My current local state was at height 5213559.

Error
I executed the rollback command a few times, and after starting the gaiad process again I got this panic:

6:30PM INF starting ABCI with Tendermint
6:30PM INF Starting multiAppConn service impl=multiAppConn module=proxy
6:30PM INF Starting localClient service connection=query impl=localClient module=abci-client
6:30PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
6:30PM INF Starting localClient service connection=mempool impl=localClient module=abci-client
6:30PM INF Starting localClient service connection=consensus impl=localClient module=abci-client
6:30PM INF Starting EventBus service impl=EventBus module=events
6:30PM INF Starting PubSub service impl=PubSub module=pubsub
6:30PM INF Starting IndexerService service impl=IndexerService module=txindex
6:30PM INF Starting ExtractorService service impl=ExtractorService module=extractor
6:30PM INF configured stream output dest=/Users/xxx/.gaia/extractor/20211013.log module=extractor
6:30PM INF ABCI Handshake App Info hash="\b\x14�*r'\\m\x11GT\r\x16j�%\a~��K�&9{������X" height=5213559 module=consensus protocol-version=0 software-version=
6:30PM INF ABCI Replay Blocks appHeight=5213559 module=consensus stateHeight=5213543 storeHeight=5213560
panic: StoreBlockHeight (5213560) > StateBlockHeight + 1 (5213544)

goroutine 1 [running]:
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(_, {{{0xb, 0x0}, {0xc0016aef00, 0x11}}, {0xc003d81a20, 0xb}, 0x4f5b97, 0x4f8d67, {{0xc001da6200, ...}, ...}, ...}, ...)
	github.com/tendermint/tendermint@v0.34.9/consensus/replay.go:382 +0xadf
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc002a744e0, {0x589df80, 0xc00014c9c0})
	github.com/tendermint/tendermint@v0.34.9/consensus/replay.go:268 +0x3c8
github.com/tendermint/tendermint/node.doHandshake({_, _}, {{{0xb, 0x0}, {0xc0016aef00, 0x11}}, {0xc003d81a20, 0xb}, 0x4f5b97, 0x4f8d67, ...}, ...)
	github.com/tendermint/tendermint@v0.34.9/node/node.go:322 +0x1b8
github.com/tendermint/tendermint/node.NewNode(0xc001088140, {0x58577c0, 0xc0000e8f00}, 0xc0000fe850, {0x583cba0, 0xc003899fb0}, 0x400ebd4, 0x92ca0, 0xc0000fe9b0, {0x5876558, ...}, ...)
	github.com/tendermint/tendermint@v0.34.9/node/node.go:746 +0x82e
github.com/cosmos/cosmos-sdk/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x587d8b0, 0xc000f6fc40}, ...}, ...)
	github.com/cosmos/cosmos-sdk@v0.42.4/server/start.go:244 +0x625
github.com/cosmos/cosmos-sdk/server.StartCmd.func2(0xc000fe0780, {0x6596f30, 0x0, 0x0})
	github.com/cosmos/cosmos-sdk@v0.42.4/server/start.go:120 +0x168
github.com/spf13/cobra.(*Command).execute(0xc000fe0780, {0x6596f30, 0x0, 0x0})
	github.com/spf13/cobra@v1.1.3/command.go:852 +0x60e
github.com/spf13/cobra.(*Command).ExecuteC(0xc00067f680)
	github.com/spf13/cobra@v1.1.3/command.go:960 +0x3ad
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.1.3/command.go:897
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.1.3/command.go:890
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x0, {0xc00062cd38, 0x15})
	github.com/cosmos/cosmos-sdk@v0.42.4/server/cmd/execute.go:36 +0x1d0
main.main()
	github.com/cosmos/gaia/v4/cmd/gaiad/main.go:16 +0x2c

I restored the data directory from a working dev snapshot (at a slightly different height) and executed the rollback once. This time I got a different error (no panic):

Error: error during handshake: error on replay: block time 2021-02-19 14:22:34.28947622 +0000 UTC not greater than last block time 2021-02-19 14:22:34.28947622 +0000 UTC

Is the rollback command compatible with gaiad?

@cmwaters
Contributor

Hey @sosedoff, thanks for opening this issue.

The rollback shouldn't be done on the Tendermint side alone; it must also be coordinated with the application. The app must roll back its state to the previous height so that Tendermint knows to replay the last block. For example, if we want to roll back from 100 to 99, Tendermint will roll back its state to height 99. If the application stays at 100, then when Tendermint replays block 100 to the app, it checks that the old (incorrect) app hash is equal to the new app hash at height 100. If they differ, it will panic.
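
To illustrate the comparison described above, here is a minimal Go sketch. It is not Tendermint's code: `checkAppHash` is a hypothetical helper and the hash values are made up. It only shows the idea that replaying a block to an app whose state was not rolled back yields a hash that no longer matches the one Tendermint holds.

```go
package main

import (
	"bytes"
	"fmt"
)

// checkAppHash is a hypothetical helper (not a Tendermint function). It
// compares the app hash Tendermint has recorded for a height with the hash
// the application reports after the block is replayed to it.
func checkAppHash(recorded, fromApp []byte) error {
	if !bytes.Equal(recorded, fromApp) {
		return fmt.Errorf("app hash mismatch: recorded %X, app returned %X", recorded, fromApp)
	}
	return nil
}

func main() {
	// Made-up values for illustration only.
	recorded := []byte{0x01, 0x02} // hash Tendermint holds for height 100
	fromApp := []byte{0x0A, 0x0B}  // hash from an app that was not rolled back

	if err := checkAppHash(recorded, fromApp); err != nil {
		fmt.Println("replay would fail:", err)
	}
}
```

The sketch returns an error rather than panicking so the mismatch is easy to inspect; per the comment above, Tendermint itself panics in this situation.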

All that being said, these two errors you've shown me are slightly different (and I think I know what's causing the first one). So I'll see if I can debug the problem.

For the second one, do you mind showing me more of the log that resulted in this error?

@cmwaters cmwaters self-assigned this Oct 14, 2021
@sosedoff
Author

@cmwaters Here's the second log:

9:05AM INF starting ABCI with Tendermint
9:05AM INF Starting multiAppConn service impl=multiAppConn module=proxy
9:05AM INF Starting localClient service connection=query impl=localClient module=abci-client
9:05AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
9:05AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
9:05AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
9:05AM INF Starting EventBus service impl=EventBus module=events
9:05AM INF Starting PubSub service impl=PubSub module=pubsub
9:05AM INF Starting IndexerService service impl=IndexerService module=txindex
9:05AM INF Starting ExtractorService service impl=ExtractorService module=extractor
9:05AM INF configured stream output dest=/Users/sosedoff/.gaia/extractor/20211014.log module=extractor
9:05AM INF ABCI Handshake App Info hash="\f�۬\"�/\a9P�\r'���`dO�Զ$�\ueb8e���e�" height=5213449 module=consensus protocol-version=0 software-version=
9:05AM INF ABCI Replay Blocks appHeight=5213449 module=consensus stateHeight=5213448 storeHeight=5213449
9:05AM INF Replay last block using mock app module=consensus
Error: error during handshake: error on replay: block time 2021-02-19 14:22:34.28947622 +0000 UTC not greater than last block time 2021-02-19 14:22:34.28947622 +0000 UTC
Usage:
  gaiad start [flags]

@sosedoff
Author

@cmwaters have you had any chance to look into this yet?

@cmwaters
Contributor

Hey, I've posted a minor patch to address one concern I noted from your logs. I'm still unsure how the state store height and block store height diverged by so much: StoreBlockHeight (5213560) > StateBlockHeight + 1 (5213544). They should supposedly always be within 1 of each other.
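
As a rough sketch of the invariant mentioned above (the block store height should equal the state height or be exactly one ahead), the following Go snippet checks the heights from the two logs in this issue. `checkHeights` is a hypothetical helper written for illustration, not Tendermint's handshake code, and it returns an error where Tendermint panics.

```go
package main

import "fmt"

// checkHeights is a hypothetical helper: it reports an error when the block
// store height is behind the state height or more than one block ahead of it.
func checkHeights(storeHeight, stateHeight int64) error {
	switch {
	case storeHeight < stateHeight:
		return fmt.Errorf("StoreBlockHeight (%d) < StateBlockHeight (%d)",
			storeHeight, stateHeight)
	case storeHeight > stateHeight+1:
		return fmt.Errorf("StoreBlockHeight (%d) > StateBlockHeight + 1 (%d)",
			storeHeight, stateHeight+1)
	}
	return nil
}

func main() {
	// First log: storeHeight=5213560, stateHeight=5213543 (17 blocks apart).
	fmt.Println(checkHeights(5213560, 5213543))

	// Second log: storeHeight=5213449, stateHeight=5213448 (within one block).
	fmt.Println(checkHeights(5213449, 5213448))
}
```

With the first log's values the gap is 17 blocks, so the check fails; with the second log's values the store is one block ahead of the state, which the invariant allows.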

I think many of these issues can be ironed out once the Cosmos SDK implements rollback functionality on its side, so that the tooling is complete and can be properly tested on running nodes.

@sosedoff
Author

The heights diverged simply because I ran the rollback a few times.

@cmwaters
Contributor

cmwaters commented Dec 7, 2021

Hey @sosedoff, I made some changes to the rollback feature which I hope have fixed things. Could you let me know whether you still have a problem with this? Otherwise we can close this issue.

@sosedoff
Author

sosedoff commented Dec 7, 2021

Yeah, we can close for now.

@sosedoff sosedoff closed this as completed Dec 7, 2021