tests: create CLI tool to help with mainnet export testing #3082

Merged
merged 10 commits into from
May 14, 2024

Conversation

stana-miric
Contributor

@stana-miric stana-miric commented Apr 23, 2024

This PR introduces a new unsafe-start-local-validator command that should be used only for testing.

The command modifies a local mainnet node so that it is suitable for local testing. After changing the data, the command also starts the node. The changes modify the consensus and application states by removing the old validator data and injecting the new validator, and fund the addresses to be used in testing without affecting existing addresses.

The command uses the approach described here and adds necessary changes to consensus state and staking module.

The command is added as a sub-command of the gaiad testnet command. It is included in the gaia binary only if the unsafe_start_local_validator tag is used when building it (make build BUILD_TAGS="-tag unsafe_start_local_validator").

Example of running the command:

./gaiad testnet unsafe-start-local-validator \
--validator-operator="cosmosvaloper17fjdcqy7g80pn0seexcch5pg0dtvs45p57t97r" \
--validator-pukey="SLpHEfzQHuuNO9J1BB/hXyiH6c1NmpoIVQ2pMWmyctE=" \
--validator-privkey="AiayvI2px5CZVl/uOGmacfFjcIBoyk3Oa2JPBO6zEcdIukcR/NAe64070nUEH+FfKIfpzU2amghVDakxabJy0Q==" \
--accounts-to-fund="cosmos1ju6tlfclulxumtt2kglvnxduj5d93a64r5czge,cosmos1r5v5srda7xfth3hn2s26txvrcrntldjumt8mhl" \
[other_flags_from_server_start_cmd]

The use case is as follows:

  1. We have the mainnet node state on our local machine
  2. We replace all validator key files (keyring data, priv_validator_key.json; the values in priv_validator_state.json are reset to 0, ...)
  3. We run the unsafe-start-local-validator command, which switches the validator set and starts the node
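
Resetting priv_validator_state.json in step 2 could be sketched roughly as below. This is only an illustration: the helper names are hypothetical, the file is assumed to live at <home>/data/priv_validator_state.json, and only the height, round and step fields are written (height stored as a JSON string).

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// signState holds the fields of priv_validator_state.json that step 2 resets.
// Assumption: height is a JSON string, round and step are JSON numbers.
type signState struct {
	Height string `json:"height"`
	Round  int32  `json:"round"`
	Step   int8   `json:"step"`
}

// zeroedSignState returns the JSON for a sign state with everything set to 0.
func zeroedSignState() ([]byte, error) {
	return json.MarshalIndent(signState{Height: "0"}, "", "  ")
}

// resetSignState overwrites the state file under the given node home
// (hypothetical helper; the path layout is an assumption).
func resetSignState(home string) error {
	bz, err := zeroedSignState()
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(home, "data", "priv_validator_state.json"), bz, 0o600)
}
```

Calling resetSignState with the node home from a small main function, before running the command, would leave the new validator key free to sign from the first height it sees.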

The changes made by this command go beyond those in the SDK guide:

  1. Modification of the consensus state db:
  • the validator set is updated, because otherwise consensus does not recognize the node as a validator and it will never propose a new block
  • validator info is updated, because the distribution module allocates tokens in the begin blocker based on the validator info for the previous height, and the voting validator's address has to be present in the staking module, which is not the case for the old validators
  2. Modification of the consensus blockstore db:
  • the last commit data is changed by updating its signature info. When building the last commit info for a new block, consensus compares the length of the validator set with the number of lastCommit signatures and expects them to be equal. If the mainnet state had, e.g., 50 validators, the number of signatures would not match the new validator set of 1 validator. That is why the last commit's signatures are replaced with a single signature of the vote for the last committed height (and why the command needs --validator-privkey)
  3. Additional modifications to the staking module:
  • old validators are also removed from the store prefixed with ValidatorsKey. Setting the power of old validators to 0 is not enough: when the new block is created, staking passes validator updates to consensus, consensus tries to remove/add validators based on them, the old validators with power 0 are treated as deletions, and consensus returns the error "applying the validator changes would result in an empty set"
  • unbonding old validators are removed from the ValidatorQueueKey store; otherwise the staking end blocker would try to delete them in the new block and fail, because they no longer exist in the validator store (they were deleted from the ValidatorsKey store for the reason explained above)
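
The length check behind the blockstore change (point 2) can be illustrated with a simplified sketch. This is not CometBFT's actual code; checkCommitSize is a hypothetical helper that only mirrors the comparison and the wording of the resulting consensus error.

```go
package main

import "fmt"

// checkCommitSize mimics, in simplified form, the comparison made when the
// last commit info is built: the number of signatures in the last commit must
// equal the size of the validator set expected to have signed it.
func checkCommitSize(commitSigs, valSetSize int, height int64) error {
	if commitSigs != valSetSize {
		return fmt.Errorf("commit size (%d) doesn't match validator set length (%d) at height %d",
			commitSigs, valSetSize, height)
	}
	return nil
}
```

With a mainnet commit of 50 signatures and a new validator set of 1, the check fails; after the last commit is rewritten to carry a single signature, both sides are 1 and it passes.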

Future improvements

  1. cosmos-sdk v0.50 introduces the in-place-testnet command, which changes the state and runs a testnet node.

AddTestnetCreatorCommand receives the newTestnetApp app initer, so we could easily switch to that command once gaia is upgraded to v0.50.x and reuse the current code, since both commands do the same thing: change the state and run the node.

  2. There is a corner case in which, with the current implementation, the node cannot start. This happens when a snapshot is taken at an "inconvenient" moment, i.e., when the consensus store version is higher than the application version, but the newer version is already written in the application's IAVL stores. In a regular node restart, this would be resolved by replaying the last block from the store, but we cannot do that because the block hash would not be the same. The solution is to delete the last uncommitted block from the consensus store to prevent the replay, but this also requires deleting the tree with the version of that uncommitted block from the IAVL store. Since IAVL does not export a function that could remove it, and pruning 'future' versions is not allowed, we currently cannot fix this case (e.g., if the DeleteVersionsFrom function on the IAVL mutable tree, which is called to delete higher versions when rolling back to a version, were available, we could resolve this case).

@stana-miric stana-miric self-assigned this Apr 23, 2024
@dasanchez
Contributor

dasanchez commented Apr 24, 2024

Hi! Could you post the priv_validator_key.json you're using to run this as well? That's all we should need, right?
We tried replacing these two values:

newValAddrStr := "D6E0B0F975791D654B6E2F315F1DC1FC9422F078"
newValPubKeyStr := "SLpHEfzQHuuNO9J1BB/hXyiH6c1NmpoIVQ2pMWmyctE="

But the log shows "This node is not a validator" on start.

@fastfadingviolets

We managed to get around the issue mentioned by @dasanchez --it turns out this build only works if the gaia home is the default $HOME/.gaia directory. I think this is hardcoded in the updateConsensusState func, it'd be good to take it from the --home arg.
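
A minimal sketch of taking the directory from --home instead of hardcoding it could look like this. resolveHome and dataDir are hypothetical helpers, not the code in updateConsensusState; the only assumptions are the default $HOME/.gaia location mentioned above and a data subfolder under the home.

```go
package main

import (
	"os"
	"path/filepath"
)

// resolveHome returns the node home directory: the value passed via --home
// when it is set, otherwise the default $HOME/.gaia.
func resolveHome(homeFlag string) (string, error) {
	if homeFlag != "" {
		return homeFlag, nil
	}
	userHome, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(userHome, ".gaia"), nil
}

// dataDir returns the directory holding the node databases under the given home.
func dataDir(home string) string {
	return filepath.Join(home, "data")
}
```

Every place that opens the state or blockstore databases would then build its path from dataDir(home) rather than from a fixed default.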

However, we then hit this nice consensus failure:

4:36PM ERR CONSENSUS FAILURE!!! err="commit size (50) doesn't match validator set length (1) at height 6183661\n\n[CommitSig{62AD36C02C7E by AE84D29EC8E3 on 2 @ 2024-04-24T15:34:35.731375604Z} CommitSig{0F0D842915BE by A1023B41F58B on 2 @ 2024-04-24T15:34:35.746306947Z} CommitSig{56909F02C5DE by 56E8B6ABC373 on 2 @ 2024-04-24T15:34:35.758487936Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{D6DD4C214E48 by A98E7F85928D on 2 @ 2024-04-24T15:34:35.790217383Z} CommitSig{D7D552D73523 by 34E7B1666275 on 2 @ 2024-04-24T15:34:35.745612334Z} CommitSig{EB554294ABB9 by 76B9CA78AE2F on 2 @ 2024-04-24T15:34:35.783484384Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{3DEE6386017C by 67FFCD2A4A82 on 2 @ 2024-04-24T15:34:35.783815637Z} CommitSig{2D4D7676C5FC by 48BF48F8637E on 2 @ 2024-04-24T15:34:35.768496241Z} CommitSig{48F23FFDCB21 by F87BED25738F on 2 @ 2024-04-24T15:34:35.747961075Z} CommitSig{60E7AC3908B9 by AB6DBD316687 on 2 @ 2024-04-24T15:34:35.754379781Z} CommitSig{0BA7003F1B0F by 6B393E5CC754 on 2 @ 2024-04-24T15:34:35.793654951Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{23C487302A90 by 5030B258A762 on 2 @ 2024-04-24T15:34:35.796646891Z} CommitSig{04AA39F48C66 by 688FDD2880D3 on 2 @ 2024-04-24T15:34:35.76659652Z} CommitSig{A21C7F2402B5 by 9911F9565582 on 2 @ 2024-04-24T15:34:35.794022795Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{22653CBA2D48 by AD38D301C37C on 2 @ 2024-04-24T15:34:35.742749415Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{67C24E881055 by 1E3AFDD98E0E on 2 @ 2024-04-24T15:34:35.777865383Z} CommitSig{000000000000 by 
000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{F74B208A0A5A by 0218833A0214 on 2 @ 2024-04-24T15:34:35.724602318Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{92B294AC879D by 5BF8DEB9F40D on 2 @ 2024-04-24T15:34:35.796944665Z} CommitSig{0F3648D35E66 by 2237488A57ED on 2 @ 2024-04-24T15:34:35.77463921Z} CommitSig{549782F856D0 by CD959E25373D on 2 @ 2024-04-24T15:34:35.752285202Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{ED83135BB7E6 by 1FB8BCBFFA73 on 2 @ 2024-04-24T15:34:35.709791324Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{D9A6B5A22FD2 by 3D065D6DB8D4 on 2 @ 2024-04-24T15:34:35.760234918Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z} CommitSig{1E1EEA96C1AE by 546226DD302A on 2 @ 2024-04-24T15:34:35.749558248Z} CommitSig{5A3FEB160518 by 55777DB9AFC5 on 2 @ 2024-04-24T15:34:35.803563962Z} CommitSig{000000000000 by 000000000000 on 1 @ 0001-01-01T00:00:00Z}]\n\n[Validator{973C48DF8B3356C45E44494723A6E0D45DEB8131 PubKeyEd25519{C40AB38ECE9490483C62FA10CBAD1BC72B48A1C383C680D3351CF8F87F35B537} VP:900000000000000 A:0}]" module=consensus stack="goroutine 75 
[running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:736 +0x46\npanic({0x248ee00?, 0xc002e9c9a0?})\n\truntime/panic.go:914 +0x21f\ngithub.com/cometbft/cometbft/state.buildLastCommitInfo(0xc0013a41e0, {0x3598920?, 0xc0029c7ef0?}, 0x6?)\n\tgithub.com/cometbft/cometbft@v0.37.4/state/execution.go:413 +0x3b9\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).CreateProposalBlock(_, _, {{{0xb, 0x0}, {0xc00271b42a, 0x6}}, {0xc00271b470, 0x8}, 0x1, 0x5e5aec, ...}, ...)\n\tgithub.com/cometbft/cometbft@v0.37.4/state/execution.go:113 +0x1ac\ngithub.com/cometbft/cometbft/consensus.(*State).createProposalBlock(0xc000fbc380)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:1244 +0x22f\ngithub.com/cometbft/cometbft/consensus.(*State).defaultDecideProposal(0xc000fbc380, 0x5e5aed, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:1152 +0x53\ngithub.com/cometbft/cometbft/consensus.(*State).enterPropose(0xc000fbc380, 0x5e5aed, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:1131 +0x83e\ngithub.com/cometbft/cometbft/consensus.(*State).enterNewRound(0xc000fbc380, 0x5e5aed, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:1050 +0xb38\ngithub.com/cometbft/cometbft/consensus.(*State).handleTimeout(0xc000fbc380, {0xc0013a3cd8?, 0xc00018e65e?, 0x13a3ba0?, 0xc0?}, {0x5e5aed, 0x0, 0x1, {0x3a0871bd, 0xeddbb2c02, ...}, ...})\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:919 +0x909\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000fbc380, 0x0)\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:801 +0x650\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 54\n\tgithub.com/cometbft/cometbft@v0.37.4/consensus/state.go:383 +0x10c\n"

we figure something's checking the number of validators in the set and panicking when it goes from 50 to 1 from one block to the next

@stana-miric
Contributor Author

Sure, I can, but I've just pushed a version in which the code that modifies the app and consensus state is executed by the newly added unsafe-set-local-validator command rather than in app.go, which is what we want, and the command has flags for passing in the keys of your local validator.
In the first commit, cadb4ea, where the code is in app.go, I used the keys from my local validator, since that was just a proof of concept to see whether replacing the validator is even possible, because the SDK instructions are incomplete. With that version you can put in your own newValAddrStr and newValPubKeyStr and build the app, but now that we have the command it is easier to use that.

To run a binary that includes this command, use this commit and build gaia with the unsafe_set_local_validator tag (we agreed not to include this command unconditionally):
make build BUILD_TAGS="-tag unsafe_set_local_validator"
Once it is built, you can change the validators by running:
./gaiad testnet unsafe-set-local-validator --validator-operator= --validator-pukey= --accounts-to-fund=
After that you can start the node (the validator address and key supplied are expected to be those of the local validator).

Note: the PR is still in draft state. The command has been tested only on my local machine, by running the gaia app for some time, stopping it, taking that state, setting the new validator info, and starting the node again with the new validators. The command might change once we test it with real mainnet data and check whether something more needs to be covered or whether additional flags are needed to make some things configurable. After we confirm everything, we will write a detailed description of how to use the command.

In case you want to use my keys and the first commit, I have attached the files I used: val-files.zip

@stana-miric
Contributor Author

Yeah, I ran into a similar consensus failure, and a few different ones, and fixed everything I ran into. I managed to run the node after switching the validator, but note that I did it with a local gaia app, so something more may come up when I use the mainnet data. Tomorrow I will focus on testing, because until now I have spent my time fixing those numerous consensus issues. Thanks for posting the error; I'll see if it is fixed with the new code. Can you point me to the chain state you used for testing so I can try the same tomorrow? And yes, I will double-check whether some additional flags should be added to the command. It depends on how we want to use it, but I thought the use case is as follows (correct me if I'm wrong):

  1. We take the node state from mainnet (or another running chain), e.g. the .gaiad folder, and copy it to our local node
  2. We manually copy in the keys we want to use (priv_validator_key and other key-related files; the priv_validator_state values also need to be set to 0). Now we have the state of the mainnet node and our local keys
  3. We run the command to change the application and consensus state so that the app can be started with our local keys
  4. We start the node

Note that the unsafe_set_local_validator command will not start the node; it will just update the state. In SDK v0.50 there is a server command, AddTestnetCreatorCommand, that offers a convenient way of providing the app with updated state and also runs the node, but since gaia is on v0.47 we're going with the unsafe_set_local_validator command for now (it is good enough, and after the upgrade we can switch to AddTestnetCreatorCommand and reuse the code from unsafe_set_local_validator for updating the app state).

I'll run the tests tomorrow with a more representative state (I can use the same one you did) to check that everything works as it does now with my local gaia app state.

@fastfadingviolets

fastfadingviolets commented Apr 24, 2024

this looks perfect to me! I do think it'd be potentially useful to have a different --home just to not have to wreck the existing ~/.gaia folder, but it's also not that big a deal

I'm testing out the new command on a node export right now. You can download it here. This is the key I'm using:

{
  "address": "973C48DF8B3356C45E44494723A6E0D45DEB8131",
  "pub_key": {
    "type": "tendermint/PubKeyEd25519",
    "value": "xAqzjs6UkEg8YvoQy60bxytIocODxoDTNRz4+H81tTc="
  },
  "priv_key": {
    "type": "tendermint/PrivKeyEd25519",
    "value": "V9E2OFJ8ghMu/M15KALuNh0ZafFBDt7aUrGcSPOfP9rECrOOzpSQSDxi+hDLrRvHK0ihw4PGgNM1HPj4fzW1Nw=="
  }
}
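
For reference, the address field of a key file like this is derived from the public key. The sketch below assumes the usual rule for Ed25519 consensus keys (first 20 bytes of the SHA-256 hash of the raw public key, printed as upper-case hex); addressFromPubKey is a hypothetical helper, not part of gaiad.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// addressFromPubKey derives the consensus address from the base64 "value" of
// an Ed25519 pub_key entry. Assumption: address = SHA-256(pubkey)[:20].
func addressFromPubKey(pubKeyB64 string) (string, error) {
	pub, err := base64.StdEncoding.DecodeString(pubKeyB64)
	if err != nil {
		return "", fmt.Errorf("decode pubkey: %w", err)
	}
	if len(pub) != 32 {
		return "", fmt.Errorf("expected 32-byte Ed25519 key, got %d bytes", len(pub))
	}
	sum := sha256.Sum256(pub)
	return strings.ToUpper(hex.EncodeToString(sum[:20])), nil
}
```

For a key file generated by the node, passing pub_key.value to this function is expected to reproduce the address field, which gives a quick way to check that an address and a public key supplied together actually belong to the same key.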

@fastfadingviolets

fastfadingviolets commented Apr 24, 2024

hmm i got an error from unsafe-set-local-validator:

Error: decoding bech32 failed: invalid separator index 39

i'm using --accounts-to-fund cosmos1r5v5srda7xfth3hn2s26txvrcrntldjumt8mhl fwiw

@fastfadingviolets

Actually, nevermind my above comment --realized I needed to pass the valoper address to the --validator-operator flag. I'm still getting the consensus failure when using the state I posted, though
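
A small pre-flight check on the address prefix can catch this kind of mix-up before the command runs. The sketch below is only illustrative: hrp and checkOperatorPrefix are hypothetical helpers, and they look at the human-readable prefix only, without verifying the bech32 checksum.

```go
package main

import (
	"fmt"
	"strings"
)

// hrp returns the human-readable part of a bech32 string, i.e. everything
// before the last '1' separator.
func hrp(addr string) (string, error) {
	i := strings.LastIndex(addr, "1")
	if i < 1 {
		return "", fmt.Errorf("no bech32 separator in %q", addr)
	}
	return addr[:i], nil
}

// checkOperatorPrefix rejects values that do not carry the cosmosvaloper
// prefix expected by --validator-operator.
func checkOperatorPrefix(addr string) error {
	prefix, err := hrp(addr)
	if err != nil {
		return err
	}
	if prefix != "cosmosvaloper" {
		return fmt.Errorf("expected cosmosvaloper prefix, got %q", prefix)
	}
	return nil
}
```

An account address such as cosmos1r5v5srda7xfth3hn2s26txvrcrntldjumt8mhl is rejected by this check, while the cosmosvaloper1... form is accepted.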

@stana-miric
Contributor Author

Hey, I've just pushed changes that fix the consensus failure mentioned above. The node now works after switching the validator; I had to add some updates to the consensus state and the staking module (staking was the next error after the consensus error was fixed). I've tested this with the data you provided, thanks for that! You can also use the --home flag; it is now taken into account. I've also changed the command a bit: I had to add a flag for the validator's private key, because I had to add the validator's vote signature (this key is in the priv_validator_key file, alongside the pubkey). Here is an example of the command:
./gaiad testnet unsafe-set-local-validator --validator-operator="cosmosvaloper17fjdcqy7g80pn0seexcch5pg0dtvs45p57t97r" --validator-pukey="SLpHEfzQHuuNO9J1BB/hXyiH6c1NmpoIVQ2pMWmyctE=" --validator-privkey="AiayvI2px5CZVl/uOGmacfFjcIBoyk3Oa2JPBO6zEcdIukcR/NAe64070nUEH+FfKIfpzU2amghVDakxabJy0Q==" --accounts-to-fund="cosmos1ju6tlfclulxumtt2kglvnxduj5d93a64r5czge,cosmos1r5v5srda7xfth3hn2s26txvrcrntldjumt8mhl"

I've also noticed that some addresses indeed cannot be parsed as bech32. I tried some from localMerlinAccounts, and some of them give an error even when passed to ./gaiad keys parse, which is weird; I will take a look tomorrow at why that is. You can use addresses that can be parsed by ./gaiad keys parse, e.g. the one I used is cosmos1ju6tlfclulxumtt2kglvnxduj5d93a64r5czge.

Running the command takes ~10 min on my machine for the data you provided, and after that the node runs successfully.

@fastfadingviolets

Yes!! This latest build works. I'm able to fork the chain I linked and produce blocks 💃

Contributor

@MSalopek MSalopek left a comment

Awesome work!

Thank you for spending time on this it's super valuable!

@fastfadingviolets

I gave this a shot with v15 --you can see the branch I used here-- and a v15 mainnet snapshot from nodestake. The tool ran successfully as far as I can tell, but then the v15 chain refuses to start:

failed to load latest version: failed to load latest version: failed to load store: wanted to load target 20204357 but only found up to 20204356

I'm guessing my very naive v15 port was missing something or other

@stana-miric
Contributor Author

Same error on my machine, but it looks like the cause is that this state is significantly larger (~18 GB app store): the command took 1.5 hours to execute and finished without saving the changes to the app store. I tried to find another way to save the changes without calling revert to the previous height, because that call is the slow one; without it the command would be very quick. Unfortunately, I did not succeed. We would not have this problem once we switch to a command that runs the node immediately after changing the state (this is enabled in SDK v0.50). I can try a few more things when I return from holiday next Tuesday, to avoid the rollback call that causes problems with large state.

@MSalopek
Contributor

MSalopek commented May 1, 2024

@stana-miric
Feel free to change the command to start the blockchain right away if it makes the handling easier.

Stopping after swapping the valset is not a hard requirement, just a nice to have. Users can always stop the chain themselves.

@MSalopek MSalopek marked this pull request as draft May 9, 2024 12:41
@stana-miric
Contributor Author

The command has been modified so that, after it updates the state in the store, it also starts a node. The command name (as well as the build tag name) has been changed, so the command is now "unsafe-start-local-validator" instead of "unsafe-set-local-validator". Since the command also starts a node, in addition to the flags for the new validator, you also pass the desired flags for starting the node.

Regarding the v15 mainnet snapshot failure reported above ("wanted to load target 20204357 but only found up to 20204356"): I've tested this with the new changes and everything works. Execution is also much faster than with the old command.

@fastfadingviolets

i think i got it to work, but i do see this sort of thing spammed in logs:

11:11AM ERR failed to process message err="error adding vote" height=20403524 module=consensus msg_type=*consensus.VoteMessage peer=7c8c0d89b2f3416a8599cb423d770e62f869f2e9 round=0
11:11AM INF failed attempting to add vote err="cannot find validator 41 in valSet of size 1: invalid validator index" module=consensus
11:11AM ERR failed to process message err="error adding vote" height=20403524 module=consensus msg_type=*consensus.VoteMessage peer=7c8c0d89b2f3416a8599cb423d770e62f869f2e9 round=0
11:11AM INF failed attempting to add vote err="cannot find validator 90 in valSet of size 1: invalid validator index" module=consensus
11:11AM ERR failed to process message err="error adding vote" height=20403524 module=consensus msg_type=*consensus.VoteMessage peer=7c8c0d89b2f3416a8599cb423d770e62f869f2e9 round=0
11:11AM INF failed attempting to add vote err="cannot find validator 141 in valSet of size 1: invalid validator index" module=consensus
11:11AM ERR failed to process message err="error adding vote" height=20403524 module=consensus msg_type=*consensus.VoteMessage peer=7c8c0d89b2f3416a8599cb423d770e62f869f2e9 round=0

not urgent, but if there's a way we could stop those, that'd be nice --they make it tough to read much of anything in logs

@stana-miric
Contributor Author

Those error messages don't affect node startup, as you said, but you can get rid of them by deleting the files in the cs.wal folder, which is located in the data folder, before running the command and starting the node. cs.wal is the consensus write-ahead log, and the node will start even if its contents are deleted.
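For anyone who wants to script that cleanup, here is a minimal sketch. The function name `clear_cs_wal` is hypothetical, and the fallback home directory `$HOME/.gaia` is an assumption; pass your node's actual home directory if it differs. It only removes the files inside `data/cs.wal` and leaves the rest of the data folder alone.

```shell
#!/bin/sh
# Sketch only: empties the consensus write-ahead log folder of a local node.
# clear_cs_wal is a hypothetical helper, not part of gaiad.
# Run it while the node is stopped, before the testnet command.

clear_cs_wal() {
    # First argument: node home directory (assumed fallback: $HOME/.gaia).
    node_home="${1:-$HOME/.gaia}"
    wal_dir="$node_home/data/cs.wal"

    if [ ! -d "$wal_dir" ]; then
        echo "no WAL directory at $wal_dir, nothing to do"
        return 0
    fi

    # Delete the files inside cs.wal; the folder itself is kept.
    rm -f -- "$wal_dir"/*
    echo "cleared $wal_dir"
}
```

Call it as `clear_cs_wal "$HOME/.gaia"` and then run `gaiad testnet unsafe-start-local-validator` with the usual flags.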

@stana-miric stana-miric marked this pull request as ready for review May 14, 2024 10:45
@MSalopek MSalopek changed the title from "Create CLI tool to help with mainnet export testing" to "tests: create CLI tool to help with mainnet export testing" May 14, 2024
@MSalopek MSalopek merged commit 14e5927 into main May 14, 2024
17 checks passed
@MSalopek MSalopek deleted the mainnet-export-testing-cli branch May 14, 2024 15:00
mergify bot pushed a commit that referenced this pull request May 14, 2024
* replace validators

* added unsafe-set-local-validator testnet command

* fixed blockstate and lastcommit

* update comments and descriptions

* unsafe-set-local-validator changed to unsafe-start-local-validator

* delete block from store if app and cons state are not at that version

* add docs on testnet extensions

* fix typo; add extra docs

---------

Co-authored-by: MSalopek <matija.salopek994@gmail.com>
(cherry picked from commit 14e5927)
mergify bot pushed a commit that referenced this pull request May 14, 2024
mergify bot pushed a commit that referenced this pull request May 14, 2024
MSalopek pushed a commit that referenced this pull request May 14, 2024
…3095)


Co-authored-by: Stana Miric <stana.miric@ethernal.tech>
MSalopek pushed a commit that referenced this pull request May 14, 2024
…3096)


Co-authored-by: Stana Miric <stana.miric@ethernal.tech>
MSalopek pushed a commit that referenced this pull request May 14, 2024
…3097)


Co-authored-by: Stana Miric <stana.miric@ethernal.tech>
@MSalopek MSalopek linked an issue May 17, 2024 that may be closed by this pull request
Create CLI tool to help with mainnet export testing