quick validator key swap #11264
Comments
The plan is to start with:
If that looks fine, then:
|
2024-05-15 (Wednesday) Update
Changes: Working on extending existing |
@saketh-are Is there anything that we need to do on the networking side? For example, do we need to announce that a different node now has the validator keys? |
I don't believe any changes will be needed on the networking side. Every node in the network has either 1 or 2 public keys associated with it:
Every node maintains in memory a mapping of the AccountKey -> PeerId relationships it is aware of. There is some logic already implemented which allows the mapping to be updated if a validator key starts to be hosted by a different peer id. The things to make sure of are:
|
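To illustrate the mapping update described in the comment above, here is a minimal sketch. It is not nearcore's actual networking code: `Announcement`, `AccountRouting`, the `version` field, and the "newer version wins" rule are all hypothetical names and assumptions chosen for illustration. It only shows the general idea that a node can replace the peer recorded for an account key when it learns the key has moved.

```rust
use std::collections::HashMap;

// Hypothetical aliases; the real nearcore types are richer than plain strings.
pub type AccountKey = String;
pub type PeerId = String;

/// Hypothetical announcement that `account_key` is hosted by `peer_id`.
/// `version` is an assumed ordering field used to reject stale announcements.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Announcement {
    pub account_key: AccountKey,
    pub peer_id: PeerId,
    pub version: u64,
}

/// In-memory AccountKey -> PeerId mapping kept by a node.
#[derive(Default)]
pub struct AccountRouting {
    entries: HashMap<AccountKey, (PeerId, u64)>,
}

impl AccountRouting {
    /// Applies an announcement. Returns true if the mapping changed,
    /// which happens only when the key is unknown or the announcement
    /// is strictly newer than the stored one.
    pub fn apply(&mut self, ann: Announcement) -> bool {
        let is_newer = self
            .entries
            .get(&ann.account_key)
            .map_or(true, |(_, stored)| ann.version > *stored);
        if is_newer {
            self.entries
                .insert(ann.account_key, (ann.peer_id, ann.version));
        }
        is_newer
    }

    /// Returns the peer currently believed to host `key`, if any.
    pub fn peer_for(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(|(peer, _)| peer.as_str())
    }
}
```

Under these assumptions, an announcement from the new node with a higher version replaces the old peer, and a late announcement from the old node is ignored.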
2024-05-21 (Tuesday) Update
However, the chain is stuck after the hotswap. Investigating why. |
Make it possible to quickly move validator keys from one node to another. The goal is to allow validators to perform common maintenance operations with no downtime.
In a typical scenario a validator needs to restart their node. The reason may be a new neard release, a need to update configs, or any issue that may be causing the node to misbehave. In such circumstances the node operator runs two nodes: the old one with the validator keys, and a new one that is brought up to get it warm and ready. Once the new node is ready, the operator stops the old node, moves the validator keys to the new node, and restarts the new node. Unfortunately, restarting the node may take some time, and this will get worse once memtrie is released. This issue is about making it possible to move the validator keys from one node to another quickly.
It's important to make sure that no two nodes hold the validator keys at the same time, since both would produce blocks and chunks, which can be considered malicious behaviour.
So far the best ideas to implement this are:
start height
in the future and stop the old node right before that height.
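The start-height idea can be sketched as a pair of height checks. This is an illustration only, not an existing neard feature: `Role`, `old_node_role`, `new_node_role`, and `handover_height` are hypothetical names. The assumption is that the old node signs strictly below the agreed height and the new node signs from that height onward, so that exactly one node holds an active key at every height.

```rust
/// Hypothetical role of a node at a given block height.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    /// The node may sign blocks and chunks at this height.
    Active,
    /// The node must not sign at this height.
    Standby,
}

/// Old node: active strictly below the handover height, then stopped.
pub fn old_node_role(height: u64, handover_height: u64) -> Role {
    if height < handover_height {
        Role::Active
    } else {
        Role::Standby
    }
}

/// New node: standby until the handover height, active from then on.
pub fn new_node_role(height: u64, handover_height: u64) -> Role {
    if height >= handover_height {
        Role::Active
    } else {
        Role::Standby
    }
}

/// Counts how many of the two nodes would sign at `height`.
/// With the two rules above this is always exactly one.
pub fn active_signers(height: u64, handover_height: u64) -> usize {
    [
        old_node_role(height, handover_height),
        new_node_role(height, handover_height),
    ]
    .iter()
    .filter(|role| **role == Role::Active)
    .count()
}
```

Because the two conditions are exact complements, there is no height at which both nodes sign and none at which neither does. In practice the operator would still need both nodes to agree on the same height value.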