raft sync times out with default timers when boltdb is bigger than 5-6GB #11983
Comments
testing with higher settings for the raft transport timeout
@write0nly In order to re-join the node, you should first clear out the entire raft data directory, so that both
@raskchanky Thank you for the info. You are right: after doing the full rm, restarting Vault, and unsealing, the node did recover (with the caveats mentioned at the bottom):
Caveats:
It's unclear to me if this
Regarding the rm of vault.db + raft.db: could this become a documentation point? Many thanks again!
I agree we should write something up about defragmenting raft/boltdb.
1- Create a brand new Vault cluster using the raft integrated storage backend.
2- Enable AppRole logins and create a role_id and secret_id.
3- From a large set of clients, run infinite loops that do nothing but AppRole logins, under very harsh conditions, and leave them running for days.
4- Watch the database grow. The vault.db boltdb file goes above 10GB.
5- Stop Vault on one of the secondary nodes and move vault.db aside to vault.db.backup.
6- Restart and unseal the secondary.
7- The expectation is that vault.db will be copied over, as it normally is with smaller DBs.
8- Instead, the node cannot copy the database and crashloops with:
9- The secondary never comes up.
10- Stop Vault, move vault.db.backup back to vault.db, then restart and unseal Vault.
11- Vault starts working properly.
FYI this was tested on 1.8.0-dev from #11072 (comment)
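For anyone trying to reproduce step 3, here is a minimal Python sketch of one client's login loop. It is not the tooling used in the original test: the function names, the `iterations` and `sender` parameters, and the example address are made up for illustration. The only Vault-specific part is the AppRole login endpoint, `POST /v1/auth/approle/login`, with `role_id` and `secret_id` in the JSON body.

```python
import json
import urllib.request


def build_login_request(vault_addr, role_id, secret_id):
    """Build the HTTP request for one AppRole login."""
    url = vault_addr.rstrip("/") + "/v1/auth/approle/login"
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login_loop(vault_addr, role_id, secret_id, iterations=None, sender=send):
    """Log in repeatedly and count the logins that returned a client token.

    iterations=None loops forever, as in step 3; a finite value is
    useful for a dry run. `sender` is injectable so the loop can be
    exercised without a running Vault.
    """
    succeeded = 0
    attempts = 0
    while iterations is None or attempts < iterations:
        attempts += 1
        try:
            reply = sender(build_login_request(vault_addr, role_id, secret_id))
        except OSError:
            # urllib's URLError/HTTPError are OSError subclasses; ignore
            # failures so the load keeps going while a node is down.
            continue
        if (reply.get("auth") or {}).get("client_token"):
            succeeded += 1
    return succeeded
```

Calling `login_loop("http://127.0.0.1:8200", role_id, secret_id)` on each client (the address is a placeholder for your own cluster) approximates the load described above. Run enough clients in parallel to match the "large set of clients" in step 3.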