Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hot state disk leak #5768

Merged
merged 2 commits into from
May 23, 2024
Merged

Fix hot state disk leak #5768

merged 2 commits into from
May 23, 2024

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Fix an issue whereby the v5.2.0-RC hot database grows indefinitely.

Proposed Changes

The main reason for the unbounded growth is the storage of extra hot states.

I accidentally introduced the bug in:

In that PR, we added logic to both:

  1. Store advanced states in the database with a "temporary flag".
  2. Delete temporary flags for advanced states during block processing.

The problem is, we were deleting the temporary flags for all advanced states, when we should have only been doing this at skipped slots. For example, with 2 consecutive blocks at slots N and N + 1 we would store (forever) the advanced pre-state for N + 1. Our pruning logic would never delete this state because it was not marked temporary, and it was not found by the prune_abandoned_forks logic because it did not lie on any side chains. To fix this bug, we now only delete the temporary flag when a skipped slot occurs.

In addition to this, states with temporary flags were previously only deleted on startup. To allow pruning of these states without restaring, I've ported over some of the state pruning code from full tree-states, which iterates the HotStateSummarys in the database and deletes all states older than the split.slot (or equal to the split slot with a different state root). This is safe even in the case where the split block lies at a slot prior to the split slot, because the database only stores the split slot's advanced (epoch-aligned) state in this case, and we do not risk deleting a canonical state by deleting states with slot < split.slot.

Thanks to @antondlr for noticing the state growth on the testnet deployment.

@michaelsproul michaelsproul added bug Something isn't working database v5.2.0 Q2 2024 labels May 13, 2024
@michaelsproul
Copy link
Member Author

Seems like some of the tests are unhappy. I will look into it tomorrow.

@michaelsproul michaelsproul added the ready-for-review The code is ready for review label May 14, 2024
@michaelsproul
Copy link
Member Author

Compared to the pruning in tree-states, this PR will prune around an epoch "behind", i.e. when updating the split for a new finalization, we delete states that are prior to the current split (old finalization). I think this is acceptable, as it still prevents state growth. It's also simpler to implement, as the pruning method doesn't need to know which state roots are canonical, as in:

if summary.slot <= new_finalized_slot {
// If state root doesn't match state root from canonical chain, then delete.
// We may also find older states here that should have been deleted by `migrate_db`
// but weren't due to wonky I/O atomicity.
if newly_finalized_chain
.get(&summary.slot)
.map_or(true, |(_, canonical_state_root)| {
state_root != Hash256::from(*canonical_state_root)
})
{

michaelsproul added a commit that referenced this pull request May 14, 2024
Squashed commit of the following:

commit 0f6ea4b
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue May 14 10:23:29 2024 +1000

    Don't delete the genesis state when split is 0x0!

commit aa8f7c1
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon May 13 12:51:47 2024 +1000

    Fix hot state leak
@michaelsproul michaelsproul mentioned this pull request May 14, 2024
michaelsproul added a commit that referenced this pull request May 16, 2024
Squashed commit of the following:

commit 0f6ea4b
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue May 14 10:23:29 2024 +1000

    Don't delete the genesis state when split is 0x0!

commit aa8f7c1
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon May 13 12:51:47 2024 +1000

    Fix hot state leak
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with the code here but changes look reasonable and I can't spot any flaws! Looks good to me 👍

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 21, 2024
@michaelsproul
Copy link
Member Author

@Mergifyio queue

Copy link

mergify bot commented May 22, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 8762d82

mergify bot added a commit that referenced this pull request May 22, 2024
@mergify mergify bot merged commit 8762d82 into sigp:unstable May 23, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working database ready-for-merge This PR is ready to merge. v5.2.0 Q2 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants