Snapshots TM 0.35: understand backfilling #5471

Closed
1 task
wwestgarth opened this issue Jun 13, 2022 · 4 comments
Labels: blocked, snapshots, spike (technical research typically required to prove a concept, validate an approach or better understand)


wwestgarth commented Jun 13, 2022

Spike Overview

When testing the new snapshot pipelines against devnet with Tendermint v0.35, we noticed additional logs during a snapshot restore that did not appear when restoring in v0.34. The logs seem to suggest that Tendermint now "backfills" historic blocks after a restore. For example, if you snapshot restore to block height 1000, Tendermint will restore to that height but apparently now also send over the Tendermint block data for heights 1-999. This seems to increase the time a restore takes, almost negating the reason for doing a snapshot restore in the first place.

This ticket is to look at the new Tendermint release notes and config options to see if this behaviour is configurable, so that we can fully understand how the new Tendermint version works.
The spec of it is here: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-068-reverse-sync.md
The RFC is here: https://github.com/tendermint/spec/blob/master/rfc/005-reverse-sync.md

Our snapshot system tests currently rely on not having block data for block_height 1 as a check that a restore did happen, so this ticket may have wider implications for updating core to 0.35.

Acceptance Criteria

How do we know when this spike is ready to either drop or move into technical tasks:

  • We understand how tendermint handles snapshots in v0.35

wwestgarth added the spike and snapshots labels Jun 13, 2022
wwestgarth self-assigned this Jun 13, 2022

ze97286 commented Jun 13, 2022

An update from a discussion with Tendermint:

  • This is by design. It only restores the headers, commits and validator sets (not the entire blocks), so that the node is capable of validating past evidence. It should therefore only backfill the blocks within the evidence age.
  • This is actually broken in v0.34: if a node receives evidence that refers to a block it doesn't have, it will panic.
  • The number of blocks to backfill for evidence is configured in the consensus evidence params, and the defaults are:
func DefaultEvidenceParams() EvidenceParams {
	return EvidenceParams{
		MaxAgeNumBlocks: 100000, // 27.8 hrs at 1block/s
		MaxAgeDuration:  48 * time.Hour,
		MaxBytes:        1048576, // 1MB
	}
}
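
To see why a restore to height 1000 ends up backfilling everything down to height 1 under these defaults, here is a rough sketch of the height arithmetic. It assumes the backfill stops at the snapshot height minus MaxAgeNumBlocks, clamped at the chain's initial height, and it ignores the MaxAgeDuration bound entirely. `backfillStopHeight` is a hypothetical helper for illustration, not a Tendermint function.

```go
package main

import "fmt"

// backfillStopHeight estimates the lowest height that still falls inside the
// evidence age window, counted in blocks only.
// Assumption: stop = snapshotHeight - maxAgeNumBlocks, clamped at initialHeight.
// The time-based bound (MaxAgeDuration) is deliberately not modelled here.
func backfillStopHeight(snapshotHeight, maxAgeNumBlocks, initialHeight int64) int64 {
	stop := snapshotHeight - maxAgeNumBlocks
	if stop < initialHeight {
		return initialHeight
	}
	return stop
}

func main() {
	// A young chain: the whole history is inside the evidence window.
	fmt.Println(backfillStopHeight(1000, 100000, 1)) // 1

	// A longer chain: only the most recent 100000 heights are inside it.
	fmt.Println(backfillStopHeight(500000, 100000, 1)) // 400000
}
```

On a devnet that has only reached height 1000, the default window of 100000 blocks covers the full chain, which would match the logs showing heights 1-999 being fetched.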

gordsport added this to the 🤠 Oregon Trail milestone Jun 13, 2022
wwestgarth commented

That makes sense, thanks.

The changelog suggests that Tendermint has added events for the start and end of statesync, so to stop this breaking the system tests we will just have to switch them to check for those events instead.
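
As a sketch of what such a check might look like in a test helper: the `syncStatus` type below is a hypothetical, simplified stand-in for whatever start/end events Tendermint actually emits, and the real event names and fields would need to be taken from the changelog. The idea is only to treat "a start event followed by a completion event" as proof of a restore, instead of probing for missing block data at height 1.

```go
package main

import "fmt"

// syncStatus is a hypothetical stand-in for a statesync status event.
// Complete=false marks the start of a state sync, Complete=true marks the end.
type syncStatus struct {
	Complete bool
	Height   int64
}

// restoredFromSnapshot reports whether the event stream contains a statesync
// start followed by a completion, and returns the height it completed at.
func restoredFromSnapshot(events []syncStatus) (int64, bool) {
	started := false
	for _, ev := range events {
		if !ev.Complete {
			started = true
			continue
		}
		if started {
			return ev.Height, true
		}
	}
	return 0, false
}

func main() {
	events := []syncStatus{
		{Complete: false, Height: 0},
		{Complete: true, Height: 1000},
	}
	height, ok := restoredFromSnapshot(events)
	fmt.Println(height, ok) // 1000 true
}
```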

The Tendermint docs suggest that the expiry on evidence should be equal to the stake bonding period, which is something I don't think Vega has. The evidence is also passed over ABCI so that the application can penalise a validator who tries to cheat (double voting etc.), which again Vega does not do (at the moment). So we could set MaxAgeNumBlocks to be much smaller.

But I think it makes sense to wait until we've migrated to 0.35 and see just how long it takes to backfill 10,000 blocks. If the performance is really bad, we can then think about what to set the EvidenceParams to so that they better fit Vega's use case.

gordsport self-assigned this Jul 11, 2022
gordsport commented

blocked until:

gordsport removed their assignment Jul 11, 2022
gordsport added blocked and removed blocked labels Jul 22, 2022
gordsport commented

Closing as we need to move back to 0.34, as advised by Tendermint.
