Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legacy receipt converter tool #23454

Closed
wants to merge 32 commits into from
Closed

Conversation

s1na
Copy link
Contributor

@s1na s1na commented Aug 24, 2021

See #23245 and #23247 for context. Adds a command db freezer-migrate that goes through the list of receipts in the freezer and converts the legacy ones.

@holiman
Copy link
Contributor

holiman commented Aug 24, 2021 via email

@s1na
Copy link
Contributor Author

s1na commented Aug 24, 2021

Ah right. I should check blocks from num 1 until a non-empty receipt is found then check if its legacy

@holiman
Copy link
Contributor

holiman commented Aug 24, 2021 via email

@MariusVanDerWijden
Copy link
Member

MariusVanDerWijden commented Sep 6, 2021

Do you have some numbers on how much space this saves on mainnet?

I'll sync my node up and can do a receipt conversion

@s1na s1na marked this pull request as ready for review November 18, 2021 09:01
@s1na
Copy link
Contributor Author

s1na commented Nov 18, 2021

To test this I did a full sync on an old commit for ~4mil blocks, then switched to version v1.9.0 for a while. Output of the migrate command:

INFO [11-18|13:07:37.018] First legacy receipt                     number=46147
INFO [11-18|13:07:37.018] Starting migration                       ancients=4,207,670
INFO [11-18|13:07:37.026] Processing legacy elements               number=0
INFO [11-18|13:07:43.214] Processing legacy elements               number=500,000
INFO [11-18|13:08:04.514] Processing legacy elements               number=1,000,000
INFO [11-18|13:08:41.157] Processing legacy elements               number=1,500,000
INFO [11-18|13:09:18.377] Processing legacy elements               number=2,000,000
INFO [11-18|13:09:57.718] Processing legacy elements               number=2,500,000
INFO [11-18|13:10:38.356] Processing legacy elements               number=3,000,000
INFO [11-18|13:11:48.204] Processing legacy elements               number=3,500,000
INFO [11-18|13:14:09.616] Found first non-legacy element           number=3,931,786 filenum=2 copyOver=true
INFO [11-18|13:14:21.536] Replacing table files
INFO [11-18|13:14:21.869] Migration finished                       duration=6m44.851466067s

Now to make sure the content of the files are correct I'm concatenating the table files and comparing that against the freezer of a node with a "modern" db. i.e.:

$ cmp -b ~/datadir-mainnet/geth/chaindata/ancient/receipts.part.cdat ~/datadir-receipts-backup/geth/chaindata/ancient/receipts.all.cdat
differ: byte 3312800212, line 12334563 is 257 M-/ 276 M->

But when I retrieve the encoded receipt for the corresponding block they seem to match. Wondering if it's snappy-related.

@s1na
Copy link
Contributor Author

s1na commented Nov 19, 2021

So I wrote a small script that compares the tables of a certain kind against that of another freezer and the result seems to match. My hunch as to the differences in the binary files is due to different snappy output. From this thread:

... Is snappy deterministic? ...
Yes, it will be the same, but not across library versions (and possibly not
even across architectures).

So possible that in v1.9.0 geth was using an older snappy version than master, causing the difference in compressed receipts that I saw via cmp.

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to do some more review of this, but have some comments so far

core/rawdb/freezer.go Outdated Show resolved Hide resolved
log.Info("No legacy receipts to migrate", "number", firstIdx)
return nil
}
log.Info("First legacy receipt", "number", firstIdx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaict, this is the last time you make any use of firstIdx: the rest of the code starts over from 0 and feeds it into the transformer. And the transformer will (?) return stop=true for the first empty receipt list, so I don't see how this can work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to double-check to see why its working :) so an empty receipt list is both legacy and non-legacy and transformer will ask is this legacy? it is so stop=false. stop will be true only for the first non-empty non-legacy receipt list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was planning to pass firstIdx to MigrateTable so we could start from there but we also need to copy over the first empty receipts so I decided to keep MigrateTable as is and only use firstIdx to determine whether the db is in the old format.

@@ -101,6 +101,10 @@ func (t *table) Sync() error {
return t.db.Sync()
}

func (t *table) MigrateTable(kind string, fn TransformerFn) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring

@@ -111,6 +111,9 @@ type AncientWriter interface {

// Sync flushes all in-memory ancient store data to disk.
Sync() error

// MigrateTable Processes and migrates entries in a table to a new format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the semantics of the func([]byte)([]byte, bool, error)

Co-authored-by: Martin Holst Swende <martin@swende.se>
@holiman
Copy link
Contributor

holiman commented Nov 30, 2021 via email

@s1na
Copy link
Contributor Author

s1na commented Nov 30, 2021

Fair enough, I'll do a re-write which simply copies over all entries.

@fjl
Copy link
Contributor

fjl commented Mar 9, 2022

Closing because of #24028

@fjl fjl closed this Mar 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants