Skip to content
This repository has been archived by the owner on Sep 23, 2023. It is now read-only.

Create a binary version of hex_patricia_hashed - bin_patricia_hashes #288

Open
AlexeyAkhunov opened this issue Jan 29, 2022 · 2 comments
Open
Assignees

Comments

@AlexeyAkhunov
Copy link
Contributor

hex_patricia_hashed in the commitment package is the implementation of the commitment that is currently used in Ethereum - Hexadecimal Patricia Merkle tree with pre-hashes keys.

hex means that the Merkle tree is hexadecimal (rather than binary), i.e. each node has up to 16 children
patricia means it follows PATRICIA design (Practical Algorithm to Retrieve Information Coded in Alphanumeric). Notable feature is compressing the runs of nodes having just 1 child, into a single node (extension and leaf nodes).
hashed means that state keys are not directly used as paths in the tree (as it would be in a conventional radix tree), but they are first pre-processed by Keccak hash function, to resist the creation of specific structures of node to trigger worst-case performances of radix trees.

We need to create a variant of this called bin_patricia_hashed, with difference that instead of 16, any node may have up to 2 children. This also means that instead of using 4 bits of the hashed key per level of the tree, we need to use 1 bit per level.

Integrating this into prototype and running two variants alongside each other, would give us information about the overhead of binary variant in terms of number of nodes, and consequently, size of commitment files.

@primalcs primalcs self-assigned this Jan 31, 2022
@AlexeyAkhunov AlexeyAkhunov assigned awskii and unassigned primalcs Mar 14, 2022
@awskii awskii closed this as completed Mar 15, 2022
@awskii awskii reopened this Mar 15, 2022
@awskii
Copy link
Collaborator

awskii commented Apr 8, 2022

Finished implementation of bin patricia trie.

Goals of that implementation are following:

  • obtain same hashes as bintrie implementation at go-ethereum
  • ability to produce aggregation artifacts as it does by HexPatriciaHashed
  • compare aggregation files sizes from bin and hex trie

First issue were resolved simply by using same hash approach as used in go-eth implementation . That approach uses node paths (here and after node path is a prefix you need to follow from root to that node) as well as node values, which requires more hash re-evaluations on insertion. E.g. in case when we insert new node which splits existing node path - we should re-evaluate all hashes from that node to it's leaves, then re-evaluate hashes for all parental nodes up to the root. IMO, that approach is discussable, but for now, we use it.

Second issue required a bit more complex approach. To produce artifacts, aggregator uses binary branch updates, rewrites them on the fly (replacing hashed keys with stored murmur3 hashes and so on).
I tried to be as much closer to hex updates implementation as i could.

  • Added new flag for branch nodes BRANCH_PART.
  • branches (node with L and R child) are encoded as [LSize, LPadding, LPrefix, RSize, RPadding, RPrefix]
  • leaves encoded as [PlainKeySize, PlainKey, EncodedAccountSize, EncodedAccount]
  • added functions for extracting plainkeys from binary branch updates and their processing.
  • use of touchMap and afterMap, but using least two bits. 3 represents branch, 0 represents leaf

During insertion i do not update any hashes, they should be evaluated after ProcessUpdate is finished. That lead to first deviation from hex updates - branchUpdates does not provide any node hashes (there are no valid hashes at the moment of insertion).

Next, hex branch updates with same prefix could be merged with each other, for binary trie i decided instead of merge just use new version for provided prefix.

That implementation currently does not support:

  • deletions
  • any storage updates, only balance\nonce updates
  • partial hash evaluation instead of full re-hashing on call of RootHash()
  • we could not run erigon2 against chaindata directory processed for binary trie - while reading files we don't know, hex or bin update we met (maybe we should use fallback decode methods and stick to the worked one).

@awskii
Copy link
Collaborator

awskii commented Apr 20, 2022

  • Implemented deletions and storage updates
  • get rid of usage of branch length in node hash
  • reimplemented node and branch encoding, get rid of BRANCH_PART constant. Now we encode account\storage plain keys, node hashes. Branch node produces just branch hash, which led to correct branch merging after plain keys are replaced with murmur3 hash.
  • overall test coverage increased, get rid of obsolete tests, fixed update builder (delete update now encoded correctly)

My humble test on goerli chain up to block 1.200.000:

==> bin <==
index files total
5.1M	total
data files total
316M	total

==> hex <==
index files total
4.9M	total
data files total
203M	total

Met an issue with decompressor (out out bound error) both on hex\bin impelmentation, so decided to rebase branch on erigon devel and erigon-lib main branches.

After update, hex commitment seems to proceed correctly, while bin implementation seems quite incorrectly performing valTransform:

panic: runtime error: slice bounds out of range [20:2]

goroutine 50 [running]:
github.com/ledgerwatch/erigon-lib/compress.(*Getter).Next(0x840d894c050, {0x0, 0x0, 0x0})
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/compress/decompress.go:390 +0x450
github.com/ledgerwatch/erigon-lib/aggregator.(*CommitmentValTransform).commitmentBinValTransform(0x84095657ed8, {0x840afa06720, 0x2b, 0x60}, {0x840cce2cba0, 0x0, 0x60})
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/aggregator/aggregator.go:1573 +0x378
github.com/ledgerwatch/erigon-lib/aggregator.(*Aggregator).mergeIntoStateFile(0x140001ca000, 0x840aef35890, 0x0, 0x3, 0x0, 0xf423f, {0x140002074d0, 0x25}, 0x84095657de8, 0x104439828, ...)
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/aggregator/aggregator.go:3344 +0x9c8
github.com/ledgerwatch/erigon-lib/aggregator.(*Aggregator).computeAggregation(0x140001ca000, 0x3, {0x840e8ee3180, 0xb, 0x10}, 0x0, 0xf423f, 0x8408a6f9de8, 0x104439828, 0x1, ...)
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/aggregator/aggregator.go:3122 +0x380
github.com/ledgerwatch/erigon-lib/aggregator.(*Aggregator).backgroundMerge(0x140001ca000)
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/aggregator/aggregator.go:1657 +0x668
created by github.com/ledgerwatch/erigon-lib/aggregator.NewAggregator
	github.com/ledgerwatch/erigon-lib@v0.0.0-20220418023534-87a7a2124418/aggregator/aggregator.go:1123 +0xa18

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants