Stateless witness prefetcher changes #29519

Open · karalabe wants to merge 7 commits into master

Conversation

karalabe (Member):

Supersedes #29035 because the OP didn't permit modifications from maintainers...

	case ch := <-sf.copy:
		// Somebody wants a copy of the current trie, grant them
		ch <- sf.db.CopyTrie(sf.trie)

	case <-sf.stop:
		// Termination is requested, abort and leave remaining tasks
		// Termination is requested, abort
karalabe (Member Author):

Changing the comment is nice, but the code doesn't reflect it :P

The code should check whether sf.tasks is nil and, if it isn't, keep looping until it becomes nil. Otherwise we run the risk of receiving a last task and immediately closing down, with the close being executed first (remember, select branch evaluation is non-deterministic if multiple channels are ready).
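For illustration, a minimal sketch of the drain-before-stop pattern I mean (the wake/tasks fields and the process helper are made up for the example, not the actual prefetcher code):

```go
package prefetcher

import "sync"

// subfetcher is stripped down to just the fields the shutdown pattern needs.
type subfetcher struct {
	lock  sync.Mutex
	tasks [][]byte      // keys scheduled but not yet fetched
	wake  chan struct{} // signals that new tasks were scheduled
	stop  chan struct{} // signals that termination was requested
}

func (sf *subfetcher) loop() {
	for {
		select {
		case <-sf.wake:
			// New tasks arrived, pull them out and preload them.
			sf.lock.Lock()
			tasks := sf.tasks
			sf.tasks = nil
			sf.lock.Unlock()
			process(tasks)

		case <-sf.stop:
			// Termination requested. Select evaluation is non-deterministic,
			// so a final wake may have lost the race against stop: keep
			// draining queued tasks until none remain before returning.
			for {
				sf.lock.Lock()
				tasks := sf.tasks
				sf.tasks = nil
				sf.lock.Unlock()
				if len(tasks) == 0 {
					return
				}
				process(tasks)
			}
		}
	}
}

// process stands in for the actual trie preloading.
func process(tasks [][]byte) {}
```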

func (p *triePrefetcher) used(owner common.Hash, root common.Hash, used [][]byte) {
	if p.closed {
		return
	}
karalabe (Member Author):

These are IMO not good changes. They make things harder to reason about, and close becomes this magic thing that nukes the prefetcher offline. I'm not sure that's the intended case, since close is also the thing that waits for the data to be finished. So we need to figure out what close does: kill it, or wait on it.

return nil
}
return sf.db.CopyTrie(sf.trie)
return nil
karalabe (Member Author):

I don't really see the point of this change: it makes peek useless after close, but close is the thing that waits for all the data to be loaded, so it's kind of ... weird.

// abort interrupts the subfetcher immediately. It is safe to call abort multiple
// times but it is not thread safe.
func (sf *subfetcher) abort() {
// close waits for the subfetcher to finish its tasks. It cannot be called multiple times
karalabe (Member Author):

But it can be called multiple times. Close might also not be the best name, since we're waiting for it to finish but should, AFAIK, not kill the thing.

}

case ch := <-sf.copy:
karalabe (Member Author):

I'm unsure about this code path here with the rewrite. Do we want to allow retrieval from a live prefetcher? If yes, why? Perhaps for tx boundaries? We should really document somewhere why, and whether, it's needed. It's a very specific use case.

Contributor:

When we call updateTrie on a state object, we attempt to source the trie from the prefetcher. So copying from a live prefetcher is used here to preserve that functionality.
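For reference, a self-contained sketch of that flow; the types and method names here are simplified stand-ins, not the real core/state code:

```go
package state

import "errors"

// Trie and triePrefetcher are minimal stand-ins for the sketch.
type Trie interface{}

type triePrefetcher struct{}

// trie returns a copy of whatever the (possibly still running) prefetcher
// has warmed up for this account, or nil if it has nothing.
func (p *triePrefetcher) trie() Trie { return nil }

type stateObject struct {
	prefetcher *triePrefetcher
}

// openFromDisk stands in for the slow path that opens the storage trie
// directly from the database.
func (s *stateObject) openFromDisk() (Trie, error) {
	return nil, errors.New("not implemented in this sketch")
}

// updateTrie prefers a copy from the live prefetcher and only falls back to
// a disk read if none is available; this is the functionality the
// copy-from-a-live-prefetcher path preserves.
func (s *stateObject) updateTrie() (Trie, error) {
	if s.prefetcher != nil {
		if tr := s.prefetcher.trie(); tr != nil {
			return tr, nil
		}
	}
	return s.openFromDisk()
}
```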

karalabe (Member Author) left a comment:

I think one issue that the PR does not address, but must, is what the new lifecycle of the prefetcher is. Previously it was just something we threw data at, and then at some point we aborted it, pulled out every useful piece of data it pre-loaded, and built our stuff on top.

The new logic seems to push it towards a witness model where we wait for all data to be loaded before pulling and operating on it. But the code doesn't seem to reflect that, with many paths instead becoming duds after a close.

Either this PR is only half the code needed, with the part that actually uses the prefetcher like this still missing, or something's kind of borked. Either way, we must define what the intended behavior is, and both document it and make sure the prefetcher adheres to it.

I'm kind of wondering whether close is needed at all; rather, we should have a wait method which just ensures everything is loaded. Whether we're between txs or at block end, waiting for prefetching to finish makes sense. I guess close might be needed to nuke the loop goroutine, but we should still have a wait then before peeking at stuff. Ah, I guess the "implicit" behavioral thing this PR is aiming for is that the prefetcher is not thread safe, so by the time we call peek, any scheduled data is already prefetched. I don't think that's the case; at the very least it's a dangerous data race to assume that events fired on 2 different channels will arrive in the exact order one expects. If this is the intended behavior, I'd rather make it ever so slightly more explicit than hope for a good order of events.

holiman (Contributor) commented Apr 15, 2024:

> I'm kind of wondering whether close is needed at all; rather, we should have a wait method which just ensures everything is loaded

As I see it, the prefetcher needs a couple of phases.

  1. Phase 1: open for scheduling. At this point, it accepts tasks to be fetched. Callers must not (cannot?) retrieve data from it at this point. When an external caller tells it to, it goes into
  2. Phase 2: No longer open for scheduling tasks. At this point, finishes all tasks, and once all tasks are done, it goes into
  3. Phase 3: (again, not open for scheduling tasks) At this point, callers can retrieve data from it.

Perhaps we need something more elaborate than this, but whatever we need, we would be well served by first jotting down the description in human language before doing some lock/mutex/channel-based implementation of "something".
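A rough, self-contained sketch of that phase split as an explicit state machine, just to pin the description down (all names hypothetical, not proposed API):

```go
package prefetcher

import (
	"errors"
	"sync"
)

type phase int

const (
	phaseScheduling phase = iota // Phase 1: accepts tasks, reads forbidden
	phaseResolving               // Phase 2: no new tasks, draining outstanding loads
	phaseReadable                // Phase 3: everything loaded, reads allowed
)

type prefetcher struct {
	mu      sync.Mutex
	phase   phase
	pending sync.WaitGroup
	data    map[string][]byte // whatever was loaded, keyed by trie path
}

func newPrefetcher() *prefetcher {
	return &prefetcher{data: make(map[string][]byte)}
}

// schedule queues keys for background loading. Only legal in phase 1.
func (p *prefetcher) schedule(keys []string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.phase != phaseScheduling {
		return errors.New("prefetcher no longer accepts tasks")
	}
	for _, key := range keys {
		p.pending.Add(1)
		go func(k string) {
			defer p.pending.Done()
			v := loadFromDisk(k) // stand-in for the real trie read
			p.mu.Lock()
			p.data[k] = v
			p.mu.Unlock()
		}(key)
	}
	return nil
}

// wait moves the prefetcher through phase 2 into phase 3, blocking until all
// scheduled loads have completed.
func (p *prefetcher) wait() {
	p.mu.Lock()
	p.phase = phaseResolving
	p.mu.Unlock()

	p.pending.Wait()

	p.mu.Lock()
	p.phase = phaseReadable
	p.mu.Unlock()
}

// peek exposes the loaded data, but only once phase 3 has been reached.
func (p *prefetcher) peek(key string) ([]byte, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.phase != phaseReadable {
		return nil, errors.New("data not fully loaded yet")
	}
	return p.data[key], nil
}

// loadFromDisk stands in for the actual database/trie access.
func loadFromDisk(key string) []byte { return []byte(key) }
```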

karalabe (Member Author):

As I understand it, the difference between the old and new prefetcher is (should be) as follows:

  • Old pre-fetcher:

    • Purpose is to warm up the trie during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that the hasher will need at the end.
    • All operations are async, running in the background; the most important thing is to never ever block. If we get more useful data, great; if less, that's life, but we should never hold up execution.
    • When execution reaches a boundary (IntermediateRoot pre-Byzantium; or Finalize after Byzantium), insta-terminate all pre-fetchers to avoid the main committer thread from racing for disk accesses. Whatever we managed to load will be used, the rest pulled on demand.
  • New pre-fetcher:

    • Purpose is to act as a witness constructor (write only for now) during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that both the hasher and a cross-validator will need at the end.
    • Almost all operations are async, running in the background, the most important thing is to never ever block during EVM execution. However, on commit boundaries we have to switch to blocking mode, since the witness needs all data, not just whatever we loaded until that point in time.
    • When execution reaches a boundary (IntermediateRoot pre-Byzantium; or Finalize after Byzantium), wait for all pre-fetchers to finish. This will block the main committer thread, but ideally, if we're not loading junk, it should be all the same: the data needs to be loaded anyway to commit. For the witness, the data must be loaded before the tries are mutated (see the sketch after this list).
  • (Threading) Quirks:

    • Pre-Byzantium does an IntermediateRoot call between each transaction. A witness pre-fetcher for that block range must support stopping after a transaction, collecting the witness; then continuing against the next transaction, collecting witnesses from updated tries. This is significantly more complex from both a witness and a threading perspective, to have data across tries. Given that pre-Byzantium is ancient, it doesn't make sense to support it, but we need to very explicitly handle / reject that case, otherwise it's going to be "weird" trying to understand the code.
    • The old pre-fetcher was best-effort, with no guarantees on correctness (as to how much and what data it loaded). The new pre-fetcher needs to be correct to construct a proper witness, so sometimes blocking is necessary. That however means that code paths need to be re-thought, as we still want to maximise the main EVM execution pathways even whilst waiting for data. Particularly, when terminating a pre-fetcher (i.e. waiting), we should start integrating results from finished subfetchers before waiting for all storage tries to finish loading.
  • Questions:

    • Does slot mutation order make the witness different? I.e. if I change 3 slots in a contract (including delete/create), does the order of applying them change what trie nodes we need? Because if so, there might be a hidden step still needed during commit to add prefetch tasks (?)
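The sketch referenced above: how the boundary handling differs between the old and new approaches, assuming a hypothetical wait method (the StateDB shape and method names are stand-ins, not the real code):

```go
package state

// Stand-ins so the sketch compiles on its own; the real StateDB and
// prefetcher in core/state are far richer.
type prefetcherAPI interface {
	abort() // old: interrupt immediately, keep whatever was loaded
	wait()  // new (hypothetical): block until every scheduled load is done
}

type StateDB struct {
	prefetcher prefetcherAPI
}

func (s *StateDB) hashTries()      {} // stand-in for the actual trie hashing
func (s *StateDB) collectWitness() {} // stand-in for gathering witness nodes

// Old boundary behaviour: insta-terminate the prefetcher. Whatever it managed
// to load gets used, anything missing is pulled from disk on demand.
func (s *StateDB) intermediateRootOld() {
	if s.prefetcher != nil {
		s.prefetcher.abort()
		s.prefetcher = nil
	}
	s.hashTries()
}

// New boundary behaviour: block until all scheduled nodes are loaded, since a
// complete witness needs every pre-state node before the tries are mutated.
func (s *StateDB) intermediateRootNew() {
	if s.prefetcher != nil {
		s.prefetcher.wait()
	}
	s.collectWitness()
	s.hashTries()
}
```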

jwasinger (Contributor) commented Apr 15, 2024:

> Purpose is to act as a witness constructor (write only for now) during execution, so that while we're crunching some EVM code, our disk is kept busy pulling in data that both the hasher and a cross-validator will need at the end.

It's actually meant to gather witnesses for read values. In the stateless witness builder PR, I gather write witnesses from committing the tries.

But iirc, earlier on the call today you mentioned not tying the retrieval of write witnesses to the commit operation, which would change the assumptions from my original code.

karalabe force-pushed the stateless-witness-prefetcher-changes branch from 139448f to f5ec2e7 on May 3, 2024, 10:34
karalabe (Member Author) commented May 7, 2024:

(screenshot attached: 2024-05-07 at 09:22:54)

// if a prefetcher is available. This path is used if snapshots are unavailable,
// since that requires reading the trie *during* execution, when the prefetchers
// cannot yet return data.
func (s *stateObject) getTrie(skipPrefetcher bool) (Trie, error) {
karalabe (Member Author):

FWIW, skipPrefetcher is kind of an ugly hack; I just wanted to avoid the lack-of-snapshot path poking into the prefetcher. Open to cleaner suggestions.

karalabe added this to the 1.14.2 milestone on May 7, 2024
@@ -197,7 +216,7 @@ func (s *stateObject) GetCommittedState(key common.Hash) common.Hash {
 	// If the snapshot is unavailable or reading from it fails, load from the database.
 	if s.db.snap == nil || err != nil {
 		start := time.Now()
-		tr, err := s.getTrie()
+		tr, err := s.getTrie(true)
Member:

It's probably problematic?

  • The prefetcher is enabled if the associated snapshot layer is available
  • Retrievals from the snapshot can succeed if the requested item is covered
  • Retrievals from the snapshot can fail if the requested item is not covered yet (still under generation)
  • So the following scenario is possible: a few data retrievals through the snapshot succeed and some prefetching tasks get scheduled; then a retrieval fails and the prefetched trie is abandoned

The witness of the trie could be incomplete.
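To make the scenario concrete, a sketch of the flow being described, with stand-in types and names (not the actual GetCommittedState code):

```go
package state

// Minimal stand-ins so the sketch is self-contained; the real snapshot,
// prefetcher and state object types are much richer.
type snapshot interface {
	Storage(key []byte) ([]byte, error)
}

type stateObject struct {
	snap     snapshot
	prefetch func(key []byte)                 // schedules the key with the witness prefetcher
	readTrie func(key []byte) ([]byte, error) // direct trie read, bypassing the prefetcher
}

// getCommittedState illustrates the flow described above, not the actual code.
func (s *stateObject) getCommittedState(key []byte) ([]byte, error) {
	if s.snap != nil {
		if val, err := s.snap.Storage(key); err == nil {
			// Snapshot hit: the key is also scheduled with the prefetcher,
			// so its trie nodes end up in the witness.
			s.prefetch(key)
			return val, nil
		}
		// Snapshot miss or failure (e.g. the item is still under
		// generation): fall through to a direct trie read...
	}
	// ...which bypasses the prefetcher, so this node is read for execution
	// but never scheduled for the witness. With earlier keys scheduled and
	// this one not, the witness for the trie can end up incomplete.
	return s.readTrie(key)
}
```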
