Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient Ethereum Log Queries #56

Open
hkalodner opened this issue Aug 30, 2019 · 2 comments
Open

Efficient Ethereum Log Queries #56

hkalodner opened this issue Aug 30, 2019 · 2 comments
Labels
arb-compiler-evm enhancement New feature or request
Projects
Milestone

Comments

@hkalodner
Copy link
Contributor

Currently there is no efficient trustless way to query for ethereum events output by Arbitrum contracts. I'll begin by reviewing how logs work in ethereum (which are the compilation target of events) and what properties they're expected to have.

Ethereum Logs

A log in Ethereum is a list of up to 4 32 byte values referred to as topics along with an arbitrary sized data field. An ethereum block contains a bloom filter which contains each topic within each log output by any transaction within the block. The correctness of this bloom filter is enforced by consensus. If a user wishes to find all logs with a certain topic within a certain range of block heights, they first download all of the block headers and log bloom filters for those blocks and then do an inclusion check on each of the bloom filters. Next they download the receipts for every transaction from each of the blocks which matches the bloom filter. Finally they search each of those transactions for a matching log output.

When contracts output matching events relatively infrequently, this process can be performed much more efficiently than having to check every single transaction. This process is also trustless since bloom filters are guaranteed to have no false negatives. Note that one might attempt to solve this problem by asking an untrusted node to just send you all events matching a given query. It would be easy to verify that all events returned correctly match the query, but it would be impossible to guarantee

Arbitrum Logs

The Arbitrum VM supports two forms of verifiable output. Messages which transfer data and value which must post their data on chain to guarantee data availability to non-validator parties, and logs which are just a commitment to data from the VM and only show up in an accumulator attached to each assertion.

Currently a log entry is created for every transaction processed which essentially specifies the receipt for that transaction which includes the return data and a list of logged events. In order to find all events matching a given query, you must look at every single logged value. This eliminates the bloom filter optimization that Ethereum provides making it much more expensive to filter events.

Suggested changes

Arbitrum could reach a similar level of performance if Arbitrum VM's log receipts for blocks of transactions rather than single transactions. One way to accomplish this would be to log a block of transactions along with a bloom filter periodically, say once per 2 seconds. This would move Arbitrum VM's even closer to mimicking Ethereum's functionality which we have already done in many other ways.

This would allow light clients to calculate events matching a given query efficiently while maintaining an AnyTrust validity guarantee.

@hkalodner hkalodner added enhancement New feature or request arb-compiler-evm labels Aug 30, 2019
@hkalodner hkalodner added this to the Alpha 3 milestone Aug 30, 2019
@hkalodner hkalodner added this to To do in Alpha 3 Aug 30, 2019
@hkalodner hkalodner removed this from To do in Alpha 3 Sep 27, 2019
@edfelten
Copy link
Contributor

edfelten commented Jan 3, 2020

The expected setup is that some node has a full list of the log outputs and can supply the result of a query. What we want is for the Arbitrum VM code to emit periodically an accumulator hash of the log history, in such a way that the result of a query can come with a concise proof of correctness, verifiable by a client who only knows a recent accumulator hash.

@edfelten
Copy link
Contributor

Propose a design roughly like this:

VM publishes a simple hash-chain* of log items, each with a sequence number.

(If we prefer, the hash-chain* structure can really be a structure like a Merkle mountain range, which allows log-time proof of contents at any sequence number. Or we can stick with simpler linear chain.)

After every N log items, VM also publishes a summary block. Summary blocks are also hash-chained* together. Each summary block includes a Bloom timestamp data structure.

Bloom timestamp data structure is like a Bloom filter, except that instead of having a boolean in each slot, each slot instead records the latest summary block number whose log-item group touches that Bloom filter slot. If a topic hits several Bloom filter slots, we can take the min of the Bloom timestamp slots it touches, and we know that the topic doesn't appear in groups before that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arb-compiler-evm enhancement New feature or request
Projects
No open projects
Alpha 4
Awaiting triage
Development

No branches or pull requests

2 participants