Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix getLogs query (avoid repeated execution, separate tx, and improve performance) #250

Merged
merged 5 commits into from
Jun 16, 2023

Conversation

telackey
Copy link
Contributor

@telackey telackey commented Jun 15, 2023

Fix for #248 and #249.

This does a few things:

  1. Support for "earliest", "latest", etc. block specifiers. This did not work before.

  2. The default "fromBlock" is now "latest" to match geth.

  3. The old code did a loop across fromBlock:toBlock, executing a query on each one inside a big tx (!). This does a single query, with no separate tx.

  4. Add a max range limit (default 500) with an error message if the range is too broad (I used the same error message as cloudflare)

  5. Simplify the query a good bit. This improved performance significantly in my manual testing adapted to v4. The old version, adapted to use a small range of 50 blocks took over 20s. The new version took about 700ms. A query across 500 blocks with 150K results takes less than 4s.

Some additional test cases added here: https://git.vdb.to/cerc-io/system-tests/src/branch/main/test/test_logs.py

@telackey telackey requested a review from i-norden June 15, 2023 00:31
@telackey telackey self-assigned this Jun 15, 2023
Copy link
Collaborator

@i-norden i-norden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the config is added. 100 might be low (I imagine geth doesn't timeout at that range size). And while we are at it we should add the missing canonical_header_hash condition on the getLogs query.

pkg/eth/sql.go Show resolved Hide resolved
@telackey
Copy link
Contributor Author

I rewrote the query to select the canonical hash of the blocks in specified range, then check for inclusion in that set. I also simplified the query structure a good bit. I had to use a v4 database for comparison, but using a small block range of 50 was running about 23s on the old query, 700ms on the new.

@telackey
Copy link
Contributor Author

I also upped the limit to 500, as that is still decently quick for me (3 to 4s) on blocks 17,000,000 to 17,000,500 with about 150,727 results (not that I'm sure we would want to send that many...)

@telackey telackey marked this pull request as ready for review June 16, 2023 02:11
@telackey telackey changed the title WIP: Avoid repeated queries in getLogs. Fix getLogs query (avoid repeated execution, separate tx, and improve performance) Jun 16, 2023
@telackey telackey requested a review from dboreham June 16, 2023 02:34
@telackey telackey merged commit 39f8b6e into v5 Jun 16, 2023
5 checks passed
@telackey telackey deleted the telackey/249 branch June 16, 2023 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants