Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Iterator Performance of Seeking with Prefix #1719

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zzyalbert
Copy link

@zzyalbert zzyalbert commented Jul 2, 2021

When I was trying to change the db engine to badgerdb in some of my projects, I found the iterator Seek with prefix was pretty slow in the following situation:

  • lots of keys we were seeking with prefix didn't exist
  • lots of keys have lots of versions

Then I use pprof to found out the iterator was still running parseItem even if the current key was not match the prefix.
image

So I fix this by skipping the parseItem process when the current key is not match the prefix.


This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Jul 2, 2021

CLA assistant check
All committers have signed the CLA.

@zzyalbert
Copy link
Author

@jarifibrahim Would you please review this pr?

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @NamanJain8 @ahsanbarkati should also review.

iterator.go Outdated Show resolved Hide resolved
iterator.go Outdated Show resolved Hide resolved
@NamanJain8
Copy link
Contributor

Thanks, @zzyalbert for raising the PR. I would need to verify though but I think that this would not work. parseItem is a complex function that does a lot of things. One of them being is calling extra Next on the MergeIterator.

It would be good if we could verify a few test cases like:
Consider a table at L5 with the following keys: ax. b and a table at L6 with keys ay, b. Now if we iterate through them using iterator with prefix a, then we should be able to access both ax and ay.

@zzyalbert
Copy link
Author

Thanks, @zzyalbert for raising the PR. I would need to verify though but I think that this would not work. parseItem is a complex function that does a lot of things. One of them being is calling extra Next on the MergeIterator.

It would be good if we could verify a few test cases like:
Consider a table at L5 with the following keys: ax. b and a table at L6 with keys ay, b. Now if we iterate through them using iterator with prefix a, then we should be able to access both ax and ay.

Theoretically, both ax and ay could be accessed, because this pr only skipped parseItem when inner iterator's current key had no prefix. If we iterated with prefix a when we reached ax, parseItem would also be called because ax still had the prefix a

@stale
Copy link

stale bot commented Aug 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Aug 22, 2021
@stale
Copy link

stale bot commented Sep 3, 2021

This issue was marked as stale and no activity has occurred since then, therefore it will now be closed. Please, reopen if the issue is still relevant.

@stale stale bot closed this Sep 3, 2021
@NamanJain8 NamanJain8 reopened this Sep 3, 2021
@stale stale bot removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Sep 3, 2021
@NamanJain8 NamanJain8 added the skip/stale Skip stalebot label Sep 3, 2021
@AlexMackowiak
Copy link

Any update here? In my application, I need to do iterator Seek() with a prefix, and according to pprof this is responsible for >90% of the total time spent doing range operations. Any optimizations here would be greatly appreciated!

image

@NamanJain8
Copy link
Contributor

Hi @AlexMackowiak , the change looks great to me as far as correctness is concerned. Also, it would be useful for the case where the iterations are made upon the prefixes that have no keys.
But in the general case, where the keys actually exist for the prefix, this would be a slight overhead. I would discuss this internally with the team and report back.

Thanks for the PR. :)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do some benchmarks here? One, where all the keys are present -- so we can see what overhead does this code bring. And 2. When the prefix is absent, so we can understand what gains we achieve?

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @jarifibrahim)

@NamanJain8
Copy link
Contributor

NamanJain8 commented Sep 22, 2021

@zzyalbert / @AlexMackowiak, can you please add benchmarks for this? If that looks promising, then we can proceed further on the basis of them. I am happy to provide any help that you need to do that.

@AlexMackowiak
Copy link

@NamanJain8 Sorry for the delay, I had some pre-existing benchmark tests for the key-value store my company is building on top of badger. Running these against the two different implementations seems to look promising, especially for reading from large ranges with pagination.

I have attached the raw benchmarking data below, let me know if you need anything else!
image
Badger PR Benchmark.xlsx

@AlexMackowiak
Copy link

@NamanJain8 Any update on merging here? Or even like a config option for this behavior? The benchmarks I took show that this change would be quite beneficial at least for my company's use case.

Copy link

@whess96 whess96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Lol wrong page was open. But yes, this PR also lgtm.

@joshua-goldstein joshua-goldstein added area/performance Performance related issues. and removed skip/stale Skip stalebot labels Nov 4, 2022
mYmNeo added a commit to mYmNeo/badger that referenced this pull request Jan 18, 2023
Signed-off-by: thomassong <thomassong2012@gmail.com>
@joshua-goldstein joshua-goldstein changed the base branch from master to main February 6, 2023 22:12
@joshua-goldstein joshua-goldstein changed the base branch from main to master February 6, 2023 22:13
mYmNeo added a commit to mYmNeo/badger that referenced this pull request Feb 13, 2023
Signed-off-by: thomassong <thomassong2012@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Performance related issues.
Development

Successfully merging this pull request may close these issues.

None yet

8 participants