Improve Iterator Performance of Seeking with Prefix #1719

zzyalbert · 2021-07-02T10:46:35Z

While switching the DB engine to BadgerDB in some of my projects, I found that iterator Seek with a prefix was quite slow in the following situations:

- many of the keys we were seeking by prefix didn't exist
- many keys had a large number of versions

I then used pprof and found that the iterator was still running parseItem even when the current key did not match the prefix.

So I fixed this by skipping the parseItem step when the current key does not match the prefix.

CLAassistant · 2021-07-02T10:46:39Z

All committers have signed the CLA.

zzyalbert · 2021-07-16T12:41:19Z

@jarifibrahim Would you please review this PR?

jarifibrahim

Looks good to me. @NamanJain8 @ahsanbarkati should also review.

iterator.go

NamanJain8 · 2021-07-19T08:08:15Z

Thanks, @zzyalbert, for raising the PR. I would need to verify, but I think that this would not work. parseItem is a complex function that does a lot of things. One of them is calling an extra Next on the MergeIterator.

It would be good if we could verify a few test cases like:
Consider a table at L5 with the keys ax, b and a table at L6 with the keys ay, b. Now if we iterate through them using an iterator with prefix a, then we should be able to access both ax and ay.

zzyalbert · 2021-07-19T09:43:23Z

> Thanks, @zzyalbert, for raising the PR. I would need to verify, but I think that this would not work. parseItem is a complex function that does a lot of things. One of them is calling an extra Next on the MergeIterator.
>
> It would be good if we could verify a few test cases like:
> Consider a table at L5 with the keys ax, b and a table at L6 with the keys ay, b. Now if we iterate through them using an iterator with prefix a, then we should be able to access both ax and ay.

In theory, both ax and ay are still accessible, because this PR only skips parseItem when the inner iterator's current key does not have the prefix. If we iterate with prefix a and reach ax, parseItem is still called, because ax has the prefix a.

stale · 2021-08-22T18:37:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2021-09-03T02:20:06Z

This issue was marked as stale and no activity has occurred since then, therefore it will now be closed. Please, reopen if the issue is still relevant.

AlexMackowiak · 2021-09-21T20:13:47Z

Any update here? In my application, I need to do iterator Seek() with a prefix, and according to pprof this is responsible for >90% of the total time spent doing range operations. Any optimizations here would be greatly appreciated!

NamanJain8 · 2021-09-22T15:00:34Z

Hi @AlexMackowiak, the change looks great to me as far as correctness is concerned. It would also be useful when iterating over prefixes that have no keys.
But in the general case, where keys do exist for the prefix, it would add a slight overhead. I will discuss this internally with the team and report back.

Thanks for the PR. :)

manishrjain

Can we do some benchmarks here? 1. Where all the keys are present, so we can see what overhead this code brings. 2. Where the prefix is absent, so we can understand what gains we achieve.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @jarifibrahim)

NamanJain8 · 2021-09-22T17:33:07Z

@zzyalbert / @AlexMackowiak, can you please add benchmarks for this? If the results look promising, we can proceed on that basis. I am happy to provide any help you need with that.

AlexMackowiak · 2021-09-24T22:36:46Z

@NamanJain8 Sorry for the delay. I had some pre-existing benchmark tests for the key-value store my company is building on top of Badger. The results from running these against the two implementations look promising, especially for reading from large ranges with pagination.

I have attached the raw benchmarking data below, let me know if you need anything else!

Badger PR Benchmark.xlsx

AlexMackowiak · 2021-11-18T23:14:25Z

@NamanJain8 Any update on merging here? Or even like a config option for this behavior? The benchmarks I took show that this change would be quite beneficial at least for my company's use case.

whess96

~~Lgtm!~~ Lol wrong page was open. But yes, this PR also lgtm.

Signed-off-by: thomassong <thomassong2012@gmail.com>

zzyalbert requested a review from manishrjain as a code owner July 2, 2021 10:46

jarifibrahim approved these changes Jul 19, 2021

stale bot added the status/stale label Aug 22, 2021

stale bot closed this Sep 3, 2021

NamanJain8 reopened this Sep 3, 2021

stale bot removed the status/stale label Sep 3, 2021

NamanJain8 added the skip/stale label Sep 3, 2021

manishrjain reviewed Sep 22, 2021

AlexMackowiak mentioned this pull request Mar 20, 2022

Seek iterator with Reverse doesn't work #436

Closed

whess96 approved these changes Mar 29, 2022

joshua-goldstein added the area/performance label and removed the skip/stale label Nov 4, 2022

mYmNeo added a commit to mYmNeo/badger that referenced this pull request Jan 18, 2023

Improve Iterator Performance of Seeking with Prefix dgraph-io#1719

7c7dc31

Signed-off-by: thomassong <thomassong2012@gmail.com>

zzyalbert added 2 commits February 6, 2023 16:12

improve iterator performance of seeking with prefix

7b56978

fix typo

2ec98c3

joshua-goldstein force-pushed the feature/improve_iterator_prefix_seek branch from 2ee9c97 to 2ec98c3 February 6, 2023 22:12

joshua-goldstein requested review from akon-dey, billprovince and joshua-goldstein as code owners February 6, 2023 22:12

joshua-goldstein requested a review from skrdgraph as a code owner February 6, 2023 22:12

joshua-goldstein changed the base branch from master to main February 6, 2023 22:12

joshua-goldstein changed the base branch from main to master February 6, 2023 22:13

mYmNeo added a commit to mYmNeo/badger that referenced this pull request Feb 13, 2023

Improve Iterator Performance of Seeking with Prefix dgraph-io#1719

cae7356

Signed-off-by: thomassong <thomassong2012@gmail.com>
