Improving "Sparse postings" intersection #13971
base: main
Signed-off-by: alanprot <alanprot@gmail.com>
I tried adding a length method in #13093; maybe it's worth revisiting?
How does the whole `BenchmarkQuerier` look compared to `main`?
I did not run the whole thing, but it's mostly showing the improvement in that case (99%, though)... Idk if it's worth it... but it seems to make sense, as in "why would we deprioritize some postings over others?" But idk, tbh! haha
Previously, the `intersectPostings` algorithm kept advancing through the first posting lists until they agreed on a common value, ignoring the possibility that another list could let it skip ahead sooner. Consider the following example:
P1: [2, 5, 9, 18, 21]
P2: [3, 7, 14, 19, 21]
P3: [1, 21]
The algorithm would advance only through P1 and P2 until they intersected, and only then check P3. In essence, the traversal order was: 2, 3, 5, 7, 9, 14, 18, 19, 21 (intersection found).
With the proposed change, P3 is also examined even if P1 and P2 haven't found an intersection yet. Its larger value (21) can then become the seek target for the other lists, so some iterations are skipped.
Post-change, the traversal order becomes: 2, 3, 21 (3 iterations instead of 9).
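The difference between the two strategies can be sketched in Go as follows. This is a simplified illustration, not the actual Prometheus `intersectPostings` code: `SeriesRef`, the trimmed-down `Postings` interface, `listPostings`, `firstCandidate` and `intersect` are hypothetical names introduced here. The `restart` flag mimics the previous behavior as described above (any overshoot sends the loop back to the first list), while `restart=false` finishes each pass so every list, including P3, can raise the candidate. Each call also records every value the candidate takes.

```go
package main

import "fmt"

// SeriesRef stands in for the series reference type (assumed to be a uint64).
type SeriesRef uint64

// Postings is a trimmed-down, hypothetical version of a postings iterator.
type Postings interface {
	Next() bool            // advance to the next entry
	Seek(v SeriesRef) bool // advance to the first entry >= v, never backwards
	At() SeriesRef         // current entry
}

// listPostings is a slice-backed Postings; pos is -1 until the first Next or Seek.
type listPostings struct {
	list []SeriesRef
	pos  int
}

func newList(vs ...SeriesRef) *listPostings { return &listPostings{list: vs, pos: -1} }

func (l *listPostings) Next() bool { l.pos++; return l.pos < len(l.list) }

func (l *listPostings) At() SeriesRef { return l.list[l.pos] }

// Seek scans linearly for simplicity; the pos < 0 case moves off the -1 sentinel.
func (l *listPostings) Seek(v SeriesRef) bool {
	for l.pos < 0 || (l.pos < len(l.list) && l.list[l.pos] < v) {
		l.pos++
	}
	return l.pos < len(l.list)
}

// firstCandidate moves every list to its first entry and returns the largest one.
// It assumes ps is non-empty.
func firstCandidate(ps []Postings) (SeriesRef, bool) {
	var cur SeriesRef
	for _, p := range ps {
		if !p.Next() {
			return 0, false
		}
		if p.At() > cur {
			cur = p.At()
		}
	}
	return cur, true
}

// intersect returns the first entry common to all lists plus every value the
// candidate took. restart=true: the first overshoot sends the loop back to the
// first list, so later lists wait until earlier ones agree. restart=false: the
// pass continues, letting any list raise the candidate.
func intersect(ps []Postings, restart bool) (SeriesRef, []SeriesRef, bool) {
	cur, ok := firstCandidate(ps)
	if !ok {
		return 0, nil, false
	}
	trace := []SeriesRef{cur}
	for {
		agreed := true
		for _, p := range ps {
			if !p.Seek(cur) {
				return 0, trace, false
			}
			if p.At() > cur {
				cur = p.At()
				trace = append(trace, cur)
				agreed = false
				if restart {
					break // start over from the first list
				}
			}
		}
		if agreed { // no list overshot, so every list is positioned on cur
			return cur, trace, true
		}
	}
}

// example builds fresh iterators for P1, P2 and P3 (iterators are stateful).
func example() []Postings {
	return []Postings{newList(2, 5, 9, 18, 21), newList(3, 7, 14, 19, 21), newList(1, 21)}
}

func main() {
	for _, restart := range []bool{true, false} {
		v, trace, ok := intersect(example(), restart)
		fmt.Printf("restart=%v: found=%v value=%d candidates=%v\n", restart, ok, v, trace)
	}
}
```

On P1, P2 and P3 both variants find 21, but the restart variant moves its candidate through 3, 5, 7, 9, 14, 18, 19, 21, while the one-pass variant goes 3, 5, 7, 21 because P3 pushes the candidate to 21 during the first pass. These are candidate values in this sketch, not the exact traversal order listed above, and the linear `Seek` means the sketch illustrates ordering rather than the cost of a real index lookup.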
To validate the improvement, the benchmarks were adjusted to simulate this scenario, and they additionally call `Next` on the resulting postings to demonstrate the benefit. In this extreme case, a 97% reduction in time is observed.
Ideally, we could sort `arr []Postings` by size before iterating, but unfortunately the `Postings` interface does not allow us to retrieve the underlying size.
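If a size were available, the sort could look roughly like the sketch below. It is hypothetical throughout: `sizedPostings` and its `Len` method are assumptions made for illustration (the real interface has no size accessor, and the length method proposed in #13093 may look different), and `listPostings` is a toy slice-backed iterator. The sort only happens when every element can report its size; otherwise the original order is kept.

```go
package main

import (
	"fmt"
	"sort"
)

// SeriesRef stands in for the series reference type (assumed to be a uint64).
type SeriesRef uint64

// Postings is a trimmed-down, hypothetical version of a postings iterator.
type Postings interface {
	Next() bool
	Seek(v SeriesRef) bool
	At() SeriesRef
}

// sizedPostings is HYPOTHETICAL: the Postings interface has no size accessor,
// which is exactly the limitation described above.
type sizedPostings interface {
	Postings
	Len() int
}

// listPostings is a slice-backed Postings that can report its length.
type listPostings struct {
	list []SeriesRef
	pos  int // -1 until the first Next or Seek
}

func newList(vs ...SeriesRef) *listPostings { return &listPostings{list: vs, pos: -1} }

func (l *listPostings) Next() bool { l.pos++; return l.pos < len(l.list) }

func (l *listPostings) At() SeriesRef { return l.list[l.pos] }

func (l *listPostings) Len() int { return len(l.list) }

// Seek scans linearly for simplicity; the pos < 0 case moves off the -1 sentinel.
func (l *listPostings) Seek(v SeriesRef) bool {
	for l.pos < 0 || (l.pos < len(l.list) && l.list[l.pos] < v) {
		l.pos++
	}
	return l.pos < len(l.list)
}

// sortBySize orders arr from shortest to longest so the sparsest list is consulted
// first. If any element cannot report its size, arr is left untouched.
func sortBySize(arr []Postings) {
	for _, p := range arr {
		if _, ok := p.(sizedPostings); !ok {
			return
		}
	}
	sort.SliceStable(arr, func(i, j int) bool {
		return arr[i].(sizedPostings).Len() < arr[j].(sizedPostings).Len()
	})
}

func main() {
	arr := []Postings{newList(2, 5, 9, 18, 21), newList(3, 7, 14, 19, 21), newList(1, 21)}
	sortBySize(arr)
	for _, p := range arr {
		fmt.Println(p.(sizedPostings).Len(), p.(*listPostings).list)
	}
}
```

With the example lists, P3 (two entries) moves to the front and the stable sort keeps P1 before P2. In the restart-style loop described above, seeking P3 first would jump the candidate from 3 straight to 21, which is why knowing the size would help even without the one-pass change.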