topdown/sets_bench_test: Add `intersection` builtin benchmarks. #5000

philipaconrad · 2022-08-10T23:14:25Z

This PR is an experiment to try to improve performance for the set intersection builtin, inspired by work done in #4980 on the set union builtin.

The original logic for the builtin makes pairwise `Set.Intersection` calls across the input sets, which in theory allocates intermediate sets that are thrown away at each step. Ideally, we'd minimize those wasted allocations and get a faster, more efficient solution.
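For readers following along, here is a minimal sketch of the two shapes being compared. It is not the OPA code: `set` is a hypothetical stand-in for `ast.Set` built on a plain Go map, and the names `intersect2`, `pairwise`, and `filterInPlace` are made up for illustration.

```go
package main

// set is a hypothetical stand-in for OPA's ast.Set, keyed by string for brevity.
type set map[string]struct{}

// intersect2 returns a newly allocated set holding the elements common to a and b.
func intersect2(a, b set) set {
	out := make(set, len(a))
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// pairwise folds intersect2 over the inputs, allocating one intermediate
// set per step. With a single input it returns that input unchanged.
func pairwise(sets []set) set {
	if len(sets) == 0 {
		return set{}
	}
	result := sets[0]
	for _, s := range sets[1:] {
		result = intersect2(result, s)
	}
	return result
}

// filterInPlace copies the first set once, then deletes every element that
// is missing from any later set, so only one working map is allocated.
func filterInPlace(sets []set) set {
	if len(sets) == 0 {
		return set{}
	}
	presentInAll := make(set, len(sets[0]))
	for k := range sets[0] {
		presentInAll[k] = struct{}{}
	}
	for _, s := range sets[1:] {
		for k := range presentInAll {
			if _, ok := s[k]; !ok {
				delete(presentInAll, k) // deleting during range is safe in Go
			}
		}
	}
	return presentInAll
}
```

Both functions return the same elements; they differ only in how many maps they allocate along the way.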

philipaconrad · 2022-08-12T20:22:25Z

Local benchmarks are not showing much, if any, improvement over doing set intersection in the naive pairwise way. I suspect this is due to the intersection algorithm guaranteeing equal or smaller sets as iteration progresses, eliminating most of the win potential observed for union.
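To illustrate the suspicion above, this is a small sketch (again using a plain Go map as a hypothetical stand-in for `ast.Set`, with made-up helper names) that records the size of the intermediate result after each pairwise step. Under union the accumulator can only grow, while under intersection it can only stay the same or shrink, so later steps have less work to do.

```go
package main

// set is a hypothetical stand-in for OPA's ast.Set.
type set map[int]struct{}

// union returns a new set holding every element of a and b.
func union(a, b set) set {
	out := make(set, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

// intersection returns a new set holding the elements common to a and b.
func intersection(a, b set) set {
	out := make(set)
	for k := range a {
		if _, ok := b[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// intermediateSizes folds op over sets pairwise and records the size of
// the accumulator after each step, starting with the first set.
func intermediateSizes(sets []set, op func(a, b set) set) []int {
	if len(sets) == 0 {
		return nil
	}
	acc := sets[0]
	sizes := []int{len(acc)}
	for _, s := range sets[1:] {
		acc = op(acc, s)
		sizes = append(sizes, len(acc))
	}
	return sizes
}
```

For the inputs `{1,2,3}`, `{2,3,4}`, `{3,4,5}`, the union sizes are 3, 4, 5 and the intersection sizes are 3, 2, 1.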

I'll play around with one or two more ideas in this space before dropping this PR though. 😃

srenatus

Some nitpicks -- sorry for reviewing a draft PR, I couldn't resist :D

srenatus · 2022-08-17T18:26:47Z

topdown/sets.go

+		return nil, err
+	}
+	presentInAll = make(map[*ast.Term]struct{}, first.Len())
+	first.Iter(func(x *ast.Term) error {


[nit] `Iter(func(*ast.Term) error)` -> `Foreach(func(*ast.Term))`

srenatus · 2022-08-17T18:28:19Z

topdown/sets.go

+	// Add any surviving Terms to the output set.
+	for k := range presentInAll {
+		result.Add(k)
+	}
 	return result, err


💭 `err`? Let's put an `if err != nil { ... }` right after the `err =` assignment, and `return result, nil` here.

philipaconrad · 2022-08-23T21:55:08Z

I've tinkered around with a few different approaches for trying to do set intersection more efficiently, and I can't beat the original implementation, at least with the benchmark that I have (which generates N identical sets, the worst case).

The best case (no keys match across all sets), and the average case (a few keys match across all sets) are not benchmarked currently, and those might be worth exploring. A solution that dramatically improves average-case performance with only a minor worst-case penalty might be valuable.
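As a rough sketch of what inputs for the three cases could look like, the generators below build sets-of-sets with full, zero, and partial overlap. These are hypothetical helpers, not the data generator added in this PR, and they use a plain Go map in place of `ast.Set`.

```go
package main

import "strconv"

// set is a hypothetical stand-in for OPA's ast.Set.
type set map[string]struct{}

// genIdentical returns n sets that all hold the same size keys
// (worst case: every key survives every intersection step).
func genIdentical(n, size int) []set {
	return genOverlap(n, size, size)
}

// genDisjoint returns n sets of the given size with no key in common
// (best case: the intersection is empty).
func genDisjoint(n, size int) []set {
	return genOverlap(n, size, 0)
}

// genOverlap returns n sets of the given size in which exactly `shared`
// keys appear in every set and the remaining keys are unique to one set.
func genOverlap(n, size, shared int) []set {
	if shared > size {
		shared = size
	}
	out := make([]set, n)
	for i := range out {
		s := make(set, size)
		for j := 0; j < shared; j++ {
			s["s"+strconv.Itoa(j)] = struct{}{}
		}
		for j := shared; j < size; j++ {
			s["u"+strconv.Itoa(i)+"_"+strconv.Itoa(j)] = struct{}{}
		}
		out[i] = s
	}
	return out
}

// commonKeys counts the keys present in every set.
func commonKeys(sets []set) int {
	if len(sets) == 0 {
		return 0
	}
	count := 0
	for k := range sets[0] {
		inAll := true
		for _, s := range sets[1:] {
			if _, ok := s[k]; !ok {
				inAll = false
				break
			}
		}
		if inAll {
			count++
		}
	}
	return count
}
```

Varying `shared` between 0 and `size` would give a sweep from best case to worst case for the same benchmark body.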

This commit adds tests for the `intersection` Set builtin, and cleans up the existing tests with a new data generator function. Signed-off-by: Philip Conrad <philipaconrad@gmail.com>

philipaconrad · 2022-09-09T22:26:33Z

I couldn't get a meaningful speedup, even with relatively pathological input sets-of-sets. I'm throwing in the towel on this for now, and have changed the PR title to reflect a massive cutting-back in scope.

This PR is now limited to adding two new benchmarks and cleaning up how data is generated for both the intersection and union benchmarks.

srenatus

Thanks for wrapping this up. Keeping the benchmarks is a good idea!

philipaconrad added the optimization label Aug 10, 2022

philipaconrad requested review from tsandall and srenatus August 10, 2022 23:14

philipaconrad self-assigned this Aug 10, 2022

philipaconrad added this to In Progress in Open Policy Agent via automation Aug 10, 2022

philipaconrad force-pushed the set-intersection-logic-hoist branch 2 times, most recently from fdfc258 to 5f7f55b Compare August 12, 2022 17:29

philipaconrad removed request for tsandall and srenatus August 12, 2022 20:20

philipaconrad force-pushed the set-intersection-logic-hoist branch from 5f7f55b to fb8cb7a Compare August 17, 2022 18:06

srenatus reviewed Aug 17, 2022


philipaconrad force-pushed the set-intersection-logic-hoist branch from fb8cb7a to c43119c Compare August 23, 2022 21:29

topdown/sets_bench_test: Add intersection builtin tests.

438e5df

This commit adds tests for the `intersection` Set builtin, and cleans up the existing tests with a new data generator function. Signed-off-by: Philip Conrad <philipaconrad@gmail.com>

philipaconrad force-pushed the set-intersection-logic-hoist branch from c531d15 to 438e5df Compare September 9, 2022 22:23

philipaconrad changed the title ~~builtins: Speed up set intersections~~ topdown/sets_bench_test: Add intersection builtin benchmarks. Sep 9, 2022

philipaconrad marked this pull request as ready for review September 9, 2022 22:26

philipaconrad requested a review from srenatus September 9, 2022 22:26

philipaconrad removed the optimization label Sep 9, 2022

srenatus approved these changes Sep 10, 2022


srenatus merged commit cb4cf0d into open-policy-agent:main Sep 10, 2022

Open Policy Agent automation moved this from In Progress to Done Sep 10, 2022

philipaconrad deleted the set-intersection-logic-hoist branch September 14, 2022 20:25
