union() is slower than pure Rego, can be up to O(n^3) #4979
@charlesdaniels Thank you for reporting this, and for the detailed analysis/example code. 😄 I think for now the easiest way to boost performance for this builtin would be to inline a chunk of the set unioning logic into the builtin's definition. That could get rid of a whole level of wasted allocations, since we'd not be throwing away temporary sets along the way.
philipaconrad added a commit that referenced this issue on Aug 11, 2022:
This commit fixes a performance regression for the Set `union` builtin, discovered in issue #4979. The original logic for the builtin did pairwise `Set.Union` calls between the input sets, resulting in many wasted temporary Sets that were almost immediately discarded, and a bunch of duplicated work. The improved builtin inlines the logic from `Set.Union`, so that only one pass is made across the incoming sets' members. Fixes #4979 Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
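The single-pass idea from the commit message can be sketched as follows. This is a minimal illustration, not OPA's code: the `set` type here is a hypothetical stand-in (a plain Go map), and `onePassUnion` is a hypothetical name. The result is allocated once and every member of every input set is inserted into it directly, so the work is proportional to the total number of input elements instead of growing with each step.

```go
package main

import "fmt"

// set is a hypothetical stand-in for an OPA Set (a plain Go map),
// not OPA's actual ast.Set type.
type set map[int]struct{}

// onePassUnion allocates the result once and adds every member of every
// input set to it directly, without building intermediate sets.
// It returns the result and the number of insertions performed.
func onePassUnion(sets []set) (set, int) {
	total := 0
	for _, s := range sets {
		total += len(s)
	}
	result := make(set, total) // size hint: upper bound on the result size
	inserts := 0
	for _, s := range sets {
		for x := range s {
			result[x] = struct{}{} // duplicates simply overwrite the same key
			inserts++
		}
	}
	return result, inserts
}

func main() {
	sets := []set{
		{1: {}, 2: {}, 3: {}},
		{3: {}, 4: {}},
		{5: {}},
	}
	result, inserts := onePassUnion(sets)
	fmt.Println(len(result), inserts) // 5 6
}
```

Each input element is touched exactly once (6 insertions for 6 input elements), and the overlapping element `3` is deduplicated by the map itself.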
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue on Sep 13, 2022:
Changes:

## 0.44.0

This release contains a number of fixes, two new builtins, a few new features, and several performance improvements.

### Security Fixes

This release includes the security fixes present in the recent v0.43.1 release, which mitigate CVE-2022-36085. See the Release Notes for v0.43.1 for more details.

### Set Element Addition Optimization

Rego Set element addition operations did not scale linearly in the past and, like the Object type before v0.43.0, experienced noticeable reallocation/memory movement overheads once the Set grew past 120k-150k elements in size. This release introduces different handling of Set internals during element addition operations to avoid pathological reallocation behavior, and allows linear performance scaling up into the 500k key range and beyond.

### Set `union` Built-in Optimization

The Set `union` builtin allows applying the union operation to a set of sets. However, as discovered in <open-policy-agent/opa#4979>, its implementation generated unnecessary intermediate copies, which resulted in poor performance; in many cases, worse than writing the equivalent operation in pure Rego. This release improves the `union` builtin's implementation, such that only the final result set is ever modified, reducing memory allocations and GC pressure. The `union` builtin is now about 15-30% faster than the equivalent operation in pure Rego.

### New Built-in Functions: `strings.any_prefix_match` and `strings.any_suffix_match`

This release introduces two new builtins, optimized for bulk matching of string prefixes and suffixes: `strings.any_prefix_match` and `strings.any_suffix_match`. They work with sets and arrays of strings, allowing efficient matching of collections of prefixes or suffixes against a target string. See the built-in functions docs for all the details: <https://www.openpolicyagent.org/docs/v0.42.0/policy-reference/#builtin-strings-stringsany_prefix_match>

## 0.43.1

This is a security release fixing the following vulnerabilities:

- CVE-2022-36085: Respect unsafeBuiltinMap for 'with' replacements in the compiler

See <GHSA-f524-rf33-2jjr> for all details.
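To illustrate the matching semantics described for the new string builtins, here is a naive Go sketch. The helpers `anyPrefixMatch` and `anySuffixMatch` are hypothetical and only show the "does any candidate match any prefix/suffix" idea with a nested loop; they are not OPA's implementation (which the notes describe as optimized for bulk matching), and the argument order of the real Rego builtins is documented at the link above, not assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// anyPrefixMatch is a hypothetical helper: it reports whether any string
// in targets starts with any string in prefixes. Naive nested loop for
// illustration only; not OPA's implementation.
func anyPrefixMatch(targets, prefixes []string) bool {
	for _, t := range targets {
		for _, p := range prefixes {
			if strings.HasPrefix(t, p) {
				return true
			}
		}
	}
	return false
}

// anySuffixMatch is the hypothetical suffix counterpart.
func anySuffixMatch(targets, suffixes []string) bool {
	for _, t := range targets {
		for _, s := range suffixes {
			if strings.HasSuffix(t, s) {
				return true
			}
		}
	}
	return false
}

func main() {
	prefixes := []string{"/api/v1/", "/admin/"}
	fmt.Println(anyPrefixMatch([]string{"/api/v1/users"}, prefixes))              // true
	fmt.Println(anySuffixMatch([]string{"report.pdf"}, []string{".txt", ".csv"})) // false
}
```

The nested loop costs one comparison per (target, prefix) pair; a bulk-optimized builtin can do better, for example by indexing the prefixes, but that is beyond this sketch.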
`union()` is very slow in Rego because of how it is currently implemented. Consider the implementation of the `union` builtin:

And also `ast.set.Union()`:

If we have $n$ sets to union, each of size $k$, that results in $n$ calls to `ast.set.Union()`, and each call to `ast.set.Union()` creates a whole new set and copies all of the existing elements over to it.

This means that each call to `ast.set.Union()` takes increasingly more time, specifically $O(k), O(2k), O(3k), \dots, O(nk)$. The total running time is therefore $O(k \cdot T_n)$, where $T_n = \frac{n(n+1)}{2}$ is the $n$th triangle number:

$$\sum_{i=1}^{n} ik = k \cdot \frac{n(n+1)}{2} = O(kn^2)$$
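To make that growth concrete, here is a minimal Go sketch that models the pairwise strategy and counts element copies. It assumes plain Go maps as a hypothetical stand-in for OPA's `ast.Set`, and `pairwiseUnion` and `disjointSets` are hypothetical names; it reproduces the copying pattern described above, not OPA's actual code. For $n$ disjoint sets of size $k$, the copy count is $k \cdot n(n+1)/2$.

```go
package main

import "fmt"

// set is a hypothetical stand-in for an OPA Set: a plain Go map keyed by
// element. It is NOT OPA's ast.Set type.
type set map[int]struct{}

// pairwiseUnion folds the inputs together with a union that allocates a
// fresh set each time and copies both operands into it. It returns the
// result and the total number of element copies performed.
func pairwiseUnion(sets []set) (set, int) {
	acc := set{}
	copies := 0
	for _, s := range sets {
		next := make(set, len(acc)+len(s)) // fresh temporary set
		for x := range acc {
			next[x] = struct{}{}
			copies++
		}
		for x := range s {
			next[x] = struct{}{}
			copies++
		}
		acc = next // the previous accumulator is thrown away
	}
	return acc, copies
}

// disjointSets builds n disjoint sets of k integers each.
func disjointSets(n, k int) []set {
	out := make([]set, n)
	for i := 0; i < n; i++ {
		s := make(set, k)
		for j := 0; j < k; j++ {
			s[i*k+j] = struct{}{}
		}
		out[i] = s
	}
	return out
}

func main() {
	n, k := 4, 3
	result, copies := pairwiseUnion(disjointSets(n, k))
	// k * n(n+1)/2 = 3 * 10 = 30 copies for a 12-element result.
	fmt.Println(len(result), copies) // 12 30
}
```

Only 12 elements end up in the result, yet 30 copies are made (3 + 6 + 9 + 12), and the gap widens quadratically as $n$ grows.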
Empirical Benchmarks
With the `union()` builtin:

With pure Rego instead:
System info: