This is conceptually the same problem as in #4979.

Suppose we are unioning together $n$ objects of $k$ keys each. topdown.builtinObjectUnionN() will make $n-1$ pairwise calls to topdown.mergeWithOverwrite(), each implying at least one ($\Omega(1)$) call to ast.object.MergeWith(). Though MergeWith appears linear at first glance, the calls are not independent of one another: the first call merges two objects of $k$ keys each, the second merges the resulting object of $2k$ keys with another object of $k$ keys, and so on. Thus the actual work done by MergeWith in the inner loop ends up being $O\left(2k + 3k + 4k + \dots + nk\right) = O\left(k\left(\frac{1}{2}n(n+1) - 1\right)\right) \approx O(kn^2)$.

This is the best case, where each merge triggers exactly one MergeWith call. If the objects contain complex nested objects as values rather than primitive types, the actual running time could be substantially worse.
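To make the quadratic blow-up concrete, here is a minimal Go model of that pairwise pattern. It uses plain maps rather than OPA's ast.Object, and mergeWithOverwrite below is a simplified stand-in for the real topdown.mergeWithOverwrite()/ast.object.MergeWith() pair, so it illustrates the cost model rather than the actual implementation:

```go
package main

import "fmt"

// mergeWithOverwrite is a simplified stand-in for OPA's merge: it
// allocates a fresh result and copies every key of both operands,
// so merging maps of sizes a and b costs O(a+b).
func mergeWithOverwrite(a, b map[string]int) map[string]int {
	out := make(map[string]int, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v // later objects win on key conflicts
	}
	return out
}

func main() {
	const n, k = 100, 250 // n objects of k keys each

	objs := make([]map[string]int, n)
	for i := range objs {
		objs[i] = make(map[string]int, k)
		for j := 0; j < k; j++ {
			objs[i][fmt.Sprintf("obj%d/key%d", i, j)] = j
		}
	}

	// Pairwise pattern: the accumulator grows by k keys per step, so
	// step i copies about i*k keys. Summing over all steps gives
	// k*(2+3+...+n) = k*(n(n+1)/2 - 1), i.e. O(k*n^2) copies in total.
	result := objs[0]
	for _, o := range objs[1:] {
		result = mergeWithOverwrite(result, o)
	}
	fmt.Println("merged keys:", len(result)) // n*k distinct keys here
}
```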
System info:

$ uname -a
Darwin styra-macbook.local 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64
$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 12.4 (21F79)
      Kernel Version: Darwin 21.5.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: styra-macbook
      User Name: Charles Daniels (cad)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 5 days 21:41

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Chip: Apple M1 Pro
      Total Number of Cores: 8 (6 performance and 2 efficiency)
      Memory: 32 GB
      System Firmware Version: 7459.121.3
      OS Loader Version: 7459.121.3
      Serial Number (system): REDACTED
      Hardware UUID: REDACTED
      Provisioning UDID: REDACTED
      Activation Lock Status: Disabled
$ opa version
Version: 0.43.0
Build Commit: d75bbdd
Build Timestamp: 2022-07-29T21:15:42Z
Build Hostname: Mac-1659129125033.local
Go Version: go1.18.4
Platform: darwin/arm64
WebAssembly: unavailable
Note that unlike in #4979, this is not an apples-to-apples comparison, because object.union_n() is not expressible in the general case (to my knowledge) using pure Rego. The fact that object.union_n() is slower than the given Rego code is not an "obvious" problem in and of itself; the problem is that its running time grows far faster than linearly in the number of objects to merge.
Empirical Benchmarks

Although the asymptotic analysis is sufficient on its own to demonstrate this, I have also implemented an ad-hoc benchmark. Note that it requires a version of Bash new enough to support EPOCHREALTIME. Notice that though nsets varies with each run of the benchmark, size is fixed at 250 for all of them; thus, in terms of the previous asymptotic analysis, $n$ is allowed to vary while $k$ is fixed at 250.

With the object.union_n() builtin:

With pure Rego:

As we can see, between n=100 and n=500 (a 5x increase in input size), the running time went from 0.49s to 12.68s, a 25x increase, matching the predicted quadratic growth ($5^2 = 25$).
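The original Bash/Rego harness is not reproduced above. As a rough stand-in with the same shape of workload, here is a hypothetical Go timing sketch over the simplified merge model from the earlier snippet: size ($k$) stays fixed at 250 while nsets ($n$) varies, mirroring the Bash loop's parameters, though it times the model directly rather than invoking opa eval:

```go
package main

import (
	"fmt"
	"time"
)

// Same simplified helper as in the earlier sketch.
func mergeWithOverwrite(a, b map[string]int) map[string]int {
	out := make(map[string]int, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

func main() {
	const size = 250 // k: keys per object, fixed across runs
	for _, nsets := range []int{100, 200, 300, 400, 500} { // n varies
		objs := make([]map[string]int, nsets)
		for i := range objs {
			objs[i] = make(map[string]int, size)
			for j := 0; j < size; j++ {
				objs[i][fmt.Sprintf("obj%d/key%d", i, j)] = j
			}
		}

		start := time.Now()
		result := objs[0]
		for _, o := range objs[1:] {
			result = mergeWithOverwrite(result, o)
		}
		fmt.Printf("nsets=%d keys=%d elapsed=%s\n",
			nsets, len(result), time.Since(start))
	}
}
```

Under the pairwise pattern, the elapsed time should grow roughly quadratically in nsets, just as observed with the real builtin.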
This commit (#5127) fixes a performance regression for the object.union_n builtin,
discovered in issue #4985.
The original logic for the builtin did pairwise mergeWithOverwrite calls
between the input Objects, resulting in many wasted intermediate result
Objects that were almost immediately discarded. The updated builtin uses
a new algorithm to do a single pass across the input Objects, respecting
the "last assignment wins, with merges" semantics of the original builtin
implementation.
In the included benchmarks, this provides a 2x speed and 2-3x memory
efficiency improvement over using a pure-Rego comprehension to do the
same job, and a 6x or greater improvement over the original implementation
on all metrics as input Object arrays grow larger.
Fixes #4985
Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
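The commit's diff is not shown here, but as a hedged illustration of the single-pass idea it describes, here is a sketch over plain Go maps: one accumulator, one scan over the inputs, with later values overwriting earlier ones and objects merged recursively. OPA's actual implementation works over ast.Object values and differs in detail:

```go
package main

import "fmt"

// unionN sketches the single-pass algorithm: one accumulator, one scan
// over the inputs, no intermediate throwaway objects. When both the
// accumulator and the input hold an object under the same key, the two
// are merged recursively ("with merges"); otherwise the later value
// simply overwrites ("last assignment wins"). Non-object values are
// not deep-copied; this is a cost-model sketch, not production code.
func unionN(objs []map[string]any) map[string]any {
	out := make(map[string]any, len(objs))
	for _, o := range objs {
		for k, v := range o {
			sub, vIsObj := v.(map[string]any)
			old, oldIsObj := out[k].(map[string]any)
			if vIsObj && oldIsObj {
				out[k] = unionN([]map[string]any{old, sub})
			} else {
				out[k] = v // last assignment wins
			}
		}
	}
	return out
}

func main() {
	merged := unionN([]map[string]any{
		{"a": 1, "nested": map[string]any{"x": 1}},
		{"b": 2, "nested": map[string]any{"y": 2}},
		{"a": 3}, // overwrites the earlier "a"
	})
	fmt.Println(merged) // map[a:3 b:2 nested:map[x:1 y:2]]
}
```

On flat inputs like the model workload above, this does O(kn) work instead of O(kn^2), since each key is written into the accumulator once rather than being re-copied on every subsequent merge.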