-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topdown/object: Rework object.union_n
to use in-place merge algorithm.
#5127
topdown/object: Rework object.union_n
to use in-place merge algorithm.
#5127
Conversation
e544e93
to
05f71b9
Compare
EDIT: There was indeed a subtle issue in iterating backwards through the list of objects. Details about the solution are covered in the comment thread below. I've added unit tests to catch some of the edge cases that were missed earlier. Interestingly, the corrected code has better performance on large objects in the benchmarks than before, because entire sub-trees of objects can be skipped for recursive merging once they've been "frozen". 😄 |
05f71b9
to
ba63c06
Compare
// Example: | ||
// Input: [{"a": {"b": 2}}, {"a": 4}, {"a": {"c": 3}}] | ||
// Want Output: {"a": {"c": 3}} | ||
result := ast.NewObject() | ||
frozenKeys := map[*ast.Term]struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a way to know when a key has been "frozen" by an assignment, such that any objects ahead of it are not allowed to be used for recursive merging while we iterate backwards through the list of objects.
The above example is illustrative of the issue. The middle assignment "blocks" the first assignment of the key "a"
, so we need some way to ensure we don't accidentally try to merge the value {"b": 2}
into the final result object.
Luckily, because pointers to distinct keys in the result object will be unique, we can create a set of the keys in the result Object that should be "frozen" by using a map as a set-like data structure. This set of frozen keys is passed down into the recursive merging function.
if ok1 && ok2 { | ||
// Check to make sure that this key isn't frozen before merging. | ||
if _, ok := frozenKeys[v2]; !ok { | ||
mergewithOverwriteInPlace(originalValueObj, updateValueObj, frozenKeys) | ||
} | ||
} else { | ||
// Else, original value wins. Freeze the key. | ||
frozenKeys[v2] = struct{}{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we see two objects, and the key isn't in the "frozen" set, we can merge them recursively.
However, if one of the values did not correspond to an object type, we should freeze the key because we've found a "blocking" assignment, such that it would have overwritten everything that came before it.
ba63c06
to
9b91696
Compare
9b91696
to
7f857fb
Compare
This commit fixes a performance regression for the object.union_n builtin, discovered in issue open-policy-agent#4985. The original logic for the builtin did pairwise mergeWithOverwrite calls between the input Objects, resulting in many wasted intermediate result Objects that were almost immediately discarded. The updated builtin uses a new algorithm to do a single pass across the input Objects, respecting the "last assignment wins, with merges" semantics of the original builtin implementation. In the included benchmarks, this provides a 2x speed and 2-3x memory efficiency improvement over using a pure-Rego comprehension to do the same job, and a 6x or greater improvement over the original implementation on all metrics as input Object arrays grow larger. Fixes open-policy-agent#4985 Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
7f857fb
to
f5cf66e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This PR fixes a performance regression for the
object.union_n
builtin, discovered in issue #4985.The original logic for the builtin did pairwise
mergeWithOverwrite
calls between the input Objects, resulting in many wasted intermediate result Objects that were almost immediately discarded. The updated builtin uses a new algorithm to do a single pass across the input Objects, respecting the "last assignment wins, with merges" semantics of the original builtin implementation.In the included benchmarks, this provides a 2x speed and 2-3x memory efficiency improvement over using a pure-Rego comprehension to do the same job, and a 6-30x improvement over the original implementation on all metrics as input Object arrays grow larger.
Fixes #4985
The 3x relevant benchmarks are shown below:
object.union_n
onmain
branchobject.union_n
on this PRBenchmarks
Benchmark of
object.union_n
onmain
branch:Benchmark of Pure Rego implementation:
Benchmark of
object.union_n
on this PR:Summary of Results
As mentioned above, and as shown in the benchmark results, this PR's changes provide dramatic performance improvements on all metrics, and result in the builtin having around 2-3x better performance overall than a pure Rego equivalent.