Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: object.update builtin function #2839

Closed
3 tasks
jaspervdj-luminal opened this issue Oct 28, 2020 · 3 comments
Closed
3 tasks

Proposal: object.update builtin function #2839

jaspervdj-luminal opened this issue Oct 28, 2020 · 3 comments

Comments

@jaspervdj-luminal
Copy link
Contributor

I'd like to propose a new builtin function to handle some limited cases of
recursion. We already allow recursively iterating over an object using
walk(obj). However, there's no way to use resulting outputs to update the
original object (or construct a new one).

This is because comprehensions only allow us to update or create objects with
known (fixed) depth. This proposal attempts to fix that using a new builtin
function object.update.

I imagine it would look a lot like this:

output := object.update(doc, patches)

Where doc is an object (or array), for example:

doc := {
  "name": {"first": "Tom", "last": "Nook"},
  "address": ["Sunset ave 1", "Tropical Island"]
}

And patches is a array of tuples. These tuples are exactly the same shape
as the output of walk(obj): a path together with a value. For example,
this could be the patches array:

patches := [
  [["name", "first"], "Tim"],
  [["address", 1], "Subtropical Island"],
])

Applying these patches to the original doc would apply the updates to the
object:

output = {
  "name": {"first": "Tim", "last": "Nook"},
  "address": ["Sunset ave 1", "Subtropical Island"]
}

In the concrete use case I encountered this it would also be extremely useful
to be able to insert new keys into the object. This would mean the following
holds as well:

object.update(doc, [
  [["name", "middle"], "Sr."],
]) == {
  "name": {"first": "Tom", "middle": "Sr.", "last": "Nook"},
  "address": ["Sunset ave 1", "Tropical Island"],
}

If you combine walk and object.update, you can do relatively complex generic
tree traversals, such as incrementing all numbers in a tree:

tree := {
  "bar": 1,
  "qux": [2, 3]
}

treePlusOne = ret {
  ret := update.object.update(tree, [[path, value + 1] |
    [path, value] = walk(tree)
    is_number(value)
  ])
}

It can also be used to merge two objects in more interesting ways than
object.merge allows for. The concrete use case I'm interested in is merging
together different views on terraform resources.

Because this is what every normal sane person would I've actually given
implementing update a try in pure Rego and it seems to work well. Of course
replacing this by a builtin would drastically improve performance and I wouldn't
have the recursion limit anymore.

This is of course still a draft and there's some open questions:

  • Is object.update a good name?
  • I haven't given sets much thought, and I'll look into if it also works
    there.
  • What should happen if the updates overlap? Should the function fail or
    just apply them in the order they are given?
@tsandall
Copy link
Member

tsandall commented Nov 9, 2020

Sorry for the delayed reply. This seems totally reasonable. We had an issue a while ago that proposed a json.patch built-in function for similar use cases: #1617. @patrick-east implemented that and may have thoughts here.

Is object.update a good name?

As far as naming goes, what about json.update? The reason is that the current object functions (with exception of object.union) only operate on top-level keys (i.e., they don't recurse or support nested fields.) OTOH, the json functions do.

I haven't given sets much thought, and I'll look into if it also works there.

I'd expect the path segment and value would have to be the same. E.g., object.update(set(), [[1,2]]) would be invalid but object.update(set(), [[1,1]]) == {1}. Similarly, we'll have to deal with references to non-existent array elements, e.g., object.update([], [[1,1]]).

What should happen if the updates overlap? Should the function fail or just apply them in the order they are given?

I'd let the use cases dictate the behaviour for now. We could always add the other variant in the future. If we allow overlapped updates to apply, I'd lean toward ensuring a stable order (e.g., by sorting by path and value.)

@jaspervdj
Copy link
Contributor

I read through the earlier discussion in those tickets. Full JSON-patch seems
to be a bit too much for our use case. However, if we don't go with that, we're
stuck specifying our own semantics for json.update, while there's already a
nice RFC out there. So a standards-compliant json.patch that works directly
on AST values seems like the way to go here.

jaspervdj-luminal added a commit to fugue/opa that referenced this issue Nov 24, 2020
This is an implementation of the `json.patch` builtin.  Previous discussions
about this include open-policy-agent#2839 and open-policy-agent#2167.

It does not use an external dependency but rather implements [RFC 6902] directly
on AST terms.  This avoids a conversion to JSON as well as the dependency; and
as an added bonus we can make `json.patch` work for sets as well, covering the
full space of AST terms.

In my first implementation I used a mutable approach by first creating a deep
copy and then modifying it in-place.  However, this leads to issues with
the cached `hash` values in objects on the path.  I replaced this with an
implementation that creates shallow copies.  The performance tradeoff is that
smaller patches should be faster; but replacing parts will be slower.  Since we
don't know about too many people using this, I think both sides are acceptable.

I vendored the [json-patch-tests] into the test suite in a way that should
make updating them fairly easy.  I am also testing the cases disabled there
(since they do work for us!) but I disabled two test cases by adding a new
`opa_disabled` key.  These are:

 -  Us allowing `"foo"` as path (which should be `"/foo"` if you interpret
    the RFC strictly).
 -  A duplicate entry in the JSON patch object which isn't caught by OPA
    since it's consistent.

I added some additional tests for sets and things seem to work.  I'm going
to try out this new functionality in our larger codebase to see if any issues
come up, but I expect it to hold up.  Update: we've been using this builtin
and haven't seen any issues so far.

[RFC 6902]: https://tools.ietf.org/html/rfc6902#section-4.4
[json-patch-tests]: https://github.com/json-patch/json-patch-tests

Signed-off-by: Jasper Van der Jeugt <jasper@fugue.co>
tsandall pushed a commit that referenced this issue Nov 24, 2020
This is an implementation of the `json.patch` builtin.  Previous discussions
about this include #2839 and #2167.

It does not use an external dependency but rather implements [RFC 6902] directly
on AST terms.  This avoids a conversion to JSON as well as the dependency; and
as an added bonus we can make `json.patch` work for sets as well, covering the
full space of AST terms.

In my first implementation I used a mutable approach by first creating a deep
copy and then modifying it in-place.  However, this leads to issues with
the cached `hash` values in objects on the path.  I replaced this with an
implementation that creates shallow copies.  The performance tradeoff is that
smaller patches should be faster; but replacing parts will be slower.  Since we
don't know about too many people using this, I think both sides are acceptable.

I vendored the [json-patch-tests] into the test suite in a way that should
make updating them fairly easy.  I am also testing the cases disabled there
(since they do work for us!) but I disabled two test cases by adding a new
`opa_disabled` key.  These are:

 -  Us allowing `"foo"` as path (which should be `"/foo"` if you interpret
    the RFC strictly).
 -  A duplicate entry in the JSON patch object which isn't caught by OPA
    since it's consistent.

I added some additional tests for sets and things seem to work.  I'm going
to try out this new functionality in our larger codebase to see if any issues
come up, but I expect it to hold up.  Update: we've been using this builtin
and haven't seen any issues so far.

[RFC 6902]: https://tools.ietf.org/html/rfc6902#section-4.4
[json-patch-tests]: https://github.com/json-patch/json-patch-tests

Signed-off-by: Jasper Van der Jeugt <jasper@fugue.co>
@tsandall
Copy link
Member

tsandall commented Dec 3, 2020

Fixed by #2909

@tsandall tsandall closed this as completed Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants