Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server+plugins: Integrate server + NDBCache in decision logs. #5147

Conversation

philipaconrad
Copy link
Contributor

@philipaconrad philipaconrad commented Sep 16, 2022

This PR integrates the non-deterministic builtins cache (introduced in #4926) into the OPA server and SDK, ensuring that the cache's contents are added to the decision logs generated by the decision logging plugin when the nd_builtin_cache key is set to true.

Fixes: #1514

Task List

  • Remove the "always true" setting in the server for enabling the ND builtins cache.
    • Plumb in a config file setting for enabling the ND builtins cache.
  • Add tests for the server/plugin to test out the new field addition. (A few more tests may be possible to add here, but the current ones exercise the most important functionality in the plugin and server.)
  • Integrate the ND builtins cache into the SDK package. (See Caveats below.)
  • Add tests for the SDK integration.
  • Update decision logging docs.
    • Specifically call out how this feature affects decision log sizes when using builtins like http.send.

Known Caveats / Questions for Code Review

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far

server/server.go Outdated Show resolved Hide resolved
@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch from bae94a4 to 9b627c7 Compare September 19, 2022 22:17
@srenatus
Copy link
Contributor

I wonder if a CLI option is the right approach here. If it was part of the runtime config, it would

  1. be open to getting set by the discovery config
  2. still be available to the CLI args via --set some.key=true

@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch from 9b627c7 to 61ed310 Compare September 21, 2022 23:47
@philipaconrad
Copy link
Contributor Author

I wonder if a CLI option is the right approach here. If it was part of the runtime config, it would

  1. be open to getting set by the discovery config

  2. still be available to the CLI args via --set some.key=true

This was a great suggestion, and I've integrated that feature into the PR. 😄 We now use the decision_logs.nd_builtins_cache boolean key as the on/off flag for the feature.

Implementation Note: I found that the flag has to be set up in the runtime (by querying the plugin's config), and then it can be propagated down into the server package from there. Otherwise, some ugly circular import issues crop up.

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting case, this functionality... we'll push the ndbcache's data out via the DL plugin, but it needs to be passed through topdown. So, its config is associated with decision_logs, although its effect is in evaluation. I guess that's OK, though 🤔

plugins/logs/plugin.go Outdated Show resolved Hide resolved
@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch 3 times, most recently from 7c13b09 to edc028f Compare September 27, 2022 17:30
@netlify
Copy link

netlify bot commented Sep 27, 2022

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit b84c523
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/633f242f9ad4a6000858bbdd
😎 Deploy Preview https://deploy-preview-5147--openpolicyagent.netlify.app/docs/edge/configuration
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch 2 times, most recently from c398503 to f694c0c Compare September 30, 2022 21:04
@philipaconrad philipaconrad marked this pull request as ready for review October 1, 2022 01:02
@@ -793,6 +811,9 @@ func (p *Plugin) reconfigure(config interface{}) {
p.config = *newConfig
}

// NOTE(philipc): Because ND builtins caching can cause unbounded growth in
// decision log entry size, we do best-effort event encoding here, and when we
// run out of space, we drop the ND builtins cache, and try encoding again.
func (p *Plugin) encodeAndBufferEvent(event EventV1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is where the retry-encoding-without-NDBCache logic lives. If encoding the event succeeds after dropping the NDBCache, we print an error message + increment a counter metric, so that this behavior can be tracked.

server/server.go Outdated
Comment on lines 2925 to 2932
if ndbCache != nil {
var x interface{} = ast.MustJSON(ndbCache.AsValue())
info.NDBuiltinCache = &x
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snippet is responsible for transforming an NDBCache into a map[string]interface{} that the masking system will know how to work on later in the logging pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this panic if we're storing malformed json as part of a http.send response? (I'm just wondering because I don't know the format of the values stored in nbdcache by heart..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For values OPA is generating, we should be just fine: NDBCache is just a map[string]ast.Object under-the-hood, which is constructed during eval. The .AsValue method just turns the cache into an ast.Object with the same nested structure as the map[string]ast.Object. This lets us run it safely through the value-to-maps/interfaces conversion step.

Still, it couldn't hurt to catch any weird error cases that crop up, so I'll rework this logic while I'm fixing a few other things. Thanks for the catch/suggestion here, @srenatus!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, we don't need to overdo it. The output of http.send will be an object, so I guess we're just OK here -- anything wonky in its raw_body key for example will be escaped accordingly. Your call 😄

@@ -57,31 +66,28 @@ func (c NDBCache) Get(name string, k ast.Value) (ast.Value, bool) {

// Convenience functions for serializing the data structure.
func (c NDBCache) MarshalJSON() ([]byte, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the NDBCache was first developed, I'd thought I needed to use ast.Term types to wrapper over AST values. I was wrong, and it turns out you can serialize raw ast.Value types (like ast.Object), but you have to be careful about how you do it, especially on the deserialization side.

The changes in JSON ser/des for the NDBCache result in much cleaner JSON blobs, which save a lot of space in decision logs, and are easier to read too! 😄

Comment on lines -1189 to +1195
EvalNDBuiltinCache(r.ndBuiltinCache),
EvalSeed(r.seed),
}

if r.ndBuiltinCache != nil {
evalArgs = append(evalArgs, EvalNDBuiltinCache(r.ndBuiltinCache))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern crops up across a few places. In my first pass at adding ND builtin caching support, I was too aggressive with using the cache value from the parent context. This could result in a situation where the Runtime already had an NDBCache set for it to use, but the cache got forcibly reset to nil later on during Eval, because of the too-aggressive setting logic.

The original (incorrect) behavior is fixed in this PR by breaking out every NDBCache opt-in as a conditional block that only fires when the parent-context NDBCache value is non-nil.

@@ -153,6 +153,15 @@ func (r maskRule) Mask(event *EventV1) error {
}
maskObj = event.Result
maskObjPtr = &event.Result
case partNDBCache:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the careful interface/type-wrangling we do earlier in the server's Log function, we can just copy/paste the masking logic here for the NDBCache. It behaves exactly like the Input and Result masking targets, so the logic is safe to reuse with minimal changes. 😄

@@ -832,6 +832,7 @@ included in the actual bundle gzipped tarball.
| `decision_logs.mask_decision` | `string` | No (default: `system/log/mask`) | Set path of masking decision. |
| `decision_logs.plugin` | `string` | No | Use the named plugin for decision logging. If this field exists, the other configuration fields are not required. |
| `decision_logs.console` | `boolean` | No (default: `false`) | Log the decisions locally to the console. When enabled alongside a remote decision logging API the `service` must be configured, the default `service` selection will be disabled. |
| `decision_logs.nd_builtins_cache` | `boolean` | No (default: `false`) | Enable the non-deterministic builtins caching system during policy evaluation, and include the contents of the cache in decision logs. Note that decision logs that are larger than `upload_size_limit_bytes` will silently drop the `nd_builtins_cache` key from the log entry before uploading. |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one configuration flag sets off everything ND builtins cache-related. In the future, it could be worthwhile to allow enabling the cache without requiring the decision_logs plugin's involvement. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting the nd_builtins_cache option as a top level object in the config with options to set it in the decision logs etc. The decision logger's Log would simply add this if it was set in the input object. I think keeping this as its own config would make it easier for us to update with additional extensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's what we envision, let's promote it to top-level.

@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch 2 times, most recently from 3453e65 to 19466e6 Compare October 3, 2022 18:07
Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, just questions and nitpicks...

Another one: will the whole of the NDBCache work require existing, external DL plugins to adapt? I wasn't sure just from looking at this PR.

plugins/logs/mask.go Outdated Show resolved Hide resolved
server/server.go Outdated
Comment on lines 2925 to 2932
if ndbCache != nil {
var x interface{} = ast.MustJSON(ndbCache.AsValue())
info.NDBuiltinCache = &x
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this panic if we're storing malformed json as part of a http.send response? (I'm just wondering because I don't know the format of the values stored in nbdcache by heart..)

topdown/builtins/builtins.go Outdated Show resolved Hide resolved
@philipaconrad
Copy link
Contributor Author

will the whole of the NDBCache work require existing, external DL plugins to adapt? I wasn't sure just from looking at this PR.

@srenatus If I understand you correctly, your question is on whether non-OPA-provided DL plugins will require adaptation after this PR? There was an interface change, namely the new NDBuiltinsCache field in the EventV1 struct. That's a breaking change, right? They don't have to do anything with that struct field, but it's there now.

For downstream consumers of the JSON DL events, as long as they stick to using JSON keys that they understand, this set of changes can remain entirely invisible to them. The nd_builtins_cache key only appears in Decision Logs when:

  • The DL plugin && NDBCache are enabled
  • ND builtins were used during policy eval
  • The cache was small enough to fit in upload_size_limit_bytes when serialized

@srenatus
Copy link
Contributor

srenatus commented Oct 4, 2022

There was an interface change, namely the new NDBuiltinsCache field in the EventV1 struct. That's a breaking change, right? They don't have to do anything with that struct field, but it's there now.

That's ok. As you've said, the can ignore it. I was wondering about the interface of the DL plugins. If they don't need adaptation since all added things are in the struct, then we're good.

@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch 3 times, most recently from c591e12 to 51c7666 Compare October 4, 2022 16:45
@philipaconrad
Copy link
Contributor Author

I did a quick search/replace to refactor the pluralized nd_builtins_cache -> nd_builtin_cache, because all of the existing internal APIs leave it singular, and I think it'd be silly to have some parts of the PR have an extra 's', and other parts not.

@srenatus
Copy link
Contributor

srenatus commented Oct 4, 2022

💭 nd_builtin_cache I've been wondering if we should spell out the nd in user-facing places (i.e. the config). I'm pretty certain that few people will think "non-deterministic" when they see "nd"...

@philipaconrad
Copy link
Contributor Author

I've been wondering if we should spell out the nd in user-facing places (i.e. the config)

I have considered a few different possibilities:

  • nd_ :: Shortest, but also very opaque.
  • nondet_/non_det_ :: Short, but still a bit opaque. I avoided this initially because of the lack of symmetry with anything else, however, that might be just a "foolish consistency" thing on my part. 🤷
  • nondeterministic_ :: Longest, unambiguous, but hard to spell out quickly on the CLI.

I'm fine leaving the NDBCache stuff behind the more obscure nd_-prefixed config option, but I'm definitely open to suggestions! I've tried to consistently spell out "non-deterministic" in the docs where possible, but for the config option, I think there's a lot more wiggle room in deciding on what's a good name. 🤔

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Few comments.

plugins/logs/plugin.go Show resolved Hide resolved
plugins/logs/plugin_test.go Show resolved Hide resolved
@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch 3 times, most recently from 6a238df to c811320 Compare October 6, 2022 17:42
This commit integrates the ND builtins caching system into decision
logging, both in the server and sdk packages. Some reworking of the
NDBCache's serialization format were required to accommodate this.
The feature is disabled by default, and must be opted into by user
configuration.

The feature can be enabled via a top-level config key:

    nd_builtin_cache=true

The NDBCache is exposed to the masking system under the
`/nd_builtin_cache` path, which allows masking or dropping sensitive
values from decision logs selectively.

Note: If a decision log event exceeds the `upload_size_limit_bytes`
value for the OPA instance, OPA will reattempt uploading it, after
dropping the NDBCache from the event. This behavior will trigger a log
error, and will increment the `decision_logs_nd_builtin_cache_dropped`
metrics counter.

Fixes: open-policy-agent#1514

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
@philipaconrad philipaconrad force-pushed the issue-1514-integrate-ndbcache-for-decision-logging branch from c811320 to b84c523 Compare October 6, 2022 18:53
@philipaconrad
Copy link
Contributor Author

@ashutosh-narkar I moved the config key up one level, so that it's a top-level key: nd_builtin_cache: <bool>

@philipaconrad philipaconrad merged commit ac20ef2 into open-policy-agent:main Oct 6, 2022
@philipaconrad
Copy link
Contributor Author

Thanks for the review, @ashutosh-narkar!

byronic pushed a commit to byronic/opa that referenced this pull request Oct 17, 2022
…olicy-agent#5147)

This commit integrates the non-deterministic builtins caching system
into decision logging, both in the server and sdk packages. Some
reworking of the NDBCache's serialization format were required to
accommodate this. The feature is disabled by default, and must be
opted into by user configuration.

The feature can be enabled via a top-level config key:

    nd_builtin_cache=true

The NDBCache is exposed to the masking system under the
`/nd_builtin_cache` path, which allows masking or dropping sensitive
values from decision logs selectively.

Note: If a decision log event exceeds the `upload_size_limit_bytes`
value for the OPA instance, OPA will reattempt uploading it, after
dropping the NDBCache from the event. This behavior will trigger a log
error, and will increment the `decision_logs_nd_builtin_cache_dropped`
metrics counter.

Fixes: open-policy-agent#1514

Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

decision log should include values from non-deterministic built-in functions like time.now_ns
3 participants