New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server+plugins: Integrate server + NDBCache in decision logs. #5147
server+plugins: Integrate server + NDBCache in decision logs. #5147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far
bae94a4
to
9b627c7
Compare
I wonder if a CLI option is the right approach here. If it was part of the runtime config, it would
|
9b627c7
to
61ed310
Compare
This was a great suggestion, and I've integrated that feature into the PR. 😄 We now use the Implementation Note: I found that the flag has to be set up in the runtime (by querying the plugin's config), and then it can be propagated down into the server package from there. Otherwise, some ugly circular import issues crop up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting case, this functionality... we'll push the ndbcache's data out via the DL plugin, but it needs to be passed through topdown. So, its config is associated with decision_logs, although its effect is in evaluation. I guess that's OK, though 🤔
7c13b09
to
edc028f
Compare
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
c398503
to
f694c0c
Compare
@@ -793,6 +811,9 @@ func (p *Plugin) reconfigure(config interface{}) { | |||
p.config = *newConfig | |||
} | |||
|
|||
// NOTE(philipc): Because ND builtins caching can cause unbounded growth in | |||
// decision log entry size, we do best-effort event encoding here, and when we | |||
// run out of space, we drop the ND builtins cache, and try encoding again. | |||
func (p *Plugin) encodeAndBufferEvent(event EventV1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is where the retry-encoding-without-NDBCache logic lives. If encoding the event succeeds after dropping the NDBCache, we print an error message + increment a counter metric, so that this behavior can be tracked.
server/server.go
Outdated
if ndbCache != nil { | ||
var x interface{} = ast.MustJSON(ndbCache.AsValue()) | ||
info.NDBuiltinCache = &x | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This snippet is responsible for transforming an NDBCache
into a map[string]interface{}
that the masking system will know how to work on later in the logging pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this panic if we're storing malformed json as part of a http.send response? (I'm just wondering because I don't know the format of the values stored in nbdcache by heart..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For values OPA is generating, we should be just fine: NDBCache
is just a map[string]ast.Object
under-the-hood, which is constructed during eval. The .AsValue
method just turns the cache into an ast.Object
with the same nested structure as the map[string]ast.Object
. This lets us run it safely through the value-to-maps/interfaces conversion step.
Still, it couldn't hurt to catch any weird error cases that crop up, so I'll rework this logic while I'm fixing a few other things. Thanks for the catch/suggestion here, @srenatus!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, we don't need to overdo it. The output of http.send will be an object, so I guess we're just OK here -- anything wonky in its raw_body key for example will be escaped accordingly. Your call 😄
@@ -57,31 +66,28 @@ func (c NDBCache) Get(name string, k ast.Value) (ast.Value, bool) { | |||
|
|||
// Convenience functions for serializing the data structure. | |||
func (c NDBCache) MarshalJSON() ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the NDBCache was first developed, I'd thought I needed to use ast.Term
types to wrapper over AST values. I was wrong, and it turns out you can serialize raw ast.Value
types (like ast.Object
), but you have to be careful about how you do it, especially on the deserialization side.
The changes in JSON ser/des for the NDBCache result in much cleaner JSON blobs, which save a lot of space in decision logs, and are easier to read too! 😄
EvalNDBuiltinCache(r.ndBuiltinCache), | ||
EvalSeed(r.seed), | ||
} | ||
|
||
if r.ndBuiltinCache != nil { | ||
evalArgs = append(evalArgs, EvalNDBuiltinCache(r.ndBuiltinCache)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern crops up across a few places. In my first pass at adding ND builtin caching support, I was too aggressive with using the cache value from the parent context. This could result in a situation where the Runtime
already had an NDBCache set for it to use, but the cache got forcibly reset to nil
later on during Eval
, because of the too-aggressive setting logic.
The original (incorrect) behavior is fixed in this PR by breaking out every NDBCache opt-in as a conditional block that only fires when the parent-context NDBCache value is non-nil.
@@ -153,6 +153,15 @@ func (r maskRule) Mask(event *EventV1) error { | |||
} | |||
maskObj = event.Result | |||
maskObjPtr = &event.Result | |||
case partNDBCache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the careful interface/type-wrangling we do earlier in the server's Log
function, we can just copy/paste the masking logic here for the NDBCache. It behaves exactly like the Input
and Result
masking targets, so the logic is safe to reuse with minimal changes. 😄
docs/content/configuration.md
Outdated
@@ -832,6 +832,7 @@ included in the actual bundle gzipped tarball. | |||
| `decision_logs.mask_decision` | `string` | No (default: `system/log/mask`) | Set path of masking decision. | | |||
| `decision_logs.plugin` | `string` | No | Use the named plugin for decision logging. If this field exists, the other configuration fields are not required. | | |||
| `decision_logs.console` | `boolean` | No (default: `false`) | Log the decisions locally to the console. When enabled alongside a remote decision logging API the `service` must be configured, the default `service` selection will be disabled. | | |||
| `decision_logs.nd_builtins_cache` | `boolean` | No (default: `false`) | Enable the non-deterministic builtins caching system during policy evaluation, and include the contents of the cache in decision logs. Note that decision logs that are larger than `upload_size_limit_bytes` will silently drop the `nd_builtins_cache` key from the log entry before uploading. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one configuration flag sets off everything ND builtins cache-related. In the future, it could be worthwhile to allow enabling the cache without requiring the decision_logs
plugin's involvement. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting the nd_builtins_cache
option as a top level object in the config with options to set it in the decision logs etc. The decision logger's Log
would simply add this if it was set in the input object. I think keeping this as its own config would make it easier for us to update with additional extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's what we envision, let's promote it to top-level.
3453e65
to
19466e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, just questions and nitpicks...
Another one: will the whole of the NDBCache work require existing, external DL plugins to adapt? I wasn't sure just from looking at this PR.
server/server.go
Outdated
if ndbCache != nil { | ||
var x interface{} = ast.MustJSON(ndbCache.AsValue()) | ||
info.NDBuiltinCache = &x | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this panic if we're storing malformed json as part of a http.send response? (I'm just wondering because I don't know the format of the values stored in nbdcache by heart..)
@srenatus If I understand you correctly, your question is on whether non-OPA-provided DL plugins will require adaptation after this PR? There was an interface change, namely the new For downstream consumers of the JSON DL events, as long as they stick to using JSON keys that they understand, this set of changes can remain entirely invisible to them. The
|
That's ok. As you've said, the can ignore it. I was wondering about the interface of the DL plugins. If they don't need adaptation since all added things are in the struct, then we're good. |
c591e12
to
51c7666
Compare
I did a quick search/replace to refactor the pluralized |
💭 |
I have considered a few different possibilities:
I'm fine leaving the NDBCache stuff behind the more obscure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Few comments.
6a238df
to
c811320
Compare
This commit integrates the ND builtins caching system into decision logging, both in the server and sdk packages. Some reworking of the NDBCache's serialization format were required to accommodate this. The feature is disabled by default, and must be opted into by user configuration. The feature can be enabled via a top-level config key: nd_builtin_cache=true The NDBCache is exposed to the masking system under the `/nd_builtin_cache` path, which allows masking or dropping sensitive values from decision logs selectively. Note: If a decision log event exceeds the `upload_size_limit_bytes` value for the OPA instance, OPA will reattempt uploading it, after dropping the NDBCache from the event. This behavior will trigger a log error, and will increment the `decision_logs_nd_builtin_cache_dropped` metrics counter. Fixes: open-policy-agent#1514 Signed-off-by: Philip Conrad <philipaconrad@gmail.com>
c811320
to
b84c523
Compare
@ashutosh-narkar I moved the config key up one level, so that it's a top-level key: |
Thanks for the review, @ashutosh-narkar! |
…olicy-agent#5147) This commit integrates the non-deterministic builtins caching system into decision logging, both in the server and sdk packages. Some reworking of the NDBCache's serialization format were required to accommodate this. The feature is disabled by default, and must be opted into by user configuration. The feature can be enabled via a top-level config key: nd_builtin_cache=true The NDBCache is exposed to the masking system under the `/nd_builtin_cache` path, which allows masking or dropping sensitive values from decision logs selectively. Note: If a decision log event exceeds the `upload_size_limit_bytes` value for the OPA instance, OPA will reattempt uploading it, after dropping the NDBCache from the event. This behavior will trigger a log error, and will increment the `decision_logs_nd_builtin_cache_dropped` metrics counter. Fixes: open-policy-agent#1514 Signed-off-by: Philip Conrad <philipaconrad@gmail.com> Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
This PR integrates the non-deterministic builtins cache (introduced in #4926) into the OPA server and SDK, ensuring that the cache's contents are added to the decision logs generated by the decision logging plugin when the
nd_builtin_cache
key is set totrue
.Fixes: #1514
Task List
http.send
.Known Caveats / Questions for Code Review
Should we be silently dropping thend_builtins_cache
key from DL entries when they are too large? Could we use theError
field to perhaps signal that all is not well? Is there maybe an out-of-band logging system that could be used?(EDIT: @ashutosh-narkar suggested logging + a metrics counter, and I have implemented this in my latest changeset.)
sdk.Opa.Partial
has no support for the ND builtins cache, because partial evaluation will not run the ND builtins, as of topdown+builtins: Block all ND builtins from partial eval. #5172, and topdown+builtins: Restore legacyIgnoreDuringPartialEval
list. #5175 merging. I am fairly sure this is correct, but would appreciate a second opinion from a reviewer. 😅