
Allow namespace inclusion for replicating data #1044

Closed
ghsbhatia opened this issue Dec 23, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@ghsbhatia

What steps did you take and what happened:

Currently, objects matching the configured GVKs are synced into the OPA cache, and there is support for namespace exclusion. This may not be desirable for use cases where only objects from certain namespaces need to be cached for lookup. Depending on cluster size, and with an ever-growing number of namespaces for end-user applications, caching objects that are not required results in extra memory consumption.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"

What did you expect to happen:

Extend the CRD so that it is possible to cache objects only from desired namespaces.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  match:
    - includedNamespaces: ["gatekeeper-included-namespace"]
      processes: ["*"]
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"

Environment:

  • Gatekeeper version: 3.2.1
  • Kubernetes version: 1.18
@ghsbhatia ghsbhatia added the bug Something isn't working label Dec 23, 2020
@sozercan sozercan added enhancement New feature or request and removed bug Something isn't working labels Dec 23, 2020
@maxsmythe
Contributor

There are two sides of this, I think:

  1. The cache that is maintained inside of OPA
  2. The cache that is maintained as part of the K8s informer (as implemented by controller-runtime)

(1) is easier to address than (2) I think. I wonder if @shomron has done any thinking about namespace-specific watches?

@jalseth
Member

jalseth commented Feb 13, 2021

+1 for this issue, but I imagine it working a little differently.

I'd expect each SyncOnlyEntry object to support `namespaces` and `excludedNamespaces` matchers; that way there is more granular control, since there can only be one Config resource for Gatekeeper.

The use case for me is that I have CronJobs in the cluster that pull data from external APIs and create ConfigMaps in specific namespaces that Gatekeeper can then use in its policy decisions. However, I don't need Gatekeeper to sync every ConfigMap from the whole cluster, which is just a waste of RAM.
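A rough sketch of what that per-entry matcher could look like, based on the current Config CRD shape. The `namespaces` and `excludedNamespaces` fields under each syncOnly entry are hypothetical and not part of the existing schema:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      # Sync only the ConfigMaps written by the CronJobs, instead of
      # every ConfigMap in the cluster.
      - group: ""
        version: "v1"
        kind: "ConfigMap"
        # Hypothetical per-entry matchers (not in the current CRD):
        namespaces: ["external-data"]
      # Other kinds could still be synced cluster-wide by omitting
      # the matcher fields entirely.
      - group: ""
        version: "v1"
        kind: "Pod"
```

Leaving the matcher fields optional would preserve today's cluster-wide sync behavior for entries that don't specify them.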

@maxsmythe
Contributor

I'm wondering if we want to accept a list of the same match criteria as used in constraints (kinds, scope, namespaces, excludedNamespaces, labelSelector, namespaceSelector).

That would give maximum flexibility. There is a danger that this would no longer be performant, at least until we come up with a decent way of indexing these criteria to avoid testing each one individually.
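For illustration, a hedged sketch of a constraint-style `match` block attached to a syncOnly entry. The `match` field and its placement here are hypothetical; the sub-fields mirror the match criteria constraints already accept:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"
        # Hypothetical constraint-style match block (not in the
        # current CRD), reusing the fields constraints support:
        match:
          scope: Namespaced
          namespaces: ["team-a", "team-b"]
          excludedNamespaces: ["kube-system"]
          labelSelector:
            matchLabels:
              sync: "enabled"
          namespaceSelector:
            matchExpressions:
              - key: environment
                operator: In
                values: ["prod"]
```

Reusing the constraint match schema would keep the two matching surfaces consistent, at the cost of the indexing/performance work discussed above.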

@jalseth
Member

jalseth commented Feb 18, 2021

@maxsmythe That would be great! Ideally we could just pass the matchers on to a watcher filter, but that hasn't been implemented in K8s yet. Perhaps we should devote some effort to that?

In the meantime, we would need to implement this in Gatekeeper. Anecdotally, I think most clusters could handle the increased CPU load without any issue, but it could cause problems in large clusters. That said, could we leave the behavior as-is when no matchers are specified, so users can choose between more RAM or more CPU when using the sync feature?

@maxsmythe
Contributor

Yeah, there are definitely some performance optimizations we can do under the hood. I think it may take some design to figure out the least-brittle way of indexing things in terms of avoiding a large jump in latency due to one constraint that breaks optimization, but the problem seems at least tractable.

FWIW we are already looking at adding namespace selectors per #1078, so I think we've already committed to solving the performance implications.

@shomron may find the overlap between what we're doing here and watch filters interesting.

@ritazh
Member

ritazh commented Jul 20, 2022

Closing this in favor of #1078

@ritazh ritazh closed this as completed Jul 20, 2022