
Allow namespace inclusion for replicating data #1044

Closed
ghsbhatia opened this issue Dec 23, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@ghsbhatia

What steps did you take and what happened:

Currently, objects matching the configured GVKs are synced into the OPA cache, and there is support for namespace exclusion. This may not be desirable for use cases where only objects from certain namespaces need to be cached for lookup. Depending on cluster size, and with an ever-growing number of namespaces for end-user applications, caching objects that are not required results in extra memory consumption.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"

What did you expect to happen:

Extend the CRD so that it is possible to cache objects only from desired namespaces.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  match:
    - includedNamespaces: ["gatekeeper-included-namespace"]
      processes: ["*"]
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"

Environment:

  • Gatekeeper version: 3.2.1
  • Kubernetes version: 1.18
@ghsbhatia ghsbhatia added the bug Something isn't working label Dec 23, 2020
@sozercan sozercan added enhancement New feature or request and removed bug Something isn't working labels Dec 23, 2020
@maxsmythe
Contributor

There are two sides of this, I think:

  1. The cache that is maintained inside of OPA
  2. The cache that is maintained as part of the K8s informer (as implemented by controller-runtime)

(1) is easier to address than (2) I think. I wonder if @shomron has done any thinking about namespace-specific watches?

@jalseth
Member

jalseth commented Feb 13, 2021

+1 for this issue, but I imagine it working a little differently.

I'd expect each SyncOnlyEntry object to support `namespaces` and `excludedNamespaces` matchers; that way there is more granular control, since there can only be one Config resource for Gatekeeper.

The use case for me is that I have CronJobs in the cluster that pull data from external APIs and create ConfigMaps in specific namespaces that Gatekeeper can then use in its policy decisions. However, I don't need Gatekeeper to sync every ConfigMap from the whole cluster, which is just a waste of RAM.
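A rough sketch of what that per-entry matcher could look like, based on the current Config CRD shape. The `namespaces` and `excludedNamespaces` fields under each syncOnly entry are hypothetical and not part of the existing schema:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      # Sync only the ConfigMaps written by the CronJobs, instead of
      # every ConfigMap in the cluster.
      - group: ""
        version: "v1"
        kind: "ConfigMap"
        # Hypothetical per-entry matchers (not in the current CRD):
        namespaces: ["external-data"]
      # Other kinds could still be synced cluster-wide by omitting
      # the matcher fields entirely.
      - group: ""
        version: "v1"
        kind: "Pod"
```

Leaving the matcher fields optional would preserve today's cluster-wide sync behavior for entries that don't specify them.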

@maxsmythe
Contributor

I'm wondering if we want to accept a list of the same match criteria as used in constraints (kinds, scope, namespaces, excludedNamespaces, labelSelector, namespaceSelector).

That would give maximum flexibility. There is a danger that this would no longer be performant, at least until we come up with a decent way of indexing these criteria to avoid testing each one individually.
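For illustration, a hedged sketch of a constraint-style `match` block attached to a syncOnly entry. The `match` field and its placement here are hypothetical; the sub-fields mirror the match criteria constraints already accept:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Pod"
        # Hypothetical constraint-style match block (not in the
        # current CRD), reusing the fields constraints support:
        match:
          scope: Namespaced
          namespaces: ["team-a", "team-b"]
          excludedNamespaces: ["kube-system"]
          labelSelector:
            matchLabels:
              sync: "enabled"
          namespaceSelector:
            matchExpressions:
              - key: environment
                operator: In
                values: ["prod"]
```

Reusing the constraint match schema would keep the two matching surfaces consistent, at the cost of the indexing/performance work discussed above.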

@jalseth
Member

jalseth commented Feb 18, 2021

@maxsmythe That would be great! Ideally we could just pass the matchers on to a watcher filter, but that hasn't been implemented in K8s yet. Perhaps we should devote some effort to that?

In the meantime, we would need to implement this in Gatekeeper. Anecdotally, I think most clusters could handle the increased CPU load without any issue, but it could cause problems in large clusters. That said, could we leave the behavior as-is when no matchers are specified, so users can choose between more RAM or more CPU when using the sync feature?

@maxsmythe
Contributor

Yeah, there are definitely some performance optimizations we can do under the hood. I think it may take some design to figure out the least-brittle way of indexing things in terms of avoiding a large jump in latency due to one constraint that breaks optimization, but the problem seems at least tractable.

FWIW we are already looking at adding namespace selectors per #1078, so I think we've already committed to solving the performance implications.

@shomron may find the overlap between what we're doing here and watch filters interesting.

@ritazh
Member

ritazh commented Jul 20, 2022

Closing this in favor of #1078

@ritazh ritazh closed this as completed Jul 20, 2022