Gatekeeper needs a default+override functionality #3082

Open
skaven81 opened this issue Oct 18, 2023 · 11 comments

@skaven81

skaven81 commented Oct 18, 2023

Describe the solution you'd like
I need the ability to define a default behavior for the cluster, and then selectively implement overrides to that default behavior. With the toolkit available in Constraints today (notably, the match field), combined with the fact that ALL matching Constraints are evaluated for a resource to make a policy decision, this leads to operationally infeasible complexity when trying to implement a policy that has a default behavior with targeted overrides.

Let's work through an example. I have a large multi-tenant cluster, using Rancher's Project model. This means I have hundreds of teams, each with the ability to create namespaces in their Project. Rancher Projects are just special labels on namespaces. All the namespaces with the same field.cattle.io/projectId label are in the same Project and subject to the same set of security and behavioral policies. Rancher has its own validating webhook that prevents unprivileged tenants from modifying that label, and only allows them to create namespaces in their own Project.

Now I, the Kubernetes administrator, want to impose some security or behavioral policy on the cluster. For this example I'll use memory limits, but the principle applies to virtually any Constraint I'd want to create in the cluster.

I start by creating a Constraint that declares that all Pods must have memory limits defined, which may not exceed 16GB.

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  parameters:
    limit: 16Gi

Immediately I have a problem...this enforcement has now applied to all of my administrative workloads as well, which is undesirable. So I need to exempt all those namespaces from the Constraint. What I need to do is exclude all namespaces with the "System" projectId, e.g. exclude namespaces with field.cattle.io/projectId=system or whatever. I can update the Constraint like so:

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
      - { key: "field.cattle.io/projectId, operation: NotIn, values: [ system ]}
  parameters:
    limit: 16Gi

Ok great...now my administrative workloads are exempted. But I still have a problem -- my 16GB limit applies to ALL other namespaces in the cluster. Inevitably, some team is going to come to me and ask for an exception. They need to be able to run Pods with a 32Gi limit, and after evaluation I agree that it's OK. Now I need two Constraints whose match fields are mutually exclusive and collectively exhaustive (MECE), to ensure that every namespace in the cluster matches one (and only one) of these memory limit constraints.

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
      - { key: "field.cattle.io/projectId", operation: NotIn, values: [ system, 32g-team ]}
  parameters:
    limit: 16Gi
---
kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement-32gteam
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 32g-team
  parameters:
    limit: 32Gi

This was a little awkward, because I had to make mention of the 32g-team label in the "primary" (default) Constraint as well as the override Constraint. And I don't like repeating myself. But a few weeks later, another team needs a limit of 64Gi... now we have three Constraints...

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
      - { key: "field.cattle.io/projectId", operation: NotIn, values: [ system, 32g-team, 64g-team ]}
  parameters:
    limit: 16Gi
---
kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement-32gteam
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 32g-team
  parameters:
    limit: 32Gi
---
kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement-64gteam
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 64g-team
  parameters:
    limit: 64Gi

Now imagine this same system but with hundreds of tenants in the same cluster, and dozens of Constraints that each, as a group, have to be MECE-managed. A single slip-up that breaks MECE in the match sections of a group of Constraints will cause a production outage. Not good.

What I would much rather have is for each Constraint for a given security/policy enforcement to be authored with a built-in ability to match the resource being evaluated against a list of matching rules, where the first rule that "wins" gets its set of parameters (or overrides to a default set of parameters) applied. So in the example above with the 16, 32, and 64 gig teams, instead of needing three separate Constraints, I can author a single Constraint that guarantees MECE (because there's only one).

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
      - { key: "field.cattle.io/projectId", operation: NotIn, values: [ system, 32g-team, 64g-team ]}
  parameterSelection:
  - match:
      namespaceSelector:
        matchLabels:
          field.cattle.io/projectId: 32g-team
    parameters:
      limit: 32Gi
  - match:
      namespaceSelector:
        matchLabels:
          field.cattle.io/projectId: 64g-team
    parameters:
      limit: 64Gi
  - parameters:
      limit: 16Gi

My proposal is to add a new field to Constraints called parameterSelection. This would be a list of objects. Each object may optionally contain a match field (if match is not specified then it matches all resources), and a required parameters field.

The Constraint's top-level match still behaves as it does today, providing initial guidance to Gatekeeper on whether the Constraint should be evaluated for the resource at all. Assuming the resource matches the top-level spec.match field, Gatekeeper then evaluates each of the parameterSelection items in order. The first one whose match object matches the object under review has its parameters object merged over the spec.parameters object (or sets it, if spec.parameters is not present). Upon the first match, further evaluation of parameterSelection is halted. If the entire parameterSelection list is exhausted with no match, then the Constraint is evaluated with spec.parameters unmodified (or another top-level field could be added to Constraint to control that behavior -- I can imagine situations where aborting evaluation of the Constraint would be preferred if nothing in parameterSelection matches).
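
To make the evaluation concrete, here is the effective set of parameters each class of namespace would be evaluated with under the example Constraint above (a hypothetical walk-through, assuming the semantics just described):

# Pod in a namespace labeled field.cattle.io/projectId: 32g-team
#   -> first parameterSelection entry matches, evaluation of the list stops
parameters:
  limit: 32Gi
---
# Pod in a namespace labeled field.cattle.io/projectId: 64g-team
#   -> second entry matches
parameters:
  limit: 64Gi
---
# Pod in any other non-system namespace
#   -> no match-bearing entry applies; the final entry (which has no match) supplies the default
parameters:
  limit: 16Gi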

Additional attributes can be added to the top-level spec to further refine the behavior of parameterSelection; a sketch combining them follows the list. For example:

  • spec.onParameterSelectionNoMatch: {proceed,allow,warn,deny} - proceed continues, using spec.parameters unmodified. allow and warn skip evaluating the Constraint and assume allow (with warn issuing a warning in the log and an Event). deny skips evaluating the Constraint and assumes a denial.
  • spec.parameterSelectionBehavior: {MatchFirst,MatchAny} - MatchFirst will stop when the first match is encountered. MatchAny will continue and merge each matching item's parameters over spec.parameters. This would allow for far more expressive default-and-override behaviors.
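
A minimal sketch of how those two knobs might sit on a Constraint (field names as proposed in the list above; values are illustrative only):

kind: K8sMemoryLimits
metadata:
  name: memory-limit-enforcement
spec:
  onParameterSelectionNoMatch: proceed    # fall back to spec.parameters unmodified when nothing matches
  parameterSelectionBehavior: MatchFirst  # stop at the first matching parameterSelection entry
  # match, parameterSelection, and parameters as in the example above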

Environment:

  • Gatekeeper version: 3.12
  • Kubernetes version: Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.8", GitCommit:"395f0a2fdc940aeb9ab88849e8fa4321decbf6e1", GitTreeState:"clean", BuildDate:"2023-08-24T00:43:07Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
@skaven81 added the enhancement (New feature or request) label Oct 18, 2023
@skaven81
Author

I think it's worthwhile noting that Kyverno is actively working on this, and has a default-and-overrides mechanism already in alpha: https://kyverno.io/docs/writing-policies/exceptions/. But it's rather bolted-on and I think my proposal would be a lot more elegant in Gatekeeper.

@skaven81 changed the title from "Gatekeeper needs a default+override functionality, and/or grouping of Constraints with evaluation ordering" to "Gatekeeper needs a default+override functionality" Oct 18, 2023
@maxsmythe
Contributor

This is an interesting idea. I'm wondering how generalizable it is.

It seems like it depends on a set of exceptions fitting into a hierarchy of permissiveness (e.g. sets of parameters that are increasingly more strict, such that the first match provides an exception that "short circuits" stricter scrutiny).

This definitely makes sense for numeric-type constraints (CPU, RAM limits and similar).

It could also make sense for something like pod security standards, where privileged > baseline > restricted.

How many other styles of policy fit into that mold? One example of a probably poor fit would be allowed container registries. It's possible to imagine a system of registries that are tiered by sensitivity, but I don't know if that's a common use case.

The simplicity also depends on the sets of exemption labels being disjoint. For example, let's say a user wants to use "baseline" but needs to have "host ports" disabled in their "networking" namespace. They also have to use "privileged" for their "system" namespace. Once the labels stop being disjoint, the hierarchical nature of the exceptions breaks down slightly (it's still expressible, but the complexity starts to creep back in).

I think knowing the range of use cases and how these exceptions grow in practice would be important for grounding/evaluating a design.

If the overall usefulness is limited to a few constraints (specifically the numeric ones), one possibility may be to do the override in Rego, taking the override directly from the label. Here is a (partially implemented, pseudocode) example of what that might look like:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8smaxram
spec:
  crd:
    spec:
      names:
        kind: K8sMaxRAM
      validation:
        openAPIV3Schema:
          type: object
          properties:
            defaultMaxRAM:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8smaxram

        # Look up the Pod's Namespace in Gatekeeper's synced inventory (referential
        # data) and read the per-namespace override from the max-ram-exception label.
        get_max_ram_from_namespace(object) = max {
          ns := data.inventory.cluster.v1.Namespace[object.metadata.namespace]
          max := ns.metadata.labels["max-ram-exception"]
        }

        # Fall back to the constraint's defaultMaxRAM parameter when no label is set.
        get_max_ram(object) = max {
          not get_max_ram_from_namespace(object)
          max := input.parameters.defaultMaxRAM
        }

        get_max_ram(object) = max {
          max := get_max_ram_from_namespace(object)
        }

        # Pseudocode: a real template would parse resource quantities (e.g. "16Gi")
        # and inspect container resource limits rather than a spec.ram field.
        violation[{"msg": msg}] {
          max_ram := get_max_ram(input.review.object)
          ram_used := input.review.object.spec.ram
          ram_used > max_ram
          msg := sprintf("ram used must be less than %v", [max_ram])
        }

The advantage of using the label value directly would be that admins can easily set whatever maximum they want on a per-namespace basis without needing to modify any constraints or label selectors. This should work for all constraints if you want a per-namespace override and is possible today.
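
To illustrate, here is a hypothetical pairing of a Namespace and a constraint built from the pseudocode template above (this also assumes Namespaces are synced into Gatekeeper's inventory so the referential lookup works):

apiVersion: v1
kind: Namespace
metadata:
  name: big-memory-team
  labels:
    max-ram-exception: "64Gi"   # per-namespace override read by the template's Rego
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMaxRAM
metadata:
  name: max-ram
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  parameters:
    defaultMaxRAM: "16Gi"       # used whenever the namespace has no max-ram-exception label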

@skaven81
Author

It seems like it depends on a set of exceptions fitting in to a hierarchy of permissiveness (e.g. sets of parameters that are increasingly more strict such that the first match will provide an exception that "short circuits" stricter scrutiny).

No, I wouldn't say that. Creating such a hierarchy is of course possible with the construct I've proposed, but the policy exceptions I'm describing have no need to be "tiered" or "ordered" in any way.

The key feature I'm looking for is a default. That just can't be done today with Gatekeeper without a lot of customization in the ConstraintTemplates themselves. What I would like to see is the ability to set a default policy enforcement behavior, and then punch out specific exceptions to the rule, without having to manually manage a collection of mutually-exclusive-collectively-exhaustive matching selectors.

In programming languages, this is just a case statement. That's all -- a case statement doesn't care if the decisions it is making are "tiered" or "ordered" or have a "priority", beyond the fact that it's "first match wins".

How many other styles of policy fit into that mold? One example of a probably poor fit would be allowed container registries.

Why would allowed container registries be a poor fit? Seems like a perfectly reasonable fit to me, if you think of it in terms of a multi-tenant cluster where you want a default behavior for most tenants with overrides for a few. Using my own proposed syntax, the "allowed container registries" policy would look something like this:

spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  parameterSelection:
  - match:
      namespaceSelector:
        matchLabels:
          field.cattle.io/projectId: special-team1
    parameters:
      allowed-registries:
      - gcr.io
      - hub.docker.io
  - match:
      namespaceSelector:
        matchLabels:
          field.cattle.io/projectId: special-team2
    parameters:
      allowed-registries:
      - hub.docker.io
  parameters:
    allowed-registries:
    - internal.artifactory.example.com

Observe that in this example the default case is just spec.parameters. My proposal suggests that upon a parameterSelection match, the spec.parameterSelection[<match>].parameters object is merged over the spec.parameters object. So this results in the desired behavior of:

Special Team 1

parameters:
  allowed-registries:
  - internal.artifactory.example.com
  - gcr.io
  - hub.docker.io

Special Team 2

parameters:
  allowed-registries:
  - internal.artifactory.example.com
  - hub.docker.io

Everybody else

parameters:
  allowed-registries:
  - internal.artifactory.example.com

That "everybody else" part is what is unique and desirable about this way of configuring a Constraint. In the absence of a default+overrides mechanism, I would have to create three separate Constraints, and very carefully ensure that the match field of all three Constraints are mutually exclusive and collectively exhaustive, to get the desired behavior.

Once the labels stop being disjoint, the hierarchical nature of the exceptions breaks down slightly (it's still expressible, but the complexity starts to creep back in).

That's only if you think of this strictly in terms of a hierarchy. It's not a hierarchy. It's just a default behavior, with the ability to express overrides. Who cares if the labels and selectors are disjoint, when it's still "first one wins" (like a case statement) or (as I suggest) optionally "all that match win", which still gives desirable behavior for certain cases? I get that a given team might need exceptions across multiple Constraints to achieve a customized security profile that differs from the default. But that's precisely why I think this syntax/model will work. And I know it will work, because I'm already implementing it at my company, just in Rego instead of as a baked-in feature of Constraints. I've written all my ConstraintTemplates with additional Rego that allows for a default-and-overrides syntax in spec.parameters. The proposed syntax in this RFE is directly inspired by my experience doing this exact same thing in production already.

If the overall usefulness is limited to a few constraints (specifically the numeric ones),

It would not be limited to just "orderable" or "sortable" things like numeric values. It will work for any set of parameters to a Constraint.

The challenge with your proposal of using labels on the namespace itself to create these sorts of exceptions is that it's limited to just a namespace. That only works for namespaced multi-tenancy. With Project-based multi-tenancy, where groups of namespaces are assigned automatically to tenants upon creation (this is how Rancher does things, and it's glorious), we would need some kind of controller that synchronizes these exception labels across all the namespaces in a given Project. But Rancher already does that for us -- it provides field.cattle.io/projectId labels on every namespace (and has validating admission webhooks that prevent unprivileged tenants from modifying that label). So I would like to be able to declare exceptions to the default behavior in my Constraints by leveraging that label, not by creating a whole new ecosystem of labels on namespaces that have to both be protected from abuse and also synchronized among the namespaces in a Project.

I think knowing the range of use cases and how these exceptions grow in practice would be important for grounding/evaluating a design.

I can provide dozens of use cases if desired. As I mentioned earlier, we are already doing this in production across nearly 100 multi-tenant Kubernetes clusters with nearly 1000 different development teams. And it works really well...except for the fact that I have to add quite a bit of custom Rego to every Constraint Template I write, to patch in the default-and-overrides behavior I need.

@skaven81
Author

I updated the top level description based on the discussion in #3081.

I should also clarify that my objective here is not to create a single, monolithic Constraint that handles everything in the cluster. I would still have dozens of Constraints. It's just that each Constraint would focus on a single policy/security "topic" with the parameters selected at runtime based on a ruleset (e.g. a case statement) that can match some aspect of the object under evaluation and its containing namespace.

I should also clarify that I'm not looking to uproot the current model -- I was quite careful in my proposal to ensure that existing Constraints with a typical match + parameters construction, would continue to work just fine for any Gatekeeper users who don't want to use the parameterSelection model. It would just be an additional bit of flexibility granted to Gatekeeper users that would rather avoid having to maintain dozens of Constraints in a "group" that all do the same thing, just with MECE match fields and slightly different parameters.

For cluster administrators that have no need for a Constraint that has a "default" application across the cluster, or for those that don't do multi-tenancy, or for those that don't mind using the existing methods provided by match to create a MECE group, more power to them. I'm asking for this feature because it would make my life a lot easier and simplify the application, management, and testing of Gatekeeper policy in all my clusters.

@skaven81
Author

I'll reinforce again that the question of whether the basic concept of "default+overrides" is applicable to Gatekeeper has already been answered by the Kyverno team, as they're already doing it. They've created CRDs for it and everything. I don't personally like their solution (and I find Kyverno rather obtuse), but clearly there is demand out there for the ability to establish some kind of default behavior and then punch out overrides to that default as needed.

@maxsmythe
Contributor

maxsmythe commented Oct 24, 2023

No, I wouldn't say that. Creating such a hierarchy is of course possible with the construct I've proposed, but the policy exceptions I'm describing have no need to be "tiered" or "ordered" in any way.

That's fair, I got distracted by the cascading nature of the RAM example.

Why would allowed container registries be a poor fit? Seems like a perfectly reasonable fit to me, if you think of it in terms of a multi-tenant cluster where you want a default behavior for most tenants with overrides for a few.

Agreed. I assumed it was a poor fit due to the "ordered" lens I put on the initial idea.

That only works for namespaced multi-tenancy. With Project-based multi-tenancy, where groups of namespaces are assigned automatically to tenants upon creation (this is how Rancher does things and it's glorious), we would need some kind of controller that synchronizes these exception labels across all the namespaces in a given Project.

+1

An alternative API design with the same effect could be to create a "MultiConstraint" object (better name TBD). Something like below:

kind: MultiConstraint
metadata:
  name: memory-limit-enforcement
spec:
  constraintKind: K8sMemoryLimits
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  constraints:
  - name: max-64g
    spec:
      match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 64g-team
      parameters:
        limit: 64Gi
  - name: max-32g
    spec:
      match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        limit: 32Gi
  - name: max-16g
    spec:
      parameters:
        limit: 16Gi

Or perhaps, to preserve schema validation, create a per-template GVK, something like K8sMemoryLimitsMultiConstraint (though max string length for the kind could become a concern, also CRD proliferation).
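
For concreteness, a hedged sketch of such a per-template kind, essentially the MultiConstraint above with the constraint kind baked into the GVK (the kind name is only illustrative):

kind: K8sMemoryLimitsMultiConstraint
metadata:
  name: memory-limit-enforcement
spec:
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  constraints:
  - name: max-32g
    spec:
      match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        limit: 32Gi
  - name: max-16g          # no match: acts as the default / fall-through entry
    spec:
      parameters:
        limit: 16Gi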

Would need to think about the correct balance between expressiveness and DRY-ness, but doing it this way could allow for things like overriding enforcement actions, and would decouple the proposed default/exception model from how constraints may evolve in the future (for example, multi-enforcement-point actions).

I'm guessing there are plenty of other roughly-logically-equivalent ways to express this MECE-achieving intent worth exploring.

There are a couple of ancillary concerns here:

K8s Validating Admission Policy

K8s Validating Admission Policy is intended to have the API server itself enforce admission policies. One plan Gatekeeper has is to project templates (at least those which can be projected) onto VAP primitives.

I'm not sure VAP currently has enough expressiveness to make projection of this viable. I don't think this is a deal breaker -- there will be plenty of constraint templates that will not be compatible with VAP (for instance, anything that uses referential data). However, it's worth considering if there's anything to be done about that, or how to make the UX/asymmetry grok-able to users.

Rego performance

Right now we pre-cache constraints in Rego's data store. IIRC this is to avoid the serialization/deserialization penalty that injecting constraints at eval time would create. If we need to dynamically choose which constraint to use, that approach becomes less viable.

@davis-haba the work on iterable cached data may be useful here. I could imagine sideloading a "constraints" data storage object per-request, which could be dynamic AND bypass serialization. This would have the side benefit of allowing us to remove all of these methods from the Driver interface:

https://github.com/open-policy-agent/frameworks/blob/3eb381ce6cbedf3c1adedf2bfb20aa5e491c5baa/constraint/pkg/client/drivers/interface.go#L25-L41

@skaven81
Author

I agree that there is room for debate and discussion about the technical implementation, and also that whatever model is accepted should fit cleanly into the rest of the Gatekeeper code stack and not create unnecessary divergence from upstream K8s architectural direction.

Really the key thing that would satisfy my particular use case is a concise and flexible way to express MECE collections of Constraints. It's technically possible today but it's not very concise and requires repetition across multiple Constraints to achieve MECE.

Perhaps we can come up with a small but simple approach that would get us 80% of the way there with 20% of the effort.

The MultiConstraint resource I think is a reasonable approach, provided the way it would work is more or less like what I proposed, where Gatekeeper evaluates each of the embedded Constraints in order and stops once it finds a match (or falls through to a default condition). A simple tweak would also allow MultiConstraint to function as a collection of traditional Constraints, just by having a flag that instructs Gatekeeper whether or not to continue evaluating the list once it finds a match (see the sketch below). There could be some benefits to collecting a bunch of similar constraints (even if they are not intended to use the default-and-overrides MECE pattern) into a single resource, simply for the sake of clarity.
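
A hedged sketch of what that flag might look like on the MultiConstraint shape above (the evaluationMode field name and its values are placeholders, echoing the MatchFirst/MatchAny idea from the original proposal):

kind: MultiConstraint
metadata:
  name: memory-limit-enforcement
spec:
  constraintKind: K8sMemoryLimits
  evaluationMode: MatchFirst   # hypothetical flag; a MatchAll value would keep evaluating after a match
  match:
    kinds:
    - apiGroups: [ "" ]
      kinds: [ Pod ]
  constraints: []              # ordered (match, parameters) entries as in the example above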

@skaven81
Author

If you're looking for suggestions, I like the name OrderedConstraint for a resource type whose evaluation cannot be coerced into a simple group. For a resource that has a configurable behavior for evaluating the list of Constraints, GroupedConstraint or ConstraintGroup might be good. It's also possible to have both, with OrderedConstraint and GroupedConstraint being the exact same resource spec (just a different kind) but handled differently by the controller -- OrderedConstraint stops once it hits the first match, while GroupedConstraint evaluates all of them and processes any matches.

@maxsmythe
Contributor

Perhaps we can come up with a small but simple approach that would get us 80% of the way there with 20% of the effort.

Curious if you have any thoughts on this direction, especially if it is an incremental move toward a more fully-realized solution.

The MultiConstraint resource I think is a reasonable approach, provided the way it would work is more or less like what I proposed, where Gatekeeper evaluates each of the embedded Constraints in order and stops once it finds a match (or falls through to a default condition).

Yep! That was my thought.

There could be some benefits to collecting a bunch of similar constraints (even if they are not intended to use the default-and-overrides MECE pattern) into a single resource, simply for the sake of clarity.

Curious if there are benefits other than clarity here. E.g. useful evaluation methods beyond "first matched only".

Another question would be if all the constraints in a collection resource must be the same kind or not: mixed kinds may be more useful, but the accounting may be harder as templates are added/removed. Of course, mixed kinds could always be added later if room were left in the API.

One design feature of constraints I don't want to lose is composability (AFAICT nothing here threatens this): basically, that the consequences of adding/removing a constraint are well-defined and that constraints are non-interacting amongst each other. This allows G8r to play nicely with K8s eventual consistency and means users can mix constraints from various sources without worrying about a regression in enforcement.

If you're looking for suggestions, I like the names OrderedConstraint for a resource type that cannot be coerced into a simple group. For a resource that has a configurable behavior for evaluating the list of Constraints, GroupedConstraint or ConstraintGroup might be good. It's also possible to have both, with OrderedConstraint and GroupedConstraint being the exact same resource spec (just a different kind) but handled differently by the controller -- OrderedConstraint stops once it hits the first match, while GroupedConstraint evaluates all of them and processes any matches.

Definitely like the names for the ideas proposed. I think the path to a good name is to first know the shape of what we want (e.g. is there an 80% story in here somewhere); then we'll be in a good place to name things.

@skaven81
Author

skaven81 commented Oct 25, 2023

Curious if there are benefits other than clarity here. E.g. useful evaluation methods beyond "first matched only".

Really just for clarity. With traditional Constraints, grouping a bunch together requires something like a naming convention to make it obvious which Constraints are in or out of the group. The only "built in" grouping mechanism is the kind of the Constraints (which, to be fair, is a pretty good grouping mechanism). But if I kubectl get constraint and get 50 results back, when really there are only five actual "policy constraints" -- each of them just having ten different match selectors and parameters -- you can probably imagine that it would feel a lot tidier to group those together so that a single intent maps to a single resource. I suppose what I'm proposing here includes some degree of expanding intent from a single (kind, match, parameters) tuple to a more expressive (kind, (match, parameters), [(match, parameters),...]) tuple.

Another question would be if all the constraints in a collection resource must be the same kind or not

I could see the value in allowing mixed kinds for a generic GroupedConstraint (one that doesn't break out of evaluation upon finding a match). But as noted above, other than visual/conceptual clarity, such a construct probably doesn't hold much additional value to begin with, so I think we can out-scope it.

Which leaves the MultiConstraint/OrderedConstraint model, which my gut tells me should stick to a single kind. I'm imagining the kind of trouble one could get into trying to predict the behavior of a MultiConstraint that could actually end up enforcing different policies (e.g. activating different ConstraintTemplates as it works through the evaluation looking for a match). Feels very "spooky action at a distance" -- opaque and tough to troubleshoot. So I think we should stick with a single kind for this type of Constraint grouping/evaluation model.

I think the path to a good name is to first know the shape of what we want (e.g. is there an 80% story in here somewhere); then we'll be in a good place to name things.

Based on the discussion so far (which has been great!) I feel like we've landed on a few key architectural decisions:

  1. Scope for the new feature is limited to just the "first match wins" style of ordered evaluation. More generic "grouping" is out-scoped as it would create additional complexity (both for implementation and for testing) without adding anything that isn't already possible today.
  2. The added approach will apply only to groups of Constraints of the same kind. This will ensure predictable behavior: an ordered Constraint still enforces a single "policy" just like traditional Constraints do, it's just we're adding a more flexible way of determining whether a given resource violates that policy. I have to imagine this also has performance benefits, as Gatekeeper only needs to cache a single ConstraintTemplate and Rego program to evaluate an ordered Constraint.
  3. The intended behavior is for an ordered Constraint to contain an ordered list of (match, parameters) tuples. Each match is evaluated in turn. The first match that matches the resource under evaluation, uses its parameters to execute the ConstraintTemplate's Rego. The resulting violation (or admission) is then passed out.

Where it sounds like we still have some discussion is:

  • When evaluating a list of (match, parameters) tuples, what does a missing or empty match mean? I believe it should mean "match anything (that has already matched the parent match)". Is there any value in allowing for a negative case (match nothing)? Since we're limiting ourselves to a linear "first one wins" approach, there is no practical value in a "match nothing", as it would just be skipped over every time. Even if a "match nothing" were defined to halt evaluation, there is no branch that could take evaluation to the items below it. I'm rambling a bit, but suffice to say I think it's reasonable to have an empty match mean "match everything" and not provide a symmetrical "match nothing".
  • Do we create a new custom resource named (for example) OrderedConstraint? If we do, then this would deviate from the current model of how Constraints work, in the sense that there is no kind: Constraint at all. Constraints are instances of a resource kind that are a reflection of a ConstraintTemplate. And since we've noted above that an "ordered Constraint" will only be of a single Kind, I think it makes logical sense to maintain the current model, where implementing a policy enforcement is done by creating an instance of the kind created by the ConstraintTemplate.
  • If the above is accepted, then we are keeping the current basic Constraint structure -- as it would surely be a noodly mess to have each ConstraintTemplate create two CRDs each -- one for a "single Constraint" and another for an "ordered Constraint". Not impossible, but sure seems needlessly complex to me when we consider that....
  • The current development and operational model around Constraint Templates and Constraints is that the ConstraintTemplate contains a CRD template (which in turn contains an OpenAPIV3 spec for the parameters). That means it can be copied and pasted anywhere into the CRD that gets created. So if we go back to my original proposal, the CRD created by a ConstraintTemplate would simply add a new attribute, spec.parameterSelection (or a better name if we can think of one). This would be an array of objects, each object having an optional match (which Gatekeeper already has the spec for), and a required parameters (which is just copied and pasted from the spec.parameters spec).
  • Such an approach means that existing Constraints are unaffected, as spec.match and spec.parameters are unchanged, and a Constraint with a missing spec.parameterSelection just behaves like a Constraint does today. But the new behavior is now available just by adding an attribute to implement the override.
  • What about enforcementAction? Do we allow overrides to change the Constraint's enforcementAction? My gut again says no -- as it would cause the Constraint's behavior to no longer be self-documenting and transparent. When one lists Constraints, it's nice to have the enforcementAction listed right there. If overrides can change that, then trying to express to an administrator what the current state of enforcement is for a given policy would get a lot more complicated. I can think of cases where it would be valuable to change enforcementAction in an override, but I think the loss in clarity and added complexity for troubleshooting, makes it not worth it.

I feel like this is a way more elegant approach than Kyverno's, which implemented a special CRD just for expressing overrides. With what we're hopefully converging on here, one can start with a normal Constraint that behaves in some global fashion. Then, when that first exception case comes along, instead of having to create an entirely new resource (which one then has to remember interacts with the original "default" resource), the administrator just has to add a handful of lines of YAML to the original Constraint to clearly express the exceptional condition (see the fragment below). As more exceptions appear it's trivial to continue adding to the list. It's self-documenting (which, after all, is one of the hallmarks of Gatekeeper's approach over competing solutions) and transparent.
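
To make the "handful of lines of YAML" claim concrete, the first exception in this model would be nothing more than a fragment appended to the existing Constraint's spec (a sketch reusing the field names proposed above):

# added to the existing Constraint's spec when the first exception request arrives
parameterSelection:
- match:
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 32g-team
  parameters:
    limit: 32Gi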

@eleeaz95

eleeaz95 commented Dec 12, 2023

This would be a very good feature; I hope it will be implemented soon.

My 2 cents:
We have a similar use case, where I have a cluster-wide baseline policy for e.g. hostPorts.

  1. I have applied that policy, but now I need my node-exporter to use hostNetwork on a port outside the specified range.
  2. I've created another policy using the same ConstraintTemplate, but this time specifying the prometheus namespace/pod and the desired hostPort.

The problem:

  • I am still receiving a policy violation from the baseline policy.

Possible solution?

  • Besides the options already mentioned, we could have a priority or weight annotation (?) for constraints under the same ConstraintTemplate. (Quite a basic proposition; other options may be more resilient.)

Example:

Template:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedhostports
...
spec:
  crd:
    spec:
      names:
        kind: K8sPSPHostNetworkingPorts
...

Baseline constraint:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports
  annotations:
    metadata.gatekeeper.sh/weight: 100  # example new feature
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    hostNetwork: true
    min: 8000
    max: 9000

Specific constraint:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports-prometheus-node-exporter
  annotations:
    metadata.gatekeeper.sh/weight: 90  # example new feature
spec:
   ...
    namespaces:
      - monitoring
    name: kube-prometheus-stack-prometheus-node-exporter          
  parameters:
    hostNetwork: true
    min: 9100
    max: 9100
